As some of you know, I’m currently writing a book on DevOps. It’s been a good opportunity to practice my procrastination skillz, which were already Mr Miyagi level. Want to make progress on your Netflix queue instead of watching titles pile up on you? Start a book – you’ll find all kinds of time to knock off all those binge-worthy crappy TV programs!
The enjoyable part for me – which took up most of November/December of last year, and is still stretching on – was revisiting some of my favorite DevOps and Agile related books of all time. One of the giants I’ve really enjoyed reading is “Continuous Delivery”, published back in July 2010 by Jez Humble and David Farley. This is a powerful, very lengthy book and I put it up there with the best writings by the “Big 3” (Gene Kim, Martin Fowler, Gary Gruver). Here are my notes and thoughts for you to enjoy.
First off, this is a massive work. They say 512 pages but trust me, this is a very meaty 512 pages and contains more content than you’ll find in just about any other 1000 page mammoth out there. It will take some work to get through this; the format is designed to be read in any order, so you will have to endure some repetition from chapter to chapter.
That being said, these guys have “been there”. This isn’t another “How To Do Agile” book written by someone who’s never written a line of code in their life; Jez and David both have decades of real world experience. I wish, desperately, I’d had the foresight to read this when it first came out. It would have resolved so many problems I didn’t know existed in my delivery pipeline.
The authors clearly drive home the aim of the deployment pipeline:
- Make every part of building, deploying, testing, and releasing software visible to everyone involved (increases collaboration)
- Improve and shorten the feedback cycle (problems are identified as early as possible)
- Deploy and release any version of their software to any environment at will through a fully automated process
There’s a lot of meat here. In fact, this is likely enough to make a working definition of DevOps by itself. It’s unambiguous; if your artifacts aren’t automated and visible to all at any point in the process, with a quick, practiced release cycle, your delivery pipeline needs some work. The book also brings out that repeatability and reliability derive from two principles: automate almost everything, and keep everything you need to build, deploy, test, and release your application in version control. They qualify “almost everything”: exploratory testing relies on experienced testers; demos to customers can’t be done by computers.
Still, the book makes it clear that you want to focus on outcomes, not purity (“doing DevOps”). The outcomes you want are:
- Reduced cycle time, delivering value faster to the business and increasing profitability
- Reduced defects – improve efficiency and spend less on support
- Increased predictability of SDLC to make planning more effective
- The ability to comply with regulations
- Reduced costs due to better risk management and fewer issues associated with software delivery
Antipatterns to Avoid
Deploying software manually (shown by lengthy documentation, reliance on manual testing, frequent calls to dev team, corrections to release process during a release, diff environment configs, lengthy releases, risky releases)
“We have heard it said that a manual process is more auditable than an automated one. We are completely baffled by this statement.” … “Performing manual deployments is boring and repetitive and yet needs a significant amount of expertise. Asking experts to do boring and repetitive, and yet technically demanding tasks is the most certain way of ensuring human error that we can think of, short of sleep deprivation or inebriation.”
The documentation and scripts make assumptions about the version or configuration… that are wrong, causing the deployment to fail. The deployment team has to guess about the intentions of the development team. … ad hoc calls, emails, quick fixes… a disciplined team will incorporate these into the deployment plan, but it’s rare for this process to be effective. … It’s common for new bugs to be found, but with the deadline approaching there is no time to fix them, and deferring the launch is unacceptable at a late stage of the project. The most critical bugs are hurriedly patched up, and a list of defects is stored by the PM. … The cost of coordination between silos (dev, DBA, ops, testing) is enormous, stalling the release in troubleshooting hell. The remedy is to integrate test, deployment and release into the development process. Make them a normal and ongoing part of development. … Then there is little to no risk at release time, because you have rehearsed it in a progressively more production-like sequence of test environments. Make sure everyone involved in the process (build and release team, testers, devs) works together from the start of the project.
No more “install SQL Server” as a step. This is symptomatic of a bad relationship between devs and ops, and makes it certain that when it comes time for an actual deployment, the process will be painful and drawn out with lots of angry recriminations and short tempers. The first thing is to seek out ops people informally and involve them in the dev process. That way the ops team will have been involved with the software from the beginning, and both sides will have practiced what is going to happen many times before the release – which will be as smooth as a baby’s bottom.
A build and deployment expert is an antipattern – every member of the team should know how to deploy and maintain deployment scripts.
Antipattern: long-lived branches, or deferring acceptance testing until the end. “CI requires that every time someone commits any change, the entire app is built and a comprehensive set of automated tests run against it. Crucially, if the build or test process fails, the dev team stops whatever they are doing and fixes the problem immediately. The goal of CI is that the software is in a working state all the time.” Later in the book they say “you should be checking in your code several times a day.”
Branch by feature is not recommended – branches must be short lived, likely less than a few days. Having many long lived branches is bad because of the problem of merging. (How does this reconcile with Git?) It can lead to a situation where testers are suddenly bombarded with bugs; see Martin Fowler’s writing on the risks of branching by feature. (Example: an India team working in normal CI, where one lucky guy had to handle merge issues each night with the US side.)
Antipattern: days or weeks pass between check-ins by devs. It is impossible to safely refactor an app unless everyone commits frequently to mainline, and merges are small and acceptable.
Antipattern: a separate branch for new functionality, merged to main at some point. (With many devs this means absurdly complex integration issues and semantic conflicts, and it’s hard to refactor.) “a much better answer is to develop new features incrementally and to commit them to trunk in VC on a regular and frequent basis”. Address concerns with a commit test suite (<10 min, unit testing to catch any obvious regression errors against a prod-like env), and introduce changes incrementally – checking in at minimum once a day, usually multiple times a day. Be explicit with commit messages.
Two huge antipatterns – deploying from source control or recompiling binaries for each new environment. It’s essential to use the same process to deploy to every environment, to remove the deployment process as a potential source of defects. The environment you deploy to least frequently (prod) is the most important.
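To make the “same binary, same process, every environment” point concrete, here’s a minimal Python sketch. Everything in it is made up for illustration: the `Artifact` class, the `deploy` function, the environment names and the `db_host` values are my assumptions, not anything from the book. The idea is only that the artifact is built once, identified by checksum, and the config is the one thing that differs per environment.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """A binary built exactly once by the commit stage (hypothetical type)."""
    name: str
    payload: bytes

    @property
    def checksum(self) -> str:
        return hashlib.sha256(self.payload).hexdigest()


# Hypothetical per-environment settings; only these differ between deployments.
ENVIRONMENTS = {
    "test": {"db_host": "db.test.internal"},
    "staging": {"db_host": "db.staging.internal"},
    "prod": {"db_host": "db.prod.internal"},
}


def deploy(artifact: Artifact, env: str, expected_checksum: str) -> dict:
    """Deploy the same artifact to any environment through one code path."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env}")
    if artifact.checksum != expected_checksum:
        # Refuse anything rebuilt or altered after the commit stage.
        raise ValueError("artifact differs from the one the commit stage built")
    return {"env": env, "artifact": artifact.checksum, "config": ENVIRONMENTS[env]}


build = Artifact("shop.war", b"compiled bytes from the commit stage")
releases = [deploy(build, env, build.checksum) for env in ENVIRONMENTS]
```

By the time `prod` is reached, the exact same `deploy` path has already run against `test` and `staging` with the exact same bytes, which is the whole point of the rule.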
Progressing In Maturity
I love the table below. It shows such a nice progression in maturity across 6 key areas. (This was figure 15-1 in the book btw)
The six areas: Build Management and CI; Environments and Deployments; Release Management and Compliance; Testing; Data Management; Configuration Management.
Level -1 – Regressive: Processes unrepeatable, poorly controlled, reactive
- Build Management and CI: Manual processes for building software.
- Environments and Deployments: Manual process for deploying software and provisioning environments.
- Release Management and Compliance: Infrequent and unreliable releases.
- Testing: Manual testing after deployment.
- Data Management: Data migrations unversioned and manual.
- Configuration Management: Version control either not used or infrequent checkins.
Level 0 – Repeatable: Process documented, partly automated
- Build Management and CI: Regular automated build and testing. Any build can be recreated from source control.
- Environments and Deployments: Automated deployment to some environments. All configuration externalized/versioned. Creation of new environments is cheap.
- Release Management and Compliance: Painful and infrequent, but reliable releases. Limited traceability from requirements to release.
- Testing: Automated tests written as part of story development.
- Data Management: Changes to database done with automated scripts versioned with application.
- Configuration Management: Version control in use for everything required to recreate software.
Level 1 – Consistent: Automated processes across lifecycle
- Build Management and CI: Automated build and test cycle every time a change is committed. Managed dependencies.
- Environments and Deployments: Fully automated, self service push-button process for deploying software. Same process to deploy to every environment.
- Release Management and Compliance: Change management and approvals process defined and enforced; regulatory and compliance conditions met.
- Testing: Automated unit and acceptance tests. Testing part of development process.
- Data Management: Database changes performed automatically as part of deployment process.
- Configuration Management: Libraries and dependencies managed. Version control usage policies determined by change management process.
Level 2 – Quantitatively managed: Process measured and controlled
- Build Management and CI: Build metrics gathered, made visible, and acted on. Builds are not left broken.
- Environments and Deployments: Orchestrated deployments managed. Release and rollback processes tested.
- Release Management and Compliance: Environment and application health monitored and proactively managed. Cycle time monitored.
- Testing: Quality metrics and trends tracked. Non functional requirements defined and measured.
- Data Management: Database upgrades and rollbacks tested with every deployment. Database performance monitored and optimized.
- Configuration Management: Developers check in to mainline once a day. Branching only used for releases.
Level 3 – Optimizing: Focus on process improvement
- Build Management and CI: Teams regularly meet to discuss integration problems and resolve them with automation, faster feedback, and better visibility.
- Environments and Deployments: All environments managed effectively. Provisioning fully automated. (Docker here or virtualization?)
- Release Management and Compliance: Ops and delivery teams regularly collaborate to manage risks and reduce cycle time.
- Testing: Production rollbacks rare. Defects found and fixed immediately.
- Data Management: Release to release feedback loop of database performance and deployment process.
- Configuration Management: Regular validation that CM policy supports effective collaboration, rapid development, and auditable change management processes.
Why Automated Releases
It’s hard to argue with the value of automated releases – but it’s amazing how few production systems we’ve encountered are fully automated. One of the principles described in the book is to use the same script to deploy to every environment. … then the deploy to prod path will have been tested hundreds or even thousands of times before it is needed on release day. If any problems occur upon release, you can be certain they are problems with environment specific config, not your scripts. … If it’s not automated, it’s not repeatable, and every time it is done it will be different because of changes in software, config, environments, and the release process itself. Since it’s manual it’s error prone, and there is no way to ensure high quality because there’s no way to gain control over the release process. Releasing software too often is an art; it should be an engineering discipline.
Note, I agree with some of the statements below but I think the emphasis on scripted releases is a little antiquated. The reasons they give are bogus IMHO (can be audited, scripts are tidy and easy to understand, understanding and maintenance is easy). It’s one of the few variances I have, and – much like the approach in Art of Monitoring, which was script based – easy to overlook because of the outstanding content.
This was a bold statement that I highlighted: We can honestly say we haven’t found a build or deployment process that couldn’t be automated with sufficient work and ingenuity. … it should be possible for a new team member to sit down at a new workstation, check out the project’s source code, and run a single command to build and deploy to any environment including local dev.
Why do they stress visibility and accessibility to everyone of a given release? “Most of the waste in software comes from the progress through testing and operations. It’s common to see build/ops teams waiting for documentation, testers waiting on “good” builds, dev teams receiving bug reports weeks after the team has moved on to new fx, discovering late in the game that the app’s architecture will not support nonfx requirements. Software is undeployable because it’s taken so long to get it into production, and buggy because the feedback cycle is so long. A release process where testers/ops can deploy builds to environments push-button, and devs know bugs early on, and managers can view cycle time/thruput/code quality – that transparency and visibility allows bottlenecks to be identified, optimized, and reviewed – both a faster and a safer delivery process.”
A great quote here: “The simple act of adding your configuration information to your version control system is an enormous step forward. At its simplest, the VC system will alert you to the fact that you’ve changed the config inadvertently. This eliminates at least one very common source of errors.” (They mention a horror story – the test env was configured manually and not kept in VC as the dev version was – so properties were different or missing, no two environments were the same, and all differed from production. Which properties were valid or redundant, and which should be unique? They had 5 people responsible for managing config.)
They also mention that it’s bad practice to inject config information at build or packaging time – anything that changes between deployments should be captured as config, not baked in when the app is compiled. Two principles hold true – keep binary files independent of all config info, and keep all config info in one place. Config should use clear naming conventions, with config options in the same repository as source code but with values elsewhere. Avoid overengineering; keep it as simple as possible.
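Here’s a rough Python sketch of “option names live with the code, values live elsewhere”. The `ConfigFacade` class and the key names `DB_URL` and `PAYMENT_SERVICE_URL` are hypothetical, invented for this example; the backing store defaults to environment variables purely as an assumption, and could just as well be a file or a central config service.

```python
import os
from typing import List, Mapping, Optional


class ConfigFacade:
    """Single access point for settings; the backing store can be swapped."""

    # Option names are versioned with the source; values are supplied per deployment.
    DECLARED_KEYS = ("DB_URL", "PAYMENT_SERVICE_URL")

    def __init__(self, source: Optional[Mapping[str, str]] = None):
        # Defaults to the process environment (an assumption for this sketch).
        self._source = os.environ if source is None else source

    def get(self, key: str) -> str:
        if key not in self.DECLARED_KEYS:
            raise KeyError(f"undeclared config key: {key}")
        if key not in self._source:
            raise LookupError(f"no value supplied for {key}")
        return self._source[key]

    def missing(self) -> List[str]:
        """Declared keys that have no value in this environment."""
        return [k for k in self.DECLARED_KEYS if k not in self._source]


staging = ConfigFacade({"DB_URL": "postgres://staging/db"})
```

Calling `staging.missing()` at startup is a cheap way to catch the “properties different, or missing” horror story before the app takes traffic.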
It’s key to be able to repeatedly recreate every piece of infrastructure used by your application (OS, patches, OS configs, app stack, its config, infra config) “All artifacts relevant to your project and the relationships between them are stored, retrieved, uniquely identified, and modified.” This includes:
- App source code, build scripts, tests, doc, requirements, db scripts, libraries, config files
- Dev, testing, operations toolchains
- All environments used in dev, testing, prod
- Entire app stack – both binaries and config
- Config associated with every app in every environment it runs in
Some key questions:
- How do you represent your config information? How do your deployment scripts access it? How does it vary between environments, apps, and versions?
- How do we handle secrets? The authors recommend having a central service thru which every app can get the config it needs, with a façade class for access whether the store is the file system or a DB/REST svc(!). They recommend Escape.
- How do we test configuration? (at the very least, ping all external svcs and make sure anything the app depends on is available, then run smoke tests)
- Could you completely recreate your prod system (excluding data) from scratch with the VC assets you store?
- Could you regress to an earlier known good state of the app?
- Can you be sure that each deployed environment is set up in exactly the same way?
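The “ping everything the app depends on” check from the questions above can be sketched in a few lines of Python. The `check_dependencies` helper and the service names are my own invention; the only real library call is `socket.create_connection`, and the `connect` parameter exists just so the check can be exercised without a network.

```python
import socket
from typing import Callable, Dict, Tuple

Endpoint = Tuple[str, int]


def check_dependencies(
    deps: Dict[str, Endpoint],
    connect: Callable = socket.create_connection,
    timeout: float = 2.0,
) -> Dict[str, bool]:
    """Try a TCP connection to each dependency; True means it was reachable."""
    results = {}
    for name, (host, port) in deps.items():
        try:
            conn = connect((host, port), timeout=timeout)
            conn.close()
            results[name] = True
        except OSError:
            # Covers refused connections, DNS failures and timeouts.
            results[name] = False
    return results
```

A deployment script could call this right after install and fail the deployment if any value is `False`, before running the heavier smoke tests.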
How Continuous Is Continuous Deployment?
This means release to production (Timothy Fitz). The intuitive, immediate response is, “this is too risky!” After all, in order for this to work, your automated tests need to be fantastic, covering the entire app. You have to write all your tests first, so only when a story is complete will checkins pass the acceptance tests. Aaron puts this differently when he says “You can’t cheat shipping”. Regular releases to production can be combined with canary releases: roll out to a small group of users first, then roll out to the other users once you have confirmed (manually) that there are no problems.
To counter the “too risky” reaction: More frequent releases lead to lower risk in putting out any particular release. If you release every change, the amount of risk is limited to just the risk in that one change. So CD is a great way to reduce the risk of releases. It also forces you to do the right thing. You can’t do it without automation throughout build, deploy, test, and release. You can’t do it without comprehensive, reliable set of automated test. Can’t do it without writing system tests against a prod like environment. Even if you can’t actually release every set of changes that passes all your tests, you should aim to create a process that would let you if you chose to.
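For the canary releases mentioned above, one common way to pick the “small group of users” is to hash the user id into a bucket. This is a sketch under my own assumptions (the `in_canary` and `pick_version` names and the 0 to 99 bucket scheme are not from the book):

```python
import hashlib


def in_canary(user_id: str, percent: int) -> bool:
    """Deterministically decide whether a user is in the canary group.

    The same user always lands in the same bucket, so raising `percent`
    only ever adds users to the canary group.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # value in 0..99
    return bucket < percent


def pick_version(user_id: str, percent: int) -> str:
    """Route canary users to the new release, everyone else to the stable one."""
    return "new" if in_canary(user_id, percent) else "stable"
```

Start with a small `percent`, watch the dashboards, and raise it in steps; setting it back to 0 sends everyone to the stable version again.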
One use case mentioned – “the ops team was strongly pushing back on the schedule. After the meeting the techies hung around afterwards and exchanged phone #’s. Over a few weeks they kept talking, the system was deployed to prod server, a small group of users given access a month later. A member of the deployment team came and worked with dev team to create deployment scripts and writing the installation documentation – so no surprises. In ops team meetings where many systems discussed and scheduled, that team hardly discussed since ops was confident they could deploy it and of the quality of software.”
The most crucial part of release planning is assembling representatives from every part of your org involved in delivery: build, infra, and ops teams; dev teams; testers; DBAs; support personnel. They should continue to meet throughout the life of the project and continually work to make the delivery process more efficient.
“In our experience, a major contributor to cycle time is people .. waiting to get a “good build” of the application.” Problem is removed with deployment pipeline where everyone can see the builds as they are deployed and be able to perform a build themselves, push-button. Benefits – testers can select older versions in their repros to verify change in behavior over newer version, support staff (repros), operations (DR recovery exercise)
The cardinal sin is checking in on a broken build. If a build breaks, the devs responsible need to identify the cause of the breakage as soon as possible and fix it. (The corollary is “never go home on a broken build”. That doesn’t mean staying late to fix the build after working hours; it means checking in regularly and early enough to deal with problems as they occur. “Many experienced developers make a point of not checking in less than an hour before the end of work, and do it first thing the next morning.”)
Some simple rules to keep in mind:
- If you can’t fix it quickly, you should revert to the previous version. A team rule – if a build can’t be fixed within 10 minutes, revert.
- Another rule – don’t comment out failing tests.
- If you commit a change and all your tests pass, but others break, it is your responsibility to fix all tests not passing as a result of your changes.
In software, when something is painful, the way to reduce the pain is to do it more frequently, not less. “Bring the pain forward” is a mantra of the book. (this made me smile, and reminded me of my good friend Donovan Brown who’s said this often.)
- If integration is painful, do it every time someone checks in from the get-go.
- If testing is painful, do it from the beginning of the project.
- If release is painful, release every time someone checks in a change that passes all tests. If not to real users, maybe to a subset, or to a production-like environment. Gradually improve release time until you can hit your target (like an internal release every 4 weeks, for example).
- If documentation is painful, do it as you roll out new features, and make it part of your definition of done.
You should be able to answer “yes” to –
- Can I exactly reproduce any of my env, including version of OS, patches, network config, software stack, apps deployed into it and their config?
- Can I easily make an incremental change to any of these individual items and be able to deploy that change to any env?
- Can I easily see all changes to an env and trace a particular change back to what exactly the change was, who made it, and when?
- Can I satisfy all compliance regs?
- Can everyone on the team get the info they need and make changes they need to make?
And this was another statement that was very emphatic: “If you don’t have every source artifact of your project in version control, you won’t enjoy any of the benefits that we discuss in this book.” Everything including CI, auto testing, push button deployments, depends on this. The book called out three components – automated builds, version control, and – vital – the agreement of team (check in small incremental changes frequently to mainline, the highest priority is fixing any defects that break the app).
You can also fail the build for warnings and code style breaches. “Code Nazi” indeed! It is effective in enforcing good practices. “We removed Checkstyle (with our distributed team) – after a few weeks we started to find more “smells” in the code and doing tidy-up refactorings. In the end we realized that although it came at a cost, Checkstyle helped us to stay on top of the almost inconsequential things that together add up to the difference between high-quality code and just code.”
Four strategies to keep app releasable: hide new fx until finished; make all changes incrementally in small releases; use branch by abstraction for large scale changes; use components to decouple parts of app that change at diff rates.
- Like hiding until ready to release – means you are always integrating and testing entire system even if feature flag turned off.
- Often tempting to branch source control and make changes to the branch. In practice, wiring everything up ends up being the hard part when time to merge. “The bigger the apparent reason to branch, the more you shouldn’t branch.”
- Like component – this is the strangler fig, changing big ball of mud to modular, better structured code. Take part of codebase out as a component and rewrite. You can localize the mess and use branch by abstraction to keep the app running with the old code while you create a new modularized version of the same functionality. (also called “Potemkin village”)
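Feature hiding and branch by abstraction from the list above fit together nicely. In this Python sketch the tax example, the class names and the `new_tax_engine` flag are all hypothetical; the point is that both implementations live on mainline behind one abstraction, and a flag decides which one callers get.

```python
from abc import ABC, abstractmethod


class TaxCalculator(ABC):
    """The abstraction that callers depend on while the rewrite is in progress."""

    @abstractmethod
    def tax(self, amount_cents: int) -> int:
        ...


class LegacyTaxCalculator(TaxCalculator):
    """The old code, still serving users."""

    def tax(self, amount_cents: int) -> int:
        return amount_cents * 20 // 100


class NewTaxCalculator(TaxCalculator):
    """Rewritten incrementally on mainline; hidden until the flag is flipped."""

    def __init__(self, rate_percent: int = 20):
        self.rate_percent = rate_percent

    def tax(self, amount_cents: int) -> int:
        return amount_cents * self.rate_percent // 100


def make_calculator(flags: dict) -> TaxCalculator:
    # The flag defaults to off, so unfinished work ships dark.
    if flags.get("new_tax_engine", False):
        return NewTaxCalculator()
    return LegacyTaxCalculator()
```

Because both classes are built and tested on every checkin, there is no big merge at the end: finishing the migration means flipping the flag and later deleting `LegacyTaxCalculator`.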
Conway’s Law states that “orgs that design systems are constrained to produce designs which are copies of the communication structures of those organizations.” So open source projects where the devs communicate only by email tend to be very modular with few interfaces. A product developed by a small, collocated team will tend to be tightly coupled and not modular. Be careful how you set up your dev team as this will affect the architecture of your system. (I’ve heard this put as “sooner or later, companies will ship their org structure as a product.”)
It is vital to version dependencies including libraries and components – otherwise you can’t reproduce your builds. When there’s a break, you won’t be able to find the change that broke it or find the last “good” version in your library. And it’s best to trigger a new build whenever there is any change to upstream dependencies. Most teams update their dependencies when code is stable – paying a higher cost later at integration time. Until your app grows sufficiently large, there is no need to build your components individually – the simplest thing is to have a single pipeline that builds your whole app once as the first stage.
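One lightweight way to notice an upstream change is to fingerprint the pinned dependency versions and compare that with the fingerprint recorded for the last build. This is only a sketch: `dependency_fingerprint`, `needs_rebuild` and the library names are hypothetical, and a real CI server would normally do this for you.

```python
import hashlib
from typing import Dict


def dependency_fingerprint(pins: Dict[str, str]) -> str:
    """Hash the exact name==version pairs, independent of dict ordering."""
    lines = [f"{name}=={version}" for name, version in sorted(pins.items())]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def needs_rebuild(pins: Dict[str, str], last_built_fingerprint: str) -> bool:
    """True when any pinned dependency changed since the last recorded build."""
    return dependency_fingerprint(pins) != last_built_fingerprint


pins = {"json-lib": "2.4.1", "http-client": "1.9.0"}
last_build = dependency_fingerprint(pins)
```

Storing the fingerprint alongside each build also gives you a quick answer to “which dependency set produced the last good version?”.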
Developers almost always check in to mainline. This ensures that all code is continuously integrated, avoiding merge hell at the end of the project. How do you manage large teams of devs working on multiple releases? Good componentization of software, incremental development, feature hiding. More care is required here in architecture and dev. Merging branches towards release time is always a complex process that takes an unpredictable amount of time. Each new merge breaks different pieces of existing fx and is followed by a stabilization process as people work away on fixing issues in mainline. “Creating long lived branches is fundamentally opposed to a successful continuous integration strategy. Our proposal is not a tech solution but a practice – always commit to trunk, and do it at least once a day. If this seems incompatible with making far-reaching change, perhaps you haven’t tried hard enough. In our experience, although it sometimes takes longer to implement a feature as a series of small incremental steps that keeps the code in working state, the benefits are immense. Having code that is always working is fundamental – we can’t emphasize enough how important this practice is in enabling continuous delivery of valuable, working software. There are times where this approach won’t work, but they really are very rare…” (See the branch hell described in Figure 14.6 – branch in mainline, two teams. A small team had to be dedicated to handle merges!)
The one situation where it might be acceptable is before a release. Creating a branch for release replaces the evil practice of the code freeze, where checking in to source control is switched off for days or even weeks. With a release branch, devs can keep checking into mainline, while changes to the release branch are made for critical bugfixes only (figure 14.2). In this case fixes for critical defects are committed on branches and merged into mainline immediately.
Dashboarding and Transparency
Being able to react to feedback also means broadcasting information. Big, visible dashboards and other notification mechanisms. Dashboards should be ubiquitous, and at least one should be present in each team room.
Customers can overreact when they see red x’s on buildmonitor. Have to explain – every time a build fails it indicates a problem has been found that otherwise may make it into prod.
Come up with a list of risks, categorized by probability and impact. This could include generic risks (running out of disk space, unauthorized access) and specific risks (transactions not completing) – then work out what to monitor and how to display it, for example as green/yellow/red states.
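As a rough illustration of that exercise, here’s a Python sketch that ranks risks by probability times impact and maps a monitored value to a green/yellow/red state. The 1 to 5 scales, the scoring formula and the thresholds are my assumptions, not the book’s.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Risk:
    name: str
    probability: int  # 1 (unlikely) to 5 (almost certain), assumed scale
    impact: int       # 1 (minor) to 5 (severe), assumed scale

    @property
    def score(self) -> int:
        return self.probability * self.impact


def prioritize(risks: List[Risk]) -> List[Risk]:
    """Highest-scoring risks first: these deserve a spot on the dashboard."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


def traffic_light(value: float, warn_at: float, critical_at: float) -> str:
    """Map a monitored value (e.g. disk usage in percent) to a display state."""
    if value >= critical_at:
        return "red"
    if value >= warn_at:
        return "yellow"
    return "green"


risks = [
    Risk("unauthorized access", probability=1, impact=5),
    Risk("running out of disk space", probability=3, impact=4),
    Risk("transactions not completing", probability=2, impact=5),
]
```

Here `prioritize(risks)` puts disk space first, so disk usage would be the first metric to get a `traffic_light` tile on the team dashboard.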
Hypothesis Driven Development
- The decisionmaker/customer makes guesses about which features and bugfixes will be useful to users. However, until they are in the hands of users who vote by choosing to use the software, they remain hypotheses. It is vital to minimize cycle time so that an effective feedback loop can be established.
Feedback useful criteria:
- Any change needs to trigger feedback process
- Feedback must be delivered as soon as possible
- Delivery teams must receive feedback and act on it
Infrastructure as Code
Manually configured environments are the worst possible decision and the most common. (config info very lg in size, once broken finding it takes a long time with sr personnel, difficult to reproduce a manually config env, and maintain as nodes drift apart). Change from “works of art” (Visible Ops Handbook) to mass-produced objects whose creation is repeatable and predictable. It should always be cheaper to create a new environment than to patch/repair an old one.
The book recommends a holistic approach to managing infrastructure:
- Desired state of your infra should be specified thru version-controlled config
- Infra should be autonomic, meaning it should correct itself to the desired state automatically
- You should always know the actual state of your infra thru instrumentation and monitoring
“While in general we are not a fan of locking things down and establishing approval processes, when it comes to your production infra it is essential. And since you should treat your testing env the same way you treat prod – this impacts both. Otherwise it is just too tempting, when things go wrong, to log onto the environment and poke around to resolve problems. (This usually leads to svc disruptions, with reboots and service packs applied at random, and there’s no reliable record of what was done and when – so you can’t reproduce the cause of the problem you’re creating.) ‘Stabilizing the patient’ – without turning off access, ops staff spend all their time firefighting because unplanned changes break things all the time. A good way to set expectations of when work will be done and enforce access control is to create a maintenance window.”
The best way to enforce auditability is to have all changes made by automated scripts which can be referenced later (we favor automation over documentation for this reason). Written documentation is never a guarantee that the documented change was performed correctly. … Provisioning new servers is a manual, repetitive, resource-intensive, and error-prone process – exactly the kind of problem that can be solved with automation.
The goal of the config management process is to ensure it’s declarative and idempotent – which means you configure the desired state of your infra and a system ensures that this config is applied, and applying it a second time changes nothing. (This means automating the application of OS service packs, upgrades, installing new software, changing settings, or performing deployments.)
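“Declarative and idempotent” is easier to see in code. This is a toy Python sketch (the `plan` and `converge` functions and the package names are made up, and real tools do far more): you state the desired end result, the tool works out the difference, and running it again is a no-op.

```python
from typing import Dict, List, Optional, Tuple

Action = Tuple[str, str, Optional[str]]


def plan(desired: Dict[str, str], actual: Dict[str, str]) -> List[Action]:
    """Work out the actions needed to move `actual` to `desired`."""
    actions: List[Action] = []
    for key in sorted(desired):
        if key not in actual or actual[key] != desired[key]:
            actions.append(("set", key, desired[key]))
    for key in sorted(set(actual) - set(desired)):
        actions.append(("remove", key, None))
    return actions


def converge(desired: Dict[str, str], actual: Dict[str, str]) -> Dict[str, str]:
    """Apply the plan and return the new state; safe to run repeatedly."""
    state = dict(actual)
    for op, key, value in plan(desired, actual):
        if op == "set":
            state[key] = value
        else:
            del state[key]
    return state


desired = {"ntp": "enabled", "openssl": "3.0.13"}
actual = {"openssl": "1.1.1", "telnet": "enabled"}
after_first_run = converge(desired, actual)
```

After the first run the state matches `desired`, and `plan(desired, after_first_run)` is empty, so a second run does nothing. That property is what lets you run the same config against every node on a schedule without fear.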
“In our view, no technology can be considered genuinely enterprise-ready unless it is capable of being deployed and configured in an automated way. If you can’t keep vital config info in versioned storage and thus manage changes in a controlled manner, the technology will become an obstacle to delivering high-quality results…”
There is no fundamental reason why cloud based services are less secure than publicly accessible services hosted on infra you own. Compliance is also often mentioned – yet usually the problem isn’t that regs forbid the use of cloud computing as much as they haven’t caught up with the cloud yet. Given careful planning and risk management it’s usually possible to reconcile the two, even with health care/banking type issues (esp. using data encryption). Vendor lock-in is another fear.
It is extremely common for problems with infra services – such as routers, DNS, and directory services – to break software in prod environment that worked perfectly all the way thru the deployment pipeline. (Nygard – InfoQ – a system that dies mysteriously at the same time every day). How to address this?
- Put every part of networking infra config into source control
- Install a good network monitoring system – know when network connectivity is broken
- Logging. Your app should log at a WARNING level every time a connection times out or is closed.
- Smoke test post deployment connectivity.
- Make the testing environment’s network topology as similar to production as possible. This is what staging is for!
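The logging advice in the list above might look like this in Python. The `fetch_banner` helper and the `connectivity` logger name are hypothetical; `socket.create_connection` and the `logging` calls are standard library, and the `connect` parameter is there only so the behaviour can be checked without a real network.

```python
import logging
import socket
from typing import Callable, Optional

log = logging.getLogger("connectivity")


def fetch_banner(
    host: str,
    port: int,
    timeout: float = 2.0,
    connect: Callable = socket.create_connection,
) -> Optional[bytes]:
    """Read a few bytes from a service, logging at WARNING on any network failure."""
    try:
        with connect((host, port), timeout=timeout) as conn:
            return conn.recv(64)
    except socket.timeout:
        log.warning("connection to %s:%s timed out after %.1fs", host, port, timeout)
    except OSError as exc:
        log.warning("connection to %s:%s failed or was closed: %s", host, port, exc)
    return None
```

With messages like these in the logs, the “system that dies mysteriously at the same time every day” at least leaves a timestamped trail pointing at the network.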
The goals of lean manufacturing are to ensure the rapid delivery of high quality products – focusing on the removal of waste and reduction of cost. It’s resulted in huge cost and resource savings, higher-quality products, and faster time to market in several industries. (i.e. DevOps is not small scale, as it comes from Lean which in turn was born in large enterprises)
Build quality in was a mantra of W. Edwards Deming – whose idea was, if you catch defects earlier, they’re cheaper to fix. The cheapest place to fix bugs in code is before they’re ever checked into source control. Auto testing, CI/CD are designed to catch bugs early. It does no good if they’re not fixed, which will require discipline. Lean tells us – testing is not a phase, and it is not the exclusive domain of testers.
Deming Cycle – Plan, Do, Study, Act.
Use the Theory of Constraints:
- Identify the part of the build, test, deploy, release process that’s the bottleneck. (say, manual tests)
- Exploit the constraint. Maximize the thruput of that part of the process. (buffer of stories waiting to be manually tested, resources used in manual testing are not distracted)
- Subordinate all other processes to this constraint (have your devs work just hard enough to keep backlog constant and rest of time writing automated tests to catch bugs so that less time is spent manually testing)
- Elevate the constraint (if cycle time is too long, invest more effort in automated testing or hire more testers)
- Rinse and repeat.
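To make the "identify the constraint" step concrete, here is a small sketch. It assumes you can put a rough throughput number (items per day) on each stage; the stage names and numbers are invented for illustration.

```python
def find_constraint(throughput_per_day: dict[str, float]) -> str:
    """The constraint is the stage that can process the fewest items per day."""
    return min(throughput_per_day, key=throughput_per_day.get)


def pipeline_throughput(throughput_per_day: dict[str, float]) -> float:
    """The whole pipeline can only go as fast as its slowest stage."""
    return min(throughput_per_day.values())


# Hypothetical numbers for a team whose manual testing is the bottleneck.
stages = {"build": 40, "automated tests": 30, "manual tests": 5, "release": 20}
constraint = find_constraint(stages)

# Speeding up a stage that is not the constraint changes nothing overall.
faster_build = {**stages, "build": 80}
```

Here `constraint` is `"manual tests"`, and `pipeline_throughput(faster_build)` is still 5: doubling build speed buys nothing until manual testing is dealt with.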
What you choose to measure will have an enormous influence on the behavior of your team (Hawthorne effect) – measure lines of code, and devs will write many short lines of code. Measure # of defects fixed, and testers will log bugs that could be fixed easily. According to Lean – it’s essential to optimize globally, not locally. If you spend a lot of time removing a bottleneck that is not the true constraint, you make no difference to the delivery process. It’s important to have a global metric that can be used to determine if the delivery process as a whole has a problem.
Besides cycle time (covered below), other useful diagnostics:
- Automated test coverage
- Codebase properties (duplication, cyclomatic complexity, coupling, style problems, etc.)
- # of defects
- Team Velocity
- # of commits each day
- # of builds per day
- # of build failures per day
- Duration of build incl automated tests
There is no one size fits all solution to the complex problem of implementing a deployment pipeline. The crucial point is to create a system of record that manages each change from checkin to release, providing the info you need to discover problems as early as possible in the process. Then drive out inefficiencies to make the feedback cycle faster and more powerful, by adding better config mgmt., or refining auto acceptance tests and parallelizing, etc. Requires discipline – only changes that have passed thru the system get released.
Mary Poppendieck – “How long would it take your org to deploy a change that involves a single line of code? Can this be done on a repeatable reliable basis?” Cycle time is hard to measure as it covers analysis/dev/release, but it tells you more about your process than any other metric. Focusing on the reduction of cycle time encourages practices that increase quality, such as the use of automated tests.
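If you do record when a change was requested and when it was released, cycle time is simple arithmetic. A minimal sketch, assuming you have those two timestamps per change (the sample data is invented):

```python
from datetime import datetime
from statistics import median


def cycle_time_hours(requested: datetime, released: datetime) -> float:
    """Elapsed hours between a change being requested and being released."""
    return (released - requested).total_seconds() / 3600


def median_cycle_time_hours(changes: list[tuple[datetime, datetime]]) -> float:
    """Median is less sensitive than the mean to one stuck change."""
    return median(cycle_time_hours(req, rel) for req, rel in changes)


# Hypothetical (requested, released) pairs.
sample_changes = [
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 17)),  # 8 hours
    (datetime(2024, 1, 2, 9), datetime(2024, 1, 3, 9)),   # 24 hours
    (datetime(2024, 1, 4, 9), datetime(2024, 1, 4, 11)),  # 2 hours
]
```

Using the median here is my choice, not the book's; graph whatever summary your team finds easiest to act on.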
How Much do I Need to Change The Org? … and working with Operations/IT as partners
It is essential that everybody involved in the process of delivering software is involved in the feedback process. Devs, testers, operations staff, DBA’s, infra specialists, managers. (Jez recommends cross functional teams, but if not at least work together daily)
Ideally everyone within an organization is aligned with its goals, and people work together to help meet them… we succeed or fail as a team, not individuals. However, in too many projects – devs chuck work over the wall to testers, who in turn throw work over the wall to ops at release time. When something goes wrong, we spend more time blaming each other than we do fixing the defects that come from this overly siloed approach. … If siloed, start by getting everyone involved in the release process together and ensure they have an opportunity to collaborate on a frequent basis. Set up a dashboard so everyone can see health of application, builds, state of environments.
Almost all med and large companies separate the activities of dev and infra management (or ops) into separate siloes. Each has its own lines of reporting, a head of ops and a head of software dev. Every time a prod release occurs, these teams work to ensure that any problems that arise are not their fault. Each group wants to minimize deployment risk, but each has their own approach – causing tension. Ops teams measure their effectiveness in terms of MTBF and MTTR for example. Ops teams have SLA’s they need to meet, and likely reg requirements. Any change, including a process change, represents a risk. Ops managers need to ensure that any change to any environment they control is documented and audited (also Sarbanes-Oxley). Deploying a new version of your app may require a CAB meeting. Include details on the risk, impact of change, how to remediate if it fails. The request should be submitted before work starts on the new version to be deployed – not a couple hours before go-live. Devs should familiarize themselves with the ops system/processes and comply, make it a part of release plan.
Abnormal conditions – ops managers want to be alerted when these occur so they can minimize downtime. How does the ops team want to monitor your app? Make it part of the release plan. Where are they expecting logs to be, and how will the app notify ops staff of malfunctions? Treat these as requirements, and add them as stories. Consider ops personnel here to be an important constituency of users. They may also have reqts for a service continuity plan. Each service the ops team manages will have an RPO – recovery point objective (a length of time prior to a disaster for which data loss is acceptable) – and an RTO – recovery time objective – max length of time before svcs are restored. This governs backup/restore strategy. Again, test backup/recovery well.
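RPO and RTO are easy to mix up, so here is a tiny sketch of what each one checks. The function names and the sample times are mine; the definitions follow the paragraph above.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup: datetime, disaster: datetime, rpo: timedelta) -> bool:
    """RPO: data written since the last backup is lost, so that gap must fit."""
    return disaster - last_backup <= rpo


def meets_rto(disaster: datetime, restored: datetime, rto: timedelta) -> bool:
    """RTO: time from the disaster until service is back must fit the objective."""
    return restored - disaster <= rto


# Hypothetical incident: nightly backup at 02:00, outage at 14:00, back at 15:30.
last_backup = datetime(2024, 3, 1, 2, 0)
disaster = datetime(2024, 3, 1, 14, 0)
restored = datetime(2024, 3, 1, 15, 30)
```

With a 4 hour RPO this incident fails the data-loss objective (12 hours of data gone), while a 2 hour RTO is met. That is the kind of result a backup/restore rehearsal should surface before a real outage does.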
Your Risk Management strategy might answer the following questions:
- How are you tracking progress?
- How are you preventing defects?
- How are you discovering defects?
- How are you tracking defects?
- How do you know when a story is finished?
- How are you managing your environments?
- How are you managing configuration?
- How often do you showcase working features?
- How often do you do retrospectives?
- How often do you run your automated tests?
- How are you deploying your software?
- How are you building your software?
- How are you ensuring that your release plan is workable and acceptable to the ops team?
- How are you ensuring that your risk and issue log is up to date?
Project Management and Strategy
The most important part of project is inception.
- List of stakeholders and business sponsors. Should only be one business owner. Internal stakeholders include ops, sales, marketing, support, dev, testing teams.
- Establish business case and value of project, a list of high level functional and nonfunctional requirements – just enough detail to estimate work involved. Includes capacity, availability, security
- SDLC, Testing and release strategy
- Architectural evaluation – decision on platform/frameworks
- Risk and issue log
Then comes initiation.
- Setup of team room and hardware/software. Whiteboard, paper and pens, printer, internet. Food and drinks
- Version control, setup of CI/CD with hello world sample to DEV->TEST.
- Agreed upon responsibilities, working hours, meeting times (standups, planning sessions, showcases)
- Simple test environments/data
- Starting work on backlog
Failure points of scrum:
- Lack of commitment – relies on transparency, collaboration, discipline – no more hidden flaws.
- Ignoring good engineering – can’t ignore TDD or enforcement of good practices and coding.
Scrumterfall – adapting agile out the gate to fit your org. First follow process as written, then start adapting.
- If it’s 1-3 week sprints
- Tested and working at end of sprint
- Product owner identified
- Backlog prioritized by business value, estimates in story points, tasks by hours
Form a release working group – across all siloes – tasked with coming up with a release strategy and keep the process working.
- Note authors do not feel a CAB meeting is an antipattern – esp if issues with uncontrolled changes wreaking havoc. CAB team formed with reps from dev, ops, security, CM team, and business
- Decide which environments should be locked down and enforce
- Automated change request management system, and designate owner(s) for each controlled environments
- Push button deployments when approved
- Meet regularly – run a retrospective, and Deming cycle – plan, do, check, act.
- Ensure releases are happening as often as possible to production (or a production like environment). “If it hurts, do it more frequently”
- Big monitors and dashboarding
- The important thing is to evaluate the risk of the change – against the benefit. If the risks outweigh the benefits, the change should not be made – or a less risky option found. (Track this too – how long does it take to have a change be approved? What % are denied? How many changes are awaiting approval?) And validate – what’s the MTBF/MTTR? Cycle time? And hold retrospectives and invite feedback to improve.
- On the above – if you are (as I was) saddled with vast amounts of legacy code to support – remember to get “Working Effectively with Legacy Code” by Michael Feathers and the classic books by Poppendieck.
Value stream mapping
- Poppendieck, Lean Software Development – recommends going to the source, where a customer request comes in. Your goal is to draw a chart of the average customer request, from arrival to completion. Working with people involved in each activity, sketch these out and the avg time for each step. This exposes the amt of time work is in waiting state and non value adding activities.
- Mapping the part from checkin to release should take about half an hour. Best guesses for time are fine. Maybe look at a similar system. A commit stage to build your app and run basic metrics and unit tests, a stage to run acceptance tests, a stage to deploy the app to a production like environment so you can demo it. This should be part of iteration 0.
While releases can be exhilarating, they can also be exhausting and depressing. Almost every release involves last minute changes, such as fixing the database login details or updating the URL for an external service. There should be a way of introducing such changes so that they are both recorded and tested – in VC and then promoted to prod. Imagine if you could do a production deployment with the push of a button – and the worst that would happen is that you could back out the same release in minutes or seconds. Delta is small, so the risk is minimized. No more “I am risking my career” heebie-jeebies!
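Once you have best-guess numbers for each step, the value stream map boils down to adding up work time versus waiting time. A minimal sketch; the step names and hours are made up, and `Step` and `summarize` are just illustrative helpers.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    work_hours: float  # time someone (or something) is actively adding value
    wait_hours: float  # time the change sits in a queue before this step


def summarize(steps: list[Step]) -> dict[str, float]:
    """Total elapsed time and the share of it that is actual work."""
    work = sum(s.work_hours for s in steps)
    wait = sum(s.wait_hours for s in steps)
    total = work + wait
    return {
        "work_hours": work,
        "wait_hours": wait,
        "total_hours": total,
        "value_added_ratio": work / total if total else 0.0,
    }


# Hypothetical checkin-to-release map.
value_stream = [
    Step("commit stage", 0.25, 0.0),
    Step("acceptance tests", 1.0, 0.75),
    Step("manual testing", 4.0, 20.0),
    Step("release", 0.5, 5.5),
]
```

For these numbers the change spends 32 hours in the pipeline but only 5.75 of them are work, so the waiting time (mostly in front of manual testing) is where to look first.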
Two fears – introducing a problem because of a hard-to-detect mistake; 2nd is that because of a problem in the release process you are committed, forcing a clever solution under a severe deadline. Both are fixed by rehearsing the release many times a day, 2nd with a backout strategy. Typically best is to keep previous version of app available to deploy; the biggest obstacle is rolling back data. (only make additive not destructive changes) This way, by the time a release is available to deploy to prod, we know:
- Code compiles
- Passes unit tests – so it works like our devs think it should
- Passes acceptance tests (does what users think it should)
- Configuration of environments is fine (tested in prod-like env)
- All components in place and deployment system is ok
- Version control is working
Traditional approaches … delay the nomination of a release candidate until several lengthy and expensive steps have been taken to ensure quality/fx. In an environment where build/deployment automation is aggressively pursued along with automated testing, there’s no need for this (esp at end of project, see Lean). … Delaying testing until after the dev process is, in our experience, a sure-fire way to decrease the quality of your release. Defects are always more expensive to fix later in the process (devs have forgotten, fx changed, and there’s no time to fix bugs late in the game – must be added to a list)
Team definition of done – “Done” means “Released”. No “80% done” – it’s either complete, or not. As it’s not always possible to release to prod at end of every sprint – “done” could mean “demo’d and tried by rep of user community in a production-like environment”
If any part of the pipeline fails, stop the line. The most important step in achieving rapid repeatable, reliable releases is for your team to accept that every time they check code into vc, it will successfully build and pass every test. The whole team owns a deployment failure – they should stop and fix it before doing anything else.
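Stop-the-line behavior is simple to express: run stages in order, and the first failure blocks everything after it. This is only a sketch with hypothetical stage names, not how any particular CI server implements it.

```python
from typing import Callable

# A stage is a name plus a zero-argument check that returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> list[tuple[str, str]]:
    """Run stages in order; once one fails, later stages are skipped."""
    results = []
    failed = False
    for name, check in stages:
        if failed:
            results.append((name, "skipped"))
            continue
        ok = check()
        results.append((name, "passed" if ok else "failed"))
        failed = not ok
    return results


def is_releasable(results: list[tuple[str, str]]) -> bool:
    """Only a build that passed every stage is a release candidate."""
    return all(status == "passed" for _, status in results)
```

The code is the easy part. The discipline is that when `is_releasable` comes back `False`, the team fixes the build before doing anything else.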
The release plan contains steps to deploy app, how to smoketest, backup strategy, logs and methods of monitoring, issue log. I’m questioning this one. This is to get the first release going smoothly. The release strategy needs to be documented and kept up to date: (this is a source of both functional and nonfx reqts)
- Parties in charge of deployments and release, masters of each env
- Asset and config mgmt strategy
- Technology used for deployment
- Implementing the pipeline
- Environments to be used for acceptance, capacity, integration and user acceptance testing – and the process which builds moved thru these environments
- Process for deploying into testing/prod environments
- Monitoring requirements and services /APIs the app should use to notify operations team of its state.
- Config mgmt.
- External systems integration points – at what stage and how tested, and how do ops personnel communicate with COTS provider if there’s a problem
- Disaster recovery plan
- SLA for the software – failover, ha, etc
- Production sizing and capacity planning – data, log files, bandwidth and disk space, latency for clients
- Archiving strategy, auditing reqts
- How defects get fixed and patches applied
- Upgrades to production environment
- How app support will be handled
With any rollback plan the big constraint is data, and the systems you are tied down to (orchestrated releases). First ensure that state of prod system including db is backed up. 2nd is to practice the rollback plan, including restoring from backup or migrating the db back. Best plan here is to roll back by deploying the previous good version – including recreating the environment from scratch (this is cleanest but will lead to downtime). You can also do deployment slots for zero-downtime releases. Basically this is the same as a blue-green deployment – two identical environments, each a replica of the other. Run smoke tests against the blue env, and when ready – change router config to point to the blue env. (Put db into read-only mode at beginning.)
Canary releases – roll out a new version to a subset of prod servers. You can do smoke tests, capacity, and start routing selected users to the new version. Rollbacks are easy, and some companies measure usage of new features and kill them if not being used. Great, low risk way of testing capacity by gradually routing more and more users to the app and measuring response time and CPU usage, I/O, memory, and log files (esp if you don’t have $ for a realistic prod like env). Harder to use if installed as a fat client on customer servers. (see grid computing – enable app to auto update itself to a known good version hosted on your servers.)
Emergency fixes – a critical defect and has to be fixed ASAP. Most important thing to bear in mind is – do not, under any circumstances, subvert your process. Emergency fixes must go thru the same build, deploy, test and release process as any other change. (otherwise env in unknown state that makes it impossible to reproduce and breaks other deployments in unmanageable ways). One more reason to keep cycle time low. Always evaluate – how many people the defect affects, how often it occurs, how severe it is in impact to end users. Never do them late at night, always pair with someone else. Make sure you’ve tested your emg fix process. Only under extreme circumstances circumvent std release process to do a fix. Make sure you have tested making an emg fix in staging. Sometimes better to roll back vs deploying a fix.
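The blue-green switch can be sketched in a few lines. Everything here is a toy model: `Router`, the environment names, and the `smoke_test` callback are assumptions for illustration, and a real setup would be changing load balancer or DNS config.

```python
from typing import Callable


class Router:
    """Tracks which of two identical environments receives live traffic."""

    def __init__(self, live: str, idle: str):
        self.live = live
        self.idle = idle

    def switch(self) -> None:
        """Point traffic at the idle environment; the old one becomes idle."""
        self.live, self.idle = self.idle, self.live


def release(router: Router, versions: dict[str, str], new_version: str,
            smoke_test: Callable[[str], bool]) -> bool:
    """Deploy to the idle environment and switch only if smoke tests pass."""
    target = router.idle
    versions[target] = new_version
    if not smoke_test(target):
        return False  # live traffic never saw the bad version
    router.switch()
    return True


def rollback(router: Router) -> None:
    """The previous version is still running on the idle side, so switch back."""
    router.switch()
```

Note this toy ignores the hard part called out above, which is the data: it says nothing about how the database is migrated or rolled back.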
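One common way to route "selected users" to a canary is to hash the user id into a bucket, so the same user always lands on the same side as you dial the percentage up. This is my own sketch of that idea, not something the book prescribes.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user id to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def use_canary(user_id: str, percent: int) -> bool:
    """Send the user to the new version if their bucket is below the dial."""
    return bucket(user_id) < percent
```

Because the bucket is derived from the id and not from a random draw, raising `percent` from 10 to 50 only adds users to the canary, and dropping it to 0 is your rollback.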
Pieces of a functional feedback system/testing:
- Source code compiles and is valid
- Unit tests (behaves as expected) and Test coverage – runs very fast, tests the behavior of small pieces of app in isolation.
- Functional acceptance tests (delivers business value expected) – should run against whole app in a prod-like environment. Long running, >1 day sometimes. Group into functional areas so you can run tests against a particular aspect/behavior.
- Nonfunctional tests (capacity, availability, security)
- Exploratory testing (manual, smoketesting)
This echoes W. Edwards Deming’s 14 points – “cease dependence on mass inspection to achieve quality. Improve the process and build quality into the product in the first place.” Most companies though rely on manual acceptance testing – auto tests are poorly maintained/out of date and are supplemented with manual practices. Good testing though gives safety/confidence that software is working as it should, and acts as a constraint on the dev process by encouraging good dev practices.
A common practice in many orgs is to have a separate team dedicated to the production and maintenance of the test suite. Devs then feel they don’t own the acceptance tests, so they don’t pay attention to failure at this late stage, so it stays broken for long periods of time. Acceptance tests written without dev involvement tend to be tightly coupled to the UI and thus brittle and badly factored.
The most common obstacles to devs owning the acceptance testing layer are a lack of testing licenses and an app architecture that prevents the system from being run in a dev environment.
It is important to note that acceptance tests are expensive to create and maintain. They are also regression tests. Don’t follow a naive process of taking your acceptance criteria and automating every one.
Another use case – “by not replicating the production environment for capacity testing was a false economy, because we were building a high performant system and the problems we found exhibited at loads we couldn’t apply in our lower spec environments. These problems were expensive to find and fix.”
Ideally, have a production-like environment to run your manual and automated tests on, and an automated script that does a smoke test to make sure it’s up and running.
Mike Cohn – Unit > Service > UI. (test automation pyramid) – unit tests form vast majority. Fewer acceptance tests (divided into service and UI tests) these will typically take far longer to execute.
For the purpose of commit tests, don’t test via the UI at all. UI testing involves a lot of components or levels of software – time consuming. Work at human timescales, again desperately slow. Dependency injection or inversion of control is a useful design pattern to create testable units of code.
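Here is roughly what dependency injection buys you in a commit-stage test. The classes are hypothetical; the point is that `OrderService` is handed its collaborator, so a test can pass in a trivial in-memory stand-in and never touch a UI, database, or network.

```python
class OrderService:
    """Business logic that receives its collaborator instead of creating it."""

    def __init__(self, credit_checker):
        self._credit_checker = credit_checker

    def place_order(self, customer_id: str, amount: float) -> str:
        if amount > self._credit_checker.available_credit(customer_id):
            return "rejected: credit limit exceeded"
        return "accepted"


class FixedCreditChecker:
    """Test double: always reports the same available credit."""

    def __init__(self, credit: float):
        self._credit = credit

    def available_credit(self, customer_id: str) -> float:
        return self._credit


# In a unit test, inject the double; in production, inject the real checker.
service = OrderService(FixedCreditChecker(credit=100.0))
```

Tests like this run in microseconds, which is what keeps the commit stage inside its time budget.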
Commit stage will be exercised several times a day by each dev; if it takes over 5 minutes, complaints will start. 10 minutes is the max. Do everything possible to keep this stage fast while not losing key value; fast feedback on errors too costly to fix later. This is perhaps the biggest bang for your buck – knowing the exact moment when a change is introduced.
Use case – trading system interacting with another system owned by another dev team via a message queue. Lots of interaction and the external system meant we didn’t own the full lifecycle – hard to have meaningful end to end acceptance tests. We implemented a reasonably complex stub that simulated operation of the live system. Allowed us to plug the gap in the lifecycle of our system; instead of having to maintain a complex network of distributed systems, we could choose when to interact with the real thing and when to deal with the simpler stub. This was deployed by environment thru configuration. We tend to use stubbing widely for large scale components and subsystems; mocking for components at code level. Allowed us to simulate difficult edge cases that would have been hard to set up on real systems, broke dependency on parallel dev team.
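The "deployed by environment thru configuration" part might look something like the sketch below. The client classes, config keys, and symbols are all invented for illustration; the book's actual system isn't shown in that level of detail.

```python
class RealPricingClient:
    """Placeholder for the client that talks to the other team's system."""

    def __init__(self, queue_url: str):
        self.queue_url = queue_url

    def quote(self, symbol: str) -> float:
        raise NotImplementedError("would send a request over the message queue")


class StubPricingClient:
    """Simulates the external system, including awkward edge cases."""

    def __init__(self, prices: dict[str, float], timeouts: tuple[str, ...] = ()):
        self._prices = prices
        self._timeouts = set(timeouts)

    def quote(self, symbol: str) -> float:
        if symbol in self._timeouts:
            raise TimeoutError(f"simulated timeout for {symbol}")
        return self._prices[symbol]


def make_pricing_client(config: dict):
    """Pick stub or real client from per-environment configuration."""
    if config.get("pricing_mode") == "stub":
        return StubPricingClient(
            config.get("stub_prices", {}),
            tuple(config.get("stub_timeouts", ())),
        )
    return RealPricingClient(config["queue_url"])
```

An acceptance-test environment would set `pricing_mode` to `"stub"`; staging and prod would supply a `queue_url` instead.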
“We think TDD is essential to enable the practice of continuous delivery. See books Growing Object-Oriented Software, and xUnit Test Patterns.” (this last one I need to read – defines dif between dummy objects, fake obj, stubs, spies, and mocks)
Acceptance tests should be written, and ideally automated, before development starts on a story. Its critical in an agile environment because it answers the question “How do I know when I am done?” for devs, “Did I get what I wanted?” for users. Tools – Concordion, Cucumber, JBehave, Twist – separate test scripts (for users to write) from implementation (devs/testers write code behind the scenes)
Powerful regression test suite esp for lg teams, free up testers, feedback loop is tighter, can autogenerate requirements defn (Cucumber, Twist)
These can be brittle, expensive to maintain if not using good tools/practices. “A good question to ask yourself every now and again is, ‘how often do my acceptance tests break due to real bugs, and how often due to changes in requirements?'” Happy path should be first target for automation, followed by alternate happy path (if stable) or sad path (bugs)
Why aren’t unit tests the same as acceptance tests? Acceptance tests are business-facing, not dev-facing, and test whole stories at a time in a prod like environment.
A common complaint is that they’re too expensive to create and maintain. The cost of this is much lower in our experience than performing frequent manual acceptance and regression testing or releasing poor quality software. They catch serious problems that unit or component tests can never catch. Manual testing usually happens at a late date where teams are under extreme pressure to get software out the door. There’s no time to fix these bugs – they’re added to a list. Where defects are found that require complex fixes, odds of integration/regression problems rise.
Some in the Agile community say do away almost entirely with acceptance testing, write unit + component tests combined with pair programming, refactoring, analysis/exploratory testing by customers, analysts, testers working together. Jez doesn’t like this – unit and component tests do not test user scenarios. Acceptance tests are great at catching threading problems, architectural mistakes or environmental/config issues. These are hard to discover thru manual testing and impossible in unit/component testing. Better protection when making large scale changes to the app. And it puts too high of a burden on testers who must do boring, repetitive tasks. Devs are not as good as testers in finding issues in their own work. It’s much better to have testers working with devs to find defects.
The cost of maintaining complex acceptance tests is a tax, an investment which is repaid many times over in reduced maintenance costs, protection that allows you to make wide ranging changes to the app, and significantly higher quality – “bringing the pain forward”. Without excellent auto test coverage, one of 3 things happens: a lot of time is spent trying to find and fix bugs at the end of the process, you spend time/$ on manual and regression testing, or you release poor quality software.
Manual testing in the software industry is the norm and represents often the only type of testing done by a team. This is both expensive and rarely good enough on its own to ensure high quality. Use manual testing only for exploratory testing, usability testing, showcasing, user acceptance testing.
Proper way to write:
- Crucial that your test implementations use domain language and do not contain detail on how to interact with the app. (UI changes are brittle) The behavior that any given acceptance test is intended to assert is – “If I place an order, is it accepted?” “If I exceed my credit limit, am I informed?”
- Who owns them – not a testing team only. In one case the test team was at the end of the chain of development, so most of our acceptance tests were failing most of their lives. The test team would find out about changes late in the process, after they were developed and checked in. Since the testing team had so many automated tests to repair, it would take some time to fix the most recent breakages, by which point the dev team had moved on to other tasks. As the test team became snowed under with new tests to write and older tests to refactor and fix, they fell further behind. We wanted to improve the time here; we made the whole delivery team (devs and testers) responsible for automated acceptance tests. This focused the devs on acceptance criteria and made them more aware of the impact of their changes and better at predicting when their work would cause problems. Can be done thru build masters (tracking down the guilty), or standing up and shouting “who is fixing the build?” – lava lamps or a large build monitor also help.
- Don’t test against GUI – an app written with testability in mind will have an API that both the GUI and the test harness can talk to to drive the application. Running tests against the business layer directly is a reasonable strategy. (This requires discipline for frontend team to keep presentation focused and not straying into realm of business or app logic)
- Typically acceptance testing takes hours to complete vs a few minutes. You could refactor by looking for quick wins – spending time refactoring the slowest tests. Test against a public API vs a UI. Parallelize acceptance testing, with each test client running its own Selenium instance for example. One company separated out API testing vs UI based testing, for quicker failure detection. Next step was to divide tests into batches, alphabetically, and run the batches in parallel.
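The "alphabetical batches run in parallel" trick from the last bullet can be sketched like this. `run_test` stands in for whatever actually executes one acceptance test (assumed to return True on pass); the threading approach is my choice for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def make_batches(test_names: list[str], batch_count: int) -> list[list[str]]:
    """Sort tests alphabetically and cut them into contiguous batches."""
    ordered = sorted(test_names)
    if not ordered:
        return []
    size = math.ceil(len(ordered) / batch_count)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def run_in_parallel(batches: list[list[str]],
                    run_test: Callable[[str], bool]) -> dict[str, bool]:
    """Run each batch on its own worker and merge the pass/fail results."""

    def run_batch(batch: list[str]) -> dict[str, bool]:
        return {name: run_test(name) for name in batch}

    results: dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(batches))) as pool:
        for batch_result in pool.map(run_batch, batches):
            results.update(batch_result)
    return results
```

This only works if tests are independent of each other. Tests that share state (one test client, one user account) will start failing in confusing ways once they run side by side.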
The role of the BA or analyst/tester
- The role of the business analyst primarily is to represent the customers or users of the system. They work with the customer to identify and prioritize requirements. They work with devs to ensure they have a good understanding of the app, and guide them to ensure the app meets business value. Work with testers to ensure acceptance tests are specified properly. Encouraging analysts and testers to collaborate and define acceptance criteria early on is vital. Analyst gains because tester provides experience of what can be measured to define when a story is done; tester benefits by gaining an understanding of the nature of requirements before diving head first into testing. Once acceptance criteria have been defined, and before the requirements are implemented, the analyst and tester sit with the devs along with the customer if available. The analyst describes the requirement and the business context, goes thru acceptance criteria. The tester then works with the devs to agree on a collection of auto acceptance tests that will prove that the acceptance criteria have been met. Short kickoff meetings like this are vital; prevents analyst from gold-plating or creating “ivory tower” requirements that are expensive to implement/test. Prevents testers from raising “false positives” – defects that really aren’t defects. Prevents devs from implementing something no one really wants. Throughout sprint devs will consult with analyst if they’re confused or if there’s a better way to solve the problem.
For a new team –
they should set up some simple ground rules, choose a tech platform and testing tools, an automated build, work out stories that follow the INVEST principle – Independent, Negotiable, Valuable, Estimable, Small, Testable with acceptance criteria. Roles defined:
- Customers/analysts/testers define acceptance criteria
- Testers work with devs to automate acceptance tests
- Devs code behavior to fulfill these criteria
- If any automated tests fail – unit, component, acceptance – devs will make it a priority to fix.
- Make sure customer / proj mgmt layer buys into this, so they don’t scrap the project – “too much time working on automated acceptance tests”. And each new acceptance criterion should clearly state the business value. “blindly automating badly written acceptance criteria is one of the major causes of unmaintainable acceptance test suites.” It should be possible to write an automated acceptance test proving that the value described is delivered to the user.
- “Following the process we describe changes the way developers write code. Comparing codebases that have been developed using automated acceptance tests from the beginning with those where acceptance testing has been an afterthought, we almost always see better encapsulation, clearer intent, cleaner separation of concerns, and more reuse of code… this really is a virtuous circle, testing at the right time leads to better code.”
If midstream –
- Start with automating high-value use cases – automate happy path tests. Manual testing will dominate – the moment you test the same function manually more than a couple times – if it’s not going to change – automate the test.
- See Michael Feathers, Working Effectively with Legacy Code – “systems that do not have automated tests.” Simple rule of thumb – test the code you change. Create an automated build process, then scaffold automated functional tests. Again target high value paths.
- Legacy code is often not too modular and well structured – so lots of problems with changes in one part adversely impacting another one – meaning you’ll need to validate state of app at completion. Only write tests where it adds value – code that implements features of app, and then code that supports or framework. Most bugs will be in framework, so if you aren’t altering framework, little value in adding comprehensive testing.
- This should run very fast, they recommend 10 minutes is about the limit, 90 secs ideal. They recommend JUnit or NUnit to break down long-running tests.
- Should not hit the db, filesystem, external systems, or (in general) interaction between components. (so they use test doubles or mocks)
- Speed comes at a cost – they miss interaction between components.
- Component tests is their phrase for integration testing.
- Common for a bug here to slip through, issue in prod for 3 weeks they weren’t aware of, even with 90% unit test coverage. Fix was introducing simple, automated smoke tests proving the app could perform its most fundamental functions as part of release process.
- Commit stage tests – run fast, as comprehensive as possible (75% or better), if any of them fails do not release. Environment neutral. In comparison, later stages are long running (think parallelization), could still be a release candidate even if a test fails (i.e. it fixes a critical bug), and run in a production-like environment to check against RM pipeline and prod env changes
Dealing with Technical Debt
Make a backlog visible to everyone. Your release can’t just show pass/fail, green/red – if it’s always red. Show the number of tests passed and failed, and graph them prominently.
Two approaches –
- Zero defects (in the past devs would ignore bugs, deferring them – technical debt. Huge list of bugs pile up. Even worse with no acceptance tests (because of not practicing CI) – team is overwhelmed by huge list of defects, arguments between testers/devs/mgmt., release dates slip, users saddled with buggy software)
- Treat defects the same way as features. Have users prioritize – a rare defect with a known workaround could be low priority and deferred. You could categorize as critical, blockers, medium, low. (often customers would rather not fix some bugs)
- Barely mentions beta testing techniques like canary releases
Why automated testing
- Performing manual build, test and deployment processes is boring and repetitive – far from the best use of people. People are expensive and valuable, and they should be focused on producing software that delights its users and then delivering those delights as fast as possible – not on boring, errorprone tasks like regression testing, virtual server provisioning, and deployment, which are best done by machines.
Almost every system has some kind of requirements on capacity and security, or the SLA. It makes sense to run tests to measure how well the system adheres to these requirements; judgments of what is acceptable are often subjective. Present the facts and allow a human to make the go/no-go call on release to prod. It is essential to start testing capacity/scaling as soon as possible so you have an idea whether your app will be releasable.
If an RC fails to pass capacity testing, someone will decide whether it’s important enough to allow the candidate to be released.
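"Present the facts and let a human decide" could look like this for a capacity stage: measure, compare against a stated threshold, and report, without blocking the release automatically. The threshold and the nearest-rank percentile method are assumptions of mine.

```python
def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample covering pct% of the data."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, with a floor of rank 1.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def capacity_report(response_ms: list[float], max_p95_ms: float) -> dict:
    """Summarize a capacity run; the go/no-go call is left to a person."""
    p95 = percentile(response_ms, 95)
    return {
        "samples": len(response_ms),
        "p95_ms": p95,
        "threshold_ms": max_p95_ms,
        "within_threshold": p95 <= max_p95_ms,
    }
```

A stated number like "p95 under 300 ms at expected load" (example figure only) is also what turns a vague "as fast as possible" into something you can estimate and prioritize.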
“The crosscutting nature of NFSs makes them hard to handle both in analysis and in implementation. Yet they’re a frequent source of project risk. Discovering late in the project that the app won’t work because of a fundamental security hole or desperately poor performance is a frequent cause of late or cancelled projects. NFRs interact with one another in a very unhelpful matter – very secure systems compromise on ease of use, very flexible systems compromise on performance, etc. While in an ideal world the app will always be highly secure, performant, massively flexible, scalable, easy to use, support, simple to develop and maintain – every one of these characteristics comes at a cost.”
Nonfunctional requirements such as availability, capacity, security, and maintainability are every bit as important and valuable as functional ones, and are essential to a well-functioning system. The stakeholders in a project should be able to make a priority call on whether to implement the feature that allows the system to take credit card payments vs the feature that allows 1000 concurrent users. One may be of more value than the other. It’s essential to identify these requirements early in the project; the team then needs to find a way to measure them and incorporate regular testing into the pipeline. The team needs to think through the nonfunctional requirements and the impact they have on the system architecture, schedule, test strategy, and costs.
They recommend adding these as a specific set of stories or tasks at the beginning of the project. Specify enough detail that you can prioritize and do a cost-benefit analysis. It’s not enough to say “as fast as possible” – that puts no cap on the effort or budget. It’s easy to have poorly analyzed NFRs constrain thinking, which in turn leads to overdesign and inappropriate optimization.
Devs in particular are generally bad at predicting where a performance bottleneck will be, and make code unnecessarily complex in order to achieve doubtful performance gains. Premature optimization is the root of all evil (Knuth’s dictum). Case in point: an asynchronous message queue used to display error messages, meant to deal with surges of load. Errors were picked up from the queue, put in an in-memory list, then polled asynchronously in a separate thread before being placed in a second list, which was also polled – repeated 7 times. A paranoid focus on capacity, but the problem was never there: the message queue was never flooded with errors. Remember YAGNI, You Ain’t Gonna Need It – do the minimum amount of work to achieve the result – as a guard against overengineering. Optimizations should be deferred to the point where it’s clear they are needed.
We don’t recommend adding capacity testing to the acceptance test requirements. Capacity tests should be a whole separate stage.
In short, people tend to either ignore NFRs until it’s too late, or overreact with defensive architecture and overengineering. “Technical people are lured towards complete, closed solutions – solutions that are fully automated for as many cases as they can imagine. .. Operations people will want systems that can be redeployed and reconfigured without shutting down, whereas developers will want to defend themselves against every possible future evolution of the app, whether or not it will ever be required. …” NFRs are the software equivalent of a bridge builder making sure that the chosen beams are strong enough to cope with weather and expected traffic. They’re real, and must be considered, but they aren’t on the minds of the people paying for the bridge: they just want something that gets them from one side of the river to the other and looks nice. This means we need to guard against our tendency to seek technical solutions first. We must work closely with customers and users to determine the sensitivity points of our app and define detailed nonfunctional requirements based on real business value. Then the team decides on the correct architecture and creates requirements/acceptance criteria capturing the nonfunctional requirements in the same way that functional requirements are captured. That way they can be estimated and prioritized.
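To make “captured the same way” a bit more concrete, here’s a minimal sketch of a backlog where a nonfunctional story sits right next to a functional one and both get ranked with the same yardstick. Everything in it is an assumption for illustration: the `Story` fields, the value-per-cost ranking, the point values, and the 2-second threshold are mine, not the book’s. The two story titles come from the credit card vs concurrent users example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    title: str
    kind: str                  # "functional" or "nonfunctional"
    acceptance_criterion: str  # should be measurable, not "as fast as possible"
    business_value: int        # hypothetical relative points from stakeholders
    estimated_cost: int        # hypothetical relative points from the team

    @property
    def value_per_cost(self) -> float:
        return self.business_value / self.estimated_cost


def prioritize(backlog):
    """Rank all stories together, highest value per unit of cost first."""
    return sorted(backlog, key=lambda s: s.value_per_cost, reverse=True)


# Illustrative numbers only; real values come from stakeholders and the team.
backlog = [
    Story(
        title="Support 1000 concurrent users",
        kind="nonfunctional",
        acceptance_criterion=(
            "With 1000 simulated concurrent users, "
            "95% of requests complete within 2 seconds"
        ),
        business_value=5,
        estimated_cost=8,
    ),
    Story(
        title="Take credit card payments",
        kind="functional",
        acceptance_criterion="A test card can be charged and the order is marked paid",
        business_value=8,
        estimated_cost=5,
    ),
]

if __name__ == "__main__":
    for story in prioritize(backlog):
        print(f"{story.value_per_cost:.2f}  [{story.kind}] {story.title}")
```

With these made-up numbers the payment story ranks first; flip the estimates and the capacity story wins. Either way the stakeholders are making the call with an explicit cost next to each option.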