DevOps Giants Part 1: Continuous Delivery by Jez Humble

As some of you know, I’m currently writing a book on DevOps. It’s been a good opportunity to practice my procrastination skillz, which were already Mr Miyagi level. Want to make progress on your Netflix queue instead of watching titles pile up on you? Start a book – you’ll find all kinds of time to knock off all those binge-worthy crappy TV programs!

The enjoyable part for me – which took up most of November/December of last year, and is still stretching on – was revisiting some of my favorite DevOps and Agile related books of all time. One of the giants I’ve really enjoyed reading is “Continuous Delivery”, published back in July 2010 by Jez Humble and David Farley. This is a powerful, very lengthy book and I put it up there with the best writings by the “Big 3” (Gene Kim, Martin Fowler, Gary Gruver). Here are my notes and thoughts for you to enjoy.


First off, this is a massive work. They say 512 pages, but trust me, this is a very meaty 512 pages that contains more content than just about any other 1000-page mammoth out there. It will take some work to get through; the format is designed to be read in any order, so you will have to endure some repetition from chapter to chapter.

That being said, these guys have “been there”. This isn’t another “How To Do Agile” book written by someone who’s never written a line of code in their life; Jez and David both have decades of real-world experience. I desperately wish I’d had the foresight to read this when it first came out. It would have resolved so many problems in my delivery pipeline that I didn’t even know existed.

The authors clearly drive home the aim of the deployment pipeline:

  • Makes every part of building, deploying, testing, and releasing software visible to everyone involved (increases collaboration)
  • Improves and shortens the feedback cycle (problems are identified as early as possible)
  • Lets teams deploy and release any version of their software to any environment at will through a fully automated process

There’s a lot of meat here. In fact, this is likely enough to make a working definition of DevOps by itself. It’s unambiguous: if your artifacts aren’t automated, visible to all at any point in the process, and backed by a quick, practiced release cycle, your delivery pipeline needs some work. The book also brings out that repeatability and reliability derive from two principles: automate almost everything, and keep everything you need to build, deploy, test, and release your application in version control. They qualify “almost everything”: exploratory testing relies on experienced testers, and demos to customers can’t be done by computers.
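To make that concrete, here’s a toy sketch (my own, not from the book; all names are hypothetical) of a pipeline in which every stage is automated and every result lands in a log that everyone can see:

```python
from dataclasses import dataclass, field

@dataclass
class Pipeline:
    """Toy deployment pipeline: every stage is automated, and the
    result of every stage is visible to everyone via the log."""
    log: list = field(default_factory=list)

    def run(self, version, stages):
        # stages: ordered (name, callable) pairs; a stage returns
        # True on success. Failure stops the pipeline, so a bad
        # build never progresses toward release.
        for name, stage in stages:
            ok = stage(version)
            self.log.append((version, name, "pass" if ok else "FAIL"))
            if not ok:
                return False
        return True

pipe = Pipeline()
stages = [
    ("build", lambda v: True),
    ("unit-test", lambda v: True),
    ("acceptance-test", lambda v: v != "1.0.3"),  # pretend 1.0.3 is broken
    ("deploy-staging", lambda v: True),
]
assert pipe.run("1.0.2", stages) is True
assert pipe.run("1.0.3", stages) is False
assert pipe.log[-1] == ("1.0.3", "acceptance-test", "FAIL")
```

The point of the shared log is the visibility principle above: anyone can see exactly which version failed at which stage, without asking around.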

Still, the book makes it clear that you want to focus on outcomes, not purity (“doing DevOps”). The outcomes you want are:

  1. Reduced cycle time, delivering value faster to the business and increasing profitability
  2. Reduced defects, improving efficiency and lowering support costs
  3. Increased predictability of the SDLC, making planning more effective
  4. The ability to comply with regulations
  5. Reduced costs due to better risk management and fewer issues associated with software delivery


Antipatterns to Avoid

Deploying software manually (symptoms: lengthy documentation, reliance on manual testing, frequent calls to the dev team, corrections to the release process during a release, different environment configs, lengthy releases, risky releases)

“We have heard it said that a manual process is more auditable than an automated one. We are completely baffled by this statement.” … “Performing manual deployments is boring and repetitive and yet needs a significant amount of expertise. Asking experts to do boring and repetitive, and yet technically demanding tasks is the most certain way of ensuring human error that we can think of, short of sleep deprivation or inebriation.”

The documentation and scripts make assumptions about the version or configuration that are wrong, causing the deployment to fail. The deployment team has to guess about the intentions of the development team: ad hoc calls, emails, quick fixes. A disciplined team will incorporate these into the deployment plan, but it’s rare for this process to be effective. It’s common for new bugs to be found with no time to fix them against the approaching deadline, and deferring the launch is unacceptable at a late stage of the project; the most critical bugs are hurriedly patched up and a list of remaining defects is kept by the PM. The cost of coordination between silos (dev, DBA, ops, testing) is enormous, stalling the release in troubleshooting hell. The remedy is to integrate testing, deployment, and release into the development process – make them a normal and ongoing part of development. Then there’s little to no risk, because you have rehearsed the release in a progressively more production-like sequence of test environments. Make sure everyone involved in the process (build and release team, testers, devs) works together from the start of the project.

No more “install SQL Server” as a step. This is symptomatic of a bad relationship between devs and ops – everyone is certain that when it comes time for an actual deployment, the process will be painful and drawn out, with lots of angry recriminations and short tempers. The first step is to seek out ops people informally and involve them in the dev process. That way the ops team will have been involved with the software from the beginning, and both sides will have practiced what is going to happen many times before the release – which will be as smooth as a baby’s bottom.

A build and deployment expert is an antipattern – every member of the team should know how to deploy and maintain deployment scripts.

(Antipattern: long-lived branches, or deferring acceptance testing until the end.) CI requires that every time someone commits any change, the entire app is built and a comprehensive set of automated tests run against it. Crucially, if the build or test process fails, the dev team stops whatever they are doing and fixes the problem immediately. The goal of CI is that the software is in a working state all the time. Later in the book they add: “you should be checking in your code several times a day.”

Branch by feature is not recommended – branches must be short-lived, likely less than a few days. Having many long-lived branches is bad because of the problem of merging. (How does this reconcile with Git?) It can lead to situations where testers are suddenly bombarded with bugs; see Martin Fowler’s writing on the risks of branching by feature. (Example: an India team working in normal CI, with one lucky guy who had to handle merge issues each night with the US side.)

Antipattern: check-ins by devs stretch to days or weeks apart. It is impossible to safely refactor an app unless everyone commits frequently to mainline, so that merges stay small and manageable.

Antipattern: a separate branch for new functionality, merged to main at some point. (With many devs this creates absurdly complex integration issues and semantic conflicts, and makes refactoring hard.) “A much better answer is to develop new features incrementally and to commit them to trunk in VC on a regular and frequent basis.” Address the risk with a commit test suite (under 10 minutes; unit tests to catch any obvious regression errors against a prod-like environment), and introduce changes incrementally – checking in at minimum once a day, usually multiple times a day. Be explicit with commit messages.


Two huge antipatterns: deploying from source control, or recompiling binaries for each new environment. It’s essential to use the same process to deploy to every environment, removing the deployment process itself as a potential source of defects. The environment you deploy to least frequently (prod) is the most important.
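A tiny illustration of the “build once, deploy the same artifact everywhere” rule (my sketch; hypothetical names, and real per-environment config would live in version control, not a dict):

```python
# One artifact, one deploy function for every environment;
# only externalized configuration differs between them.
CONFIGS = {  # per-environment settings, versioned separately from the artifact
    "test":    {"db_url": "db://test",    "workers": 1},
    "staging": {"db_url": "db://staging", "workers": 4},
    "prod":    {"db_url": "db://prod",    "workers": 16},
}

def deploy(artifact: bytes, env: str) -> dict:
    """Same code path for every environment; no per-env rebuild."""
    config = CONFIGS[env]              # the only thing that varies
    return {"artifact": artifact, "env": env, **config}

build = b"app-1.4.2"                   # built once, promoted unchanged
staging = deploy(build, "staging")
prod = deploy(build, "prod")
assert staging["artifact"] is prod["artifact"]   # identical binary
assert staging["db_url"] != prod["db_url"]       # only config differs
```

Because `deploy` is exercised on every environment on the way to prod, any release-day failure points at environment-specific config, not the process.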


Progressing In Maturity

I love the table below. It shows such a nice progression in maturity across 6 key areas. (This was figure 15-1 in the book btw)



Level -1 – Regressive: processes unrepeatable, poorly controlled, reactive

  • Build Management and CI: Manual processes for building software.
  • Environments and Deployments: Manual process for deploying software and provisioning environments.
  • Release Management and Compliance: Infrequent and unreliable releases.
  • Testing: Manual testing after deployment.
  • Data Management: Data migrations unversioned and manual.
  • Configuration Management: Version control either not used, or infrequent check-ins.

Level 0 – Repeatable: process documented, partly automated

  • Build Management and CI: Regular automated build and testing. Any build can be recreated from source control.
  • Environments and Deployments: Automated deployment to some environments. All configuration externalized/versioned. Creation of new environments is cheap.
  • Release Management and Compliance: Painful and infrequent, but reliable, releases. Limited traceability from requirements to release.
  • Testing: Automated tests written as part of story development.
  • Data Management: Changes to the database done with automated scripts versioned with the application.
  • Configuration Management: Version control in use for everything required to recreate the software.

Level 1 – Consistent: automated processes across the lifecycle

  • Build Management and CI: Automated build and test cycle every time a change is committed. Managed dependencies.
  • Environments and Deployments: Fully automated, self-service, push-button process for deploying software. Same process to deploy to every environment.
  • Release Management and Compliance: Change management and approvals process defined and enforced; regulatory and compliance conditions met.
  • Testing: Automated unit and acceptance tests. Testing part of the development process.
  • Data Management: Database changes performed automatically as part of the deployment process.
  • Configuration Management: Libraries and dependencies managed. Version control usage policies determined by change management process.

Level 2 – Quantitatively managed: process measured and controlled

  • Build Management and CI: Build metrics gathered, made visible, and acted on. Builds are not left broken.
  • Environments and Deployments: Orchestrated deployments managed. Release and rollback processes tested.
  • Release Management and Compliance: Environment and application health monitored and proactively managed. Cycle time monitored.
  • Testing: Quality metrics and trends tracked. Non-functional requirements defined and measured.
  • Data Management: Database upgrades and rollbacks tested with every deployment. Database performance monitored and optimized.
  • Configuration Management: Developers check in to mainline once a day. Branching only used for releases.

Level 3 – Optimizing: focus on process improvement

  • Build Management and CI: Teams regularly meet to discuss integration problems and resolve them with automation, faster feedback, and better visibility.
  • Environments and Deployments: All environments managed effectively. Provisioning fully automated. (Docker here, or virtualization?)
  • Release Management and Compliance: Ops and delivery teams regularly collaborate to manage risks and reduce cycle time.
  • Testing: Production rollbacks rare. Defects found and fixed immediately.
  • Data Management: Release-to-release feedback loop of database performance and deployment process.
  • Configuration Management: Regular validation that CM policy supports effective collaboration, rapid development, and auditable change management processes.

Why Automated Releases

It’s hard to argue with the value of automated releases – but it’s amazing how few production systems we’ve encountered are fully automated. One of the principles described in the book is to use the same script to deploy to every environment; then the deploy-to-prod path will have been tested hundreds or even thousands of times before it is needed on release day. If any problems occur upon release, you can be certain they are problems with environment-specific config, not your scripts. If it’s not automated, it’s not repeatable, and every time it is done it will be different because of changes in software, config, environments, and the release process itself. Since it’s manual it’s error-prone, and there is no way to ensure high quality because there’s no way to gain control over the release process. Releasing software is too often an art; it should be an engineering discipline.

Note, I agree with some of the statements below, but I think the emphasis on scripted releases is a little antiquated. The reasons they give are bogus IMHO (can be audited, scripts are tidy and easy to understand, understanding and maintenance is easy). It’s one of the few variances I have with the book, and – much like the script-based approach in the Art of Monitoring – easy to overlook because of the outstanding content.

This was a bold statement that I highlighted: “We can honestly say we haven’t found a build or deployment process that couldn’t be automated with sufficient work and ingenuity.” It should be possible for a new team member to sit down at a new workstation, check out the project’s source code, and run a single command to build and deploy to any environment, including local dev.


Why do they stress visibility of and accessibility to every release for everyone? “Most of the waste in software comes from the progress through testing and operations. It’s common to see build/ops teams waiting for documentation, testers waiting on ‘good’ builds, dev teams receiving bug reports weeks after the team has moved on to new features, and teams discovering late in the game that the app’s architecture will not support its nonfunctional requirements. Software is undeployable because it’s taken so long to get it into production, and buggy because the feedback cycle is so long. A release process where testers and ops can deploy builds to environments push-button, devs learn about bugs early on, and managers can view cycle time, throughput, and code quality – that transparency and visibility allows bottlenecks to be identified, optimized, and reviewed. The result is both a faster and a safer delivery process.”


Configuration Management

A great quote here: “The simple act of adding your configuration information to your version control system is an enormous step forward. At its simplest, the VC system will alert you to the fact that you’ve changed the config inadvertently. This eliminates at least one very common source of errors.” (The book mentions a horror story: a test environment configured manually, not kept in VC as the dev version was – so properties were different or missing, no two environments the same, all different from production. Which properties were valid, which redundant, which should be unique? They had five people responsible for managing config.)


They also mention that it’s bad practice to inject config information at build or packaging time – anything that changes between deployments should be captured as config, not baked in when the app is compiled. Two principles hold true: keep binary files independent of all config info, and keep all config info in one place. Config should use clear naming conventions, with config options in the same repository as the source code but values stored elsewhere. Avoid overengineering; keep it as simple as possible.


It’s key to be able to repeatedly recreate every piece of infrastructure used by your application (OS, patches, OS configs, the app stack and its config, infra config). “All artifacts relevant to your project and the relationships between them are stored, retrieved, uniquely identified, and modified.” This includes:

  • App source code, build scripts, tests, doc, requirements, db scripts, libraries, config files
  • Dev, testing, operations toolchains
  • All environments used in dev, testing, prod
  • Entire app stack – both binaries and config
  • Config associated with every app in every environment it runs in


Some key questions:

  • How do you represent your config information? How do your deployment scripts access it? How does it vary between environments, apps, and versions?
  • How do we handle secrets? The authors recommend having a central service thru which every app can get the config it needs. They recommend a façade class to access it, whether backed by the file system or a DB/REST svc(!) They recommend Escape.
  • How do we test configuration? (At the very least, ping all external svcs and make sure everything the app depends on is available, then run smoke tests.)
  • Could you completely recreate your prod system (excluding data) from scratch with the VC assets you store?
  • Could you regress to an earlier known good state of the app?
  • Can you be sure that each deployed environment is set up in exactly the same way?


How Continuous Is Continuous Deployment?

This means release to production (Timothy Fitz). The intuitive, immediate response is, “this is too risky!” After all, in order for this to work, your automated tests need to be fantastic, covering the entire app. You have to write all your tests first, so that only when a story is complete will check-ins pass the acceptance tests. Aaron puts this differently when he says “You can’t cheat shipping.” Regular releases to production can be combined with canary releases: roll out to a small group of users first, then roll out to the rest once you’ve verified (manually) that there are no problems.


To counter the “too risky” reaction: more frequent releases lead to lower risk in putting out any particular release. If you release every change, the amount of risk is limited to just the risk in that one change, so CD is a great way to reduce the risk of releases. It also forces you to do the right thing. You can’t do it without automation throughout build, deploy, test, and release. You can’t do it without a comprehensive, reliable set of automated tests. You can’t do it without writing system tests against a prod-like environment. Even if you can’t actually release every set of changes that passes all your tests, you should aim to create a process that would let you if you chose to.
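The canary-release idea above can be sketched as a deterministic, hash-based assignment (my own illustration, not the book’s code):

```python
import hashlib

def in_canary(user_id: str, percent: int) -> bool:
    """Deterministically place `percent`% of users in the canary group
    by hashing their id, so each user sees a stable version across
    visits while the new release soaks."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % 100 < percent

canary_users = [u for u in (f"user{i}" for i in range(1000))
                if in_canary(u, 5)]
# roughly 5% of the 1000 users land in the canary group
assert 20 < len(canary_users) < 90
```

Once monitoring confirms the canary group is healthy, the percentage is raised until everyone is on the new version.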


One use case mentioned: “The ops team was strongly pushing back on the schedule. After the meeting the techies hung around and exchanged phone numbers. Over a few weeks they kept talking; the system was deployed to a prod server, and a small group of users was given access a month later. A member of the deployment team came and worked with the dev team to create deployment scripts and write the installation documentation – so no surprises. In the ops team meetings where many systems were discussed and scheduled, this one was hardly discussed, since ops was confident they could deploy it and trusted the quality of the software.”

The most crucial part of release planning is assembling representatives from every part of your org involved in delivery: build, infra, and ops teams; dev teams; testers; DBAs; support personnel. They should continue to meet throughout the life of the project and continually work to make the delivery process more efficient.


Continuous Integration

“In our experience, a major contributor to cycle time is people waiting to get a ‘good build’ of the application.” The problem is removed with a deployment pipeline where everyone can see the builds as they are deployed and can perform a build themselves, push-button. Benefits: testers can select older versions to verify a change in behavior over a newer version, support staff get repros, and operations can run DR recovery exercises.


The cardinal sin is checking in on a broken build. If a build breaks, the devs responsible need to identify the cause of the breakage as soon as possible and fix it. The corollary – “never go home on a broken build” – doesn’t mean staying late to fix the build after working hours; it means checking in regularly and early enough to deal with problems as they occur. “Many experienced developers make a point of not checking in less than an hour before the end of work, instead doing it first thing the next morning.”
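As a toy model of the “never check in on a broken build” rule (hypothetical; in real life this is social discipline plus CI tooling, not a class):

```python
class BuildBoard:
    """Tracks the shared build light; a red build blocks new check-ins
    until someone fixes it, keeping mainline always in a working state."""
    def __init__(self):
        self.green = True
    def record_build(self, passed: bool):
        self.green = passed
    def checkin(self, change: str) -> str:
        if not self.green:
            # the team's priority is fixing the build, not piling on
            raise RuntimeError("build is red; fix it before committing")
        return f"accepted {change}"

board = BuildBoard()
assert board.checkin("small refactor") == "accepted small refactor"
board.record_build(False)
try:
    board.checkin("new feature")
    blocked = False
except RuntimeError:
    blocked = True
assert blocked
```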


Some simple rules to keep in mind:

  • If you can’t fix it quickly, you should revert to the previous version. A useful team rule: if a build can’t be fixed within 10 minutes, revert.
  • Another rule – don’t comment out failing tests.
  • If you commit a change and all your tests pass, but other tests break, it is your responsibility to fix all tests not passing as a result of your changes.


In software, when something is painful, the way to reduce the pain is to do it more frequently, not less. “Bring the pain forward” is a mantra of the book. (this made me smile, and reminded me of my good friend Donovan Brown who’s said this often.)

  • If integration is painful, do it every time someone checks in, from the get-go.
  • If testing is painful, do it from the beginning of the project.
  • If release is painful, release every time someone checks in a change that passes all tests. If not to real users, then maybe to a subset, or to a production-like environment. Gradually improve release time until you can hit your target (an internal release every 4 weeks, for example).
  • If documentation is painful, do it as you roll out new features, and make it part of your definition of done.


You should be able to answer “yes” to –

  • Can I exactly reproduce any of my environments, including the version of the OS, patches, network config, software stack, and the apps deployed into it with their config?
  • Can I easily make an incremental change to any of these individual items and deploy that change to any environment?
  • Can I easily see all changes to an environment and trace a particular change back to what exactly the change was, who made it, and when?
  • Can I satisfy all compliance regulations?
  • Can everyone on the team get the information they need and make the changes they need to make?


And this was another very emphatic statement: “If you don’t have every source artifact of your project in version control, you won’t enjoy any of the benefits that we discuss in this book.” Everything, including CI, automated testing, and push-button deployments, depends on this. The book calls out three components: automated builds, version control, and – vitally – the agreement of the team (check in small incremental changes frequently to mainline; the highest priority is fixing any defects that break the app).


You can also fail the build for warnings and code-style breaches. “Code Nazi” indeed! It is effective in enforcing good practices: “We removed Checkstyle (with our distributed team) – after a few weeks we started to find more ‘smells’ in the code and spent time doing tidy-up refactorings. In the end we realized that although it came at a cost, Checkstyle helped us to stay on top of the almost inconsequential things that together add up to the difference between high-quality code and just code.”


Four strategies to keep the app releasable: hide new functionality until it’s finished; make all changes incrementally in small releases; use branch by abstraction for large-scale changes; use components to decouple parts of the app that change at different rates.

  • Hiding features until they’re ready to release means you are always integrating and testing the entire system, even with the feature flag turned off.
  • It’s often tempting to branch in source control and make changes on the branch. In practice, wiring everything up ends up being the hard part when it’s time to merge. “The bigger the apparent reason to branch, the more you shouldn’t branch.”
  • Componentization is the strangler fig: changing a big ball of mud into modular, better-structured code. Take part of the codebase out as a component and rewrite it. You can localize the mess and use branch by abstraction to keep the app running with the old code while you create a new, modularized version of the same functionality. (Also called a “Potemkin village”.)
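The first strategy, hiding new functionality behind a flag, might look like this minimal sketch (hypothetical flag store and functions):

```python
# New code ships on trunk with every release, but stays dark behind a
# flag, so it is still compiled, integrated, and tested the whole time.
FLAGS = {"new_checkout": False}      # hypothetical flag store

def old_checkout(cart):
    return sum(cart)

def new_checkout(cart):
    return sum(cart) * 0.9           # in-progress discount logic

def checkout(cart):
    if FLAGS["new_checkout"]:
        return new_checkout(cart)    # half-finished work, safely dark
    return old_checkout(cart)

assert checkout([10, 20]) == 30      # users see the old behavior
FLAGS["new_checkout"] = True         # flipped only when finished
assert checkout([10, 20]) == 27.0
```

Because both paths live on trunk, every commit integrates and tests the whole system – the opposite of a long-lived feature branch.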


Conway’s Law states that “organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations.” So open source projects where the devs communicate only by email tend to be very modular with few interfaces, while a product developed by a small, collocated team will tend to be tightly coupled and not modular. Be careful how you set up your dev team, as this will affect the architecture of your system. (I’ve heard this put as “sooner or later, companies will ship their org structure as a product.”)


It is vital to version your dependencies, including libraries and components – otherwise you can’t reproduce your builds. When there’s a break, you won’t be able to find the change that broke it or find the last “good” version in your library. And it’s best to trigger a new build whenever there is any change to upstream dependencies; most teams update their dependencies only when the code is stable, paying a higher cost later at integration time. Until your app grows sufficiently large, there is no need to build your components individually – the simplest thing is to have a single pipeline that builds your whole app once as the first stage.


Developers should almost always work on mainline. This ensures that all code is continuously integrated, avoiding merge hell at the end of the project. How do you manage large teams of devs working on multiple releases? Good componentization of software, incremental development, and feature hiding – more care is required here in architecture and development. Merging branches toward release time is always a complex process that takes an unpredictable amount of time; each new merge breaks different pieces of existing functionality and is followed by a stabilization process as people work away at fixing issues in mainline. “Creating long-lived branches is fundamentally opposed to a successful continuous integration strategy. Our proposal is not a tech solution but a practice: always commit to trunk, and do it at least once a day. If this seems incompatible with making far-reaching change, perhaps you haven’t tried hard enough. In our experience, although it sometimes takes longer to implement a feature as a series of small incremental steps that keep the code in a working state, the benefits are immense. Having code that is always working is fundamental – we can’t emphasize enough how important this practice is in enabling continuous delivery of valuable, working software. There are times when this approach won’t work, but they really are very rare…” (See the branch hell described in Figure 14.6 – branching off mainline with two teams; a small team had to be dedicated just to handle merges!)


The one situation where branching might be acceptable is before a release. Creating a branch for the release replaces the evil practice of the code freeze, where checking in to source control is switched off for days or even weeks. With a release branch, devs can keep checking into mainline, while changes to the release branch are made for critical bugfixes only (Figure 14.2). In this case, fixes for critical defects are committed on the branch and merged into mainline immediately.


Dashboarding and Transparency

Being able to react to feedback also means broadcasting information with big, visible dashboards and other notification mechanisms. Dashboards should be ubiquitous, and at least one should be present in each team room.


Customers can overreact when they see red X’s on the build monitor. You have to explain: every time a build fails, it indicates that a problem has been found that otherwise might have made it into prod.


Come up with a list of risks, categorized by probability and impact. This could include generic risks (running out of disk space, unauthorized access) and specific risks (transactions not completing) – then work out what to monitor and how to display it, e.g. green/yellow/red states.


Hypothesis Driven Development

  • The decisionmaker/customer makes guesses about which features and bugfixes will be useful to users. However, until they are in the hands of users who vote by choosing to use the software, they remain hypotheses. It is vital to minimize cycle time so that an effective feedback loop can be established.
  • Feedback useful criteria:
    • Any change needs to trigger feedback process
    • Feedback must be delivered as soon as possible
    • Delivery teams must receive feedback and act on it


Infrastructure as Code

Manually configured environments are the worst possible choice and the most common one. (The config info is very large; once something breaks, finding the problem takes a long time and senior personnel; a manually configured environment is difficult to reproduce, and hard to maintain as nodes drift apart.) Change environments from “works of art” (Visible Ops Handbook) to mass-produced objects whose creation is repeatable and predictable. It should always be cheaper to create a new environment than to patch or repair an old one.


The book recommends a holistic approach to managing infrastructure:

  • Desired state of your infra should be specified thru version-controlled config
  • Changes to infra should be applied automatically, converging it to the desired state
  • You should always know the actual state of your infra thru instrumentation and monitoring


“While in general we are not fans of locking things down and establishing approval processes, when it comes to your production infra it is essential. And since you should treat your testing environments the same way you treat prod, this impacts both.” Otherwise it is just too tempting, when things go wrong, to log onto the environment and poke around to resolve problems (this usually leads to service disruptions from random reboots and service packs), and there’s no reliable record of what was done and when – so you can’t reproduce the cause of the problems you’re creating. “Stabilizing the patient”: without turning off access, ops staff spend all their time firefighting, because unplanned changes break things all the time. A good way to set expectations of when work will be done, and to enforce access control, is to create a maintenance window.


The best way to enforce auditability is to have all changes made by automated scripts, which can be referenced later (we favor automation over documentation for this reason); written documentation is never a guarantee that the documented change was performed correctly. Provisioning new servers manually is a repetitive, resource-intensive, and error-prone process – exactly the kind of problem that can be solved with automation.


The goal of the config management process is to ensure it’s declarative and idempotent – meaning you configure the desired state of your infra and a system ensures that this config is applied. (That means automating the application of OS service packs and upgrades, installing new software, changing settings, and performing deployments.)
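A minimal sketch of what declarative and idempotent means in practice (my illustration; real tools in this space do this against actual machines):

```python
# You state the desired state; applying it converges the node, and
# applying it a second time changes nothing (idempotence).
def apply(desired: dict, actual: dict) -> list:
    """Converge `actual` to `desired`; return the keys that changed."""
    changes = []
    for key, value in desired.items():
        if actual.get(key) != value:
            actual[key] = value       # e.g. install package, set setting
            changes.append(key)
    return changes

desired = {"ntp": "installed", "sshd.PermitRootLogin": "no"}
node = {"ntp": "absent"}
assert apply(desired, node) == ["ntp", "sshd.PermitRootLogin"]
assert apply(desired, node) == []     # second run is a no-op
```

The no-op second run is the whole point: you can apply the config on every deployment, or on a schedule, without fear of breaking a node that is already correct.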


“In our view, no technology can be considered genuinely enterprise-ready unless it is capable of being deployed and configured in an automated way. If you can’t keep vital config info in versioned storage and thus manage changes in a controlled manner, the technology will become an obstacle to delivering high-quality results…”


There is no fundamental reason why cloud-based services are less secure than publicly accessible services hosted on infrastructure you own. Compliance is also often mentioned – yet usually the problem isn’t that regulations forbid the use of cloud computing so much as that they haven’t caught up with the cloud yet. Given careful planning and risk management, it’s usually possible to reconcile the two, even with healthcare/banking-type concerns (especially using data encryption). Vendor lock-in is another fear.


It is extremely common for problems with infra services – such as routers, DNS, and directory services – to break software in prod environment that worked perfectly all the way thru the deployment pipeline. (Nygard – InfoQ – a system that dies mysteriously at the same time every day). How to address this?

  • Put every part of your networking infra config into source control
  • Install a good network monitoring system, so you know when network connectivity is broken
  • Logging: your app should log at WARNING level every time a connection times out or is closed
  • Smoke test connectivity post-deployment
  • Make the testing environment’s network topology as similar to production as possible. This is what staging is for!
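The smoke-test and logging points above could be sketched like this (my illustration; the lambda checks stand in for real network probes such as a ping, a TCP connect, or a trivial query):

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("smoketest")

def smoke_test(dependencies: dict) -> bool:
    """Post-deployment smoke test: run every dependency check, log a
    WARNING for each unreachable dependency, and fail the deployment
    if any check fails. `dependencies` maps a name to a zero-argument
    callable returning True when the dependency is reachable."""
    ok = True
    for name, check in dependencies.items():
        if not check():
            log.warning("dependency %s is unreachable", name)
            ok = False
    return ok

assert smoke_test({"db": lambda: True, "dns": lambda: True}) is True
assert smoke_test({"db": lambda: True, "smtp": lambda: False}) is False
```

Run as the last pipeline stage, this catches the “router/DNS/directory service broke it in prod” class of failure before users do.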


Lean Manufacturing

The goals of lean manufacturing are to ensure the rapid delivery of high-quality products, focusing on the removal of waste and the reduction of cost. It has resulted in huge cost and resource savings, higher-quality products, and faster time to market in several industries. (i.e. DevOps is not small-scale; it comes from Lean, which in turn was born in large enterprises.)


“Build quality in” was a mantra of W. Edwards Deming, whose idea was that if you catch defects earlier, they’re cheaper to fix. The cheapest place to fix bugs in code is before they’re ever checked into source control. Automated testing and CI/CD are designed to catch bugs early. But catching them does no good if they’re not fixed, which requires discipline. Lean tells us that testing is not a phase, and it is not the exclusive domain of testers.


Deming Cycle – Plan, Do, Study, Act.


Use the Theory of Constraints:

  • Identify the part of the build, test, deploy, release process that’s the bottleneck (say, manual testing).
  • Exploit the constraint: maximize the throughput of that part of the process (keep a buffer of stories waiting to be manually tested; make sure the resources doing manual testing are not distracted).
  • Subordinate all other processes to the constraint (have your devs work just hard enough to keep the backlog constant, and spend the rest of their time writing automated tests to catch bugs so that less time is spent manually testing).
  • Elevate the constraint (if cycle time is still too long, invest more effort in automated testing or hire more testers).
  • Rinse and repeat.



What you choose to measure will have an enormous influence on the behavior of your team (the Hawthorne effect) – measure lines of code, and devs will write many short lines of code. Measure # of defects fixed, and testers will log bugs that could be fixed easily. According to Lean – it's essential to optimize globally, not locally. If you spend a lot of time removing a bottleneck that is not the true constraint, you make no difference to the delivery process. It's important to have a global metric that can be used to determine if the delivery process as a whole has a problem.


Besides cycle time (covered below), other useful metrics:

  • Automated test coverage
  • Codebase properties (duplication, cyclomatic complexity, coupling, style problems, etc.)
  • # of defects
  • Team velocity
  • # of commits each day
  • # of builds per day
  • # of build failures per day
  • Duration of build, including automated tests
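As a toy illustration of tracking a couple of these numbers, here is a sketch that aggregates a day's worth of builds. The `Build` record and its field names are my invention for the example, not anything the book prescribes:

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass
class Build:
    duration: timedelta
    passed: bool

def pipeline_stats(builds):
    """Aggregate one day's builds into a few of the global metrics above."""
    total = len(builds)
    failures = sum(1 for b in builds if not b.passed)
    mean = sum((b.duration for b in builds), timedelta()) / total
    return {"builds_per_day": total,
            "failures_per_day": failures,
            "mean_duration": mean}
```

Graphing these per day on a team dashboard is what makes them a global signal rather than a local one.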


There is no one-size-fits-all solution to the complex problem of implementing a deployment pipeline. The crucial point is to create a system of record that manages each change from check-in to release, providing the info you need to discover problems as early as possible in the process. Then drive out inefficiencies to make the feedback cycle faster and more powerful – by adding better config mgmt., refining and parallelizing automated acceptance tests, etc. Requires discipline – only changes that have passed through the system get released.


Mary Poppendieck – “How long would it take your org to deploy a change that involves a single line of code? Can this be done on a repeatable, reliable basis?” Hard to measure as it covers analysis/dev/release, but it tells you more about your process than any other metric. Focusing on the reduction of cycle time encourages practices that increase quality, such as the use of automated tests.


How Much do I Need to Change The Org? … and working with Operations/IT as partners

It is essential that everybody involved in the process of delivering software is involved in the feedback process: devs, testers, operations staff, DBAs, infra specialists, managers. (Jez recommends cross-functional teams, but if that's not possible, at least work together daily.)


Ideally everyone within an organization is aligned with its goals, and people work together to help meet them… we succeed or fail as a team, not as individuals. However, in too many projects, devs chuck work over the wall to testers, who in turn throw work over the wall to ops at release time. When something goes wrong, we spend more time blaming each other than fixing the defects this overly siloed approach creates. If siloed, start by getting everyone involved in the release process together and ensure they have an opportunity to collaborate on a frequent basis. Put up a dashboard so everyone can see the health of the application, the builds, and the state of environments.


Almost all medium and large companies separate the activities of dev and infra management (or ops) into separate silos. Each has its own lines of reporting, a head of ops and a head of software dev. Every time a prod release occurs, these teams work to ensure that any problems that arise are not their fault. Each group wants to minimize deployment risk, but each has its own approach – causing tension. Ops teams measure their effectiveness in terms of MTBF and MTTR, for example. Ops teams have SLAs they need to meet, and likely regulatory requirements. Any change, including a process change, represents a risk. Ops managers need to ensure that any change to any environment they control is documented and audited (also Sarbanes-Oxley). Deploying a new version of your app may require a CAB meeting. Include details on the risk and impact of the change, and how to remediate if it fails. The request should be submitted before work starts on the new version to be deployed – not a couple of hours before go-live. Devs should familiarize themselves with the ops systems/processes and comply, and make that a part of the release plan.


Abnormal conditions – ops managers want to be alerted when these occur so they can minimize downtime. How does the ops team want to monitor your app? Make it part of the release plan. Where are they expecting logs to be, and how will the app notify ops staff of malfunctions? Treat these as requirements, and add them as stories. Consider ops personnel here to be an important constituency of users. They may also have requirements for a service continuity plan. Each service the ops team manages will have a recovery point objective (RPO – the length of time prior to a disaster for which data loss is acceptable) and a recovery time objective (RTO – the max length of time before services are restored). These govern your backup/restore strategy. Again, test your backup/recovery procedures well.


Your Risk Management strategy might answer the following questions:

  • How are you tracking progress?
  • How are you preventing defects?
  • How are you discovering defects?
  • How are you tracking defects?
  • How do you know when a story is finished?
  • How are you managing your environments?
  • How are you managing configuration?
  • How often do you showcase working features?
  • How often do you do retrospectives?
  • How often do you run your automated tests?
  • How are you deploying your software?
  • How are you building your software?
  • How are you ensuring that your release plan is workable and acceptable to the ops team?
  • How are you ensuring that your risk and issue log is up to date?



Project Management and Strategy

The most important part of a project is inception.

  • List of stakeholders and business sponsors. There should only be one business owner. Internal stakeholders include the ops, sales, marketing, support, dev, and testing teams.
  • Establish the business case and value of the project, plus a list of high-level functional and nonfunctional requirements – just enough detail to estimate the work involved. Includes capacity, availability, security.
  • SDLC, Testing and release strategy
  • Architectural evaluation – decision on platform/frameworks
  • Risk and issue log

Then comes initiation.

  • Setup of team room and hardware/software. Whiteboard, paper and pens, printer, internet. Food and drinks
  • Version control, setup of CI/CD with a hello-world sample deployed DEV->TEST.
  • Agreed upon responsibilities, working hours, meeting times (standups, planning sessions, showcases)
  • Simple test environments/data
  • Starting work on backlog


Failure points of scrum:

  • Lack of commitment – relies on transparency, collaboration, discipline – no more hidden flaws.
  • Ignoring good engineering – can’t ignore TDD or enforcement of good practices and coding.
  • Scrumterfall – adapting agile out of the gate to fit your org. First follow the process as written, then start adapting:
    • Sprints of 1-3 weeks
    • Tested and working software at the end of each sprint
    • Product owner identified
    • Backlog prioritized by business value, estimates in story points, tasks in hours
  • If siloed
    • Form a release working group – across all siloes – tasked with coming up with a release strategy and keep the process working.
      • Note authors do not feel a CAB meeting is an antipattern – esp if issues with uncontrolled changes wreaking havoc. CAB team formed with reps from dev, ops, security, CM team, and business
      • Decide which environments should be locked down and enforce
      • Automated change request management system, and designate owner(s) for each controlled environment
      • Push button deployments when approved
    • Meet regularly – run a retrospective, and Deming cycle – plan, do, check, act.
    • Ensure releases are happening as often as possible to production (or a production like environment). “If it hurts, do it more frequently”
    • Big monitors and dashboarding
    • The important thing is to evaluate the risk of the change – against the benefit. If the risks outweigh the benefits, the change should not be made – or a less risky option found. (Track this too – how long does it take to have a change be approved? What % are denied? How many changes are awaiting approval?) And validate – what’s the MTBF/MTTR? Cycle time? And hold retrospectives and invite feedback to improve.
    • On the above – if you are (as I was) saddled with vast amounts of legacy code to support – remember to get “Working Effectively with Legacy Code” by Michael Feathers and the classic books by Poppendieck.
    • Value stream mapping
      • Poppendieck's Lean Software Development recommends going to the source, where a customer request comes in. Your goal is to draw a chart of the average customer request, from arrival to completion. Working with the people involved in each activity, sketch these out along with the average time for each step. This exposes the amount of time work sits in a waiting state and the non-value-adding activities.
      • Mapping this should take about half an hour, covering check-in to release – use best guesses for times, or maybe look at a similar system. Then build a skeleton pipeline: a commit stage to build your app and run basic metrics and unit tests, a stage to run acceptance tests, and a stage to deploy the app to a production-like environment so you can demo it. This should be part of iteration 0.
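The arithmetic behind a value stream map is simple enough to sketch. This assumes hypothetical step timings (mine, for illustration) and computes how much of the total lead time is actually value-adding work:

```python
def value_stream_summary(steps):
    """steps: (name, working_hours, waiting_hours) tuples from the wall chart.
    Returns total lead time and the fraction of it that adds value,
    exposing how long work sits in a waiting state."""
    working = sum(w for _, w, _ in steps)
    waiting = sum(q for _, _, q in steps)
    total = working + waiting
    return {"lead_time_hours": total,
            "value_add_hours": working,
            "efficiency": working / total}
```

On most unoptimized value streams the efficiency number is startlingly low – most of the lead time is queueing, not work, which is exactly what the exercise is meant to expose.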


Release Management

While releases can be exhilarating, they can also be exhausting and depressing. Almost every release involves last-minute changes, such as fixing the database login details or updating the URL for an external service. There should be a way of introducing such changes so that they are both recorded and tested – in version control, and then promoted to prod. Imagine if you could do a production deployment with the push of a button – and the worst that could happen is you back out the same release in minutes or seconds. The delta is small, so the risk is minimized. No more “I am risking my career” heebie-jeebies!


Two fears – the first is introducing a problem because of a hard-to-detect mistake; the second is that a problem in the release process leaves you committed, forcing a clever solution under a severe deadline. Both are addressed by rehearsing the release many times a day, the second also by a backout strategy. Typically the best option is to keep the previous version of the app available to deploy; the biggest obstacle is rolling back data (only make additive, not destructive, changes). This way, by the time a release is available to deploy to prod, we know:

  • Code compiles
  • Passes unit tests – so it works like our devs think it should
  • Passes acceptance tests (does what users think it should)
  • Configuration of environments is fine (tested in prod-like env)
  • All components in place and deployment system is ok
  • Version control is working


Traditional approaches … delay the nomination of a release candidate until several lengthy and expensive steps have been taken to ensure quality and functionality. In an environment where build/deployment automation is aggressively pursued along with automated testing, there's no need for this (esp at the end of a project; see Lean). … Delaying testing until after the dev process is, in our experience, a sure-fire way to decrease the quality of your release. Defects are always more expensive to fix later in the process (devs have forgotten the code, functionality has changed, and there's no time to fix bugs late in the game – they just get added to a list).


Team definition of done – “Done” means “Released”. No “80% done” – it's either complete, or not. As it's not always possible to release to prod at the end of every sprint, “done” could mean “demo'd and tried by a rep of the user community in a production-like environment”.


If any part of the pipeline fails, stop the line. The most important step in achieving rapid, repeatable, reliable releases is for your team to accept that every time they check code into version control, it will successfully build and pass every test. The whole team owns a deployment failure – they should stop and fix it before doing anything else.


The release plan contains the steps to deploy the app, how to smoke test it, the backup strategy, logs and methods of monitoring, and an issue log. (I'm questioning this one.) This is to get the first release going smoothly. The release strategy needs to be documented and kept up to date (it's a source of both functional and nonfunctional requirements):

  • Parties in charge of deployments and releases; masters of each environment
  • Asset and config mgmt strategy
  • Technology used for deployment
  • Implementing the pipeline
  • Environments to be used for acceptance, capacity, integration and user acceptance testing – and the process by which builds move through these environments
  • Process for deploying into testing/prod environments
  • Monitoring requirements, and the services/APIs the app should use to notify the operations team of its state
  • Config mgmt.
  • External systems integration points – at what stage and how they're tested, and how ops personnel communicate with the COTS provider if there's a problem
  • Disaster recovery plan
  • SLA for the software – failover, HA, etc.
  • Production sizing and capacity planning – data, log files, bandwidth and disk space, latency for clients
  • Archiving strategy, auditing requirements
  • How defects will be fixed and patches applied
  • Upgrades to the production environment
  • How application support will be handled


With any rollback plan the big constraint is data, and the systems you are tied to (orchestrated releases). First ensure that the state of the prod system, including the db, is backed up. Second, practice the rollback plan, including restoring from backup or migrating the db back. The best plan here is to roll back by deploying the previous good version – including recreating the environment from scratch. (This is cleanest but will lead to downtime.) You can also use deployment slots for zero-downtime releases. (Basically this is the same as a blue-green deployment – two identical environments, each replicas. Put the db into read-only mode at the beginning, run smoke tests against the blue env, and when ready, change the router config to point to the blue env.)
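The blue-green flip can be sketched abstractly. The `Router` class and the `deploy`/`smoke_test` callbacks here are stand-ins for your real load balancer, deployment script, and smoke test suite – an illustrative sketch, not the book's implementation:

```python
class Router:
    """Toy stand-in for the router/load balancer in front of two slots."""
    def __init__(self, live):
        self.live = live

def blue_green_release(router, idle_slot, deploy, smoke_test):
    """Deploy to the idle slot, smoke-test it there, then flip the router.
    Until the flip, production traffic never touches the new version,
    so rollback is simply flipping back (or never flipping at all)."""
    deploy(idle_slot)
    if not smoke_test(idle_slot):
        raise RuntimeError("smoke tests failed on %s; live slot untouched" % idle_slot)
    router.live = idle_slot
```

The key property: a failed smoke test aborts before the flip, so the live environment is never in a half-deployed state.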


Canary releases – roll out to a subset of prod servers. You can do smoke tests and capacity tests, and start routing selected users to the new version. Rollbacks are easy, and some companies measure usage of new features and kill them if they're not being used. It's a great, low-risk way of testing capacity, by gradually routing more and more users to the app while measuring response time, CPU usage, I/O, memory, and log files (esp if you don't have the $ for a realistic prod-like env). Harder to use if your software is installed as a fat client on customer servers (see grid computing – enable the app to auto-update itself to a known good version hosted on your servers).
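One common way to pick the “selected users” for a canary is a deterministic hash bucket – a sketch of my own, not something the book prescribes:

```python
import hashlib

def in_canary(user_id, canary_percent):
    """Deterministically place a user in a bucket 0-99; the same user always
    gets the same bucket, so they don't flip between versions mid-session."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < canary_percent
```

Ramping the rollout is then just raising `canary_percent` over time, and rolling back is dropping it to zero – no redeployment needed.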


Emergency fixes – a critical defect has to be fixed ASAP. The most important thing to bear in mind: do not, under any circumstances, subvert your process. Emergency fixes must go through the same build, deploy, test and release process as any other change (otherwise the environment ends up in an unknown state that makes problems impossible to reproduce and breaks other deployments in unmanageable ways). One more reason to keep cycle time low. Always evaluate – how many people the defect affects, how often it occurs, and how severe its impact is on end users. Never do fixes late at night, and always pair with someone else. Only under the most extreme circumstances should you circumvent the standard release process, and make sure you have tested making an emergency fix in staging first. Sometimes it's better to roll back than to deploy a fix.




Pieces of a functional feedback system/testing:

  • Source code compiles and is valid
  • Unit tests (behaves as expected) and test coverage – run very fast, test the behavior of small pieces of the app in isolation.
  • Functional acceptance tests (delivers business value expected) – should run against whole app in a prod-like environment. Long running, >1 day sometimes. Group into functional areas so you can run tests against a particular aspect/behavior.
  • Nonfunctional tests (capacity, availability, security)
  • Exploratory testing (manual, smoketesting)


This echoes W. Edwards Deming's 14 points – “cease dependence on mass inspection to achieve quality. Improve the process and build quality into the product in the first place.” Most companies, though, rely on manual acceptance testing – automated tests are poorly maintained and out of date, and are supplemented with manual practices. Good testing gives you the safety and confidence that the software is working as it should, and acts as a constraint on the dev process by encouraging good dev practices.


A common practice in many orgs is to have a separate team dedicated to the production and maintenance of the test suite. Devs then feel they don't own the acceptance tests, so they don't pay attention to failures at this late stage, and the suite stays broken for long periods of time. Acceptance tests written without dev involvement also tend to be tightly coupled to the UI, and thus brittle and badly factored.


The most common obstacles to devs owning the acceptance testing layer are a lack of testing licenses and an app architecture that prevents the system being run in a dev environment.


It is important to note that acceptance tests are expensive to create and maintain. They are also regression tests. Don’t follow a naive process of taking your acceptance criteria and automating every one.


Another use case – “not replicating the production environment for capacity testing was a false economy, because we were building a highly performant system and the problems we found exhibited at loads we couldn't apply in our lower-spec environments. These problems were expensive to find and fix.”


Ideally you have a production-like environment to run your manual and automated tests on, and an automated script that performs a smoke test to make sure it's up and running.


Mike Cohn – Unit > Service > UI. (test automation pyramid) – unit tests form vast majority. Fewer acceptance tests (divided into service and UI tests) these will typically take far longer to execute.


For the purposes of commit tests, don't test via the UI at all. UI testing involves a lot of components or levels of software, which is time consuming, and UIs work at human timescales – again, desperately slow. Dependency injection (inversion of control) is a useful design pattern for creating testable units of code.
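A minimal sketch of dependency injection making a unit testable without the UI or the network – all class names here are hypothetical, invented for the example:

```python
class RealMailer:
    """Production collaborator – would talk to an SMTP server."""
    def send(self, to, body):
        raise NotImplementedError("network call; not exercised in unit tests")

class FakeMailer:
    """Test double injected in place of the real collaborator."""
    def __init__(self):
        self.sent = []
    def send(self, to, body):
        self.sent.append((to, body))

class SignupService:
    # The mailer is injected rather than constructed internally, so the unit
    # under test runs in isolation - no network, no UI, millisecond-fast.
    def __init__(self, mailer):
        self.mailer = mailer
    def register(self, email):
        self.mailer.send(email, "Welcome!")
        return True
```

Because `SignupService` never decides for itself which mailer to use, the commit-stage test can exercise its logic completely with the fake.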


The commit stage will be exercised several times a day by each dev; if it goes over 5 minutes, complaints will start. 10 minutes is the max. Do everything possible to keep this stage fast without losing key value; without fast feedback, errors become far more costly to fix later. This is perhaps the biggest bang for your buck – knowing the exact moment a defect is introduced.


Use case – a trading system interacting with another system owned by another dev team via a message queue. Lots of interaction with an external system meant we didn't own the full lifecycle – hard to have meaningful end-to-end acceptance tests. We implemented a reasonably complex stub that simulated the operation of the live system. It allowed us to plug the gap in the lifecycle of our system; instead of having to maintain a complex network of distributed systems, we could choose when to interact with the real thing and when to deal with the simpler stub. Which was used in each environment was controlled through configuration. We tend to use stubbing widely for large-scale components and subsystems, and mocking for components at the code level. The stub allowed us to simulate difficult edge cases that would have been hard to set up on real systems, and broke the dependency on the parallel dev team.
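That pattern – swap the real external system for a stub per environment, via configuration – might look like this sketch. The class names and the `MARKET_FEED` variable are invented for illustration, not from the book:

```python
import os

class RealMarketFeed:
    """Would consume the partner team's live message queue."""
    def last_price(self, symbol):
        raise NotImplementedError

class StubMarketFeed:
    """Simulates the external system, including edge cases that would be
    hard to trigger on the real one (unknown symbols, stale data, ...)."""
    def __init__(self, prices):
        self._prices = prices
    def last_price(self, symbol):
        return self._prices.get(symbol)

def make_feed(env=None):
    """Choose the real system or the stub per environment via configuration."""
    env = env or os.environ.get("MARKET_FEED", "stub")
    return RealMarketFeed() if env == "real" else StubMarketFeed({"ACME": 101.5})
```

Acceptance tests run against the stub by default; an integration environment sets the config flag to hit the real queue.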


“We think TDD is essential to enable the practice of continuous delivery. See the books Growing Object-Oriented Software and xUnit Test Patterns.” (this last one I need to read – it defines the difference between dummy objects, fake objects, stubs, spies, and mocks)


Acceptance tests should be written, and ideally automated, before development starts on a story. It's critical in an agile environment because it answers the question “How do I know when I am done?” for devs, and “Did I get what I wanted?” for users. Tools – Concordion, Cucumber, JBehave, Twist – separate test scripts (for users to write) from implementation (devs/testers write the code behind the scenes).


A powerful regression test suite, esp for large teams; frees up testers; the feedback loop is tighter; and you can auto-generate requirements documentation (Cucumber, Twist).


These can be brittle, expensive to maintain if not using good tools/practices. “A good question to ask yourself every now and again is, ‘how often do my acceptance tests break due to real bugs, and how often due to changes in requirements?'” Happy path should be first target for automation, followed by alternate happy path (if stable) or sad path (bugs)


Why aren’t unit tests the same as acceptance tests? Acceptance tests are business-facing, not dev-facing, and test whole stories at a time in a prod-like environment.


A common complaint is that they're too expensive to create and maintain. In our experience the cost is much lower than performing frequent manual acceptance and regression testing, or releasing poor-quality software. They catch serious problems that unit or component tests can never catch. Manual testing usually happens at a late date, when teams are under extreme pressure to get software out the door. There's no time to fix these bugs – they're added to a list. Where defects are found that require complex fixes, the odds of integration/regression problems rise.


Some in the Agile community say to do away almost entirely with acceptance testing, and instead write unit + component tests combined with pair programming, refactoring, and analysis/exploratory testing by customers, analysts, and testers working together. Jez doesn't like this – unit and component tests do not test user scenarios. Acceptance tests are great at catching threading problems, architectural mistakes, and environmental/config issues – hard to discover through manual testing and impossible in unit/component testing. They also give better protection when making large-scale changes to the codebase. And the manual alternative puts too high a burden on testers, who must do boring, repetitive tasks. Devs are not as good as testers at finding issues in their own work. It's much better to have testers alongside devs finding defects.


The cost of maintaining complex acceptance tests is a tax, an investment which is repaid many times over in reduced maintenance costs, protection that allows you to make wide ranging changes to the app, and significantly higher quality – “bringing the pain forward”. Without excellent auto test coverage, one of 3 things happens: a lot of time is spent trying to find and fix bugs at the end of the process, you spend time/$ on manual and regression testing, or you release poor quality software.


Manual testing in the software industry is the norm and represents often the only type of testing done by a team. This is both expensive and rarely good enough on its own to ensure high quality. Use manual testing only for exploratory testing, usability testing, showcasing, user acceptance testing.


The proper way to write them:

  • Crucial that your test implementations use domain language and do not contain detail on how to interact with the app (tests coupled to the UI are brittle when it changes). The behavior that any given acceptance test is intended to assert is – “If I place an order, is it accepted?” “If I exceed my credit limit, am I informed?”
  • Who owns them – not a testing team only. The test team sat at the end of the chain of development, so most of our acceptance tests were failing for most of their lives. The test team would find out about changes late in the process, after they had been developed and checked in. Since the testing team had so many automated tests to repair, it would take some time to fix the most recent breakages, by which point the dev team had moved on to other tasks. As the test team became snowed under with new tests to write and older tests to refactor and fix, they fell further behind. We wanted to improve the turnaround here, so we made the whole delivery team (devs and testers) responsible for automated acceptance tests. This focused the devs on acceptance criteria and made them more aware of the impact of their changes; they got better at predicting when their work would cause problems. Can be done through build masters (tracking down the guilty), or standing up and shouting “who is fixing the build?” – lava lamps or a large build monitor also help.
  • Don’t test against the GUI – an app written with testability in mind will have an API that both the GUI and the test harness can talk to to drive the application. Running tests against the business layer directly is a reasonable strategy. (This requires discipline from the frontend team to keep the presentation layer focused and not straying into the realm of business or app logic.)
  • Typically acceptance testing takes hours to complete vs a few minutes. You can refactor by looking for quick wins – spend time on the slowest tests first, and test against a public API vs the UI. Parallelize acceptance testing, with each test client running its own Selenium instance, for example. One company separated API testing from UI-based testing for quicker failure detection. The next step was to divide the tests into batches, split alphabetically, and run them in parallel.


The role of the BA or analyst/tester

  • The role of the business analyst is primarily to represent the customers or users of the system. They work with the customer to identify and prioritize requirements. They work with devs to ensure they have a good understanding of the app, and guide them to ensure it delivers business value. They work with testers to ensure acceptance tests are specified properly. Encouraging analysts and testers to collaborate and define acceptance criteria early on is vital: the analyst gains because the tester provides experience of what can be measured to define when a story is done; the tester benefits by understanding the nature of the requirements before diving head-first into testing. Once acceptance criteria have been defined, and before the requirements are implemented, the analyst and tester sit with the devs, along with the customer if available. The analyst describes the requirement and the business context and goes through the acceptance criteria. The tester then works with the devs to agree on a collection of automated acceptance tests that will prove the acceptance criteria have been met. Short kickoff meetings like this are vital; they prevent the analyst from gold-plating or creating “ivory tower” requirements that are expensive to implement/test, prevent testers from raising “false positives” – defects that really aren't defects – and prevent devs from implementing something no one really wants. Throughout the sprint, devs will consult with the analyst if they're confused or if there's a better way to solve the problem.

For a new team –

  • They should set up some simple ground rules, choose a tech platform and testing tools, create an automated build, and work out stories that follow the INVEST principle – Independent, Negotiable, Valuable, Estimable, Small, Testable – with acceptance criteria. Roles defined:
    • Customers/analysts/testers define acceptance criteria
    • Testers work with devs to automate acceptance tests
    • Devs write code to fulfill these criteria
    • If any automated tests fail – unit, component, acceptance – devs will make it a priority to fix them.
  • Make sure the customer / proj mgmt layer buys into this, so they don't scrap the effort as “too much time working on automated acceptance tests”. And each new acceptance criterion should clearly state the business value. “Blindly automating badly written acceptance criteria is one of the major causes of unmaintainable acceptance test suites.” It should be possible to write an automated acceptance test proving that the value described is delivered to the user.
  • “Following the process we describe changes the way developers write code. Comparing codebases that have been developed using automated acceptance tests from the beginning with those where acceptance testing has been an afterthought, we almost always see better encapsulation, clearer intent, cleaner separation of concerns, and more reuse of code… this really is a virtuous circle, testing at the right time leads to better code.”

If midstream –

  • Start with automating high-value use cases – automate the happy path tests first. Manual testing will dominate at first, but the moment you test the same function manually more than a couple of times – if it's not going to change – automate the test.

If legacy –

  • See Michael Feathers, Working Effectively with Legacy Code – legacy code is “systems that do not have automated tests.” Simple rule of thumb – test the code you change. Create an automated build process, then scaffold automated functional tests. Again, target high-value paths.
  • Legacy code is often not very modular or well structured – so there are lots of problems with changes in one part adversely impacting another – meaning you'll need to validate the state of the app at completion. Only write tests where they add value – distinguish the code that implements features of the app from the code that supports it (the framework). Most bugs will be in the framework, so if you aren't altering the framework, there's little value in adding comprehensive testing there.

Unit testing

  • These should run very fast; they recommend 10 minutes as about the limit, with 90 seconds ideal. They recommend JUnit or NUnit, and breaking down long-running tests.
  • Should not hit the db, filesystem, external systems, or (in general) interaction between components. (so they use test doubles or mocks)
  • Speed comes at a cost – they miss interaction between components.
  • Component tests is their phrase for integration testing.
  • It's common for a bug to slip through here – one issue sat in prod for 3 weeks without the team being aware, even with 90% unit test coverage. The fix was introducing simple, automated smoke tests proving the app could perform its most fundamental functions, as part of the release process.
  • Commit stage tests – run fast, as comprehensive as possible (75% coverage or better); if any of them fails, do not release. Environment-neutral. In comparison, later stages are long-running (think parallelization), could still yield a release candidate even if a test fails (i.e. when it fixes a critical bug), and run in a production-like environment to check against RM pipeline and prod env changes.
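The commit-stage gating rule above – all fast tests pass and coverage clears a threshold, or the build never becomes a release candidate – reduces to a tiny policy function. The 75% figure follows the note above; everything else here is illustrative:

```python
def commit_stage_gate(test_results, coverage, min_coverage=0.75):
    """Commit-stage policy: every fast test must pass and coverage must
    clear the threshold, or the build never becomes a release candidate."""
    failing = [name for name, passed in test_results.items() if not passed]
    if failing:
        return False, "failing tests: " + ", ".join(failing)
    if coverage < min_coverage:
        return False, "coverage %.0f%% is below %.0f%%" % (coverage * 100,
                                                          min_coverage * 100)
    return True, "promoted to release candidate"
```

In a real pipeline this decision is what promotes a build artifact into the acceptance-test stage; later stages can apply the looser "failing test may still ship" judgment the notes describe.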


Dealing with Technical Debt

Make the backlog visible to everyone. Your build status can't just show pass/fail, green/red – not if it's always red. Show the number of tests passed and failed, and graph them prominently.

  • Two approaches –
    • Zero defects. (In the past devs would ignore bugs, deferring them – technical debt. A huge list of bugs piles up. It's even worse with no acceptance tests (because of not practicing CI) – the team is overwhelmed by a huge list of defects, arguments break out between testers/devs/mgmt, release dates slip, and users are saddled with buggy software.)
    • Treat defects the same way as features. Have users prioritize – a rare defect with a known workaround could be low priority and deferred. You could categorize as critical, blocker, medium, low. (Often customers would rather not fix some bugs.)
  • Barely mentions beta testing techniques like canary releases
  • Why automated testing
    • Performing manual build, test and deployment processes is boring and repetitive – far from the best use of people. People are expensive and valuable, and they should be focused on producing software that delights its users, and then delivering those delights as fast as possible – not on boring, error-prone tasks like regression testing, virtual server provisioning, and deployment, which are best done by machines.


Nonfunctional Requirements

Almost every system has some kind of requirements on capacity and security, or an SLA. It makes sense to run tests to measure how well the system adheres to these requirements; definitions of what is acceptable are often subjective, so present the facts and allow a human to make the go/no-go call on releasing to prod. It is essential to start testing capacity/scaling as soon as possible, so you have an idea whether your app will be releasable.

If a release candidate fails capacity testing, someone will decide whether it’s important enough to allow the candidate to be released anyway.

“The crosscutting nature of NFRs makes them hard to handle both in analysis and in implementation. Yet they’re a frequent source of project risk. Discovering late in the project that the app won’t work because of a fundamental security hole or desperately poor performance is a frequent cause of late or cancelled projects. NFRs interact with one another in a very unhelpful manner – very secure systems compromise on ease of use, very flexible systems compromise on performance, etc. While in an ideal world the app would always be highly secure, performant, massively flexible, scalable, easy to use and support, and simple to develop and maintain – every one of these characteristics comes at a cost.”

Nonfunctional requirements such as availability, capacity, security, and maintainability are every bit as important and valuable as functional ones, and essential to a well-functioning system. The stakeholders of a project should be able to make a priority call on whether to implement the feature that allows the system to take credit card payments vs. the feature that allows 1,000 concurrent users. One may be of more value than the other. It’s essential to identify these requirements early in the project; the team then needs to find a way to measure them and incorporate regular testing into the pipeline. The team needs to think through the nonfunctional requirements and the impact they have on the system architecture, schedule, test strategy, and costs.

They recommend adding these as a specific set of stories or tasks at the beginning of the project. Specify enough detail that you can prioritize and do a cost-benefit analysis. It’s not enough to say “as fast as possible” – that puts no cap on the effort or budget. It’s easy to have poorly analyzed NFRs constrain thinking, which in turn leads to overdesign and inappropriate optimization. Devs in particular are generally bad at predicting where a performance bottleneck will be, and make code unnecessarily complex in order to achieve doubtful performance gains. Premature optimization is the root of all evil (Knuth’s dictum).

One use case they cite: an asynchronous message queue for displaying messages, meant to deal with surges of load. Errors were picked up from the queue, put in an in-memory list, then polled asynchronously in a separate thread before being placed in a second list, which was also polled – repeated seven times. A paranoid focus on capacity, but the problem was never there; the message queue was never flooded with errors. Remember YAGNI – You Ain’t Gonna Need It. Do the minimum amount of work to achieve the result, guarding against overengineering; optimizations should be deferred to the point where it’s clear they are needed.

They don’t recommend adding capacity testing to the acceptance test suite. It should be a whole separate stage.
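A concrete, costed capacity requirement makes the gate mechanical to measure, even if the release decision stays human. A minimal sketch of such a separate capacity stage – with a hard-coded measurement standing in for a real load-test result – might be:

```shell
#!/bin/sh
# Sketch of a standalone capacity stage. BUDGET_MS is the agreed,
# costed NFR; MEASURED_MS is hard-coded here for illustration but
# would come from a real load-test tool in practice.

BUDGET_MS=800      # never "as fast as possible" -- a concrete number
MEASURED_MS=642    # stand-in for e.g. a measured 95th-percentile latency

if [ "$MEASURED_MS" -le "$BUDGET_MS" ]; then
  CAPACITY_VERDICT=pass
else
  CAPACITY_VERDICT=fail
fi

# Report the fact; a human still makes the release call on a failure.
echo "capacity stage: ${MEASURED_MS}ms vs ${BUDGET_MS}ms budget -> $CAPACITY_VERDICT"
```

Note the budget is a specific number that can be prioritized and cost-benefit analyzed, exactly as the authors recommend.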

In short, people tend to either ignore NFRs until it’s too late, or overreact with defensive architecture and overengineering. “Technical people are lured towards complete, closed solutions – solutions that are fully automated for as many cases as they can imagine. … Operations people will want systems that can be redeployed and reconfigured without shutting down, whereas developers will want to defend themselves against every possible future evolution of the app, whether or not it will ever be required. …” These concerns are the software equivalent of a bridge builder making sure that the chosen beams are strong enough to cope with weather and expected traffic. They’re real, and must be considered, but they aren’t on the minds of the people paying for the bridge: those people just want something that gets them from one side of the river to the other and looks nice. This means we need to guard against our tendency to seek technical solutions first. We must work closely with customers and users to determine the sensitivity points of our app and define detailed nonfunctional requirements based on real business value. Then the team decides on the correct architecture and creates requirements and acceptance criteria capturing the nonfunctional requirements in the same way the functional ones are captured. That way they can be estimated and prioritized.





Azure DevOps Projects, from the amazing Donovan Brown.

Notes below from the “Zero to DevOps” epic presentation by Donovan Brown showing what was behind the scenes. (Note the first comment is about adding SSDT/SSIS as part of the buildout – I would love this.) This is a minute-by-minute breakdown of key points in his presentation – a little more detail than in my previous post on this. It pulls back the veil on the presentation he gave at PADNUG back in August of 2017 – showing there’s really no magic or months of tinkering happening behind the scenes.

  • If you are using Visual Studio – First go to Tools, Extensions and Updates – select Continuous Delivery Tools for Visual Studio. This allows you to r-click on the project and select Configure Continuous Delivery.
  • Create a new project in VSTS
  • Right click in VSTS and add a new project – in this case an Azure Resource Group.
  • Here you choose from a template – choose Web App.
    • Note all the cool template options. Docker host, Logic Apps, Linux VM’s – it’s all here. No need to reinvent the wheel!
    • VSTS will now generate all the tooling you need to work in Azure.
  • When this is done – check out website.json. These are all the resources you’ll need. The tree on the left is how we’ll navigate. We can easily add or change parameters (see that JSON file) and destroy and recreate environments at will.
  • R-click in Visual Studio and select Configure Continuous Delivery. You’ll need to know where you are going to deploy to. Note his release project names – BikeSharing360, BikeSharing360D, …P, and ….Q.
  • 8:02 – Release definition – click on Releases tab. Template – click on browse, this is just the output – navigate window on the left. You can override the template parameters here and set it to a variable.
  • ARM templates – your Ops team can own these, generating templates for Infrastructure as Code. A resource group, and inside of it a web app. Click on it in the portal – Continuous Delivery. This is a lot more mature than just doing a Git repo push. (Note the difference between on-prem TFS and VSTS – VSTS is updated every 3 weeks, TFS every 3 months.)
  • 16:31 – YoTeam, Donovan’s pet project. It’s script based as Jez Humble seems to favor.
  • 20:55 – Handles Grunt, Bower with aplomb. Donovan typically uses Node, Mocha, Sinon, Istanbul. This gives him good code coverage. “If you can do it in a CLI, I can put it in your build pipeline. You can even wrap it and add it to the Extensions library for everyone to use.”
  • 21:32 – Approval gates.
    • “Note I said – automate everything you can. That’s a big difference from ‘Automate everything!’”
  • 26:30 – Build agents – these can be installed behind your firewall. Code, resources never see the public internet – all behind the firewall.
    • Stakeholder licensing – does not count against 5 free teammates. That combined with free licensing for MSDN subscribers makes VSTS deployments a steal.
    • “No more ‘VSTS is in the cloud, it will only deploy to the cloud’ thinking”
    • “No more “We’re not a Microsoft/.NET shop”, or “All or nothing” – you can fold in Jenkins or OctopusDeploy or whatever. Use us to pay the integration tax.”

Website resources:

  1. first official announcement    
  2. Hub on Azure Deploy projects, and an Ignite overview on the topic.


Super cool, need to look into Auto Hotkey for my demos. Thanks Donovan!

Walkthrough notes in creating deployments and the Azure Deploy Project

Recently I’ve been asked to do some complete demos of building out complete release pipelines similar to what Donovan and company have been doing for at least a year now. My craptop has been bottoming out lately and I’ve sworn to “walk the walk” when it comes to making the leap from Visual Studio local on my box to editing/pushing out code using Visual Studio Team Services (VSTS). As VSTS has changed quite a bit since I last looked at this, I thought I’d write up my walkthrough notes so you can do it yourself. Trust me – setting up CI/CD is now LAUGHABLY easy. There’s really no excuse not to try it with your new application.

If you want more information on ARM templates, setting up release definitions, build agents etc – check out the “Zero to DevOps” epic presentation by Donovan Brown showing what was behind the scenes. Note the first comment is about adding SSDT/SSIS as part of the buildout as a suggested feature, I would love this!

In Brief

  • First lets set up some code to import.
  • Create three websites in the Azure portal that you want to point to – a D, Q, and P set of sites.
  • Now let’s set up a build.
    • Set up build – ASP.NET. Call it “XXX_CI”, select repository
    • Click on the Trigger tab, and select “Enable CI”
    • Click “Save and Queue”
  • Associate this build with a release:
    • Release tab – create a new release definition. When it asks you for the target environment name, give it a name – “Dev” – Azure App Service Deployment template, and select Apply.
    • Click on Artifacts – select your project, and the “XXX_CI” build definition.
    • Enable CI by clicking on the lightning bolt on the artifact, top right.
    • On the Artifacts object – select the “dev” environment. Select the dropdown on the subscription to pull in your Azure subscription. Here we are going to point to “Phoenix360D”, our destination dev website.
    • Then we edit the index.cshtml file and add some nonsense verbiage. Pull it up in the site – and voila! Any changes we are making flow instantly through to dev where they can be tested.
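The commit-to-deploy loop in that last step can be sketched end to end with plain Git. Here a local bare repository stands in for the VSTS remote (all paths and names are illustrative), but the flow – commit, push, CI trigger fires – is the same:

```shell
#!/bin/sh
# Local stand-in for the VSTS flow: a bare repo plays the remote, and
# the final push is what the CI trigger on the XXX_CI build reacts to.
set -e
WORK=$(mktemp -d)

git init --bare -q "$WORK/vsts-remote.git"          # stand-in for the VSTS repo
git clone -q "$WORK/vsts-remote.git" "$WORK/site" 2>/dev/null
cd "$WORK/site"
git config user.email demo@example.com
git config user.name "Demo User"

# The hand edit that should flow through to the dev site:
echo "<!-- nonsense verbiage -->" >> index.cshtml
git add index.cshtml
git commit -q -m "Edit index.cshtml"

# With the CI trigger enabled, this push kicks off the build/release:
git push -q origin HEAD:master
echo "pushed $(git rev-parse --short HEAD)"
```

Against a real VSTS project you would clone the project’s Git URL instead of the local bare repo; everything after that is identical.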

See below for notes on setting up multiple deployments and creating a DevOps Project.

Full Walkthrough Notes

Note there’s not one original step in here – I just walked through the steps in this doc like a good, obedient zombie.

Creating Your Environments

First go into the Azure portal     

  1. Log onto
  2. Create three new websites by clicking New -> Web + Mobile -> Web App.

According to the notes here, you only need to create a new app service plan if a given app is resource-intensive or you need to scale it independently. That’s not the case here – we can use one app service plan for all four environments (XXXAPPNAME + D, Q, P). In contrast, the idea behind resource groups is that you update them as a group. They share the same lifecycle, permissions, and policies – you can update security for a batch of resources as a group, for example. So we’ll be creating one app service plan and four different resource groups. We’ll create three websites – see below for “Phoenix360D” – with the suffixes D, Q, and P for dev, QA, and production.
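The same layout can be scripted with the Azure CLI instead of clicking through the portal. This is a hedged provisioning sketch – the names, location, and SKU are made up for illustration, and it assumes you’re already logged in with `az login`:

```shell
#!/bin/sh
# Sketch: one shared app service plan, one resource group per
# environment, three sites suffixed D/Q/P. All names are illustrative;
# requires an authenticated Azure CLI session.
set -e
APP=Phoenix360        # base app name (illustrative)
LOC=westus2

# The shared plan lives in its own group; none of these apps is
# resource-intensive enough to need independent scaling.
az group create --name "${APP}-plan-rg" --location "$LOC"
az appservice plan create --name "${APP}-plan" \
  --resource-group "${APP}-plan-rg" --sku S1

# A plan in a different resource group must be referenced by full ID:
PLAN_ID=$(az appservice plan show --name "${APP}-plan" \
  --resource-group "${APP}-plan-rg" --query id -o tsv)

# One resource group per environment, so each batch of resources
# shares a lifecycle, permissions, and policies.
for ENV in D Q P; do
  az group create --name "${APP}${ENV}-rg" --location "$LOC"
  az webapp create --name "${APP}${ENV}" \
    --resource-group "${APP}${ENV}-rg" --plan "$PLAN_ID"
done
```

This is the scripted equivalent of the portal clicks below, not a required step – but it makes the one-plan/many-groups idea concrete.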

Depending on current demand, Azure should spin each of these up in a few minutes. Now we’re good to go – all four environments have been spun up and are running on Azure.


Getting Our Build Started – Single Path

Next, we need some code to work with. If you don’t have your own, no worries, we can give you a very nice working sample complete with test scripts.

  1. Log onto VSTS and click on your Code tab. Import using – see the nice screenshot below:

  2. Once this is done – you should be on the Code tab. Select the “Setup build” button on the right, select “ASP.NET (PREVIEW)”, and select APPLY.


Side note, check out all the steps it stubs out for you below on the left. Whoa!

  1. Give the resulting template a good name – I chose Phoenix360_CI – and select the repository you just created. Here I’m using a Hosted 2017 build agent, but you could also use your on-prem TFS 2017 build agent if you so desire.
  2. And, last, I select the repository we just imported:

  3. From here I can almost see the end of the tunnel – click on Triggers and enable the trigger for continuous integration. (Note you can also set scheduled builds at a particular time of day on this tab.)

  4. Click on Save & Queue, top right. Enter some notes on your commit, and on the popup window click Save & Queue again.

  5. If you’ve done this right – you’ll see the following in your build definition:


  6. Whoa, that build URL there is just begging to be clicked on. Let’s click on it:

  7. To test that this is working – go into Code again and make a hand edit to a web.config file, in the header. If we’ve done this right, we should see a build kick off after we commit this change:

Click on Build and Release tab. Sure enough, our code commit triggered a build:

  8. This is really quite nice – click on the latest build, and select the Test tab, for example. It shows us the tests and the run length:





  1. Click on the Release tab and add a new release definition. There are a few templates to choose from here, but it’s definitely easier to start with a precreated template vs. rolling your own. Let’s click on “Azure App Service Deployment” and select Apply.

  2. Don’t get overwhelmed – just click on the Add Artifact option, and enter the Build Definition you created earlier. Note the different version options in the dropdown as well:

  3. On the Artifacts node you just created – left side – notice that little lightning bolt on the top right? That’s our continuous integration trigger. Let’s click on it and make sure CI is all set up:


    And then click again on that Artifacts node on the right, and set up your environment including the destination endpoint:


    Create a new release – as you see below – and save it to the default folder. We’re golden!



    Does it work? Let’s go into the code view and make a change. It should populate out to dev:








Setting Up Multiple Deployment Paths

Continuing with this wiki:


See below. Clone your dev item – and set it up so the pre-deployment trigger (lightning bolt, left side) is set to “After environment”. This is also where you can set up approvers and manual stage gates.



Clicking on the pre-deployment conditions lets me set the deployment to trigger after the environment is ready, or based on a release trigger (i.e. a simultaneous rollout to DEV/QA). You could also set your production rollouts to a less busy time of day, for example.

Then I go into each task for the new cloned environments above and change the deployment pointer to QA (or Prod).

Let’s get fancy and change the deployment to prod to be manual.

Now when I create a release – look how nice this is:


And sure enough when it hits prod I get this nice little alert that I need to review the changes and approve a move to prod.


Sure enough, now any change to my source control kicks off a full set of releases out to all 3 environments. Noice!!!!






DevOps Projects

Log in to VSTS.

Create a new DevOps Project. New (top left), Everything, and filter by “devops”. You should see the DevOps Project below appear.

Let’s select .NET below. But we could also import our own app, or use a sample site based on PHP, Node.js, Java, etc.


On the next screen choose either ASP.NET or ASP.NET Core. Select Web App in the next window – it’s the only option as yet. Lastly choose your existing MSDN subscription – assuming you have one – and a new project name.


I’ll next see a “Deployment in Progress” notice in the taskbar. Super cool!

… and at last I get this shininess.


There’s no magic going on here. You can browse to the Build definition and inspect what it’s doing – and then click on the Dev release and edit the properties. See? It’s exactly what we created before, manually.

Really, I’m very glad I took the time to do this manually first. It gives me much more of a comfort level with what the DevOps Project is setting up on my behalf.


Dead ends and miscellany

An annoying issue came up in the Artifacts section: when I tried to point to the correct environment, it kept blowing up with “Couldn’t authorize to AAD. If popup blocker is enabled in the browser, disable it and retry again.” I tried changing the popup blocker settings in Chrome but that didn’t fix anything; same with Edge. But classic IE got me a little further. This doc gave two options – I tried to log in, but “Users May Add Integrated Applications” under the classic portal was already set to Yes. I tried again in Chrome, went to Release, and tried adding a new Azure Resource Manager Service Endpoint. Turns out that wasn’t what it needed anyway.

I also had some subscription issues, where my default directory needed to be changed – that was really killing this walkthrough. Good news was – I submitted a ticket, Sev A, and got a call back from a very competent subscriptions helpdesk person in about two hours. Excellent. Really, I was quite impressed, it totally fixed a very longstanding issue.


Helpful sites to do this yourself

Connect Keynote with Brian Harry. DevOps Projects – awesome scaffolding for your release management project!

Still going through videos from Connect, there’s a lot of stuff to wade through! Definitely enjoyed Brian Harry’s keynote address – especially the awesome Abel Wang helping out as copresenter. Here’s my notes.

The takeaway stuff is this –

  1. You can use Azure DevOps Projects to create a fully functional CI/CD pipeline as a starting point for any project, then extend it. This is way cool and I can’t believe it hasn’t existed before. The release visualizations are definitely top notch.
  2. YAML is now supported. (Several of my customers have been asking about this!)
  3. There are some real goodies here about how Microsoft handles its releases – hint: it’s not Dev -> Prod with one click; they use scale units.
  4. Automated gateways are now possible in VSTS. This is definitely a huge win…

In more detail:

  • (roughly 1 min in) – Food for thought: “DevOps is all about shortening feedback loops… automated deployments are often the first thing we think about.” There’s a lot of plumbing though – it can be daunting.
  • 4:30 (minutes into the broadcast) – Azure DevOps Projects – easily create a full end-to-end RM pipeline, using Node.js, .NET, Java, PHP, or Python.
  • 5:59 – dashboarding – click on the links for the code / build / dev pipeline. To customize, clone it onto your HD, delete all the files, copy in your code, then push it back to VSTS using Git. Easy!

  • 13:44 – YAML support now included in our CI system
  • 14:41 – No one actually pushes a button and code goes out the door to production – “I call that the fastest way of deploying bugs to your customers.” At Microsoft we have 15-20 different scale units (subsets of customers). We use Release Management to gradually roll out across these environments. First we roll out to one scale unit, watch Twitter for sentiment downturns, check feedback, use it ourselves, etc. Then we wait 24 hours before deploying to the next ring. That’s responsible CI/CD. If we have a blocking bug – we pull the cord and roll back.
  • 23:00 – demo of build agents running natively on all 3 platforms – Win/Linux/Mac. You could use one release to all 3 environments if you wanted to. I thought this was amazing:

  • 26:18 – automated gate creation. These are automatically created post-deployment monitors – using Azure Monitor, Functions, the REST API, and Service Bus to stress test and check your new system’s health.

  • 27:48 – creating a YAML build
  • 32:27 – fork into private repo vs a branch.
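The ring-based rollout described at 14:41 can be caricatured in a few lines of shell. `deploy_to` and `healthy` are placeholders for real deployment commands and telemetry checks, and the 24-hour bake time is commented out so the sketch runs instantly:

```shell
#!/bin/sh
# Toy model of ring deployment: one scale unit at a time, with a health
# check (standing in for telemetry/Twitter-sentiment monitoring) gating
# progress, and a rollback path for blocking bugs.

deploy_to() { echo "deployed to $1"; }
healthy()   { true; }   # placeholder for real post-deployment checks

for RING in scale-unit-1 scale-unit-2 scale-unit-3; do
  deploy_to "$RING"
  if ! healthy "$RING"; then
    echo "blocking bug in $RING: pulling the cord, rolling back"
    exit 1
  fi
  # sleep 86400   # in practice: bake for ~24h before the next ring
done
echo "rollout complete"
```

The key design choice is that each ring is a full deploy-observe-decide cycle, not just a staggered copy.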


For a full writeup including a new walkthrough on Azure DevOps Projects, click here. There’s a quickstart here.

Full list of Connect DevOps vids and my writeups: (this will grow)


New VSTS features coming up – hawt fresh Agile changes y’all!

Connect() 2017 is all done and wrapped up for the season. If you weren’t able to make it – as I wasn’t (sniffle) – all the content is available on demand. Click here for an overall list of DevOps focused talks.

I wanted to post a little about one of the great webcasts I viewed this morning, Agile Project Management with VSTS, with Aaron Bjork and Sondra Batbold. This is a really great walkthrough of the full capabilities – including some hawt new features – coming up in Visual Studio Team Services (VSTS). Below are the key features I noticed – broken down by where they appear in the webcast so you can skip to the good stuff.

  • 5:09 – Notice the custom Kanban board, with columns for Backlog | Dev Design | Implementing | Code Review and Verify | Closed. There’s a definition of done showing the team’s standards on the info icon – in this case “doing” means fully designed and implementation started; “done” means unit tests written, functional tests updated, and it’s ready for code review. Nice as well to show the WIP limit on the top right. (Side comment: I love Kanban and how it helps us avoid the myth of multitasking by limiting our Work in Progress. I actually use this at home so I don’t get overwhelmed with my chores around the farm! I do feel, very strongly, that Kanban should be the default starting place – and maybe the endpoint – for 90% of the teams out there struggling with their Agile implementation.)

  • 6:40 – using swimlanes to separate out important items. (Settings icon, Board > Columns)
  • 8:05 – Setting a styling rule to have high priority bugs turn red (for example). You can also add tags, if the priority is high enough – and highlight in pink.

  • 10:11 – Click on lower left corner of board to add tasks
  • 14:14 – “my activity” query for busy project managers off the Work Items hub.
  • 14:42 – Scrum team setup with 1 week sprints. Notice the division of work here, from New | Next Sprint | Current Sprint | End Game | Closed.

  • 17:02 – Most scrum teams focus on velocity – the forecasting feature.

  • 19:38 – Adding a column to the backlog (customizing display)
  • 20:59 – Capacity planning. Note what it says at 21:34 – “Note this feature is for you and your scrum team, not for management to look down on you. This allows you to make a strong commitment to the upcoming sprint.”

  • 22:15 – task board and burndown chart you can use on a monitor in your daily standups (DSU’s)
  • 23:49 – filter by person (to show your work only for example, I use this all the time)
  • 24:15 – dashboards. Check out the list of widgets in this nice display –
    • current sprint
    • burndown
    • cycle time (closed / new / active) – i.e. “how long is it taking us to start working on an item?” This is a key pain point mentioned in The Phoenix Project.
    • Stories by state
    • Team velocity – in this example it shows the team improving in their completion rate by doing better planning.
    • KPI’s – including Open User Stories, Active Bugs, Active Tasks, Ready for Testing, Completed User Stories

  • 25:38 – Very configurable new burndown chart vs the OOTB widget.
  • 28:31 – Delivery Plans – a new feature showing work across all teams. In this case we’ve got three teams working on different schedules. You can expand this to dig into work being done by a specific team, and zoom in/out.
  • 31:29 – Plans – You could put a specific milestone – say a release date – on the chart.

  • 32:19 – How does Microsoft use delivery plans with their product teams? In the VSTS case, the leads for all 4 teams meet regularly. They talk about what’s currently going on, what’s 3 weeks out. There’s a lot of “A-Ha!” moments here as cross dependencies get exposed. (Pro tip – use “T” to show succinct view)
  • 33:32 – new Wiki feature. (Could this take the place of an emailed retrospective?) You could add a new sub page, etc. Very customizable, I like it. Use a pound (#) to add a reference to another work item.

  • 35:53 – Add a new work item type to a custom template inherited from the standard Agile template. In this sample they force people to add a custom color and an icon to a new work item to visually differentiate it from others. (I’m questioning this one – does it really add value?)
  • 38:43 – Adding a “followup owner” so code reviews are enforced.
  • 40:30 – Queries are simplified and redesigned
  • 45:00 – Customizing the dashboard, in this case show a different color if WIP is excessive.
  • 47:15 – I love this part – Extensions. There are a lot of custom extensions for builds, burndowns, etc. They walk through two paid extensions: Backlog Essentials (quick in-place edits of a work item from the list – why isn’t this standard??!) and TimeTracker (for orgs that want to report/track time on dev hours). These are all available from the shopping cart icon, top right in VSTS. Note you need to add the Analytics extension to really kick up your burndown chart’s capabilities – see Greg Boer’s recent presentation on Channel9, including the PowerBI features.


  • 51:12 – Q&A:
    • Can we display a burndown chart across projects? (not yet, but soon) Note the comment at 54:13 – “I will tell you – we recommend one cadence to rule them all. We run on a 3 week cadence for our 700 people. It adds so much simplicity and clarity when we’re talking about dates.”
    • View Only (vs modify) permissions yet? (that’s coming also, we are working on joining multiple accounts together so we can view on an org level). Note on permissions, MSFT uses Area Path permissions for security to hide work on sensitive projects (a la HoloLens)
    • Hey, there’s lots of clutter on my PBI’s. Can we clean this up? (We’re working on a Personal view so you can pin only the fields a particular person is working on.)

Anyway that’s a lot of content for me to go through and think about. Should keep me busy for the next week or so as work on my book progresses!

Other Connect sessions I will be checking out:

  • Release Management
  • Source Control