Agile

New VSTS features coming up – hawt fresh Agile changes y’all!

Connect() 2017 is all done and wrapped up for the season. If you weren’t able to make it – as I wasn’t (sniffle) – all the content is available on demand. Click here for an overall list of DevOps focused talks.

I wanted to post a little about one of the great webcasts I viewed this morning, Agile Project Management with VSTS, with Aaron Bjork and Sondra Batbold. This is a really great walkthrough of the full capabilities – including some hawt new features – coming up in Visual Studio Team Services (VSTS). Below are the key features I noticed – broken down by where they appear in the webcast so you can skip to the good stuff.

  • 5:09 – Notice the custom Kanban board, with columns for Backlog | Dev Design | Implementing | Code Review and Verify | Closed. There’s a definition of done showing the team’s standards on the info icon – in this case “doing” means fully designed and implementation started; “done” means unit tests written, fx tests updated, and its ready for code review. Nice as well to show the WIP limit on the top right. (Side comment, I love Kanban and how it helps us avoid the myth of multitasking by limiting our Work in Progress. I actually use this at home so I don’t get overwhelmed with my chores around the farm! I do feel, very strongly, that Kanban should be the default starting place and maybe the endpoint for 90% of the teams out there struggling with their Agile implementation.)

  • 6:40 – using swimlanes to separate out important items. (Settings icon, Board > Columns)
  • 8:05 – Setting a styling rule to have high priority bugs turn red (for example). You can also add tags, if the priority is high enough – and highlight in pink.

  • 10:11 – Click on lower left corner of board to add tasks
  • 14:14 – “my activity” query for busy project managers off the Work Items hub.
  • 14:42 – Scrum team setup with 1 week sprints. Notice the division of work here, from New | Next Sprint | Current Sprint | End Game | Closed.


  • 17:02 – Most scrum teams focus on velocity – the forecasting feature.

  • 19:38 – Adding a column to the backlog (customizing display)
  • 20:59 – Capacity planning. Note what it says at 21:34 – “Note this feature is for you and your scrum team, not for management to look down on you. This allows you to make a strong commitment to the upcoming sprint.”

  • 22:15 – task board and burndown chart you can use on a monitor in your daily standups (DSU’s)
  • 23:49 – filter by person (to show your work only for example, I use this all the time)
  • 24:15 – dashboards. Check out the list of widgets in this nice display –
    • current sprint
    • burndown
    • cycle time (closed / new / active) – i.e. “how long it taking us to start working on an item”? this is a key pain point mentioned in the Phoenix Project.
    • Stories by state
    • Team velocity – in this example it shows the team improving in their completion rate by doing better planning.
    • KPI’s – including Open User Stories, Active Bugs, Active Tasks, Ready for Testing, Completed User Stories

  • 25:38 – Very configurable new burndown chart vs the OOTB widget.
  • 28:31 – Delivery Plans – a new feature showing work across all teams. In this case we’ve got three teams working on different schedules. You can expand this to dig into work being done by a specific team, and zoom in/out.
  • 31:29 – Plans – You could put a specific milestone – say a release date – on the chart.

  • 32:19 – How does Microsoft use delivery plans with their product teams? In the VSTS case, the leads for all 4 teams meet regularly. They talk about what’s currently going on, what’s 3 weeks out. There’s a lot of “A-Ha!” moments here as cross dependencies get exposed. (Pro tip – use “T” to show succinct view)
  • 33:32 – new Wiki feature. (Could this take the place of an emailed retrospective?) You could add a new sub page, etc. Very customizable, I like it. Use a pound (#) to add a reference to another work item.

  • 35:53 – Add a new work item type to a custom template inherited from the standard Agile template. In this sample they force people to add a custom color and a icon to a new work item to visually differentiate it from others. (I’m questioning this one, does this really add value?)
  • 38:43 – Adding a “followup owner” so code reviews are enforced.
  • 40:30 – Queries are simplified and redesigned
  • 45:00 – Customizing the dashboard, in this case show a different color if WIP is excessive.
  • 47:15 – I love this part – Extensions. There’s a lot of custom extensions for builds, burndowns, etc. They walk through two paid extensions, one for the Backlog Essentials (quick in-place edits of a work item from the list, why isn’t this standard??!) and TimeTracker (for orgs that want to report/track time on dev hours) These are all available from the shopping cart icon, top right in VSTS. Note you need to add the Analytics extension to really kick up your burndown chart’s capabilities, see Greg Boer’s recent presentation on Channel9 including PowerBI features on Channel9.

 

  • 51:12 – Q&A:
    • Can we display a burndown chart across projects? (not yet, but soon) Note the comment at 54:13 – “I will tell you – we recommend one cadence to rule them all. We run on a 3 week cadence for our 700 people. It adds so much simplicity and clarity when we’re talking about dates.”
    • View Only (vs modify) permissions yet? (that’s coming also, we are working on joining multiple accounts together so we can view on an org level). Note on permissions, MSFT uses Area Path permissions for security to hide work on sensitive projects (a la HoloLens)
    • Hey, there’s lots of clutter on my PBI’s. Can we clean this up? (We’re working on a Personal view so you can pin only the fields a particular person is working on.)

Anyway that’s a lot of content for me to go through and think about. Should keep me busy for the next week or so as work on my book progresses!

Other Connect sessions I will be checking out –

General:

Database

Containers

Release Management

Source Control

Testing

 

Advertisement

DevOps – how we do it at Microsoft, with Aaron Bjork.

Two great interviews I found today and enjoyed very much, between Donovan Brown and Aaron Bjork (who is head of the DevOps part of VSTS).

From the first video:

  • We treat our development teams as adults – “I’ll trust you until you give me reason not to.”
  • How does Microsoft structure their product teams?
  • How do we keep to a consistent UI? i.e. checkboxes versus radio buttons – what design guidelines do we use?
  • How to tie together 50 teams with one weekly meeting of six people
  • How big of an advantage it is to dogfood it – be your own first customer.
  • “Our engineers write code, test code, and deploy code.”
  • Feature flags – we don’t use these for just one customer. The boundary to turn on the flag is during the current sprint, and then turn off the following sprint.

Second video notes:

  • (minutes into program) 1:27 – Why 3 week sprints? Goldilocks principle. Donovan mentions a physician product owner where 4 weeks was the best he could get.
  • 5:46 – Dogfooding, and customer driven features.
  • 13:09 – We used to have all these plans that helped me sleep at night – but did we ever ship on time? (i.e. waterfall and requirements planning as a security blanket)
  • 16:32 – importance of monitoring. DevOps is not automation. Most of their DevOps discussions end with Agile.
  • 19:03 – distrust between Dev and Ops. The chasm grows. Now we are improving with TDD, unit testing – but the distrust is still there.
  • 20:27 – treat your deployment pipeline as a feature of your product.
  • 27:54 – Do we deliver mid-sprint at Msft?
  • 32:57 – “You can’t cheat shipping.” Scenarios, features, stories and tasks.
  • 39:10 – Feature flag litter and documentation.

DevOps – are you stuck on where to start? Building out 4 Release Pipelines in an Hour with Donovan Brown

Such, SUCH a great treat having Donovan come all the way from Houston for this presentation. He’s a fantastic presenter and his hands-on knowledge blows my socks off. I can’t thank him enough for making the trip.

IF YOU ATTENDED and you enjoyed the presentation – or even if you didn’t enjoy it – WE NEED YOUR HELP!
Take a few minutes and click on this link – it will tell us how we can improve next time, and hopefully lay the groundwork for Donovan to come back in a year or so.

There’s a streaming video available here of Donovan’s Portland Area .NET User Group (PADNUG) presentation. It was outstanding! If you want to see one CI/CD release pipeline built out releasing a website to Azure in one hour – well sorry, that is not what he did. Instead, Donovan built out FOUR pipelines in an hour – using Node.js, Java, .Net and .Net Core., while describing step by step what he was doing and why – using VSTS, the Azure Portal, Visual Studio, and a cool command-line tool I hadn’t heard about called Yo Team!

The key takeaways from his presentation are as follows:

  • If you only get one thing from our time together – Microsoft is for any language, any platform.
  • Licensing costs should not be a blocker for us. It’s like $10/person per month, and that’s only for large teams over 5. And with MSDN you get these benefits included.
  • Lastly, if you attended and enjoyed the presentation – give a shout out to Donovan on Twitter. He’s thrown down the gauntlet – “we have the best build and release tools on the market, if you don’t agree reach out to me on Twitter and be specific!” So, if you don’t think Microsoft’s build tools will work with Jenkins, or Java apps, Node.js, SonarQube etc – throw down and enter the squared circle my man!

My notes are below. These aren’t notes of his presentation as they were no-slide demos, but comments Donovan made during the day that gives DevOps context to his presentation. As the saying goes, “If I had more time it would have been shorter!”

 

 

 

Thoughts on Strategy

Make incremental changes in ramping up your maturity. It’s safer. If you are deploying manually, there’s one part that you fear the most. Focus on that.

“If it hurts, do it more often.” (You are incurring more risk by deploying less frequently.) Lean forward into that and think about, how can I fix this. Focus on scariest part first.

If you need prescriptive – hire a consultant. (Like, I don’t know, Microsoft Premier for Developers maybe!) They can ask questions, pinpoint pain points for your org and provide a specific roadmap.

Why do you have to ask permission from your manager? Your job as an engineer is to continuously deliver value. You should need to ask permission if you DON’T want to set up CI. No need to stop and learn for months – focus on what hurts the most, and focus on that one thing. What is the thing – every time you run a deployment – why it breaks. Otherwise – you end up paying a tax doing it manually. Find something small to focus on – don’t try to do it all at once. It might take you months – after which you may have an amazing portion of your pipeline that was manual that is now automated. That will get people excited. And after that last build that blows up when people start playing the “Not it” game – guess what, you get to go home early!

And if you’re stuck on exactly where do we start? – http://donovanbrown.com/post/How-do-we-get-started-with-DevOps (really, check this post out. It’s good enough to mention twice!)

 

Agile Is Your Starting Place

The key question is – “Can you produce releasable bits of code?” Until your Agile processes are strong – not perfect but mature – you can’t see significant gains with DevOps. The business needs to understand that we need to produce increments of shippable code. DevOps is the “second decade of Agile.” Once Agile is in place you will realize that you’re not able to ship as fast as you can produce features.

What limits us so far – four years after “The Phoenix Project?” Really it’s still slowed by adoption of Agile. 40% of orgs have adopted Aguke. That’s a dead number, as its only dev teams that are championing this. The rest of the org is expecting waterfall, requirements, dates.

DBA’s are frequent blocking point. If you slice vertically and not horizontally. I’ll give you 3 months, if you promise not to make any further changes to the schema, ever. “That’s crazy!” “If you’re going to change it anyway why should I invest 3 months in this now. Your first step could start with login functionality – in 3 weeks we have a login page, username and credentials, authentication and identification.” Don’t let a recalcitrant DBA stop you in your tracks.

One week sprints really worked well (at one company) for Donovan in his consulting past- with product owner/business owner. After 8 weeks, needed to get a knowledgeable person (replacing old business owner) – and a sigh of relief, finally back on track. And hire a scrum master. That person needs to be able to say no to the boss. For a good product owner, all they have to do is understand the business they are trying to serve – and tell me of two backlog items which one is a priority. Scrum Masters must protect team from unrealistic expectations, on this date, I want these features – perfect quality. Quality is not negotiable – either move date or lose features.

 

Database Goodies

SQL Server Data Tools (SSDT) – version every object on SQL Server including permissions and roles. Stored in a database project. (this is free. Evaluate, see if it meets your need.) An amazing tool, been around for 10 years, very little known – our best kept secret! If SSDT for some reason isn’t scaling or working for you – ReadyRoll from RedGate is also a good option.

 

Last but not least – this amazing site brought to my attention from Jorge Segarra. An awesome resource for all things DevOps for you DBA guys! https://www.microsoft.com/en-us/sql-server/developer-get-started/sql-devops/

This may require breaking up into diff schemas (like a virtual namespace) vs one huge project. Schemas are here now and available; will allow you to segment databases that you create – no 3000 tables/etc in one untranslatable blob. (BACPAC is schema and data, DACPAC is just schema. Can run post/pre scripts- make sure its idempotent so if it runs twice you don’t double your db size!)

Migrating your CRUD statements from sprocs to Entity Framework may be a win to move away from thousands of unwieldy and clunky INS/UPD/Del type statements.

 

Integration Testing (Selenium etc)

DevOps – one company said “DevOps = Automation”. So they had thousands unit and integration tests. Code would flow with no human intervention from commit to production. As soon as they release a CSS change though to prod, they start losing money. As soon as they started looking at ecommerce website – summary page made final buttons and the text the same color- this resulted in a blank button! Automation clicked it because the ID of the button did not change. So, automate everything you CAN – don’t try to automate things you shouldn’t.

Best practices for building integration tests: the #1 thing is making sure assumptions you made when you recorded your automation are still valid. Usually the database in QA is the failing point: the code isn’t broken but the data it runs against doesn’t fit assumptions. See Redgate’s excellent tool SqlClone here.) They are a few versions behind with their Selenium driver, using IE.

“We spend more and more time with peer testing than we do with auto testing on the Microsoft VS team. As we’re our own first customer, there’s a lot of A/B testing that happens. Integration testing happens very little, unless there’s a performance issue – then we move it to GoBig environment where we can perf test.” Per Brian Harry – “Unapologetically we do testing in production.”

Donovan ran a team in Akron where they were not allowed to write a line of code until they wrote a UI test (in CodedUI, but now we do this in Selenium). They wrote automation before they wrote code – wrote so it fails, then refine until it passes. This and a manual test case – which can be given to a stakeholder in plain English (“oh, I didn’t expect I should be pressing that button”) – there’s nothing better than a manual test case to be your new acceptance criteria. Automation and UI tests are brittle – they become brittle/break, that didn’t go away, but having it there forced my teams to do the right thing early. (These were one week sprints, Weds to Weds.) Do it more often if it hurts! Finding clever ways of making BACPAC/data layer work.

VS teams – forced them to move more functionality to behind the scenes, UI testing. Make unit testing viable. To those nerd architects out there claiming that “Unit tests are worthless”! – we run 41000 unit tests in VSTS and it takes 6 minutes. You can’t run integration tests that fast.

 

Infrastructure as Code (ARM templates) and Configuration Management (PowerShell DSC)

ARM (Azure Resource Manager) is the name of the game here to get Infrastructure as Code working. This is a much better process than spinning up resources manually for example. (Not configuration here – doesn’t include IIS, etc. )

Recommendation: Introduce your Infra team to ARM templates. ARM templates are Azure only but you can use Terraform to point to Azure/multi-cloud solution so you can move resources to Google/AWS etc.

Configuration as Code – using PowerShell DSC (Desired State Configuration) – take configuration of that server and you codify it as well. (I demand you have port 80 open, .NET 4.5, IIS available, these security roles etc.) Server installs everything you need to bring it to ready state. Every 15 minutes – checks to make sure it fits standard config. So, if it finds that IIS is turned off, it will detect if the configuration is drifting and make a note in the log. You can also configure it to bring it back to standard state.

PowerShell DSC works for both Linux and Windows environments. And, it’s free! (p.s. Chef uses DSC as well so it’s a layer on top of a layer.)

Recommendation: Start with what you have and what’s free with PowerShell DSC and THEN make an evaluation if there’s features that are missing. This could take <1 hour. From Donovan – “I encourage you to ping me on Twitter if DSC is not getting the job done. If we can’t fix it fast enough, then go to Chef or a competitor.”

DSC can be done onprem/etc. It’s idempotent, which means you can run it blindly a million times in a row and the server will stay exactly the way it needs to.

 

 

Self Service VM’s with Dev/Test Labs

Struggling with self service VM’s and exposing costs of resources to the consumers you service? (I.E. that 5 9’s coming at a horrendous price that is shielded from end users/business?) Think about Dev/Test Lab. This is Self service – no blank check, you can spin up a sandbox (using allocated $ your customers buy each month) and enforces the VM images they are allowed to spin up.

Dev/Test Lab documentation – https://docs.microsoft.com/en-us/azure/devtest-lab/

Overview and trial site hub – https://azure.microsoft.com/en-us/services/devtest-lab/

Starting documentation, very good – https://docs.microsoft.com/en-us/azure/devtest-lab/devtest-lab-overview

 

Microservices

Of advantage because releases now are micro sizes. No more having to track down 6 months back with a dictionary of changes. Would eliminate webmethods – ESB. Can rev very quickly on microservices without jeopardizing the application as its very small, atomic pieces of functionality.

Meant to be combined with containers.

Not a silver bullet. Evaluate it. Microsoft is going there on VSTS team. Docker support is built into VSTS – can publish to a registry. Even TFS 2015 update 3-4 has Docker support as well. Need to also run Win Server 2016 to support containers, or Linux containers. Anniversary edition of Windows 10.

 

Starting and Scaling DevOps in the Enterprise – review

As many of you know, I’m a huge fan of the work Gary Gruver has done – in particular his book “Leading the Transformation” on his experiences at HP trying to transform a very traditional enterprise. (See my earlier mention of his book on this blog, here.) His newest work is out – Starting and Scaling DevOps in the Enterprise. I am recommending it very highly to all my customers that are following DevOps! I think its unique – by far the best I’ve read so far when it comes to putting together specific metrics and the questions you’ll need to know in setting your priorities.

Gary notes that there are three types of work in an enterprise:

  1. New work – Creating new features or integrating/building new applications
    1. new work can’t be optimized (too much in flux)
    2. Best you can hope for here is to improve the feedback loop so you’re not wasting time polishing features that are not needed (50%+ in most orgs!)
  2. Triage – finding the source of defects and resolving
    1. Here DevOps can help by improving level of automation. Smaller batch sizes means fewer changes to sort through when bugs crop up.
  3. Repetitive – provisioning environments, building, testing, configuring the database or firewall, etc.
    1. More frequent runs, smaller batches, feedback loop improved. All the DevOps magic really happens in #2 and #3 above as these are the most repetitive tasks.

Notice of the three types above – the issues could be in one of five places:

  1. Development
    1. Common pain point here is Waterfall planning – i.e. requirements inventory and a bloated, aging inventory)
  2. Building Test Environments
    1. Procurement hassles across server, storage, networking, firewall. Lengthy handoffs between each of these teams and differing priorities.
    2. Horror story – 250 days for one company to attempt to host a “Hello World” app. It took them just 2 hours on AWS!
  3. Testing and Fixing Defects – typically QA led
    1. Issues here with repeatability of results (i.e. false positives caused by the test harness, environment, or deployment process)
    2. Often the greatest pain point, due to reliance on manual tests causing lengthy multi-week test cycles, and the time it takes to fix the defects discovered.
  4. Production Deployment – large, cross org effort led by Ops
  5. Monitoring and Operations

The points above are why you can’t just copy the rituals from one org to another. For any given company, your pain points could be different.

 

So, how do we identify the exact issue with YOUR specific company?

  1. Development (i.e. Requirements)
    1. Metrics:
      1. What % of time is spent in planning and documenting requirements?
      2. How many man-hours of development work are currently in the inventory for all applications?
      3. What % of delivered features are being used by customers and fit the expected results?
    2. An important note here – organizations often commit 100% of dev resources to address work each sprint. This is terrible as a practice and means that the development teams are too busy meeting preset commitments to respond to changes in the marketplace or discoveries during development. The need here is for education – to tell the business to be reasonable in what they expect, and how to shape requirements so they are actual minimum functionality needed to support their business decisions. (Avoid requirements bloat due to overzealous business analysts/PM’s for example!)

  1. Provisioning environments
    1. Metrics:
      1. How much time does it take to provision environments (on avg)
      2. How many environments are requested per month/sprint
      3. % of time these environments require manual fixing before they are complete
      4. % of defects associated with non-code – i.e. environments, deployments, data layer, etc.
    2. The solution here for provisioning pinch points is infrastructure as code. Here there is no shortcut other than developers and IT/operations working together to build a working set of scripts to recreate environments and maintaining them jointly. This helps with triage as changes to environments now show up clearly in source control, and prevents DEV-QA-STG-PROD anomalies as it limits variances between environments.
    3. It’s critical here for Dev and Ops to use the same tool to identify and fix issues. Otherwise strong us vs them backlash and friction.
    4. This requires the organization to have a strong investment in tooling and think about their approach – esp with simulators/emulators for companies doing embedded development.

  1. Testing
    1. Metrics
      1. What is the time it takes to run a full set of tests?
      2. How repeatable are these? (i.e. what’s the % of false errors)
      3. What % of defects are found with testing (either manual, automated, or unit testing)
      4. What is the time it takes to approve a release?
      5. What’s the frequency of releases?
    2. In many organizations this is the most frequent bottleneck – the absurd amount of time it takes to complete a round of tests with a reasonable expectation the release will work as designed. These tests must run in hours, not days.
    3. You must choose a well-designed automation framework.
    4. Development is going to have to change their practices so the code they write is testable. And they’ll need to commit to making build stability a top priority – bugs are equal in priority (if not higher than) tasks/new features.
    5. This is the logical place to start for most organizations. Don’t just write a bunch of automated tests – instead just a few automated Build Acceptance Tests that will provide a base level of stability. Watch these carefully.
      1. If the tests reveal mostly issues with the testing harness, tweak the framework.
      2. If the tests are finding mostly infrastructure anomalies, you’ll need to create a set of post-deployment tests to check on the environments BEFORE you run your gated coding acceptance test. (i.e. fix the issues you have with provisioning, above).
      3. If you’re finding coding issues or anomalies – congrats, you’re in the sweet spot now!
    6. Horror story here – one company boasted of thousands of automated tests. However, these were found to not be stable, maintainable, and had to be junked.
    7. Improve and augment over time these BATs so your trunk quality gradually moves closer to release in terms of near-produciton quality.
      1. Issue – what about that “hot” project needed by the business (which generally arrives with a very low level of quality due to high pressure?
        1. Here the code absolutely should be folded into the release, but not exposed to the customer until it fits the new definition of done: “All the stories are signed off, automated testing in place and passing, and no known open defects.”

  1. Release to Production
    1. If a test cycle takes 6 weeks to run, and management approval takes one day – improving this part just isn’t worth it. But if you’re trying to do multiple test cycles a week and this is the bottleneck, absolutely address this with managers that are lagging in their approval or otherwise not trusting the gated testing you’re doing.
    2. Metrics
      1. Time and effort to release to production
      2. Number of issues found categorized by source (code, environment, deployment process, data, etc)
      3. Number of issues total found in production
      4. MTTR – mean time to restore service
      5. # of green builds a day
      6. Time to recover from a red build
      7. % of features requiring rework before acceptance
      8. Amt of effort to integrate code from the developers into a buildable release
    3. For #1-4 – Two areas that can help here are feature toggling (which you’ll be using anyway), and canary releases where key pieces of new functionality are turned on for a subset of users to “test in production.”
    4. For #5-6 – here Continuous Integration is the healer. This is where you avoid branching by versioning your services (and even the database – see Refactoring Databases book by Scott)
    5. For #7-8 – If you’re facing a lot of static here likely a scrum/agile coach will help significantly.

 

So – how to win, once you’ve identified the pain points? You begin by partitioning the issue:

  • Break off pieces that are tightly coupled versus not developed/tested/deployed as a unit. (i.e. HR or Purchasing processes)
  • Segment these into business critical and non-business critical.
  • Split these into tightly coupled monoliths with common code sharing requirements vs microservices (small, independent teams a la Amazon). The reality is – in most enterprises there’s very valid reasons why these applications were built the way they are, You can’t ignore this complexity, much as we’d like to say “microservices everywhere!”

I really admire Gary’s very pragmatic approach as it doesn’t try to accomplish large, difficult things all at once but it focuses on winnable wars at a company’s true pain points. Instead of trying to force large, tightly coupled organizations to work likely loosely coupled orgs – you need to understand the complex systems and determine together how to release code more frequently without sacrificing quality. Convince these teams of DevOps principles.

DevOpoly!

This is the fourth of a series on DevOps. The first focused on the three ways explored in the Phoenix Project, and I stuck in some thoughts from the Five Dysfunctions of a Team by Lencioni. The second discussed the lessons taught by GM’s failure in adopting Toyota’s Lean processes with their NUMMI plant. The third went through some great lessons I’ve learned from “Visible Ops” by Gene Kim.

“The single largest improvement an IT organization can benefit from is implementing repeatable system builds. This can’t be done without first managing change and having an accurate inventory. When you convert a person-centric and heavily manual process to a quick and repeatable mechanism, the reaction is always positive. Even a partially automated release/build process greatly improves the ability for individuals to be freed from firefighting and focus on their areas of real value. And by making it more efficient to rebuild than repair, you also get much faster systems downtime and significantly reduced downtime.” (Joe Judge, Adero)

 

 

So I am putting together a presentation for PADNUG tomorrow on DevOps. I’ve reworked this presentation like three times, and I’ve never been very happy with it. Let’s just say Steve Jobs would have rolled his eyes at something like this:

Look at that crap above. I mean, there’s information here – but way too MUCH information. There’s no way any audience is going to absorb this. I’ll lose them halfway through the second bullet point.

So, I was struggling with this a few weeks ago, trying to come up with a better idea. And I was watching my kids play Monopoly. And I started to think – since there’s no recipe for DevOps, and you can choose your own course, and some amount of it is up to chance or your individual circumstances – well, isn’t that a game? (And isn’t that a more fun way of learning than using an endless stream of bullet points?)

So, DevOpoly was born!

Let’s take a look at this in blocks shall we?

  • MTTR – Mean Time to Repair. This indicates how robust you are, how quickly you can respond and react to an issue.
  • Stakeholder Signoff – this is after you inventory your applications – instituting any change management policy and change window will require the business to provide signoff.
  • Inventory Apps – listing applications, servers, systems and services in tiers. This is a prereq for getting your problem children identified and frozen, see below.
  • CAB Weekly Meetings – I used to think these were a complete and total waste of time. In fact several books I have claim that they don’t measurably reduce defects and slow down development – bureaucracy at its worst. But, Gene Kim swears by it – and he thinks it’s a base level requirement for change management culture.

  • Versioned Patches – Putting any software patches into source control
  • Security Auditing – having controls that are visible, verifiable, regularly reported
  • Configuration Management – Infrastructure as Code, a key part of implementing repeatable system builds, using software like Puppet, Chef, Octopus etc.
  • Golden Build – The end goal and the building block of a release library, a set of ‘golden builds’ that are verifiable and QA’d. The length of time that these builds stay stable is another metric helpful in determining reliability of your apps.

  • Feed to Trouble Ticket – Creating a system where any changes – authorized or unauthorized – show up in trouble ticket for first responders to access. % Success rate in first response in diagnosis is a key metric for DevOps.
  • Dashboarding – creating visibility around these metrics (see stage 3 of the Phoenix Project post) is the only way you’ll know if you’re making progress – and securing management support.
  • Form RM Team – This is part of the process in moving more staff away from firefighting and early in the release process. Mature, capable orgs have more personnel assigned to protect quality early on versus catching defects late.

 

  • MTBF – Mean Time Between Failures. As configuration management knocks out snowflake servers and fragile artifacts are frozen, this number should go up.
  • Automated Release – creating a release management pipeline of dev bits from DEV-QA-STG-PROD, with as much automated signoff as possible using automated tests, is a great step forward.
  • Gated Builds – See above, but having functional/integration testing and unit tests run on checkin is key to prevent failures.
  • Continuous Integration – bound up with testing and the RM cycle – having any dev changes get checked in and validated and merged safely with other development changes. (And, remember, CI means the barest amount of release branching possible. It’s a tough balance.)

  • Eliminate Access – Actually I don’t know many devs (besides the true cowboys) that really WANT access to production. But, removing access to all but change managers is a key step. And when you’re done with that…
  • Electrify the Fence – Have change policy known and discipline the (inevitable) slow learners. Not fire them. Maybe have a few “disappear” in suspicious accidents, to warn the others!
  • Monitor Changes – Use some software (like Tripwire maybe?) to monitor any and all changes to the servers.
  • Server to Admin Ratio – Typically this is a 15:1 ratio – but for high performing orgs with an excellent level of change management, 100:1 or greater is the norm.

  • Document Policy – Writing out the change management policy is a key to electrifying the fence and preventing the org from slipping back into bad habits.
  • Rebuild Not Repair – With a great release library of golden builds and a minimal amount of unique configs and templates, infrastructure is commonly rebuilt – not patched and limping along.

  • Find Fragile Artifacts – Once you’ve done your systems inventory, you can document the systems that have the lowest uptime, the highest impact to the business when its down, and the most expensive infrastructure.
  • Enforce Change Window – Set a change window for each set of your applications, and freeze any and all changes outside of that window. It must be documented and stakeholders must provide signoff.
  • Soft Freeze Fragile Systems – These fragile artifacts have to be frozen, one by one, until the environments can be safely replicated and maintained. This soft freeze can’t last long until the systems are part of configuration management/IAC.

  • Accountability – #1 of the two failure points in any change. True commitment and accountability from each person involved.
  • Firefighting Tax – Less than 5% of time spent in firefighting is a great metric to aim for. Most organizations are at about 40%.
  • Management Buy-In – DevOps can be started as a grassroots effort, but for it to be successful- it must have solid buy-in from the top. Past a pilot effort, you must secure management approval by publicizing your dashboards and key metrics.

Anyway, this was fun. I have some cards on the way for both the Gene Kim Chest – yes, not Jez Humble, but I’m thinking about it – and Chance. Lots of chance in the whole DevOps world.

(I tried this back in August with Life but it never worked by the way.)