DevOps

DevOpoly!

This is the fourth in a series on DevOps. The first focused on the Three Ways explored in The Phoenix Project, and I stuck in some thoughts from The Five Dysfunctions of a Team by Lencioni. The second discussed the lessons taught by GM’s failure in adopting Toyota’s Lean processes at their NUMMI plant. The third went through some great lessons I’ve learned from “Visible Ops” by Gene Kim.

“The single largest improvement an IT organization can benefit from is implementing repeatable system builds. This can’t be done without first managing change and having an accurate inventory. When you convert a person-centric and heavily manual process to a quick and repeatable mechanism, the reaction is always positive. Even a partially automated release/build process greatly improves the ability for individuals to be freed from firefighting and focus on their areas of real value. And by making it more efficient to rebuild than repair, you also get much faster system recovery and significantly reduced downtime.” (Joe Judge, Adero)


So I am putting together a presentation for PADNUG tomorrow on DevOps. I’ve reworked this presentation like three times, and I’ve never been very happy with it. Let’s just say Steve Jobs would have rolled his eyes at something like this:

Look at that crap above. I mean, there’s information here – but way too MUCH information. There’s no way any audience is going to absorb this. I’ll lose them halfway through the second bullet point.

So, I was struggling with this a few weeks ago, trying to come up with a better idea. And I was watching my kids play Monopoly. And I started to think – since there’s no recipe for DevOps, and you can choose your own course, and some amount of it is up to chance or your individual circumstances – well, isn’t that a game? (And isn’t that a more fun way of learning than using an endless stream of bullet points?)

So, DevOpoly was born!

Let’s take a look at this in blocks, shall we?

  • MTTR – Mean Time to Repair. This indicates how robust you are, how quickly you can respond and react to an issue.
  • Stakeholder Signoff – this is after you inventory your applications – instituting any change management policy and change window will require the business to provide signoff.
  • Inventory Apps – listing applications, servers, systems and services in tiers. This is a prereq for getting your problem children identified and frozen, see below.
  • CAB Weekly Meetings – I used to think these were a complete and total waste of time. In fact, several books I have claim that they don’t measurably reduce defects and that they slow down development – bureaucracy at its worst. But Gene Kim swears by them – he thinks they’re a base-level requirement for a change management culture.
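
The MTTR number above is just the average gap between detecting an issue and resolving it. As a minimal sketch in Python (the incident records and field names here are invented for the example, not from any real ticketing system):

```python
from datetime import datetime, timedelta

def mean_time_to_repair(incidents):
    """Average detection-to-resolution duration across resolved incidents."""
    durations = [i["resolved"] - i["detected"] for i in incidents]
    return sum(durations, timedelta()) / len(durations)

# Two hypothetical incidents: one took 2 hours to repair, one took 1 hour.
incidents = [
    {"detected": datetime(2015, 3, 1, 9, 0),  "resolved": datetime(2015, 3, 1, 11, 0)},
    {"detected": datetime(2015, 3, 5, 14, 0), "resolved": datetime(2015, 3, 5, 15, 0)},
]
print(mean_time_to_repair(incidents))  # 1:30:00
```

Tracking this number over time is what tells you whether the change management work is actually making you more robust.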

  • Versioned Patches – Putting any software patches into source control
  • Security Auditing – having controls that are visible, verifiable, regularly reported
  • Configuration Management – Infrastructure as Code, a key part of implementing repeatable system builds, using software like Puppet, Chef, Octopus etc.
  • Golden Build – The end goal and the building block of a release library, a set of ‘golden builds’ that are verifiable and QA’d. The length of time that these builds stay stable is another metric helpful in determining reliability of your apps.
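
One way to think about a golden build is as a manifest of known-good fingerprints that running systems can be checked against. Here’s a toy sketch of that idea (the paths and contents are hypothetical; real tools like Tripwire or Puppet do far more than this):

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """SHA-256 of a file's contents - one entry in a 'golden build' manifest."""
    return hashlib.sha256(content).hexdigest()

def drift(golden_manifest, current_files):
    """Files whose current contents no longer match the golden build.

    golden_manifest: {path: expected sha256 from the release library}
    current_files:   {path: bytes as read from the running server}
    """
    return sorted(path for path, expected in golden_manifest.items()
                  if fingerprint(current_files.get(path, b"")) != expected)

golden = {"etc/app.conf": fingerprint(b"port=80\n")}
print(drift(golden, {"etc/app.conf": b"port=80\n"}))    # []
print(drift(golden, {"etc/app.conf": b"port=8080\n"}))  # ['etc/app.conf']
```

The longer a server stays at an empty drift list, the more stable (and rebuildable) your builds are.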

  • Feed to Trouble Ticket – Creating a system where any changes – authorized or unauthorized – show up in the trouble ticket for first responders to access. The first-response diagnosis success rate is a key metric for DevOps.
  • Dashboarding – creating visibility around these metrics (see stage 3 of the Phoenix Project post) is the only way you’ll know if you’re making progress – and securing management support.
  • Form RM Team – This is part of the process in moving more staff away from firefighting and early in the release process. Mature, capable orgs have more personnel assigned to protect quality early on versus catching defects late.
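
The Feed to Trouble Ticket idea can be sketched roughly like this: when an incident is opened, attach every change applied in the preceding window so first responders start with the likely culprits. The record shapes below are invented for illustration:

```python
from datetime import datetime, timedelta

def open_ticket(incident, change_log, window=timedelta(hours=24)):
    """Open a trouble ticket pre-loaded with every change (authorized or
    not) applied in the window before the incident was detected."""
    suspects = [c for c in change_log
                if timedelta(0) <= incident["detected"] - c["applied"] <= window]
    suspects.sort(key=lambda c: c["applied"], reverse=True)  # newest first
    return {"incident": incident, "recent_changes": suspects}

changes = [
    {"id": "CHG-101", "applied": datetime(2015, 3, 1, 22, 0), "authorized": True},
    {"id": "CHG-102", "applied": datetime(2015, 3, 2, 2, 0),  "authorized": False},
    {"id": "CHG-099", "applied": datetime(2015, 2, 20, 9, 0), "authorized": True},
]
ticket = open_ticket({"detected": datetime(2015, 3, 2, 8, 0)}, changes)
print([c["id"] for c in ticket["recent_changes"]])  # ['CHG-102', 'CHG-101']
```

Note the unauthorized change surfaces first; most outages trace back to a recent change, so this is exactly what the first responder needs in front of them.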


  • MTBF – Mean Time Between Failures. As configuration management knocks out snowflake servers and fragile artifacts are frozen, this number should go up.
  • Automated Release – creating a release management pipeline of dev bits from DEV-QA-STG-PROD, with as much automated signoff as possible using automated tests, is a great step forward.
  • Gated Builds – See above, but having functional/integration testing and unit tests run on checkin is key to prevent failures.
  • Continuous Integration – bound up with testing and the RM cycle – having any dev changes get checked in and validated and merged safely with other development changes. (And, remember, CI means the barest amount of release branching possible. It’s a tough balance.)
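
The gated DEV/QA/STG/PROD promotion above might look something like this sketch, where a build advances only while every gate at the current stage passes. The stage names and gate checks are illustrative, not any particular RM tool’s API:

```python
STAGES = ["DEV", "QA", "STG", "PROD"]

def promote(build, gates):
    """Walk a build through the pipeline, stopping at the first failed gate."""
    reached = []
    for stage in STAGES:
        if not all(check(build) for check in gates.get(stage, [])):
            break  # a gate failed: the build goes no further
        reached.append(stage)
    return reached

# Hypothetical gates: unit tests to enter DEV, integration tests for QA,
# stakeholder signoff for STG; PROD has no extra gate in this sketch.
gates = {
    "DEV": [lambda b: b["unit_tests_pass"]],
    "QA":  [lambda b: b["integration_tests_pass"]],
    "STG": [lambda b: b["stakeholder_signoff"]],
}
good = {"unit_tests_pass": True, "integration_tests_pass": True, "stakeholder_signoff": True}
bad  = {"unit_tests_pass": True, "integration_tests_pass": False, "stakeholder_signoff": True}
print(promote(good, gates))  # ['DEV', 'QA', 'STG', 'PROD']
print(promote(bad, gates))   # ['DEV']
```

The point is that a defect stops the line at the earliest stage that can catch it, instead of being passed downstream.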

  • Eliminate Access – Actually I don’t know many devs (besides the true cowboys) that really WANT access to production. But, removing access to all but change managers is a key step. And when you’re done with that…
  • Electrify the Fence – Make the change policy known and discipline the (inevitable) slow learners. Not fire them. Maybe have a few “disappear” in suspicious accidents, to warn the others!
  • Monitor Changes – Use some software (like Tripwire maybe?) to monitor any and all changes to the servers.
  • Server to Admin Ratio – Typically this is a 15:1 ratio – but for high performing orgs with an excellent level of change management, 100:1 or greater is the norm.
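
Monitoring changes and electrifying the fence boil down to comparing what actually happened on the servers against what was authorized, and when. A hedged sketch (the change window and record fields are assumptions for the example):

```python
from datetime import datetime, time

CHANGE_WINDOW = (time(1, 0), time(5, 0))  # assumed 1-5 AM maintenance window

def violations(detected, authorized_ids, window=CHANGE_WINDOW):
    """Everything the electrified fence should zap: changes nobody
    authorized, plus authorized changes applied outside the window."""
    start, end = window
    bad = []
    for c in detected:
        if c["id"] not in authorized_ids:
            bad.append((c["id"], "unauthorized"))
        elif not (start <= c["applied"].time() <= end):
            bad.append((c["id"], "outside change window"))
    return bad

detected = [
    {"id": "CHG-201",    "applied": datetime(2015, 3, 2, 2, 30)},  # fine
    {"id": "CHG-202",    "applied": datetime(2015, 3, 2, 14, 0)},  # midday!
    {"id": "hotfix-jim", "applied": datetime(2015, 3, 2, 2, 0)},   # who approved this?
]
print(violations(detected, {"CHG-201", "CHG-202"}))
# [('CHG-202', 'outside change window'), ('hotfix-jim', 'unauthorized')]
```

Run something like this against the feed from your monitoring tooling and, as noted below, you’ll be shocked at what you find.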

  • Document Policy – Writing out the change management policy is a key to electrifying the fence and preventing the org from slipping back into bad habits.
  • Rebuild Not Repair – With a great release library of golden builds and a minimal amount of unique configs and templates, infrastructure is commonly rebuilt – not patched and limping along.

  • Find Fragile Artifacts – Once you’ve done your systems inventory, you can document the systems that have the lowest uptime, the highest impact to the business when they’re down, and the most expensive infrastructure.
  • Enforce Change Window – Set a change window for each set of your applications, and freeze any and all changes outside of that window. It must be documented and stakeholders must provide signoff.
  • Soft Freeze Fragile Systems – These fragile artifacts have to be frozen, one by one, until the environments can be safely replicated and maintained. The soft freeze shouldn’t last long – only until the systems are brought under configuration management/IaC.

  • Accountability – the first of the two big failure points in any process change: true commitment and accountability from each person involved.
  • Firefighting Tax – Less than 5% of time spent in firefighting is a great metric to aim for. Most organizations are at about 40%.
  • Management Buy-In – DevOps can be started as a grassroots effort, but for it to be successful, it must have solid buy-in from the top. Past a pilot effort, you must secure management approval by publicizing your dashboards and key metrics.

Anyway, this was fun. I have some cards on the way for both the Gene Kim Chest – yes, not Jez Humble, but I’m thinking about it – and Chance. Lots of chance in the whole DevOps world.

(I tried this back in August with Life but it never worked by the way.)


“All Happy Families Are Alike” – Visible Ops by Gene Kim review

This is the third of a series of three posts I’ve done on DevOps recently. The first focused on the three ways explored in the Phoenix Project, and I stuck in some thoughts from the Five Dysfunctions of a Team by Lencioni. The second discussed the lessons taught by GM’s failure in adopting Toyota’s Lean processes with their NUMMI plant. This one will go through some great lessons I’ve learned from a terrific – and very short and readable – little book entitled “Visible Ops” by Gene Kim. Please, order this book (just $17 on Amazon!) and give it some thought.

“The single largest improvement an IT organization can benefit from is implementing repeatable system builds. This can’t be done without first managing change and having an accurate inventory. When you convert a person-centric and heavily manual process to a quick and repeatable mechanism, the reaction is always positive. Even a partially automated release/build process greatly improves the ability for individuals to be freed from firefighting and focus on their areas of real value. And by making it more efficient to rebuild than repair, you also get much faster system recovery and significantly reduced downtime.” (Joe Judge, Adero)

I was always struck by the phrase from Tolstoy – “All happy families are alike; every unhappy family is unhappy in its own way.” Turns out that’s true of DevOps as well. Successful companies share some very common threads in terms of IT:

  • High service levels and availability
    • Mean Time To Repair (MTTR)
    • Mean Time Between Failures (MTBF)
  • High throughput of effective change
    • Change success rate >99% (for example, Amazon with 1500+ changes a week)
  • Tight collaboration between dev, Ops/IT, QA team, and security auditors
    • Controls are visible, verifiable, regularly reported
  • Low amount of unplanned work
    • <5% of time spent firefighting – typical is 40%
  • Systems highly automated and hands-free
    • Server to System Admins ratio 100:1 or greater (typical 15:1)


So what are the common factors among the “happy families” that have this highly efficient, repeatable RM culture?

  • A change management culture
    • Management by fact versus belief
    • All changes go through a formal change management process
      • “The only acceptable number of unvetted change is zero.”
      • “Change management is important to us, because we are always one change away from being a low performer.”
      • “Perceptions of nimbleness and speed are a delusion if you are tied down in firefighting.”
      • “The biggest failure in any process engineering effort is accountability and true management commitment to the process.”
  • No voodoo – causality over gut feel
    • Trouble ticket systems – inside each ticket are all scheduled changes and all detected changes associated with the system.
      • This leads to 90% first fix rate and 80% success rate in initial diagnosis
  • Human Factors Come First in Continual Improvement
    • Strong desire to find production variance early
    • Controls to find variance, preventative and detective.

Every unhappy family, though, is unhappy in its own way. You’ll hear sayings like the following in these “DevOps won’t work for us, we’re unique and special” type organizations:

  • “80% of our outages are due to changes – and 80% of the time we take in implementing a repair is trying to find that change” – Gartner
  • Data and continual improvement takes a back seat to intuition, gut feel, highly skilled IT Ops staff
  • SLA not met
  • “Most of our work is caused by self-inflicted problems and uncontrolled changes. Each sprint I start with a blank slate, and each sprint ends with 50% of my development firepower getting sucked away into firefighting.”
  • Infrastructure is repaired, not rebuilt – “priceless works of art”
  • System failures happening at worst possible time, IT’s rep is damaged
  • Changes have a long fuse
  • One change can undo a series of earlier changes

So how does an unhappy family move towards becoming more functional? Gene Kim has broken it down into four logical steps.

  • Phase 1 – Stabilize the Patient
    • Freeze changes outside maintenance window
    • First responders have all change related data at hand
  • Phase 2 – Find the Problem Child
    • Inventory your systems and identify systems with low change success, high repair time, high downtime business impact
  • Phase 3 – Grow your Repeatable Build Library
  • Phase 4 – Enable continuous Improvement

In a little more detail:

  • Phase 1 – Stabilize The Patient
    • The goal at the start of this phase is to allow the highest possible change throughput with the least bureaucracy possible. No rubber stamping; the change request tracking system feeds info to first responders; ensure a solid backup plan.
    • Inventory applications and identify stakeholders and systems
    • Document new change management policy and change window with stakeholders
    • Institute weekly change management meetings
    • Eliminate access to all but authorized change managers
    • Electrify the fence with instrumentation, monitoring
      • you’ll be shocked at what you find!
      • this prevents org from falling back into bad old habits, like a rock climber with a ratchet and rope
    • Failure Points
      • We won’t be able to get anything done!
      • The business pays us to make changes. Not to sit in boring CM meetings.
      • We trust our own people – they’re professionals and don’t need micromanaging.
      • We already tried that – it didn’t work
      • We believe there are no unauthorized changes.
  • Phase 2 – Find The Problem Children
    • Analyze assets, find fragile artifacts (use list from Phase 1)
    • Must be fast. Can’t freeze changes forever.
    • Soft freeze, where truly urgent changes during this period go through CAB.
    • Failure Points
      • Pockets of knowledge and proficiency
      • Servers are snowflakes – irreplaceable artifacts of mission critical infrastructure
  • Phase 3 – Grow Your Repeatable Build Library
    • Create an RM team. (Shifts staff to pre-prod activities)
    • Take fragile artifacts in priority – create golden builds stored in software library
    • Separation of roles – devs have no access to production
    • Amount of unplanned changes (and related work) further drops
    • # of unique configurations in deployment drops, increasing server/admin ratio
    • Mitigates the “patch and pray” dilemma – updates are integrated into the RM process so patches can be tested and safely rolled out
  • Phase 4 – Enable Continuous Improvement
    • This has to do with gathering metrics and measuring improvement along three lines – release, controls, and resolution.

  • Release – how efficiently and effectively can we generate and provision infrastructure?
    • Time to provision known good builds
    • Number of turns to a known good build
    • Shelf life of a build
    • % of systems that match known good builds
    • % of builds with security signoff
    • # of fast-tracked builds
    • Ratio of Release Engineers to System Admins
  • Controls – how effectively do we make good change decisions that keep infrastructure available, predictable and secure?
    • # of changes authorized per week
    • # of actual changes made per week
    • # of unauthorized changes
    • Change success rate
    • Changes submitted vs changes reviewed
    • Number of service-affecting outages
    • Number of emergency changes or “special” changes
    • Change management overhead (measure bureaucracy, lower is better!)
  • Resolution – when things go wrong, how effectively do we diagnose and resolve issues?
    • MTTR – Mean Time To Repair
    • MTBF – Mean Time Between Failures
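
As a sketch of how the Controls numbers above might be rolled up from raw change records (the field names are invented for the example; a real CMDB would feed this):

```python
def weekly_controls(changes):
    """Roll up the 'Controls' metrics for one week of change records."""
    made = len(changes)
    authorized = sum(1 for c in changes if c["authorized"])
    succeeded = sum(1 for c in changes if c["succeeded"])
    return {
        "changes_made": made,
        "authorized": authorized,
        "unauthorized": made - authorized,
        "success_rate_pct": round(100.0 * succeeded / made, 1),
    }

# A hypothetical week: three authorized changes (one failed) and one
# unauthorized change that happened to work.
week = [
    {"authorized": True,  "succeeded": True},
    {"authorized": True,  "succeeded": True},
    {"authorized": True,  "succeeded": False},
    {"authorized": False, "succeeded": True},
]
print(weekly_controls(week))
# {'changes_made': 4, 'authorized': 3, 'unauthorized': 1, 'success_rate_pct': 75.0}
```

Numbers like these, trended week over week on a dashboard, are what Phase 4 is really about.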

The Five Dysfunctions of DevOps

I remember laughing at the American car companies in the 80’s that – panicked by the unmatched quality coming out of Toyota – sent spies and emissaries out to Japan to emulate what was being done in the factories. They were given complete access, took it back to America – and it fell flat on its face. The Japanese product managers implementing Lean in the manufacturing floors snickered that they were copying the image of Buddha without the spirit. How could they ever implement something they didn’t understand? In part, those American car company manufacturers missed the essence of kata, or continuous improvement through repetition. By neglecting culture, any tool or process they tried just ended up in the same dead end.

I just finished reading The Phoenix Project by Gene Kim et al. – oddly enough, on a trip out to Phoenix – and found myself wanting to smack myself in the forehead. There are so MANY things about DevOps that I did not understand, even a year ago. It would have made the last five years of my life immeasurably smoother if I had understood the principles and thoughts behind what I was trying to do. (Insert cargo cult joke here.)

We don’t have a DevOps Manifesto yet – and one is badly needed. In the meantime, we have two books that sum things up. If you haven’t read the Old Testament and New Testament of DevOps – that’s The Phoenix Project and Continuous Delivery by Jez Humble – you are missing out. The Phoenix Project is ¾ management-speak – a hero leader who steps in and methodically saves a failing company; you know the old story of the guy on a horse pointing majestically at the sunset? But here’s the thing – if you want to convince CxO-type people of the importance of DevOps, you need to read this book. It speaks the language of management, so it will help you tell a story your management and the CIO would want to hear. And buried in its pages is some real depth.

Creating Flow – the Three Ways

I used to show a picture of Hillsboro, Oregon on the 26 during rush hour. This is, no one will argue, a fully utilized freeway. But, is it efficient?

The key to the Toyota Lean principles – and Kanban and Agile and everything else – is creating that flow. That means a buffer in every day, in every week, where we have space to think about how work is done – not just the what.

The Phoenix Project lays out each of the Three Ways as a “what” (the principle) and a “how” (the practices that support it):

The First Way: Maximizing flow with small batch sizes and intervals of work, never passing defects downstream, global goals.

  • Continuous build, integration and deployment
  • Creating environments on demand
  • Limiting Work in Process
  • Building systems that are safe to change

The Second Way: Establishing quality at the source. Constant feedback from right to left, ensuring we prevent problems from happening again and enabling faster detection/recovery.

  • Stopping the line when builds/tests fail
  • Fast automated test suites
  • Shared goals/pain between Dev and IT
  • Pervasive production telemetry showing if customer goals are met

The Third Way: Creating a culture that fosters experimentation and risk.

  • High-trust culture versus command and control
  • Allocating >20% of Dev/Ops cycles towards nonfunctional requirements
  • Constant reinforcement of DevOps CoE and improvement kata


This is really illuminating. For example, think of the “stopping the line” item above for the Second Way. How many times did I – in previous assignments – take bugs from the previous release and kick them to last in order, behind the fun stuff I really wanted to work on? Even in smaller teams of three – where I thought “we’ll never step on each other’s toes” – how many integration issues did we have right before important demos? And by neglecting automated testing, how many defects did I end up passing downstream – creating systems that were inherently difficult to change?

Dysfunction and DevOps – The Importance of culture

This has been mentioned before in my posts – but notice (courtesy of Puppet from a study done by Westrum in 2004) the illuminating chart of the three types of organizations:

Now notice the Five Dysfunctions of a Team by Patrick Lencioni:

  1. Absence of trust (unwilling to be vulnerable within the group)
  2. Fear of conflict (seeking artificial harmony over constructive passionate debate)
  3. Lack of commitment (feigning buy-in for group decisions, creating ambiguity)
  4. Avoidance of accountability (ducking the responsibility to call peers on counterproductive behavior, which sets low standards)
  5. Inattention to results (focusing on personal success, status and ego before the team)


Notice anything interesting? Let’s match up the sick organization on the far left – the power-based Pathological one – with that list of five dysfunctions:

So now we get a glimmer of light on why organizations with high-performing IT departments tend to be high-performing organizations – and why the reverse is also true: a sick IT shop, one enslaved to the business or at the mercy of a cowboy group of developers, is a good indicator of underperformance. Companies that embrace DevOps as a culture tend to be high-trust and risk-friendly. They’re not afraid of differing opinions or radical ideas like Netflix’s Evil Chaos Monkey. People tend to waste less energy taking potshots at other teams/departments – and pay more attention to the common shared goal.

As the Phoenix Project brings out, the relationship between a CEO and a CIO is like a dysfunctional marriage – both sides feel powerless and held hostage by the other. This is true of Dev and Ops as well – and I’ve been in that sick marriage more than once. The essence of the book is forming a strong bond where the union becomes much closer by sharing goals and work based on company needs.

Other Thoughts from The Book

Common Agile Myths

  1. DevOps is just automation or infrastructure as code (no, that’s the tool – it’s part of it, but not the whole)
  2. DevOps replaces Agile (DevOps is meant to complete Agile – where bits aren’t just going into QA, but out the door to production)
  3. DevOps replaces ITIL/ITSM (It embodies ITIL concepts)
  4. DevOps means NoOps (DevOps means a truly empowered, nonsiloed Ops)
  5. DevOps is only for startups
  6. DevOps is only for open source software


On #5 and #6 above, this comes down to “We can’t do DevOps, because we’re special/unique/a snowflake.” But think of some of the companies today that are leading in the DevOps world and where they were just a few years ago:

  • Amazon until 2001 ran on OBIDOS content delivery service, dangerous and problematic to maintain
  • Twitter struggled to scale frontend monolithic Ruby on Rails system – took multiple years to rewrite
  • LinkedIn in 2011 six months after IPO had to freeze features for massive overhaul
  • Etsy in 2009 was “living in a sea of their own engineering filth”
  • Facebook in 2009 at breaking point, staff continually firefighting, releases painful and dangerous


I must read The Goal by Eli Goldratt. I love the thought – there is always a constraint or bottleneck in any organization (men, material, machines) that dictates the output of the entire system. Until you create a system that manages the flow of work to the constraint, the constraint is constantly wasted – and likely drastically underutilized. Technical debt skyrockets, and you can’t deliver to the business at full capacity. A following step is to exploit the constraint – where it’s not allowed to waste any time, ever. It should never be waiting on anything, and it should always be working on the highest priority commitment IT made to the rest of the enterprise.

DevOps and the game of Life.

Remember the old game of Life?

Pretty discouraging by the way. You work, you work, go to college / don’t go to college, pick up fellow travelers/family members – and at the end they add up your score. You either end up in a nice big house or a smaller one, and – what? Is that “winning”? Valuing things just by the money you’ve earned along the way, or the house you get with creaky knees and an enlarged prostate – well, that seems pretty empty to me.

The last time I gave a presentation on DevOps, I remember thinking how short I came up. I was talking about how certain cultures are very resistant to change. Most of the audience were dyed-in-the-wool developers, and had no problems jumping on the DevOps bandwagon. But they were frustrated at the lack of power they had to change the culture they were in. I remember making some noises about “keeping on trying” and the like.

I can say that I have seen even very resistant cultures change over time. And there have been some great articles on building up a community of practice on DevOps from the ground up. So, freethinking a little, I went through that blog post on guerrilla-type subversive DevOps efforts, combined it with the excellent writeup on some anti-patterns, and tried to make a game of it.

I was only mildly successful. See below – that’s as far as I’m getting for now. It’s a pretty lame game. Needs some work. But, there it is…

Between this and the articles I’m looking through on our new Release Management capabilities – it’s busy times here! Hope you are doing well as well.


DevOps – Cats and Dogs Living Together!

“DevOps is a worthwhile investment to make. I think sometimes people hesitate, because it can mean some time away from your regular duties to invest in new skills, to start doing test-driven development, to move your infrastructure over to a configuration management system like Puppet. But what I’ve found is, it’s really a worthwhile investment that not only improves the quality of what’s being built, but the job satisfaction of everyone who’s doing it.” (Bess Sadler, Digital Library Systems at Stanford University)

Until a few months ago, I wasn’t sure what DevOps was in reality, and I was a little suspicious of the term. Wasn’t ALM enough? I was kicking my bits out to QA every day using continuous integration – what did it matter if it took a few weeks more to get out the door to production? That was the Ops people’s responsibility, not mine!

Without knowing it, I was stumbling up against a major shortcoming of ALM. Ken Schwaber said the purpose of Agile was to produce potentially releasable software as quickly as possible. And we took that and ran with it. I remember one project where we thought we made AMAZING progress when we kicked out a very robust and interactive survey site for our customers in only two sprints – getting it out the door to QA in record time. But what did that really mean for our customers, if it took Ops 10 weeks to build out the environments to support it on production? As devs, we failed to treat Operations and infrastructure as equal partners in the process – and we missed an opportunity that cost the business valuable time to market.


“To Heck With Them, I’ll Do It Myself” – the Rise of Shadow-IT

As a result of this friction, something called “Shadow-IT” has arisen over the past few years. If provisioning software can take weeks or months, any thought of Agile development becomes a pipe dream. Meanwhile, it’s become increasingly easy for developers to spin up VMs and entire QA/PROD environments in the cloud. When a developer is faced with an apparently stubborn and slow-moving Ops organization, it’s understandable that some have decided to take on the job of building out production environments themselves, doing an end-run around Operations.

There’s some psychology at work here. Fundamentally developers have a core set of values and a background that’s very different from Operations/IT. Developers are focused on agility – producing features as quickly as possible and getting them out the door. Devs are evaluated by their ability to produce, and produce quickly. Operations teams typically want to make a plan and execute to a plan – they are evaluated by their track record with stability and security. Stability and agility are on opposite sides of the spectrum, my friends.

As a developer I was hurting myself by not thinking more about the end goal. And put yourself in the shoes of that poor Ops guy who’s handed an extensive manual setup script six days before launch. If that first war-room deployment is a train wreck with lots of on-the-fly adjustments and long hours, how eager will he be to work with me in the future? This knocks CI right out of the picture, since he’ll think, “If I deploy more things faster to prod, won’t I be shooting myself in the foot?”

It comes down to our definition of done. We aren’t done when it builds on our machine and we huck it over the fence. As software craftsmen, our job is to deliver value – and value is software running in production, period. So, for DevOps to work, it’s more than just RM or tooling. We as developers need to be more interested in the end product, and IT needs to be more interested in the beginning.

I believe the above attitude – which I have fallen into several times in my career – is very shortsighted. Operations and tracking down performance bugs or environmental anomalies is a specialized set of tasks – and one that I’m ill-suited for. I grit my teeth and walk through Perfmon logs when I have to – but it’s not a core skillset. Shadow-IT is NOT the answer in improving our time to market long term – and as we called out above, that return trip of information and feedback on usage is suddenly missing, crippling our development efforts.

DevOps Comes on the Scene

Back about six years ago, in 2009, a group of people noted that the shortened release cycle was causing a lot of friction – it exposed a gap between the people writing the software and those responsible for deploying it to production:

The old way of doing things with monthly and quarterly releases worked OK, but with daily releases under CI, the Operations side of things was falling behind. Some key points exposed by the above graphic:

  1. We needed tooling to automate releases as much as possible – including tracing and approvals
  2. The old way of “it works on my machine” – or being surprised when sites weren’t available, being told reactively by SCOM or, worse, by external customers – was a no-go.
  3. We were having problems closing the feedback loop – that’s the top part of the cycle. Especially with disconnected business owners, it was VITAL to receive as many specific metrics as possible – from Ops, from usage patterns on the app itself – so we’d know what features to prioritize for the next sprint.

To fill the gaps, the concept that came to be known as DevOps came into focus:

Let’s break this down by People, Process and Tools:

  • People
    • Stronger collaboration not just with Ops teams but also QA, BA’s and end user community
    • Bugs are knocked down immediately not left to mounting technical debt
    • Stronger focus on continuous learning – versus unattainable “zero defect” or unproductive manual Change Board meetings
  • Process
    • Unit testing and functional/integration testing becomes a part of each release cycle.
    • Incident management feeds bugs back to dev team
    • Release branching simplified and feature branching verboten
  • Tools
    • Release management to handle moving bits out the door
    • Tools to handle provisioning of environments – infrastructure as code

As you can see, the assumption many people have when they hear the word “DevOps” – “Oh, that means RM” – isn’t anywhere close to complete. It’s like saying that ALM is having your code in source control. And the goal we have in mind isn’t some kind of unrealistic paradise where we all sit around the campfire telling stories and roasting marshmallows. As I heard recently from an Agile architect, “DevOps does not mean Operations and Devs hugging each other.” What he meant was, the point is getting it out into production as quickly as possible – and getting as many metrics back as possible. It’s more about efficiency than it is about making friends.

So there’s a lot to that formal definition of DevOps – “The collaboration of IT Operations and developers in deploying new software to benefit the business.”


The Importance of Culture

So if you want to “be DevOps”, don’t create a separate team and don’t search for some magic process or methodology. …your operations team to start speaking up and working with your development team about what could be done to reduce the operational complexity of the software. Figure out how to make your software easier to configure, easier to deploy, and easier to operate in general. Likewise, your development teams need to seriously listen to the operations group. They need to treat the operations team’s concerns just like they would treat any end-user feature request or bug request. After all, your operations team is simply another type of user of the development team’s software. (from http://www.devopsonwindows.com/it-takes-dev-and-ops-to-make-devops/ )

Changing culture is hard. For many developers, their individual influence won’t be enough to make headway in this change. (Although, do check out the Phoenix Project and the great Damien Edwards video “You Can’t Change Culture, but You Can Change Behavior and Behavior Becomes Culture” in my links below.) Based on personal experience, I believe DevOps does not work as a purely grass-roots, bottom-up effort. You will need a CxO-level advocate or an architect as a champion to move ahead.

Start out with a realistic view of what’s possible/achievable given the organizational makeup. Flat organizations do have a better track record of success. And keep in mind the important elements of culture your organization will have to demonstrate:

  • Collaboration
  • No-blame postmortems
  • A high-trust culture
  • Experimentation and risk-taking
  • A strong focus on continuous improvement
  • Good information flow and bridging between teams

I talked yesterday evening with someone whose architect realized their cloud-based infrastructure was costing them thousands of dollars monthly. He told the development team that they were going to take DevOps seriously – and that their environments would be completely disposable. Every night they would tear their machines down to a bare minimum – and every morning they’d rebuild them again. It worked brilliantly. That’s the kind of game-changer you need on your side.
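The “disposable environments” idea above can be sketched as a toy model. Everything here – the spec, the `Cloud` class – is hypothetical and stands in for a real provisioning API; the point it illustrates is that once environments are rebuilt nightly from a declarative spec, rebuilding is always cheaper than repairing.

```python
# Toy model of disposable environments: everything is rebuilt from a
# declarative spec, so "repair" is just "tear down and rebuild".
# ENV_SPEC and Cloud are hypothetical illustrations, not a real API.

ENV_SPEC = {
    "web": {"size": "small", "count": 2},
    "db": {"size": "large", "count": 1},
}

class Cloud:
    """Stand-in for a real IaaS API (your provider's SDK, Chef, Puppet...)."""
    def __init__(self):
        self.machines = {}

    def tear_down(self):
        # Nightly: delete every machine; nothing is precious.
        self.machines.clear()

    def rebuild(self, spec):
        # Morning: recreate everything from the spec.
        for role, cfg in spec.items():
            for i in range(cfg["count"]):
                self.machines[f"{role}-{i}"] = cfg["size"]

cloud = Cloud()
cloud.rebuild(ENV_SPEC)
cloud.tear_down()          # end of day: zero running machines, zero cost
cloud.rebuild(ENV_SPEC)    # next morning: an identical environment
print(sorted(cloud.machines))  # ['db-0', 'web-0', 'web-1']
```

Because the spec – not any hand-configured machine – is the source of truth, a drifted or broken environment is never debugged; it is simply replaced.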

A study by Westrum in 2004 divided organizations into three general categories, based on their orientation and base characteristics. Is your company Pathological, Bureaucratic, or Generative?

(courtesy Puppet Labs)

If your organization is Pathological, the best advice I can give is to focus on what you can do, and keep your own house in order. Have CI and RM in place – but know in advance that your odds of instituting meaningful DevOps are fairly low. These types of organizations do not lend themselves well to the cross-team collaboration that will be vital to your success. Bureaucratic organizations are relatively friendlier, but there will still be a significant amount of inertia for you to overcome; forming a DevOps Center of Excellence or bringing in outside consultants to help sway decision-makers may be a difference-maker here. If you’re in a Generative organization, congratulations – your pilot efforts will likely be the first step on a successful road to DevOps that will make your work life as a developer much, much happier. Automating tedious tasks and not drowning in firefighting or unplanned work means a happier YOU.

History reveals some seemingly counterintuitive facts about DevOps adoption rates. Curiously enough, larger organizations, and Windows-based shops versus open-source ones, have a higher rate of success in adopting DevOps; looser open-source or smaller teams may not have the consistent level of discipline it takes to make DevOps stick. (Maybe too much freedom is a bad thing?) And some of the best success stories come from the worst starting points. Pain is a powerful catalyst for change. If things are going well, or even just OK, there’ll likely be little impetus for something as far-reaching as DevOps.

In discussing with management the key points of DevOps, stress the following benefits:

  • Our ability to ship features quickly and speed of deployment will rise dramatically.
  • We’ll be able to react more quickly to your feedback as stakeholders and incorporate it into our feature prioritization.
  • We’ll be able to recover from failed deployments quickly and have smooth rollbacks.
  • Happy cows make better cheese. DevOps practices increase employee satisfaction, and a healthier, happier team makes for a stronger IT org – which in turn means a more profitable company.

If that doesn’t work, mention metrics. They may not care specifically about RM – but they will care if it drops your defect rate by 50% and increases their profits. And you can always mention the scare story of Knight Capital, which – by not instituting release management and DevOps practices earlier – lost roughly $440 million in about 45 minutes in August 2012. Studies show that high-performing IT orgs are twice as likely to exceed profitability, market share, and productivity goals. There’s a strong correlation between continuous delivery practices, robust IT organizations, and productivity.

There are other ways to change culture. I’ve heard of companies seating engineers and ops next to each other, hiring consultants or mediators to streamline implementation, and having devs share pager duty.

A healthy team runs on trust: good feedback loops, cross-functional collaboration, shared responsibilities, learning from failure. You’ll have a happier, easier, less stressful life as a developer. What’s not to love?

 

From Zero to DevOps in 180 Days

Here’s a sample recipe you could put together in trying to implement DevOps in your company. We’ll be realistic and pragmatic here, and assume that the first step must be taken by us, the developers – and after a trial period we’ll have Operations teams chiming in for a second phase:

First Phase: Getting Your House in Order (Month 1)

  • Devs take first step
  • Basic release management
    • Automatic provisioning of clients / config management
    • Continuous Integration
    • Beginnings of automated testing, version control
    • Elimination of feature branches
  • Peer reviews begin (NOT external change approval)

 

Second Phase (Month 3)

  • No-blame postmortems
  • Published dashboarding of automated test coverage
  • Regression bugs identified by tests fixed immediately
  • Toolset experimentation begins
  • DevOps CoE publishes metrics and lessons learned / goals
  • Visibility is key

 

Phase 3: Ops-centric (Month 6)

  • Proactive monitoring, analytics, incident resolution
  • DevOps awareness / training seminars
  • Beginning of automating pain points
  • Next up – Build-Measure-Learn or Hypothesis-Driven Development
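As a tiny illustration of the proactive-monitoring bullet above, here’s a sketch of a health-check poller. The URL and timeout are placeholders – real monitoring belongs in a proper tool like Application Insights – but the shape is the same: poll an endpoint, and treat anything other than a prompt 200 as a failure.

```python
import urllib.request

def check_health(url, timeout=5):
    """Return True only if the endpoint answers HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        # DNS failures, timeouts, and non-2xx responses all count as "down".
        return False

# An unreachable host is simply reported as unhealthy (.invalid never resolves):
print(check_health("http://nonexistent.invalid/health"))  # False
```

Run something like this on a schedule, record the results, and you have the raw data your incident-resolution and MTTD metrics come from.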

 

Wrapping Things Up

We haven’t talked about tooling yet. I’m going to walk through Chef and Puppet Labs integration with Visual Studio in a future blog post, and do the same for deploying bits using the new Release Manager application integrated with VS. I have another post coming on Application Insights as well, which will help you expose your test coverage and website performance to your business partners. All of these things are important – but I believe the people and process side of things is MUCH harder and more fundamental to DevOps than any particular tool you end up selecting.

DevOps means replacing what WAS a vicious, nasty cycle of recriminations and backbiting with a virtuous cycle. Devs work closely with IT early on, and stability improves. As stability improves, IT performance improves, and you start getting meaningful feedback and metrics to prime your backlog. The necessary first step is treating QA and Operations as equal partners, early in the process, and getting them involved in the design. Your Chef environment deployments must be built out along with your code bits, early on – give your Ops partners the time they need to scale out so they can do their job.

So here’s some closing thoughts on how to make your journey to DevOps as smooth as possible:

Don’t form a single DevOps team. That doesn’t mean you shouldn’t begin with a cross-functional pilot team as a proof of concept – that’s actually a good idea – but forming a single standing DevOps team misses the point. From Jez Humble’s blog: “The DevOps movement addresses the dysfunction that results from organizations composed of functional silos. Thus, creating another functional silo that sits between dev and ops is clearly a poor way to try and solve those problems.” (i.e., use a cross-functional team)

Go slow to go fast. Recognize IT as an investment, and secure management support as a precursor. Start small and grow – gather data, and iterate. Gauge user response, collect metrics on things like MTTR/MTTD, and rinse and repeat.
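For the metrics piece, MTTR is simple enough to compute yourself from an incident log long before you have fancy dashboards. A minimal sketch – the incident timestamps here are made up for illustration:

```python
from datetime import datetime

# Hypothetical incident log: (detected, resolved) timestamp pairs.
incidents = [
    (datetime(2015, 3, 1, 9, 0), datetime(2015, 3, 1, 9, 30)),   # 30 min
    (datetime(2015, 3, 5, 14, 0), datetime(2015, 3, 5, 15, 30)),  # 90 min
]

def mttr_minutes(incidents):
    """Mean Time To Repair: average minutes from detection to resolution."""
    total_seconds = sum((resolved - detected).total_seconds()
                        for detected, resolved in incidents)
    return total_seconds / len(incidents) / 60

print(mttr_minutes(incidents))  # 60.0
```

Track this number release over release – the trend line, not the absolute value, is what tells you whether your iterating is actually paying off.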

Encourage experimentation. Propose a change and promise to reevaluate in six months or some other time limit. Make postmortems blameless. Practice root cause analysis (think of the “Five Whys”), and keep a detailed log of events without fear of punishment. Don’t level personal criticism at anyone, and don’t take feedback personally. Managers, it’s up to you to create a culture where it’s safe to fail.

Choose your tools wisely. There’s a variety of RM and infrastructure tooling out there, and each have pros and cons. Whatever you do, make sure the entire group has input on the tool of choice so you have their buy-in. You’ll want an integrated toolset based on loosely coupled platforms – one that can automate all those formerly painful manual steps, and where you can provide visibility and transparency through application monitoring.

Take that Ops handshake seriously. We need to think as devs about how Operations will monitor and support the software we produce. And pay attention to the little things – keep your promises, foster open communication. I have seen in multiple war rooms good and bad examples of how to act when things go wrong. If you behave predictably and calmly when the wheels come off, your Ops team will begin to take your words of partnership seriously.

Don’t go it alone. Incubate your DevOps movement with a Center of Excellence. Fold in your business analysts and business owner, and track metrics. Hold seminars or DevOps awareness brown bags. Some of the resources in the links below are very helpful – again, I’m referring to the Phoenix Project book in particular.

If you’re having trouble being taken seriously, think about bringing in some help. I worked at one organization for several years as a development lead and became very frustrated at our slow rate of adoption of ALM and DevOps. Once we brought in some Microsoft resources, they were able to hold workshops and chart out a clear path for us that was reasonable in scope and fit our business and cultural background. Within 18 months, you wouldn’t have recognized the changes in the development teams – we had an integrated set of tools and processes that the developers loved, and got our releases out the door much more quickly to the delight of our customers. Having that independent third voice in the room really made all the difference for us in getting our dev maturity level off the ground. Good luck!

 

Link Goodness – For Devs (specific implementation details)

  1. Excellent courses available here – http://www.microsoftvirtualacademy.com/training-courses/devops-an-it-pro-guide
  2. And here: http://www.microsoftvirtualacademy.com/training-courses/assessing-and-improving-your-devops-capabilities
  3. And get this book – it’s becoming a standard, along with the Phoenix Project from Gene Kim. https://www.safaribooksonline.com/library/view/continuous-delivery-reliable/9780321670250/
  4. This demo is an excellent walkthru- http://blogs.msdn.com/b/visualstudioalm/archive/2014/11/11/using-release-management-vso-service-to-manage-releases.aspx
  5. Channel9 videos on DevOps – http://channel9.msdn.com/Tags/edge-devops
  6. A nifty walkthrough of Puppet integration with Visual Studio Online – http://channel9.msdn.com/Shows/Edge/Edge-Show-110-Puppet-on-Azure
  7. Brian Keller does a 10-minute walkthrough of RM deployments (excellent for getting started with infrastructure as code) – http://channel9.msdn.com/Events/Visual-Studio/Connect-event-2014/214
  8. Great list of resources – http://www.itproguy.com/top-2014-microsoft-devops-learning-resources/
  9. Books that everyone appeared to quote, and that seem very influential, are the Phoenix Project by Gene Kim et al. and Jez Humble’s book on Continuous Delivery. Of the two, if you just want a good yarn, start with the Phoenix Project. For a more technical approach, Jez’s book is excellent.
  10. http://www.microsoftvirtualacademy.com/training-courses/azure-resource-manager-devops-jump-start
  11. Great blog site for DevOps – http://www.donovanbrown.com/

For Business People / Architects (more general or on culture)

  1. A GREAT video on culture changing – “You Can’t Change Culture, But You Can Change Behavior, and Behavior Becomes Culture” http://vimeo.com/51120539 – Damon Edwards
  2. http://www.computerworld.com/article/2851974/microsoft-study-finds-everybody-wants-devops-but-culture-is-a-challenge.html
  3. http://www.citeworld.com/article/2115209/development/what-is-devops.html
  4. “There’s often a gap between devs and operators – devs motivated by innovation, pushing the envelope, and system admins tasked with security/stability. I think of DevOps as the bridge of that gap… since we’ve invested more in Ops, been a lot easier for us to get our services out there and actually delivered…. Expensive and timeconsuming to rollout a new feature, even if meticulously scripted often didn’t get it right – discrepancies between dev/prod systems.” – Bess Sadler, Stanford https://www.youtube.com/watch?v=L9V8oEaZ71I
  5. The PuppetLabs blog on Devops has a wealth of culture change information: http://puppetlabs.com/blog-categories/devops
  6. Puppet Labs 2014 whitepaper detailing org benefits of DevOps – this is very thorough: http://puppetlabs.com/sites/default/files/2014-state-of-devops-report.pdf
  7. MS site on DevOps – http://blogs.technet.com/b/devops/
  8. The rise of Shadow IT – “A very tangible, negative side effect of this situation is Shadow IT, where developers create their own ways to deploy and run their solutions, in order to maintain the required speed of change and flexibility, and to respond to ever growing demands. Cloud technologies with easy-deploy options and pay-as-you-go offers foster Shadow IT, which frequently leads to development and operations teams drifting apart.”
  9. VM’s and hands-on labs: http://aka.ms/ALMVMs
  10. Great article on DevOps including a checklist and summary: http://www.devopsonwindows.com/it-takes-dev-and-ops-to-make-devops/. I love their article on removing config files, and the checklist rocks.
  11. A nice interview with Yelp on how they were able to institute change, focusing on listening, being humble, and understanding the patterns your org has in place before advocating widespread, permanent changes – http://puppetlabs.com/blog/change-agents-it-operations-what-it-takes
  12. Gene Kim – how do we Better Sell DevOps? http://vimeo.com/65548399
  13. I also like the no horse crap video https://www.youtube.com/watch?v=g-BF0z7eFoU