Month: December 2016

Starting and Scaling DevOps in the Enterprise – review

As many of you know, I’m a huge fan of the work Gary Gruver has done – in particular his book “Leading the Transformation” on his experiences at HP trying to transform a very traditional enterprise. (See my earlier mention of his book on this blog, here.) His newest work is out – Starting and Scaling DevOps in the Enterprise. I am recommending it very highly to all my customers that are following DevOps! I think it’s unique – by far the best I’ve read so far when it comes to putting together specific metrics and the questions you’ll need to answer in setting your priorities.

Gary notes that there are three types of work in an enterprise:

  1. New work – Creating new features or integrating/building new applications
    1. New work can’t be optimized (too much in flux)
    2. Best you can hope for here is to improve the feedback loop so you’re not wasting time polishing features that are not needed (50%+ in most orgs!)
  2. Triage – finding the source of defects and resolving them
    1. Here DevOps can help by improving the level of automation. Smaller batch sizes mean fewer changes to sort through when bugs crop up.
  3. Repetitive – provisioning environments, building, testing, configuring the database or firewall, etc.
    1. More frequent runs, smaller batches, feedback loop improved. All the DevOps magic really happens in #2 and #3 above as these are the most repetitive tasks.

Notice that across the three types above, the issues could be in one of five places:

  1. Development
    1. A common pain point here is Waterfall planning (i.e. a bloated, aging requirements inventory)
  2. Building Test Environments
    1. Procurement hassles across server, storage, networking, firewall. Lengthy handoffs between each of these teams and differing priorities.
    2. Horror story – 250 days for one company to attempt to host a “Hello World” app. It took them just 2 hours on AWS!
  3. Testing and Fixing Defects – typically QA led
    1. Issues here with repeatability of results (i.e. false positives caused by the test harness, environment, or deployment process)
    2. Often the greatest pain point, due to reliance on manual tests causing lengthy multi-week test cycles, and the time it takes to fix the defects discovered.
  4. Production Deployment – large, cross org effort led by Ops
  5. Monitoring and Operations

The points above are why you can’t just copy the rituals from one org to another. For any given company, your pain points could be different.

 

So, how do we identify the exact issue with YOUR specific company?

  1. Development (i.e. Requirements)
    1. Metrics:
      1. What % of time is spent in planning and documenting requirements?
      2. How many man-hours of development work are currently in the inventory for all applications?
      3. What % of delivered features are being used by customers and fit the expected results?
    2. An important note here – organizations often commit 100% of dev resources to address work each sprint. This is terrible as a practice and means that the development teams are too busy meeting preset commitments to respond to changes in the marketplace or discoveries during development. The need here is for education – to tell the business to be reasonable in what they expect, and how to shape requirements so they are the actual minimum functionality needed to support their business decisions. (Avoid requirements bloat due to overzealous business analysts/PMs, for example!)

  1. Provisioning environments
    1. Metrics:
      1. How much time does it take to provision environments (on avg)?
      2. How many environments are requested per month/sprint?
      3. % of time these environments require manual fixing before they are complete
      4. % of defects associated with non-code – i.e. environments, deployments, data layer, etc.
    2. The solution here for provisioning pinch points is infrastructure as code. There is no shortcut here other than developers and IT/operations working together to build a working set of scripts to recreate environments, and maintaining them jointly. This helps with triage, as changes to environments now show up clearly in source control, and prevents DEV-QA-STG-PROD anomalies, as it limits variances between environments.
    3. It’s critical here for Dev and Ops to use the same tool to identify and fix issues. Otherwise you get strong us-vs-them backlash and friction.
    4. This requires the organization to have a strong investment in tooling and think about their approach – esp with simulators/emulators for companies doing embedded development.
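
To make the point about limiting variances between environments a bit more concrete, here is a minimal Python sketch. It assumes each environment's definition is already checked into source control as a flat key/value mapping. The function name `environment_drift` and the sample settings are hypothetical and are not from Gary's book.

```python
def environment_drift(baseline, candidate):
    """Return the settings that differ between two environment definitions.

    Each entry maps a setting name to a (baseline_value, candidate_value)
    pair. None marks a setting that is missing from one side.
    """
    drift = {}
    for key in sorted(set(baseline) | set(candidate)):
        left = baseline.get(key)
        right = candidate.get(key)
        if left != right:
            drift[key] = (left, right)
    return drift


# Hypothetical definitions for two environments (illustrative values only).
staging = {"os_image": "ubuntu-16.04", "app_port": 8080, "db_version": "9.5"}
production = {
    "os_image": "ubuntu-16.04",
    "app_port": 8080,
    "db_version": "9.4",
    "firewall_rule": "allow-443",
}

if __name__ == "__main__":
    for name, (stg, prod) in environment_drift(staging, production).items():
        print(f"{name}: staging={stg!r} production={prod!r}")
```

A check like this could run before each deployment, so a hand-edited environment shows up as a diff rather than as a mystery defect in triage.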

  1. Testing
    1. Metrics
      1. What is the time it takes to run a full set of tests?
      2. How repeatable are these? (i.e. what’s the % of false errors)
      3. What % of defects are found with testing (either manual, automated, or unit testing)
      4. What is the time it takes to approve a release?
      5. What’s the frequency of releases?
    2. In many organizations this is the most frequent bottleneck – the absurd amount of time it takes to complete a round of tests with a reasonable expectation the release will work as designed. These tests must run in hours, not days.
    3. You must choose a well-designed automation framework.
    4. Development is going to have to change their practices so the code they write is testable. And they’ll need to commit to making build stability a top priority – bugs are equal in priority to (if not higher than) tasks/new features.
    5. This is the logical place to start for most organizations. Don’t just write a bunch of automated tests – instead, start with just a few automated Build Acceptance Tests (BATs) that will provide a base level of stability. Watch these carefully.
      1. If the tests reveal mostly issues with the testing harness, tweak the framework.
      2. If the tests are finding mostly infrastructure anomalies, you’ll need to create a set of post-deployment tests to check on the environments BEFORE you run your gated coding acceptance test. (i.e. fix the issues you have with provisioning, above).
      3. If you’re finding coding issues or anomalies – congrats, you’re in the sweet spot now!
    6. Horror story here – one company boasted of thousands of automated tests. However, these were found to be neither stable nor maintainable, and had to be junked.
    7. Improve and augment these BATs over time so your trunk quality gradually moves closer to release-ready, near-production quality.
      1. Issue – what about that “hot” project needed by the business (which generally arrives with a very low level of quality due to high pressure)?
        1. Here the code absolutely should be folded into the release, but not exposed to the customer until it fits the new definition of done: “All the stories are signed off, automated testing in place and passing, and no known open defects.”
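
The "watch these carefully" step above can be sketched in a few lines of Python. This is only an illustration, assuming each failed BAT run has already been tagged by hand with one of three categories. The category names, the `NEXT_STEP` table and the `dominant_failure_source` function are all hypothetical.

```python
from collections import Counter

# Hypothetical failure categories, mirroring the three cases in the list above.
NEXT_STEP = {
    "harness": "Tweak the test framework.",
    "environment": "Add post-deployment environment checks before the gated tests.",
    "code": "Sweet spot: the tests are catching real coding issues.",
}


def dominant_failure_source(failures):
    """Return (category, share) for the most common failure category.

    `failures` is a list of category strings, one per failed test run.
    Returns (None, 0.0) when there is nothing to analyze.
    """
    if not failures:
        return None, 0.0
    category, count = Counter(failures).most_common(1)[0]
    return category, count / len(failures)


if __name__ == "__main__":
    recent = ["environment", "harness", "environment", "code", "environment"]
    category, share = dominant_failure_source(recent)
    print(f"{share:.0%} of failures are '{category}': {NEXT_STEP[category]}")
```

The useful part is not the code but the habit: tag every red run with a source, and let the dominant source tell you where to invest next.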

  1. Release to Production
    1. If a test cycle takes 6 weeks to run, and management approval takes one day – improving this part just isn’t worth it. But if you’re trying to do multiple test cycles a week and this is the bottleneck, absolutely address this with managers that are lagging in their approval or otherwise not trusting the gated testing you’re doing.
    2. Metrics
      1. Time and effort to release to production
      2. Number of issues found categorized by source (code, environment, deployment process, data, etc)
      3. Number of issues total found in production
      4. MTTR – mean time to restore service
      5. # of green builds a day
      6. Time to recover from a red build
      7. % of features requiring rework before acceptance
      8. Amt of effort to integrate code from the developers into a buildable release
    3. For #1-4 – Two areas that can help here are feature toggling (which you’ll be using anyway), and canary releases where key pieces of new functionality are turned on for a subset of users to “test in production.”
    4. For #5-6 – here Continuous Integration is the healer. This is where you avoid branching by versioning your services (and even the database – see the Refactoring Databases book by Scott Ambler)
    5. For #7-8 – If you’re facing a lot of static here, a scrum/agile coach will likely help significantly.
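
For the canary release idea mentioned above, here is a minimal Python sketch of one common way to turn a feature on for a stable subset of users. It is an assumption on my part, not something prescribed in the book: the function name `in_canary`, the feature name and the user ids are all hypothetical.

```python
import hashlib


def in_canary(user_id, feature, percent):
    """Decide whether a user sees a feature during a canary rollout.

    Hashes the feature name and user id into a bucket from 0 to 99, so the
    same user always gets the same answer for a given feature.
    """
    if percent <= 0:
        return False
    if percent >= 100:
        return True
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(1000)]
    enabled = sum(in_canary(u, "new-checkout", 10) for u in users)
    print(f"{enabled} of {len(users)} users see the feature")
```

Because the bucket depends only on the feature and the user, raising the percentage keeps everyone who already had the feature and simply adds more users, which makes a gradual rollout predictable.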

 

So – how to win, once you’ve identified the pain points? You begin by partitioning the issue:

  • Break off pieces that are tightly coupled versus those that are not developed/tested/deployed as a unit (e.g. HR or Purchasing processes).
  • Segment these into business critical and non-business critical.
  • Split these into tightly coupled monoliths with common code sharing requirements vs microservices (small, independent teams a la Amazon). The reality is – in most enterprises there are very valid reasons why these applications were built the way they are. You can’t ignore this complexity, much as we’d like to say “microservices everywhere!”

I really admire Gary’s very pragmatic approach, as it doesn’t try to accomplish large, difficult things all at once but focuses on winnable wars at a company’s true pain points. Instead of trying to force large, tightly coupled organizations to work like loosely coupled orgs – you need to understand the complex systems and determine together how to release code more frequently without sacrificing quality. Convince these teams of DevOps principles.


AWS to Azure – Making the Leap

I’m not even pretending that this is definitive or comprehensive. But, at 11 pm, here’s a few notes and some helpful links and resources as a companion to a presentation I wrote earlier today.

Migrating Workflows

If you’re an AWS developer and you are thinking of exploring your options in Azure-world – here are some things to keep in mind:

  • You’ve already hit the big challenges in moving to the Cloud. It’s much easier to move workloads from AWS to Azure than from onprem to the cloud.
  • The majority of AWS functionality has a map in Azure. My take is – AWS started with a 3 year head start in the IAAS space, and that’s their strong point. Azure has a much stronger backbone and pedigree where the cloud really gets interesting – PAAS/SAAS scenarios. The feature competition between Amazon, Microsoft and Google is going to continue accelerating – which is a very good thing for you.
  • VM conversion from EC2 is easy; PAAS/SAAS conversion is tougher. None of these are truly apples-to-apples (for example, AWS Lambda -> Azure Functions)
  • Availability models are very different
  • Project specific – know the integration points and SLAs and underlying platform services.
  • Deployment models are better in Azure!

 

Migration is remarkably easy – basically you follow some simple steps using Azure Site Recovery.

 

Amazon AWS to Azure – General Resources

TechNet Radio Series from Microsoft:

 
 

Great Pluralsight video – I loved this. An excellent starting point for people new to PAAS architecture.

 
 

 

Architecture Overview

Wonderful set of reference architectures – this is a terrific link: https://azure.microsoft.com/en-us/solutions/architecture/

And a central repository for more whitepapers: https://docs.microsoft.com/en-us/azure/architecture-overview

 
 

Another outstanding book – free – on Cloud Design Patterns. This is a terrific book and it has some outstanding reference works that can pair with this:

Web Development Best Practices Poster (and see the link above for a Scalability poster as well)

 
 

Now let’s talk a little more about how some of these components in the AWS space map over more into the Azure space.

Azure Functions

 
 

Blob Storage 

Blob Storage – very good walkthrough: https://docs.microsoft.com/en-us/azure/storage/storage-dotnet-how-to-use-blobs

 

And more general notes on storage models in Azure:

  
 

Event Hubs

 
 

Azure Stream Analytics

 
 

Redis

 
 

DocumentDB

  
 

API Management

 
 

WebJobs

ARM Templates
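
As a quick reference for the component list above, here is a small Python lookup sketch. Only the Lambda -> Azure Functions pairing comes from this post; the other pairings are my assumptions based on commonly cited rough equivalents, and none of them are truly apples-to-apples. The names `AWS_TO_AZURE` and `azure_counterpart` are hypothetical.

```python
# Rough AWS-to-Azure counterparts for the services listed above.
# Only Lambda -> Azure Functions is stated in this post; the rest are
# assumptions (commonly cited equivalents), so verify before relying on them.
AWS_TO_AZURE = {
    "Lambda": "Azure Functions",
    "S3": "Blob Storage",
    "Kinesis Streams": "Event Hubs",
    "Kinesis Analytics": "Azure Stream Analytics",
    "ElastiCache": "Redis Cache",
    "DynamoDB": "DocumentDB",
    "API Gateway": "API Management",
    "CloudFormation": "ARM Templates",
}


def azure_counterpart(aws_service):
    """Look up the rough Azure counterpart, ignoring case; None if unmapped."""
    lookup = {name.lower(): target for name, target in AWS_TO_AZURE.items()}
    return lookup.get(aws_service.strip().lower())


if __name__ == "__main__":
    for service in ("Lambda", "DynamoDB", "Glacier"):
        print(service, "->", azure_counterpart(service))
```

WebJobs is left out on purpose, since I don't know of a clean one-to-one AWS counterpart for it.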

 
 

 
 

 
 

 
 

 
 

Talent is Overrated.

This is a review of Talent is Overrated by Geoff Colvin. I highly recommend this book, it got me to think about my work – and how I go about my work – in an entirely new way.

Check out this video of the great Jerry Rice. Jerry is widely considered the best ever to play the game. His records for total receptions, touchdowns, and receiving yards aren’t just #1 in each category – they’re ahead of the runner up by almost 50%. It’s likely no one will ever beat them.

The odd thing is, Jerry is widely understood to be lacking the one quality agreed to be the essential quality needed for a wide receiver – and the one that can’t be bought. Speed. Jerry Rice was never a particularly fast runner. So how did he stay so dominant – until he retired at 42 years of age?

 

The answer is practice, hard practice. His offseason workouts were legendary. Jerry Rice and his trainer realized that three things were necessary to excel at his position – running precise patterns, evading defenders (outmuscle, outjump) and then outrunning them after the catch. So his offseason workouts focused on that. He did precision running of routes and worked on his hands to help with reception. Trail running helped him change directions on a dime. And his legendary Hill wind sprints helped give him explosive acceleration. He did this 6 days a week, in the offseason. His trainer would not release his regimen to people that asked – afraid they would try it, and hurt themselves. That workout helped him excel beyond his more talented competitors.

Above is a snapshot of Shizuka Arakawa at the 2006 Winter Olympics. Notice about 1 minute 30 seconds in this video what she does:

This is her famous signature “layback Ina Bauer” move. This is an incredibly difficult movement – a backward, almost double leaning back that leads to a three jump combo. She spent 19 years of practice on this, and it’s likely that she fell almost 20,000 times trying to execute this. Onto a very hard, unforgiving surface.

Above are three album covers by the Beatles. If you’ve ever listened to Help!, or any of their early albums – they’re pretty average. Something happened though about the time Rubber Soul came out. (Besides LSD and Timothy Leary!) At this point – and Malcolm Gladwell talks about this in his book Outliers – they had put in about 10 years, or 10,000 hours of practice, starting with those famous Hamburg days where they were playing multiple sets a day. Their music made huge leaps.

The book makes an excellent point that the one thing most people believe about talent – that you have to be talented to make it, and that if you aren’t born with it you’re out of luck – is wrong, dead wrong. With enough practice, and hard work, you can achieve greatness.

 

But Jerry Rice and the ice skater we mentioned earlier would tell us – hard work by itself is not enough. Both of those athletes learned to practice specifically on their craft in a very planned way. They stayed in a middle zone.

Mr Colvin pointed out studies showing that we learn best in a middle zone – where we are challenged by tasks just out of reach of our current skills and abilities. Think about a teenager learning to drive. At first, we’re terrified – totally out of our depth. That’s the Panic Zone. We’re so out of our known experiences here that we’re a danger to ourselves and others – which is why there are so many accidents for kids below the age of 24. Then we start to put things together, over time, where we match book learning with real world experience. That’s the optimum point – where we really start to excel and thrive. For most of us, we then move on to the comfort zone, where our abilities plateau. We’re good at driving at this point, but no better – just average. Definitely we’re safe at driving but not good enough to be a NASCAR driver. At this point we’re not reaching our potential.

The key is to stay in that challenging middle zone, where we are always trying to refine our technique or abilities. See this graphic below:

Notice above – this is for trained classical violinists, specialists that have been spending hours in individual practice each week since the age of 5. You’ll notice – although the best (genius level) violinists practiced just as much each week as those considered merely better, what set them apart was the consistency. They sank almost 2,000 more hours into practice than the level below them. Instead of just slogging along, they practiced what the author calls Deliberate Practice:

 

So the book calls out some common traits of forward thinking organizations.

 

This led me to do some thinking about my own life. I often think of my favorite author, Norman Maclean. He wrote two classics – A River Runs Through It, and a second book published after his death on the 1949 Mann Gulch fire called “Young Men and Fire”. Both are classics, and outstanding. I always feel a sense of loss when I read these books, because there’s only two of them. What if he had started to write earlier than his late 60s?

The book stresses being able to accept mentoring and feedback, and the value of failure – practicing deliberately and steadily making improvement. I will start following these steps in pursuing my writing goals.

 

 

 

 

Appendix – Jerry Rice’s workouts (from NY Times article)

Rice’s six-day-a-week workout is divided into two parts: two hours of cardiovascular work in the morning and three hours of strength training each afternoon. Early in the off-season, the a.m. segment consists of a five-mile trail run near San Carlos on a torturous course called, simply, The Hill. But since five vertical miles can hardly be considered a workout, he pauses on the steepest section to do a series of ten 40-meter uphill sprints. As the season approaches, however, Rice knows it’s time to start conserving energy – so he forgoes The Hill and instead merely does a couple of sprints: six 100-yarders, six 80s, six 60s, six 40s, six 20s, and 16 tens, with no rest between sprints and just two and a half minutes between sets.

For the p.m. sessions he alternates between upper-body and lower-body days. But no matter which half of his body he’s working on, the volume is always the same: three sets of ten reps of 21 different exercises. Yes, your calculator’s right: That’s 630 repetitions a day.

The workouts are the key to Rice’s longevity and endurance. They are brutal because they are so long. And there is no question that they pay off. When he sprinted up the middle and outran the San Diego secondary for his first touchdown in Super Bowl XXIX, he felt the accelerators kick in. When he separated his shoulder only to return to the game and then actually run over a Chargers player, that’s when the weight training came in.

“I never have an easy day,” he said, “because there is never an easy day when the playoffs begin.”

It is what Rice says now – sitting on a bench with ice draped around his shoulder – that may symbolize the man more than anything. “I have to fight for everything,” said the man who came out of Mississippi Valley State, a Division II school. “I always have. I have to prepare myself every year. There is always some young guy who thinks he can take me. And then when the day is done, he realizes he can’t.

“Even when I was younger, people were waiting to see if I was a fluke. And I proved time and time again – through hard work – that I was not. Now, as I get older, people are looking for me to slip. They are waiting for me to lose a step. That hasn’t happened and I will get out of it before it does. If anything I’m faster and better than I have ever been.”