DevOps – are you stuck on where to start? Building out 4 Release Pipelines in an Hour with Donovan Brown

Such, SUCH a great treat having Donovan come all the way from Houston for this presentation. He’s a fantastic presenter and his hands-on knowledge blows my socks off. I can’t thank him enough for making the trip.

IF YOU ATTENDED and you enjoyed the presentation – or even if you didn’t enjoy it – WE NEED YOUR HELP!
Take a few minutes and click on this link – it will tell us how we can improve next time, and hopefully lay the groundwork for Donovan to come back in a year or so.

There’s a streaming video available here of Donovan’s Portland Area .NET User Group (PADNUG) presentation. It was outstanding! If you want to see one CI/CD release pipeline built out releasing a website to Azure in one hour – well sorry, that is not what he did. Instead, Donovan built out FOUR pipelines in an hour – using Node.js, Java, .NET and .NET Core – while describing step by step what he was doing and why, using VSTS, the Azure Portal, Visual Studio, and a cool command-line tool I hadn’t heard about called Yo Team!

The key takeaways from his presentation are as follows:

  • If you only get one thing from our time together – Microsoft is for any language, any platform.
  • Licensing costs should not be a blocker for us. It’s roughly $10 per person per month, and that only kicks in for teams larger than five. And MSDN subscribers get these benefits included.
  • Lastly, if you attended and enjoyed the presentation – give a shout out to Donovan on Twitter. He’s thrown down the gauntlet – “we have the best build and release tools on the market, if you don’t agree reach out to me on Twitter and be specific!” So, if you don’t think Microsoft’s build tools will work with Jenkins, or Java apps, Node.js, SonarQube etc – throw down and enter the squared circle my man!

My notes are below. These aren’t notes of his presentation per se, as it was all no-slide demos, but comments Donovan made during the day that give DevOps context to his presentation. As the saying goes, “If I had more time, it would have been shorter!”




Thoughts on Strategy

Make incremental changes in ramping up your maturity. It’s safer. If you are deploying manually, there’s one part that you fear the most. Focus on that.

“If it hurts, do it more often.” (You are incurring more risk by deploying less frequently.) Lean into that and think about how you can fix it. Focus on the scariest part first.

If you need prescriptive guidance – hire a consultant. (Like, I don’t know, Microsoft Premier for Developers maybe!) They can ask questions, pinpoint pain points for your org and provide a specific roadmap.

Why do you have to ask permission from your manager? Your job as an engineer is to continuously deliver value. You should need to ask permission if you DON’T want to set up CI. No need to stop and learn for months – focus on what hurts the most, and fix that one thing. What is the thing that breaks every time you run a deployment? Otherwise you end up paying a tax by doing it manually. Find something small to focus on – don’t try to do it all at once. It might take you months, after which you may have an amazing portion of your pipeline that was manual and is now automated. That will get people excited. And after that last build that blows up when people start playing the “Not it” game – guess what, you get to go home early!

And if you’re stuck on exactly where do we start? – http://donovanbrown.com/post/How-do-we-get-started-with-DevOps (really, check this post out. It’s good enough to mention twice!)


Agile Is Your Starting Place

The key question is – “Can you produce releasable bits of code?” Until your Agile processes are strong – not perfect but mature – you can’t see significant gains with DevOps. The business needs to understand that we need to produce increments of shippable code. DevOps is the “second decade of Agile.” Once Agile is in place you will realize that you’re not able to ship as fast as you can produce features.

What limits us so far – four years after “The Phoenix Project”? Really, it’s still slowed by adoption of Agile. 40% of orgs have adopted Agile – and that’s really a dev number, as it’s only dev teams that are championing this. The rest of the org is expecting waterfall, requirements, dates.

DBAs are a frequent blocking point. Slice vertically, not horizontally. A DBA might say, “I’ll give you 3 months, if you promise not to make any further changes to the schema, ever.” “That’s crazy! If you’re going to change it anyway, why should I invest 3 months in this now? Your first step could start with login functionality – in 3 weeks we have a login page, username and credentials, authentication and identification.” Don’t let a recalcitrant DBA stop you in your tracks.

One week sprints really worked well (at one company) for Donovan in his consulting past, with a product owner/business owner. After 8 weeks they needed to bring in a knowledgeable person (replacing the old business owner) – and a sigh of relief, finally back on track. And hire a scrum master. That person needs to be able to say no to the boss. For a good product owner, all they have to do is understand the business they are trying to serve – and tell me, of two backlog items, which one is the priority. Scrum Masters must protect the team from unrealistic expectations (“on this date, I want these features – perfect quality”). Quality is not negotiable – either move the date or lose features.


Database Goodies

SQL Server Data Tools (SSDT) – version every object on SQL Server, including permissions and roles, stored in a database project. (This is free. Evaluate it, see if it meets your needs.) An amazing tool, been around for 10 years, very little known – our best kept secret! If SSDT for some reason isn’t scaling or working for you, ReadyRoll from Redgate is also a good option.


Last but not least – this amazing site brought to my attention from Jorge Segarra. An awesome resource for all things DevOps for you DBA guys! https://www.microsoft.com/en-us/sql-server/developer-get-started/sql-devops/

This may require breaking up into different schemas (like a virtual namespace) vs one huge project. Schemas are here now and available; they will allow you to segment the databases you create – no 3000-table unmanageable blob. (BACPAC is schema and data; DACPAC is just schema. You can run pre/post scripts – make sure they’re idempotent, so if a script runs twice you don’t double your db size!)
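To make that idempotency point concrete, here’s a minimal sketch – plain Python standing in for a pre/post deployment script, with the function and data names made up – of the guard pattern that keeps a script safe to run twice:

```python
def seed_reference_data(db, rows):
    """Idempotent 'post-deployment script': only inserts rows that are missing.

    `db` is a plain dict standing in for a table keyed by primary key;
    running this twice leaves the database exactly the same size.
    """
    inserted = 0
    for key, value in rows.items():
        if key not in db:        # the guard is what makes this idempotent
            db[key] = value
            inserted += 1
    return inserted

database = {}
rows = {1: "admin-role", 2: "reader-role"}
print(seed_reference_data(database, rows))   # first run inserts 2
print(seed_reference_data(database, rows))   # second run inserts 0 - no doubling
```

The SQL equivalent is the same idea: guard every statement with an existence check (IF NOT EXISTS and friends) so a re-run is a no-op.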

Migrating your CRUD statements from sprocs to Entity Framework may be a win to move away from thousands of unwieldy and clunky INS/UPD/Del type statements.


Integration Testing (Selenium etc)

DevOps – one company decided “DevOps = automation.” So they had thousands of unit and integration tests, and code would flow with no human intervention from commit to production. Then they released a CSS change to prod and started losing money. When they looked at the ecommerce website, a change to the summary page had made the final button and its text the same color – resulting in a blank button! The automation clicked it anyway, because the ID of the button hadn’t changed. So: automate everything you CAN – but don’t try to automate things you shouldn’t.

Best practices for building integration tests: the #1 thing is making sure the assumptions you made when you recorded your automation are still valid. Usually the database in QA is the failing point: the code isn’t broken, but the data it runs against doesn’t fit assumptions. (See Redgate’s excellent tool SQL Clone here.) One caveat: they are a few versions behind with their Selenium driver, using IE.
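As a sketch of that #1 practice – pinning down the data assumptions before the automation runs – here’s a hedged, Selenium-free illustration in plain Python (all names here are hypothetical): reset the records the test depends on to a known baseline, rather than trusting whatever is sitting in QA.

```python
# The fix for "the data doesn't fit assumptions": force the records the
# UI test depends on back to the recorded baseline *before* the test runs.

ASSUMED_STATE = {"user:demo": {"password": "P@ssw0rd", "locked": False}}

def reset_test_data(qa_db):
    """Restore the baseline the automation was recorded against."""
    qa_db.update(ASSUMED_STATE)

def login_test(qa_db):
    """Stand-in for the Selenium scenario: fails if the data drifted."""
    user = qa_db.get("user:demo")
    return bool(user) and not user["locked"]

qa_db = {"user:demo": {"password": "changed-by-someone", "locked": True}}
print(login_test(qa_db))      # False - drifted QA data, though the code is fine
reset_test_data(qa_db)
print(login_test(qa_db))      # True - assumptions restored, code unchanged
```

Tools like SQL Clone make this cheap at database scale; the principle is the same as this toy fixture.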

“We spend more and more time with peer testing than we do with auto testing on the Microsoft VS team. As we’re our own first customer, there’s a lot of A/B testing that happens. Integration testing happens very little, unless there’s a performance issue – then we move it to GoBig environment where we can perf test.” Per Brian Harry – “Unapologetically we do testing in production.”

Donovan ran a team in Akron where they were not allowed to write a line of code until they wrote a UI test (in CodedUI then; now we’d do this in Selenium). They wrote the automation before they wrote code – wrote it so it fails, then refined until it passed. Pair this with a manual test case, which can be given to a stakeholder in plain English (“oh, I didn’t expect I should be pressing that button”) – there’s nothing better than a manual test case as your acceptance criteria. Automation and UI tests become brittle and break – that didn’t go away – but having them there forced my teams to do the right thing early. (These were one week sprints, Weds to Weds.) Do it more often if it hurts! They also found clever ways of making the BACPAC/data layer work.

On the VS teams, this forced them to move more functionality behind the scenes, out of UI testing, to make unit testing viable. To those nerd architects out there claiming that “unit tests are worthless”: we run 41,000 unit tests in VSTS and it takes 6 minutes. You can’t run integration tests that fast.


Infrastructure as Code (ARM templates) and Configuration Management (PowerShell DSC)

ARM (Azure Resource Manager) is the name of the game here to get Infrastructure as Code working. This is a much better process than spinning up resources manually for example. (Not configuration here – doesn’t include IIS, etc. )

Recommendation: Introduce your Infra team to ARM templates. ARM templates are Azure-only, but you can use Terraform as a multi-cloud solution pointed at Azure, so you can move resources to Google/AWS etc.

Configuration as Code – using PowerShell DSC (Desired State Configuration), you take the configuration of that server and codify it as well. (I demand you have port 80 open, .NET 4.5, IIS available, these security roles, etc.) The server installs everything it needs to bring itself to the ready state. Every 15 minutes it checks to make sure it still fits the standard config. So if it finds that IIS is turned off, it will detect that the configuration is drifting and make a note in the log. You can also configure it to bring the server back to the standard state.
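Here’s a rough sketch of that detect-log-correct loop – in Python rather than actual DSC syntax, with the config keys invented for illustration:

```python
# Desired state: what the server is *supposed* to look like.
DESIRED = {"port_80_open": True, "iis_running": True, "dotnet": "4.5"}

def check_drift(actual, desired=DESIRED, auto_correct=False):
    """One pass of a DSC-style consistency check: log drift, optionally fix it."""
    log = []
    for key, want in desired.items():
        have = actual.get(key)
        if have != want:
            log.append(f"drift: {key} is {have!r}, expected {want!r}")
            if auto_correct:
                actual[key] = want   # "bring it back to standard state"
    return log

server = {"port_80_open": True, "iis_running": False, "dotnet": "4.5"}
print(check_drift(server))                     # notes that IIS is off
print(check_drift(server, auto_correct=True))  # notes it AND turns it back on
print(check_drift(server))                     # [] - back to standard state
```

The `auto_correct` flag mirrors DSC’s configuration modes: monitor-only (just log the drift) versus auto-correct (pull the server back to the desired state).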

PowerShell DSC works for both Linux and Windows environments. And, it’s free! (p.s. Chef uses DSC as well so it’s a layer on top of a layer.)

Recommendation: Start with what you have and what’s free with PowerShell DSC and THEN make an evaluation if there’s features that are missing. This could take <1 hour. From Donovan – “I encourage you to ping me on Twitter if DSC is not getting the job done. If we can’t fix it fast enough, then go to Chef or a competitor.”

DSC can be run on-prem as well. It’s idempotent, which means you can run it blindly a million times in a row and the server will stay exactly the way it needs to be.



Self Service VM’s with Dev/Test Labs

Struggling with self-service VMs and exposing the costs of resources to the consumers you service? (i.e., that five-nines availability coming at a horrendous price that is shielded from end users/the business?) Think about Dev/Test Labs. This is self-service with no blank check: teams can spin up a sandbox (using the allocated $ your customers buy each month), and it enforces which VM images they are allowed to spin up.

Dev/Test Lab documentation – https://docs.microsoft.com/en-us/azure/devtest-lab/

Overview and trial site hub – https://azure.microsoft.com/en-us/services/devtest-lab/

Starting documentation, very good – https://docs.microsoft.com/en-us/azure/devtest-lab/devtest-lab-overview



Microservices

An advantage here: releases are now micro-sized. No more having to track down changes from 6 months back with a dictionary of changes. This would eliminate webMethods-style ESBs. You can rev very quickly on microservices without jeopardizing the application, as they are very small, atomic pieces of functionality.

Meant to be combined with containers.

Not a silver bullet – evaluate it. Microsoft is going there on the VSTS team. Docker support is built into VSTS – it can publish to a registry. Even TFS 2015 update 3-4 has Docker support as well. You also need to run Windows Server 2016 (or the Windows 10 Anniversary edition) to support Windows containers, or a Linux host for Linux containers.


A few thoughts on success.

Hey all, it’s been awhile so I thought I’d write a few words.

I had someone new to Microsoft call me up today and ask me for some thoughts on how to succeed here. I hung up after chatting for a few minutes about time management and learning to say no – modesty – and how to work in larger teams. It’s a topic that is interesting to me – as every company has its own dynamic (one that spills over into the technical arena) – and I often find myself wondering if I could improve in this area.

If you haven’t already read the Four Agreements, I would highly recommend it. It’s one of those short books that’s really easy to overlook. Besides the Bible, I can’t think of another book that’s had such a high impact on my life, for the better. If you want to make a successful career though – it’s hard to give better advice than the following:

  • Be Impeccable with Your Word. This means if you say you’re going to be somewhere or do something, you keep your word. It also means you need to be very careful with what you say yes to. A modest person knows they have limitations, and they’ll be careful not to overload themselves. (Something I’ve had to learn the hard way a few times in my recent assignments!)
  • Don’t take anything personally. Everyone is the hero of their own story and their actions – while sometimes senseless to us or worse – always make sense in their own personal universe. Do your best in working with others to be accepting, to listen first and understand (reflect back actively what they’re saying so they know they’re being heard). Once you understand their point of view and motivations, it’s much easier to work towards a compromise that satisfies both parties.
  • Don’t make assumptions. We assume that others think the way we think, feel the way we feel, judge the way we judge. The only way to keep ourselves from making assumptions is to not be afraid to ask questions. And we keep asking questions until we understand clearly – until you know: this is what I want, this is what you want. The unstated becomes clear, no secrets.
  • Always do your best. This means trying your damndest at everything you take on, and doing it without distraction. Now, your “best” is a relative term – it doesn’t mean perfect, and your best will vary from day to day. Some days just getting up and responding to emails is “your best”, on other days you’ll be a worldbeater and knock down walls. Don’t try to solve problems in a low state. We all have ups and downs due to low sleep, stress, personal problems etc. As my good friend Jim Caballero once told me, “people are the only thing that matters”. I’ve rarely regretted the way I’ve started or left a position, but I do regret many times falling short in my interpersonal relationships. That thoughtless exchange of words with a coworker when I was feeling a little down, the meetings that did not go well, etc. Many times when I am low on sleep now I try to avoid one on one meetings – so I don’t have to spend the following day mending fences I’ve broken down!


I’m definitely not holding myself up as an example in any of these things! But I will say – over the past two decades – learning by experience how to apply these four points has made a huge difference in a smoother, less bumpy career and a higher ceiling. It’s been said that we get hired for our abilities – we get fired for our personality. True words! So keeping in mind that old saying “the tongue is a fire” – guard your words, be careful with your words (especially those committed to writing – emails, Facebook, etc), and show empathy. You’ll get more done and will be happier doing so!

Monitoring, and Why It Matters To You.

I’ve done a few articles on Application Insights – (older ones here and here) – but none yet on Operations Management Suite, because 1) I’m not IT in my background and 2) I’m busy leave me the heck alone! (I kid, we’ll get to it eventually.) These have all been more how-to – and admittedly it’s so easy I hesitate to call it that – versus why to. But last night on a plane coming back from Kansas City, I was mulling this over. (It helps that I had the excellent if somewhat clunky “The Art of Monitoring” on my Kindle.)

Monitoring has long been the secret sauce of DevOps. How else do we get feedback on our priorities, and actual metrics – not guesses – on which features are in use? What’s often overlooked though is that it can actually help you fight back against the wrong kind of change management – one that increases your bureaucratic workload and actually makes your build riskier and harder to fix. How is that possible?

The Blame Game

Let’s start with some basic negative cycles we’ve all seen when there’s very visible production outages. When bad things happen in production, we immediately start seeing the oddest thing happen – the SDLC process starts to dissolve into this negative cycle of blame and recriminations.

Take the example of Knight Capital in 2012. My good friend Donovan Brown often cites this as a warning example. Here, one messy 15-minute deployment led to a $440M loss. In the wake of a disaster like this, John Allspaw noted that there are two counterfactual narratives that spring up:

  1. Blame change control. “Hey, better CM practices could have prevented this!”
  2. Blame testing – “If we had better QA, we at least could have taken steps to detect it faster and recover!”

It’s hard to argue with either of these. And it’s true, the RIGHT kind of change controls do need to be implemented. But by clenching like this, as Gene Kim has noted in The DevOps Handbook, “in environments with low-trust, command and control cultures, the outcomes of their change control and testing countermeasures end up hurting more than they help. Builds become bigger, less frequent and more risky.” Why is this?

This is because the dev/QA team begins implementing increasingly clunky testing suites that take longer to execute, or writing unit tests that frequently don’t catch errors in the user experience. In a pinch, the QA team begins adding a significant amount of manual smoke testing versus automated tests. Management begins imposing long, mandatory change control boards every week to approve releases and go over defects introduced in the previous week(s) – I’ve seen these groups grow into the hundreds, most of whom are very far removed from the application. More controls, remote gatekeepers and a manual approval process lead to increased batch sizes and deployment lead times – which reduces our chances of a successful deployment for both dev and Ops. Our feedback loop stretches out, reducing its value. A key finding of several studies is that high-performing orgs relied more on peer review and less on external approval of changes. The more orgs rely on change approval, the worse their IT performance in both stability (MTTR and change fail rate) and throughput (deployment lead times and frequency).

This is often where I tell the story of my dad and I, trying to cut down a few trees for my uncle that had fallen across a local creek in NW Washington in a storm. The river had risen several feet and we city boys were standing below the dam formed by these large tree trunks. I remember looking up at the water swelling and pushing against the trees as we were cutting into them, and thinking what those several feet of water would do once released. It didn’t take a lot of imagination to picture the outcome – two idiots being swept out to the Pacific Ocean – but the problem was my uncle was standing a few dozen feet away, hands on his hips, watching us with his lips tight in a disapproving line. I told my father, “Dad, I don’t care what it takes, but we need to find a way of breaking that chainsaw!” That’s the kind of backlog that can form and choke your release cycle, reducing flow and increasing build sizes and risk. (And, by accidentally dunking the chainsaw, we were able to successfully kill the project and earn the lasting contempt of my uncle – “I want to thank you boys – it’s been a long time since I’ve been to the circus!”)

Telemetry To The Rescue

The main issue above is that this overreactive organization was trying to prevent errors and bugs from ever happening. Sometimes they even call their recap (punitive!) meetings “Zero Defect Meetings” – as if that kind of operational perfection were attainable! In contrast, DevOps-savvy companies don’t focus on MTBF – reducing their failure count. They know outages are going to happen. Instead, they treat each failure as an opportunity: what test was missing that could have caught this, what gap in our processes can address this next time? Especially, they focus on improving their REACTION time – their time to recovery, MTTR (Mean Time To Recover). Testing and automated instrumentation – that famous passage about wanting “cattle, not pets,” i.e. blowing away and recreating environments at whim – form the heart of their adaptive, flexible response strategy.
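To make the MTBF/MTTR distinction concrete, here’s a tiny sketch with made-up incident numbers – note how the two metrics answer completely different questions:

```python
# Hypothetical incident log: (hours of uptime before the failure, hours to recover)
incidents = [(200, 4.0), (150, 0.5), (300, 1.5)]

uptimes = [up for up, _ in incidents]
downtimes = [down for _, down in incidents]

mtbf = sum(uptimes) / len(incidents)     # Mean Time Between Failures: how often we break
mttr = sum(downtimes) / len(incidents)   # Mean Time To Recover: how fast we get back up

print(f"MTBF: {mtbf:.1f}h, MTTR: {mttr:.1f}h")
```

Chasing MTBF means trying to make the first column longer; the DevOps focus is shrinking the second column.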

Puppet Labs – in their excellent 2014 “State of DevOps” report – mentioned that organizations that want to improve on their reaction time (MTTR) benefit the most – and it’s not even close, by an order of magnitude – from two technical tools/approaches:

  • Use of version control for all production artifacts – When an error is identified in production, you can quickly either redeploy the last good state or fix the problem and roll forward, reducing the time to recover.
  • Monitoring system and application health – Logging and monitoring systems make it easy to detect failures and identify the events that contributed to them. Proactive monitoring of system health based on threshold and rate-of-change warnings enables us to preemptively detect and mitigate problems.
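That second item can be sketched in a few lines – a hedged illustration (thresholds and sample data invented) of combining a hard threshold with a rate-of-change early warning:

```python
def health_warnings(samples, threshold=90.0, max_delta=20.0):
    """Flag samples that breach a hard threshold OR climb too fast
    between readings - the 'rate-of-change' early warning."""
    warnings = []
    for i, value in enumerate(samples):
        if value > threshold:
            warnings.append((i, "threshold", value))
        elif i > 0 and value - samples[i - 1] > max_delta:
            warnings.append((i, "rate-of-change", value))
    return warnings

cpu = [40, 45, 70, 95, 96]      # hypothetical CPU% readings
print(health_warnings(cpu))     # the jump to 70 fires *before* the threshold breach
```

The point of the rate-of-change check is that it fires while you still have headroom – before the metric ever crosses the hard limit.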

We’re going to focus on that second bullet – monitoring. How can monitoring help turn the tide for us, so we don’t overreact because of a production outage?

Below are a few fixes that can transform that reactive, vicious cycle into a responsive but measured, virtuous cycle that addresses the core problems you’re seeing in PROD. Some are nontechnical or more process-related than anything else – and note that fixing the issue starts with the purity of the code, as early in the process as possible:

  1. Adding or strengthening production telemetry (we can confirm if a fix works – and autodetect next time)
  2. Devs begin pushing code to prod (I can quickly see what’s broken and make decisions to roll back vs patch). A note on this: a rollback – going to a previous version – is almost always easier and less risky. But sometimes fixing forward and rolling out a change using your deployment process is the best way.
  3. Peer reviews. This includes not just code deployments but ops/IT changes to environments! (Remember The Phoenix Project: 80% of our issues were caused by unauthorized changes, often by IT to environments, and 80% of our time was spent figuring out what in this soup of changes caused the issue – before we even lift a finger to resolve anything! I’ll write more about how to do a productive peer review – especially pair programming, which is really a continuous code review – later.)
  4. Better automated testing. (Again, more on this later. Look at Jez Humble’s excellent Continuous Delivery or Agile Testing for more on this.)
  5. Batch sizes get smaller. The secret to smooth and continuous flow is making small, frequent changes.

A key driver here, though, is information radiators – a term that actually comes from Toyota’s Lean principles. This creates a feedback loop which broadcasts issues back as quickly as possible, radiating information out on how things are going.

Etsy – just to take one company as an example – takes monitoring so seriously that some of their architects have been quoted as saying their monitoring systems need to be more available and scalable than the systems they’re monitoring. One of their engineers was quoted as saying, “If Engineering at Etsy has a religion, it’s the Church of Graphs. If it moves, we track it. Sometimes we’ll draw a graph of something that isn’t moving yet, just in case it decides to make a run for it. Tracking everything is the key to moving fast, but the only way to do it is to make tracking anything easy. We enable engineers to track what they need to track, at the drop of a hat, without requiring time-sucking configuration changes or complicated processes.”

Another great thinker in the DevOps space, Ernest Mueller, has said: “One of the first actions I take when starting in an organization is to use information radiators to communicate issues and detail the changes we are making. This is usually extremely well received by our business units, who were often left in the dark before. And for Deployment and Operations groups who must work together to deliver a service to others, we need that constant communication, information and feedback.”

I know I found that to be true in my career. I discovered this fairly early on in my adoption of Agile with some sportswear companies here in the Oregon region. I worked for some very personality-driven orgs with highly charged, negative dynamics between teams. As I adopted Agile, which meant broadcasting honest retrospectives – including my screw-ups and failures to meet sprint goals – I expected a Donkey Kong type response and falling hammers. The most shocking thing happened though: the more brutally honest and upfront I was about what had gone wrong, the better the relationship I found myself having with the business and my IT partners. And mistakes we made on the team were owned up to – and they typically didn’t repeat, not without the group holding the culprit (including me) responsible. That kind of “government in the sunshine” transparency and candor was the biggest single turning point of our Agile transformation.

It’s been said, rightly, that every lie we tell ourselves comes with a payoff and a price.
I believe that very much to be the case. For developers or IT, we’ve been very used to thinking we are AWESOME and WONDERFUL and the OTHER GUYS are cowboys/bureaucratic tools and are EVIL. Maybe that story – which has the short term payoff of making us feel virtuous – comes with a heavy price, of limiting our success in rolling out easy to manage and maintain applications and delivering business value faster. By using instrumentation and telemetry, we demonstrate that we are not lying to ourselves or to our customers/the business. And suddenly a lot of those highly charged, politically sensitive meetings you find yourself in lose a lot of their subjectivity and poison – the focus is on improving numbers versus the negative punish/blame scenario.

In Closing

  • Like testing, instrumentation and monitoring seems to be a bolt on or an afterthought in every project. That’s a huge mistake. Make instrumentation and metrics the backbone of your DevOps movement, as it’s the only thing that will tell you if you’re making specific progress and earn you credibility in the eyes of the business.
  • Don’t let your developers tell you that it’s too hard or have it be an afterthought. It takes just a few minutes to make your release and application availability metrics available to all.
  • And if your telemetry system is difficult to implement or doesn’t collect the metrics you need, think about switching. Remember the Etsy lesson – making it easy and quick is the way to go. (which is why I really like App Insights!)


Application Insights, and what it can do for you…

I’m giving some presentations over the next few weeks on dashboarding – specifically using Application Insights. I thought I would write up some of the things I discovered in doing some prep research, including a full walkthrough so you can try it on your own projects.

First off, I think most of us have fooled around with Application Insights – it’s a checkbox on creating a project in Visual Studio, for gosh sake. And maybe we got frustrated with some of its limitations – “It doesn’t aggregate well! It’s not customizable!” – and gave up in frustration. Well, we may have quit on it too soon. Microsoft is quite committed to it as a tool – and from what I’ve seen, compared to its very inflexible and stale early iterations, it’s light years ahead of what I expected. It’s easy, painless. In short, there is no longer any excuse for not dashboarding your code.

I’ll do a followup post in a few days on WHY this is important. For now, here’s some quick steps to try playing with it yourself.

Getting Started

You may want to begin with some videos to kickstart some interest. First, here’s some good videos to set the stage – here’s a good intro video, another on app availability, and another with a little more detail on usage monitoring.

But let’s step out of the documentation for a second and let’s talk demo.


Short List of Steps

  1. Build an MVC app in Visual Studio and publish it to Azure. Give it a very specific name and publish to East US. (At least, that’s what my subscription allowed!)
  2. Enable Application Insights.
  3. Open up ApplicationInsights.config and look at the info. This is how you add additional perf counters. (See this API doc for more.)
  4. Right-click on the project and select Application Insights. This opens up the portal. Pin it to the dashboard.
  5. Click on some items here and explore modifying the chart, etc. Look at page view load times for example.
  6. Now, notice we don’t have usage data. Click on the dashboard, click on Usage. Add the JavaScript snippet to the header in _Layout.cshtml. Now, when we rebuild, we’ve got actual usage data and can track it.
  7. If you want to do A/B testing, look at this page. Per that page, you can add tags to help segment out your errors, using either JavaScript or C#/VB. This is also how you do A/B testing, BTW – you put version numbers in the C#/JS. Overall settings are in web.config.
  8. Dashboard – cover:
    1. Metrics Explorer – drill down to server log telemetry
    2. Modifying charts (add Server Performance Counters)
    3. Add alerts (browser page load time)
    4. Application Map
    5. Availability (here is where you add a basic ping test every 5 minutes. Make sure you turn this off post-event!)
    6. Overview Timeline


In short, we’ll be covering MOST of the items below:

In Greater Length

Build an MVC app in Visual Studio and publish it to Azure. Give it a funky name and publish to East US.

Enable Application Insights. If you have an existing app, no biggie – right-click on the project in Solution Explorer and enable it, then copy that single snippet of JavaScript onto your page(s) as needed. In our case though, we’re checking the box to create the project with App Insights installed. That makes it soooo easy. And as lazy programmers – that’s what we want, right?

You’ll notice there’s a very lightweight touchpoint here in your project – a few references and a new config file. Open up ApplicationInsights.config and look at the info:

Wow, that’s pretty easy! Check out the API I mentioned above in the short version for a list of all the goodies you can instrument/measure.

Now let’s check out our dashboard. Right-click on the project and open up Application Insights. This brings you to the Azure Portal. Notice you now have a new App Insights dashboard available to you. Go ahead, and if you wish, right-click to pin it to the dashboard.


Clicking on this reveals some interesting, built in instrumentation. I mean, look at all these goodies!

You can edit the chart – just click on the top right – to add some metrics that you care about.

Go to the main dashboard. Notice we have no user data collection. This means we’re blind when it comes to geography/OS/browser – defining our users and the features they like. No biggie – click on the text in the chart where it says “click here to view more about usage data” or something like that, and you’ll see a snippet of code. Copy that, and let’s paste it into the _Layout.cshtml file header of our app, like so:

Per this page, you can add tags to help segment out your errors/operations. This can be done using either Javascript or C#/VB. This is also how you do A/B testing BTW. It’s extremely powerful – putting a parameter like “Version 2.1” for example. Now, you can tell when your newest version of the application is performing slower than it should for a subset of users on your production boxes. And, using deployment slots or feature toggling, you can safely kill it or fix in place without a widespread service outage.
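The tagging idea can be sketched with a stand-in for the SDK’s telemetry-initializer hook (in the browser you’d register this with the real SDK rather than the stub pipeline below; the "Version" value is just an example): every outgoing telemetry item gets stamped with the app version, so the portal can slice metrics by version.

```javascript
// Illustrative sketch: stamp every telemetry item with an app version so
// it can be segmented in the portal (e.g. comparing v2.1 vs v2.0 in an
// A/B test). "send" stands in for the SDK's outgoing pipeline.
const initializers = [];
function addTelemetryInitializer(fn) { initializers.push(fn); }

function send(item) {
  // Run every registered initializer before the item goes out.
  for (const init of initializers) init(item);
  return item;
}

// Tag each item with the version under test (hypothetical value).
addTelemetryInitializer((item) => {
  item.properties = item.properties || {};
  item.properties["Version"] = "2.1";
});

const sent = send({ name: "PageView", properties: { page: "/checkout" } });
console.log(sent.properties); // { page: '/checkout', Version: '2.1' }
```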

Experiment with adding a web test as well – which is really where the rubber hits the road. Now you can have different geos – Asia, N America, etc – reporting on the true end user experience. No more guesswork about how your new app is doing in Japan, for example!

Note here that you may have to remove spending limits (as I had to) to use web tests.

On logging – this is a simple mod to a web.config. See this page for more – https://docs.microsoft.com/en-us/azure/application-insights/app-insights-asp-net-trace-logs. If you use NLog, log4net or System.Diagnostics.Trace for diagnostic tracing in your ASP.NET application, you can send your logs to Application Insights – which can then be merged with the other telemetry coming from your application, so that you can identify the traces associated with servicing each user request, and correlate them with other events and exception reports.
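For the System.Diagnostics.Trace case, the web.config change is on this order – assuming the Microsoft.ApplicationInsights.TraceListener NuGet package is installed (it normally wires this up for you; listener name here is arbitrary):

```xml
<configuration>
  <system.diagnostics>
    <trace autoflush="true">
      <listeners>
        <!-- Routes System.Diagnostics.Trace output into Application Insights -->
        <add name="appInsightsListener"
             type="Microsoft.ApplicationInsights.TraceListener.ApplicationInsightsTraceListener, Microsoft.AI.TraceListener" />
      </listeners>
    </trace>
  </system.diagnostics>
</configuration>
```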

I was very impressed with how quickly Application Insights could allow me to drill down to a particular event for example. Check this screenshot out below:


Click on Metrics Explorer. Experiment with drilling through to stack trace of event logs, adding tags.

Notice how sweet and easy it is to add alerts. Now I can fine-tune my alerts so they’re actually sending me valuable information – instead of it being “your dog has died” kind of dead data.

Finish editing the charts so that they are showing metrics on Usage like

  • Usage – Browser Page Load
  • Process IO Rate
  • Users


And I do love the simple, easy instrumentation that comes with web tests. Notice – this will cost you over time, so be sure to turn it off (disable it) if not in active use!


A little more detail on my web test settings, see below:

More For The Future

Starting and Scaling DevOps in the Enterprise – review

As many of you know, I’m a huge fan of the work Gary Gruver has done – in particular his book “Leading the Transformation” on his experiences at HP trying to transform a very traditional enterprise. (See my earlier mention of his book on this blog, here.) His newest work is out – Starting and Scaling DevOps in the Enterprise. I am recommending it very highly to all my customers that are following DevOps! I think it’s unique – by far the best I’ve read when it comes to putting together specific metrics and the questions you’ll need to ask in setting your priorities.

Gary notes that there are three types of work in an enterprise:

  1. New work – Creating new features or integrating/building new applications
    1. new work can’t be optimized (too much in flux)
    2. Best you can hope for here is to improve the feedback loop so you’re not wasting time polishing features that are not needed (50%+ in most orgs!)
  2. Triage – finding the source of defects and resolving
    1. Here DevOps can help by improving level of automation. Smaller batch sizes means fewer changes to sort through when bugs crop up.
  3. Repetitive – provisioning environments, building, testing, configuring the database or firewall, etc.
    1. More frequent runs, smaller batches, feedback loop improved. All the DevOps magic really happens in #2 and #3 above as these are the most repetitive tasks.

Notice that for the three types above, the issues could be in one of five places:

  1. Development
    1. Common pain point here is Waterfall planning (i.e. a bloated, aging requirements inventory)
  2. Building Test Environments
    1. Procurement hassles across server, storage, networking, firewall. Lengthy handoffs between each of these teams and differing priorities.
    2. Horror story – 250 days for one company to attempt to host a “Hello World” app. It took them just 2 hours on AWS!
  3. Testing and Fixing Defects – typically QA led
    1. Issues here with repeatability of results (i.e. false positives caused by the test harness, environment, or deployment process)
    2. Often the greatest pain point, due to reliance on manual tests causing lengthy multi-week test cycles, and the time it takes to fix the defects discovered.
  4. Production Deployment – large, cross org effort led by Ops
  5. Monitoring and Operations

The points above are why you can’t just copy the rituals from one org to another. For any given company, your pain points could be different.


So, how do we identify the exact issue with YOUR specific company?

  1. Development (i.e. Requirements)
    1. Metrics:
      1. What % of time is spent in planning and documenting requirements?
      2. How many man-hours of development work are currently in the inventory for all applications?
      3. What % of delivered features are being used by customers and fit the expected results?
    2. An important note here – organizations often commit 100% of dev resources to planned work each sprint. This is a terrible practice: it means the development teams are too busy meeting preset commitments to respond to changes in the marketplace or discoveries made during development. The need here is education – teach the business to be reasonable in what it expects, and to shape requirements down to the actual minimum functionality needed to support its business decisions. (Avoid requirements bloat from overzealous business analysts/PMs, for example!)

  2. Provisioning environments
    1. Metrics:
      1. How much time does it take to provision environments (on avg)?
      2. How many environments are requested per month/sprint?
      3. % of time these environments require manual fixing before they are complete
      4. % of defects associated with non-code – i.e. environments, deployments, data layer, etc.
    2. The solution here for provisioning pinch points is infrastructure as code. Here there is no shortcut other than developers and IT/operations working together to build a working set of scripts to recreate environments and maintaining them jointly. This helps with triage as changes to environments now show up clearly in source control, and prevents DEV-QA-STG-PROD anomalies as it limits variances between environments.
    3. It’s critical here for Dev and Ops to use the same tool to identify and fix issues. Otherwise you get a strong us-vs-them backlash and friction.
    4. This requires the organization to have a strong investment in tooling and think about their approach – esp with simulators/emulators for companies doing embedded development.

  3. Testing
    1. Metrics
      1. What is the time it takes to run a full set of tests?
      2. How repeatable are these? (i.e. what’s the % of false errors)
      3. What % of defects are found with testing (either manual, automated, or unit testing)?
      4. What is the time it takes to approve a release?
      5. What’s the frequency of releases?
    2. In many organizations this is the most frequent bottleneck – the absurd amount of time it takes to complete a round of tests with a reasonable expectation the release will work as designed. These tests must run in hours, not days.
    3. You must choose a well-designed automation framework.
    4. Development is going to have to change their practices so the code they write is testable. And they’ll need to commit to making build stability a top priority – bugs are equal in priority to (if not higher than) tasks/new features.
    5. This is the logical place to start for most organizations. Don’t just write a bunch of automated tests – instead, write just a few automated Build Acceptance Tests (BATs) that will provide a base level of stability. Watch these carefully.
      1. If the tests reveal mostly issues with the testing harness, tweak the framework.
      2. If the tests are finding mostly infrastructure anomalies, you’ll need to create a set of post-deployment tests to check on the environments BEFORE you run your gated coding acceptance test. (i.e. fix the issues you have with provisioning, above).
      3. If you’re finding coding issues or anomalies – congrats, you’re in the sweet spot now!
    6. Horror story here – one company boasted of thousands of automated tests. However, these were found to be neither stable nor maintainable, and had to be junked.
    7. Improve and augment these BATs over time so your trunk quality gradually moves closer to release-ready, near-production quality.
      1. Issue – what about that “hot” project needed by the business (which generally arrives with a very low level of quality due to high pressure)?
        1. Here the code absolutely should be folded into the release, but not exposed to the customer until it fits the new definition of done: “All the stories are signed off, automated testing in place and passing, and no known open defects.”
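The triage logic above can be sketched as a simple tally: classify each BAT failure by its source, and let the dominant bucket tell you what to fix next. The categories, actions, and sample records here are illustrative, not from the book:

```javascript
// Illustrative sketch: decide where to invest next based on what your
// Build Acceptance Test failures are actually telling you.
const nextAction = {
  harness: "tweak the test framework",
  infrastructure: "add post-deployment environment checks before the gate",
  code: "sweet spot - the gate is catching real coding issues",
};

function triage(failures) {
  // Count failures per source, then pick the most frequent one.
  const counts = {};
  for (const f of failures) counts[f.source] = (counts[f.source] || 0) + 1;
  const top = Object.keys(counts).sort((a, b) => counts[b] - counts[a])[0];
  return nextAction[top];
}

// Hypothetical sample run: mostly environment anomalies.
const sample = [
  { test: "login", source: "infrastructure" },
  { test: "checkout", source: "infrastructure" },
  { test: "search", source: "code" },
];
console.log(triage(sample)); // "add post-deployment environment checks before the gate"
```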

  4. Release to Production
    1. If a test cycle takes 6 weeks to run, and management approval takes one day – improving this part just isn’t worth it. But if you’re trying to do multiple test cycles a week and this is the bottleneck, absolutely address this with managers that are lagging in their approval or otherwise not trusting the gated testing you’re doing.
    2. Metrics
      1. Time and effort to release to production
      2. Number of issues found categorized by source (code, environment, deployment process, data, etc)
      3. Number of issues total found in production
      4. MTTR – mean time to restore service
      5. # of green builds a day
      6. Time to recover from a red build
      7. % of features requiring rework before acceptance
      8. Amt of effort to integrate code from the developers into a buildable release
    3. For #1-4 – Two areas that can help here are feature toggling (which you’ll be using anyway), and canary releases where key pieces of new functionality are turned on for a subset of users to “test in production.”
    4. For #5-6 – here Continuous Integration is the healer. This is where you avoid branching by versioning your services (and even the database – see the book Refactoring Databases by Scott Ambler and Pramod Sadalage)
    5. For #7-8 – If you’re facing a lot of static here, a scrum/agile coach will likely help significantly.
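To make a couple of the metrics above concrete, here’s a sketch computing MTTR and green builds per day from raw records. The incident and build data are made up for the example:

```javascript
// Illustrative sketch: compute two release metrics from raw records.
const incidents = [
  { down: Date.parse("2017-03-01T10:00:00Z"), restored: Date.parse("2017-03-01T11:30:00Z") },
  { down: Date.parse("2017-03-05T09:00:00Z"), restored: Date.parse("2017-03-05T09:30:00Z") },
];

// MTTR: average minutes from outage to restored service.
const mttrMinutes =
  incidents.reduce((sum, i) => sum + (i.restored - i.down), 0) /
  incidents.length / 60000;

const builds = [
  { day: "2017-03-01", green: true },
  { day: "2017-03-01", green: true },
  { day: "2017-03-01", green: false },
  { day: "2017-03-02", green: true },
];

// Green builds per day: count green builds, divide by distinct days.
const days = new Set(builds.map((b) => b.day)).size;
const greenPerDay = builds.filter((b) => b.green).length / days;

console.log(mttrMinutes); // 60
console.log(greenPerDay); // 1.5
```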


So – how to win, once you’ve identified the pain points? You begin by partitioning the issue:

  • Break off the pieces that are tightly coupled (i.e. developed/tested/deployed as a unit) from those that aren’t (HR or Purchasing processes, for example).
  • Segment these into business critical and non-business critical.
  • Split these into tightly coupled monoliths with common code sharing requirements vs microservices (small, independent teams a la Amazon). The reality is – in most enterprises there are very valid reasons why these applications were built the way they are. You can’t ignore this complexity, much as we’d like to say “microservices everywhere!”

I really admire Gary’s very pragmatic approach: it doesn’t try to accomplish large, difficult things all at once, but focuses on winnable wars at a company’s true pain points. Instead of trying to force large, tightly coupled organizations to work like loosely coupled orgs – you need to understand the complex systems and determine together how to release code more frequently without sacrificing quality. Convince these teams of DevOps principles.