“Hands On Kubernetes” – first thoughts

So these are my first thoughts on the Hands on Kubernetes book by Nills Franssens et al…

It’s a solid book! I especially enjoyed the first chapter, which sets the stage for the demos / hands-on work in the following chapters. Obviously there have been some additions to what Azure Kubernetes Service (AKS) has to offer since the book came out, which I’ll try to explain.

In chapter 2, the book asks you to create a new AKS cluster using the Azure Portal. This could be done with the CLI, ARM, or Terraform of course… The new thing here is the ability to create a cluster with Azure Arc. That allows you to use things like Azure Policy and Azure Monitor to control / observe your containers when you’re running them on-prem (say using VMware vSphere or Azure Stack HCI), or on Google Cloud or AWS. For now though we’ll just create a straight up cluster:

… which once you’re done with all the options should take about 5 minutes or so in the West US 3 region. A few notes here – you’re NOT going to want to set up Availability Zones (of course you would do this for a prod workload), and you want a STANDARD setup of 2 nodes (not Dev/Test, which would normally be the best pick) – because we’re going to want to experiment with Azure Monitor to check our observability. The AKS pricing tier you want is “Free”, and the node count range is a new option – select “2-5”.

A quick snapshot of the bare bones setup we’re using here:

When that AKS cluster is finished being spun up, you’ll see something like the following:

What happened here exactly though? Select “Go to resource” – and you’ll be able to inspect what was created. For example, the Resources section shows any running deployments / pods, and it’s where you create new resources. In the Node pools, you can scale up/down by adding nodes – and add a new node pool, potentially even with a different (beefed up) VM size. In the Cluster config page, you can upgrade the control plane – and then the individual node pools in a followup step. This is also where you’d enable RBAC or integrate with Azure AD.

Insights though is where we can actually view the cluster’s utilization and how it performs under load:

From here it’s a walk in the park. You COULD download from GitHub the source materials for the book. I found selecting the quick start application gave me a very nice starting point… a simple Voting app that’s easy to create / destroy:

The next option to play with is the last one – “Connect to cluster”. From here you’re given all the information you need to connect via Bash or Azure Cloud Shell to your newly created resources. This is where you can run some of the commands noted in the book as we start to play around with the command line. Pull the cluster credentials first, then list the nodes:

az aks get-credentials --resource-group rg-handsonaks --name handsonaks

kubectl get nodes

… And that’s it for now. You can easily go into the control panel again and remove the entire resource group to bring yourself back to a clean start state.

Closing Thoughts and Next Up

I feel like this is a good point to talk about the WHY of things… Software development seems to make a leap forward about every decade. In the early 2000s, the big change took the form of a pattern and a practice – the pattern being Scrum and Agile, the technology / process being source control. Skip forward another 10 years, to about 2014, and the leap forward was DevOps. Again the change became a pattern (Infra as Code) and a practice (CI/CD and config mgmt)… The next change still seems to be taking shape, but the 2020s definitely look like the era when the microservices pattern achieves dominance, with Kubernetes and Docker as the tech behind it. The stuff we used to hear about “microservices only being for the large enterprises” or “Kubernetes isn’t meant for production workloads” just doesn’t hold water – it’s FUD.

Other things I want to play with down the road:

Here’s a good article on the strangler fig pattern I mention in my book (one of several on Martin Fowler’s site) – this is the “why” of what we’re doing with microservices / Kubernetes. The author essentially makes two points: successful efforts don’t try to boil the ocean (we migrate as small a piece of functionality as we can get away with), and they end with the old code path being retired (vs. left in place). https://martinfowler.com/articles/break-monolith-into-microservices.html#TheJourneyGuide
Creating an AKS cluster using Terraform – How to create AKS Cluster using Terraform | Setup Azure Kubernetes Cluster(AKS) in Azure Cloud, and a rather lackluster Azure Friday presentation – Provisioning Kubernetes clusters on AKS using HashiCorp Terraform | Azure Friday
I do love this walkthru on using Helm. Helm manages Kubernetes charts, packages of preconfigured Kubernetes resources. It’s lifecycle management for AKS apps.
If you’ve got a lot of Azure subscriptions to manage, or AKS clusters in different regions that need governance, the new(er) Kubernetes Fleet Manager resource could be interesting. There’s a quickstart, and some documentation, here.
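
To make the strangler fig idea from the first item a bit more concrete, here’s a minimal sketch: a thin routing facade sends a small (and growing) set of paths to a new service, and everything else still goes to the monolith. The class, the handlers, and the paths are all made up for illustration; none of this comes from Fowler’s article or from the book.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def legacy_monolith(path: str) -> str:
    # Stand-in for the existing monolith (hypothetical).
    return f"monolith handled {path}"


def new_voting_service(path: str) -> str:
    # Stand-in for a newly extracted microservice (hypothetical).
    return f"voting-service handled {path}"


class StranglerFacade:
    """Routes a request to a migrated service if one exists, else to the legacy app."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, prefix: str, handler: Handler) -> None:
        # Move one small slice of functionality at a time.
        self._migrated[prefix] = handler

    def handle(self, path: str) -> str:
        # Longest matching prefix wins; anything unmatched stays on the monolith.
        for prefix in sorted(self._migrated, key=len, reverse=True):
            if path.startswith(prefix):
                return self._migrated[prefix](path)
        return self._legacy(path)


facade = StranglerFacade(legacy_monolith)
facade.migrate("/votes", new_voting_service)
print(facade.handle("/votes/42"))   # served by the new service
print(facade.handle("/reports/1"))  # still served by the monolith
```

Once every prefix has been migrated, the legacy handler receives no traffic and can be retired, which is the second of the two points above.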

Monitoring What Matters – John-Daniel Trask of Raygun

I had a great talk recently with one of my favorite peeps – John-Daniel Trask, CEO of Raygun. We talk about the importance of monitoring and making that connection with the customer experience, and what he’s seen go right – and wrong – in working with companies large and small. We’re huge fans of Raygun and see this company’s growth as a natural byproduct of producing the right product that reinforces all the behaviors we want out of the DevOps movement. Enjoy!

I always enjoy talking with John-Daniel and he was a big factor in monitoring and metrics taking up so much room in my book. Here’s some of the topics we cover:

How to aggregate errors so you don’t feel like you’re putting out a tire fire with a water pistol
Best practices around real user monitoring, crash monitoring and APM
First things first; why crash reporting should be the first thing you port out
How John-Daniel is adjusting to life as a new father (welcome Henry!) and what it was like growing a global business as a young entrepreneur
How Google changed the game around how responsive and user-centric websites and services are
A fact we often forget: software is written ultimately for humans. “Software gives us the power to amplify human ability.”
Another great all-time quote, from his mentor: “It’s not the big that eats the small, it’s the fast that eats the slow.”

One last great quote to end on: “DevOps is about making engineering teams as reliably fast as possible.”
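
For the first topic in that list, here’s a rough sketch of what aggregating errors can look like: collapse raw errors into groups by a fingerprint, so a thousand occurrences of one bug show up as one line item instead of a thousand alerts. This is NOT how Raygun does it; the fingerprint rule (exception type plus the message with digits masked) and the sample data are assumptions for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def fingerprint(exc_type: str, message: str) -> str:
    # Mask digit runs so "Order 1234 not found" and "Order 99 not found" group together.
    normalized = re.sub(r"\d+", "<n>", message.strip().lower())
    return f"{exc_type}:{normalized}"


def aggregate(errors: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    # Count occurrences per fingerprint, most frequent first.
    counts = Counter(fingerprint(t, m) for t, m in errors)
    return counts.most_common()


# Invented sample data.
raw_errors = [
    ("KeyError", "Order 1234 not found"),
    ("KeyError", "Order 99 not found"),
    ("TimeoutError", "Upstream call took 5000 ms"),
    ("KeyError", "order 7 not found"),
]
groups = aggregate(raw_errors)
for key, count in groups:
    print(count, key)
```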

A link to the interview is here, and it’s on the podcast platform of your choice. We’re on all the major platforms now, including Anchor, Apple, Google, Spotify, PocketCasts, and RadioPublic. Please support the podcast, and we’d love to hear your feedback about the book!

Enjoy the podcast!

LaunchDarkly and feature flags

Had a friend ask me for some videos around Feature Flags. There’s no shame in admitting that I’m a huge fan of feature flags; it seems like one of those no-brainers when it comes to making releases faster and safer. Without them, I’m not sure how close we can possibly get to true “continuous delivery” even for smaller sized projects.
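
If you haven’t worked with flags before, here’s a minimal sketch of the idea: the new code path ships “dark” and gets switched on at runtime for a percentage of users. To be clear, this is not the LaunchDarkly SDK; the Flag class, the bucketing rule, and the flag name are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Flag:
    name: str
    enabled: bool = False      # master kill switch
    rollout_percent: int = 0   # 0-100, share of users who see the feature


def bucket(flag_name: str, user_id: str) -> int:
    # Stable hash so a given user always lands in the same bucket (0-99) for a flag.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: Flag, user_id: str) -> bool:
    if not flag.enabled:
        return False
    return bucket(flag.name, user_id) < flag.rollout_percent


new_checkout = Flag(name="new-checkout", enabled=True, rollout_percent=10)


def checkout(user_id: str) -> str:
    # Both code paths are deployed; the flag decides which one runs.
    if is_enabled(new_checkout, user_id):
        return "new checkout flow"
    return "old checkout flow"
```

Setting `enabled=False` turns the feature off for everyone without a redeploy, which is the “safer” half of faster and safer.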

As I think some of you might be interested as well – here’s some videos and web references below. I hope to expand on this with some more in-depth demos down the road. This is going to come across like I’m shilling for LaunchDarkly. (In all fairness, I’m not the only person at MSFT that loves them.) But when it comes to FF I’m not sure if there’s another vendor in that space that offers what they do.

How the VSTS team uses LaunchDarkly for their performance tests (12/15/2017). (And why is this a good idea? Check out this postmortem by Brian Harry from 2013)
The decisionmaking behind the Azure DevOps team choosing LaunchDarkly and some recommendations for vetting vendors/making data driven decisions around FF (and a related article on using release rings – Users, Early Adopters, Canaries.)
Flags versus branching. FF definitely help make trunk-based development much more practicable for developers.
A good overview of the positives around FF and how they’ve helped the VSTS team gate releases.
Buck Hodges on using FF to encourage experiments. A GREAT whitepaper. The 17 minute video is good as well. And another video from Ben Waskow is here.
A good lab for engineers to walk thru, “Hello World” style.

And some more references from my book:

[harris] – “Using feature flags in your app release management strategy”, Richard Harris. App Developer Magazine, 4/19/2018.

Posted in DevOps on October 23, 2019 by elvisboats.

Four questions around testing and Microsoft’s progress with DevOps

I was at a conference earlier this week and we got some outstanding questions about how Microsoft went about their transformation – especially with the Azure DevOps team. I want to build on this with a followup post going into more depth on our use of culture and automation – but here’s a good place to start with some great links.

Question #1 – How do we handle planning on a strategic level with the more tactical focus of Agile?

Using features and epics – I love this – https://docs.microsoft.com/en-us/azure/devops/boards/backlogs/define-features-epics?view=vsts&tabs=new-nav

How we handle epic level planning at Microsoft – Cloud9 Donovan Brown interview with Aaron Bjork, 34 minute video. Excellent.

Note there’s a relatively new feature in Azure DevOps to report across teams called Delivery Plans. More information on this and how to set up reporting is here. A detailed walkthru on implementation is here.

Question #2 – How did Microsoft go about their transformation to DevOps from a shared services model?

See my interview with Aaron Bjork, and the outstanding YouTube video he did in the footer. One of the best 45 min videos on the topic I’ve ever seen – anywhere. https://driftboatdave.com/2018/05/30/devops-stories-aaron-bjork-microsoft/

A great article on “What is Agile” from Aaron Bjork. https://docs.microsoft.com/en-us/azure/devops/learn/agile/what-is-agile

Our best DevOps “how and why” articles are by Sam Guckenheimer – great guy – see this central hub for more. https://docs.microsoft.com/en-us/azure/devops/learn/what-is-devops

Question #3 – What about testing? (This is usually one of our biggest blockers to improve release reliability and velocity – an unreliable, flaky test layer)

Great interview page with Munil – https://docs.microsoft.com/en-us/azure/devops/learn/devops-at-microsoft/shift-left-make-testing-fast-reliable

Also on Eliminating Flaky Tests (we want red to mean red) and our unabashed use of testing in production. This caused great angst for our developers but ended up being the single biggest contributor to our success.
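
One simple way to think about “red means red”, as a sketch only: a test that both passes and fails against the same commit is flaky, while a test that fails every time is a real failure. This is not the Azure DevOps team’s actual tooling; the function name and the run history below are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (commit, test name, passed?) for each recorded run.
Run = Tuple[str, str, bool]


def find_flaky(runs: Iterable[Run]) -> List[str]:
    # A test is flagged flaky if it both passed and failed on the same commit.
    outcomes: Dict[Tuple[str, str], Set[bool]] = defaultdict(set)
    for commit, test, passed in runs:
        outcomes[(commit, test)].add(passed)
    flaky = {test for (_, test), seen in outcomes.items() if len(seen) == 2}
    return sorted(flaky)


# Invented sample history.
history = [
    ("abc123", "test_login", True),
    ("abc123", "test_login", False),     # same code, different result
    ("abc123", "test_checkout", False),
    ("abc123", "test_checkout", False),  # consistently red: a real failure
    ("def456", "test_search", True),
]
flaky_tests = find_flaky(history)
print(flaky_tests)
```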

Our branching and release strategy. We did experiment with GitHub Flow but found it didn’t work in our specific case. We do work off of trunk – long-lived feature branches (>1 day) are verboten.

We like rotating “F” and “L” teams so some of your people are handling direct (livesite) support while others are focused on development. Note that this doesn’t mean 100% of all support calls hit devs directly – it might just be 5% – but devs must share some operational support for DevOps to work. And you want to tune your alerts so if someone is getting woken up at 5 am, there’s a damn good reason – the Google SRE book and “Practical Monitoring” go into more detail. See Aaron Bjork’s overall presentation I mentioned before, or this page for some great videos on our move to a livesite culture.

We also shifted left on security. Here’s a good walkthru.

How we moved from a monolith to cloud-based microservices, by Buck Hodges. He says point-blank that moving to the cloud the way MSFT did – lifting and shifting – was a big risk, like jumping off a cliff. Looking back, it ended up being the best way forward.

Using feature flags and release rings to control the blast radius.
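
A rough sketch of how rings limit the blast radius: a release starts with the smallest audience and only widens when that ring looks healthy. The ring names follow the Canaries / Early Adopters / Users split mentioned in the feature flags post; the functions and the health check are hypothetical and not the actual Azure DevOps implementation.

```python
from enum import IntEnum


class Ring(IntEnum):
    CANARIES = 0
    EARLY_ADOPTERS = 1
    USERS = 2


def sees_release(user_ring: Ring, released_through: Ring) -> bool:
    # A user gets the new version once the rollout has reached their ring.
    return user_ring <= released_through


def promote(current: Ring, healthy: bool) -> Ring:
    # Widen one ring at a time, and only when the current ring looks healthy.
    if not healthy or current == Ring.USERS:
        return current
    return Ring(current + 1)


stage = Ring.CANARIES
print(sees_release(Ring.USERS, stage))  # False: regular users are not exposed yet
stage = promote(stage, healthy=True)
print(stage.name)                       # EARLY_ADOPTERS
```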

Question #4 – Production Support. Let’s say we have an Agile team, 8-12 people. How the heck are we supposed to do global support across multiple regions, 24x7x365 in production?

Short answer – the only way this will work is if you 1) make sure you’re only supporting a small sliver of functionality, 2) gate the support demands on your devs so it’s <50% of their time (i.e. the SRE model), and 3) tune your alerts so that only truly important things make it through. More than likely you’re going to have some operational support, even offshore or 3rd party, handled externally to the team. I talk about this extensively in my book; the books “The Art of Monitoring” and “Practical Monitoring” also elaborate on this.
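
Here’s a minimal sketch of what tuning alerts can mean in practice: only page a human when something is critical, customer-facing, and actionable; everything else becomes a ticket or a log entry. The severity levels and the routing rules are assumptions for illustration, not something taken from the SRE book or “Practical Monitoring”.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    severity: str          # "critical", "warning", or "info"
    customer_facing: bool  # does it affect users right now?
    actionable: bool       # is there something a human can do about it?


def route(alert: Alert) -> str:
    # Only wake someone up for critical, customer-facing, actionable problems.
    if alert.severity == "critical" and alert.customer_facing and alert.actionable:
        return "page"
    # Real but non-urgent issues wait for business hours.
    if alert.severity in ("critical", "warning") and alert.actionable:
        return "ticket"
    # Everything else is only recorded.
    return "log"


print(route(Alert("checkout 5xx spike", "critical", True, True)))  # page
print(route(Alert("disk 70% full", "warning", False, True)))       # ticket
print(route(Alert("nightly job slow", "info", False, False)))      # log
```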

Posted in DevOps on October 20, 2019 by elvisboats.

Betsy Beyer and Stephen Thorne

Just published a great podcast interview with Betsy Beyer and Stephen Thorne of Google, coauthors of the incredible “Site Reliability Engineering Workbook”. We cover a lot of ground in this interview, including how Google learns from failures, what toil is and how good organizations try to fight it, and when LESS reliability can be a good thing in software development.

Betsy and Stephen’s work around publicizing SRE and breaking down the myths surrounding it has been a huge boon to our industry. Their writing (along with that of a few other Googlers) had a big impact on the Achieving DevOps book; we find their work and thinking incredibly helpful and influential. You’ll love it too!

A link to the interview is here, and it’s on the podcast platform of your choice. We’re on all the major platforms now, including Anchor, Apple, Google, Spotify, PocketCasts, and RadioPublic. Please support the podcast, and we’d love to hear your feedback about the book!

Enjoy the podcast!

Some link goodness:

Link to SRE landing page: https://landing.google.com/sre/books/

SRE workbook online for free: https://landing.google.com/sre/workbook/toc/

On LinkedIn: https://www.linkedin.com/in/betsy-beyer/ | Stephen Thorne

Niall Murphy on Twitter: https://twitter.com/niallm?lang=en

My original interview with them is here

The Google landing site for SRE – https://landing.google.com/sre/book.html – free PDF versions of the revised [sre] text and the followup handbook. This should be must-read material for anyone out there, from IT practitioner to CXO exec.

“Managing Misfortune for Best Results” by Kieran Barry at the SREcon EMEA, https://www.usenix.org/node/218852. This is a great overview of the Wheel of Misfortune exercises in simulating outages for training, and some antipatterns to avoid.

The Google landing page Betsy mentioned with lots of rich content – https://landing.google.com/sre/

Advance copies of Betsy’s next book can be reserved on Amazon. “Building Secure and Reliable Systems: SRE and Security Best Practices”

Posted in DevOps on October 10, 2019 by elvisboats.


/* driftboatdave */

adventures in cloud architecture, DevOps, and configuration management
