In writing the book “Achieving DevOps”, we threw away easily as many words as we ended up keeping. I wish space had allowed us to talk in more depth about waste, Mission Command, and some other principles that we could only skim over at best.
We talk about this in the book as well – but we owe a great debt to the bright people out there and the lasting work they’ve done. Not all of these sources were directly referenced in the book, but all of them influenced us. We didn’t have room for them there, but we figure this list might be a nice starting point.
In doing our research – which was something we were only able to pull away from with regret and a few sledgehammer whacks by our publisher – some books stood out as being especially amazing. For these, I’ve made the book cover below an active hyperlink – you can go right to Amazon and buy the book from there. (We don’t get paid in any way for this. It’s just to help give back a little.)
But really, the best books I’ve already talked about in my post on “Where To Start?”
OK, on to the hotlinks:
Chapter 2 – Ratcheting Change
- [robha] – “A Counterintuitive Strategy for Building a Daily Exercise Habit”, Rob Hardy. Medium.com, 7/21/2017. https://betterhumans.coach.me/a-counterintuitive-strategy-for-building-a-lifelong-exercise-habit-13471da4e49d. A great article that first got us thinking about bright lines and activation energy.
- [bjfth] – “Tiny Habits”, BJ Fogg, Stanford University, 1/1/2018. https://www.tinyhabits.com/
- [bjthgs] – “Find a good spot in your life”, BJ Fogg. Stanford University, 1/1/2018. https://www.tinyhabits.com/good-spot
- [jclat] – “Atomic Habits: An Easy & Proven Way to Build Good Habits & Break Bad Ones”, James Clear. Avery, 10/16/2018. ISBN-10: 0735211299, ISBN-13: 978-0735211292
- [duhigg] – “The Power of Habit: Why We Do What We Do in Life and Business”, Charles Duhigg. Random House, 1/1/2014. ISBN-10: 081298160X, ISBN-13: 978-0812981605
- [baume] – “Willpower: Rediscovering the Greatest Human Strength”, Roy Baumeister and John Tierney. Penguin Books, 8/28/2012. ISBN-10: 0143122231, ISBN-13: 978-0143122234
- [jclub] – “Do Things You Can Sustain”, James Clear. https://jamesclear.com/upper-bound
Chapter 2 – Kanban
- [hanselman] – “Maslow’s Hierarchy of Needs of Software Development”, Scott Hanselman. Hanselman.com, 1/8/2012. https://www.hanselman.com/blog/MaslowsHierarchyOfNeedsOfSoftwareDevelopment.aspx
- [ferriss] – “The 4-Hour Workweek: Escape 9-5, Live Anywhere, and Join the New Rich”, Timothy Ferriss, December 2019, ISBN-10: 0307465357, ISBN-13: 978-0307465351
- [drift2] – My original writeup on Timothy Ferriss’ book – https://driftboatdave.com/2014/09/02/being-busy-is-a-form-of-laziness/
- [tdoh] – “The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations”, Gene Kim, Patrick Debois, John Willis, Jez Humble. IT Revolution Press, 10/6/2016, ISBN-10: 1942788002, ISBN-13: 978-1942788003
- [forsgren] – “Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations”, Nicole Forsgren PhD, Jez Humble, Gene Kim. IT Revolution Press, 3/27/2018. ISBN-10: 1942788339, ISBN-13: 978-1942788331
Chapter 2 – Reliability First
- [treynor] – “Keys to SRE”, Ben Treynor. SRECon 2014, 5/30/2014. https://www.usenix.org/conference/srecon14/technical-sessions/presentation/keys-sre
- [srex] – “Resources”, unattributed author(s). Google. https://landing.google.com/sre/resources.html – The Google SRE resource page. Many times you can find some of the O’Reilly SRE books free as a download here.
- [lunney] – “Postmortem Action Items: Plan the Work, Work the Plan”, John Lunney, Sue Lueder, Betsy Beyer. ;login, Spring 2017, Vol 42 No 1. https://storage.googleapis.com/pub-tools-public-publication-data/pdf/3eeb4c1d9073ca5910e49f5252cb3cf648487ac2.pdf. This is an outstanding doc for anyone looking to learn from how Google handles postmortems. Note the great checklist on action items post event.
- [Hixson] – “The Systems Engineering Side of Site Reliability Engineering”, David Hixson, Betsy Beyer – ;login, June 2015, Vol 40, No 3. https://www.usenix.org/system/files/login/articles/login_june_08_hixson.pdf
- [toil] – “Invent more, toil less”, Betsy Beyer, Brendan Gleason, Dave O’Connor, Vivek Rau. ;login, Fall 2016, Vol 41, #3. https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45765.pdf
- [toil3] – “Repairing network hardware at scale with SRE principles”, James O’Keeffe. Google, 8/1/2018. https://cloudplatform.googleblog.com/2018/08/repairing-network-hardware-at-scale-with-sre-principles.html. Another toil reduction case study dealing with repairing network hardware.
- [log60] – “Invent More, Toil Less”, Betsy Beyer, Brendan Gleason, Dave O’Connor, Vivek Rau. Google, 8/1/2016. https://www.usenix.org/system/files/login/articles/login_fall16_08_beyer.pdf. A very good expansion on the Toil sections in the original [sre] book.
- [sre] – “Site Reliability Engineering: How Google Runs Production Systems”, Niall Richard Murphy, Betsy Beyer, Chris Jones, Jennifer Petoff, O’Reilly Media; 4/16/2016, ISBN-10: 149192912X, ISBN-13: 978-1491929124
- [vargo3] – “SRE vs. DevOps: competing standards or close friends?”, Seth Vargo. Google Cloud Platform Blog, 5/8/2018. https://cloudplatform.googleblog.com/2018/05/SRE-vs-DevOps-competing-standards-or-close-friends.html
- [lfj] – “SLIs, SLOs, SLAs, oh my!”, Liz Fong-Jones, Seth Vargo. YouTube, 3/8/2018. https://youtu.be/tEylFyxbDLE A great explanation of the use of metrics at Google. The entire series is very entertaining and a must-watch for SRE fans.
- [okee] – “Repairing network hardware at scale with SRE principles”, James O’Keeffe. Google Cloud Platform Blog, 8/1/2018. https://cloudplatform.googleblog.com/2018/08/repairing-network-hardware-at-scale-with-sre-principles.html For those interested in more details on how Google goes about automating its hardware so that it is managed as a fleet – the classic “cattle vs pets” – this is one of the best discussions we’ve seen to date.
- [srew] – “The Site Reliability Workbook: Practical Ways to Implement SRE”, edited by Betsy Beyer, Niall Richard Murphy, David K. Rensin, Kent Kawahara, Stephen Thorne. O’Reilly Media, 8/4/2018. ISBN-10: 1492029505, ISBN-13: 978-1492029502. The earlier SRE book was outstanding; this is better, as it’s much more applicable outside of Google’s specific use case. Loved the contents; I just wish we’d been aware of this resource earlier in our research. The section on toil is particularly good, filled with practical tips for toil reduction based on real case studies.
Chapter 3 – Continuous Integration
- [naik] – “Enabling Trunk Based Development with Deployment Pipelines”, Vishal Naik. Thoughtworks, 10/17/2015. https://www.thoughtworks.com/insights/blog/enabling-trunk-based-development-deployment-pipelines
- [fowlfb] – “FeatureBranch”, Martin Fowler. MartinFowler.com, 9/3/2009. https://martinfowler.com/bliki/FeatureBranch.html
- [gitf] – “Understanding the GitHub flow”, unattributed author(s). GitHub, 11/30/2017. https://guides.github.com/introduction/flow/. The excellent GitHub Flow doc itself.
- [fowlft] – “FeatureToggle”, Martin Fowler. MartinFowler.com, 10/29/2010. https://martinfowler.com/bliki/FeatureToggle.html
- [hodft] – “Feature Toggles (aka Feature Flags)”, Pete Hodgson, MartinFowler.com, 10/9/2017. https://martinfowler.com/articles/feature-toggles.html . A more in depth discussion than [fowlft].
- [boodm] – “How Chromium Works”, Aaron Boodman. Medium, 9/22/2015. https://medium.com/@aboodman/in-march-2011-i-drafted-an-article-explaining-how-the-team-responsible-for-google-chrome-ships-c479ba623a1b . Google Chrome was built using frequent checkins to mainline. – “So, how are the wheels still on the bus? In short: no branches, runtime switches, tons of automated testing, relentless refactoring, and staying very close to HEAD of our dependencies.”
- [chen1] – “Stop cherry-picking, start merging, Part 1: The merge conflict”, Raymond Chen. Microsoft Developer blog, 3/12/2018. https://blogs.msdn.microsoft.com/oldnewthing/20180312-00/?p=98215
- [chen2] – “Stop cherry-picking, start merging, Part 2: The merge conflict that never happened (but should have)”, Raymond Chen. Microsoft Developer blog, 3/13/2018. https://blogs.msdn.microsoft.com/oldnewthing/20180313-00/?p=98225. Both articles are good references on the potential downsides of cherry-picking – so common in Git. As he points out, it could blow up, or worse it could not blow up, leading to issues silently building up and propagating under the surface. This is good to keep in mind but hardly a universal law – the Azure DevOps team uses cherry-picking heavily in rolling out urgent bugfixes.
- [ringms] – “Explore how to progressively expose your Azure DevOps extension releases in production to validate, before impacting all users”, Willy-Peter Schaub and others. Microsoft Docs, 4/25/2018. https://docs.microsoft.com/en-us/azure/devops/articles/phase-rollout-with-rings?view=azure-devops. A good overview on ring deployments at Microsoft and limiting the “blast radius”.
- [dora2015] – “Annual State of DevOps Report”, unattributed author(s). Puppet Labs, 2015. https://puppetlabs.com/2015-devops-report
- [forsgren] – “Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations”, Nicole Forsgren PhD, Jez Humble, Gene Kim. IT Revolution Press, 3/27/2018. ISBN-10: 1942788339, ISBN-13: 978-1942788331
- [dora2017] – “Annual State of DevOps Report”, unattributed author(s). Puppet Labs, 2017. https://puppetlabs.com/2017-devops-report
- [kief] – “Infrastructure as Code: Managing Servers in the Cloud”, Kief Morris. O’Reilly Media, 6/27/2016. ISBN-10: 1491924357, ISBN-13: 978-1491924358
- [thoms] – “Release Flow: How We Do Branching on the VSTS Team”, Edward Thomson. MSDN Blogs, 4/19/2018. https://blogs.msdn.microsoft.com/devops/2018/04/19/release-flow-how-we-do-branching-on-the-vsts-team/
- [buchw] – “A Git Workflow for Continuous Delivery”, William Buchwalter. Microsoft TechNet, 6/26/2016. https://blogs.technet.microsoft.com/devops/2016/06/21/a-git-workflow-for-continuous-delivery/
- [fowlbr] – “BranchByAbstraction”, Martin Fowler. MartinFowler.com, 1/7/2014. https://martinfowler.com/bliki/BranchByAbstraction.html
- [wpbstf] – “Explore how to manage branching strategies with a DevOps mindset in Team Foundation Version Control (TFVC)”, Willy-Peter Schaub and others. Microsoft Docs, 4/24/2018. https://docs.microsoft.com/en-us/azure/devops/articles/effective-tfvc-branching-strategies-for-devops?view=vsts Some very solid recommendations here: start with a simple strategy, use a consistent naming convention, and two by-now familiar mantras: encourage consistent peer reviews and gated checkins with automated testing.
- [newm] – “Building Microservices: Designing Fine-Grained Systems”, Sam Newman. O’Reilly Media; 2/20/2015. ISBN-10: 1491950358, ISBN-13: 978-1491950357
- [daws1] – “7 Signs You’re Mastering Continuous Integration”, Brian Dawson. DevOps.com, 7/18/2018. https://devops.com/7-signs-youre-mastering-continuous-integration/
Chapter 3 – Shift Left on Testing
- [dora2017] – “Annual State of DevOps Report”, unattributed author(s). Puppet Labs, 2017. https://puppetlabs.com/2017-devops-report
- [clean] – “Clean Code: A Handbook of Agile Software Craftsmanship”, Robert C Martin. Prentice Hall, 8/11/2008. ISBN-10: 0132350882, ISBN-13: 978-0132350884
- [feathers] – “Working Effectively with Legacy Code”, Michael Feathers. Prentice Hall, 10/2/2004. ISBN-13: 978-0131177055, ISBN-10: 0131177052. A true masterpiece. Most of us are not blessed with greenfield type projects; I can’t think of many people who wouldn’t benefit greatly from reading this book and understanding how to better tame that monolith looming in the background.
- [refactmf] – “Refactoring: Improving the Design of Existing Code”, Martin Fowler. Addison-Wesley Signature Series, 11/30/2018. ISBN-10: 0134757599, ISBN-13: 978-0134757599
- [crisp] – “Agile Testing: A Practical Guide for Testers and Agile Teams”, Lisa Crispin, Janet Gregory. Addison-Wesley Professional, 1/9/2009. ISBN-10: 0321534468, ISBN-13: 978-0321534460
- [crisp2] – “More Agile Testing: Learning Journeys for the Whole Team”, Lisa Crispin, Janet Gregory. Addison-Wesley Professional, 10/16/2014. ISBN-10: 0321967054, ISBN-13: 978-0321967053
- [14pt] – “Dr. Deming’s 14 Points for Management”, unattributed author(s). ASQ.org, https://deming.org/explore/fourteen-points
- [forsgren] – “Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations”, Nicole Forsgren PhD, Jez Humble, Gene Kim. IT Revolution Press, 3/27/2018. ISBN-10: 1942788339, ISBN-13: 978-1942788331
- [freem] – “Growing Object-Oriented Software, Guided by Tests”, Steve Freeman, Nat Pryce. Addison-Wesley Professional, 10/22/2009. ISBN-10: 0321503627, ISBN-13: 978-0321503626
- [mesz] – “xUnit Test Patterns: Refactoring Test Code”, Gerard Meszaros. Addison-Wesley, 5/31/2007. ISBN-10: 0131495054, ISBN-13: 978-0131495050. Particularly good in its discussion about dummy objects, fake objects, stubs, spies, and mocks.
- [dbnm] – “No more excuses”, Donovan Brown. Donovanbrown.com, 12/12/2016. http://donovanbrown.com/post/no-more-excuses. Our personal battle cry when it comes to “asking for permission” to write unit tests.
- [cohnx] – “The Forgotten Layer of the Test Automation Pyramid”, Mike Cohn. Mountain Goat Software, 12/17/2009. https://www.mountaingoatsoftware.com/blog/the-forgotten-layer-of-the-test-automation-pyramid
- [williams] – “The Costs and Benefits of Pair Programming”, Alistair Cockburn, Laurie Williams, 1/1/2001. https://collaboration.csc.ncsu.edu/laurie/Papers/XPSardinia.PDF
- [gucks] – “Moving 65,000 Microsofties to DevOps on the Public Cloud”, Sam Guckenheimer, 8/3/2017. https://www.visualstudio.com/learn/moving-65000-microsofties-devops-public-cloud/
- [shahxr] – “Shift Left to Make Testing Fast and Reliable”, Munil Shah. Microsoft Docs, 11/8/2017. https://www.visualstudio.com/learn/shift-left-make-testing-fast-reliable/. A must-read for any serious QA devotee.
- [shahyt] – “Combining Dev and Test in the Org”, Munil Shah. YouTube, 10/24/2017. https://www.youtube.com/watch?v=tj5mfW_gtRU. Microsoft’s decision to move to a single engineering organization where testing and development are unified was a game-changer.
- [fowlbu] – “UnitTest”, Martin Fowler. MartinFowler.com, 5/5/2014. https://martinfowler.com/bliki/UnitTest.html
- [fowltp] – “TestPyramid”, Martin Fowler, MartinFowler.com, 5/1/2012. https://martinfowler.com/bliki/TestPyramid.html
- [cohn] – “Testing Pyramids & Ice-Cream Cones”, Alister Scott. Watirmelon, unknown date. https://watirmelon.blog/testing-pyramids/
- [nonderminism] – “Eradicating Non-Determinism in Tests”, Martin Fowler, 4/14/2011. https://martinfowler.com/articles/nonDeterminism.html
- [ddt] – “Defect Driven Testing: Your Ticket Out the Door at Five O’Clock”, Jared Richardson. Dzone.com, 8/4/2010. https://dzone.com/articles/defect-driven-testing-your . Note his thoughts on combating bugs, which tend to come in clusters, with what he calls ‘testing jazz’ – thinking in riffs with dozens of tests checking an issue like invalid spaces in input.
- [stiny] – “You Are Your Software’s Immune System!”, Matt Stine. DZone.com, 7/20/2010. https://dzone.com/articles/you-are-your-softwares-immune
- [molteni] – “Giving Up on test-first development”, Luca Molteni. iansommerville, 3/17/2016. http://iansommerville.com/systems-software-and-technology/giving-up-on-test-first-development/ The author found TDD unsatisfying because it encouraged conservatism, focused on detail vs structure, and didn’t catch data mismatches – a list he later expanded with other weak points, including reliance on a layered architecture, agreed upon success criteria, and a controllable operating environment. We disagree with most of his objections but agree with the cautionary note that there is no single universal engineering method that works in each and every case.
- [martin] – “The Three Laws of TDD”, Robert Martin. ButUncleBob.com, unknown date. http://butunclebob.com/ArticleS.UncleBob.TheThreeRulesOfTdd
- [martin3] – “When TDD doesn’t work.”, Robert Martin. The Clean Code Blog, 4/30/2014. https://8thlight.com/blog/uncle-bob/2014/04/30/When-tdd-does-not-work.html
- [humbleobj1] – “Refactoring code that accesses external services”, Martin Fowler. MartinFowler.com, 2/17/2015. https://martinfowler.com/articles/refactoring-external-service.html A great implementation of Humble Object and refactoring based on Bounded Contexts in this article.
- [gruvle] – “Starting and Scaling DevOps in the Enterprise”, Gary Gruver. BookBaby, 12/1/2016. ISBN-10: 1483583589, ISBN-13: 978-1483583587
- [gruv] – “Leading the Transformation: Applying Agile and DevOps Principles at Scale”, Gary Gruver, Tommy Mouser. IT Revolution Press, 8/1/2015. ISBN-10: 1942788010, ISBN-13: 978-1942788010. An in depth exploration of how HP was able to pull itself out of the mud of long test cycles – even with a labyrinth of possible hardware combinations.
Chapter 3 – Definition of Done, Family Dinner Code Reviews
- [wieg] – “Humanizing Peer Reviews”, Karl Wiegers, Addison-Wesley, 11/2/2001, ISBN-13: 978-0201734850
- [agilep1] – “The Joy of Peer Reviews (Part 1 – Code)”, The Agile Pirate, 4/14/2011, http://theagilepirate.net/archives/117
- [agilep2] – “The Joy of Peer Reviews (Part 2 – Documentation)”, Simon Cromarty. The Agile Pirate, 5/24/2011. http://theagilepirate.net/archives/399 Both are excellent articles, including some nice simple checklists as a sample. “…remember the goal of a review is to share improvement opportunities, not for lazy coders to have someone else find their bugs for them or for staff to step on each other.”
- [scrumdod] – “Walking Through a Definition of Done”, Ian Mitchell. Scrum.org, 5/31/2017. https://www.scrum.org/resources/blog/walking-through-definition-done
- [joshi] – “Better Code Reviews”, Vaidehi Joshi. BetterCode.Reviews, unknown date. http://www.bettercode.reviews/ More antipatterns from an informal survey. The comments are very insightful…
- [kemp2] – “Giving better code reviews”, Joel Kemp. Medium, 1/24/2016. https://medium.com/@mrjoelkemp/giving-better-code-reviews-16109e0fdd36 – a plea for more than a brief glance by reviewers.
- [codac] – “Code Review Etiquette”, unattributed author(s). Codacy.com, 10/20/2016. https://blog.codacy.com/code-review-etiquette-da212a7454c – some basic etiquette.
- [cdhpr] – “Code Reviews: Just Do It”, Jeff Atwood. Coding Horror blog, 1/21/2006. https://blog.codinghorror.com/code-reviews-just-do-it. A true classic!
- [jaimc] – “10 facts about code reviews and quality”, unattributed author(s). Codacy.com, 12/15/2016. https://blog.codacy.com/10-facts-about-code-reviews-and-quality-c5adf2e869fe
- [schi1] – “Running an Effective Code Review”, Esther Schindler. CIO.com, 12/22/2008. https://www.cio.com/article/2431557/developer/running-an-effective-code-review.html – They noted that even if you are only “spot checking” some of the code being checked in there’s a measurable increase in quality.
- [schi2] – “How NOT to Run a Code Review”, Esther Schindler. CIO.com, 12/22/2008. https://www.cio.com/article/2431553/developer/how-not-to-run-a-code-review.html – I especially like the comments from Oliver Cole on the psychology behind criticism.
- [sm10] – “10 tips to guide you toward effective peer code review”, unattributed author(s). Smartbear.com, unknown date. https://smartbear.com/learn/code-review/best-practices-for-peer-code-review/
- [mcd10] – “10 Principles of a Good Code Review”, Jason McDonald. Dev.to, 12/6/2017. https://dev.to/codemouse92/10-principles-of-a-good-code-review-2eg – A very nice 15 point checklist.
- [atul] – “The Checklist”, Atul Gawande. New Yorker Magazine, 12/10/2007. https://www.newyorker.com/magazine/2007/12/10/the-checklist – Why do pilots use checklists as a standard prereq for any flight, while developers rarely or never use them?
- [ibmrl] – “11 proven practices for more effective, efficient peer code review”, Jason Cohen. IBM, 1/25/2011. https://www.ibm.com/developerworks/rational/library/11-proven-practices-for-peer-review/ – Outstanding article.
- [jarma] – “Giving and Receiving Great Code Reviews”, Sam Jarman. dev.to, 6/25/2017. https://dev.to/samjarman/giving-and-receiving-great-code-reviews – love the 6 specific questions the author looks for on pull requests.
- [gruvle] – “Starting and Scaling DevOps in the Enterprise”, Gary Gruver. BookBaby, 12/1/2016. ISBN-10: 1483583589, ISBN-13: 978-1483583587. There are definitely a lot of books out there with better quality graphics and a slicker presentation than this book. There are none with better content. Gary Gruver wrote one classic in “Leading the Transformation”, about his transformation efforts at HP. This book is more of a workbook, and after reading it you’ll be in a much better position to analyze the flow of value. I can’t say enough about this book, though I tried on my blog… I’ve read it about three times, every six months or so, and always learn something new. You need to have it in your library.
- [tylerh] – Interview with Tyler Hardison by Dave Harrison, see Appendix.
- [terrja] – “Doing Terrible Things To Your Code”, Jeff Atwood. Coding Horror blog, 7/30/2015. https://blog.codinghorror.com/doing-terrible-things-to-your-code/ – the assumptions we commonly make as programmers about ‘simple’ things like names, dates, geography, gender, addresses etc. are often wrong, and any good tester/mentor can and should expose them in the review process. Or, your users will…
- [nonnenberg] – “Top ten pull request review mistakes”, Scott Nonnenberg. ScottNonnenberg.com, 1/25/2017. https://blog.scottnonnenberg.com/top-ten-pull-request-review-mistakes/
- [rodgw] – “Why I Have Given Up on Coding Standards”, Richard Rodger. RichardRodger.com, 11/3/2012. http://www.richardrodger.com/2012/11/03/why-i-have-given-up-on-coding-standards/#.WxWABPZFxPY . The statement about power-mad architects definitely rings home with some of our past experiences.
- [kief] – “Infrastructure as Code: Managing Servers in the Cloud”, Kief Morris. O’Reilly Media, 6/27/2016. ISBN-10: 1491924357, ISBN-13: 978-1491924358
- The concept of a family dinner can be seen in several places – notably the Death and Company cookbook (“Death & Co: Modern Classic Cocktails”, Kaplan/Fauchald. Ten Speed Press, 1/1/2014. 978-1607745259) and in the Netflix series Chef’s Table (Christina Tosi, Season 4 Episode 1), and the osmosis learning process used by Bestia as described in https://www.huffingtonpost.com/2015/06/23/family-meal-restaurant_n_7566654.html. The way thoughts are asked at these dinners is very informative – for example, “I don’t think the Perry’s Tot and Sherry are playing well together. What about Old Tom?” “What if you split the rye whiskey with something lower proof?” “It’s nice but needs a bump.” This is a great practical example of kindergarten rules – informative, helpful, and not a personal attack.
Chapter 4 – Blameless Postmortems
- [dora2016] – “Annual State of DevOps Report”, unattributed author(s). Puppet Labs, 2016. https://puppetlabs.com/2016-devops-report
- [lenci] – “The Five Dysfunctions of a Team: A Leadership Fable”, Patrick Lencioni. Jossey-Bass, 4/11/2002. ISBN-13: 978-0787960759, ISBN-10: 0787960756. Great on Audible, and some really thought-provoking content. I often turn back to this; and as mentioned in the book, it pairs up evenly with the Westrum study made famous by DORA.
- [westrum] – “A typology of organisational cultures”, Ron Westrum. BMJ Quality & Safety, 2004;13:ii22-ii27, https://qualitysafety.bmj.com/content/13/suppl_2/ii22
- [doran] – “There’s a S.M.A.R.T. Way to Write Management’s Goals and Objectives”, Doran, G. T. Management Review, Vol. 70, Issue 11, 1/1/1981. https://community.mis.temple.edu/mis0855002fall2015/files/2015/10/S.M.A.R.T-Way-Management-Review.pdf
- [victorops] – “VictorOps Guide to Blameless Post-mortems”, unattributed author(s). VictorOps, 9/30/2014. https://www.slideshare.net/VictorOps/victor-ops-guide-to-blameless-post-mortems A great slideshare on how to set up and run a blameless postmortem.
- [docaf65] – “DevOps Cafe Episode 65 – John interviews Damon”, John Willis, Damon Edwards. DevOps Café, 12/15/2015. http://devopscafe.org/show/2015/12/15/devops-cafe-episode-65-john-interviews-damon.html
- [dekaiz] – “DOES15 – Damon Edwards – DevOps Kaizen Practical Steps to Start & Sustain a Transformation”, Damon Edwards. DevOps Enterprise Summit 2015, YouTube, 11/5/2015. https://www.youtube.com/watch?v=RT542sffJpM
- [sre] – “Site Reliability Engineering: How Google Runs Production Systems”, Niall Richard Murphy, Betsy Beyer, Chris Jones, Jennifer Petoff, O’Reilly Media; 4/16/2016, ISBN-10: 149192912X, ISBN-13: 978-1491929124. Appendix D has an excellent sample postmortem.
- [allspaw] – “Blameless PostMortems and a Just Culture”, John Allspaw. Code as Craft / Etsy, 5/22/2012. https://codeascraft.com/2012/05/22/blameless-postmortems/ John Allspaw’s seminal post on how “blameless postmortems” actually work at Etsy. Note how they openly discuss attribution bias and how they plan to counter it.
- [forsgren] – “Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations”, Nicole Forsgren PhD, Jez Humble, Gene Kim. IT Revolution Press, 3/27/2018. ISBN-10: 1942788339, ISBN-13: 978-1942788331
- [zwieb] – “Beyond Blame: Learning From Failure and Success”, Dave Zwieback. O’Reilly Media, 10/29/2015. ISBN-10: 1491906413, ISBN-13: 978-1491906415
- [dickerson] – “Etsy’s Winning Secret: Don’t Play The Blame Game!”, Owen Thomas. Business Insider, 5/15/2012. http://www.businessinsider.com/etsy-chad-dickerson-blameless-post-mortem-2012-5
- [dekker] – “Behind Human Error”, Sidney Dekker, David Woods. CRC Press, 9/30/2010. ISBN-13: 978-0754678342, ISBN-10: 0754678342. Etsy and other companies have mentioned this book and its discussion of First Stories vs Second Stories many times.
- [schauenberg] – “Practical Postmortems at Etsy”, Daniel Schauenberg. InfoQ, 8/22/2015. https://www.infoq.com/articles/postmortems-etsy
- [pullen] – “5 Whys – how we conduct blameless post-mortems after something goes wrong”, Noel Pullen, Hootsuite, http://code.hootsuite.com/blameless-post-mortems/ An excellent war story about how Hootsuite handles failure recovery and postmortems.
- [milste] – “How to Run a 5 Whys (With Humans, Not Robots)”, Dan Milstein. The Lean Startup Conference, YouTube, 1/27/2013, https://www.youtube.com/watch?v=78qzrXIPn5Q
- [niseq] – “Why Etsy engineers send company-wide emails confessing mistakes they made”, Max Nisen. Quartz, 9/18/2015. https://qz.com/504661/why-etsy-engineers-send-company-wide-emails-confessing-mistakes-they-made/
- [malpas] – “Fallible Humans”, Ian Malpass. Indecorous.com, 7/20/2014, http://indecorous.com/fallible_humans/ An excellent case study on how failure is handled in practice at Etsy.
- [fostps] – “Tool: Foster psychological safety”, unattributed author(s). re:work, Google, https://rework.withgoogle.com/guides/understanding-team-effectiveness/steps/foster-psychological-safety/
- [dekkt] – “The Field Guide to Understanding Human Error”, Sidney Dekker. CRC Press; 6/28/2006. ISBN-10: 0754648265, ISBN-13: 978-0754648260
- [harll] – “What blameless really means”, Jessica Harllee. JessicaHarllee.com, 3/10/2014. http://www.jessicaharllee.com/notes/what-blameless-really-means/ – One SRE’s thoughts about how blameless postmortems actually work in practice at Etsy.
- [macri] – “Morgue: Helping Better Understand Events by Building a Post Mortem Tool”, Bethany Macri. DevOpsDays.org, Vimeo, 10/18/2013. https://vimeo.com/77206751 – How and why the Morgue postmortem tool was created at Etsy. This tool is publicly available on GitHub; see https://github.com/etsy/morgue
- [joao] – “How Etsy Deploys More Than 50 Times a Day”, João Miranda. InfoQ Magazine, 3/17/2014. https://www.infoq.com/news/2014/03/etsy-deploy-50-times-a-day
- [allspcf] – “Counterfactual Thinking, Rules, and The Knight Capital Accident”, John Allspaw. KitchenSoap.com, 10/29/2013. https://www.kitchensoap.com/2013/10/29/counterfactuals-knight-capital/ The best discussion I’ve seen to date on the Knight Capital disaster and the role of counterfactuals in our analysis.
- [dbtmph] – “The Man Who Tried to Stop Pearl Harbor”, David J. Castello. The Daily Beast, 7/12/2016. https://www.thedailybeast.com/the-man-who-tried-to-stop-pearl-harbor . The story of George Elliott and his failure to prevent Pearl Harbor; a fantastic example of a “second story” long kept hidden.
- [harrym] – “A Rough Patch”, Brian Harry. MSDN, 11/25/2013. https://blogs.msdn.microsoft.com/bharry/2013/11/25/a-rough-patch/ . One of the best real-world examples I’ve seen of a true blameless postmortem that has teeth, following several very high-visibility outages. “Either I’m going to get increasingly good at apologizing to fewer and fewer people or we’re going to get better at this. I vote for the latter.”
- [tdoh] – “The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations”, Gene Kim, Patrick Debois, John Willis, Jez Humble. IT Revolution Press, 10/6/2016, ISBN-10: 1942788002, ISBN-13: 978-1942788003
- [arist] – “What Google Learned From Its Quest to Build the Perfect Team”, Charles Duhigg, 2/25/2016, NY Times Magazine, https://www.nytimes.com/2016/02/28/magazine/what-google-learned-from-its-quest-to-build-the-perfect-team.html
Chapter 4 – Hypothesis Driven Development
- [donovan] – “Stop Getting Stuff Done After You Said You Couldn’t”, Donovan Brown. donovanbrown.com, 3/17/2017. http://donovanbrown.com/post/Stop-Getting-Stuff-Done-After-You-Said-You-Couldnt
- [pokert] – “When is it OK to Fold Aces?”, Malcolm Clark. PokerTube.com, 6/22/2016. https://www.pokertube.com/article/when-is-it-ok-to-fold-aces
- [pokerns] – “Would You Fold Pocket Aces Postflop In This Spot?”, Martin Harris. PokerNews, 5/8/2017. https://www.pokernews.com/strategy/would-you-fold-pocket-aces-postflop-in-this-spot-27861.htm . The source for the pocket aces fold story comes from this article.
- [standish] – “Standish Group 2015 Chaos Report – Q&A with Jennifer Lynch”, Stéphane Wojewoda, Shane Hastie. InfoQ, 10/4/2015. https://www.infoq.com/articles/standish-chaos-2015 – From 2011-2015, the number of “successful” vs challenged/failed projects held rock steady at about 29%. Interestingly, the smaller the project was, the greater its chance of success; small projects had a 62% success rate versus only a 2-6% chance for grand/large sized projects.
- [tdoh] – “The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations”, Gene Kim, Patrick Debois, John Willis, Jez Humble. IT Revolution Press, 10/6/2016, ISBN-10: 1942788002, ISBN-13: 978-1942788003. It quotes Ronny Kohavi at MSFT as saying that after evaluating well-designed and executed experiments, only 1/3rd of features were successful at improving the key metric they were targeting!
- [highsmith] – “Agile Project Management: Creating Innovative Products”, Jim Highsmith. Addison-Wesley, 1/1/2009. ISBN-13: 978-0321658395
- [jjuicbo] – “Interview: Jim Johnson of the Standish Group”, Deborah Hartmann Preuss. InfoQ, 8/25/2006. https://www.infoq.com/articles/Interview-Johnson-Standish-CHAOS
- [lean] – “Lean Enterprise: How High Performance Organizations Innovate at Scale”, Jez Humble, Joanne Molesky, Barry O’Reilly. O’Reilly Media, 1/3/2015. ISBN-10: 1449368425, ISBN-13: 978-1449368425. Excellent section by Ash Maurya on “Running Lean”, on how traditional PMO orgs clash with hypothesis-driven development, and separately on the OODA loop.
- [siddharta] – “The biggest waste in software development”, Siddharta X. Tools For Agile blog, 3/26/2010. http://toolsforagile.com/blog/archives/260/the-biggest-waste-in-software-development
- [fowldich] – “UtilityVsStrategicDichotomy”, Martin Fowler. MartinFowler.com, 7/29/2010. https://martinfowler.com/bliki/UtilityVsStrategicDichotomy.html – Deciding when to buy versus build is a hard decision; Martin Fowler splits this up by asking if it drives actual value for the customer – or if it’s a utility function.
- [morec] – “Lean and fast — using A3 to save your program”, John A. Moreci. Project Management Institute, 10/26/2014. https://www.pmi.org/learning/library/lean-fast-using-a3-save-program-9270
- [lindea] – “Early Amazon: Shopping cart recommendations”, Greg Linden. Glinden.blogspot, 4/25/2006. http://glinden.blogspot.com/2006/04/early-amazon-shopping-cart.html . A great account of an A/B test saving a valuable new feature early on for Amazon.
- [kimbre] – “An Interview with Jez Humble on Continuous Delivery, Engineering Culture, and Making Decisions”, Kimbre Lancaster. split.io, 8/16/2018. https://www.split.io/blog/jez-humble-interview-decisions-2018/
- [harris] – “Using feature flags in your app release management strategy”, Richard Harris. App Developer Magazine, 4/19/2018. https://appdevelopermagazine.com/5983/2018/4/16/Using-feature-flags-in-your-app-release-management-strategy/
Chapter 4 – Value Stream Mapping
- [teams] – “Team of Teams: New Rules of Engagement for a Complex World”, Stanley McChrystal. Portfolio, 5/12/2015. ISBN-10: 1591847486, ISBN-13: 978-1591847489. The second most influential book we read, besides “The Power of Habit”. Highly recommended either printed or on Audible; it’s a fast read, and amazingly insightful.
- [ohno] – “Toyota Production System: Beyond Large-Scale Production”, Taiichi Ohno. Productivity Press; 3/1/1988, ISBN-10: 0915299143, ISBN-13: 978-0915299140
- [shingo] – “A Study of the Toyota Production System: From an Industrial Engineering Viewpoint (Produce What Is Needed, When It’s Needed)”, Shigeo Shingo, Andrew P. Dillon. Productivity Press; 10/1/1989. ISBN-10: 0915299178, ISBN-13: 978-0915299171
- [popp] – “Implementing Lean Software Development: From Concept to Cash”, Mary and Tom Poppendieck. Addison-Wesley Professional, 9/17/2006. ISBN-10: 0321437381, ISBN-13: 978-0321437389
- [jeffmu] – “The Multitasking Myth”, Jeff Atwood. Coding Horror Blog, 9/27/2006. https://blog.codinghorror.com/the-multi-tasking-myth/
- [liker] – “The Toyota Way: 14 Management Principles from the World’s Greatest Manufacturer”, Jeffrey K. Liker, McGraw-Hill Education; 1/7/2004, ISBN-10: 0071392319, ISBN-13: 978-0071392310
- [devcaf65] – “DevOps Cafe Episode 62 – Mary and Tom Poppendieck”, Damon Edwards, John Willis. DevOps Café, 8/16/2015. http://devopscafe.org/show/2015/8/16/devops-cafe-episode-62-mary-and-tom-poppendieck.html
- [willis] – “DevOps Culture (Part 1)”, John Willis. IT Revolution, 5/1/2012. https://itrevolution.com/devops-culture-part-1/ This is an extremely influential blog; I found myself turning back to it many times.
Chapter 5 – Small Cross Functional Teams
- [tdoh] – “The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations”, Gene Kim, Patrick Debois, John Willis, Jez Humble. IT Revolution Press, 10/6/2016, ISBN-10: 1942788002, ISBN-13: 978-1942788003
- [domenic] – “Making Work Visible: Exposing Time Theft to Optimize Work & Flow”, Dominica DeGrandis. IT Revolution Press, 11/14/2017. ISBN-10: 1942788150, ISBN-13: 978-1942788157
- [mcchryst] – “Team of Teams: New Rules of Engagement for a Complex World”, Stanley McChrystal. Portfolio, 5/12/2015. ISBN-10: 1591847486, ISBN-13: 978-1591847489
- [rother] – “Toyota Kata: Managing People for Improvement, Adaptiveness and Superior Results”, Mike Rother. McGraw-Hill Education, 8/4/2009. ISBN-10: 0071635238, ISBN-13: 978-0071635233
Chapter 5 – Configuration Management and Infrastructure as Code
- [rbias] – “The History of Pets vs Cattle and How to Use the Analogy Properly”, Randy Bias. CloudScaling.com, 9/29/2016. http://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/
- [kief] – “Infrastructure as Code: Managing Servers in the Cloud”, Kief Morris. O’Reilly Media, 6/27/2016. ISBN-10: 1491924357, ISBN-13: 978-1491924358
- [cern] – “Are your servers PETS or CATTLE?”, Simon Sharwood. The Register, 3/18/2013. https://www.theregister.co.uk/2013/03/18/servers_pets_or_cattle_cern/
- [guckiac] – “What is Infrastructure as Code?”, Sam Guckenheimer. Microsoft Docs, 4/3/2017. https://docs.microsoft.com/en-us/azure/devops/learn/what-is-infrastructure-as-code
- [russd] – “It Takes Dev and Ops to Make DevOps”, Russ Collier. DevOpsOnWindows.com, 7/26/2013. http://www.devopsonwindows.com/it-takes-dev-and-ops-to-make-devops/
- [puppiac] – “Infrastructure as code”, unattributed author(s). Puppet, unknown date. https://puppet.com/solutions/infrastructure-as-code – A great overview with videos of why IAC is so important
- [newm] – “Building Microservices: Designing Fine-Grained Systems”, Sam Newman. O’Reilly Media, 2/20/2015. ISBN-10: 1491950358, ISBN-13: 978-1491950357
- [yevg] – “Terraform: Up and Running: Writing Infrastructure as Code”, Yevgeniy Brikman. O’Reilly Media, 3/27/2017. ISBN-10: 1491977086, ISBN-13: 978-1491977088
- [sre] – “Site Reliability Engineering: How Google Runs Production Systems”, Niall Richard Murphy, Betsy Beyer, Chris Jones, Jennifer Petoff, O’Reilly Media; 4/16/2016, ISBN-10: 149192912X, ISBN-13: 978-1491929124
- [gruvle] – “Starting and Scaling DevOps in the Enterprise”, Gary Gruver. BookBaby, 12/1/2016. ISBN-10: 1483583589, ISBN-13: 978-1483583587
Chapter 5 – Security As Part of the Lifecycle
- [payne] – “DevOps and Security: 5 Principles for DevSecOps”, Jeffrey Payne. TechWell, 8/3/2018. https://www.techwell.com/techwell-insights/2018/08/devops-and-security-5-principles-devsecops
- [corman] – “DevOps Cafe Episode 63 – Josh Corman”, DevOps Café, 9/2/2015. http://devopscafe.org/show/2015/9/2/devops-cafe-episode-63-josh-corman.html
- [rugged] – “The Rugged Manifesto”, unattributed author(s). RuggedSoftware.org, 1/1/2010. https://www.ruggedsoftware.org
- [thmodel] – “Threat Modeling”, unattributed author(s). Microsoft Security Engineering, unknown date. https://www.microsoft.com/en-us/securityengineering/sdl/threatmodeling Microsoft’s approach to Threat Modeling and security as part of the lifecycle.
- [douglci] – “Learn how to add continuous security validation to your CI/CD pipeline”, Mike Douglas and others. Microsoft Docs, 4/25/2018. https://docs.microsoft.com/en-us/vsts/articles/security-validation-cicd-pipeline?view=vsts
- [prieur] – “ALM and DevOps – Secure and Deliver with Rugged DevOps”, Jean-Marc Prieur, Sam Guckenheimer. MSDN, 1/1/2016. https://msdn.microsoft.com/en-us/magazine/mt790188.aspx?f=255&MSPPError=-2147217396
- [gucksec] – “Security In Your Continuous Integration Pipeline”, Sam Guckenheimer. WhiteSource, YouTube, 8/30/2017. https://www.youtube.com/watch?v=C1CPN0ArZJs
- [gotim] – “A Definition of Done for DevSecOps”, Gene Gotimer. TechWell, 5/8/2018. https://www.techwell.com/techwell-insights/2018/05/definition-done-devsecops
- [reed] – “Want rugged DevOps? Team up your release and security engineers”, J Paul Reed. TechBeacon, unknown date. https://techbeacon.com/want-rugged-devops-team-your-release-security-engineers
- [barth] – “Deflating news: Bouncy Castle BKS-V1 keystore files not adequately protected”, Bradley Barth. SC Media, 3/19/2018. https://www.scmagazine.com/deflating-news-bouncy-castle-bks-v1-keystore-files-not-adequately-protected/article/751885/
- [owasp] – “OWASP Periodic Table of Vulnerabilities”, unattributed author(s). OWASP, 2/12/2016. https://www.owasp.org/index.php/OWASP_Periodic_Table_of_Vulnerabilities#tab=Periodic_Table_of_Vulnerabilities
- [zanel] – “DevSecOps: How to Use DevOps to Make You More Secure”, Zane Lackey. IT Revolution, 8/26/2018. https://itrevolution.com/devsecops-zane-lackey/
Chapter 5 – Automated Jobs and Dev Production Support
- [maun] – “Rundeck Helps Ticketmaster Reshape Operations”, unattributed author(s). Rundeck.org, 1/1/2015. http://rundeck.org/stories/mark_maun.html – Note the strong objections by both developers and Operations (costs, risks, SOX and security compliance, straitjacketed solution sets and loss of control). This resistance dropped on both sides as a lengthy pilot period proved that runbooks provided both simplicity and auditable, repeatable, and traceable action steps that simplified troubleshooting.
- [pagr] – “Incident Response”, unattributed author(s). PagerDuty, unknown date. https://response.pagerduty.com/ An excellent documentation hub on how to handle initial response.
- [mulkey2] – “DevOps Cafe Episode 61 – Jody Mulkey”, John Willis, Damon Edwards. DevOps Café, 7/27/2015. http://devopscafe.org/show/2015/7/27/devops-cafe-episode-61-jody-mulkey.html
- [newm] – “Building Microservices: Designing Fine-Grained Systems”, Sam Newman. O’Reilly Media; 2/20/2015. ISBN-10: 1491950358, ISBN-13: 978-1491950357
- [sharma] – “The DevOps Adoption Playbook: A Guide to Adopting DevOps in a Multi-Speed IT Enterprise”, Sanjeev Sharma. Wiley, 2/28/2017. ISBN-10: 1119308747, ISBN-13: 978-1119308744
- [gruvle] – “Starting and Scaling DevOps in the Enterprise”, Gary Gruver. BookBaby, 12/1/2016. ISBN-10: 1483583589, ISBN-13: 978-1483583587
- [sre] – “Site Reliability Engineering: How Google Runs Production Systems”, Niall Richard Murphy, Betsy Beyer, Chris Jones, Jennifer Petoff, O’Reilly Media; 4/16/2016, ISBN-10: 149192912X, ISBN-13: 978-1491929124
Chapter 6 – Metrics and Monitoring
- [babb] – “Fly-Fishin’ Fool: The Adventures, Misadventures, and Outright Idiocies of a Compulsive Angler”, James Babb. Lyons Press; 4/1/2005. ISBN-10: 1592285937, ISBN-13: 978-1592285938
- [theart] – “The Art of Monitoring”, James Turnbull. Amazon Digital Services LLC, 6/8/2016. ASIN: B01GU387MS. Perhaps the best overall discussion we’ve seen of monitoring and a very good, explicit implementation of the ELK stack to handle aggregation and dashboarding. See my blog post for more on this outstanding work.
- [guckenheimer2] – “Moving 65,000 Microsofties to DevOps on the Public Cloud”, Sam Guckenheimer. Microsoft Docs, 8/3/2017. https://docs.microsoft.com/en-us/azure/devops/devops-at-microsoft/moving-65000-microsofties-devops-public-cloud
- [hawthorne] – “The Hawthorne effect”, Tom Hindle. The Economist, 11/3/2008. https://www.economist.com/news/2008/11/03/the-hawthorne-effect
- [baer] – “How Changing One Habit Helped Quintuple Alcoa’s Income”, Drake Baer. Business Insider, 4/19/2014. https://www.businessinsider.com/how-changing-one-habit-quintupled-alcoas-income-2014-4
- [popp4] – “DevOps Cafe Episode 62 – Mary and Tom Poppendieck”, John Willis, Damon Edwards. DevOps Café, 8/16/2015. http://devopscafe.org/show/2015/8/16/devops-cafe-episode-62-mary-and-tom-poppendieck.html
- [visible] – “The Visible Ops Handbook: Implementing ITIL in 4 Practical and Auditable Steps”, Kevin Behr, Gene Kim, George Spafford. Information Technology Process Institute, 6/15/2005. ISBN-10: 0975568612, ISBN-13: 978-0975568613. We wish this short but powerful book were better known. Like “Continuous Delivery”, it’s aged well – and most of its precepts still hold true. It resonates particularly well with IT managers and Operations staff.
- [rayg2] – “Customer focus and making production visible with Raygun”, Damian Brady. Channel9, 2/8/2018. https://channel9.msdn.com/Shows/DevOps-Lab/Customer-focus-and-making-production-visible-with-Raygun?WT.mc_id=dlvr_twitter_ch9
- [hubbard] – “How to Measure Anything: Finding the Value of Intangibles in Business”, Douglas Hubbard. Wiley Publishing, 3/17/2014. ISBN-10: 1118539273, ISBN-13: 978-1118539279
- [turnbull] – “DevOps Cafe Episode 70 – James Turnbull”, John Willis, Damon Edwards. DevOps Café, 10/26/2016. http://devopscafe.org/show/2016/10/26/devops-cafe-episode-70-james-turnbull.html
- [cockr] – “DevOps Cafe Episode 50 – Adrian Cockcroft”, John Willis, Damon Edwards. DevOps Café, 7/22/2014. http://devopscafe.org/show/2014/7/22/devops-cafe-episode-50-adrian-cockcroft.html. I love this interview in part for Adrian calling out teams that are stuck in analysis paralysis – and the absurdity of not giving teams self-service environment provisioning. “First I ask… are you serious?”
- [julian] – “Practical Monitoring: Effective Strategies for the Real World”, Mike Julian. O’Reilly Media, 11/23/2017. ISBN-10: 1491957352, ISBN-13: 978-1491957356. I think this may actually be a little better than “The Art of Monitoring” – though that’s also a book we loved and found value in – just because there’s less of a narrow focus on the ELK stack.
- [habit] – “The Power of Habit: Why We Do What We Do in Life and Business”, Charles Duhigg. Random House, 1/1/2014. ISBN-10: 081298160X, ISBN-13: 978-0812981605
- [bejtlich] – “The Practice of Network Security Monitoring: Understanding Incident Detection and Response”, Richard Bejtlich. No Starch Press, 7/15/2013. ISBN-10: 1593275099, ISBN-13: 978-1593275099
Chapter 6 – Feature Flags and Continuous Delivery
- [mugrage] – “It’s Not Continuous Delivery if You Can’t Deploy Right Now”, Ken Mugrage. InfoQ, 7/20/2018. https://www.infoq.com/presentations/cd-deployment-pipelines
- [danno] – “The Journey to Continuous Delivery”, Dan North. InfoQ Magazine, 4/10/2018. https://www.infoq.com/presentations/cd-business-agility . Dan advocates not attempting to boil the ocean, but choosing one tasty, low-lying project to go after that drives real business value.
- [cd] – “Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation”, Jez Humble, David Farley. Addison-Wesley Professional, 8/6/2010. ISBN-10: 0321601912, ISBN-13: 978-0321601919. This book made a giant splash when it first came out and is still having great impact to this day. I reviewed it on my blog; suffice to say, it’s a must-have.
- [kimbre] – “An Interview with Jez Humble on Continuous Delivery, Engineering Culture, and Making Decisions”, Kimbre Lancaster. split.io, 8/16/2018. https://www.split.io/blog/jez-humble-interview-decisions-2018/
- [harris] – “Using feature flags in your app release management strategy”, Richard Harris. App Developer Magazine, 4/19/2018. https://appdevelopermagazine.com/5983/2018/4/16/Using-feature-flags-in-your-app-release-management-strategy/
- [patang] – “Best of Velocity: Move Fast and Ship Things – Facebook’s Operational and Release Processes”, Girish Patangay. O’Reilly Media, YouTube, 9/9/2013. https://www.youtube.com/watch?v=dDf2t-E_Ea8&feature=youtu.be&t=11m20s A great 18-minute segment going over Facebook’s implementation of feature flags and how they safely introduce changes.
- [baker] – “Feature Flag-Driven Development”, Justin Baker. LaunchDarkly, 11/7/2015. https://launchdarkly.com/blog/feature-flag-driven-development/ A very readable overview, including some nifty graphical descriptions of the different use cases with FF.
- [harmes] – “Flipping Out”, Ross Harmes. Flickr, 12/2/2009. http://code.flickr.net/2009/12/02/flipping-out/ . A very influential post (if short!) that describes Flickr’s release patterns and use of feature flags.
- [ldbp] – “Best Practices”, unattributed author(s). GitHub, 3/5/2018. https://github.com/launchdarkly/featureflags/blob/master/5%20-%20Best%20Practices.md – answers common questions around naming and usage conventions, and the importance of giving access to non-devs.
- [lduc] – “Use Cases”, unattributed author(s). LaunchDarkly.com, unknown dates. https://launchdarkly.com/use-cases/?utm_source=launchdarkly_blog&utm_medium=organic
- [ffio] – “Open Source Resources”, unattributed author(s). FeatureFlags.IO, unknown dates. http://featureflags.io/resources/ – An outstanding documentation and guidance hub.
- [bird] – “Feature Toggles are one of the worst kinds of Technical Debt”, Jim Bird. SwReflections.Blogspot, 8/6/2014. http://swreflections.blogspot.com/2014/08/feature-toggles-are-one-of-worst-kinds.html. It’s hard to argue with Jim’s list of risks: that feature flags are meant to be short-lived and represent technical debt if left untended; if overused they can become an antipattern. Once again, there are no silver bullets.
- [ds2014] – “Knightmare: A DevOps Cautionary Tale”, Doug Seven. DougSeven.com, 4/17/2014. https://dougseven.com/2014/04/17/knightmare-a-devops-cautionary-tale/ Absolutely chilling. This is the story of how a company with nearly $400 million in assets went bankrupt in 45 minutes, all because of a failed deployment. Here the real issue wasn’t the reliance on feature flags – it was what wasn’t there: better automation around configuration, a well-rehearsed deployment cycle, and robust testing. Feature flags can’t compensate for the lack of automation and good process, or for believing that handing off a written set of instructions is repeatable and foolproof. As Google says, “hope is not a strategy.”
- [garve] – “Better development with Feature Flags”, Leonard Garvey. reinteractive.com, 10/28/2014. https://reinteractive.com/posts/220-better-development-with-feature-flags Leonard describes a few more little-known benefits of feature flags: they make collaboration with other developers easier, reduce the risk of conflicting code, and provide the ability to roll out immature features with less risk. We wouldn’t use the exact code implementation he describes today, but the principles hold true.
- [travisci] – “Using Feature Flags to Ship Changes with Confidence”, Mathias Meyer. Travis-CI.com, 3/4/2014. https://blog.travis-ci.com/2014-03-04-use-feature-flags-to-ship-changes-with-confidence. How one company uses feature flags to enable CI, including some nice implementation details using Ruby.
- [mfbl] – “FeatureToggle”, Martin Fowler. MartinFowler.com, 10/29/2010. https://martinfowler.com/bliki/FeatureToggle.html
- [bakerx] – “Enterprise Requirements for Managing Feature Flags”, Justin Baker. LaunchDarkly, 3/4/2016. https://blog.launchdarkly.com/enterprise-requirements-for-managing-feature-flags/ A nice overview of how to manage the lifecycle of feature flags so they don’t become technical debt.
- [wang4] – “Microsoft’s Abel Wang on the Key to Implementing Advanced DevOps: Feature Flags”, Becky Nagel. Visual Studio Magazine, 2/5/2018. https://visualstudiomagazine.com/articles/2018/02/02/advanced-devops.aspx?m=1 We’re huge fans of LaunchDarkly at Microsoft.
- [medx] – “Edith Harbaugh, LaunchDarkly”, unattributed author(s). Medium DFJ Posts, 2/8/2018. https://medium.com/dfj-vc/edith-harbaugh-launchdarkly-3cadf0123f15 . “Everyone talks about knowing real customer needs, but every customer will tell you something different. I want to know what people actually want and build that, rather than build stuff that nobody wants.”
- [hodgx] – “Progressive Experimentation with Feature Flags”, Buck Hodges. Microsoft Docs, 11/13/2017. https://docs.microsoft.com/en-us/azure/devops/learn/devops-at-microsoft/progressive-experimentation-feature-flags A very detailed overview of how Microsoft has applied feature flags with Azure DevOps.
- [tdoh] – “The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations”, Gene Kim, Patrick Debois, John Willis, Jez Humble. IT Revolution Press, 10/6/2016, ISBN-10: 1942788002, ISBN-13: 978-1942788003. From a 2009 John Allspaw letter to Flickr, page 173.
Chapter 6 – Disaster Recovery and Gamedays
- [dyn1] – “The Dynatrace Unbreakable Pipeline in Azure DevOps and Azure? Bam!”, Abel Wang. AbelSquidHead.com, 8/3/2018. https://abelsquidhead.com/index.php/2018/08/03/the-dynatrace-unbreakable-pipeline-in-Azure DevOps-and-azure-bam/ We would have loved to have gone into much more detail around self-healing CD pipelines and especially the advances made by Dynatrace. Monitoring as Code as a concept is rapidly growing in popularity; we love the application of using automated monitoring for a more viable go/no go decision, and having monitoring (monspec) files kept in source control right next to the other infrastructure and source code of the project.
- [dyn2] – “Unbreakable DevOps Pipeline: Shift-Left, Shift-Right & Self-Healing”, Andreas Grabner. DynaTrace, 2/9/2018. https://www.dynatrace.com/news/blog/unbreakable-devops-pipeline-shift-left-shift-right-self-healing/ A great walkthrough of implementing an unbreakable CD pipeline, in this case using AWS Lambda functions and Dynatrace. Andreas makes a great case for applying the Shift-Left movement to monitoring as code.
- [dop65] – “DevOps Cafe Episode 65 – John interviews Damon”, John Willis, Damon Edwards. DevOps Café, 12/15/2015. http://devopscafe.org/show/2015/12/15/devops-cafe-episode-65-john-interviews-damon.html A great discussion about the antipatterns around releases and the dangerous illusion of control that many managers suffer from. In one company, they had fewer than 1% of CAB submittals rejected – out of 2,000 approved. Those that were rejected often had not filled out the correct submittal form! As Damon brought out, all this activity was three degrees removed from the keyboard – those making the approvals really had very little idea of what was actually going on.
- [dri2] – “Monitoring, and Why It Matters To You”, Dave Harrison. driftboatdave.com, 4/4/2017. https://driftboatdave.com/2017/04/04/monitoring-and-why-it-matters-to-you/ A more complete discussion of the vicious vs virtuous cycle described in this section, along with some specific examples from Etsy’s groundbreaking work around monitoring.
- [tdoh] – “The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations”, Gene Kim, Patrick Debois, John Willis, Jez Humble. IT Revolution Press, 10/6/2016, ISBN-10: 1942788002, ISBN-13: 978-1942788003. There’s an excellent story by Heather Mickman of Target about what it took to yank an antique process centered around what they called the TEAP-LARB form. “The surprising thing was that no one knew, outside of a vague notion that we needed some sort of governance process. Many knew that there had been some sort of disaster that could never happen again years ago, but no one could remember exactly what that disaster was.”
- [forsgren] – “Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations”, Nicole Forsgren PhD, Jez Humble, Gene Kim. IT Revolution Press, 3/27/2018. ISBN-10: 1942788339, ISBN-13: 978-1942788331
- [dora2017] – “Annual State of DevOps Report”, unattributed author(s). Puppet Labs, 2017. https://puppetlabs.com/2017-devops-report
- [mcchrystal] – “Team of Teams: New Rules of Engagement for a Complex World”, Stanley McChrystal. Portfolio, 5/12/2015. ISBN-10: 1591847486, ISBN-13: 978-1591847489. The author notes that top-down decision-making (as with CAB meetings) has the effect of sapping firepower and initiative; this was echoed by Brian Blackman and Anne Steiner in their interviews in the Appendix section. The military has learned the limitations of higher command, and strives not to command more than is necessary or plan beyond the circumstances that can be foreseen. Orders are given that define and communicate the intent, but the execution strategy is often left up to the individual units.
- [catafl] – “CatastrophicFailover”, Martin Fowler. MartinFowler.com, 3/7/2005. https://martinfowler.com/bliki/CatastrophicFailover.html . A vivid description of a cascading failure and the complexities associated with event-driven architectures that informed the failure Alex experienced in this section.
- [matr] – “Making Matrixed Organizations Successful with DevOps: Tactics for Transformation in a Less Than Optimal Organization”, Gene Kim. IT Revolution DevOps Enterprise Forum 2017. https://itrevolution.com/book/making-matrixed-organizations-successful-devops/ A good discussion on how and why to form a cross-functional team, starting with the leadership level.
- [gruvle] – “Starting and Scaling DevOps in the Enterprise”, Gary Gruver. BookBaby, 12/1/2016. ISBN-10: 1483583589, ISBN-13: 978-1483583587
Chapter 7 – Microservices
- [newm] – “Building Microservices: Designing Fine-Grained Systems”, Sam Newman. O’Reilly Media; 2/20/2015. ISBN-10: 1491950358, ISBN-13: 978-1491950357. SUCH a great book, definitely on my top 3 list on this subject.
- [bbom] – “Big Ball of Mud”, Brian Foote and Joseph Yoder. University of Illinois at Urbana-Champaign, 6/26/1999. http://www.laputan.org/mud/mud.html Based on a presentation at the Fourth Conference on Pattern Languages of Programs in 1997, this is the original and very well known “big ball of mud” paper.
- [yarrow] – “The Org Charts Of All The Major Tech Companies”, Jay Yarow. Business Insider, 6/29/2011. https://www.businessinsider.com/big-tech-org-charts-2011-6
- [manu] – “The Google Doodler”, Manu Cornet. Ma.nu, 2011. http://ma.nu/about/aboutme/2013.07.15_theartofdoing_googler_doodler.pdf
- [feathers] – “Working Effectively with Legacy Code”, Michael Feathers. Prentice Hall, 10/2/2004. ISBN-10: 0131177052, ISBN-13: 978-0131177055
- [fowl2] – “Microservices”, James Lewis and Martin Fowler. MartinFowler.com, 3/25/2014. https://martinfowler.com/articles/microservices.html
- [yegge] – “Stevey’s Google Platforms Rant”, Steve Yegge. Gist.github.com, 1/11/2011. https://gist.github.com/chitchcock/1281611 – a now legendary rant about platforms by a software architect who worked early on at both Google and Amazon. Steve did NOT get fired for his little “reply all” oopsie, shockingly – which tells you a lot about the positive traits of Google’s culture right there.
- [dign] – “Little Things Add Up”, Larry Dignan. Baseline Magazine, 10/19/2005. http://www.baselinemag.com/c/a/Projects-Management/Profiles-Lessons-From-the-Leaders-in-the-iBaselinei500/3 – “Small teams are fast… and don’t get bogged down. … each group assigned to a particular business is completely responsible for it… the team scopes the fix, designs it, builds it, implements it and monitors its ongoing use.”
- [sfowl] – “Production-Ready Microservices: Building Standardized Systems Across an Engineering Organization”, Susan Fowler. O’Reilly, 12/1/2016. ISBN-10: 1491965975, ISBN-13: 978-1491965979. Susan points out that there’s always a balance between speed and safety; the key is to start with a clear goal in mind. Her thoughts around alerts and dashboarding are very well thought out. Even better, it hits perhaps the one true weak point of microservices right on the head; the need for governance. She found it most effective to have a direct pre-launch overview with the development team going over the design on a whiteboard; within ten minutes, it will become apparent if the solution was truly production-ready. If you have only one book to read on microservices – this is it.
- [conw2] – “How Do Committees Invent?”, Melvin Conway. MelConway.com, 4/1/1968. http://www.melconway.com/Home/Committees_Paper.html – The original paper as submitted by Melvin Conway. Famously the Harvard Business Review rejected Melvin’s original paper due to lack of proof; Datamation ended up publishing it in April 1968, and Fred Brooks’s classic book “The Mythical Man-Month” made it famous. Rarely has such a small splash made such a big ripple.
- [nacha] – “The Influence of Organizational Structure On Software Quality: An Empirical Case Study”, Nachiappan Nagappan, Brendan Murphy, and Victor Basili. Microsoft Research, 1/1/2008. https://www.microsoft.com/en-us/research/publication/the-influence-of-organizational-structure-on-software-quality-an-empirical-case-study/?from=http%3A%2F%2Fresearch.microsoft.com%2Fpubs%2F70535%2Ftr-2008-11.pdf – A very nice metrics-based backup to what we read in “The Mythical Man-Month”, as shown with the troubled Windows Vista release at Microsoft. Here in a recap of that disastrous release, the researchers found that the structure of the organization was the most relevant predictor of failure-prone applications – versus traditional KPIs like churn, complexity, coverage, and bug counts. We suspect that this paper and others like it influenced the decision by Microsoft to upend the structure of their program teams for Azure DevOps and Bing.
- [grint] – “Splitting the organization and integrating the code: Conway’s law revisited”, Rebecca Grinter, James D. Herbsleb. ACM Digital Library, 5/22/1999. https://dl.acm.org/citation.cfm?id=302455. Interestingly, while the Nachiappan study above mentioned that globally distributed teams didn’t perform worse than collocated teams, this paper says the opposite – collocated teams are better functioning than globally distributed. It turns out that when you control for team size, both are correct: the greatest limiting factor was that old enemy, communications overhead. In other words, it doesn’t seem to matter as much if a team is collocated vs distributed, as long as we cap the size to that magical 5-12 number.
- [lightst] – “The Only Good Reason to Adopt Microservices”, Vijay Gill. LightStep.com, 7/19/2018. https://lightstep.com/blog/the-only-good-reason-to-adopt-microservices/
- [kimbre] – “An Interview with Jez Humble on Continuous Delivery, Engineering Culture, and Making Decisions”, Kimbre Lancaster. split.io, 8/16/2018. https://www.split.io/blog/jez-humble-interview-decisions-2018/
- [fami] – “Microservices, IoT, and Azure: Leveraging DevOps and Microservice Architecture to deliver SaaS Solutions”, Bob Familiar. Apress, 10/20/2015. ISBN-10: 1484212762, ISBN-13: 978-1484212769. The best book we’ve seen out there on IoT in the Microsoft space, by a long shot. Bob Familiar does a terrific job of explaining IoT and microservices in context.
- [fowl4] – “StranglerApplication”, Martin Fowler. MartinFowler.com, 6/29/2004. https://www.martinfowler.com/bliki/StranglerApplication.html
- [narum] – “Strangler Pattern”, Masashi Narumoto and Mike Wasson. Microsoft Docs, 6/22/2014. https://docs.microsoft.com/en-us/azure/architecture/patterns/strangler A good quick overview of how we can use the strangler pattern to chip away and eventually deprecate a massive legacy app. Mike Wasson in particular may be one of the best technical writers we’ve got at Microsoft.
- [calca] – “Building Products at SoundCloud —Part I: Dealing with the Monolith”, Phil Calcado. Soundcloud, 6/11/2014. https://developers.soundcloud.com/blog/building-products-at-soundcloud-part-1-dealing-with-the-monolith
- [hodg1] – “Azure DevOps: From Monolith to Cloud Service”, Buck Hodges. YouTube, 10/24/2017. https://www.youtube.com/watch?v=9frodP5xLxk&feature=youtu.be A nice discussion of how Azure DevOps made the switch to microservices, including maintaining consistency between an on-premises product and the hosted multi-tenant service, how they tackled that tough backend problem, and starting over with telemetry.
- [hodg2] – “From Monolith to Cloud Service”, Buck Hodges. Microsoft Docs, 11/8/2017. https://docs.microsoft.com/en-us/azure/devops/learn/devops-at-microsoft/monolith-cloud-service?WT.mc_id=linkedin . Starting from a position much like Ben’s team does, with a good use of version control but little else – no telemetry, no agile or scrum, no live-site support or on-call experience, Buck walks us through turning an onprem monolith into a microservice-based, cloud-native service with Azure DevOps.
- [hodg3] – “Patterns for Resiliency in the Cloud”, Buck Hodges. Microsoft Docs, 11/8/2017. https://docs.microsoft.com/en-us/azure/devops/learn/devops-at-microsoft/patterns-resiliency-cloud . Cloud native architecture really means resilient architecture, and distributed computing makes tracking down a root cause a frustrating and sometimes multi-week endeavor – yes, even with feature flags. Buck explores the Circuit Breaker originally implemented by Netflix and how it’s used with Azure DevOps to degrade gracefully, and their use of throttling as limits are approached with SQL Xevents.
- [evans] – “Domain-Driven Design: Tackling Complexity in the Heart of Software”, Eric Evans. Addison-Wesley Professional, 8/30/2003. ISBN-10: 0321125215, ISBN-13: 978-0321125217. This is the gold standard, and should be required reading for anyone considering microservices – or indeed just plain well-defined systems architecture.
- [driftx] – “Practical Microservices”, Dave Harrison. driftboatdave.com, 9/7/2017. https://driftboatdave.com/2017/09/07/mtx-2017-practical-microservices-directors-cut/ . The original blog post and references that influenced this chapter.
- [amund] – “Microservice Architecture: Aligning Principles, Practices, and Culture”, Mike Amundsen, Matt McLarty, Ronnie Mitra, Irakli Nadareishvili. O’Reilly Media, 8/5/2016. ISBN-10: 1491956259, ISBN-13: 978-1491956250. A great discussion on Domain Driven Design in chapter 5, along with a great practical breakdown of handling one workstream and defining service boundaries using DDD of a sample company.
- [lewis] – “GOTO 2015 • How I Finally Stopped Worrying and Learnt to Love Conway’s Law”, James Lewis. GOTO 2015 Chicago conference, YouTube, 7/15/2015. https://www.youtube.com/watch?v=l1tyfb5we7I There are a few great examples where they knew the org was not capable of the change needed – and designed a system that would fit it (square peg in square hole!) instead of dictating how the design should work in a perfect, idealistic world.
- [shconw] – “Randy Shoup on Microservices, the Reality of Conway’s Law, and Evolutionary Architecture”, Daniel Bryant. InfoQ, 7/3/2015. https://www.infoq.com/interviews/randy-shoup-microservices Randy uses his experience from Google and eBay to talk about why monoliths aren’t necessarily as evil as we often think they are.
- [vaugh] – “Implementing Domain-Driven Design”, Vaughn Vernon. Addison-Wesley, 2/16/2013. ISBN-10: 0321834577, ISBN-13: 978-0321834577. This is the best applied and in-depth discussion we’ve seen of Eric’s groundbreaking work around decomposition and finding domain boundaries.
- [newmpr] – “Principles Of Microservices”, Sam Newman. YouTube, 11/1/2015, https://www.youtube.com/watch?v=PFQnNFe27kU. Sam goes through the underlying principles behind microservices, and then attempts to resolve the tension in a core issue with microservices – how independent can they truly be as part of a whole?
- [qamr] – “Using Microservices Architecture to Break Your Vendor Lock-in”, unattributed author(s). QArea, unknown date. https://qarea.com/blog/using-microservices-architecture-to-break-your-vendor-lock-in – Google is famous for buying or relying on COTS or OS libraries – but making sure that any interactions are through a shell that they can control and modify. This article discusses the negative cycle when we over-rely on vendors and how it increases the fragility of our systems – and how they have broken this vendor lock-in using Golang microservices.
- [caval] – “Our journey to microservices: mono repo vs multiple repositories”, Avi Cavale. Shippable.com, 6/2/2016. http://blog.shippable.com/our-journey-to-microservices-and-a-mono-repository Shippable started their effort with multiple repositories, and ended up making the switch over to a single repository: “The only thing you really give up with a mono repo is the ability to shut off developers from code they don’t contribute to. There should be no reason to do this in a healthy organization with the right hiring practices. Unless you’re paranoid… or named Apple.”
- [netfl1] – “Adopting Microservices at Netflix: Lessons for Architectural Design”, Tony Mauro. Nginx.com, 2/19/2015. https://www.nginx.com/blog/microservices-at-netflix-architectural-best-practices/ – A very good overview of Adrian Cockcroft’s series of talks and thinking on microservices and the lessons he learned at Netflix.
- [goto2014] – “GOTO 2014 • Migrating to Cloud Native with Microservices”, Adrian Cockcroft. YouTube, 12/15/2014. https://www.youtube.com/watch?v=DvLvHnHNT2w – The original video on Netflix and microservices that was the source for the article above.
- [nginx2014] – “Fast Delivery”, Adrian Cockcroft. Nginx, YouTube, 12/2/2014. https://youtu.be/5qJ_BibbMLw – Adrian points out that Netflix from the beginning favored a fine-grained, loosely coupled architecture. This fed into every one of the four key capabilities Adrian finds vital to deliver at scale – allowing autonomy and the freedom to innovate and make fast decisions; getting answers using big data analytics to explore alternatives and evaluate success; relying on the cloud to remove the latency around spinning up new resources; and eliminating coordination latency by folding everyone needed to deploy and support a service into a single team.
- [gehan] – “Want to develop great microservices? Reorganize your team”, Neil Gehani. Mesosphere, unknown date. https://techbeacon.com/want-develop-great-microservices-reorganize-your-team – He calls a cross-functional delivery team of 6-12 people a “build-and-run” team, which we kind of like.
- [kimgb] – “Going big with DevOps: How to scale for continuous delivery success”, Gene Kim. TechBeacon.com, unknown date. https://techbeacon.com/going-big-devops-how-scale-continuous-delivery-success . We love the Target story because it’s one of those inspiring dumpster-fire-to-paradise redemption accounts.
- [brooks] – “The Mythical Man-Month: Essays on Software Engineering, Anniversary Edition”, Frederick P. Brooks Jr. Addison-Wesley Professional, 8/12/1995. ISBN-10: 0201835959, ISBN-13: 978-0201835953
Chapter 7 – One Mission
- [lond] – “To Build a Fire, and Other Stories”, Jack London. Reader’s Digest Association, 1/1/1994. ISBN-10: 0895775832, ISBN-13: 978-0895775832
- [dweck] – “Mindset: The New Psychology of Success”, Carol Dweck. Random House, 2/28/2006. ISBN-10: 1400062756, ISBN-13: 978-1400062751
- [popov] – “Fixed vs. Growth: The Two Basic Mindsets That Shape Our Lives”, Maria Popova. BrainPickings.org, 1/29/2014. https://www.brainpickings.org/2014/01/29/carol-dweck-mindset/ Love the BrainPickings site and its fabulous content.
- [nigel2] – “Why are we all such hypocrites when it comes to DevOps?”, Nigel Kersten. SpeakerDeck, 10/17/2017. https://speakerdeck.com/nigelkersten/why-are-we-all-such-hypocrites-when-it-comes-to-devops – A great presentation by Nigel Kersten on impoverished communication. He covers optimism bias (which is more likely when you lack experience, believe you have more control/influence than you actually do, and think negative events are unlikely). I also love the point he makes on our own skewed view of others – that we often treat others’ behavior and skillsets as unchangeable, whereas we excuse our own as being caused by external factors (traffic was terrible today, I’m under stress at home, etc.).
- [hbr] – “Up and Down the Communications Ladder”, Bruce Harriman. Harvard Business Review, 9/1/1974. https://hbr.org/1974/09/up-and-down-the-communications-ladder – The original source of the presentation by Nigel, based on a 1969 study. We’ll call out one key point – that the feedback program must not be an endcap, but must produce visible results.
- [habit] – “The Power of Habit: Why We Do What We Do in Life and Business”, Charles Duhigg. Random House, 1/1/2014. ISBN-10: 081298160X, ISBN-13: 978-0812981605
- [ohwm] – “Workplace Management”, Taiichi Ohno. McGraw-Hill Education, 12/11/2002. ISBN-10: 0071808019, ISBN-13: 978-0071808019
- [sharma] – “The DevOps Adoption Playbook: A Guide to Adopting DevOps in a Multi-Speed IT Enterprise”, Sanjeev Sharma. Wiley, 2/28/2017. ISBN-10: 1119308747, ISBN-13: 978-1119308744
- [russd] – “It Takes Dev and Ops to Make DevOps”, Russ Collier. DevOpsOnWindows.com, 7/26/2013. http://www.devopsonwindows.com/it-takes-dev-and-ops-to-make-devops/
- [cumm2017] – “DevOpsDays Boston 2017 – KEYNOTE: Settlers of DevOps”, Rob Cummings. YouTube, 10/20/2017, https://www.youtube.com/watch?v=woSoQq3UkAc. The DevOpsDays Boston 2017 keynote, with the outstanding Settlers and Town Planners model. He dismantles the appallingly stupid Bimodal IT theory, and we love Rob’s very succinct and beautiful definition of what DevOps is about: “I want to deliver customer value faster and more humanely.”
- [tdoh] – “The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations”, Gene Kim, Patrick Debois, John Willis, Jez Humble. IT Revolution Press, 10/6/2016, ISBN-10: 1942788002, ISBN-13: 978-1942788003. Chapter 16 by Steve Bell and Karen Whitley Bell is outstanding as a case study of ING Netherlands; it may be the best chapter in the entire book.
- [wardl] – “On Pioneers, Settlers, Town Planners and Theft”, Simon Wardley. Gardeviance.org, 3/13/2015. https://blog.gardeviance.org/2015/03/on-pioneers-settlers-town-planners-and.html – The original source of the now famous three-phase DevOps growth model.
- [teams] – “Team of Teams: New Rules of Engagement for a Complex World”, Stanley McChrystal. Portfolio, 5/12/2015. ISBN-10: 1591847486, ISBN-13: 978-1591847489
- [lean] – “Lean Enterprise: How High Performance Organizations Innovate at Scale”, Jez Humble, Joanne Molesky, Barry O’Reilly. O’Reilly Media, 1/3/2015. ISBN-10: 1449368425, ISBN-13: 978-1449368425. For large enterprises attempting big-picture changes, this is the best book out there that we’ve found to date. Very pragmatic, numbers-centric and a huge influence on the contents of this book.
- [bung] – “Mission Command: An Organizational Model for Our Time”, Stephen Bungay. Harvard Business Review, 11/2/2010. https://hbr.org/2010/11/mission-command-an-organizat Mission Command embraces a conception of leadership which unsentimentally places human beings at its center.
- [reine] – “The Principles of Product Development Flow: Second Generation Lean Product Development”, Donald Reinertsen. Celeritas Publishing, 1/1/2009. ISBN-10: 1935401009, ISBN-13: 978-1935401001
- [kimbg] – “The Other Side of Innovation: Solving the Execution Challenge”, Vijay Govindarajan, Chris Trimble. Harvard Business Review, 9/2/2010. ISBN-10: 1422166961, ISBN-13: 978-1422166963
- [perkin] – “Structuring for Change: The Dual Operating System”, Neil Perkin. Medium.com, 4/11/2017. https://medium.com/building-the-agile-business/structuring-for-change-the-dual-operating-system-78fa3a3d3da3
- [kotte] – “Accelerate: Building Strategic Agility for a Faster-Moving World”, John P. Kotter. Harvard Business Review Press, 4/8/2014. ISBN-10: 1625271743, ISBN-13: 978-1625271747. Kotter describes here what we now call a “virtual” cross-functional team, which he calls a ‘dual operating system’ – combining the entrepreneurial capability of a network with the organizational efficiency of a traditional pyramid-like hierarchy – and argues that one complements the other.
- [dam41] – “You Can’t Change Culture, But You Can Change Behavior, and Behavior Becomes Culture”, Damon Edwards. DevOpsDays.org, Vimeo, 10/10/2012. http://vimeo.com/51120539 . An awesome discussion on culture change and how our behavior – and the standards we set – causes ripple effects.
- [sagat] – “Why DevOps Matters: Practical Insights on Managing Complex & Continuous Change”, unattributed author(s). Saugatuck Technology, 10/1/2014. http://aka.ms/os09me A Microsoft-sponsored study that has some nice data driven insights.
- [eliz] – “Change Agents of Ops: What it Takes”, Eliza Earnshaw. Puppet, 11/6/2014. http://puppetlabs.com/blog/change-agents-it-operations-what-it-takes A very punchy interview with Sam Eaton, the director of engineering operations at Yelp.
- [kimx] – “How do we Better Sell DevOps?”, Gene Kim. DevOpsDays.org, Vimeo, 5/6/2013. http://vimeo.com/65548399 – A great presentation, describing the business benefits derived from DevOps.
- [chamor] – “4 Ways to Create a Learning Culture on Your Team”, Tomas Chamorro-Premuzic, Josh Bersin. Harvard Business Review, 7/12/2018. https://hbr.org/2018/07/4-ways-to-create-a-learning-culture-on-your-team – Covers how leaders shouldn’t wait for or depend on employer-provided training, but should instead lead by example in demonstrating curiosity and sharing what they learn, reinforce positive learning behavior (including providing meaningful critical feedback), and look for hungry minds in the interviewing process.
- [woodw] – “Moving 65,000 Microsofties to DevOps with Visual Studio Team Services”, Martin Woodward, https://youtu.be/W6dqrvb-Yyw?t=4391. A fuller walkthrough of the Azure DevOps team’s transformation, start to finish.
- [dora2017] – “Annual State of DevOps Report”, unattributed author(s). Puppet Labs, 2017. https://puppetlabs.com/2017-devops-report
- [dora2018] – “Annual State of DevOps Report”, unattributed author(s). Puppet Labs, 2018. https://puppetlabs.com/2018-devops-report
- [kissl2] – “Transforming to a Culture of Continuous Improvement”, Courtney Kissler, DevOps Enterprise Summit 2014 presentation, https://www.youtube.com/watch?v=0ZAcsrZBSlo
- [forsgren] – “Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations”, Nicole Forsgren PhD, Jez Humble, Gene Kim. IT Revolution Press, 3/27/2018. ISBN-10: 1942788339, ISBN-13: 978-1942788331. We particularly enjoyed the introduction by Courtney Kissler.
- [nflpd] – “Adopting Microservices at Netflix: Lessons for Team and Process Design”, Tony Mauro. Nginx, 3/10/2015. https://www.nginx.com/blog/adopting-microservices-at-netflix-lessons-for-team-and-process-design/ A very good article, covering Netflix’s use of the OODA loop in optimizing for speed versus efficiency, and creating a high-freedom, high-responsibility culture with less process.
- [walkr] – “Resilience Thinking: Sustaining Ecosystems and People in a Changing World”, Brian Walker, David Salt. Island Press, 8/22/2006. ISBN-10: 1597260932, ISBN-13: 978-1597260930
- [doj1] – “DevOps Dojo”, unattributed author(s). Chef, 4/10/2018. https://blog.chef.io/2018/04/10/fulfilling-the-need-for-continuous-improvement-with-devops-dojos/
- [targy3] – “DevOps At Target: Year 3”, Heather Mickman. IT Revolution, YouTube, 11/28/2016. https://www.youtube.com/watch?v=1FMktLCYukQ&app=desktop Heather describes the storming/norming/performing process we’ve seen elsewhere with successful DevOps initiatives – starting in 2012, with change agents appearing and kickstarting a grassroots DevOps transformation; then a gradual uplift as senior leaders took up the torch and provided the muscle and focus needed to build out better architecture.
- [damb] – “Target CIO explains how DevOps took root inside the retail giant”, Damon Brown. EnterprisersProject.com, 1/16/2017. https://enterprisersproject.com/article/2017/1/target-cio-explains-how-devops-took-root-inside-retail-giant More on Target’s use of DevOps Dojos to overcome hurdles, from the CIO directly.
- [rach] – “Target Rebuilds its Engineering Culture, Moves to DevOps”, Rachael King. Wall Street Journal, 10/19/2015. https://blogs.wsj.com/cio/2015/10/19/target-rebuilds-its-engineering-culture-moves-to-devops/ The subject of the Dojo keeps coming up as a critical catalyst in the Target use case.
- [eliz] – “DevOps and Change Agents: Common Themes”, Eliza Earnshaw. Puppet, 12/3/2014. https://puppet.com/blog/devops-and-change-agents-common-themes
- [srew] – “The Site Reliability Workbook”, Betsy Beyer, Niall Richard Murphy, David K. Rensin, Kent Kawahara, and Stephen Thorne. A terrific resource, especially the discussion in Chapter 6 on toil.
- [schauso] – “Sharing our experience of self-organizing teams”, Willy Schaub. Microsoft Developer Blog, 12/2/2016. https://blogs.msdn.microsoft.com/visualstudioalmrangers/2016/12/02/sharing-our-experience-of-self-organizing-teams/ This and Brian Harry’s article below describe one of the most innovative – and insane-sounding! – team building exercises, which ended up being wildly successful and much less disruptive than Microsoft first thought.
- [bharryso] – “Self forming teams at scale”, Brian Harry. Microsoft Developer Blog, 7/24/2015. https://blogs.msdn.microsoft.com/bharry/2015/07/24/self-forming-teams-at-scale/
- [bjaaso] – “Agile principles in practice”, Aaron Bjork. Microsoft Docs, 5/30/2018. https://docs.microsoft.com/en-us/azure/devops/learn/devops-at-microsoft/agile-principles-in-practice
Chapter 7 – DevOps and Leadership
- [vinc2] – “DevOps and Leadership”, Ron Vincent. LinkedIn, 4/2/2018. https://www.linkedin.com/pulse/devops-leadership-ron-vincent/ This is the original source article for the section above. Ron also had an excellent post on eliminating waste that we encourage you to take the time to read; it’s one of the best (and shortest) writings we’ve seen on a very important topic.
- [brit] – “Taylorism”, unattributed author(s). Encyclopaedia Britannica, unknown date. https://www.britannica.com/science/Taylorism
- [neot] – “Neo Taylorism or DevOps Anti Patterns”, John Willis. IT Revolution, 10/23/2012. https://itrevolution.com/neo-taylorism-or-devops-anti-patterns
- [origi] – “The Origin of Society”, unattributed author(s). Modern Matriarchal Societies, unknown date. http://mmstudies.com/top-down .
- [finv] – “DevOps and Finance”, Ron Vincent. LinkedIn, 12/16/2017. https://www.linkedin.com/pulse/devops-finance-ron-vincent/ .
- [liker] – “The Toyota Way to Lean Leadership: Achieving and Sustaining Excellence through Leadership Development”, Jeffrey Liker, Gary Convis. McGraw-Hill Education, 11/7/2011. ISBN-10: 0071780793; ISBN-13: 978-0071780797.
- [dora2017] – “Annual State of DevOps Report”, unattributed author(s). Puppet Labs, 2017. https://puppetlabs.com/2017-devops-report
- [reine] – “The Principles of Product Development Flow: Second Generation Lean Product Development”, Donald Reinertsen. Celeritas Publishing, 1/1/2009. ISBN-10: 1935401009, ISBN-13: 978-1935401001
Chapter 8 – The End of the Beginning
- [lewpm] – “Project management non-best-practices”, Bob Lewis. InfoWorld, 9/26/2006. https://www.infoworld.com/article/2636977/techology-business/project-management-non-best-practices.html
- [mezak] – “The Origins of DevOps: What’s in a Name?”, Steve Mezak. DevOps.com, 1/25/2018. https://devops.com/the-origins-of-devops-whats-in-a-name/ A nice overview of the beginnings of the DevOps movement, including the seminal presentations given in 2008 and 2009 by Andrew Shafer, Patrick Debois, John Allspaw, and Paul Hammond.
- [net] – New English Translation of Ecclesiastes 3:22. NET Bible Noteless, Kindle edition, 8/26/2005. ASIN: B0010XIA8K
- [shunryu] – “Zen Mind, Beginner’s Mind: Informal Talks on Zen Meditation and Practice”, Shunryu Suzuki. Shambhala Library, 10/10/2006. ISBN-10: 1590302672, ISBN-13: 978-1590302675
Appendix – Aaron Bjork
- [bjork] – “Agile At Microsoft”, Aaron Bjork. Microsoft Visual Studio, YouTube, 10/2/2017. https://www.youtube.com/watch?v=-LvCJpnNljU This is the best explanation I’ve seen of “The Microsoft Story”, and it’s packed with information; a must-watch.
- [wang2] – “VSLive! Keynote: Abel Wang Details Microsoft’s Painful DevOps Journey”, Abel Wang. Visual Studio Magazine, 8/17/2018. https://visualstudiomagazine.com/articles/2018/08/17/abel-wang-devops.aspx. There’s a great snapshot and explanation of the bug cap in this article, as well as other background behind the MS story.
Appendix – Betsy Beyer, Stephen Thorne
- [sre] – “Site Reliability Engineering: How Google Runs Production Systems”, Betsy Beyer, Chris Jones, Jennifer Petoff, Niall Richard Murphy. O’Reilly Media, 4/1/2016. ISBN-10: 149192912X, ISBN-13: 978-1491929124
- [ghbsre] – “The Site Reliability Workbook: Practical Ways to Implement SRE”, Niall Murphy, David Rensin, Betsy Beyer, Kent Kawahara, Stephen Thorne. O’Reilly Media, 8/1/2018. ISBN-10: 1492029505, ISBN-13: 978-1492029502
- [kieran] – “Managing Misfortune for Best Results”, Kieran Barry. SREcon EMEA, 8/30/2018. https://www.usenix.org/node/218852 . This is a great overview of the Wheel of Misfortune exercises in simulating outages for training, and some antipatterns to avoid.
Appendix – John-Daniel Trask
- [rayg] – https://raygun.com/ – The official Raygun site.
- [hansrag] – “Managing Errors across platforms with RayGun.io”, Scott Hanselman, John-Daniel Trask. Hanselminutes.com, 5/22/2014. https://hanselminutes.com/421/managing-errors-across-platforms-with-raygunio
- [ch9rg] – “Handling billions of exceptions with .NET & Raygun.io”, John-Daniel Trask. Channel 9, 3/5/2015. https://channel9.msdn.com/Events/dotnetConf/2015/Handling-billions-of-exceptions-with-NET–Raygunio
- [ch9boyd] – “DevOps at LightSpeed, lessons we learned from building a Raygun”, Jeremy Boyd, John-Daniel Trask. Channel9, 9/6/2013. https://channel9.msdn.com/Events/TechEd/NewZealand/2013/DEV302
- [qzdm] – “Domino’s stock has outperformed Google, Facebook, Apple, and Amazon this decade”, Chase Purdy. Quartz, 3/22/2017. https://qz.com/938620/dominos-dpz-stock-has-outperformed-google-goog-facebook-fb-apple-aapl-and-amazon-amzn-this-decade/
Appendix – John Weers
- [issurv] – IS Survivor, Bob Lewis. http://issurvivor.com/ . This is a great site John recommended that we enjoyed very much, especially on process and change management.
Appendix – Rob England
- [itskept] – The IT Skeptic, Rob England, http://www.itskeptic.org/blog. See Rob’s blog for his great articles and thoughts on DevOps.
- [garthyp] – “Gartner Hype Cycle”, unattributed author(s). Gartner, unknown date. https://www.gartner.com/technology/research/methodologies/hype-cycle.jsp
Appendix – Sam Guckenheimer
- [guck2] – “DevOps at Microsoft”, Sam Guckenheimer. Microsoft Docs, 7/8/2018. https://docs.microsoft.com/en-us/azure/devops/learn/devops-at-microsoft/ A very good doc hub and overview for those who want to know ‘How Microsoft did it’, broken down by practice.