Month: May 2026

Azure SRE Agent – Just The Links

I’ve been doing some demos / walkthroughs of the Azure SRE Agent – thought I would post here the resources I’ve found most helpful. The specific use case here is, “How can I get better / more intelligent alerts for yellow case conditions – like with an endpoint that’s degraded in response times but not yet failed / in a red state?”
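To make the "yellow" condition concrete, here is a minimal Python sketch of the underlying idea, independent of the SRE Agent itself: classify an endpoint from a window of response times so that degraded-but-still-up gets its own signal. The thresholds, function names, and sample window are my own assumptions for illustration, not anything the agent prescribes.

```python
import math

# Hypothetical thresholds in milliseconds; tune these to your own SLOs.
YELLOW_P95_MS = 800
RED_P95_MS = 2000


def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def classify_endpoint(latencies_ms, failures=0):
    """Return "green", "yellow", or "red" for one observation window."""
    if failures > 0 or not latencies_ms:
        # Hard failures, or no successful responses at all, are the classic red state.
        return "red"
    slow = p95(latencies_ms)
    if slow >= RED_P95_MS:
        return "red"
    if slow >= YELLOW_P95_MS:
        # Still answering, but slow enough that someone should look.
        return "yellow"
    return "green"


window = [120] * 18 + [900, 950]
print(classify_endpoint(window))  # yellow
```

The point of the middle branch is that nothing has failed yet, which is exactly the case a plain up/down check never reports.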

Anyway this is a fun agent to work with and very extensible. Try it yourself!

GitHub Copilot CLI – Getting Started

If you spend a lot of time in the terminal, GitHub Copilot CLI feels surprisingly natural—almost like it’s always been missing from your workflow. In this post, I’ll walk you through the bare basics you’ll need to get started with using it. Please check out the references at the end as a good starting point for your journey!

Installation

Once it’s installed (using npm or winget; npm install -g @github/copilot, for example), you should get a glorious 80s-style CLI window by typing the following:

copilot

Note: the easiest path is to use GitHub Codespaces for zero setup, which preinstalls Python and pytest. Fork the repository to your GitHub account, then select Code > Codespaces > Create codespace on main.

Try this as your first prompt:

Say hello and tell me what you can help with

There are some slash commands here you can play around with. We’ll get into the main ones a little later.

Your First Code Review

Let’s do a quick comparison of the different models available to us:

/model

Higher-multiplier models use your premium request quota faster, so save those for when you really need them. But let’s try our very first code review, so we can compare the output:

/review review the apis using gpt 5.4, opus 4..

This is a great way to see the differences between models and what they catch (and don’t catch). There is no clear leader at present between OpenAI, Anthropic, Google, etc – no single model does it all best.

Four Interaction Modes

The key points here:

  • Interactive: conversation and iteration
  • Plan: design before coding
  • Programmatic: one-off commands

To experiment a little with these (the /plan prompts go inside a copilot session; the last line runs straight from your shell):

/plan Add search and filter capabilities
/plan Add a "mark as read" command to the app
/plan Add OAuth2 authentication with Google and GitHub providers
copilot -p "Write a function that checks if a number is even or odd"
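Programmatic mode is the one that slots into scripts. As a rough sketch, here is how the copilot -p form above could be wrapped from Python. The wrapper names and the timeout are my own, and I am assuming the CLI writes its answer to stdout; only the -p flag itself comes from the command shown above.

```python
import shutil
import subprocess


def build_copilot_command(prompt):
    """Build the argument list for a one-off, programmatic Copilot CLI call."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Passing a list (not a shell string) avoids quoting problems in the prompt.
    return ["copilot", "-p", prompt]


def run_copilot(prompt, timeout=120):
    """Run the prompt and return stdout, or None if the CLI is not installed."""
    if shutil.which("copilot") is None:
        return None
    result = subprocess.run(
        build_copilot_command(prompt),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    # Assumption: the answer is printed to stdout.
    return result.stdout
```

Calling run_copilot("Write a function that checks if a number is even or odd") then behaves like the shell one-liner, with the result available as a string.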

And if you’re truly kicking the tires, experiment with /ask mode:

Explain what a dataclass is in Python in simple terms
Write a function that sorts a list of dictionaries by a specific key
What's the difference between a list and a tuple in Python?
Give me 5 best practices for writing clean Python code

Working with Code

These are some sample prompts you can try yourself in working with a new application.

Onboarding

Explain what FILENAME does
Review all files in PROJECT
How is logging configured in this project?
What's the pattern for adding a new API endpoint?
Explain the authentication flow
Where are the database migrations?
Compare @FILE1 and @FILE2 for consistency

Analyzing files together reveals bugs, data flow, and patterns that are invisible in isolation. If I had checked books.py on its own, the verdict would have been: cool! Syntax is fine, types are valid, style is clean…

With @FILE1 @FILE2 - How do these files work together? What's the data flow?
In one paragraph, what does this app do and what are its biggest quality issues?
Give me an overview of the code structure
How does the app save and load books?
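Here is a small illustration of why the multi-file prompts matter. Everything in it is hypothetical (the real books.py is not shown in this post), and the two "files" are collapsed into one script so it runs as-is. Each half reads fine alone; the bug only shows up in the round trip.

```python
import json


# --- imagine this half lives in books.py ---
def make_book(title, author, year):
    """Build a book record. Looks clean when reviewed alone."""
    return {"title": title, "author": author, "year": year}


# --- and this half lives in a separate storage module ---
def save_books(books):
    """Serialize books to JSON. Also looks clean when reviewed alone."""
    return json.dumps([{"name": b["title"], "author": b["author"]} for b in books])


def load_books(payload):
    """Rebuild book records from JSON produced by save_books()."""
    rows = json.loads(payload)
    return [make_book(row["name"], row["author"], row.get("year")) for row in rows]


original = [make_book("Dune", "Frank Herbert", 1965)]
restored = load_books(save_books(original))

# save_books() never writes "year", so the value silently comes back as None.
print(original[0]["year"], "->", restored[0]["year"])  # 1965 -> None
```

A single-file review has no reason to flag either function. Asking how the files work together is what surfaces the dropped field.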

Test Driven Development

Write failing tests for the user registration flow
Now implement code to make all tests pass
Commit with message "feat: add user registration"
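For a feel of what those first two prompts should produce, here is a tiny sketch of the red-then-green sequence. The register_user function and its rules are made up for illustration; your real registration flow will have more to it.

```python
# Step 1: the tests, written first. Until register_user() exists, they fail.
def test_register_returns_new_user():
    users = {}
    user = register_user(users, "ada@example.com", "Ada")
    assert user == {"email": "ada@example.com", "name": "Ada"}
    assert "ada@example.com" in users


def test_register_rejects_duplicate_email():
    users = {}
    register_user(users, "ada@example.com", "Ada")
    try:
        register_user(users, "ada@example.com", "Someone Else")
    except ValueError:
        return
    raise AssertionError("duplicate email should be rejected")


# Step 2: the smallest implementation that makes both tests pass.
def register_user(users, email, name):
    """Add a user keyed by email, refusing duplicates."""
    if email in users:
        raise ValueError("email already registered")
    users[email] = {"email": email, "name": name}
    return users[email]


if __name__ == "__main__":
    test_register_returns_new_user()
    test_register_rejects_duplicate_email()
    print("all tests pass")
```

Writing the tests before the implementation gives Copilot a concrete target, and gives you something to check the result against.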

Code Review

/review Use Opus 4.5 and Codex 5.2 to review the changes in my current branch against `main`. Focus on potential bugs and security issues.
Review FILE and suggest improvements
Add type hints to all functions
Make error handling more robust
Review all files in @PROJECTNAME for error handling
Find security vulnerabilities that span BOTH files

Refactoring

I want to improve FILENAME. What does each function in this file do?
Add validation to FUNCTION() so it handles empty input and non-numeric entries
What happens if FUNCTION() receives an empty string for the title? Add guards for that.
Add a comprehensive docstring to FX() with parameter descriptions and return values
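As a reference point for judging what comes back from these prompts, this is roughly the shape I would expect: guards for empty and non-numeric input, plus a docstring covering parameters, return value, and errors. The parse_year function is a stand-in I invented, not code from any particular app.

```python
def parse_year(raw):
    """Convert user input into a publication year.

    Args:
        raw: Text typed by the user, for example "1965". May be None.

    Returns:
        The year as an int.

    Raises:
        ValueError: If the input is empty or is not a whole, non-negative number.
    """
    text = (raw or "").strip()
    if not text:
        # Guard for empty strings, whitespace-only input, and None.
        raise ValueError("year is required")
    if not text.isdecimal():
        # Guard for non-numeric entries such as "abc", "19.5", or "-5".
        raise ValueError(f"year must be a whole number, got {raw!r}")
    return int(text)
```

If the generated code lacks either guard, or the docstring skips the error cases, that is a good cue for a follow-up prompt.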

Git Workflows

What changes went into version `2.3.0`?
Create a PR for this branch with a detailed description
Rebase this branch against `main`
Resolve the merge conflicts in `package.json`

Bug Investigation

The `/api/users` endpoint returns 500 errors intermittently. Search the codebase and logs to identify the root cause.

Putting it All Together

So your workflow might look something like the following:

  • Explore: “Read the authentication files but don’t write code yet”
  • Plan: “/plan Implement password reset flow”
  • Review: “Check the plan, suggest modifications”
  • Implement: “Proceed with the plan”
  • Verify: “Run the tests and fix any failures”
  • Commit: “Commit these changes with a descriptive message”

A Few Words about Agents

So we’ve already used agents! When I enter this in a copilot session:

/plan Add input validation in the app on X field

That’s using an agent!

Here’s a power tip with agents: when you need to investigate a library, understand best practices, or explore an unfamiliar topic, use /research to run a deep research investigation before writing any code:

/research What are the best Python libraries for validating user input in CLI apps?

What Are Skills?

Agent Skills are folders containing instructions, scripts, and resources that Copilot automatically loads when relevant to your task. Copilot reads your prompt, checks if any skills match, and applies the relevant instructions automatically.
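To show what "a folder containing instructions" can look like on disk, here is a small scaffolding sketch. It assumes the common convention of one folder per skill with a SKILL.md file whose front matter carries a name and a description. That layout, and the directory Copilot actually scans for skills, are assumptions to verify against the official documentation. The helper name and sample text are mine.

```python
from pathlib import Path
import tempfile

# Assumed layout: front matter with name/description, then free-form instructions.
SKILL_TEMPLATE = """---
name: {name}
description: {description}
---

{instructions}
"""


def scaffold_skill(root, name, description, instructions):
    """Create <root>/<name>/SKILL.md and return its path."""
    skill_dir = Path(root) / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text(
        SKILL_TEMPLATE.format(
            name=name, description=description, instructions=instructions
        ),
        encoding="utf-8",
    )
    return skill_file


if __name__ == "__main__":
    # Written to a temp directory so the demo leaves your repo untouched.
    with tempfile.TemporaryDirectory() as tmp:
        path = scaffold_skill(
            tmp,
            "security-audit",
            "Check API endpoints for common vulnerabilities.",
            "1. List every endpoint.\n2. Flag missing auth and input validation.",
        )
        print(path.read_text(encoding="utf-8"))
```

The description is the part worth sweating over, since that is what gets matched against your prompt.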

These can be invoked from the command line, for example:

/generate-tests Create tests for the user authentication module
/code-checklist Check books.py for code quality issues
/security-audit Check the API endpoints for vulnerabilities

And you can ask Copilot directly which skills were used:

What skills did you use for that response?
What skills do you have available for security reviews?

Last – why would we use an instruction file vs an agent?

Best Practices and Final Thoughts

As the official documentation reminds us – the following should become second nature the more you work with GitHub Copilot CLI:

  • Set Custom Instructions: Use .github/copilot-instructions.md to define project-specific coding standards and build commands.
  • Use /plan for Big Tasks: Always generate an implementation plan for complex refactors before writing any code. A good plan leads to dramatically better results. (This is especially true now that we’ve moved to usage-based billing. More to come!)
  • Offload with /delegate: Use the cloud agent for long-running or tangential tasks (like documentation) to keep your local terminal free.
  • Save prompts that work well. When Copilot CLI makes a mistake, note what went wrong. Over time, this becomes your personal playbook.

And I’ll throw in a few observations of my own:

  • Code review becomes comprehensive with specific prompts
  • Refactoring is safer when you generate tests first
  • Debugging benefits from showing Copilot CLI the error AND the code
  • Test generation should include edge cases and error scenarios
  • Git integration automates commit messages and PR descriptions

References

Videos:

Documentation

What’s New in SpecKit?

I last wrote about SpecKit in December of 2025 – an eon ago, it seems. Since then, there have been some epochal changes in how we use agents to write code – including Squad and other work orchestrating fleets of agents. In all this, it’s encouraging to note that SpecKit is maturing as well.

Over the past few days I’ve gotten a chance to get caught up with the latest evolution of SpecKit. This very popular project has undergone a significant pivot, moving to a modular “kit” ecosystem.

For teams looking to scale AI-assisted engineering without drowning in “token burn” or architectural drift, here are the key takeaways from the latest SpecKit developments.

Extensions and Presets

SpecKit has moved away from trying to be everything to everyone in its core code. Instead, it now uses a catalog model. Two big changes here have to do with extensions and presets:

Extensions: There are now over 80 community-driven extensions. (See the full list here – it’s getting quite long!) These allow you to add new commands (like visual GUIs or deployment triggers) without bloating the core engine. One example here is AI-Driven Engineering, which has a very cool workflow that is quite distinct from the traditional SpecKit flow.

Presets: This is perhaps the most powerful update for enterprise teams. Presets are how you adapt the SpecKit workflow to your own methodology or apply your own org standards. It’s actually a fairly complex stacked override system. Just as an example, there’s one preset (Lean Workflow) that’s very light on ceremony, with just 3 files needed: spec, plan, and tasks.

But that’s just the beginning. There’s an interesting one on Accessibility, another on Fiction Book Writing, and an enhancement to /clarify called VS Code Ask Questions. Anyway, presets look like a fantastic way to add your special spin to how SpecKit does its work.

Automating the SDD Cycle with Workflows

The introduction of Workflows transforms SpecKit from a manual CLI tool into an automated engine. Workflows chain the entire SDD cycle—from constitution to implementation—with built-in human-in-the-loop gates.

  • Gated Approvals: The workflow will pause for review/approval/rejection at key milestones.
  • Resumable Runs: If a process is interrupted, workflows maintain state, allowing you to resume exactly where you left off.
  • Deterministic Automation: Unlike “Fleet” agents that can be unpredictable, workflows provide a step-based, procedural path that reduces unnecessary LLM hallucination.
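The three properties above are easier to see in code. This is a toy sketch of a gated, resumable, step-based runner, not SpecKit's actual implementation. The step names, the choice of which steps are gated, and the JSON state file are all assumptions I made for illustration.

```python
import json
from pathlib import Path

# Ordered steps of the cycle; names are illustrative only.
STEPS = ["constitution", "specify", "plan", "tasks", "implement"]
GATED = {"specify", "plan"}  # hypothetical choice of steps needing human approval


def run_workflow(state_file, approve, execute):
    """Run steps in order, pausing at a rejected gate and resuming from saved state."""
    path = Path(state_file)
    done = json.loads(path.read_text()) if path.exists() else []
    for step in STEPS:
        if step in done:
            continue  # resumable: skip work finished in an earlier run
        execute(step)
        if step in GATED and not approve(step):
            # Gate rejected: stop here. The step is not recorded as done,
            # so the next run executes it again before asking once more.
            return {"status": "paused", "at": step, "done": done}
        done.append(step)
        path.write_text(json.dumps(done))  # persist after every completed step
    return {"status": "complete", "at": None, "done": done}
```

Because the order is a fixed list and the state is a file, two runs over the same inputs walk the same path. That is the "deterministic" part, in contrast to letting an agent decide what to do next.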

I would say this is more for advanced / experienced teams – eliminating those gated steps might be a little jarring for some who are newer to using SpecKit / SDD.

Tools and Measuring the Cost of Quality with Token Analyzer

There’s a lot of cool SpecKit-aligned content and tooling. For example, there’s a visual GUI you can add.

But that’s not the best one I found. I’m working with quite a few enterprises that are trying to control costs as GitHub makes the pivot to usage-based billing. I like this extension (?) very much – the Token Consumption Analyzer. It answers questions that used to require lots of manual analysis: does changing your model or prompt actually save tokens, and what does it cost you in quality?
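The question the analyzer answers boils down to a two-sided comparison. The sketch below is not the extension's code or its output format, just the arithmetic behind the question, with a made-up quality proxy (acceptance checks passed) and placeholder numbers.

```python
from dataclasses import dataclass


@dataclass
class Run:
    label: str          # model or prompt variant being compared
    tokens: int         # total tokens consumed by the run
    checks_passed: int  # quality proxy: acceptance checks that passed
    checks_total: int

    @property
    def quality(self):
        return self.checks_passed / self.checks_total


def compare(baseline, candidate):
    """Token savings and quality change of candidate relative to baseline."""
    savings = 100 * (baseline.tokens - candidate.tokens) / baseline.tokens
    quality_delta = 100 * (candidate.quality - baseline.quality)
    return {
        "token_savings_pct": round(savings, 1),
        "quality_delta_pct": round(quality_delta, 1),
    }


# Placeholder numbers for illustration only, not measurements.
baseline = Run("model A", tokens=120_000, checks_passed=18, checks_total=20)
candidate = Run("model B", tokens=90_000, checks_passed=16, checks_total=20)
print(compare(baseline, candidate))
```

With these placeholder inputs the candidate uses 25% fewer tokens but passes 10 percentage points fewer checks. That trade-off is hard to see when you only look at the bill.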

Pro Tips for Better Implementation

Here are three immediate tactical shifts for your workflow:

  • Stop Using Personas: Recent research suggests that telling an LLM “You are an expert architect” can actually degrade performance. It narrows the model’s focus too much. Instead, focus on the task and provide rich context.
  • Manage Your MCP Servers: Tools like Azure DevOps or Figma MCP servers are great, but they add overhead to every request. Turn them off once you move from the specification phase to implementation to save on token costs.
  • Positive & Negative Testing: Don’t just ask for “tests.” Explicitly require both positive and negative test cases. Also, avoid chasing fixed code coverage percentages (e.g., “must be 100%”), as the token cost to reach those final few percentage points often outweighs the value.
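On the MCP point, a bit of back-of-the-envelope arithmetic shows why the overhead adds up. I am assuming each enabled server's tool definitions ride along on every request. The per-server token counts and the request count are placeholders, not benchmarks, so measure your own.

```python
# Hypothetical per-request token cost of each server's tool definitions.
TOOL_DEFINITION_TOKENS = {"azure-devops": 3000, "figma": 2500}


def mcp_overhead(enabled_servers, requests):
    """Tokens spent on tool definitions across a phase of work."""
    per_request = sum(TOOL_DEFINITION_TOKENS[name] for name in enabled_servers)
    return per_request * requests


implementation_requests = 200  # placeholder request count for the implementation phase
left_on = mcp_overhead(["azure-devops", "figma"], implementation_requests)
turned_off = mcp_overhead([], implementation_requests)
print(f"tokens saved by disabling both servers: {left_on - turned_off}")
```

The cost scales with request count, so the implementation phase, where the agent makes the most calls, is where leaving unused servers on hurts most.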

The Bottom Line

SpecKit is maturing into a professional-grade orchestrator that rewards context over complexity. Structured SDD can deliver working implementations even in complex existing codebases. I think the last two paragraphs of the excellent brownfield walkthrough sum it up quite nicely:

“The more interesting takeaway is the ceiling, not the floor. Even with a thin spec, a bare plan, and no analysis pass, the agents produced a running, end-to-end implementation that required only conversational follow-up to debug and verify. … For teams considering this workflow: the agents are only as good as the context you give them. Treat speckit.specify, speckit.plan, and speckit.analyze as investments, not formalities. The implementation will reflect the quality of the artifacts that precede it.”