Overview

This repository contains everything you need to run a live, end-to-end demo in which Azure SRE Agent automatically detects and investigates a real application failure, then recommends or executes a fix depending on its run mode.

Who is this for?

Conference speakers, technical evangelists, and anyone giving an AIOps or Azure SRE Agent demo. The failure and recovery are fully repeatable, so you can run the demo as many times as you need.

What is Azure SRE Agent?

Azure SRE Agent is an AI-powered site reliability agent that continuously monitors your Azure resources. When something breaks, it:

Investigates — correlates metrics, logs, traces, and deployment history (root cause analysis)
Understands code — maps Azure resources back to GitHub source code via Deep Context
Remediates — proposes or executes corrective actions depending on run mode
Remembers — captures every investigation in persistent memory so it gets smarter over time

📖 Official docs: sre.azure.com/docs

The demo story

A live e-commerce storefront — Cacao & Co. — runs on Azure Container Apps, fully instrumented with OpenTelemetry. An Azure SRE Agent monitors the environment.

Step	What happens	Who does it
1. Healthy baseline	Visitors browse products, add to cart, check out. Telemetry flows to App Insights.	The app
2. Break checkout	A "bad deployment" sets `DEMO_BROKEN_CHECKOUT=true`. Checkout returns 503 after a 1.5 s delay. The rest of the site stays up.	You (one-click workflow)
3. Detection	SRE Agent sees the spike in 503 errors and failed dependency calls.	Azure SRE Agent
4. Investigation	The agent correlates logs, metrics, and traces. Maps the failure back to source code.	Azure SRE Agent
5. Remediation	Depending on run mode, the agent recommends or executes a fix.	Azure SRE Agent
6. Recovery	Checkout returns to 201. Telemetry confirms the fix.	The app
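
To confirm which state the storefront is in before going on stage, you can probe checkout from a terminal. This is a minimal sketch under stated assumptions: `CHECKOUT_URL` is a hypothetical variable you set to the webstore's checkout URL, and it assumes a bare POST reaches the checkout handler. The real route and request body live in the webstore repo and are not reproduced here. The probe prints the status code and total request time, so a broken checkout should show 503 with roughly 1.5 s of latency.

```shell
#!/bin/sh
# Sketch: report which demo state checkout is in, from a terminal.
# Assumption: CHECKOUT_URL is a hypothetical variable holding the webstore's
# checkout URL, and a bare POST reaches the checkout handler. The real route
# and request body live in the webstore repo and are not reproduced here.

# Map an HTTP status code to the states in the demo story table.
checkout_state() {
  case "$1" in
    201) echo "healthy" ;;
    503) echo "broken" ;;
    *) echo "unexpected ($1)" ;;
  esac
}

# POST to checkout once and print "<state> <status> <seconds>s".
probe_checkout() {
  local result code seconds
  # -w prints the status and total time; curl reports 000 if no response arrives.
  result=$(curl -s -o /dev/null -X POST -w '%{http_code} %{time_total}' "$CHECKOUT_URL" || true)
  code=${result%% *}
  seconds=${result#* }
  echo "$(checkout_state "$code") $code ${seconds}s"
}
```

After sourcing the file and setting `CHECKOUT_URL`, run `probe_checkout` before step 2 and again after step 6. A broken checkout should report `broken 503` with a time of about 1.5 s plus network latency; a recovered one should report `healthy 201`.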

Repositories

Repo	What it contains
azure-sre-agent-demo	SRE Agent Bicep templates, demo workflows, this documentation
webstore	Next.js e-commerce app with built-in failure mode, OpenTelemetry, Docker + Container Apps deployment

Demo workflows

Two GitHub Actions workflows automate the break / fix cycle:

🔴 Demo: Break Checkout

Generates 30 baseline requests (healthy telemetry for contrast)
Sets `DEMO_BROKEN_CHECKOUT=true` on the Container App
Polls until checkout returns 503

gh workflow run "Demo: Break Checkout" -f environment=<YOUR_GITHUB_ENVIRONMENT>

🟢 Demo: Reset Checkout

Sets `DEMO_BROKEN_CHECKOUT=false` on the Container App
Polls until checkout returns 201

gh workflow run "Demo: Reset Checkout" -f environment=<YOUR_GITHUB_ENVIRONMENT>
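
If you need to reproduce the break / fix cycle without GitHub Actions, for example while debugging the workflows, it can be approximated with the Azure CLI and curl. This is a sketch, not the workflows' actual implementation: `APP_NAME`, `RESOURCE_GROUP`, and `CHECKOUT_URL` are hypothetical variables you fill in, the attempt count and delay are arbitrary, and it assumes you are already signed in with `az login`. Changing an environment variable with `az containerapp update --set-env-vars` creates a new revision, so the polling step leaves time for that revision to start serving traffic.

```shell
#!/bin/sh
# Sketch of the break/reset cycle by hand. Not the repo's workflow code.
# Hypothetical inputs: APP_NAME, RESOURCE_GROUP, CHECKOUT_URL.

# Print only the HTTP status code of one POST to checkout (000 on network error).
checkout_status() {
  curl -s -o /dev/null -X POST -w '%{http_code}' "$CHECKOUT_URL" || true
}

# wait_for_status EXPECTED ATTEMPTS DELAY COMMAND...
# Runs COMMAND until it prints EXPECTED or the attempts run out.
wait_for_status() {
  local expected attempts delay i code
  expected=$1
  attempts=$2
  delay=$3
  shift 3
  i=1
  while [ "$i" -le "$attempts" ]; do
    code=$("$@")
    if [ "$code" = "$expected" ]; then
      echo "got $code after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up waiting for $expected (last: $code)" >&2
  return 1
}

# Set the failure flag on the Container App (true = break, false = reset).
set_checkout_flag() {
  az containerapp update --name "$APP_NAME" --resource-group "$RESOURCE_GROUP" \
    --set-env-vars "DEMO_BROKEN_CHECKOUT=$1" --output none
}

break_checkout() {
  # Baseline: 30 requests of healthy telemetry for contrast.
  for n in $(seq 1 30); do checkout_status >/dev/null; done
  set_checkout_flag true
  wait_for_status 503 30 10 checkout_status
}

reset_checkout() {
  set_checkout_flag false
  wait_for_status 201 30 10 checkout_status
}
```

Source the file, set the three variables, and call `break_checkout` or `reset_checkout`. Passing the probe to `wait_for_status` as a command keeps the polling loop independent of curl, which also makes it easy to exercise with a stub.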

Next steps

Architecture — how the pieces connect
Getting Started — deploy the agent and configure workflows
Demo Script — step-by-step presenter guide with speaker notes
