Demo Script

A step-by-step presenter guide for running the Azure SRE Agent demo live. It is designed for a 15–20 minute slot but can be shortened to 10 minutes by trimming the setup narration.

Before the demo

Run through this checklist the morning of your presentation.

Infrastructure health

# PostgreSQL is running
az postgres flexible-server show \
  --name <YOUR_POSTGRES_SERVER_NAME> \
  --resource-group <YOUR_DEMO_RESOURCE_GROUP> \
  --query "state" -o tsv
# Expected: Ready (with -o tsv the value is printed without quotes)

# Container App is responding and database is connected
curl -s https://<YOUR_APP_FQDN>/api/health | jq .
# Expected: { "status": "ok", "database": { "status": "ok", ... } }
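
If you would rather not eyeball the JSON, the health check above can be turned into a pass/fail test. The following is a minimal sketch, assuming the response has the shape shown in the expected output; `health_is_ok` is a hypothetical helper name, not something shipped with the demo.

```shell
#!/usr/bin/env bash
# Sketch: succeed only when a /api/health response body (read from stdin)
# reports "ok" for both the app and the database.
# health_is_ok is a hypothetical helper name, not part of the demo repo.
health_is_ok() {
  # jq -e exits non-zero when the expression evaluates to false or null.
  jq -e '.status == "ok" and .database.status == "ok"' >/dev/null 2>&1
}

# Example (replace the placeholder before running):
#   curl -s "https://<YOUR_APP_FQDN>/api/health" | health_is_ok \
#     && echo "app healthy" || echo "app NOT healthy"
```

A missing `database` field, an error status, or a non-JSON body all count as unhealthy, which is usually what you want the morning of a presentation.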

Verify checkout works

For a quick presenter sanity check, use the browser. The GitHub Actions workflows run their own verification with a seeded product CUID configured as TEST_PRODUCT_ID, so they no longer depend on the broken hardcoded productId: "1" payload.

Open https://<YOUR_APP_FQDN> in a browser
Add any product to the cart
Complete a checkout — confirm you see the order confirmation page
If checkout returns an error, the DEMO_BROKEN_CHECKOUT env var may still be set to true from a previous demo run. Reset it:
```
gh workflow run "Demo: Reset Checkout" -f environment=<YOUR_GITHUB_ENVIRONMENT>
```

SRE Agent

Open sre.azure.com and navigate to your agent
Confirm the agent is Active (not "Building Knowledge Graph")
Confirm the webstore GitHub repo is connected under data sources
Confirm <YOUR_DEMO_RESOURCE_GROUP> is listed under Azure resources

Browser tabs (pre-open)

Tab	What to open	Purpose
1	Webstore storefront	Show the live app
2	sre.azure.com — Agent chat	Watch the agent investigate
3	Azure Portal — App Insights (Live Metrics or Failures)	Real-time telemetry
4	GitHub Actions — this repo	Trigger workflows

Part 1: Set the scene

⏱️ ~3 minutes

Show the app

Switch to the webstore tab
Browse a few products, add one to the cart
Complete a checkout — show the success confirmation

Speaker notes

"This is a real e-commerce app running on Azure Container Apps. It's a Next.js storefront backed by PostgreSQL, fully instrumented with OpenTelemetry. Every request, every dependency call, every exception flows to Application Insights."

Show the telemetry

Switch to Application Insights → Live Metrics (or the Failures blade)
Point out healthy request rates, zero failures

Speaker notes

"Right now everything is green. The SRE Agent is quietly monitoring this environment — it has access to these metrics, the logs, and the source code repo on GitHub."

Show the SRE Agent

Switch to sre.azure.com
Show the agent dashboard — connected resources, connected code repo
Optionally ask: "What Azure resources can you see in <YOUR_DEMO_RESOURCE_GROUP>?"

Speaker notes

"This is Azure SRE Agent. It's connected to our Application Insights, our resource group, and the GitHub repo with the webstore source code. It's running in Review mode — it'll investigate autonomously but ask before taking action."

Part 2: Break it

⏱️ ~2 minutes

Trigger the failure

Switch to GitHub Actions
Navigate to "Demo: Break Checkout"
Click Run workflow → select your configured GitHub environment → click Run workflow
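
As a fallback to clicking through the GitHub UI, the same trigger can be scripted with the GitHub CLI. This is a sketch under assumptions: `run_demo_workflow` is a hypothetical helper, and it assumes the most recent run of the workflow is the one it just dispatched, which may not hold if someone else triggers the workflow at the same moment.

```shell
#!/usr/bin/env bash
# Sketch: dispatch a demo workflow and wait for the resulting run to finish.
# run_demo_workflow is a hypothetical helper; the workflow names and the
# "environment" input come from the gh commands used in this guide.
run_demo_workflow() {
  local workflow="$1" environment="$2" run_id

  gh workflow run "$workflow" -f environment="$environment" || return 1

  # The new run can take a few seconds to show up in the run list.
  sleep "${RUN_LIST_DELAY:-5}"

  # Assumption: the newest run of this workflow is the one just dispatched.
  run_id="$(gh run list --workflow "$workflow" --limit 1 \
    --json databaseId --jq '.[0].databaseId')" || return 1
  [ -n "$run_id" ] || return 1

  # --exit-status makes gh return non-zero if the run fails.
  gh run watch "$run_id" --exit-status
}

# Example:
#   run_demo_workflow "Demo: Break Checkout" "<YOUR_GITHUB_ENVIRONMENT>"
```

The same helper works for "Demo: Reset Checkout", so one function covers both the break and the cleanup.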

Speaker notes

"I'm simulating a bad deployment. All this does is flip one environment variable — DEMO_BROKEN_CHECKOUT=true. The checkout API will start returning 503 with a simulated timeout, while the rest of the site stays perfectly healthy."

Confirm the break

Wait for the workflow to complete (~1 min)
Switch to the storefront — try to check out, show the error
Switch to App Insights — watch the 503s appear

Speaker notes

"Checkout is down. Products still load, the cart works, but placing an order gives a 503. If this were a simple health check, it would still say 'healthy.' But the SRE Agent is about to tell us what is wrong and how to fix it."

Part 3: Watch the agent

⏱️ ~5–10 minutes

The investigation

Switch to the SRE Agent tab
The agent may start investigating automatically if you've set up an incident response plan. If it doesn't, prompt it:

"I'm seeing checkout failures on the webstore Container App in <YOUR_DEMO_RESOURCE_GROUP>. Can you investigate?"

Watch the reasoning chain:
- Queries App Insights for recent exceptions and failed requests
- Notices the spike in 503s on POST /api/orders
- Examines span attributes and exception details
- Correlates with recent environment variable changes
- Traces to source code — identifies the DEMO_BROKEN_CHECKOUT flag
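
If the audience asks what "queries App Insights" looks like, a roughly equivalent manual query can help. The sketch below only builds the KQL text. It assumes the telemetry is queryable through the classic Application Insights `requests` table (string `resultCode`, `url` column); `failed_checkout_query` and the `<YOUR_APP_INSIGHTS_NAME>` placeholder are illustrative names, and this is not necessarily the query the agent itself runs.

```shell
#!/usr/bin/env bash
# Sketch: build a KQL query that counts failed checkout requests per minute.
# Assumptions: telemetry is in the classic Application Insights "requests"
# table, where resultCode is a string and url holds the request URL.
# failed_checkout_query is a hypothetical helper name.
failed_checkout_query() {
  local status="${1:-503}" path="${2:-/api/orders}"
  cat <<EOF
requests
| where resultCode == "${status}" and url contains "${path}"
| summarize failures = count() by bin(timestamp, 1m)
| order by timestamp asc
EOF
}

# Example (needs the Azure CLI application-insights extension; the app name
# placeholder is an assumption, not a value from this guide):
#   az monitor app-insights query \
#     --app <YOUR_APP_INSIGHTS_NAME> \
#     --resource-group <YOUR_DEMO_RESOURCE_GROUP> \
#     --analytics-query "$(failed_checkout_query)" \
#     --offset 1h
```

Pasting the printed query into the Logs blade works just as well if the CLI extension is not installed.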

Speaker notes

"Watch what's happening. The agent is querying App Insights, looking at traces, and forming hypotheses. It's not following a runbook — it's reasoning about what's different. And because it has access to the GitHub repo, it can trace the telemetry all the way back to the line of code."

The recommendation

The agent proposes a remediation — resetting DEMO_BROKEN_CHECKOUT to false
In Review mode, it shows Approve / Deny

Speaker notes

"The agent found the root cause and is proposing a specific fix. In Review mode, I approve or deny. In Autonomous mode, it would have already done this — checkout restored before anyone noticed."

Approve (or manual reset)

Approve the agent's proposal if it is actionable. Otherwise, reset manually:

gh workflow run "Demo: Reset Checkout" -f environment=<YOUR_GITHUB_ENVIRONMENT>

Part 4: Recovery

⏱️ ~2 minutes

Wait ~30 s after the env var flips
Switch to storefront — complete a checkout, show success
Switch to App Insights — 503s stop, 201s resume
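
Instead of waiting a fixed 30 s, you can poll until checkout actually succeeds. The helper below is a minimal sketch; `wait_until` is a hypothetical name, and `order.json` in the example stands for a valid order payload for your deployment, which this guide does not define. Note that the health endpoint is not a useful probe here, because it stays healthy while checkout is broken.

```shell
#!/usr/bin/env bash
# Sketch: retry a command until it succeeds or the attempts run out.
# wait_until is a hypothetical helper, not part of the demo repo.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Skip the pause after the final failed attempt.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Example: try a checkout every 5 s, up to 12 times. curl -f treats HTTP
# errors such as 503 as failures. order.json is an assumed payload file.
#   wait_until 12 5 curl -sf -o /dev/null -X POST \
#     -H "Content-Type: application/json" --data @order.json \
#     "https://<YOUR_APP_FQDN>/api/orders"
```

Polling stops at the first success, so at most one real order is created by the probe.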

Speaker notes

"We're back. Zero downtime for browsing, and checkout was restored in under two minutes. The agent captured the full investigation in its memory — next time it sees this pattern, it'll resolve it even faster."

Key talking points

Use these throughout the demo or in Q&A:

"How is this different from just an alert?"

An alert tells you something is wrong. The SRE Agent tells you what is wrong, why it happened, and how to fix it. It correlates across metrics, logs, traces, deployments, and source code, so an investigation that takes an on-call engineer 15–30 minutes happens in a few minutes.

"What if I don't trust it to act?"

Start in Review mode. The agent investigates autonomously but proposes actions for your approval. Once you see patterns you're always approving, switch those to Autonomous.

"Does it only work with Container Apps?"

No. It works with any Azure resource accessible via ARM, including App Service, AKS, VMs, Functions, databases, and networking, plus external tools via connectors and MCP.

"Does it remember past incidents?"

Yes. Every investigation gets captured in persistent memory. Institutional knowledge compounds instead of walking out the door.

"How do I get started?"

Go to sre.azure.com, sign in, and the wizard walks you through creating an agent in about 5 minutes. Connect your Azure resources and a code repo, and you're up and running.

Timing guide

Section	Duration	Notes
Set the scene	3 min	Show app, telemetry, agent
Break it	2 min	Trigger workflow, confirm 503
Agent investigates	5–10 min	Varies based on agent response time
Recovery	2 min	Show restoration
Q&A	5 min	Use the talking points above
Total	17–22 min	Trim Q&A to fit a 15–20 min slot; compress to 10 min by prompting the agent directly

Cleanup

After the demo, make sure checkout is restored:

gh workflow run "Demo: Reset Checkout" -f environment=<YOUR_GITHUB_ENVIRONMENT>
