Learn how Zscaler uses AI in production to get to root cause (RCA) in minutes across 150K alerts a month

Thursday, February 26, 2026 | 7:00 PM UTC (45 min)

How Zscaler Engineers Get to RCA in Minutes

Chris Umbel runs site reliability for Zscaler, a global zero-trust security cloud. His team handles 150,000 alerts and 120 incidents a month, often pulling 30+ engineers onto each bridge.

The problem wasn't dashboards or data. It was getting to root cause fast enough to matter.

If you're drowning in alerts and war rooms keep growing, this is for you. Chris walks through what worked, what didn't, and the specific changes that freed up real engineering capacity.

Featuring

Chris Umbel

Sr. Principal SRE at Zscaler

Josh Grose

Founding Team Member

What you'll learn

  • Faster incident resolution: How Zscaler reduced resolution time for mid-severity incidents from 60 minutes to 15 minutes using AI-driven root cause analysis.
  • Leaner war rooms: Strategies that cut the number of engineers pulled into each bridge by 30%+, freeing up real capacity for proactive work.
  • Trusting AI in production: How the team validated AI performance before relying on it for real incident triage and remediation.