How Zscaler Engineers Get to RCA in Minutes
Chris Umbel runs site reliability for Zscaler, a global zero-trust security cloud. His team handles 150,000 alerts and 120 incidents a month, often pulling 30+ engineers onto each bridge.
The problem wasn't dashboards or data. It was getting to root cause fast enough to matter.
If you're drowning in alerts and your war rooms keep growing, this is for you. Chris walks through what worked, what didn't, and the specific changes that freed up real engineering capacity.
Reduced resolution time from 60 minutes to 15 minutes using AI-driven root cause analysis
Reduced toil pulled into each bridge by 30%+, freeing up real capacity for proactive work
Build confidence in AI-driven analysis before relying on it for real incident triage and remediation