Making the Global Crypto Backbone More Resilient

About Coinbase
Coinbase is one of the world’s largest crypto platforms, powering trading, payments, and custody for customers across more than 100 countries. The company serves over 120 million users who trade more than 40,000 crypto assets. As of late 2025, Coinbase manages over $500 billion in assets on the platform and processes close to $300 billion in quarterly trading volume, supported by more than 4,700 employees globally.
Behind the scenes, this scale runs on a large microservices architecture on Kubernetes, backed by many specialized data stores and services. Reliability is business-critical, and incidents can have immediate financial and regulatory impact, so SREs and engineers need fast, trustworthy answers whenever something looks off in production.
On a crypto platform with billions in daily trading volume, even small pockets of downtime or increased latency can stall deposits and withdrawals, cause trades to miss price windows, and trigger spikes in support tickets. For Coinbase’s central engineering team, slow or noisy incident response means more revenue at risk in the moment and less time to invest in the shared platform that keeps dozens of product teams moving.
The challenge: growing complexity in high-stakes production systems
Before Resolve AI, on-call engineers had to manually stitch together context from deploy pipelines, Terraform events, Datadog, and custom dashboards to understand what changed and whether an alert was real.
This created friction across both incidents and day-to-day production work:
- During incidents, teams spent too much time answering basic questions like “what changed” and “is this tied to a deploy or infra change” before they could even start deeper debugging.
- When alerts were slow to triage, core flows like sign-in, funding, trading, and withdrawals could be degraded longer than necessary, which put trading volume, customer trust, and institutional SLAs at risk.
- For everyday work, tasks like checking SLO health, validating load tests, and ruling out false alarms meant jumping between tools or waiting on someone who knew a particular service well.
- As the environment grew more complex, no single engineer had a complete mental model of the system, and tribal knowledge became a bottleneck for both incident response and shipping new features with confidence.
Coinbase needed more than a generic chatbot: it needed an AI system that could understand its production environment and give engineers instant, accurate context.
The solution: AI for prod to resolve incidents and build with production context
Coinbase adopted Resolve AI as both an AI SRE on the incident bridge and a way for engineers to code with production context. During incidents, Resolve AI is Coinbase’s always-on AI SRE: it auto-triages alerts, runs investigations across infrastructure, deploy data, and telemetry, and answers “what changed” in real time. Outside of incidents, engineers use Resolve AI to bring live production signals into their day-to-day work so design, reviews, and code changes are grounded in how services actually behave in production, not just what static dashboards or docs suggest.
On the incident side:
- Resolve AI automatically joins incident channels, runs structured investigations across Datadog, and correlates Coinbase-specific custom events.
- It sees recent Terraform applies and code deployments through custom Datadog events, so it can quickly answer “what changed” without direct access to source code or runbooks.
- Resolve AI checks a dedicated load testing dashboard so it can flag when a spike is a scheduled load test instead of a real outage.
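The incident-side checks above can be pictured with a small sketch. It is illustrative only and is not Resolve AI's or Coinbase's implementation: the `ChangeEvent` and `LoadTestWindow` types, the event labels, the 30-minute lookback, and the `triage` function are all assumptions made for the example. It first rules out a scheduled load test, then looks for the most recent deploy or Terraform apply before the alert.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    kind: str      # hypothetical labels, e.g. "deploy" or "terraform_apply"
    service: str
    at: datetime


@dataclass(frozen=True)
class LoadTestWindow:
    service: str
    start: datetime
    end: datetime


def recent_changes(events, service, alert_at, lookback=timedelta(minutes=30)):
    """Changes to `service` within the lookback window before the alert, newest first."""
    hits = [
        e for e in events
        if e.service == service and alert_at - lookback <= e.at <= alert_at
    ]
    return sorted(hits, key=lambda e: e.at, reverse=True)


def is_scheduled_load_test(windows, service, alert_at):
    """True if the alert time falls inside a scheduled load test for the service."""
    return any(
        w.service == service and w.start <= alert_at <= w.end for w in windows
    )


def triage(events, windows, service, alert_at):
    """Return a one-line first answer for an alert (assumed policy, for illustration)."""
    if is_scheduled_load_test(windows, service, alert_at):
        return "likely load test"
    changes = recent_changes(events, service, alert_at)
    if changes:
        latest = changes[0]
        return f"possible cause: {latest.kind} at {latest.at.isoformat()}"
    return "no recent change found"
```

A real system would weigh far more signals than timestamps, but the ordering reflects the idea in the text: rule out a known load test first, then ask what changed.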
For everyday production work:
- Every day, engineers chat with Resolve AI to inquire about SLO health, generate daily “what breached” reports, and understand recent changes around a service.
- Engineers use Resolve AI directly in Slack as their interface to production, asking natural language questions about deploys, incidents, and reliability trends rather than hunting through dashboards.
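As a rough illustration of what a daily “what breached” report involves, the sketch below compares attained availability against a target for each service. It is not how Resolve AI computes anything: the `SloSample` type, the good/total request counts, and the report format are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloSample:
    service: str
    good: int      # requests that met the objective in the reporting window
    total: int     # all requests in the reporting window
    target: float  # assumed availability target, e.g. 0.999


def what_breached(samples):
    """Return (service, attained, target) for SLOs below target, worst shortfall first."""
    breached = []
    for s in samples:
        if s.total == 0:
            continue  # no traffic in the window, nothing to judge
        attained = s.good / s.total
        if attained < s.target:
            breached.append((s.service, attained, s.target))
    return sorted(breached, key=lambda row: row[1] - row[2])


def format_report(samples):
    """Render the breach list as plain text suitable for a chat message."""
    rows = what_breached(samples)
    if not rows:
        return "No SLO breaches."
    return "\n".join(
        f"{svc}: {att:.4%} attained vs {tgt:.4%} target" for svc, att, tgt in rows
    )
```

Sorting by shortfall puts the service furthest below its target at the top, which is usually the first thing an engineer asking about SLO health wants to see.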
All of this is currently done without exposing source code or proprietary runbooks, which keeps the security and governance model simple while still giving Resolve AI enough context to be useful. As code access comes online, Coinbase expects to expand what it automates with Resolve AI.
The impact: faster incident resolution and safer, more reliable releases
Resolve AI compresses Coinbase’s incident response and day-to-day production work. It cuts investigation time, delivers a median first response measured in seconds, and reaches a likely root cause in minutes for critical incidents. That shortens the window where customers might see failed orders, delayed transfers, or slow trading experiences, while giving engineers the production context they need to ship changes with confidence.
Coinbase’s benefits with Resolve AI:
- 72% reduction in investigation time: incidents that previously required long manual triage now move from alert to informed action in minutes, not half an hour or more.
- Under 10 minutes to likely root cause: faster answers on what changed reduce the duration of degraded sign-in, funding, trading, and withdrawal flows.
- On-demand production context for shipping: more than 100 engineers use Resolve AI in over 250 weekly sessions to check SLOs, recent Terraform applies and deploys, and whether a spike is a real issue or a load test before rolling out changes.
- Fewer noisy alerts and escalations: engineers spend less time chasing false alarms and more time improving core platform capabilities for product teams.
- Safer platform changes from central engineering: new shared services and platform updates ship with a clearer view of real customer impact in production, which reduces the chance of shipping surprises that could slow trading, funding, or withdrawals.
The result is an AI layer that supports everyday delivery and operational decisions at Coinbase, not only emergency response, so reliability and velocity move in the same direction instead of competing.
As Coinbase continues to scale its global crypto platform, Resolve AI is becoming a core part of the production backbone: it serves as Coinbase’s AI for prod and is woven into every on-call rotation and reliability decision. By pairing always-on incident investigations with the ability to design and ship changes using real production context, Coinbase is building a future where reliability and shipping velocity reinforce each other, so the core flows of global crypto trading, funding, and custody stay fast and available as the platform grows.
