Powering Uninterrupted Ads for DoorDash Advertisers

About DoorDash Ads
DoorDash (NASDAQ: DASH) is a technology company that connects consumers with their favorite local businesses in 30+ countries. As one of the world's leading local commerce platforms, DoorDash continuously pushes the limits to improve experiences for its customers, Dashers, and the global brands and local restaurants that rely on the platform.
The DoorDash Ads business reached a critical milestone in 2024: it surpassed a $1 billion annualized advertising revenue run rate, becoming the fastest-growing retail media network in history. Today, over 150,000 advertisers, from global brands to local restaurants, use the platform to reach high-intent customers and drive measurable growth. System reliability directly impacts millions of dollars in revenue each day.
The Challenge for the Ads Team
Many of the advertisers DoorDash Ads serves are small businesses that depend on the platform's reliability for their success. When the ad system is degraded, it impacts both DoorDash revenue and advertiser trust.
DoorDash Ads is a system with feedback loops. When an ad drives an order, performance data feeds back into delivery and optimization. A change in one part of the system can quickly affect many others.
When an alert fires, the challenge is not noticing the problem; it is getting to mitigation fast enough. Engineers begin by jumping between graphs, dashboards, logs, traces, and code. Then the incident often takes a familiar turn: after several hops, it becomes clear the issue may be outside Ads entirely, in an upstream or downstream dependency. That is where response time increases, coordination becomes more difficult, and customer impact accumulates.
Complex incidents can pull multiple engineers onto a single Zoom bridge call, with an incident commander coordinating parallel investigation threads. The problem was compounded by team structure: individual engineers had deep expertise in narrow parts of the Ads feedback loop, but no single engineer deeply understood the entire Ads flow, let alone services outside of Ads. When incidents crossed team boundaries, engineers were forced to investigate unfamiliar domains under time pressure. With continuous deployments and frequent incidents, the cognitive load of investigating, communicating, and coordinating remediation contributed to burnout over time.
Why Resolve AI
DoorDash's heavy investment in AI and ML across its platform has made it an early adopter of AI tools, including coding assistants, in its daily workflows. Now, DoorDash is partnering with Resolve AI to bring that same leverage to operating production systems and incident response.
DoorDash Ads evaluated building an internal incident response platform and determined that it was technically feasible but economically impractical. A production-ready solution would have required tens of dedicated engineers and continuous fine-tuning, capacity that could not be diverted from customer-impacting work. Because Ads is a core growth driver for DoorDash, the opportunity cost was too high.
Resolve AI delivered the required capability without diverting engineering's focus. It accelerates investigations by automatically correlating signals across logs, metrics, traces, deployments, and dependencies as soon as alerts fire. When alerts originate outside of Ads, it surfaces evidence quickly so incidents are routed to the correct team on the first attempt. Built for distributed system complexity, it provides visibility across thousands of microservices where no single engineer has a complete view.
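The case study does not describe how Resolve AI correlates signals internally. As a rough illustration of one ingredient only, time-window correlation between an alert and recent deployments, the Python sketch below keeps deployments that landed shortly before an alert and returns their owning teams, most recent first, as routing candidates. Everything here is hypothetical: the `Deployment` record, the `candidate_owners` function, the service and team names, and the 30-minute lookback are assumptions for illustration, not Resolve AI's implementation or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Deployment:
    """Hypothetical deployment record; field names are illustrative."""
    service: str
    owner_team: str
    deployed_at: datetime


def candidate_owners(alert_at: datetime,
                     deployments: list[Deployment],
                     lookback: timedelta = timedelta(minutes=30)) -> list[str]:
    """Return owning teams of deployments that landed in the `lookback`
    window before the alert, most recent first, without duplicates.

    Hypothetical sketch of time-window correlation; not Resolve AI's
    implementation.
    """
    # Keep only deployments that landed shortly before the alert fired.
    in_window = [d for d in deployments
                 if alert_at - lookback <= d.deployed_at <= alert_at]
    # Naive heuristic: treat the most recent change as the strongest suspect.
    in_window.sort(key=lambda d: d.deployed_at, reverse=True)
    teams: list[str] = []
    for d in in_window:
        if d.owner_team not in teams:
            teams.append(d.owner_team)
    return teams


if __name__ == "__main__":
    alert = datetime(2025, 7, 1, 12, 0)
    deployments = [
        Deployment("ads-serving", "ads", datetime(2025, 7, 1, 10, 0)),
        Deployment("config-service", "platform", datetime(2025, 7, 1, 11, 50)),
        Deployment("ads-bidding", "ads", datetime(2025, 7, 1, 11, 40)),
    ]
    print(candidate_owners(alert, deployments))  # ['platform', 'ads']
```

Recency alone is a weak signal; a production tool would likely weigh error rates, traces, and dependency relationships as well. The sketch only shows how a change shipped by another team can point an incident away from the team that was paged.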
After passing comprehensive security reviews, DoorDash Ads granted Resolve AI production access to the Ads platform codebase, infrastructure, and telemetry, with selective access to engineering systems across DoorDash. This level of trust allows Resolve AI to reason across the production environment. Over fifty engineers across the Ads organization now engage with Resolve AI during active incidents. Many also use it as a "pocket tool," an on-call assistant that can investigate issues across the stack without switching between multiple tools. Engineers ask Resolve AI to surface recent deployments, locate unfamiliar metrics, or answer questions about systems outside their expertise. Production knowledge that once lived in tribal memory is now accessible in seconds.
In an evaluation against an alternative solution, Resolve AI was 2x more accurate at identifying root causes.
Inside an Incident: Reducing the Hops from Alert to Mitigation
When an alert triggers due to latency or error rate degradation, Resolve AI immediately assembles initial context from relevant signals so the on-call engineer does not start from scratch. In many cases, the biggest time sink is determining whether the issue is actually in Ads or caused by a dependency outside the stack. Resolve AI helps shorten that path by surfacing breadcrumbs early, so engineers can skip unnecessary escalations and route directly to the most likely source team.
As Alex Danilychev, Engineering Manager on the Ads team, described it: "It's painful when you know there's a problem impacting your customers but you don't know where it's coming from. We want to eliminate that. The goal is not a 'hole-in-one' every time. It's about consistently getting engineers to 'par': landing close enough to the root cause that the remaining gap can be closed quickly, with less stress and fewer people pulled into the investigation."
Since deploying Resolve AI in mid-2025, DoorDash Ads has achieved measurable improvements across incident response:
- Up to 87% reduction in time to root cause: In one documented incident, Resolve AI identified the root cause within 15 minutes, but the on-call engineer pursued a different investigation path. About two hours after the alert, the engineer returned to Resolve AI's initial finding, which proved correct. For critical incidents that cause complete service outages, a 105-minute difference like this could represent up to $200,000 in preserved revenue.
- Fewer engineers pulled into incidents: Incidents that previously required four or more subject matter experts now need fewer escalations, because Resolve AI surfaces the responsible team or service upfront.
- Improved on-call experience: Engineers start incidents with structured findings and supporting evidence rather than from scratch, and can guide Resolve AI interactively when deeper investigation is needed.
- More capacity for feature development: Senior engineers spend more time building because they get immediate answers to routine questions about deployments, dependencies, and system behavior.
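The figures in the first bullet fit together if the on-call engineer confirmed the root cause about 120 minutes after the alert: 15 minutes plus the 105-minute difference. The source does not say how the $200,000 figure was derived. One calculation that reproduces it, offered here purely as an assumption, spreads the roughly $1 billion annualized run rate evenly over every minute of the year:

```latex
\begin{aligned}
\text{reduction} &= \frac{120 - 15}{120} = \frac{105}{120} = 0.875 \quad (\text{reported as } 87\%) \\[4pt]
\text{revenue per minute} &\approx \frac{\$1{,}000{,}000{,}000}{365 \times 24 \times 60} = \frac{\$10^{9}}{525{,}600} \approx \$1{,}903 \\[4pt]
\text{revenue at risk} &\approx 105 \times \frac{\$10^{9}}{525{,}600} \approx \$199{,}772 \approx \$200{,}000
\end{aligned}
```

Because this assumes a complete outage at the average revenue rate, it behaves as an upper bound, which matches the bullet's "up to" wording.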
Inside Four More Incidents
- Production Debugging: A runtime configuration change quietly disabled a critical data publishing flow. No alarms went off. The system looked fine while silently dropping events. Resolve AI caught the configuration issue in one alerts channel. Engineers working in a different channel reached the same conclusion hours later.
- Cross-Team Attribution: Circuit-breaker alerts triggered on a core service. The on-call team started investigating their own recent changes. Resolve AI traced the root cause to a configuration deployment by an entirely separate team and routed the incident correctly in minutes.
- Hidden Root Cause: An engineer was paged about downstream client failures and investigated what appeared to be the obvious source. The real problem was Redis connection failures masked by caching behavior. Resolve AI identified the Redis issue while engineers chased surface symptoms.
- 40 Minutes to 1 Minute: Engineers investigated an alert for 40 minutes before isolating it to a single misconfigured campaign. Resolve AI surfaced the same root cause in under a minute by correlating error patterns to the specific campaign ID.
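The last incident rests on a simple idea: when one campaign accounts for most of the errors, that campaign is a strong lead. The Python sketch below illustrates that kind of attribution by counting error events per campaign ID. The event shape (a dict with a `campaign_id` key), the `dominant_campaign` function, the example IDs, and the 50% threshold are hypothetical choices, not a description of how Resolve AI works.

```python
from collections import Counter
from typing import Optional


def dominant_campaign(error_events: list[dict], min_share: float = 0.5) -> Optional[str]:
    """Return the campaign ID behind at least `min_share` of the error
    events, or None if no single campaign stands out.

    Hypothetical sketch: the event shape and threshold are assumptions.
    """
    # Count errors per campaign, skipping events that carry no campaign ID.
    counts = Counter(e["campaign_id"] for e in error_events if "campaign_id" in e)
    total = sum(counts.values())
    if total == 0:
        return None
    # most_common(1) yields the single most frequent (campaign_id, count) pair.
    campaign_id, n = counts.most_common(1)[0]
    return campaign_id if n / total >= min_share else None


if __name__ == "__main__":
    errors = ([{"campaign_id": "c-42", "error": "invalid bid config"}] * 8
              + [{"campaign_id": "c-7", "error": "timeout"},
                 {"campaign_id": "c-9", "error": "timeout"}])
    print(dominant_campaign(errors))  # c-42
```

A fixed share threshold keeps the sketch simple. A real investigation would also compare against each campaign's normal traffic, since a large campaign can dominate error counts simply by serving more requests.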
These four examples reflect a broader pattern. The Ads team set out to cut MTTR in half while reducing the on-call burden. With a large portion of engineers using Resolve AI during incidents, the team is making compounding progress toward that goal. Resolve AI handles correlation across logs, metrics, traces, code, and infrastructure, eliminating the need to jump between tools that lack context. This frees engineers to focus on building rather than firefighting. DoorDash continues to expand how teams use Resolve AI to maintain system reliability while supporting the high shipping velocity that drives the billion-dollar ads business forward.