Launching Resolve AI Labs backed by new $40M Series A Extension

Incidents agent

Co-work with agents to resolve incidents.

Agent Teams investigate incidents in parallel. Engineers steer and remediate through Workbench.

01 Agent teams

Specialized agents for harder investigations.

A team of domain-specialized agents investigates in parallel and verifies findings against production evidence, the way a war room of senior engineers would.

Lead
Triager
Investigator
Verifier
Mitigator
+2 more
Root cause: schema drift on pgdb-orders-instance-3 — migrations 0335 and 0337 were silently skipped weeks ago.
07:30
Migration 0335 (event_outcome column) was merged 2026-04-24, but Drizzle's migrator skipped it due to timestamp ordering (PR #27504).
07:30
Confirmed via traces and logs showing column "event_outcome" does not exist and relation "order_doc_state" does not exist errors across 5 services.
07:31
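
The skip described in the findings above can be illustrated with a simplified model. This is a sketch under stated assumptions, not Drizzle's actual migrator code: it assumes a "high-water mark" rule that only applies migrations newer than the last applied one, and the timestamps and the `0334_placeholder` / `0336_placeholder` names are hypothetical. Only `0335_event_outcome_column` comes from the report.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Migration:
    tag: str
    created_ms: int  # generation timestamp; the values below are hypothetical


def pending_by_timestamp(journal: List[Migration], last_applied_ms: int) -> List[str]:
    """Assumed high-water-mark rule: apply only migrations newer than the last applied one."""
    return [m.tag for m in journal if m.created_ms > last_applied_ms]


def missing_by_name(journal: List[Migration], applied_tags: Set[str]) -> List[str]:
    """Drift check: every migration in the repo that the database has no record of."""
    return [m.tag for m in journal if m.tag not in applied_tags]


journal = [
    Migration("0334_placeholder", 1_000),           # hypothetical
    Migration("0335_event_outcome_column", 1_500),  # generated early, merged late
    Migration("0336_placeholder", 2_000),           # hypothetical, already applied
]
applied = {"0334_placeholder", "0336_placeholder"}
last_applied_ms = 2_000

skipped_view = pending_by_timestamp(journal, last_applied_ms)  # nothing looks pending
drift = missing_by_name(journal, applied)                      # 0335 is still missing
print(skipped_view, drift)
```

Under this model the timestamp rule reports nothing to apply, while a comparison by name still surfaces the missing migration. That gap is one plausible way a migration could be skipped silently.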
02 Workbench

Work alongside your agents in Workbench.

Interrogate every finding, piece of evidence, or theory by interacting directly with the report.

What happened
Root cause
Key evidence
Recommendations
Suggested fix

What happened

Alert fired at Wed May 6 1:19am for PostgreSQL High Rollback Rate on pgdb-orders-instance-3 (database: orders, cluster: orders-db-cluster). Rollback ratio was ~2.2% at alert time, with peaks up to 4.3%. Alert cleared at Wed May 6 1:29am (~10 minutes after firing).

Chronic schema drift on orders-db-cluster — missing event_outcome column and order_doc_state table. The order_doc_state escalation was the acute trigger that pushed rollback ratio above threshold.

Errors stopped at ~Wed May 6 7:33am, indicating remediation (migrations) was applied.

Root cause

Missing database migrations on orders-db-cluster

Drizzle migrations 0335_event_outcome_column.sql and 0337_quick_argent.sql were never applied to the orders-db-cluster orders database. Application code deployed 8–13 days earlier references schema objects that don't exist, causing every transaction touching them to roll back.

Causal chain:

Alex Park · 1:21 AM
What caused the rollback ratio to peak at 4.3%?
Resolve
The peak rollback ratio of 4.3% occurred at approximately 08:45–09:12Z, after the alert fired, due to the commit rate collapsing from 16.74 tx/s (peak at 05:00Z) to just 0.74 tx/s while the background rollback rate from schema errors remained constant at ~0.017 tx/s. This created a 22× amplification effect — the same absolute number of rollbacks divided by far fewer commits pushed the ratio well above the 2% threshold.
Alex Park · 1:22 AM
@Sam Patel Can you take a look at this?
Ask Resolve or @mention a teammate
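
The amplification in Resolve's answer above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the rollback ratio is defined as rollbacks / (commits + rollbacks); the report does not state the exact formula. It uses only the rounded rates quoted in the answer, so it shows the direction and rough size of the effect rather than reproducing the 4.3% peak.

```python
def rollback_ratio(commits_per_s: float, rollbacks_per_s: float) -> float:
    """Assumed definition: share of finished transactions that rolled back."""
    total = commits_per_s + rollbacks_per_s
    return rollbacks_per_s / total if total else 0.0


ROLLBACKS_PER_S = 0.017  # constant background rate from schema errors (from the report)

high_traffic = rollback_ratio(16.74, ROLLBACKS_PER_S)  # about 0.10%
low_traffic = rollback_ratio(0.74, ROLLBACKS_PER_S)    # about 2.25%, above the 2% threshold
amplification = low_traffic / high_traffic             # about 22x

print(f"{high_traffic:.2%} -> {low_traffic:.2%} ({amplification:.0f}x)")
```

The rollback count never changes in this sketch; only the denominator shrinks, which is why a ratio-based alert can fire when traffic drops.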
03 Threads

Investigate with your team, agents, or on your own.

Pursue parallel hypotheses, redirect agents, and add context as new evidence emerges.

PostgreSQL deadlocks on orders database: PostgreSQL has detected deadlo… · Fired: Thu October 23, 2025 4:55pm
AP
All Threads

Main Investigation

Follow the agent's reasoning and steer it directly

Search for 5xx HTTP errors during the deadlock window

AP

Verify PostgreSQL deadlock issue resolution status

SP

What triggered the scaling event on October 23?

MT
Ask · Steer
Ask anything to start a thread…
Private thread · Public
What happened · Impact · Root cause · Key evidence · Suggested fix

What happened

PostgreSQL deadlock alert fired on Thu October 23, 2025 4:55pm for the orders database on pgdb-orders-instance-2. The order-events-ingest service's OrderReconciler experienced 25 deadlocks over 14 minutes when attempting to batch update order events with resolution timestamps.

Impact

  • 100% error rate on updateOrderEvent API for 2 minutes
  • 456-second latency spike on getOrderEventCounts API
  • 68% error rate on spanRequest API
  • 12+ downstream services affected including investigation system and analytics APIs
  • Self-resolved when load normalized; no data loss

Root cause

Confirmed (HIGH confidence): Lack of deterministic lock ordering in batchUpdateOrderEvents() method. The SQL VALUES clause processes order IDs in arbitrary iteration order, allowing concurrent requests to acquire row locks in different sequences. A scaling event deployed ~27 new pods (vs baseline 1–2), dramatically increasing concurrency and triggering circular lock dependencies.
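
One common remedy for this class of deadlock is to make every caller take row locks in the same order. The sketch below is an illustration under assumptions, not the actual batchUpdateOrderEvents() code: the `order_events` table, the `id` and `resolved_at` columns, and the `plan_batch_update` helper are hypothetical. It sorts the batch by order ID, locks the rows with `SELECT ... ORDER BY id FOR UPDATE`, and then runs the update in the same transaction.

```python
from typing import Dict, List, Tuple

# Hypothetical table and column names; the report does not show the real schema.
# Run this first, in the same transaction, to take row locks in sorted ID order.
LOCK_SQL = "SELECT id FROM order_events WHERE id = ANY(%s) ORDER BY id FOR UPDATE"


def plan_batch_update(resolutions: Dict[str, str]) -> Tuple[List[str], str, List[str]]:
    """Return (ids_to_lock, update_sql, update_params) with rows in sorted ID order."""
    if not resolutions:
        raise ValueError("resolutions must not be empty")
    # Sorting removes any dependence on the caller's iteration order.
    ordered = sorted(resolutions.items())
    ids = [order_id for order_id, _ in ordered]
    values = ", ".join(["(%s, %s)"] * len(ordered))
    update_sql = (
        "UPDATE order_events AS e "
        "SET resolved_at = v.resolved_at::timestamptz "
        f"FROM (VALUES {values}) AS v(id, resolved_at) "
        "WHERE e.id = v.id"
    )
    params = [item for pair in ordered for item in pair]
    return ids, update_sql, params


# Two concurrent callers that build the same batch in different orders
# end up with identical lock and update plans.
a = plan_batch_update({"o-3": "2025-10-23T16:55:00Z", "o-1": "2025-10-23T16:56:00Z"})
b = plan_batch_update({"o-1": "2025-10-23T16:56:00Z", "o-3": "2025-10-23T16:55:00Z"})
print(a == b)
```

The explicit locking step is included because the order of a VALUES list alone may not determine the order in which the database locks rows; that depends on the query plan.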

04 Root Cause

Gets you to a verified root cause.

Every finding is backed by production evidence for you to verify or explore further.

Root cause

Missing schema migrations on orders-db causing transaction rollbacks

Two schema migrations — adding the event_outcome column and the order_doc_state table — were merged but never applied to orders-db. The migration runner is invoked manually rather than through the deploy pipeline, so the cluster was silently overlooked. When a morning deployment triggered a traffic surge, transactions hit the missing schema and the rollback ratio crossed the 2% alert threshold. Applying both migrations cleared the errors.

Migration introducing event_outcome column never applied to orders-db
Migration introducing order_doc_state table never applied to orders-db
Morning deployment triggered restart of all platform services
Rollback ratio crossed 2% alert threshold during transaction surge
Migrations applied — both error patterns stopped simultaneously
Manual migration runner not executed for orders-db after dependent code shipped
Both error patterns stopped within minutes of applying migrations
Migrations present in codebase but never applied to production
Alert threshold sensitive to commit volume — same error rate produced variable ratios
Batch job created 20× transaction surge against missing schema
5 services affected: order-fulfillment, checkout-router, inventory-sync, catalog-service, payments-api
Alert cleared ~10 minutes after firing — likely transient threshold crossing
05 Mitigation Actions

Remediate from the same surface.

Trigger commit reverts, GitHub Actions, and alert silencing without leaving the investigation.

Action by Resolve · 2 min ago

Revert: disable checkout-v2-routing in production

resolve-ai/revert-checkout-v2-routing-eepuh

Revert recent enablement of enableCheckoutV2Routing, identified as the trigger for elevated p95 latency on checkout-router. Restores the previous routing path.

This change:

  • Sets enableCheckoutV2Routing: false in helm/values/production/values.yaml
1 file changed, +1 −1
helm/values/production/values.yaml
@@ -86,7 +86,7 @@
   enableNewCartFlow: true
   enableShipFromStore: true
   enableDeferredAuthCapture: false
-  enableCheckoutV2Routing: true
+  enableCheckoutV2Routing: false
   enableMerchantPortalV3: false
   enableLoyaltyTierMigration: false
   enablePartialRefundsV2: true

Used and loved by engineers

Removing the toil of investigations, war rooms, and on-call.

We pull fewer engineers into war rooms, on-call is materially better, and that translates directly to advertiser trust and revenue protection.

Shahrooz Ansari

Sr. Director of Engineering, DoorDash

I don't need more numbers or more data. What I need is a root cause.

Chris Umbel

AIOps Lead & SRE, Zscaler

Resolve AI proved it could deliver real results in a constrained environment. It identified dependencies, surfaced accurate root causes 73% faster than our teams, all while integrating cleanly into our existing stack.

Angelo Marletta

Staff Software Engineer, Coinbase

Resolve AI makes our junior on-call engineers as effective as our seniors, flattening the experience curve. We've seen a 2x productivity lift while eliminating the runbook gap.

A.D.

Sr. Director of Engineering, Financial Services Company

Recent updates

Shipping every week.

See all updates
  • May 2026

    Agent Teams

    Specialized agents investigating in parallel with verified findings.

  • May 2026

    Workbench

    Shared workspace with real-time visibility and engineer steering.

  • May 2026

    Closed-loop actions

    Commit reverts, GitHub Actions, and alert silencing from investigation findings.

  • April 2026

    Adaptive knowledge

    Every investigation makes the platform smarter.

See incident investigation in your environment.