Announcing our $125M Series A at a $1B valuation

AI is changing software engineering faster than any previous technology shift.
Models and coding agents have made writing code effortless. But software engineering is much more than writing code. It’s also about operating production systems reliably and efficiently. In reality, operating and debugging production is where most engineering time actually goes.
AI is helping us write more code than ever. But that code still has to be deployed, monitored, and debugged in production. We're scaling code generation exponentially without scaling our ability to operate the code it produces.
That’s why we built Resolve AI: AI for prod that works across code, infrastructure, telemetry, and knowledge. Our goal is to help every engineer operate production fluently, without being bottlenecked by context, expertise, or tools.
Production is an amalgamation of hundreds of tools and systems that have accumulated over generations of technology. Each component is built in isolation, optimizing for its own domain without integrating with the rest. But production workflows require reasoning across many of them at once, and this burden has always fallen on humans.
As an example, when a service goes down, engineers need to understand which of the hundreds of microservices is failing, trace dependencies across unknown systems, correlate metrics from multiple monitoring tools, check if a recent code deployment changed behavior, determine if it's an infrastructure problem, and coordinate with teams across databases, networking, and security. All of this occurs while customers are impacted and the business is losing money.
The cost of such failures is high. And this complexity isn’t limited to incidents and on-call; it spans workflows like change deployments, day-to-day debugging, cost management, and security.
Few individuals, if any, can understand and reason about the whole system. For the longest time, manual, human-intensive operations have been the painful reality of production systems.
This no longer has to be the case. Managing production is finally automatable end-to-end with AI. Foundation models can now reason through complex problems and orchestrate long-horizon agents across external tool calls, giving us a baseline intelligence layer that improves every month.
Foundation models provide the substrate, but automating production workflows requires building substantial engineering layers on top that must:
A product that incorporates these requirements becomes a data flywheel, where each solved incident makes the system better at solving the next one. Such a system continually improves to the point where it can take fully autonomous actions, all the way to remediation.
We built AI for Prod as the manifestation of this opportunity. Our architecture has been derived from first principles to solve the production problem in its entirety.
Here’s how it works:
Understands production and operates all your tools
Resolve AI connects to your entire production stack: code, infrastructure, telemetry, knowledge sources, and a long tail of in-house and MCP tools. It operates each tool with expert-level proficiency, formulating precise queries that respect rate limits, interpreting results efficiently, and managing context without overwhelming the model.
Absorbs context and continually learns
Resolve AI gathers knowledge and context from both documented and informal sources: runbooks, documentation, past incidents, and Slack threads. It maps both the infrastructure dependencies and the teams that own them. At the same time, Resolve AI is a collaborative tool that a human can always use to get the right answer (and remediation) faster than doing it by hand. So, when an engineer corrects its reasoning (for example: "no, check the Redis cluster first, this pattern usually means cache invalidation"), that correction becomes part of the agent's knowledge. Over time, Resolve AI becomes the single, up-to-date source of truth for all production-related context.
Reasons with investigative expertise across boundaries
In every investigation, Resolve AI combines deep expertise in debugging and operating production systems with the specific knowledge of multiple teams. The Resolve AI planner creates a plan that spans multiple systems and teams. It spawns agents that pursue several hypotheses in parallel, the way your team would if they were all available and already had context. As evidence emerges, the planner continuously refines the plan: ruling out hypotheses that don't fit, doubling down on those that do. The investigation converges through iteration until it reaches the root cause, meeting the planner’s bar for causality, citation grounding, evidence, and more.
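To make the loop above concrete, here is a minimal, illustrative sketch in Python of a planner that probes several hypotheses in parallel, updates its confidence in each, drops the ones that don't fit, and stops when one clears an acceptance bar. It is not Resolve AI's implementation: the `Hypothesis` class, the `investigate` function, the probe callables, the confidence update rule, and the thresholds are all hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types: a probe checks one system for one hypothesis and
# returns (evidence_supports_hypothesis, citation_for_that_evidence).
Probe = Callable[[int], Tuple[bool, str]]


@dataclass
class Hypothesis:
    name: str
    confidence: float = 0.5  # assumed neutral prior
    evidence: List[str] = field(default_factory=list)


def update(confidence: float, supported: bool) -> float:
    """Assumed update rule: move halfway toward 1 or halfway toward 0."""
    return confidence + (1 - confidence) * 0.5 if supported else confidence * 0.5


def investigate(
    hypotheses: List[Hypothesis],
    probes: Dict[str, Probe],
    accept: float = 0.9,   # assumed bar for declaring a root cause
    reject: float = 0.1,   # assumed bar for ruling a hypothesis out
    max_rounds: int = 5,
) -> Optional[Hypothesis]:
    active = list(hypotheses)
    for round_idx in range(max_rounds):
        # Pursue every remaining hypothesis in parallel.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda h: probes[h.name](round_idx), active))
        # Fold the new evidence into each hypothesis, keeping its citation.
        for hyp, (supported, citation) in zip(active, results):
            hyp.confidence = update(hyp.confidence, supported)
            hyp.evidence.append(citation)
        # Stop once some hypothesis meets the acceptance bar.
        confirmed = [h for h in active if h.confidence >= accept]
        if confirmed:
            return max(confirmed, key=lambda h: h.confidence)
        # Rule out hypotheses the evidence does not fit.
        active = [h for h in active if h.confidence > reject]
        if not active:
            return None
    return None


# Usage with stand-in probes; real probes would query telemetry, deploy
# history, and so on.
probes = {
    "bad_deploy": lambda i: (True, f"deploy-log check {i}"),
    "redis_cache": lambda i: (False, f"cache-metrics check {i}"),
    "network": lambda i: (False, f"network-trace check {i}"),
}
root = investigate([Hypothesis(name) for name in probes], probes)
```

In this toy setup the planner returns the `bad_deploy` hypothesis along with the citations it collected. A real system would replace the fixed update rule with model-driven reasoning and would also generate new hypotheses as evidence arrives, which this sketch does not attempt.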
Resolve AI is already trusted in the most demanding production environments by world-class engineering organizations like Coinbase, Zscaler, and DoorDash.
The broader goal is shifting left: helping engineers debug, optimize, and secure production systems before issues ever reach customers.
We're building a world-class research organization: an applied AI lab focused on production systems. With this team, there are a few ideas central to our technical worldview and how Resolve AI will solve this problem completely:
AI code generation will produce software at unprecedented scale. This means the bottleneck shifts entirely. Success will be determined by whether you can operate what you build with the same speed, reliability, and security.
We're building the system that makes that possible.
If you want to work on the frontier of agentic AI and production systems, we'd love to talk.
