Introducing AI for prod

02/04/2026

AI is changing software engineering faster than any previous technology shift.

Models and coding agents have made writing code effortless. But software engineering is much more than writing code. It’s also about operating production systems reliably and efficiently. In reality, operating and debugging production is where most engineering time actually goes.

AI is helping us write more code than ever. But that code still has to be deployed, monitored, and debugged in production. We're scaling code generation exponentially, without scaling our ability to operate it.

That’s why we built Resolve AI: AI for prod that works across code, infrastructure, telemetry, and knowledge. Our goal is to help every engineer operate production fluently, without being bottlenecked by context, expertise, or tools.

Production work remains an unsolved problem

Production is an amalgamation of hundreds of tools and systems that have accumulated over generations of technology. Each component is built in isolation, optimizing for its own domain without integrating with the rest. But production workflows require reasoning across many of them at once, and this burden has always fallen on humans.

As an example, when a service goes down, engineers need to understand which of the hundreds of microservices is failing, trace dependencies across unknown systems, correlate metrics from multiple monitoring tools, check if a recent code deployment changed behavior, determine if it's an infrastructure problem, and coordinate with teams across databases, networking, and security. All of this occurs while customers are impacted and the business is losing money.

The cost of such failures is high. And this complexity isn’t limited to incidents and on-call; it spans workflows like change deployments, day-to-day debugging, cost management, and security.

Few individuals, if any, can understand and reason about the whole system. For the longest time, manual and human-intensive operations have been the painful reality of production systems.

Why AI for prod? Why now?

This no longer has to be the case. Managing production is finally automatable end-to-end with AI. Foundation models can now reason through complex problems and orchestrate long-horizon agents across external tool calls, giving us a baseline intelligence layer that improves every month.

Foundation models provide the substrate, but automating production workflows requires building substantial engineering layers on top that must:

Integrate with the dozens of tools that teams use to run production
Continually learn and maintain organizational context scattered across tools and people, including user-validated past agent trajectories
Store and maintain up-to-date team-wide knowledge that has never been unified before
Codify deep investigative expertise in production workflows, such as reasoning about time and causality, backing conclusions with evidence, and learning how investigations should progress
Maintain enterprise-grade security that meets the bar for production access
Deliver a unified UX integrated into the workflow
Create a continually increasing set of evals that expands with more usage
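
To make one of these requirements concrete, "reasoning about time and causality" can be pictured as a filter over recent changes: a change can only explain a symptom if it happened before the symptom began. The snippet below is a minimal sketch under that assumption, not Resolve AI's implementation. `ChangeEvent`, `candidate_causes`, the one-hour lookback, and the sample data are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical record of a production change (deploy, config push, resize)."""
    service: str
    description: str
    at: datetime


def candidate_causes(changes, symptom_start, lookback=timedelta(hours=1)):
    """Return changes that could have caused a symptom, most recent first.

    A change qualifies only if it happened before the symptom began
    (a cause cannot follow its effect) and inside the lookback window.
    """
    earliest = symptom_start - lookback
    eligible = [c for c in changes if earliest <= c.at < symptom_start]
    return sorted(eligible, key=lambda c: c.at, reverse=True)


# Illustrative data only: error rate starts climbing at 14:30.
symptom_start = datetime(2026, 1, 15, 14, 30)
changes = [
    ChangeEvent("checkout", "deploy", datetime(2026, 1, 15, 14, 22)),
    ChangeEvent("search", "config push", datetime(2026, 1, 15, 14, 41)),  # after the symptom
    ChangeEvent("payments", "deploy", datetime(2026, 1, 15, 9, 5)),       # outside the window
    ChangeEvent("cache", "resize", datetime(2026, 1, 15, 13, 50)),
]

for change in candidate_causes(changes, symptom_start):
    print(change.service, change.description, change.at.isoformat())
```

Ordering by recency is a heuristic, not proof: a candidate still needs supporting evidence before it counts as a cause.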

A product that incorporates these requirements becomes a data flywheel, where each solved incident makes the system better at solving the next one. Such a system continually improves to the point where it can take fully autonomous actions, all the way to remediation.

Resolve AI: AI for prod

We built AI for prod as the manifestation of this opportunity. Our architecture has been derived from first principles to solve the production problem in its entirety.

Here’s how it works:

Understands production and operates all your tools

Resolve AI connects to your entire production stack: code, infrastructure, telemetry, knowledge sources, and a long tail of in-house and MCP tools. It operates each tool with expert-level proficiency, formulating precise queries that respect rate limits, interpreting results efficiently, and managing context without overwhelming the model.
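
As a rough illustration of two ideas in this paragraph, respecting rate limits and keeping tool output within the model's context, here is a minimal sketch. It is an assumption about how such a layer could look, not a description of Resolve AI's internals. `Throttle`, `fit_to_budget`, and `fake_metrics_query` are hypothetical names, and the query function stands in for a real telemetry client.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls to a rate-limited tool."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self._next_allowed = clock()  # the first call may run immediately

    def call(self, fn, *args, **kwargs):
        now = self.clock()
        wait = self._next_allowed - now
        if wait > 0:
            self.sleep(wait)
        # Reserve the next slot relative to when this call is allowed to start.
        start = max(now, self._next_allowed)
        self._next_allowed = start + self.min_interval
        return fn(*args, **kwargs)


def fit_to_budget(rows, limit):
    """Keep the first `limit` rows and note how many were left out."""
    if len(rows) <= limit:
        return list(rows)
    return list(rows[:limit]) + [f"... {len(rows) - limit} more rows omitted"]


def fake_metrics_query(expr):
    """Stand-in for a real telemetry query; returns 500 synthetic rows."""
    return [f"{expr} sample {i}" for i in range(500)]


throttle = Throttle(max_per_second=5)
rows = throttle.call(fake_metrics_query, "error_rate")
print(fit_to_budget(rows, limit=3))
```

Injecting `clock` and `sleep` keeps the throttle testable without real waiting. A production version would more likely summarize or aggregate results than simply truncate them.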

Absorbs context and continually learns

Resolve AI gathers knowledge and context from documented and undocumented sources alike: runbooks, documentation, past incidents, and Slack threads. It maps both the infrastructure dependencies and the teams that own them. Resolve AI is also a collaborative tool that an engineer can always use to get the right answer (and remediation) faster than doing it by hand. When an engineer corrects its reasoning (for example: "no, check the Redis cluster first, this pattern usually means cache invalidation"), that correction becomes part of the agent's knowledge. Over time, Resolve AI becomes the single, up-to-date source of truth for all production-related context.
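
One simple way to picture a correction becoming part of the agent's knowledge is a store that records guidance against the symptoms it applied to and recalls it when similar symptoms appear. The sketch below assumes tag-overlap matching purely for illustration; `Correction`, `CorrectionMemory`, the tags, and the author name are hypothetical and not taken from the product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Correction:
    """Hypothetical record of guidance an engineer gave during an investigation."""
    symptom_tags: frozenset
    guidance: str
    author: str


@dataclass
class CorrectionMemory:
    corrections: list = field(default_factory=list)

    def record(self, symptom_tags, guidance, author):
        self.corrections.append(Correction(frozenset(symptom_tags), guidance, author))

    def recall(self, observed_tags):
        """Return guidance whose tags overlap the observed symptoms, best match first."""
        observed = set(observed_tags)
        scored = [(len(c.symptom_tags & observed), c) for c in self.corrections]
        matches = [(score, c) for score, c in scored if score > 0]
        matches.sort(key=lambda pair: pair[0], reverse=True)
        return [c.guidance for _, c in matches]


memory = CorrectionMemory()
memory.record(
    {"stale-reads", "latency-spike"},
    "Check the Redis cluster first; this pattern usually means cache invalidation.",
    author="on-call engineer",
)
memory.record({"disk-full"}, "Look at log rotation on the ingest hosts.", author="on-call engineer")

# A later investigation with overlapping symptoms surfaces the earlier correction.
print(memory.recall({"latency-spike", "5xx-errors"}))
```

A real system would need richer matching than tag overlap, plus a way to retire guidance that has gone stale.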

Reasons with investigative expertise across boundaries

Resolve AI brings deep investigative expertise about how to debug and operate production systems together with the specific expertise of multiple teams in every execution. During investigations, the Resolve AI planner creates a plan that spans multiple systems and teams. It spawns agents that pursue several hypotheses in parallel, the way your team would if they were all available and already had context. As evidence emerges, the planner continuously refines the plan: ruling out hypotheses that don't fit, doubling down on those that do. The investigation converges through iteration until it reaches the root cause, meeting the planner’s bar for causality, citation grounding, evidence, and more.
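
The plan, test in parallel, refine loop described above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the Resolve AI planner: `Hypothesis`, `Finding`, and `investigate` are hypothetical names, each check is a plain function standing in for an agent, and a hypothesis counts as the root cause when it is supported and has nothing narrower left to test.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Finding:
    supported: bool
    evidence: str
    follow_ups: List["Hypothesis"] = field(default_factory=list)  # narrower claims to test next


@dataclass
class Hypothesis:
    claim: str
    check: Callable[[], Finding]  # stands in for an agent gathering evidence


def investigate(initial, max_rounds=3):
    """Test hypotheses in parallel each round, drop the unsupported ones,
    and narrow the supported ones until one has no follow-ups left."""
    frontier = list(initial)
    trail = []  # (claim, supported, evidence) for every hypothesis tested
    for _ in range(max_rounds):
        if not frontier:
            break
        with ThreadPoolExecutor(max_workers=len(frontier)) as pool:
            findings = list(pool.map(lambda h: h.check(), frontier))
        next_frontier = []
        for hypothesis, finding in zip(frontier, findings):
            trail.append((hypothesis.claim, finding.supported, finding.evidence))
            if not finding.supported:
                continue  # ruled out by the evidence
            if not finding.follow_ups:
                return hypothesis.claim, trail
            next_frontier.extend(finding.follow_ups)
        frontier = next_frontier
    return None, trail


# Illustrative checks with canned results.
narrow = Hypothesis(
    "cache invalidation storm after the deploy",
    lambda: Finding(True, "eviction rate rose right after the deploy"),
)
hypotheses = [
    Hypothesis("database is saturated", lambda: Finding(False, "CPU and connections are flat")),
    Hypothesis("cache layer is degraded", lambda: Finding(True, "hit rate dropped", [narrow])),
]

cause, trail = investigate(hypotheses)
print(cause)
for claim, supported, evidence in trail:
    print("supported" if supported else "ruled out", "|", claim, "|", evidence)
```

The `trail` keeps the evidence behind every decision, including the hypotheses that were ruled out, so the conclusion can be reviewed rather than taken on faith.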

What comes next?

Resolve AI is already trusted in the most demanding production environments by world-class engineering organizations like Coinbase, Zscaler, and DoorDash.

The broader goal is shifting left: helping engineers debug, optimize, and secure production systems before issues ever reach customers.

We're building a world-class research organization: an applied AI lab focused on production systems. A few ideas are central to our technical worldview and to how Resolve AI will solve this problem completely:

We're going to invest heavily in building the best agents and models for this domain. The models will continue to get better, and this will accelerate everything we're building.
We're pursuing closed-loop automation that will further accelerate the flywheel. We're focusing on production operations end-to-end: root cause analysis, remediation, prevention, and containment.
We’re going to productize workflows to help make production operations proactive.
We're advancing Resolve AI as a world-class applied AI lab for prod.

AI code generation will produce software at unprecedented scale. This means the bottleneck shifts entirely. Success will be determined by whether you can operate what you build with the same speed, reliability, and security.

We're building the system that makes that possible.

If you want to work on the frontier of agentic AI and production systems, we'd love to talk.

[Join the team →]

Mayank Agarwal

Founder and CTO
