Engineering today carries a strange contradiction: You can spin up entire services in hours using AI, yet understanding what went wrong with those services still demands painstaking work across fragmented tools. Take the following example:
| Coding | Production |
| --- | --- |
| To write a service: You open an AI-native development environment and ask AI to "Create a payment service that handles retries and timeouts." AI generates the implementation with error handling, using the context of your codebase. | When the same service is experiencing high latency: You start with a hypothesis → check Datadog for metrics → switch to Loki for logs → cross-reference deployment history → correlate timestamps → ... and so on. |
The problem isn't AI’s capability; it's how we architect AI systems. Most engineering teams still use AI to execute the same workflows faster rather than reimagining how the work gets done.
At Resolve AI, we've been building multi-agent systems that let engineers work on production systems. We’ve been advocating that engineering should be AI-native (where engineers primarily interface with AI agents to work on production systems), while most industry conversations have centered on AI-generated code from copilots and coding assistants.
We recently presented our approach to Stanford's graduate AI program, diving deep into AI agents and the architectural patterns that enable AI-native engineering workflows.
AI-Native Engineering is where engineers primarily interface with AI to orchestrate their work, whether writing code or working on production systems. AI-native is a significant departure from just “using AI,” where engineers still interface with their systems and tools directly and use AI only to speed up individual steps of the process. Here is an example workflow that shows the distinction.
AI-Assisted: You use AI tools to work faster on individual tasks. The workflow remains human-centric: Engineer → Systems and tools → Correlation → Action. Engineers still interface with each tool themselves, using AI only to perform individual tasks faster.
AI-Native: AI becomes your primary interface for production work. The workflow becomes AI-led: Engineer → Natural Language Request → AI System → Response / Action. Engineers set goals and let AI agents handle the operational work
Take incident response as an example. In AI-assisted workflows, you're still generating hypotheses, deciding which evidence matters, and manually correlating signals across tools. AI helps with data retrieval and analysis, but you're doing the heavy lifting in investigation.
AI-native incident response operates differently: AI agents automatically triage investigation priorities, generate competing hypotheses in parallel, and iteratively refine theories based on cross-system evidence. Instead of asking "Can you analyze these logs?" you say "Resolve this checkout failure" and agents coordinate the entire investigation.
This isn't just faster. It changes what problems deserve engineering attention. When AI agents handle log analysis, metric correlation, and deployment timeline reconstruction, engineers focus on architectural decisions and system design rather than tactical investigation.
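To make the interface shift concrete, here is a minimal Python sketch of the two workflows. Every name in it (the tool clients, the `llm.*` calls, `InvestigationAgent`) is an illustrative assumption, not Resolve AI's actual API.

```python
# Hypothetical sketch of the interface shift; names are illustrative only.

from dataclasses import dataclass, field


# --- AI-assisted: the engineer drives each step and does the correlation ---
def ai_assisted_investigation(metrics, logs, deploys, llm) -> str:
    latency = metrics.query("p99 latency, checkout service, last 1h")
    errors = logs.search("checkout ERROR", window="1h")
    recent = deploys.list("checkout", window="24h")
    # The LLM summarizes each signal, but the engineer chooses the queries,
    # decides what matters, and stitches the story together.
    return llm.summarize([latency, errors, recent])


# --- AI-native: the engineer states a goal; an agent loop owns the workflow ---
@dataclass
class InvestigationAgent:
    tools: dict          # name -> callable tool client
    llm: object
    findings: list = field(default_factory=list)

    def run(self, goal: str) -> str:
        plan = self.llm.plan(goal, available_tools=list(self.tools))
        for step in plan:                          # the agent picks tools and order
            raw = self.tools[step.tool](step.query)
            self.findings.append({"source": step.tool,
                                  "summary": self.llm.interpret(raw)})
        # The agent correlates evidence and proposes a root cause and next action.
        return self.llm.conclude(goal, self.findings)


# Engineer-facing call in the AI-native case:
# agent.run("Resolve the checkout latency spike that started at 14:05 UTC")
```

The point of the sketch is where the loop lives: in the AI-assisted version the engineer is the loop; in the AI-native version the engineer supplies a goal and reviews the conclusion.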
The shift requires persistent AI agents, not just AI tools. While LLMs can accelerate individual tasks, only stateful agents can maintain investigation context, coordinate across multiple tools, and execute complex multi-step workflows autonomously.
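As a rough sketch of what “stateful” means in practice, the snippet below persists investigation context so a later agent turn (or a human) can resume with full history instead of a chat transcript. The `InvestigationState` structure and its fields are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical sketch: investigation state that survives across tool calls
# and agent turns, rather than living only inside one chat session.

import json
from dataclasses import dataclass, field, asdict


@dataclass
class InvestigationState:
    incident_id: str
    hypotheses: list = field(default_factory=list)   # {"claim", "status", "evidence"}
    evidence: list = field(default_factory=list)      # raw findings by source
    steps_taken: list = field(default_factory=list)   # audit trail of tool calls

    def record(self, step: str, finding: dict) -> None:
        self.steps_taken.append(step)
        self.evidence.append(finding)

    def save(self, path: str) -> None:
        # Persisting state lets a new agent process (or an engineer) resume the
        # investigation with full context instead of starting from scratch.
        with open(path, "w") as f:
            json.dump(asdict(self), f, indent=2)

    @classmethod
    def load(cls, path: str) -> "InvestigationState":
        with open(path) as f:
            return cls(**json.load(f))
```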
Modern production systems exhibit what academics call "irreducible interdependence": understanding them requires specialized knowledge across domains that cannot be unified into a single coherent model. This is the insight most builders miss: No single AI tool can maintain expert-level knowledge across all these domains while coordinating an investigation.
For example: When API latency spikes 10x during a critical incident, the investigation requires simultaneous specialized analysis: correlating traces across 50+ microservices, analyzing slow database queries and connection pool exhaustion, checking recent deployments and infrastructure changes, scanning auth logs for security anomalies, evaluating auto-scaling decisions against current load patterns, and analyzing support tickets for customer impact with SLA context. Each activity requires domain-specific expertise and contextual data that no single system could effectively maintain.
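One way to picture that fan-out is the sketch below, which dispatches domain specialists concurrently so each works only within its own slice of context. The `Specialist` class and its `analyze` interface are assumptions for illustration, not a description of any specific system.

```python
# Hypothetical sketch: fan an incident out to domain specialists in parallel.

from concurrent.futures import ThreadPoolExecutor


class Specialist:
    """One domain expert, e.g. traces, databases, deployments, security."""

    def __init__(self, domain: str, analyze_fn):
        self.domain = domain
        self._analyze = analyze_fn   # callable that inspects that domain's data

    def analyze(self, incident: dict) -> dict:
        return {"domain": self.domain, "finding": self._analyze(incident)}


def parallel_investigation(incident: dict, specialists: list) -> list:
    # Each specialist works only within its domain; none needs a global model
    # of the whole system, which keeps per-agent context manageable.
    with ThreadPoolExecutor(max_workers=len(specialists)) as pool:
        futures = [pool.submit(s.analyze, incident) for s in specialists]
        return [f.result() for f in futures]
```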
As system complexity increases, individual AI tools face exponential growth in context requirements. Multi-agent systems scale past this ceiling by combining coordination with per-domain specialization. The matrix below gives engineering leaders an overview of the technical limitations at each level; find the row that matches your current state:
| Approach | What it is | Where the approach breaks | Cause for limitation |
| --- | --- | --- | --- |
| LLM | Use LLMs for individual tasks like explanations, analysis, and documentation | Engineers still do the majority of the operational work | Single-pass generation with no feedback loops or real-world integration |
| LLM + Tools | AI can fetch data from monitoring systems on command | Cognitive load of correlation remains on humans | Limited context windows, no persistent state management across tool interactions |
| Single Agent | AI follows investigation workflows independently | Sequential investigation; gets stuck on wrong hypotheses | Cannot manage diverse reasoning strategies or parallel investigation paths |
| Multi-Agent | Specialized AI agents coordinate parallel investigations | Requires investment in coordination protocols | Distributed intelligence needs formal communication schemas and conflict resolution |
The progression reveals a fundamental architectural truth: each level hits a different scalability ceiling. LLMs lack persistent state. Tool-augmented LLMs can't maintain investigation context across multiple chats. Single agents become decision bottlenecks as system complexity grows. Only multi-agent systems break through the sequential-reasoning constraint that limits every previous approach: they test hypotheses in parallel, while a single agent must investigate one path at a time, which is fundamentally unsuited to the time pressure of a production incident.
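To illustrate what “formal communication schemas and conflict resolution” can look like, here is a hedged sketch in which specialists report hypotheses in a shared schema and a coordinator ranks them and surfaces disagreements. The schema fields and the confidence threshold are illustrative assumptions, not Resolve AI's implementation.

```python
# Hypothetical sketch of a coordination layer: specialists report hypotheses in
# a shared schema; the coordinator ranks them and flags conflicts explicitly.

from dataclasses import dataclass


@dataclass
class HypothesisReport:
    """Shared schema every specialist agent must emit."""
    agent: str            # e.g. "db-specialist"
    root_cause: str       # e.g. "connection pool exhaustion on orders-db"
    confidence: float     # 0.0 - 1.0, calibrated by the specialist
    evidence: list        # pointers to supporting signals


def resolve_conflicts(reports: list) -> dict:
    ranked = sorted(reports, key=lambda r: r.confidence, reverse=True)
    top = ranked[0]
    # Competing high-confidence hypotheses are surfaced, not silently dropped,
    # so a tie-breaking investigation or a human can decide.
    conflicts = [r for r in ranked[1:]
                 if r.confidence >= top.confidence - 0.1
                 and r.root_cause != top.root_cause]
    return {
        "leading_hypothesis": top,
        "needs_arbitration": bool(conflicts),
        "competing": conflicts,
    }
```

The design choice worth noting is that disagreement between specialists is escalated rather than averaged away, which is what keeps parallel investigations from collapsing into a single premature narrative.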
Building production-ready multi-agent systems requires a rare combination of deep domain expertise and AI engineering prowess. Most attempts fail because teams have expertise in one area but not both. Here’s what that dual expertise looks like in practice:
At Resolve AI, our team includes engineers with over two decades of experience running production systems, founders who co-created OpenTelemetry, and researchers with deep AI expertise who are the minds behind Google DeepResearch and Gemini Agents. This combination lets us build systems that don't just understand that "payment failures are bad" but know to check connection pool metrics and correlate them with upstream service degradation, all while managing the agent orchestration that prevents circular investigations and keeps narrative threads coherent across parallel execution paths.
Resolve AI is your always-on AI SRE that helps you resolve incidents and run production. With Resolve AI, customers like DataStax, Tubi, and Rappi have increased engineering velocity and system reliability by putting machines on-call for humans and letting engineers just code. Learn more about AI-native engineering workflows at resolve.ai.
Spiros Xanthos
Founder and CEO
Spiros is the Founder and CEO of Resolve AI. He loves learning from customers and building. He helped create OpenTelemetry and started Log Insight (acquired by VMware) and Omnition (acquired by Splunk); most recently, he was SVP and GM of the Observability business at Splunk.
Gabor Angeli
Research Engineer
Gabor Angeli brings extensive AI expertise, most recently at Google DeepMind and Square. His work on products like Gemini and Square Assistant touches millions of users daily. He joined Resolve AI to build Agentic AI systems that help engineers understand and navigate production systems.
Bharat Khandelwal
Research Engineer @ Resolve AI
Bharat is a Research Engineer at Resolve AI, where he builds agentic systems that enable large language models to debug and operate production software infrastructure. Prior to Resolve, he led machine learning initiatives at WorldQuant, designing transformer-based architectures for macroeconomic forecasting and incorporating LLM-driven sentiment signals from unstructured data. He has also worked at Moveworks on enterprise NLP systems and at Tower Research Capital, where he developed low-latency ML strategies for high-frequency trading. Bharat holds an M.S. in Computer Science from Stanford University, with a specialization in Artificial Intelligence, and a B.Tech. (Honors) in Computer Science from IIT Bombay.