
In our previous post, we shared our early impressions of Claude Opus 4.6’s strengths in agent coordination, thoroughness, and long-context attention. In this post, we focus on Sonnet 4.6, the effort parameter, and what adaptive thinking means in practice for production agents.
Both Opus 4.6 and the newly released Sonnet 4.6 support adaptive thinking and the effort parameter. We benchmark these models against a curated set of real production incidents (from subtle misconfigurations to cascading failures), scoring on root-cause accuracy and investigation completeness. Sonnet 4.6 at medium effort with adaptive thinking came surprisingly close to Opus 4.6 on our hardest investigations, at a fraction of the cost. Adaptive thinking eliminated the need to manually calibrate reasoning depth, and the effort parameter gave us a reliable lever for the quality-latency tradeoff.
Previous Claude models (Sonnet 4.5, Opus 4.5) offered binary extended thinking: off, or on with a fixed token budget. The 4.6 generation introduces adaptive thinking: the model decides for itself when and how deeply to reason.
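The difference shows up directly in request configuration. Here is a minimal sketch of the two shapes: the fixed-budget fragment follows the documented extended-thinking parameter in the Anthropic Messages API, while the `adaptive` value is our shorthand for the 4.6 behavior, not a confirmed field name.

```python
# Fixed-budget extended thinking (4.5-era): reasoning depth is chosen up front,
# the same budget for routine and hard steps alike.
fixed_thinking = {"type": "enabled", "budget_tokens": 10_000}

# Adaptive thinking (4.6-era): the model decides when and how deeply to reason.
# "adaptive" here is an assumed field value used to illustrate the contrast.
adaptive_thinking = {"type": "adaptive"}
```

The key structural change: there is no budget to tune in the adaptive case, which is exactly what removes the manual calibration step.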
This matters because production incidents are unpredictable. A cascading failure across three services might look routine at first, with an obvious spike in error rates. But under specific conditions, it could be a novel incident that requires deep investigation. Fixed thinking budgets can't handle this well. You either over-allocate reasoning on routine steps and burn latency, or under-allocate on the hard parts and miss the root cause. Adaptive thinking handles the transition naturally.
In practice, the model thinks less early in an investigation, while it is gathering evidence: pulling logs and querying metrics. It dispatches tool calls and moves through evidence collection without over-deliberating.
As the investigation deepens and the agent starts correlating evidence, the behavior shifts. When it needs to cross-reference timestamps across multiple signals, evaluate whether evidence supports or contradicts a hypothesis, or reason through causal chains between services, the model thinks significantly more. It self-reflects on evidence relevance and pays close attention to temporal ordering.
Adaptive thinking naturally allocates deeper reasoning to the hard parts (correlation vs causation, determining next investigation steps) and stays light on the routine parts.
Always set a high max output token limit, at least 16k. Thinking and output tokens share the same budget, so with lower limits the model hits the ceiling mid-reasoning and cuts off abruptly, with no graceful degradation. We default to 32k max_tokens and tune this down only for simpler subagent tasks.
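A minimal sketch of how to guard that budget rule when building request parameters. The helper name and constants are our own; the `thinking` and `effort` fields follow the shapes discussed in this post rather than verified API signatures.

```python
DEFAULT_MAX_TOKENS = 32_000  # our default for top-level investigation agents
MIN_MAX_TOKENS = 16_000      # below this, thinking can hit the ceiling mid-reasoning


def build_request_params(model: str, effort: str = "medium",
                         max_tokens: int = DEFAULT_MAX_TOKENS) -> dict:
    """Build request params with a safe shared thinking/output budget."""
    if max_tokens < MIN_MAX_TOKENS:
        # Fail loudly at config time instead of truncating reasoning at runtime.
        raise ValueError(
            f"max_tokens={max_tokens} risks cutting off mid-reasoning; "
            f"use at least {MIN_MAX_TOKENS}"
        )
    return {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "adaptive"},  # assumed field value, see above
        "effort": effort,
    }


params = build_request_params("claude-sonnet-4-6")
```

Raising at construction time keeps the failure mode visible; a silently truncated chain of thought is much harder to debug than a config error.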
Set effort explicitly. The effort parameter controls how much the model explores before committing. Sonnet 4.6 defaults to high effort. If you're migrating from Sonnet 4.5 and not setting effort, you'll see higher latency and may notice the model overthinking. Start with effort set to medium and adjust from there.
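One way to make that explicit is a per-role effort map, so no agent silently inherits the high default. The roles and tiers below are illustrative, not prescribed values.

```python
# Sketch: effort as the quality-latency lever, set explicitly per agent role.
EFFORT_BY_ROLE = {
    "root_cause_analysis": "high",    # hardest correlation and causal work
    "investigation_step": "medium",   # the recommended starting point
    "log_fetch_subagent": "low",      # routine evidence collection
}


def effort_for(role: str) -> str:
    # Unknown roles fall back to medium, the starting point suggested above,
    # rather than the model's high default.
    return EFFORT_BY_ROLE.get(role, "medium")
```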
Write precise tool descriptions. The 4.6 models select tools based on what they say they do, not just surrounding context. We found that precision in tool names and parameter descriptions directly impacts tool selection accuracy.
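As an illustration, compare a vague definition with one the model can actually select on. The tool names, fields, and services here are hypothetical, written in the Anthropic tool-use JSON shape (`name`, `description`, `input_schema`).

```python
# Too vague: the model has no basis for choosing this over any other tool.
vague_tool = {
    "name": "get_data",
    "description": "Gets data.",
    "input_schema": {
        "type": "object",
        "properties": {"q": {"type": "string"}},
    },
}

# Precise: says what it returns, when to use it, and when not to.
precise_tool = {
    "name": "query_error_rate_metrics",
    "description": (
        "Query per-service HTTP error-rate time series from the metrics store. "
        "Returns the 5xx rate in requests/sec over the given window. "
        "Use for correlating error spikes with deploys; not for fetching raw logs."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "service": {
                "type": "string",
                "description": "Service name, e.g. 'checkout'",
            },
            "window_minutes": {
                "type": "integer",
                "description": "Lookback window in minutes",
            },
        },
        "required": ["service", "window_minutes"],
    },
}
```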
The model is more proactive, so tune your prompts accordingly. Instructions like "be thorough" or "think carefully", which were common workarounds for Sonnet 4.5, amplify the model's already-proactive behavior on 4.6 and can cause overthinking loops. The effort parameter is a better lever for controlling depth.
With Sonnet 4.6 we observed ~10% improvement over Opus 4.5 with thinking disabled, and ~20% with high thinking on our investigation eval suite. Against Sonnet 4.5, the jump is even larger.
The tradeoff is latency: the extra reasoning that drives these gains costs wall-clock time, which is why effort and max_tokens are worth tuning per task rather than set once globally.
Every new model generation shifts what's possible for AI agents in production. Capabilities like adaptive thinking don't just improve results, they open up new architectural patterns we hadn't considered before. Evaluating these frontier models is a continuous effort and part of how we build. We're actively researching how these advances reshape agent design.
This work sits at the intersection of frontier AI research and real-world systems engineering. If these are the problems you want to work on, we're hiring. And if you're building agents for production or thinking about how frontier models fit into your engineering workflows, we'd love to talk.

