
Coding agents have made it faster to write and ship software. Most of our conversations in engineering have been about that half of the lifecycle. The other half, keeping what you shipped running, has not improved at the same rate. In many cases, the acceleration on the build side has increased the operational load on the teams responsible for keeping systems healthy.
Every engineering team carries a backlog of production work that never appears on a sprint board. Watching a deployment roll out, writing the on-call handoff, re-investigating the p99 drift that returned overnight, running the capacity check that nobody officially owns. This work is continuous, and it has no ticket, no trigger, and no explicit owner. It accumulates on the people who care enough to do it, and it competes with the work they were actually hired to do.
Background agents, introduced earlier this year, are our answer to that backlog. These are always-on agents that run on a schedule or a trigger, carry live context about your production environment, and handle operational work continuously without requiring a human to initiate each task. This post follows up on our "Behind the Build: Always-on background agents" AMA to explain what they are, how they work, and why the architecture matters specifically for production systems.
A useful way to think about any piece of production work is:
task = execution × production context
Execution is the actual analysis or change. It is usually small, deterministic, and well understood. Production context is everything required to execute correctly inside your environment: which services are involved, what changed recently, what normal behavior looks like, where the runbooks live, which alerts are signal versus noise. This part grows with every PR, even when the underlying task does not change. A deploy health check takes ten minutes to run. Knowing which services to check, which baselines are meaningful, and which thresholds to trust in your specific environment takes years to accumulate.
Production context dominates the cost of operational toil. The task itself is rarely the expensive part. The expensive part is navigating the environment well enough to know how to perform the task.
General-purpose models compress execution; they can draft, summarize, and reason. What they cannot do is hold your service topology, your deployment history, your team's tribal knowledge about brittle components, or the context of what changed in the last three deploys. Without that, a frontier model can help you think faster, but it cannot perform the work. You remain responsible for navigating the environment.
Background agents carry the environment as a live context graph and bring that context into every task they run. When the environment is carried for you, the cost of the task collapses back to execution, and execution can run on a schedule.
A background agent is defined by three architectural properties.
1. When it runs:
Agents wake on a predefined schedule, a trigger, or a direct message.
Examples include a cron-aligned morning digest, a deploy event that matches a keyword filter, or a Slack or Teams DM.
The agent tracks why it woke up and executes the appropriate task list.
2. How it does the job:
Each agent has a defined set of tasks, skills, and integrations.
Tasks are durable work items, one-off or scheduled, that persist across restarts.
Skills encode how to use specific capabilities: how to query a datastore, how to read a trace, how to correlate a deploy to a latency spike. You can bring your own skills from a Git repository, and the agent will load them.
Integrations connect the agent to your tools, with permissions scoped per agent.
3. How it stays “alive”:
Background agents run in the cloud, in a sandboxed filesystem that persists across restarts.
Tasks, memory, and intermediate files do not disappear when the agent goes idle.
The agent enters standby, wakes when there is work to do, and resumes where it left off.
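The first property, tracking why the agent woke and choosing the matching tasks, can be sketched in a few lines. This is a minimal illustration and not Resolve AI's implementation: `WakeReason`, `Task`, `Agent`, and `tasks_for` are hypothetical names, and the keyword filter is assumed to be a plain substring match on the event payload.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class WakeReason(Enum):
    SCHEDULE = "schedule"      # e.g. a cron-aligned morning digest
    TRIGGER = "trigger"        # e.g. a deploy event matching a keyword filter
    DIRECT_MESSAGE = "dm"      # e.g. a Slack or Teams DM


@dataclass
class Task:
    name: str
    wake_on: WakeReason
    keyword: Optional[str] = None  # assumed: only meaningful for trigger tasks


@dataclass
class Agent:
    tasks: list[Task] = field(default_factory=list)

    def tasks_for(self, reason: WakeReason, payload: str = "") -> list[str]:
        """Return the names of tasks that should run for this wake-up."""
        selected = []
        for task in self.tasks:
            if task.wake_on is not reason:
                continue
            # Hypothetical keyword filter: skip tasks whose keyword is absent.
            if task.keyword is not None and task.keyword not in payload:
                continue
            selected.append(task.name)
        return selected
```

In a real system the task list would be persisted to the sandboxed filesystem so it survives restarts; the sketch keeps it in memory to stay short.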
Where the production context cost shows up
The thesis is easier to see in concrete work. Here are the four categories we are starting with. In each, production context is the dominant cost, and a background agent that already carries that context removes it.
Deployment monitoring. Most CI/CD pipelines check a fixed set of metrics during a release. What they do not do is read the changeset and construct a monitoring plan around what specifically changed. If a deploy touches a database query, someone has to know to watch that query. If a feature flag is flipping, someone has to know to include that in the check. That knowledge lives in engineers' heads, not in the pipeline. A background agent wakes on a deploy event, reads the changeset, builds an environment-aware health plan, runs checks at five and fifteen minutes post-deploy, and posts findings to the relevant channel. The execution is small, but without the environment context, it cannot happen at all.
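To make the idea of a changeset-driven health plan concrete, here is a rough sketch. Everything in it is an assumption for illustration: the `RULES` table, the baseline check names, and `build_health_plan` are hypothetical, and a real agent would derive the rules from its context graph rather than from substring matches on file paths. Only the five- and fifteen-minute check times come from the description above.

```python
# Hypothetical mapping from changed-file patterns to the extra checks they imply.
RULES = [
    (".sql", "watch latency of the affected database query"),
    ("flags/", "include the flipped feature flag in the check"),
]

# Assumed fixed checks that run for every deploy.
BASELINE_CHECKS = ["error rate", "p99 latency"]

# Post-deploy check times, in minutes, as described in the text.
CHECK_OFFSETS_MIN = (5, 15)


def build_health_plan(changed_files):
    """Build a list of scheduled checks tailored to what the deploy changed."""
    checks = list(BASELINE_CHECKS)
    for path in changed_files:
        for pattern, check in RULES:
            if pattern in path and check not in checks:
                checks.append(check)
    # One plan entry per check time, each with its own copy of the check list.
    return [{"at_minute": m, "checks": list(checks)} for m in CHECK_OFFSETS_MIN]
```

For a deploy that touches `db/migrations/001_add_index.sql`, the plan contains the two baseline checks plus the database query check, scheduled at both offsets.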
Scheduled health and anomaly checks. Recurring drift on a key service is a known problem. The check to catch it requires knowing the service, the normal baseline, which datastore to query, and which threshold matters in this specific environment. A general-purpose model given the task cold cannot do it correctly. An agent that carries your service graph and historical baselines can run checks on a schedule and surface drift before it becomes an incident.
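A scheduled drift check could look something like the following. The post does not say how drift is detected, so the z-score comparison here is a swapped-in, commonly used technique, and `detect_drift` and its default threshold are hypothetical. The point is that the small function is the easy part; knowing which baseline and threshold to pass in is the production context.

```python
from statistics import mean, stdev


def detect_drift(baseline, recent, z_threshold=3.0):
    """Return True if the recent mean deviates from the baseline mean
    by more than z_threshold baseline standard deviations.

    baseline needs at least two samples; recent needs at least one.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    current = mean(recent)
    if sigma == 0:
        # A perfectly flat baseline: any change at all counts as drift.
        return current != mu
    return abs(current - mu) / sigma > z_threshold
```

With a p99 latency baseline of `[100, 102, 98, 101, 99]` milliseconds, recent samples around 120 ms are flagged, while samples around 100 ms are not.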
Operational reports and handoffs. A morning digest or on-call handoff requires pulling from multiple tools, knowing which alerts from last night were signal versus background noise, and writing something an engineer can act on in the first five minutes of their shift. The execution, drafting a short summary, is trivial. The context required to do it accurately - what paged, what was investigated, what is still open, what can be safely ignored - is not. Most teams do not produce these consistently because the context cost is too high relative to the perceived value. An agent that already holds that context produces them on schedule every time.
First responder to engineering questions. A Slack or Teams channel where engineers ask operational questions is a constant source of interruptions. Most questions are answerable: is this behavior expected, what is the CPU doing right now, did this alert fire before. Answering them requires access to tools and environmental context, not difficult reasoning. An agent watching the channel answers what it can, stays silent when it cannot, and escalates to a specific person when it is stuck. The interruption is handled without reaching a human because the context required to respond is already loaded.
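The answer, stay silent, or escalate behavior can be read as a small decision function. The sketch below is one possible interpretation, not the product's logic: `Action` and `triage` are hypothetical names, and it assumes the agent escalates only when a message is an operational question it cannot answer and a specific owner is known.

```python
from enum import Enum


class Action(Enum):
    ANSWER = "answer"
    STAY_SILENT = "stay_silent"
    ESCALATE = "escalate"


def triage(is_question, can_answer, owner=None):
    """Decide how to handle a channel message.

    Returns (action, escalation_target); the target is None unless escalating.
    """
    if not is_question:
        # Not an operational question: do not add noise to the channel.
        return (Action.STAY_SILENT, None)
    if can_answer:
        return (Action.ANSWER, None)
    if owner:
        # Stuck on a real question: hand off to a specific person.
        return (Action.ESCALATE, owner)
    # Assumed fallback: no known owner, so stay silent rather than guess.
    return (Action.STAY_SILENT, None)
```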
We made an early decision to avoid large, manually maintained prompt files. As the number of background agents grows, prompt maintenance becomes an operational burden in its own right. Instead, the agent configures itself through conversation. If you want a new operational report, you describe it the way you would describe it to a colleague. The agent asks follow-up questions, confirms the channel to post in and the schedule to run on, and then creates the task. Adjustments follow the same pattern.
The creation interface is identical in the web UI and in Slack. You describe the work, and the agent figures out how to perform it. This also means that tuning an agent over time, pushing back on a finding, correcting a false positive, adjusting a threshold, happens in the same conversation. The agent incorporates the correction on the next run. It is not a configure-once system.
Of course, you can always see the full prompt the agent is operating with, along with all the agent work for the tasks it handles, but we think the core way people will work with background agents will be conversational.
We run background agents internally across several workflows.
The time savings are real, but the more important change is consistency. This work now happens whether or not anyone has bandwidth.
Instead of starting the day in an empty queue or triaging a wall of alerts, you open Resolve AI to findings that have already been investigated, with recommended next steps. Agents pre-investigate every item before you see it. You step in to review, approve, or dig further, not to start from scratch.
The work that was previously invisible because the context cost was too high to justify doing it manually now happens on schedule, with cited evidence across teams and services. The capacity check, the drift re-investigation, and the deployment health summary are not heroic acts performed by whoever had spare cycles; they are scheduled tasks that complete regardless of whether anyone has bandwidth. This is just the start; we are working to expand the range of tasks that background agents can support, such as capacity management, cost analysis, some security control meetings, tuning alerts, and more, with the aim of helping engineers become more proactive when running production systems.
That is the actual shift. Not that engineers do less work, but that the work that always requires on-demand environmental awareness stops depending on always-on engineers to carry it. Whether it is deployment monitoring, scheduled health checks, or operational reports, you can create background agents for any of these tasks in Resolve AI. Start with the one that matches your biggest interrupt, tune it through conversation, then move to the next one.


Justin Smith
Founding Engineer
