Caught a performance regression before it became an incident

Founder and CEO, Resolve AI

Summary

  • A seemingly innocent database query was added to the AdService, causing system-wide latency spikes on the /api/data endpoint
  • This "harmless" addition was executing select count(1) from product; on every single ad request, creating a massive bottleneck under load
  • Resolve AI traced the performance degradation from frontend symptoms back to the specific code change, pinpointing the exact file and method
  • Integrated with Cursor to automatically generate a PR with the fix, turning a potential production crisis into a 5-minute resolution

What was the incident?

Frontend response times on /api/data had been gradually increasing over the past few days, with latency spikes becoming more frequent during peak traffic periods. The performance degradation wasn't dramatic enough to trigger critical alerts, but engineers noticed the API feeling "sluggish" and started investigating.

Spiros suspected something in recent deployments might be causing the issue, but the code changes looked routine: mostly feature additions and minor refactoring. Without an obvious smoking gun in the recent commits, the team faced the prospect of diving deep into profiling, tracing through multiple services, and correlating deployment timings with performance metrics.

How was it resolved?


*Question in Slack triggered Resolve AI, which began tracing the latency from frontend symptoms back through the service chain.*

Resolve AI immediately started investigating the performance regression across multiple dimensions. While analyzing recent deployments to the AdService, it simultaneously examined database query patterns, request tracing data, and load distribution metrics. Unlike the typical manual approach of checking each service sequentially, Resolve AI explored several theories in parallel: database connection issues, cache misses, resource contention, and code regressions.


*Cross-system analysis revealed the correlation between a recent code addition and the sharp increase in database load.*

The breakthrough came when Resolve AI connected seemingly unrelated data points. It found that database query volume had spiked dramatically, correlating precisely with a recent deployment that added what appeared to be a simple database connectivity check. Digging deeper into the code, it discovered that the innocent-looking select count(1) from product; query in the AdService's getAds method was executing on every single ad request.

This created a cascading performance problem: each frontend request triggered multiple ad fetches, each ad fetch triggered a database count query, and under load, this innocent check became a massive bottleneck that delayed every request in the system.
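To make that failure mode concrete, here is a minimal sketch of the anti-pattern as described, assuming a plain JDBC connectivity check. The class, helper names, and connection details are illustrative assumptions, not the actual AdService source:

```java
// Illustrative reconstruction of the anti-pattern described above; the class,
// method names, and JDBC details are assumptions, not the actual AdService code.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

class AdServiceSketch {

    private static final String DB_URL = "jdbc:postgresql://db:5432/shop"; // assumed

    // Every ad request now pays for a synchronous database round trip.
    Iterable<String> getAds(String contextKey) {
        checkDatabaseConnectivity(); // the "harmless" addition
        return fetchAdsFor(contextKey);
    }

    private void checkDatabaseConnectivity() {
        try (Connection conn = DriverManager.getConnection(DB_URL);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select count(1) from product")) {
            rs.next(); // result discarded; the query exists only as a liveness probe
        } catch (SQLException e) {
            throw new RuntimeException("database unreachable", e);
        }
    }

    private Iterable<String> fetchAdsFor(String contextKey) {
        return List.of("ad-for-" + contextKey); // placeholder for the real ad lookup
    }
}
```

Because count(1) typically scans the whole product table, the cost of the probe grows with catalog size, and each frontend request multiplies it across several ad fetches.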


*Resolve AI provided the exact code location and reasoning, enabling immediate action with Cursor for automated fix generation.*

Resolve AI didn't stop at identifying the problem - it provided the complete resolution path. It pinpointed the exact file (src/ad/src/main/java/oteldemo/AdService.java), the specific method (getAds), and the precise code block to remove. More importantly, it explained why this seemingly harmless database call was causing system-wide latency and offered alternative approaches for maintaining database connectivity checks without impacting user requests.
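One such alternative is sketched below under the same assumptions (the probe interval and names are illustrative, not the patch Resolve AI generated): move the connectivity check off the request path onto a background schedule and cache the result.

```java
// One possible alternative (an assumption, not the actual fix): probe the
// database on a fixed schedule and cache the result, so the request path
// never blocks on a connectivity check.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class DatabaseHealthProbe {

    private static final String DB_URL = "jdbc:postgresql://db:5432/shop"; // assumed

    private final AtomicBoolean healthy = new AtomicBoolean(true);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    void start() {
        // One cheap probe every 30 seconds, instead of one per ad request.
        scheduler.scheduleAtFixedRate(this::probe, 0, 30, TimeUnit.SECONDS);
    }

    boolean isHealthy() {
        return healthy.get(); // request path reads a cached flag, no DB round trip
    }

    private void probe() {
        try (Connection conn = DriverManager.getConnection(DB_URL)) {
            // Connection.isValid runs a driver-level liveness check with a
            // timeout, far cheaper than counting an entire table.
            healthy.set(conn.isValid(2));
        } catch (SQLException e) {
            healthy.set(false);
        }
    }
}
```

With this shape, the health signal survives load spikes without amplifying them: the probe cost is constant per interval rather than proportional to request volume.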


The integration with Cursor transformed this analysis into immediate action. With a simple "@Cursor create a pr for this" command, the fix was implemented and ready for deployment, complete with proper commit messages and code changes.

What was the impact?

  • Prevented a potential production outage by catching the performance regression before it reached critical thresholds during peak traffic
  • Reduced investigation time from hours to minutes, eliminating the need to manually profile services, analyze traces, and correlate deployment histories
  • Provided precise root cause analysis that went beyond "database is slow" to identify the exact code change and its multiplicative impact
  • Streamlined the resolution workflow by integrating diagnosis with automated code generation, turning analysis into immediate action
  • Delivered operational knowledge about performance patterns and best practices for database connectivity checks in high-throughput endpoints

Hand off your headaches to Resolve AI

Get back to driving innovation and delivering customer value.
