Fixing Kubernetes 502 Bad Gateway Error
You're getting 502s from your Kubernetes cluster and traffic is dropping. The ingress controller is returning a bad gateway response, which means it reached out to a backend pod and got something unusable back. The problem could be in your pods, your service configuration, your ingress setup, or even your readiness probes. Here's how to work through it.
What a 502 Actually Means in Kubernetes
A 502 Bad Gateway means the ingress controller (usually Nginx, Traefik, or an AWS ALB) tried to forward a request to a backend pod and either got an invalid response or no response at all. The ingress controller is working fine. The issue is between the ingress and the pod it's trying to reach.
This is different from a 503, where the ingress knows there's no healthy backend to send traffic to. With a 502, traffic is being routed to a pod that can't handle it properly.
Common Causes
Pods crashing or not ready. The most frequent cause. Your pod is in the Service's endpoint list but it's either crashing, still starting up, or failing health checks. The ingress sends traffic to it and gets nothing back.
Check your pod status first:
kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace>
Look for pods in CrashLoopBackOff, Error, or Pending states. If a pod just restarted, there's a window where it's still in the endpoint list but not actually ready to serve traffic.
Readiness probes misconfigured. This is the sneaky one. If your readiness probe is too lenient, Kubernetes marks the pod as ready before your application is actually able to handle requests. Traffic arrives, the app isn't listening yet, and the ingress gets a connection refused or a timeout.
Check what your readiness probe is doing:
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[0].readinessProbe}'
If it's hitting a generic / endpoint instead of a real health check path, or if the initialDelaySeconds is too short for your app's startup time, that's likely your problem.
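A probe that checks a real health endpoint and allows for startup time looks something like this. This is a sketch, not your actual spec: the `/healthz` path, port `8080`, and the timing values are illustrative and should be tuned to your application.

```yaml
# Hypothetical deployment snippet: the probe hits a real health-check
# route, waits for the app to boot, and requires repeated failures
# before marking the pod unready.
readinessProbe:
  httpGet:
    path: /healthz          # a real health-check route, not just /
    port: 8080              # the port your app actually binds
  initialDelaySeconds: 10   # cover your app's real startup time
  periodSeconds: 5
  failureThreshold: 3
```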
Service selector mismatch. Your service exists and your pods exist, but they're not actually connected. The service selector labels don't match the pod labels, so the service has no endpoints. The ingress points to an empty service and returns 502.
kubectl get endpoints <service-name> -n <namespace>
If this returns an empty list of addresses, your selector is wrong. Compare the labels:
kubectl get svc <service-name> -n <namespace> -o jsonpath='{.spec.selector}'
kubectl get pods -n <namespace> --show-labels
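For reference, here is the shape of a matching pair. The names and labels are illustrative; the point is that the Service's `spec.selector` must exactly match the labels on the pods.

```yaml
# Hypothetical Service: its selector must match the pod labels below.
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app        # must match the pod template labels exactly
  ports:
    - port: 80
      targetPort: 8080
---
# In the Deployment, the pod template carries the matching label:
#   template:
#     metadata:
#       labels:
#         app: my-app
```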
Wrong target port. Your service is pointing to a port that your container isn't listening on. The connection reaches the pod but nothing is there to accept it. This happens a lot after someone changes the application's listen port without updating the service spec.
kubectl get svc <service-name> -n <namespace> -o yaml
Check that targetPort matches the port your application actually binds to inside the container. Not the container port in the deployment spec, but the port your code listens on.
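As a minimal illustration (ports here are assumptions), `port` is what clients use to reach the Service, while `targetPort` must be the port the process binds inside the container:

```yaml
# If the container process listens on 8080, targetPort must be 8080,
# regardless of the port clients use on the Service.
spec:
  ports:
    - port: 80          # port exposed by the Service
      targetPort: 8080  # port the application actually listens on
```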
Backend timeout. Your application is slow to respond and the ingress controller gives up before the response arrives. This looks like a 502 from the client's perspective, but the root cause is a slow backend, not a broken one. If you're seeing 502s only on certain endpoints that do heavy processing, this is probably it.
For Nginx ingress:
kubectl get ingress <ingress-name> -n <namespace> -o yaml
Look for proxy-read-timeout and proxy-send-timeout annotations. Default is often 60 seconds. If your endpoint takes longer than that, you'll get 502s.
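To raise the timeouts on an Nginx ingress, set the annotations on the Ingress resource. The 120-second value here is an example; pick a value slightly above your slowest legitimate endpoint rather than something unbounded:

```yaml
# ingress-nginx annotations raising proxy timeouts (values in seconds).
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "120"
```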
Ingress controller resource limits. If the ingress controller pods themselves are running out of memory or CPU, they can't proxy requests properly. Check whether the ingress controller is getting OOMKilled or throttled:
kubectl top pods -n ingress-nginx
kubectl describe pod <ingress-controller-pod> -n ingress-nginx
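If the controller is being OOMKilled, give it headroom. The numbers below are illustrative placeholders; size them from your actual kubectl top output, not from this sketch:

```yaml
# Example resource settings for the ingress controller container.
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    memory: 1Gi   # raise this if the controller is getting OOMKilled
```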
Debugging Sequence
If you're not sure which of the above is your problem, here's the order to check:
Start with pod health. Run kubectl get pods and look for anything that isn't Running and Ready. If pods are restarting, that's your answer. Check logs with kubectl logs <pod-name> --previous to see what happened before the last crash.
Then check endpoints. If pods look healthy, run kubectl get endpoints <service-name>. If empty, your labels are mismatched. If endpoints exist, the pods are reachable from the service layer.
Test the connection from inside the cluster. Exec into a debug pod and curl the service directly:
kubectl run debug --image=curlimages/curl -it --rm -- curl -v http://<service-name>.<namespace>.svc.cluster.local:<port>/health
If this works but external traffic doesn't, the issue is in the ingress layer. If this also fails, the issue is in the pod or service.
Check ingress controller logs. The ingress controller will tell you exactly what happened when it tried to proxy the request:
kubectl logs <ingress-controller-pod> -n ingress-nginx | grep 502
You'll see upstream connection errors, timeouts, or reset indicators that point to the specific backend that's failing.
Preventing Recurring 502s
Most 502s happen during deployments. A rolling update drains old pods and starts new ones, and there's a brief window where traffic can land on a pod that isn't ready or one that's shutting down.
Three things help here. Set your readiness probe to actually verify that your application is ready to serve traffic, not just that the container started. Add a preStop lifecycle hook with a short sleep to give the ingress controller time to remove the pod from its backend list before the pod shuts down. And make sure your terminationGracePeriodSeconds is long enough for in-flight requests to complete.
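The three settings above live together in the Deployment's pod spec. This sketch uses assumed values (path, port, sleep duration, grace period); adjust them to your app's startup and shutdown behavior:

```yaml
# Pod spec sketch combining the three deployment-safety settings.
spec:
  terminationGracePeriodSeconds: 60   # long enough for in-flight requests
  containers:
    - name: app
      readinessProbe:
        httpGet:
          path: /healthz              # verify real readiness, not just liveness
          port: 8080
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "10"]  # let the ingress drop this pod first
```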
When It Gets Complicated
The straightforward cases above cover most 502s. But sometimes the root cause isn't in Kubernetes at all. It could be a downstream dependency that's timing out, causing your pods to hold connections open too long. Or a network policy that's silently dropping traffic between namespaces. Or a service mesh sidecar that's not ready when your application container starts.
Investigating across multiple layers like this, connecting the 502 you see at the ingress with a code change from yesterday, a resource limit on the pod, and a downstream service that started timing out, is where troubleshooting stops being a checklist and starts requiring real investigative reasoning across your full stack.
What Is Resolve AI
Resolve AI investigates production issues across your code, infrastructure, and telemetry. Instead of manually checking pods, services, ingress logs, and recent deploys one by one, Resolve investigates the way a senior SRE would: pulling context from across your stack, forming hypotheses, and narrowing down root cause.
If you spend too much time on investigations like this, see Resolve AI in action.
