Understanding k8s cluster infrastructure and resource allocation

I needed to map our entire Kubernetes cluster—namespaces, deployments, pod distributions, resource allocations, node topology, and networking controllers. The goal: create a comprehensive infrastructure overview to understand capacity, identify resource patterns, and visualize the AWS deployment architecture.

What makes this hard?

kubectl shows you pods and deployments, but requires separate commands for each namespace and doesn't aggregate resource totals. The AWS console shows you EC2 nodes and availability zones, but doesn't connect them to Kubernetes workloads. Your monitoring dashboards show CPU metrics, but don't label them by node or map them to infrastructure topology. Building a complete infrastructure picture requires fragmented queries across multiple interfaces (the raw commands are sketched after this list):

  • Run kubectl get namespaces, then query each namespace individually for deployments
  • Execute kubectl describe for each deployment to find resource limits and replica counts
  • Check kubectl get nodes for node details, then parse labels for region and AZ placement
  • Query daemonsets separately to understand which system controllers run where
  • Search for metrics with node identifiers to find CPU/memory utilization
  • Manually connect: namespace resources → node capacity → AWS topology → system controllers → utilization metrics
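
For concreteness, here is a minimal sketch of that manual workflow using plain kubectl; the <namespace> and <deployment> placeholders have to be filled in and repeated for every namespace and deployment:

    # Each command answers one slice of the question; nothing aggregates the results.
    kubectl get namespaces
    kubectl get deployments -n <namespace>                    # repeat for every namespace
    kubectl describe deployment <deployment> -n <namespace>   # replicas, requests, limits
    kubectl get nodes -L topology.kubernetes.io/zone,kubernetes.io/arch
    kubectl get daemonsets -A                                  # which system controllers exist in each namespace
    # Utilization metrics and the AWS topology still have to be correlated by hand.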

How did Resolve AI help?

With one query, Resolve AI navigated the infrastructure graph to build a complete cluster overview with resource breakdowns:

  • Mapped cluster structure across 6 namespaces, 27 deployments, and 57 pods: ecommerce-app (21 deployments, 44 pods), kube-system (2 deployments, 7 pods), plus the satellite, karpenter, infra-system, and amazon-guardduty system namespaces
  • Analyzed resource allocation for the 21 ecommerce-app deployments: 9.2Gi of memory allocated in total, with no CPU requests or limits specified anywhere; the highest consumers are kafka and load-generator (1500Mi each), followed by the cart service (2 replicas, 1Gi per pod) and fraud-detection (2 replicas, 750Mi per pod), spot-checked with the commands sketched after this list
  • Identified node topology across AWS us-east-2: 7 nodes spanning 3 availability zones—us-east-2a (3 nodes: 2 Fargate + 1 EC2 r7g.xlarge ARM64), us-east-2b (1 Fargate), us-east-2c (3 Fargate)
  • Discovered networking and storage controllers: the AWS VPC CNI plugin (aws-node daemonset) configured with warm-ip-target=1, and the AWS EBS CSI driver (ebs-csi-controller with 2 replicas plus the ebs-csi-node daemonset) for dynamic volume provisioning
  • Found a critical architecture pattern: the system daemonsets (aws-node, ebs-csi-node, kube-proxy) run only on the single EC2 node, because the 6 Fargate nodes can't host traditional daemonsets given their serverless nature; the result is a hybrid model in which the EC2 node handles system-level operations
  • Identified a monitoring gap: node-level CPU metrics were unavailable despite repeated queries, because the metrics lack node hostname labels and the EC2 node is only 7 hours old, so two days of historical data simply don't exist
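
These findings can be spot-checked by hand. The sketch below assumes the ecommerce-app namespace named above and the standard EKS node labels (Fargate nodes carry eks.amazonaws.com/compute-type; EC2 nodes leave that column empty):

    # Per-pod memory and CPU requests in the application namespace (<none> means not set).
    kubectl get pods -n ecommerce-app \
      -o custom-columns='POD:.metadata.name,MEM_REQ:.spec.containers[*].resources.requests.memory,CPU_REQ:.spec.containers[*].resources.requests.cpu'

    # Node topology: compute type (Fargate vs. EC2), instance type, zone, and CPU architecture.
    kubectl get nodes -L eks.amazonaws.com/compute-type,node.kubernetes.io/instance-type,topology.kubernetes.io/zone,kubernetes.io/arch

    # Which nodes the system daemonset pods actually landed on.
    kubectl get pods -n kube-system -o wide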

Resolve AI also generated a Mermaid diagram showing the AWS region containing three availability zones, each with its nodes labeled by type (Fargate/EC2), architecture (amd64/arm64), and system daemonsets. The investigation revealed that no CPU limits are set cluster-wide (a potential resource contention risk) and that the single EC2 node is a single point of failure for system-level networking and storage operations.
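
The cluster-wide "no CPU limits" finding is straightforward to reproduce with a single query; this is a rough sketch rather than the exact query Resolve AI ran:

    # CPU limits for every deployment in every namespace; <none> means no limit is set.
    kubectl get deployments -A \
      -o custom-columns='NAMESPACE:.metadata.namespace,DEPLOYMENT:.metadata.name,CPU_LIMIT:.spec.template.spec.containers[*].resources.limits.cpu'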
