resources:
  requests:
    cpu: 100m
    memory: 500Mi
  # No limits set
kubectl get events -n live --field-selector involvedObject.name=<pod-name>
TaintManagerEviction pod/<new-pod> Cancelling deletion of Pod
NodeNotReady   node/<node-name>   Node status is now: NodeNotReady
NodeReady      node/<node-name>   Node status is now: NodeReady
aws ec2 describe-instance-credit-specifications \
  --instance-ids <instance-id>
CpuCredits: unlimited
service pod:        7m (0.007 CPUs)
weave-scope-agent: 40m
aws-node:          23m
kube-proxy:         7m
ebs-csi:            3m
efs-csi:            5m
──────────────────────────
Total: ~85m (out of 2000m available)
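The total above is easy to sanity-check. A minimal sketch, with the per-pod millicore figures copied from the table and the 2000m capacity assumed from a 2-vCPU t3a.medium:

```python
# Per-pod CPU requests from the table above, in millicores (1000m = 1 vCPU).
requests_m = {
    "service": 7,
    "weave-scope-agent": 40,
    "aws-node": 23,
    "kube-proxy": 7,
    "ebs-csi": 3,
    "efs-csi": 5,
}

node_capacity_m = 2000  # assumption: 2 vCPUs on a t3a.medium

total_m = sum(requests_m.values())
print(f"Total requested: {total_m}m of {node_capacity_m}m")
```

Less than 5% of the node's CPU was actually reserved, which is why scheduling pressure alone couldn't explain the symptoms.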
aws ssm send-command \
  --instance-ids <instance-id> \
  --document-name "AWS-RunShellScript" \
  --parameters 'commands=["cat /proc/loadavg"]'
34.04 25.03 22.70
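Load average only means something relative to the core count. A quick sketch with the figures above (the vCPU count is an assumption based on the t3a.medium instance class):

```python
# Load averages from the output above. loadavg counts runnable (and
# uninterruptible) tasks, so anything well above the vCPU count means
# tasks are queueing for CPU rather than running.
load_1m, load_5m, load_15m = 34.04, 25.03, 22.70
vcpus = 2  # assumption: t3a.medium exposes 2 vCPUs

per_core = load_1m / vcpus
print(f"1-minute load is {per_core:.0f}x the available cores")
```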
CPU: some avg10=85.41 avg60=84.62 avg300=82.10
Memory: some avg10=98.98 avg60=98.90 avg300=98.38 full avg10=62.85 avg60=63.91 avg300=63.33
IO: some avg10=0.04 avg60=0.16 avg300=0.21
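These PSI lines have a regular key=value shape, so they are easy to parse programmatically. A minimal sketch (the input string is copied from the memory output above; real `/proc/pressure/*` lines also carry a `total=` field in microseconds, which this handles the same way):

```python
# Parse a /proc/pressure/* line ("some avg10=... avg60=... avg300=...")
# into a dict of floats keyed by averaging window.
def parse_psi(line: str) -> dict[str, float]:
    kind, *fields = line.split()
    return {k: float(v) for k, v in (f.split("=") for f in fields)}

mem_some = parse_psi("some avg10=98.98 avg60=98.90 avg300=98.38")
print(mem_some)  # {'avg10': 98.98, 'avg60': 98.9, 'avg300': 98.38}
```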
MemTotal: 3,936 MB
MemFree: 86 MB (only 86 MB free!)
MemAvailable: 735 MB (after counting reclaimable cache)
SwapTotal: 1,048 MB
SwapFree: 549 MB (500 MB of swap in use)
Committed_AS: 5,001 MB (5 GB committed on a 4 GB machine!)
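The standout figure is Committed_AS: the kernel had promised processes more memory than the machine has. A quick sketch with the MB values copied from the output above:

```python
# Committed_AS is the total address space the kernel has committed to
# processes. When it exceeds MemTotal, the node is overcommitted: an
# allocation spike can force swapping or wake the OOM killer.
mem_total_mb = 3936
committed_as_mb = 5001

ratio = committed_as_mb / mem_total_mb
print(f"Committed {ratio:.2f}x physical RAM")  # Committed 1.27x physical RAM
```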
PODS (total):          616 MB
├── main service          381 MB
├── weave-scope-agent      64 MB
├── aws-node (VPC CNI)     39 MB
├── promtail               30 MB
├── ebs-csi-node           26 MB
├── kube-proxy             24 MB
SYSTEM PROCESSES:       58 MB
PAGE CACHE:            831 MB
FREE:                   87 MB
KERNEL SLAB: 2,194 MB
├── SReclaimable: 50 MB (can be freed)
├── SUnreclaim: 2,143 MB (CANNOT be freed!)
SLAB OBJECT COUNT x SIZE = TOTAL
────────────────────────────────────────────────────────────
kmalloc-1k 1,667,384 x 1,024 B = 1,632 MB
skbuff_head_cache 1,657,980 x 256 B = 414 MB
────────────────────────────────────────────────────────────
These two alone: 2,046 MB
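The totals in that table are essentially count × object size. A sketch reproducing them, with counts and sizes copied from the table (note that `/proc/slabinfo` totals also include per-slab bookkeeping overhead, so the real footprint is slightly larger than this bare product):

```python
def slab_mb(objects: int, objsize_bytes: int) -> float:
    """Approximate a slab cache's footprint as object count times object size."""
    return objects * objsize_bytes / (1024 * 1024)

kmalloc_1k_mb = slab_mb(1_667_384, 1024)  # ~1,628 MB before overhead
skbuff_mb = slab_mb(1_657_980, 256)       # ~405 MB before overhead
print(f"kmalloc-1k ~{kmalloc_1k_mb:.0f} MB, skbuff_head_cache ~{skbuff_mb:.0f} MB")
```

The near-identical object counts in the two caches are the tell: each leaked socket buffer holds one `skbuff_head_cache` entry plus a ~1 KB `kmalloc-1k` data buffer.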
                    affected node   large node     another small node
                    (t3a.medium)    (t3a.xlarge)   (t3a.medium)
──────────────────────────────────────────────────────────────────────────
Total RAM            3,936 MB       16,207 MB      3,938 MB
Slab (SUnreclaim)    2,143 MB        4,533 MB      1,744 MB
skbuff count         1,667,384      3,309,501      1,310,669
Memory pressure      98.98%         0.00%          0.00%
Load average         32.64          2.11           0.12
kubectl get daemonsets -n weave
NAME DESIRED CURRENT READY AGE
weave-scope-agent 16 16 16 2y326d
kubectl delete namespace weave
BEFORE AFTER
──────────────────────────────────────────────────────
Slab (SUnreclaim) 2,143 MB 74 MB
MemFree 87 MB 1,937 MB
MemAvailable 735 MB 2,600 MB
Memory pressure 98.98% 0.00%
Load average 32.64 0.39
kubectl get daemonsets --all-namespaces
# Pod events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# Node conditions
kubectl describe node <node-name> | grep -A5 Conditions

# All pods on a node
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name> -o wide

# Pod resource usage
kubectl top pods -n <namespace>

# List all DaemonSets
kubectl get daemonsets --all-namespaces
# System pressure — which resource is the bottleneck?
cat /proc/pressure/cpu
cat /proc/pressure/memory
cat /proc/pressure/io

# Memory breakdown — look for SUnreclaim
grep -E "MemTotal|MemFree|MemAvailable|Slab|SReclaimable|SUnreclaim|SwapTotal|SwapFree" /proc/meminfo

# Top kernel slab consumers
cat /proc/slabinfo | sort -k3 -rn | head -10

# Load average
cat /proc/loadavg
# CPU credit balance (burstable instances)
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 --metric-name CPUCreditBalance \
  --dimensions Name=InstanceId,Value=<id> \
  --start-time <start> --end-time <end> \
  --period 300 --statistics Average

# Run commands on a node without SSH
aws ssm send-command \
  --instance-ids <id> \
  --document-name "AWS-RunShellScript" \
  --parameters 'commands=["your-command"]'

- Memory usage had climbed to 832 MiB, then abruptly dropped to zero
- CPU dropped to zero at the same time
- After a ~45 minute gap, a new pod appeared and started running normally

- some = percentage of time at least one process was stalled on this resource
- full = percentage of time ALL processes were stalled

- Installed 2 years and 326 days ago via raw kubectl apply (no Helm, no GitOps)
- Running weaveworks/scope:1.13.2 — the last version ever released
- Weaveworks, the company behind it, shut down in 2024
- The DaemonSet was running on all 16 nodes, intercepting all network traffic
- Its packet interception was creating socket buffers in kernel space that were never freed

- AWS Burstable Instances Explained: CPU Credits, Throttling, and Why Your t3 Instance Isn't What You Think
- Linux Memory Explained: Swap, Kernel Slab, and skbuff — What Kubernetes Doesn't Show You