Hour before the endpoint: ~14 GB through NAT
Hour right after: ~0.4 GB through NAT → ~97% drop
for inst in instances_to_migrate:
    assert inst["EnaSupport"]                       # ENA driver loaded
    assert inst_ami["VirtualizationType"] == "hvm"  # not paravirtual (Xen-only)
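The pre-flight can be made concrete as a pure check over the DescribeInstances / DescribeImages response shapes; a minimal sketch (field names follow the EC2 API, the function name and example values are hypothetical):

```python
def ena_preflight(inst: dict, ami: dict) -> list[str]:
    """Return a list of blockers preventing a Nitro-generation migration."""
    problems = []
    if not inst.get("EnaSupport"):
        problems.append(f"{inst['InstanceId']}: ENA driver not enabled")
    if ami.get("VirtualizationType") != "hvm":
        problems.append(f"{inst['InstanceId']}: AMI is paravirtual (Xen-only)")
    return problems

# hypothetical instances: one ready to migrate, one that would lose its network
ok = ena_preflight({"InstanceId": "i-0abc", "EnaSupport": True},
                   {"VirtualizationType": "hvm"})           # → []
bad = ena_preflight({"InstanceId": "i-0old", "EnaSupport": False},
                    {"VirtualizationType": "paravirtual"})  # → 2 blockers
```

Collecting blockers instead of asserting lets you report every broken instance in one pass before touching anything.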
aws ec2 wait instance-running    # WRONG for a generation migration
aws ec2 wait instance-status-ok  # confirms both system status and instance status
NAT Gateway traffic share
nat-A ~97% ← this single NAT
nat-B ~2%
other NATs <1%
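A share table like the one above falls out of per-gateway CloudWatch byte counts; a sketch covering only the percentage math (the byte totals are hypothetical, fetching `BytesOutToDestination` per NatGatewayId is left out):

```python
def traffic_share(bytes_by_nat: dict[str, int]) -> dict[str, float]:
    """Percent of total NAT bytes attributable to each gateway."""
    total = sum(bytes_by_nat.values())
    return {nat: round(100 * b / total, 1) for nat, b in bytes_by_nat.items()}

# hypothetical monthly BytesOutToDestination sums per NAT gateway
share = traffic_share({"nat-A": 14_550, "nat-B": 300, "nat-C": 150})
# nat-A dominates: that is the gateway worth putting an endpoint in front of
```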
apm-server uninstall → OK
kibana uninstall → HANG (state "uninstalling")
elasticsearch uninstall → OK (already uninstalled earlier)
WRONG: ES uninstall → Kibana uninstall (hook FAIL)
RIGHT: Kibana uninstall first → ES uninstall after
# Option 1: force-delete the hook job, then uninstall without hooks
kubectl delete job post-delete-kibana-kibana
helm uninstall kibana --no-hooks

# Option 2: patch away finalizers if the Helm release object is stuck
kubectl patch helmrelease kibana -p '{"metadata":{"finalizers":[]}}' --type=merge
1. Get cost trend (Cost Explorer) → identify magnitude
2. Group-by service → find Pareto concentration
3. Drill usage_type/resource_id → find specific contributor
4. Pre-flight check before destructive action
5. Confirm explicitly with stakeholders (especially for prod)
6. Execute with reversibility in mind
7. MEASURE actual impact (CloudWatch metrics, not just Cost Explorer)
8. Build a verification command to re-check in the future
9. Document findings + remaining TODOs in source repo
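Steps 2–3 can be sketched as a Pareto cut over a service→cost mapping (the bill numbers are made up for illustration; fetching them from the Cost Explorer API is left out):

```python
def pareto_top(costs: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Smallest set of services whose cumulative cost crosses the threshold."""
    total = sum(costs.values())
    top, running = [], 0.0
    for svc, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        top.append(svc)
        running += cost
        if running / total >= threshold:
            break
    return top

# hypothetical monthly bill: two services carry >80% of the spend
pareto_top({"EC2": 5200, "NATGateway": 3100, "S3": 900,
            "CloudWatch": 400, "KMS": 100})  # → ["EC2", "NATGateway"]
```

Everything outside the returned set is usually not worth drilling into until the top contributors are handled.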
- Gateway Endpoints are only free for S3 and DynamoDB. Other services (ECR API, STS, SSM, KMS...) need Interface Endpoints ($0.01/hour/AZ + $0.01/GB) → recompute ROI case by case.
- If the app uses S3 cross-region (e.g. VPC in ap-southeast-1, bucket in us-east-1), the Gateway Endpoint does not apply: traffic still goes through NAT.
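Whether an Interface Endpoint pays for itself is simple break-even arithmetic; a sketch assuming a NAT data-processing rate of $0.045/GB (typical but region-dependent, an assumption here) and the Interface Endpoint rates noted above, across n AZs:

```python
HOURS_PER_MONTH = 730

def interface_endpoint_saves(gb_per_month: float, azs: int,
                             nat_gb_rate: float = 0.045,  # assumed NAT $/GB
                             ep_hourly: float = 0.01,     # $/hour per AZ
                             ep_gb_rate: float = 0.01) -> bool:
    """True when the Interface Endpoint is cheaper than routing via NAT."""
    nat_cost = gb_per_month * nat_gb_rate
    ep_cost = azs * ep_hourly * HOURS_PER_MONTH + gb_per_month * ep_gb_rate
    return ep_cost < nat_cost

interface_endpoint_saves(100, azs=3)    # low traffic: endpoint costs more
interface_endpoint_saves(2000, azs=3)   # high traffic: endpoint wins
```

The hourly charge is fixed per AZ, so low-traffic services in many AZs are exactly the cases where the endpoint loses.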
- ECR Pull Through Cache (a newer feature) changes this pattern: layers are cached locally in-region, reducing cross-region pulls, but it does not remove the need for the S3 endpoint.
- AMIs built before 2017 very likely lack ENA. Amazon Linux 2 / Ubuntu 18.04+ are usually fine, but still check.
- In-house custom AMIs are the highest risk, especially if based on an old snapshot.
- ENA can be loaded with modprobe at runtime, but if the driver is missing at init, the network is down during boot: SSHing in to fix it is not feasible.
- Estimate to rank priorities (which task to do first)
- Measure with CloudWatch/Cost Explorer to claim ROI with stakeholders
- Service breakdown: Cost Explorer → Group by Service. The top 3 services are usually > 80% of the bill.
- usage_type breakdown within a service: e.g. EC2 has BoxUsage:t3.medium, EBS:VolumeUsage.gp3, DataTransfer-Out-Bytes. Cost Explorer can group by usage_type → shows whether cost comes from compute, storage, or transfer.
- resource_id breakdown within a usage_type: needs a good tagging strategy (Project, Environment, Owner tags), or a Cost & Usage Report (CUR) export → query with Athena.
- Reserved Instance / Savings Plan discounts apply proportionally and can distort the breakdown: look at "unblended cost" to see real consumption.
- In a cross-account org, Pareto applies at the account level first (which account dominates), then to services within that account.
- CUR queried via Athena is the endgame: if you audit more than 5 times a year, it is worth setting up.
- ES master helm uninstall ran first → ES pods deleted, Service elasticsearch-master deleted
- Kibana helm uninstall runs → triggers the post-delete hook
- The hook job pod starts and calls elasticsearch-master.namespace.svc.cluster.local
- K8s DNS cannot resolve it (the Service is gone) → connection refused
- The hook retries a few times → fails → kibana stuck "uninstalling" forever
- App before database
- Cache before storage backend
- Sidecar before main service
- Webhook controller before the CRD it depends on
- Operators (CRD-based) are the other way around: uninstalling the Operator before its CRD instances orphans the custom resources. Uninstall CRD instances first, the Operator after.
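The ordering rules above all reduce to "uninstall dependents before their dependencies"; a sketch that derives a safe order from a declared dependency map (release names and the map itself are illustrative):

```python
def uninstall_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Topological order: every release appears before anything it depends on."""
    order, seen = [], set()

    def visit(rel: str) -> None:
        if rel in seen:
            return
        seen.add(rel)
        # anything that depends on `rel` must be uninstalled first
        for other, deps in depends_on.items():
            if rel in deps:
                visit(other)
        order.append(rel)

    for rel in depends_on:
        visit(rel)
    return order

# kibana and apm-server both call elasticsearch, so they must go first
uninstall_order({"kibana": ["elasticsearch"],
                 "apm-server": ["elasticsearch"],
                 "elasticsearch": []})
```

Encoding the map in a runbook script, rather than in team memory, is exactly the point of the production note below.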
- The helm uninstall --wait flag confirms all resources are deleted before returning: slower, but catches hang states early.
- Production: keep a runbook per helm release stack that spells out the uninstall order. Do not assume team members remember.
- Systematic audit (top-down, Pareto)
- Measure reality (CloudWatch, Cost Explorer)
- Reversible actions (try + measure beats paralysis-by-analysis)
- Pre-flight check before destructive actions