```
CloudFront → ALB → Fargate (web ×2 tasks, worker ×1)
                      ↓
Aurora Serverless v2 (writer)
ElastiCache (Redis, t4g.small ×2)
NAT ×2 (multi-AZ)
VPC + interface endpoints
WAF (managed rule sets)
```
```ts
// Aurora: enable auto-pause when idle
const cfnCluster = cluster.node.defaultChild as rds.CfnDBCluster;
cfnCluster.serverlessV2ScalingConfiguration = {
  minCapacity: 0,              // was 0.5 — auto-pause after 5 min idle
  maxCapacity: 2,              // was 4
  secondsUntilAutoPause: 300,
};

// Network: 1 NAT instead of 2
natGateways: 1, // was 2 (multi-AZ)

// Web: smaller, fewer tasks, autoscale up if needed
desiredCount: 1,      // was 2
cpu: 512,             // was 1024
memoryLimitMiB: 1024, // was 2048

// Worker on Fargate Spot
capacityProviderStrategies: [
  { capacityProvider: "FARGATE_SPOT", weight: 4 },
  { capacityProvider: "FARGATE", weight: 1 },
],

// Container Insights off
containerInsightsV2: ecs.ContainerInsights.DISABLED,

// Backup retention
backup: { retention: cdk.Duration.days(1) }, // was 14

// WAF: removed entirely (CloudFront has free Shield Standard)
```
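Of those knobs, minCapacity: 0 is the big one. A back-of-envelope sketch of what the old always-on 0.5 ACU floor alone was costing, assuming the standard ~$0.12/ACU-hour rate (an assumption; verify against current regional Aurora pricing):

```shell
# What a 0.5 ACU always-on floor costs per month at ~$0.12/ACU-hour.
# (Rate is an assumption; check current Aurora Serverless v2 pricing.)
acu_tenths=5    # 0.5 ACU, scaled by 10 to stay in integer math
rate_cents=12   # $0.12/ACU-hour, in cents
hours=730       # approximate hours per month

floor_cents=$(( acu_tenths * rate_cents * hours / 10 ))
printf 'Always-on 0.5 ACU floor: $%d.%02d/mo\n' $(( floor_cents / 100 )) $(( floor_cents % 100 ))
# → Always-on 0.5 ACU floor: $43.80/mo
```

With auto-pause and a zero floor, that baseline drops to $0 whenever the cluster is idle.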
```yaml
services:
  postgres:
    image: postgres:16-alpine
    volumes: [./data/postgres:/var/lib/postgresql/data]
    deploy: { resources: { limits: { memory: 512M } } }
  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes --maxmemory 128mb --maxmemory-policy noeviction
    volumes: [./data/redis:/data]
    deploy: { resources: { limits: { memory: 192M } } }
  web:
    image: tm-web:latest
    ports: ["127.0.0.1:3000:3000"]
    env_file: .env
    depends_on:
      postgres: { condition: service_healthy }
      redis: { condition: service_healthy }
    deploy: { resources: { limits: { memory: 768M } } }
  worker:
    image: tm-worker:latest
    env_file: .env
    depends_on:
      postgres: { condition: service_healthy }
      redis: { condition: service_healthy }
    deploy: { resources: { limits: { memory: 384M } } }
```
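Those limits are tight by design. A quick sanity check that they fit on a 2 GB instance (the instance size is my assumption, inferred from the Lightsail pricing tiers discussed later):

```shell
# Sum the compose memory limits: postgres, redis, web, worker.
limits_mb=(512 192 768 384)
total=0
for m in "${limits_mb[@]}"; do total=$(( total + m )); done
echo "Total limits: ${total}M of 2048M"   # → Total limits: 1856M of 2048M
```

That leaves under 200 MB of headroom for the OS, Caddy, and Docker itself, which is why the swapfile step matters.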
```
toolmango.com, www.toolmango.com {
    reverse_proxy 127.0.0.1:3000
    encode gzip zstd
    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        X-Content-Type-Options "nosniff"
    }
}
```
```sh
aws ecs run-task \
  --cluster tm-prod-compute \
  --task-definition tm-prod-pgdump \
  --launch-type FARGATE \
  --network-configuration 'awsvpcConfiguration={subnets=[subnet-...],securityGroups=[sg-...],assignPublicIp=DISABLED}'
```
```sh
pg_dump --no-owner --no-acl --clean --if-exists -h $DB_HOST -U $DB_USER -d toolmango \
  | gzip > /tmp/dump.sql.gz \
  && aws s3 cp /tmp/dump.sql.gz s3://tm-prod-assets/migration/dump.sql.gz
```
```sh
gunzip -c /tmp/dump.sql.gz | docker compose exec -T postgres psql -U tmadmin -d toolmango
```
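Before piping a dump into psql, it is cheap to confirm the archive survived the S3 round trip. A sketch (the sample file here is a stand-in for the real downloaded dump):

```shell
# gzip stores a CRC, so `gunzip -t` catches truncation or corruption
# without extracting anything.
echo 'SELECT 1;' | gzip > /tmp/dump.sql.gz   # stand-in for the downloaded dump
if gunzip -t /tmp/dump.sql.gz 2>/dev/null; then
  echo "archive OK, safe to restore"
fi
```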
```sh
docker build --network=host -f Dockerfile.web -t tm-web:latest \
  --build-arg NEXT_PUBLIC_SITE_URL=https://toolmango.com \
  --build-arg NEXT_PUBLIC_PLAUSIBLE_DOMAIN=toolmango.com \
  .
```
```sh
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile && sudo mkswap /swapfile && sudo swapon /swapfile
echo "/swapfile none swap sw 0 0" | sudo tee -a /etc/fstab
```
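After swapon, it is worth confirming the kernel actually sees the swap (no root needed); the fstab entry only makes it survive reboots:

```shell
# Read the total swap size straight from /proc/meminfo.
swap_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
echo "SwapTotal: ${swap_kb} kB"
```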
```sh
# Take final Aurora snapshot first (safety rollback)
aws rds create-db-cluster-snapshot --db-cluster-identifier ... --db-cluster-snapshot-identifier tm-prod-final-...

# Disable deletion protection
aws rds modify-db-cluster --no-deletion-protection ...

# Delete Aurora cluster + writer
aws rds delete-db-instance --skip-final-snapshot ...
aws rds delete-db-cluster --skip-final-snapshot ...

# CDK destroy stacks in reverse dependency order
cdk destroy tm-prod-edge --force      # CloudFront, WAF
cdk destroy tm-prod-compute --force   # Fargate, ALB, ECS cluster
cdk destroy tm-prod-data --force      # ElastiCache (S3 retains via RemovalPolicy.RETAIN)
cdk destroy tm-prod-network --force   # VPC, NAT, subnets
```
```yaml
- name: Rsync source to Lightsail
  run: |
    rsync -az --delete --exclude='node_modules' --exclude='.next' --exclude='.git' \
      -e "ssh -i ~/.ssh/id_ed25519" \
      ./ ${{ secrets.LIGHTSAIL_USER }}@${{ secrets.LIGHTSAIL_HOST }}:/home/ubuntu/toolmango/src/

- name: Build images on Lightsail
  run: |
    ssh ... 'cd /home/ubuntu/toolmango/src && \
      sg docker -c "docker build -f Dockerfile.web -t tm-web:latest ." && \
      sg docker -c "docker build -f Dockerfile.worker -t tm-worker:latest ."'

- name: Run prisma migrate + restart services
  run: |
    ssh ... 'cd /home/ubuntu/toolmango && \
      sg docker -c "docker compose run --rm --no-deps web npx prisma migrate deploy" && \
      sg docker -c "docker compose up -d --force-recreate web worker"'

- name: Smoke test
  run: |
    for i in {1..6}; do
      [ "$(curl -s -o /dev/null -w '%{http_code}' https://toolmango.com/api/healthz)" = "200" ] && exit 0
      sleep 5
    done
    exit 1
```
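The smoke-test step, pulled out as a reusable function (the URL, attempt count, and delay become parameters; the workflow hardcodes six attempts at five-second intervals):

```shell
# Poll a health endpoint until it returns HTTP 200, or give up.
check_health() {
  local url=$1 attempts=${2:-6} delay=${3:-5}
  local i
  for (( i = 1; i <= attempts; i++ )); do
    [ "$(curl -s -o /dev/null -w '%{http_code}' "$url")" = "200" ] && return 0
    sleep "$delay"
  done
  return 1
}
```

Same loop as the workflow step; also handy for ad-hoc post-deploy checks from a laptop.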
- Next.js 14 App Router
- Postgres 16
- Redis (BullMQ for the agent job queue)
- Anthropic Claude Sonnet for editorial agents (research, SEO sweep, social drafts)
- A worker process running 5 cron schedules

- No multi-AZ HA. Single VM = single point of failure. AZ-level outage means downtime.
- No Aurora point-in-time restore. Just nightly pg_dump to S3. RPO is up to 24h. Acceptable for a content site, not for transactional data.
- No autoscaling. Vertical only — bump to a bigger Lightsail bundle if traffic grows. The next tier is $24/mo for 4GB / 80GB. Past that, $44/mo for 8GB. At those numbers you should rethink Lightsail vs going back to managed services.
- Manual ops. No service auto-restart on host failure. If the VM dies, I get notified by the uptime monitor and SSH in. That's the trade.

- Skip Fargate entirely for pre-revenue projects. Start on Lightsail. The migration took 4 hours; starting there would have saved those 4 hours plus the ~$700 in bills from my 6 days on Fargate.
- Don't enable Container Insights "just because." It's $5-15/mo and you'll never look at it on a small project.
- Don't let CDK enable WAF by default. WAF is real money ($12-15/mo) for a pre-revenue site that's not under attack. CloudFront's free Shield Standard is enough.
- Don't pre-provision multi-AZ NAT. Single NAT is fine until you have customers.
- Use Aurora minCapacity: 0 from day 1. The auto-pause feature added in 2024 makes Aurora Serverless v2 actually serverless. Most CDK examples still default to 0.5.

- Sustained traffic > 200 req/sec (single VM saturates)
- Need for multi-AZ HA (revenue at risk from single AZ outage)
- DB > 20 GB (Postgres on local SSD becomes risky for backup/recovery)
- Compliance requirement (SOC 2 etc.)
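For scale on that traffic trigger, 200 req/sec sustained is an enormous amount of traffic for a content site (simple arithmetic, assuming a flat rate all day):

```shell
# 200 req/sec sustained, expressed per day.
reqs_per_day=$(( 200 * 86400 ))   # rate times seconds per day
echo "${reqs_per_day} requests/day"   # → 17280000 requests/day
```

Real traffic is bursty, so the practical ceiling arrives well before the daily total suggests, but it still leaves a lot of runway.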