Tools: Running Redis 24/7? You're Leaving 40% on the Table Without Reserved Nodes 🔥

Source: Dev.to

If your ElastiCache Redis or Memcached runs around the clock, you could be paying about 40% less. Here's how to automate Reserved Node purchases and tracking with Terraform.

Here's a painful truth: if your ElastiCache cluster has been running for more than a month, you've already overpaid. Most teams deploy Redis or Memcached, set it, forget it, and never think about reserved pricing.

## 💸 The On-Demand Tax

Here's what a typical ElastiCache setup costs on-demand:

```text
cache.r7g.large (Redis, Multi-AZ)

On-Demand:  $0.252/hour × 730 hours = $184/month
1-Year RI:  $0.150/hour × 730 hours = $110/month
3-Year RI:  $0.102/hour × 730 hours = $74/month
```

That's 40% savings (1-year) or 60% savings (3-year) for the exact same cluster doing the exact same thing. 💰

Scale that across a real environment: a 3-node cluster with replicas saves $4,440/year with 1-year RIs. No changes to your application. Zero downtime. Just cheaper. ✅

## 🤔 When Should You Reserve?

Reserve when:

- ✅ Cluster has been stable for 3+ months
- ✅ You don't plan to change node types soon
- ✅ It's a production workload running 24/7
- ✅ You're using consistent node families (e.g., r7g, m7g)

Hold off when:

- ❌ Dev/test clusters that get torn down
- ❌ You're actively testing different node sizes
- ❌ Cluster is less than 3 months old
- ❌ Planning a migration to a different engine or service

The break-even point for a 1-year No Upfront RI is roughly 7-8 months: the full 12-month reserved bill costs about as much as 7-8 months of on-demand usage, so the reservation pays off as long as the node keeps running longer than that. If your cluster has already been running for 8+ months and you haven't reserved, you're burning money.

## 🏗️ Terraform Implementation

## Step 1: Deploy Your ElastiCache Cluster

```hcl
# modules/elasticache/main.tf

variable "environment" {
  type = string
}

variable "node_type" {
  type    = string
  default = "cache.r7g.large"
}

variable "num_cache_clusters" {
  type    = number
  default = 3
}

resource "aws_elasticache_replication_group" "redis" {
  replication_group_id = "${var.environment}-redis"
  description          = "${var.environment} Redis cluster"
  node_type            = var.node_type
  num_cache_clusters   = var.num_cache_clusters
  engine               = "redis"
  engine_version       = "7.1"
  port                 = 6379
  parameter_group_name = "default.redis7"

  # Multi-AZ for production
  automatic_failover_enabled = var.environment == "prod"
  multi_az_enabled           = var.environment == "prod"

  # Encryption
  at_rest_encryption_enabled = true
  transit_encryption_enabled = true

  # Maintenance
  maintenance_window       = "sun:05:00-sun:07:00"
  snapshot_retention_limit = var.environment == "prod" ? 7 : 0
  snapshot_window          = "03:00-05:00"

  tags = {
    Environment  = var.environment
    ManagedBy    = "terraform"
    ReserveReady = "true" # 👈 Tag for RI tracking
  }
}
```

## Step 2: Purchase Reserved Nodes with Terraform

```hcl
# reserved-instances/elasticache.tf

# Look up the offering that matches the node type, term, and payment option
data "aws_elasticache_reserved_cache_node_offering" "redis" {
  cache_node_type     = "cache.r7g.large"
  duration            = "P1Y"        # 1 year (P3Y for 3-year)
  offering_type       = "No Upfront" # or "Partial Upfront", "All Upfront"
  product_description = "redis"
}

# Applying this resource purchases the reservation
resource "aws_elasticache_reserved_cache_node" "redis_prod" {
  reserved_cache_nodes_offering_id = data.aws_elasticache_reserved_cache_node_offering.redis.offering_id
  cache_node_count                 = 3 # Match your cluster size
}
```

⚠️ Important: Running `terraform apply` on reserved node resources commits you to a purchase. There's no undo. Always run `terraform plan` first and review carefully.

## Step 3: Payment Options Compared

```hcl
# Pick exactly one offering_type for the data source in Step 2.

# Option A: No Upfront (most flexible, least savings)
# Pay monthly; cannot be cancelled, you stay committed for the full term
offering_type = "No Upfront"      # Savings: ~33-36%

# Option B: Partial Upfront (balanced)
# Pay some upfront + reduced monthly
offering_type = "Partial Upfront" # Savings: ~38-41%

# Option C: All Upfront (maximum savings)
# Pay everything upfront, nothing monthly
offering_type = "All Upfront"     # Savings: ~40-44%
```

My recommendation: Start with No Upfront 1-Year. You get most of the savings with maximum flexibility. Graduate to Partial/All Upfront once you're confident in your setup. 🎯

## 📊 Automated RI Coverage Monitoring

Don't let reservations expire silently. This Lambda checks coverage and alerts you:

```hcl
# monitoring/ri-coverage.tf
# Assumes aws_iam_role.ri_monitor is defined elsewhere and allows
# the ElastiCache describe calls used below plus sns:Publish.

resource "aws_lambda_function" "ri_monitor" {
  filename         = data.archive_file.ri_monitor.output_path
  function_name    = "elasticache-ri-monitor"
  role             = aws_iam_role.ri_monitor.arn
  handler          = "index.handler"
  runtime          = "python3.12"
  timeout          = 30
  source_code_hash = data.archive_file.ri_monitor.output_base64sha256

  environment {
    variables = {
      SNS_TOPIC_ARN = aws_sns_topic.cost_alerts.arn
    }
  }
}

data "archive_file" "ri_monitor" {
  type        = "zip"
  output_path = "${path.module}/ri_monitor.zip"

  source {
    content  = <<-PYTHON
      import os
      from datetime import datetime, timedelta

      import boto3


      def handler(event, context):
          ec = boto3.client('elasticache')
          sns = boto3.client('sns')

          # Count running nodes per (node type, engine); paginate so large accounts are fully covered
          running_nodes = {}
          for page in ec.get_paginator('describe_cache_clusters').paginate():
              for c in page['CacheClusters']:
                  key = f"{c['CacheNodeType']}|{c['Engine']}"
                  running_nodes[key] = running_nodes.get(key, 0) + c['NumCacheNodes']

          # Count active reservations and collect the ones that expire soon
          reserved = {}
          expiring_soon = []
          for page in ec.get_paginator('describe_reserved_cache_nodes').paginate():
              for r in page['ReservedCacheNodes']:
                  if r['State'] != 'active':
                      continue
                  key = f"{r['CacheNodeType']}|{r['ProductDescription']}"
                  reserved[key] = reserved.get(key, 0) + r['CacheNodeCount']

                  # Duration is in seconds; flag reservations ending within 30 days
                  end_time = r['StartTime'] + timedelta(seconds=r['Duration'])
                  if end_time - datetime.now(end_time.tzinfo) < timedelta(days=30):
                      expiring_soon.append({
                          'id': r['ReservedCacheNodeId'],
                          'type': r['CacheNodeType'],
                          'expires': end_time.strftime('%Y-%m-%d')
                      })

          # Find node groups with more running nodes than reservations
          unreserved = []
          for key, count in running_nodes.items():
              reserved_count = reserved.get(key, 0)
              if count > reserved_count:
                  node_type, engine = key.split('|')
                  unreserved.append(
                      f"  {node_type} ({engine}): "
                      f"{count - reserved_count} unreserved of {count} total"
                  )

          # Build and send the alert only when there is something to report
          alerts = []
          if unreserved:
              alerts.append("UNRESERVED NODES (wasting money!):\n" + "\n".join(unreserved))
          if expiring_soon:
              alerts.append("EXPIRING WITHIN 30 DAYS:\n" + "\n".join(
                  f"  {e['id']} ({e['type']}) expires {e['expires']}"
                  for e in expiring_soon
              ))

          if alerts:
              sns.publish(
                  TopicArn=os.environ['SNS_TOPIC_ARN'],
                  Subject='ElastiCache RI Coverage Alert',
                  Message="\n\n".join(alerts)
              )

          return {'unreserved': len(unreserved), 'expiring': len(expiring_soon)}
    PYTHON
    filename = "index.py"
  }
}

# Run weekly
resource "aws_cloudwatch_event_rule" "weekly_ri_check" {
  name                = "elasticache-ri-check"
  schedule_expression = "rate(7 days)"
}

resource "aws_cloudwatch_event_target" "ri_monitor" {
  rule = aws_cloudwatch_event_rule.weekly_ri_check.name
  arn  = aws_lambda_function.ri_monitor.arn
}

resource "aws_lambda_permission" "allow_eventbridge" {
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.ri_monitor.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.weekly_ri_check.arn
}

resource "aws_sns_topic" "cost_alerts" {
  name = "elasticache-cost-alerts"
}
```

Subscribe an email address to the `elasticache-cost-alerts` topic and you'll get an alert whenever nodes are unreserved or reservations are about to expire. No more surprise bills. 📬

## ⚡ Quick Audit: Are You Wasting Money Right Now?

Run these CLI commands to check your current RI coverage:

```bash
# List all running ElastiCache nodes
aws elasticache describe-cache-clusters \
  --query 'CacheClusters[].{ID:CacheClusterId,Type:CacheNodeType,Engine:Engine,Nodes:NumCacheNodes}' \
  --output table

# List active reservations (a reservation ends at Start + DurationSeconds)
aws elasticache describe-reserved-cache-nodes \
  --query 'ReservedCacheNodes[?State==`active`].{Type:CacheNodeType,Count:CacheNodeCount,Start:StartTime,DurationSeconds:Duration}' \
  --output table
```

If the first table has more nodes than the second, you're overpaying.

🚨 Bottom line: If your Redis has been running for 8+ months and you haven't reserved, you're throwing away 40% of that bill. Fix it today.

## 🎯 Implementation Checklist

- Audit: Run the CLI commands above to find unreserved nodes
- Identify stable clusters: Production clusters running 3+ months
- Start conservative: 1-Year, No Upfront for your first reservation
- Deploy monitoring: Set up the Lambda to catch gaps and expirations
- Review quarterly: Reassess node types and reservation coverage

## 💡 Pro Tips

- Reservations are region-specific: A reservation in us-east-1 won't cover nodes in eu-west-1
- Match the node type: A reservation is bought for a specific node type such as cache.r7g.large. Check AWS's current size-flexibility rules before assuming it will cover a different size such as cache.r7g.xlarge
- Reservations apply automatically: Once purchased, the discount is applied to matching nodes on your bill. No cluster changes needed
- Combine with Graviton: If you haven't migrated to r7g/m7g yet, do that first (20% cheaper), then reserve the Graviton nodes for compounding savings 🔥

## 📊 TL;DR

Running ElastiCache without Reserved Nodes is like paying rent monthly when the landlord offers 40% off for signing a lease. Same apartment, just cheaper. 🏠
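
To sanity-check the pricing figures quoted in this article, the arithmetic can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the hourly rates are the example cache.r7g.large figures from the article rather than live AWS prices, and the helper names (`monthly_cost`, `savings_pct`, `break_even_months`) are hypothetical, introduced only for illustration.

```python
HOURS_PER_MONTH = 730  # same average-month figure the article uses

# Example hourly rates quoted in the article (assumed, not live AWS prices)
ON_DEMAND = 0.252
RI_1_YEAR = 0.150
RI_3_YEAR = 0.102


def monthly_cost(hourly_rate: float) -> float:
    """Cost of one node running every hour of an average month."""
    return hourly_rate * HOURS_PER_MONTH


def savings_pct(on_demand: float, reserved: float) -> float:
    """Percentage saved by paying the reserved rate instead of on-demand."""
    return (on_demand - reserved) / on_demand * 100


def break_even_months(on_demand: float, reserved: float, term_months: int = 12) -> float:
    """Months of on-demand usage that cost as much as the full reserved term."""
    return term_months * reserved / on_demand


if __name__ == "__main__":
    rates = [("On-Demand", ON_DEMAND), ("1-Year RI", RI_1_YEAR), ("3-Year RI", RI_3_YEAR)]
    for label, rate in rates:
        print(f"{label}: ${monthly_cost(rate):.2f}/month")
    print(f"1-year savings: {savings_pct(ON_DEMAND, RI_1_YEAR):.1f}%")
    print(f"3-year savings: {savings_pct(ON_DEMAND, RI_3_YEAR):.1f}%")
    print(f"1-year break-even: {break_even_months(ON_DEMAND, RI_1_YEAR):.1f} months")
```

With these example rates the 1-year reservation saves about 40.5% and breaks even after roughly 7.1 months of on-demand spend, which lines up with the "roughly 7-8 months" figure. Swapping in your own region's rates is the only change needed to rerun the check.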