# Redis Sentinel + Celery Failover: What Actually Happens in Production

Most tutorials on Redis Sentinel stop at “it elects a new master”. Very few show what happens to a real system under failover pressure. I ran a failover drill on a Django + Celery stack backed by Redis Sentinel and Prometheus monitoring. Here’s what actually happened.

## Table of Contents

- Architecture Overview
  - Stack Components
- Sentinel Integration (Django + Celery)
- Observability with Prometheus
- Failover Drill Walkthrough
  - Initial State
  - Induced Failure
  - Sentinel Election
- Celery Behavior During Failover
  - Timeline
  - Observed Task
- Performance Impact
- Production Readiness Assessment
  - What Works
  - What Needs Attention
- When This Architecture Is Production-Ready
- When This Is Not Enough
- How to Reduce Failover Latency
- Key Takeaway
- Final Thoughts

## Architecture Overview
```mermaid
flowchart LR
    Client --> Django
    Django -->|Cache| Sentinel
    Django -->|Tasks| Celery
    Celery -->|Broker| Sentinel
    Celery -->|Result Backend| Sentinel
    Sentinel --> RedisMaster
    Sentinel --> RedisReplica1
    Sentinel --> RedisReplica2
    Prometheus --> RedisExporter
    RedisExporter --> Sentinel
```

### Stack Components

- Django → Redis cache via Sentinel
- Celery → Broker + result backend via Sentinel
- Redis Sentinel → High availability + failover
- Prometheus + redis_exporter → Monitoring

## Sentinel Integration (Django + Celery)

All services were switched to Sentinel using environment configuration:
```
REDIS_ADDR=redis://host.docker.internal:26379
```
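That one address can feed both Django and Celery. As a minimal sketch (the helper function and the `mymaster` service name are my assumptions, not from the original setup), Celery's documented `sentinel://` broker scheme plus the `master_name` transport option could be derived like this:

```python
import os
from urllib.parse import urlparse

# Placeholder Sentinel service name -- must match what your sentinel.conf monitors.
SENTINEL_SERVICE = "mymaster"

def celery_sentinel_settings(redis_addr: str, service: str = SENTINEL_SERVICE) -> dict:
    """Translate a redis://host:26379 address into Celery Sentinel settings."""
    parsed = urlparse(redis_addr)
    broker_url = f"sentinel://{parsed.hostname}:{parsed.port}"
    return {
        # Celery asks Sentinel for the current master of this service name.
        "broker_url": broker_url,
        "broker_transport_options": {"master_name": service},
        "result_backend": broker_url,
        "result_backend_transport_options": {"master_name": service},
    }

settings = celery_sentinel_settings(
    os.environ.get("REDIS_ADDR", "redis://host.docker.internal:26379")
)
# Applied with: app.conf.update(settings)
```

Pointing both the broker and the result backend at Sentinel matters: the drill below exercises both paths.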
- Django cache → successful round-trip
- Celery broker → connected via Sentinel
- Celery result backend → SentinelBackend initialized
- Test suite passed:

```
pytest tests/test_settings_redis_sentinel.py
```

At this stage, the system is fully Sentinel-aware.

## Observability with Prometheus

After pointing redis_exporter to Sentinel:
```
redis_instance_info{redis_mode="sentinel", tcp_port="26379"}
```

This confirms monitoring is tracking cluster state, not a single node. Sentinel-specific metrics become available:

- redis_sentinel_master_status
- redis_sentinel_master_ok_sentinels
- redis_sentinel_master_ok_slaves
- redis_sentinel_masters
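These metrics are plain Prometheus gauges, so quorum health can also be checked in a script against the exporter's text output. A sketch (the sample payload and the quorum of 2 are assumptions, not drill output):

```python
# Check Sentinel quorum from redis_exporter's text exposition format.
def sentinel_quorum_ok(metrics_text: str, quorum: int = 2) -> bool:
    """Return True if every monitored master has >= quorum healthy sentinels."""
    for line in metrics_text.splitlines():
        if line.startswith("redis_sentinel_master_ok_sentinels"):
            # e.g. redis_sentinel_master_ok_sentinels{master_name="mymaster"} 3
            value = float(line.rsplit(" ", 1)[1])
            if value < quorum:
                return False
    return True

sample = 'redis_sentinel_master_ok_sentinels{master_name="mymaster"} 3'
print(sentinel_quorum_ok(sample))  # True
```

The same pattern works for `redis_sentinel_master_status`, which is what flips during the drill below.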
## Failover Drill Walkthrough

### Initial State

```mermaid
flowchart LR
    Sentinel -->|Master| Redis1["172.20.0.3:6379"]
    Sentinel --> Redis2["Replica"]
    Sentinel --> Redis3["Replica"]
```
```
master_address="172.20.0.3:6379"
```

### Induced Failure

- Current master was stopped manually

### Sentinel Election

- New master elected on first poll
- Prometheus updated on next scrape

Failover was immediate and correct.
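The `master_address` values come straight from Sentinel. As a stdlib-only sketch of the wire exchange behind `SENTINEL get-master-addr-by-name` (the `mymaster` service name is an assumption), the command and the reply parsing look like this:

```python
# Build the RESP command Sentinel understands and parse its two-element reply.
def encode_resp_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)

def parse_master_addr(reply: bytes) -> str:
    """Parse Sentinel's (host, port) array reply into host:port."""
    parts = reply.split(b"\r\n")
    # parts: [b'*2', b'$10', b'172.20.0.3', b'$4', b'6379', b'']
    return f"{parts[2].decode()}:{parts[4].decode()}"

cmd = encode_resp_command("SENTINEL", "get-master-addr-by-name", "mymaster")
# Sent over TCP to port 26379; a reply matching the drill's initial state:
reply = b"*2\r\n$10\r\n172.20.0.3\r\n$4\r\n6379\r\n"
print(parse_master_addr(reply))  # 172.20.0.3:6379
```

After the election, the same query returns the new master's address.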
```mermaid
flowchart LR
    Sentinel -->|New Master| Redis2["172.20.0.2:6379"]
    Sentinel --> Redis3["Replica"]
    Sentinel --> Redis1["Down"]
```

## Celery Behavior During Failover

### Timeline
```mermaid
sequenceDiagram
    participant App as Django App
    participant Celery
    participant Sentinel
    participant Redis
    App->>Celery: Submit Task
    Celery->>Redis: Send to Master
    Redis-->>Celery: Connection Lost
    Sentinel->>Sentinel: Elect New Master
    Celery->>Sentinel: Retry Connection
    Note over Celery: ~54.7s delay
    Celery->>Redis: Reconnect to New Master
    Redis-->>Celery: OK
    Celery-->>App: Task SUCCESS
```
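The ~54.7 s figure below can be captured with a simple poll-until-success loop; the post doesn't show its actual instrumentation, so this is a sketch where the state callback is abstract (with Celery it would be `AsyncResult(task_id).state`):

```python
import time

def time_until_success(get_state, poll_interval: float = 0.5, timeout: float = 120.0) -> float:
    """Poll get_state() until it returns 'SUCCESS'; return elapsed seconds."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if get_state() == "SUCCESS":
            return time.monotonic() - start
        time.sleep(poll_interval)
    raise TimeoutError("task never reached SUCCESS")

# Simulated task that stays PENDING for a couple of polls, as during failover:
states = iter(["PENDING", "PENDING", "SUCCESS"])
delay = time_until_success(lambda: next(states), poll_interval=0.01)
print(f"recovered after {delay:.2f}s")
```

Running this against a real task submitted just before killing the master is what turns "failover worked" into a number.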
### Observed Task

- Task ID: 9b57ba3b-a707-4c13-9255-d74de411b64b
- Status during failover: PENDING
- Delay: ~54.7 seconds
- Final state: SUCCESS

## Performance Impact

- Sentinel recovery: instant
- Application recovery: ~55 seconds

## Production Readiness Assessment

### What Works

- Redis Sentinel failover is reliable
- Prometheus reflects cluster changes correctly
- Django cache survives failover
- No task loss in Celery

### What Needs Attention

- Celery introduces significant delay during failover
- Reconnection is not instantaneous

## When This Architecture Is Production-Ready

- Tasks are asynchronous/background
- Eventual completion is acceptable
- Temporary latency spikes are tolerable

## When This Is Not Enough

Avoid this setup (as-is) if you need:

- Real-time task execution
- Sub-10s failover recovery
- User-facing async operations

## How to Reduce Failover Latency

To push recovery closer to 10–15 seconds:

- Tune Celery broker retry settings
- Reduce reconnect backoff intervals
- Optimize worker heartbeat and visibility timeout
- Re-run failover drills with timing instrumentation
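The first three bullets map onto documented Celery settings. As a starting point (values are illustrative, `mymaster` is an assumed service name, and nothing here is a tuned recommendation):

```python
# Illustrative Celery knobs for faster recovery after a Sentinel failover.
failover_tuning = {
    "broker_connection_retry": True,
    "broker_connection_max_retries": None,  # None = keep retrying until a master is back
    "broker_connection_timeout": 2.0,       # seconds; fail fast on the dead master
    "broker_transport_options": {
        "master_name": "mymaster",          # assumed Sentinel service name
        # Unacked tasks are redelivered after this many seconds if a worker dies.
        "visibility_timeout": 300,
    },
}
# Applied with: app.conf.update(failover_tuning)
```

Each change should be validated by re-running the drill and re-measuring the PENDING window, not assumed.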
## Key Takeaway

Redis Sentinel ensures infrastructure recovery. Celery determines how fast your system actually resumes work. That gap is the real engineering challenge.

## Final Thoughts

If you’re using Redis Sentinel with Celery, don’t just ask whether failover worked. Ask: “How long until my system behaves normally again?” Because that’s what production users experience.