```yaml
services:
  redis:
    image: redis/redis-stack:latest
    ports:
      - "6379:6379"
      - "8001:8001"  # RedisInsight
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_models:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:11434/api/tags || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3

volumes:
  redis_data:
  ollama_models:
```
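If the agent itself runs as a third Compose service, the health checks above let Compose order startup correctly. A sketch of such a service (the image, service name, and environment variable names are placeholders, not from the original setup):

```yaml
  agent:
    build: .  # placeholder: your agent image
    depends_on:
      redis:
        condition: service_healthy
      ollama:
        condition: service_healthy
    environment:
      - REDIS_URL=redis://redis:6379
      - OLLAMA_URL=http://ollama:11434
```

With `condition: service_healthy`, the agent container is not started until both health checks pass, rather than merely after the containers are created.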
```python
from langgraph.checkpoint.redis import RedisSaver
from langgraph.prebuilt import create_react_agent

with RedisSaver.from_conn_string("redis://localhost:6379") as checkpointer:
    checkpointer.setup()  # creates the required Redis indices on first run
    agent = create_react_agent(..., checkpointer=checkpointer)
```
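Conceptually, a checkpointer is a keyed store of conversation state: each `thread_id` maps to the latest snapshot, so a restarted process resumes where it left off. A toy in-memory sketch of that contract (illustrative only, not the real `RedisSaver` API):

```python
class ToyCheckpointer:
    """Illustrates the checkpointer contract: save/load state per thread_id."""

    def __init__(self):
        self._snapshots = {}  # thread_id -> state dict

    def put(self, thread_id, state):
        self._snapshots[thread_id] = dict(state)

    def get(self, thread_id):
        # A thread never seen before starts from empty state.
        return dict(self._snapshots.get(thread_id, {"messages": []}))


cp = ToyCheckpointer()
cp.put("user-42", {"messages": ["hi", "hello!"]})
resumed = cp.get("user-42")  # same state, even after a restart in the real setup
```

The real saver persists these snapshots in Redis, which is what makes the agent process stateless.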
```python
from langgraph.store.redis import RedisStore

with RedisStore.from_conn_string(
    "redis://localhost:6379",
    index={
        "embed": embeddings,
        "dims": 768,
        "distance_type": "cosine",
        "fields": ["text"],
    },
) as store:
    store.setup()
```
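The `distance_type: "cosine"` setting means stored memories are ranked by cosine distance (1 minus cosine similarity) between the query embedding and each indexed embedding. A minimal version of that computation, for intuition:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0.0 for identical directions, up to 2.0 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

same = cosine_distance([1.0, 0.0], [1.0, 0.0])        # 0.0 (identical)
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])  # 1.0 (unrelated)
```

In the real index the vectors are 768-dimensional (`dims: 768`, matching nomic-embed-text), but the arithmetic is the same.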
```python
from redisvl.extensions.llmcache import SemanticCache

cache = SemanticCache(
    name="llm_cache",
    redis_url="redis://localhost:6379",
    distance_threshold=0.1,
    ttl=3600,
)
```
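The `distance_threshold=0.1` means a cached response is returned only when the query's embedding lands within distance 0.1 of a stored prompt; anything farther is a miss and goes to the LLM. A toy sketch of that lookup logic (tiny hand-made vectors and Euclidean distance stand in for real embeddings and the real metric):

```python
def lookup(cache_entries, query_vec, threshold=0.1):
    """Return the cached response closest to the query,
    but only if it is within `threshold` (else a cache miss)."""
    def dist(a, b):  # stand-in distance metric for the sketch
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    best = min(cache_entries, key=lambda e: dist(e["vec"], query_vec), default=None)
    if best is not None and dist(best["vec"], query_vec) <= threshold:
        return best["response"]
    return None  # miss: call the LLM, then store the new (vec, response) pair

entries = [{"vec": [0.1, 0.9], "response": "cached answer"}]
hit = lookup(entries, [0.1, 0.95])  # distance 0.05 -> within threshold
miss = lookup(entries, [0.9, 0.1])  # far away -> miss
```

Tightening the threshold trades cache hit rate for a lower risk of serving an answer to a subtly different question.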
```shell
ollama pull qwen3.5:4b        # 2.5 GB, requires ~4 GB VRAM
ollama pull nomic-embed-text  # 274 MB, for embeddings
```
```python
from langchain_ollama import ChatOllama

model = ChatOllama(
    model="qwen3.5:4b",
    base_url="http://ollama:11434",  # Docker service name, not localhost
)
```
```
CHAT_MODEL=qwen3.5:4b
EMBEDDING_MODEL=nomic-embed-text
```
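Reading these from the environment keeps the model choice out of code, so swapping models is a redeploy rather than a code change. A stdlib-only sketch (variable names match the `.env` above; the `model_config` helper is illustrative):

```python
import os

def model_config(env=os.environ):
    """Resolve model names from the environment, falling back to the defaults above."""
    return {
        "chat_model": env.get("CHAT_MODEL", "qwen3.5:4b"),
        "embedding_model": env.get("EMBEDDING_MODEL", "nomic-embed-text"),
    }

defaults = model_config({})                              # no overrides -> defaults
prod = model_config({"CHAT_MODEL": "qwen3.5:14b"})       # per-environment override
```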
```shell
# Memory usage
redis-cli INFO memory | grep used_memory_human

# Key count
redis-cli DBSIZE

# Live command stream
redis-cli MONITOR

# Slow queries
redis-cli SLOWLOG GET 10
```
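`INFO memory` emits plain `key:value` lines with `#` section headers, so if a script needs the numbers rather than grep output, parsing takes a few lines (the sample payload here is hand-written, not a live response):

```python
def parse_info(raw):
    """Parse redis-cli INFO output: '#' lines are section headers, the rest key:value."""
    out = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        out[key] = value
    return out

sample = "# Memory\r\nused_memory:1048576\r\nused_memory_human:1.00M\r\n"
info = parse_info(sample)
```

The same parser works for any `INFO` section, since they all share the line format.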
```shell
# Which models are loaded?
curl http://localhost:11434/api/tags

# How much VRAM is being used?
nvidia-smi
```
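`/api/tags` answers with JSON containing a `models` array; extracting the model names in a script is straightforward (the payload below is a hand-written sample with illustrative sizes, not a live response):

```python
import json

# Shape modeled on Ollama's /api/tags response; values are illustrative.
payload = json.loads("""
{"models": [
  {"name": "qwen3.5:4b", "size": 2500000000},
  {"name": "nomic-embed-text:latest", "size": 274000000}
]}
""")

names = [m["name"] for m in payload["models"]]
```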
```shell
# Snapshot
redis-cli BGSAVE
cp /data/dump.rdb /backup/redis.rdb

# Or copy the AOF
cp /data/appendonly.aof /backup/
```

Everything the agent accumulates lives in Redis:

- Conversation history: Redis (checkpointer)
- Saved memories: Redis (vector index)
- Cached responses: Redis (semantic cache)
- Scan history: Redis (vector index)

Because no state lives in the agent process itself, you can:

- Restart the agent without losing anything
- Run multiple instances behind a load balancer
- Scale horizontally without shared state in memory
- Deploy new versions with zero downtime (rolling update)

The cost difference is stark:

- Cloud (e.g. Qwen 3 72b via OpenRouter): ~$0.005 per scan. At 200 scans per day, that is about $30 per month.
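The cloud figure is simple arithmetic: per-scan price times daily volume times days in a month. As a check:

```python
per_scan = 0.005      # $ per scan, the cloud figure quoted above
scans_per_day = 200
monthly = per_scan * scans_per_day * 30  # ~ $30 per month
```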
- Local (Qwen 3.5 4b): ~$0 per scan. Unlimited.

A few operational practices worth adopting:

- Set maxmemory and an eviction policy: Redis without a memory limit on a shared machine is a ticking time bomb. maxmemory-policy allkeys-lru automatically evicts the least recently used keys.
- Put a TTL on everything that does not need to live forever: cached LLM responses, 1 hour; conversation history, 7 days; scan history, keep indefinitely.
- Separate Redis instances per environment: dev, staging, and prod should not share data. Use key prefixes (dev:, staging:, prod:) or, ideally, entirely separate Redis instances. Avoid logical databases (/0, /1, /2): RediSearch and other modules only work on database 0, and Redis Cluster does not support multiple databases either.
- Health checks in docker compose: already included in the example above. If you add an agent service, use depends_on with condition: service_healthy so it does not start before Redis and Ollama are ready.
- Log token usage: even with local models, you want to know how much inference you are running. It helps with capacity planning.
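The first two bullets translate into a few lines of Redis configuration; the values here are assumptions to tune per machine, not recommendations from the setup above:

```
# redis.conf (or set at runtime via: redis-cli CONFIG SET ...)
maxmemory 2gb
maxmemory-policy allkeys-lru
```

TTLs, by contrast, are set per key from the client side; the `ttl=3600` passed to SemanticCache above is exactly that for cache entries.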