Self-Hosting AutoBot: A DevOps Deep Dive into Docker Compose, Model Sizing, and Production Ops

You've seen the demos. You want to run AutoBot on your own hardware, your own data, under your own control. Good instinct. Here's the full operational picture — Docker Compose internals, how to match LLM models to your GPU or CPU, and the production habits that keep things stable long-term.

Why Self-Host?

AutoBot's tagline is "Your data. Your AI." That's not marketing copy — it's an architectural choice. When you self-host:

- Conversations never leave your network
- You choose which models run (open-weight, cloud API, or a mix)
- Upgrade timing is yours to control
- No per-seat pricing surprises

The trade-off is operational responsibility. This post is about making that trade-off comfortable.

Docker Compose Deep Dive

AutoBot ships with a docker-compose.yml that wires together several services. Let's walk through each layer.

Services Overview

```yaml
services:
  backend:
    build: ./backend
    ports: ["8000:8000"]
    depends_on: [chromadb, redis]
    environment:
      - OLLAMA_HOST=http://ollama:11434
      - CHROMA_HOST=chromadb
      - REDIS_URL=redis://redis:6379

  frontend:
    build: ./frontend
    ports: ["3000:3000"]
    depends_on: [backend]

  chromadb:
    image: chromadb/chroma:latest
    volumes:
      - chroma_data:/chroma/chroma

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes

  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama_models:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  chroma_data:
  redis_data:
  ollama_models:
```

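
Bringing the stack up and sanity-checking it takes a minute. A minimal sequence, assuming the ports in the compose file above and the /health endpoint the backend exposes (covered again under Monitoring):

```bash
# Start everything in the background and confirm the containers are running
docker compose up -d
docker compose ps                       # all services should show "running"

# Confirm the backend answers
curl -sf http://localhost:8000/health && echo "backend OK"

# Confirm the UI is being served
curl -sI http://localhost:3000 | head -n 1

# Tail logs for the first few minutes to catch misconfiguration early
docker compose logs -f --tail=50 backend ollama
```
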
What Each Service Does

- backend — FastAPI application. Handles chat sessions, RAG retrieval, and fleet management. The OLLAMA_HOST env var points it at your local model server; swap this for an OpenAI-compatible URL to use a cloud LLM instead.
- frontend — Next.js UI. Talks only to the backend on port 8000. Stateless — you can restart it without losing anything.
- chromadb — Vector database for knowledge bases. Your embedded documents live here. The chroma_data volume is critical — back it up.
- redis — Session state and task queues. With --appendonly yes, Redis persists to disk. Losing this volume means losing active session context (but not your knowledge bases).
- ollama — Local LLM inference server. Holds downloaded model weights in ollama_models. Models are large (4–70 GB each); this volume is expensive to rebuild.

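
Since all the state lives on named volumes, it's worth knowing where those volumes sit before you need them in an incident. A quick look, assuming the default Docker data directory and a compose project named autobot (the same prefix the backup commands below use):

```bash
# List the named volumes created by the stack
docker volume ls --filter name=autobot

# Find a volume's mountpoint on the host
docker volume inspect autobot_chroma_data --format '{{ .Mountpoint }}'

# Rough on-disk size per volume (paths assume the default /var/lib/docker root)
sudo du -sh /var/lib/docker/volumes/autobot_*/_data
```
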
Networking

All services communicate on a default Docker bridge network. The service names (chromadb, redis, ollama) resolve as hostnames inside the network — that's why the backend config uses http://ollama:11434 rather than localhost. For a production deployment, consider an explicit network definition:

```yaml
networks:
  autobot_net:
    driver: bridge

services:
  backend:
    networks: [autobot_net]
  # ... same for all services
```

This lets you add an Nginx reverse proxy or Traefik on the same network without exposing internal ports.

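
As a concrete illustration of that last point, here is a minimal sketch of dropping an Nginx container onto the shared network through a Compose override file. It assumes the explicit autobot_net definition above; the override filename and the nginx.conf you'd mount are placeholders — the point is only that the proxy reaches backend and frontend by service name while the app ports stay unpublished:

```bash
# Sketch: add a reverse proxy on autobot_net via an override file
cat > docker-compose.override.yml <<'EOF'
services:
  proxy:
    image: nginx:alpine
    ports: ["80:80", "443:443"]
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro   # proxy_pass to http://backend:8000 and http://frontend:3000
    networks: [autobot_net]
    depends_on: [backend, frontend]
EOF

docker compose up -d proxy
```

Once the proxy is fronting traffic, you can drop the ports: entries on backend and frontend so nothing internal is reachable from outside the host.
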
Model Sizing to Hardware

This is where most self-hosting guides go wrong — they talk about VPS pricing instead of the actual constraint: inference throughput vs. memory bandwidth.

The Rule of Thumb

A model running entirely in VRAM is fast. A model that spills to RAM (or worse, disk) is slow. Plan your setup so your primary model fits in VRAM with room for the OS and other processes.

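
A back-of-the-envelope check is usually enough to know whether a model will fit. The sketch below uses rough numbers — weights take roughly parameter count × bits-per-weight ÷ 8, and KV cache plus runtime overhead adds something like another 20–30% at modest context lengths — so treat the output as a ballpark, not a guarantee:

```bash
# Ballpark VRAM estimate (integer math, rounds down — good enough for sizing)
PARAMS_B=8    # model size in billions of parameters
BITS=4        # quantization width (Q4_K_M is roughly 4 bits per weight)

WEIGHTS_GB=$(( PARAMS_B * BITS / 8 ))
NEEDED_GB=$(( WEIGHTS_GB + WEIGHTS_GB * 30 / 100 ))   # ~30% for KV cache + overhead
echo "~${NEEDED_GB} GB VRAM for a ${PARAMS_B}B model at ${BITS}-bit"

# Compare against what the GPU actually has
nvidia-smi --query-gpu=memory.total,memory.used --format=csv
```

For an 8B model at 4-bit the sketch lands at roughly 5 GB, which is why the 8B recommendation below fits comfortably alongside the OS on a single consumer GPU.
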
Local Ollama vs. Cloud LLM Trade-offs

AutoBot supports both. Here's how to think about the choice:

Local Ollama (default)

- Zero per-token cost
- Private by definition
- Latency depends on your hardware
- Best for: high-volume internal tools, sensitive data, experimentation

Cloud LLM (OpenAI, Anthropic, etc.)

- Pay per token
- Faster for large models you can't run locally
- Data leaves your network (check your provider's retention policy)
- Best for: production apps that need frontier model quality without buying GPUs

The OLLAMA_HOST env var makes switching simple. Point it at https://api.openai.com/v1 (with an OpenAI-compatible wrapper) to route through a cloud provider without touching application code — a sketch of the switch follows.

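
A minimal sketch of that switch, assuming standard Compose override-file merging. The OPENAI_API_KEY variable name and the exact endpoint are assumptions here — check AutoBot's backend configuration for the names it actually reads:

```bash
# Sketch: point the backend at a cloud, OpenAI-compatible endpoint instead of local Ollama
cat > docker-compose.cloud.yml <<'EOF'
services:
  backend:
    environment:
      - OLLAMA_HOST=https://api.openai.com/v1   # OpenAI-compatible wrapper endpoint
      - OPENAI_API_KEY=${OPENAI_API_KEY}        # assumed variable name — verify in backend config
EOF

# Recreate only the backend with the extra file layered on top
export OPENAI_API_KEY=sk-placeholder           # use your real key or a secrets manager
docker compose -f docker-compose.yml -f docker-compose.cloud.yml up -d backend
```
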
Practical Model Recommendations

For a RAG-heavy knowledge base workload (most AutoBot deployments): a quantized 8B model (Llama 3.1 8B Q4_K_M) hits the sweet spot — fast enough for real-time chat, accurate enough for document retrieval, and it fits comfortably on a single consumer GPU.

For a multi-agent fleet workload: consider running a smaller model (3B–7B) per agent node and reserving a larger model for orchestration decisions. AutoBot's fleet manager is built to handle per-agent model config.

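
Pulling the recommended model into the ollama service is a one-off step. The tag below is the naming I'd expect in the Ollama model library for that quantization — verify it against the library before scripting it:

```bash
# Pull the model inside the running ollama container (tag is an assumption — check the Ollama library)
docker compose exec ollama ollama pull llama3.1:8b-instruct-q4_K_M

# Confirm it downloaded and check its size
docker compose exec ollama ollama list

# Quick smoke test with timing output (the eval rate shows whether it stayed in VRAM)
docker compose exec ollama ollama run llama3.1:8b-instruct-q4_K_M --verbose "Summarize what a vector database does."
```
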
Production Tips

Backups

The three volumes that matter:

```bash
# ChromaDB — your knowledge bases
docker run --rm \
  -v autobot_chroma_data:/source \
  -v /backup:/backup \
  alpine tar czf /backup/chroma-$(date +%Y%m%d).tar.gz -C /source .

# Redis — session state
docker exec autobot-redis-1 redis-cli BGSAVE
docker cp autobot-redis-1:/data/dump.rdb /backup/redis-$(date +%Y%m%d).rdb

# Ollama models — large, but painful to re-download
docker run --rm \
  -v autobot_ollama_models:/source \
  -v /backup:/backup \
  alpine tar czf /backup/ollama-$(date +%Y%m%d).tar.gz -C /source .
```

Run chroma and redis backups daily. Ollama models only change when you pull new ones — back up on change, not on schedule.

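
A backup you've never restored is a guess, not a backup. Here's a minimal restore drill, assuming the archive layout produced by the commands above (the date in the filename is just an example); stop the stack first so ChromaDB isn't writing while you overwrite its volume:

```bash
# Restore drill for the ChromaDB volume
docker compose stop backend chromadb

docker run --rm \
  -v autobot_chroma_data:/target \
  -v /backup:/backup \
  alpine sh -c "rm -rf /target/* && tar xzf /backup/chroma-20250101.tar.gz -C /target"

docker compose start chromadb backend

# Then confirm a known knowledge base still answers queries through the UI or API
curl -sf http://localhost:8000/health
```
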
Upgrades

Pin image tags in production (chromadb/chroma:0.5.3, not latest) so upgrades are deliberate, not automatic.

```bash
# Pull latest images
docker compose pull

# Recreate containers (zero-downtime if you add a load balancer)
docker compose up -d --no-deps --build backend frontend

# Full restart (brief downtime)
docker compose down && docker compose up -d
```

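
With tags pinned, it also pays to record exactly which images were running before an upgrade, so a bad release rolls back to a known tag rather than to whatever latest happened to be. A small sketch:

```bash
# Snapshot the images currently in use (keep this output with your backups)
docker compose images
docker compose config --images    # tags as declared in the compose file

# Roll back by restoring the previous pinned tag in docker-compose.yml, then recreating just that service
docker compose up -d --no-deps chromadb
```
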
Monitoring

AutoBot's backend exposes a /health endpoint. Wire it into your monitoring stack:

```
# Simple cron healthcheck
*/5 * * * * curl -sf http://localhost:8000/health || notify-oncall
```

For metrics, the backend emits structured logs to stdout. Forward them to Loki, Datadog, or whatever you already use:

```yaml
backend:
  logging:
    driver: "json-file"
    options:
      max-size: "50m"
      max-file: "5"
```

Watch for these signals (a quick spot-check script follows this list):

- ChromaDB query latency > 2s — index fragmentation or an under-resourced container
- Redis memory approaching its limit — set maxmemory and a sensible eviction policy (allkeys-lru)
- Ollama inference time spiking — the model is being swapped to RAM; consider reducing context length or switching to a smaller quantization

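
The sketch below spot-checks those three signals from the host. The Chroma heartbeat path is an assumption (it has moved between API versions), and the check runs from inside the backend container because Python is already there — adjust to the versions you pinned:

```bash
# Redis: memory used vs. configured limit
docker exec autobot-redis-1 redis-cli INFO memory | grep -E 'used_memory_human|maxmemory_human'

# ChromaDB: crude latency check against its heartbeat endpoint (path varies by Chroma version)
docker compose exec backend python -c "import time,urllib.request; t=time.time(); urllib.request.urlopen('http://chromadb:8000/api/v1/heartbeat'); print(f'{time.time()-t:.3f}s')"

# Ollama: is the model still loaded, and is the GPU actually being used?
docker compose exec ollama ollama ps
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv
```
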
What's Next

Self-hosting is the start, not the finish. Once you're running in production, the interesting work is building knowledge bases, connecting data sources, and wiring up agents for your specific workflows.

If you want to help make AutoBot better at the infrastructure layer, there are open issues tagged for DevOps contributors:

→ Good first issues — DevOps label on AutoBot-AI

If AutoBot is saving you money or time on your infra, consider supporting development:

→ Ko-fi: ko-fi.com/mrveiss

Questions, corrections, or war stories from your own deployment — drop them in the comments.
