Tools

Tools: Ultimate Guide: I Dockerized a Production AI System as an Intern. Here's What Actually Mattered.

2026-04-06 0 views admin

The System I Walked Into

What I Built

The Constraints That Shaped Everything

No CI/CD

Shared Secrets, Different Environments

Volume Isolation Without Duplication

The Override File Pattern

The Migration

Rollback

What I'd Do Differently With More Access

The Actual Takeaway No CI/CD. No Kubernetes. Just PuTTY, WinSCP, and a system that needed to stop being fragile. I'm an intern on an AI team. My project is an internal AI support tool: an augmented RAG-based system that ingests knowledge bases, searches resolved tickets via vector similarity, and synthesizes resolutions using an LLM. FastAPI backend, React frontend, PostgreSQL with pgvector, ChromaDB for embeddings, OpenAI for generation. The AI pipeline is interesting. The infrastructure it was running on was not. Here's what "deployment" looked like when I joined: Four ports exposed + separate frontend and backend for both environments. No containerization. No build step for the frontend. No rollback mechanism. No isolation between test and production beyond "they're in different folders." The frontend was not even served via Vite's dev server in production. If the EC2 instance had a bad day, reconstruction was from memory and hope. One directory. Docker Compose with overlay files for environment separation. nginx as a reverse proxy (two ports instead of four). Image versioning with semantic tags and timestamp backups. A deploy script. A rollback script. Full isolation between prod and test + different Docker networks, different data volumes, different container names. Deploy went from "copy files and pray" to: This is the part I actually want to talk about. The Docker setup isn't novel... anyone can follow a tutorial. What made this interesting was what I couldn't do. No GitHub Actions. No webhooks. No automated pipelines. My deployment tools are PuTTY (SSH terminal) and WinSCP (file transfer). That's it. So I built a shell script that acts as a poor-man's pipeline: The branch argument means I can test feature branches on the test stack without merging: Is this as good as GitHub Actions with automated tests and staging environments? No. Does it work reliably for a single-server deployment with one developer? Yes. The backend reads its config from a single .env file: database credentials, API keys, OIDC settings. Both prod and test use the same file because they're on the same machine talking to the same database. But the OIDC redirect URIs must differ between environments. Prod redirects to the public DNS. Test redirects to the internal IP. The solution: Docker Compose's precedence rules. environment: in a compose file beats env_file:. So the base compose file loads the shared secrets via env_file:, and each overlay (prod.yml, test.yml) overrides just the OIDC URI via environment:. This is a small detail. It's also the kind of thing that causes a two-hour debugging session if you don't know about it. env_file values get silently overridden by environment values, and there's no warning, no log, nothing. You just get the wrong redirect and stare at your OIDC provider's error page. ChromaDB stores embeddings on disk. The knowledge base files live on disk. Logs go to disk. Prod and test need completely separate copies of all of these. You don't want a test run corrupting production embeddings. Docker Compose variable substitution handles this: Default values point to prod directories. When you pass --env-file .env.test, the paths switch to test directories. Same compose file, different data. The deploy script handles this automatically, and you never pass --env-file manually. Docker Compose has a feature where docker-compose.override.yml is automatically loaded alongside docker-compose.yml, but only when you don't use explicit -f flags. I used this to create three distinct modes from the same codebase: One base file. Three overlays. Three completely different behaviors. The developer never thinks about which files to compose: docker compose watch just works locally, and ./deploy.sh picks the right overlay on EC2. The scariest part was the cutover. Two directories, both running live. I needed to consolidate into one without downtime on prod.

The sequence: Step 6 is where you sweat. The public DNS now needs to resolve to the new container, with the right OIDC config, serving the right data. If anything is wrong, users see a broken page. It worked on the first try. Which means I probably over-prepared, but I'd rather over-prepare than explain to the team why production is down. The deploy script tags current images before rebuilding: Rollback reuses the saved image without touching data volumes: This is basic, but it's infinitely better than what existed before (nothing). The timestamp tag is insurance. Even if you forget to bump the version, you can still roll back to any previous deploy by timestamp. If I had CI/CD and wasn't constrained to PuTTY: But these are improvements to a system that already works. The first version doesn't need to be perfect. It needs to be better than what it replaced, and "someone manually copy-pasting files" is a low bar to clear. The interesting skill in infrastructure work isn't knowing Docker or nginx or Compose. It's designing around constraints you can't remove. I couldn't set up CI/CD. I couldn't get a second server. I couldn't change the OIDC provider's configuration beyond adding redirect URIs. I had an intern's access level. So I built something that works within those constraints. It's not elegant by industry standards. But it's reproducible, it's rollback-safe, it has environment isolation, and it replaced a process that depended on one person's memory of which files to copy where. That's the gap between knowing tools and doing systems design. Tools are things you learn. Systems design is figuring out what to do when the tools you want aren't available. I'm an intern working on AI systems: RAG pipelines, support ticket analytics, UX upgrades and apparently now DevOps. If you're working on similar problems or just want to talk about building things under real-world constraints, I'd love to connect. Templates let you quickly answer FAQs or store snippets for re-use. Hide child comments as well For further actions, you may consider blocking this person and/or reporting abuse

Code Block

Copy

Directory A (test): └── git pull → run uvicorn directly on EC2 Directory B (production): └── teammate manually copies changed files from Directory A └── find-and-replace URLs └── run uvicorn directly on EC2 Directory A (test): └── git pull → run uvicorn directly on EC2 Directory B (production): └── teammate manually copies changed files from Directory A └── find-and-replace URLs └── run uvicorn directly on EC2 Directory A (test): └── git pull → run uvicorn directly on EC2 Directory B (production): └── teammate manually copies changed files from Directory A └── find-and-replace URLs └── run uvicorn directly on EC2 ./deploy.sh prod 1.3 ./deploy.sh prod 1.3 ./deploy.sh prod 1.3 ./deploy.sh [stack] [version] [branch] │ ├── git pull origin [branch] ├── tag current running images as backup ├── docker compose build ├── docker compose up -d └── prune dangling images ./deploy.sh [stack] [version] [branch] │ ├── git pull origin [branch] ├── tag current running images as backup ├── docker compose build ├── docker compose up -d └── prune dangling images ./deploy.sh [stack] [version] [branch] │ ├── git pull origin [branch] ├── tag current running images as backup ├── docker compose build ├── docker compose up -d └── prune dangling images ./deploy.sh test latest feature/new-rag-pipeline # verify on internal IP # merge PR ./deploy.sh prod 1.3 main ./deploy.sh test latest feature/new-rag-pipeline # verify on internal IP # merge PR ./deploy.sh prod 1.3 main ./deploy.sh test latest feature/new-rag-pipeline # verify on internal IP # merge PR ./deploy.sh prod 1.3 main # docker-compose.yml (base) services: backend: env_file: backend/.env # shared secrets # docker-compose.prod.yml (overlay) services: backend: environment: - OIDC_REDIRECT_URI_FRONTEND=https://public.dns.com # docker-compose.test.yml (overlay) services: backend: environment: - OIDC_REDIRECT_URI_FRONTEND=http://<IP>:<Port> # docker-compose.yml (base) services: backend: env_file: backend/.env # shared secrets # docker-compose.prod.yml (overlay) services: backend: environment: - OIDC_REDIRECT_URI_FRONTEND=https://public.dns.com # docker-compose.test.yml (overlay) services: backend: environment: - OIDC_REDIRECT_URI_FRONTEND=http://<IP>:<Port> # docker-compose.yml (base) services: backend: env_file: backend/.env # shared secrets # docker-compose.prod.yml (overlay) services: backend: environment: - OIDC_REDIRECT_URI_FRONTEND=https://public.dns.com # docker-compose.test.yml (overlay) services: backend: environment: - OIDC_REDIRECT_URI_FRONTEND=http://<IP>:<Port> # docker-compose.yml volumes: - ./backend/${CHROMA_DIR:-chromaDB}:/app/chromaDB - ./backend/${DATA_DIR:-data}:/app/data - ./backend/${LOGS_DIR:-logs}:/app/logs # docker-compose.yml volumes: - ./backend/${CHROMA_DIR:-chromaDB}:/app/chromaDB - ./backend/${DATA_DIR:-data}:/app/data - ./backend/${LOGS_DIR:-logs}:/app/logs # docker-compose.yml volumes: - ./backend/${CHROMA_DIR:-chromaDB}:/app/chromaDB - ./backend/${DATA_DIR:-data}:/app/data - ./backend/${LOGS_DIR:-logs}:/app/logs # .env.test CHROMA_DIR=chromaDB-test DATA_DIR=data-test LOGS_DIR=logs-test # .env.test CHROMA_DIR=chromaDB-test DATA_DIR=data-test LOGS_DIR=logs-test # .env.test CHROMA_DIR=chromaDB-test DATA_DIR=data-test LOGS_DIR=logs-test # Local development (override.yml auto-loaded) docker compose watch → Vite dev server with HMR, OIDC pointing to localhost # Production (explicit -f, override.yml skipped) docker compose -f docker-compose.yml -f docker-compose.prod.yml up → nginx serves built static files, OIDC pointing to public DNS # Test (explicit -f, override.yml skipped) docker compose --env-file .env.test -f docker-compose.yml -f docker-compose.test.yml up → nginx serves built static files, OIDC pointing to internal IP, test volumes # Local development (override.yml auto-loaded) docker compose watch → Vite dev server with HMR, OIDC pointing to localhost # Production (explicit -f, override.yml skipped) docker compose -f docker-compose.yml -f docker-compose.prod.yml up → nginx serves built static files, OIDC pointing to public DNS # Test (explicit -f, override.yml skipped) docker compose --env-file .env.test -f docker-compose.yml -f docker-compose.test.yml up → nginx serves built static files, OIDC pointing to internal IP, test volumes # Local development (override.yml auto-loaded) docker compose watch → Vite dev server with HMR, OIDC pointing to localhost # Production (explicit -f, override.yml skipped) docker compose -f docker-compose.yml -f docker-compose.prod.yml up → nginx serves built static files, OIDC pointing to public DNS # Test (explicit -f, override.yml skipped) docker compose --env-file .env.test -f docker-compose.yml -f docker-compose.test.yml up → nginx serves built static files, OIDC pointing to internal IP, test volumes resolvyst-backend:latest → resolvyst-backend:v1.2 → resolvyst-backend:20260403_1430 resolvyst-backend:latest → resolvyst-backend:v1.2 → resolvyst-backend:20260403_1430 resolvyst-backend:latest → resolvyst-backend:v1.2 → resolvyst-backend:20260403_1430 ./rollback.sh prod v1.2 ./rollback.sh prod v1.2 ./rollback.sh prod v1.2 - Stop old test stack (directory A) - Stop old prod stack (directory B) - Copy prod data (ChromaDB, knowledge base, secrets) into directory A - Create isolated test directories (start empty) - Pull latest code with all new compose files - Deploy prod from directory A → verify public DNS works - Deploy test from directory A → verify internal IP works - Archive directory B - Health checks in the compose file. Right now, deploy.sh reports success even if the backend crashes on startup. A curl check post-deploy would catch that. - Separate secrets per environment. The shared .env file works but is fragile; one wrong edit affects both stacks. - Automated smoke tests after deploy. Hit the health endpoint, verify the RAG pipeline returns a response, check that OIDC redirects correctly. - Git working tree check at the top of deploy.sh. Right now, nothing stops you from deploying with uncommitted changes on EC2.

Share this article

Twitter Facebook LinkedIn Reddit

🏷️ Tags

toolsutilitiessecurity toolsultimateguidedockerizedproductionsysteminternactually

More from Tools

Tools: Why You Should Add Observability to Your Data Extraction with OpenTelemetry

2026-04-06 0

Tools: Is Railway Reliable for FastAPI in 2026? - Full Analysis

2026-04-06 0

Tools: Report: Kronveil v0.3: Multi-Cluster Federation, Custom Collector SDK, and Automated Runbooks

2026-04-06 0

Tools: Architecting Multi-Tenant VoIP for Scale: A Technical Deep Dive

2026-04-06 0

Trending

1

CVE-2025-61481: Critical Remote Code Execution Vulnerability in MikroTik RouterOS & SwitchOS

2025-10-27 • 189 views

2

CVE-2025-43939: Dell Unity OS Command Injection (High)

2025-10-30 • 148 views

3

Google disputes false claims of massive Gmail data breach

2025-10-30 • 130 views

4

Microsoft: DNS outage impacts Azure and Microsoft 365 services

2025-10-30 • 88 views

5

3.5B Accounts, 1 Critical Flaw: Meta Closes WhatsApp Data-Harvesting

2025-11-25 • 81 views

InfinitSec - Latest Cybersecurity, Technology & Gaming News

Tools: Ultimate Guide: I Dockerized a Production AI System as an Intern. Here's What Actually Mattered.

The System I Walked Into

What I Built

The Constraints That Shaped Everything

No CI/CD

Shared Secrets, Different Environments

Volume Isolation Without Duplication

The Override File Pattern

The Migration

Rollback

What I'd Do Differently With More Access

🏷️ Tags

More from Tools

Tools: Why You Should Add Observability to Your Data Extraction with OpenTelemetry

Tools: Is Railway Reliable for FastAPI in 2026? - Full Analysis

Tools: Report: Kronveil v0.3: Multi-Cluster Federation, Custom Collector SDK, and Automated Runbooks

Tools: Architecting Multi-Tenant VoIP for Scale: A Technical Deep Dive

Trending

CVE-2025-61481: Critical Remote Code Execution Vulnerability in MikroTik RouterOS & SwitchOS

CVE-2025-43939: Dell Unity OS Command Injection (High)

Google disputes false claims of massive Gmail data breach

Microsoft: DNS outage impacts Azure and Microsoft 365 services

3.5B Accounts, 1 Critical Flaw: Meta Closes WhatsApp Data-Harvesting