Tools: I Built A Production Rag System On Azure Aks For $40/month — 's...

Posted on Feb 9

• Originally published at github.com

A cloud architect's opinionated walkthrough: from blank terminal to 13 pods serving AI-powered answers, with cost breakdowns you can actually verify.

Last month, I set out to build something specific: a Retrieval-Augmented Generation system that could run on Azure Kubernetes Service — not as a proof-of-concept that lives in a Jupyter notebook, but as a real, deployable platform with ingestion pipelines, caching, observability, and a chat interface. The kind of system you'd hand to a team and say "here, extend this."

The constraint I gave myself was equally specific: keep the monthly bill under $50.

This article walks through what I built, the trade-offs I navigated, and the decisions I'd make differently if I were doing it again. If you're evaluating RAG architectures on Azure, this should save you a few weeks of trial and error.

The whole platform runs on a single Azure Kubernetes Service node.

Rather than describe the architecture in prose, here's the full cloud topology:

Cloud architecture: Azure managed services on the left, AKS cluster with 13 pods across 4 namespaces on the right.
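A pod-and-namespace count like the one in the caption is easy to check against a live cluster. The snippet below is a minimal sketch, not taken from the original repository: it assumes you have saved the output of `kubectl get pods --all-namespaces -o json` and only relies on the standard `items[].metadata.namespace` fields. The sample data and the namespace names (`rag`, `monitoring`) are made up for illustration, and a real cluster listing would also include system namespaces such as `kube-system`.

```python
import json
from collections import Counter


def pods_per_namespace(pod_list_json):
    """Count pods per namespace from `kubectl get pods --all-namespaces -o json` output."""
    items = json.loads(pod_list_json).get("items", [])
    return Counter(item["metadata"]["namespace"] for item in items)


# Tiny made-up sample standing in for real kubectl output (hypothetical names).
sample = json.dumps({
    "items": [
        {"metadata": {"name": "api-0", "namespace": "rag"}},
        {"metadata": {"name": "cache-0", "namespace": "rag"}},
        {"metadata": {"name": "grafana-0", "namespace": "monitoring"}},
    ]
})

counts = pods_per_namespace(sample)
print(sum(counts.values()), "pods across", len(counts), "namespaces")
```

On a real cluster you would pipe the kubectl output into `pods_per_namespace` and compare the totals with the diagram.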

Every component is deployed via Helm. Every Azure resource is provisioned via Terraform. The entire system goes from az login to serving queries in about 12 minutes.
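To make that flow concrete, here is a rough sketch of what a bring-up script in that spirit could look like. It is an illustration under stated assumptions, not the script from the original project: the resource group, cluster name, Terraform directory, chart path, and namespace are all hypothetical placeholders. The CLI invocations themselves (`az login`, `terraform init` and `apply`, `az aks get-credentials`, `helm upgrade --install`) are standard commands. The script defaults to a dry run that only prints what it would execute.

```python
import shlex
import subprocess

# Hypothetical names for illustration; none of these come from the article.
RESOURCE_GROUP = "rag-demo-rg"
CLUSTER_NAME = "rag-demo-aks"
TERRAFORM_DIR = "infra"
# Each entry is (release name, chart path, namespace).
HELM_RELEASES = [
    ("rag-api", "charts/rag-api", "rag"),
]


def bootstrap_commands():
    """Return the ordered CLI commands for a full bring-up."""
    commands = [
        # Authenticate against Azure.
        ["az", "login"],
        # Provision the Azure resources with Terraform.
        ["terraform", f"-chdir={TERRAFORM_DIR}", "init"],
        ["terraform", f"-chdir={TERRAFORM_DIR}", "apply", "-auto-approve"],
        # Fetch kubeconfig credentials for the new AKS cluster.
        ["az", "aks", "get-credentials",
         "--resource-group", RESOURCE_GROUP, "--name", CLUSTER_NAME],
    ]
    # Install or upgrade each Helm release into its namespace.
    for release, chart, namespace in HELM_RELEASES:
        commands.append([
            "helm", "upgrade", "--install", release, chart,
            "--namespace", namespace, "--create-namespace", "--wait",
        ])
    return commands


def run(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in bootstrap_commands():
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(dry_run=True)
```

Keeping the command list separate from the runner means the sequence can be reviewed or tested without touching a real subscription.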

Architecture diagrams are nice, but the real value is in why one path was chosen over another. Here are the decisions I spent the most time on, along with the reasoning I'd present to a team or a hiring manager.

Source: Dev.to