# Scaling AI/ML Workloads: 3 Architecture Lessons from HashiConf 2023


Source: Dev.to

From Infrastructure to Inference: Scaling AI/ML with the HashiStack

Reflecting on my time at HashiConf 2023, one thing became crystal clear: the "AI Revolution" is actually an Infrastructure Revolution. Building a high-performing model is only part of the battle. The real challenge is the "plumbing": securing LLM API keys, orchestrating expensive GPU resources, and ensuring reproducible environments. In this post, I'll break down how to use the latest HashiCorp tools to solve the three biggest "Day 2" problems in AI/ML workloads.

## 1. Orchestrating GPU Workloads with Nomad

One of my favorite takeaways from the conference was the continued simplicity of Nomad for non-containerized and batch workloads. In the ML world, we often deal with raw Python scripts or specialized CUDA binaries that don't always play nice with the overhead of a massive Kubernetes cluster.

## Architecture Decision: Specialized Node Pools

Don't let your web-tier microservices fight your training jobs for resources. Use Nomad node pools to isolate your expensive GPU instances and ensure your training jobs have the headroom they need.

The Code (Nomad Jobspec): This job specifically targets nodes labeled as gpu-nodes and requests a dedicated NVIDIA GPU for a batch training task.

```hcl
job "llama-finetune-batch" {
  datacenters = ["dc1"]
  type        = "batch" # Perfect for one-off training runs

  group "ml-engine" {
    # Pin to the GPU node class; on Nomad 1.6+ you can also
    # target a dedicated pool with the job-level node_pool field.
    constraint {
      attribute = "${node.class}"
      value     = "gpu-nodes"
    }

    task "train" {
      driver = "docker"

      config {
        image   = "nvidia/cuda:12.0-base"
        command = "python3"
        args    = ["/local/train_script.py", "--epochs", "10"]
      }

      resources {
        cpu    = 4000
        memory = 8192

        device "nvidia/gpu" {
          count = 1 # Request a dedicated NVIDIA GPU
        }
      }
    }
  }
}
```

## 2. Managing "Model Sprawl" with Terraform Stacks

A massive highlight of HashiConf 2023 was the preview of Terraform Stacks. For AI teams, this is a game-changer. We often have interdependent infrastructure: a VPC, an S3 bucket for data, a SageMaker endpoint, and a vector database like Pinecone or Weaviate. Instead of managing five different workspaces and "wiring" them together with fragile data sources, Stacks let you define the entire ML environment as one repeatable unit across development, staging, and production.

## Key Highlight: Infrastructure as a Single Unit

The Logic: If you change your GPU instance type in your "Compute" component, Terraform Stacks automatically handles the downstream updates to your "Serving" component. This reduces the manual orchestration of `terraform apply` chains that often leads to configuration drift in complex AI environments.

## 3. Securing LLM Secrets with Vault & Identity

The conference emphasized identity-based security. If you are using OpenAI, Anthropic, or Hugging Face, you have sensitive API keys. Do not hardcode them into environment variables.

## Architecture Decision: Dynamic Secrets via AppRole

Use Vault's AppRole to give your Python application a unique identity. The app "logs in" to Vault, proves its identity, and gets a short-lived token to read the API key.

The Code (Python Integration):

```python
import hvac
import os

# 1. Authenticate using the identity assigned by the platform
client = hvac.Client(url=os.environ['VAULT_ADDR'])
client.auth.approle.login(
    role_id=os.environ['VAULT_ROLE_ID'],
    secret_id=os.environ['VAULT_SECRET_ID']
)

# 2. Fetch the API key just-in-time (KV v2 nests the payload under data.data)
secret_response = client.secrets.kv.v2.read_secret_version(
    path='ml-api-keys/openai',
    mount_point='secret'
)
openai_api_key = secret_response['data']['data']['api_key']

# Now use the key for your inference call...
```

## Final Thoughts

HashiConf 2023 showed that the future of DevOps isn't just about managing servers; it's about managing complexity at scale. Are you using the HashiStack for your AI workloads? I'd love to hear about your architecture decisions in the comments!
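A quick addendum on the AppRole example above: fetching the key just-in-time on every inference call hammers Vault, while fetching once at startup defeats the point of short-lived credentials. A middle ground is a small TTL-bound cache around the fetch. This is only a sketch, not part of hvac: `SecretCache` and `fake_fetch` are hypothetical names, the 60-second TTL is an arbitrary choice, and in real code `fetch` would wrap the `read_secret_version` call shown earlier.

```python
import time

class SecretCache:
    """Cache a secret fetched just-in-time, refreshing it after a TTL.

    `fetch` is any zero-argument callable returning the secret value,
    e.g. a wrapper around hvac's read_secret_version call.
    (Hypothetical helper for illustration; not part of hvac.)
    """
    def __init__(self, fetch, ttl_seconds=300):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._value = None
        self._fetched_at = 0.0

    def get(self):
        now = time.monotonic()
        if self._value is None or now - self._fetched_at > self._ttl:
            self._value = self._fetch()  # hits Vault only when stale
            self._fetched_at = now
        return self._value

# Usage with a stand-in fetcher (in real code, wrap the hvac call):
calls = []
def fake_fetch():
    calls.append(1)
    return "sk-example"

cache = SecretCache(fake_fetch, ttl_seconds=60)
cache.get()
cache.get()
print(len(calls))  # prints 1: the underlying fetch ran only once
```

The same wrapper also gives you one obvious place to re-authenticate when the AppRole token itself expires.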
- Nomad handles the heavy lifting of GPUs.
- Vault secures the "brains" (API keys and data).
- Terraform Stacks manages the "skeleton" of the entire system.