
12 GPU Checks That Cut My Local AI Agent Setup Time by 75%

Running a local AI agent like qwen3.5:9b on a consumer GPU often ends in errors like "Out of VRAM" or "model loading failed". In most cases the cause is misconfiguration, not insufficient power.

Model Weight vs. Actual VRAM Usage

My RTX 5070 Ti 16GB initially seemed like overkill, but testing showed that VRAM needs aren't linear in model size. The weights are only part of the footprint:

- qwen3.5:9b (Q4_K_M): 6.6GB (model weights) + KV cache + working memory + framework overhead (Ollama)
- Peak VRAM usage with a 4K context: easily exceeds 10GB, risking OOM on a 12GB GPU

Code to Check Actual VRAM Usage (NVIDIA)

```bash
nvidia-smi --query-gpu=memory.used --format=csv,noheader
```

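To see why peak usage overshoots the raw file size, you can sketch the arithmetic yourself. The architecture numbers below (layer count, KV heads, head dimension) are illustrative placeholders for a 9B-class model, not published qwen3.5:9b specs; substitute the real values for your model when estimating.

```python
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GiB: 2 tensors (K and V) per layer, fp16 by default."""
    total = 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem
    return total / 2**30

# Illustrative 9B-class architecture (NOT the real qwen3.5:9b numbers).
weights_gib = 6.6    # quantized model file size reported by Ollama
kv = kv_cache_gib(layers=40, kv_heads=8, head_dim=128, context_len=4096)
overhead_gib = 1.5   # rough allowance for working memory + framework overhead

print(f"KV cache:  {kv:.2f} GiB")
print(f"Estimated: {weights_gib + kv + overhead_gib:.2f} GiB total")
```

Even with these placeholder numbers the estimate lands well above the 6.6GB file size, and the KV cache term doubles every time you double the context length, which is exactly why a card that loads the model fine can still hit OOM at longer contexts.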
Bigger Isn't Always Better

Honesty Moment: I initially wasted money on an overpowered GPU before realizing a 12GB mid-range card would have sufficed.

- Mid-range newer GPUs (e.g., RTX 4060 Ti 16GB, RX 7700 XT) often outperform older high-end cards due to better architecture.
- Your use case determines how much VRAM you need:
  - Simple tasks: 6-8GB (e.g., RTX 3060 12GB)
  - Longer contexts: 10-12GB+
  - Near-cloud tasks: 16GB+ (but overkill for most)

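The sizing tiers above fold naturally into a small helper for sanity-checking a card before you buy. The thresholds are just the rules of thumb from this section, nothing more:

```python
def vram_tier(vram_gb: float) -> str:
    """Map available VRAM to the rough capability tiers above."""
    if vram_gb >= 16:
        return "near-cloud tasks (but overkill for most)"
    if vram_gb >= 10:
        return "longer contexts"
    if vram_gb >= 6:
        return "simple tasks"
    return "below the comfortable minimum for 9B-class models"

print(vram_tier(12))  # prints "longer contexts"
```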
GPU Selection Beyond Specs

Driver & Framework Support:

- NVIDIA: solid CUDA support (especially RTX 30/40 series)
- AMD: ROCm support, but limited for advanced features
- Example compatibility issue: qwen3.5:9b Q4_K_M runs on both the RTX 4060 Ti and the RX 7700 XT, but NVIDIA offers better stability.

Quantization Compatibility:

- Q4_K_M: robust (CUDA 11.7+)
- Q5_K_M: requires newer drivers
- Q6_K and extreme quantizations: limited to newer/higher-end cards
- Real-world impact: testing Q2_K on an older GTX 1080 resulted in consistent segfaults.

Safe Quantization Starter

```bash
ollama run qwen3.5:9b --quantization Q4_K_M
```

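You can automate the compatibility check by parsing the CUDA version out of nvidia-smi's banner and comparing it against per-quantization floors. The Q4_K_M floor of CUDA 11.7 comes from the guidance above; the other thresholds are placeholders to illustrate the pattern, so check your framework's release notes before relying on them:

```python
import re

# Minimum CUDA versions per quantization. Q4_K_M's 11.7 floor follows the
# guidance above; the Q5_K_M/Q6_K values are illustrative placeholders.
QUANT_MIN_CUDA = {"Q4_K_M": (11, 7), "Q5_K_M": (12, 0), "Q6_K": (12, 2)}

def cuda_version(nvidia_smi_output: str) -> tuple[int, int]:
    """Extract 'CUDA Version: X.Y' from nvidia-smi's banner line."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if not m:
        raise ValueError("CUDA version not found; is the NVIDIA driver installed?")
    return int(m.group(1)), int(m.group(2))

def supported_quants(version: tuple[int, int]) -> list[str]:
    """List quantizations whose minimum CUDA version is satisfied."""
    return [q for q, floor in QUANT_MIN_CUDA.items() if version >= floor]

banner = "| NVIDIA-SMI 550.54  Driver Version: 550.54  CUDA Version: 12.4 |"
print(supported_quants(cuda_version(banner)))  # ['Q4_K_M', 'Q5_K_M', 'Q6_K']
```

In practice you would feed in the real banner, e.g. the output of `nvidia-smi` captured via `subprocess.run`.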
Pre-Flight Environment Checks

Skip these at your peril; they save hours of debugging:

```bash
# 1. GPU Driver Check
nvidia-smi

# 2. CUDA Version Check
nvidia-smi | grep "CUDA Version"

# 3. OS Type (WSL2 vs. Native Linux)
uname -r

# 4. Free VRAM Check
nvidia-smi --query-gpu=memory.free --format=csv,noheader

# 5. Docker GPU Support Check
docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi
```

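If you run these pre-flight checks often, a tiny wrapper that executes each one and prints a pass/fail summary saves re-typing. This is a sketch, not part of any official tooling; it simply shells out to the same commands and treats a non-zero exit code (or a missing binary) as a failure:

```python
import subprocess

# The five pre-flight checks, as (label, command) pairs.
CHECKS = [
    ("GPU driver", ["nvidia-smi"]),
    ("CUDA version", ["sh", "-c", 'nvidia-smi | grep "CUDA Version"']),
    ("Kernel (WSL2 vs native)", ["uname", "-r"]),
    ("Free VRAM", ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader"]),
    ("Docker GPU access",
     ["sh", "-c", "docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi"]),
]

def run_checks(checks=CHECKS) -> dict[str, bool]:
    """Run every check; a missing binary or non-zero exit counts as failure."""
    results = {}
    for label, cmd in checks:
        try:
            proc = subprocess.run(cmd, capture_output=True, timeout=120)
            results[label] = proc.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            results[label] = False
    return results

if __name__ == "__main__":
    for label, ok in run_checks().items():
        print(f"[{'PASS' if ok else 'FAIL'}] {label}")
```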
Docker Setup for Reproducible Environments

Why Docker?

- Environment isolation
- Easy backup & migration
- Resource limits
- Fast recovery

Minimal Viable docker-compose.yml (NVIDIA)

```yaml
version: '3.8'
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    shm_size: '1gb'
volumes:
  ollama_data:
```

Installing NVIDIA Container Toolkit (Ubuntu 22.04 Example)

```bash
# ... (installation steps as provided in the chapter)
```

Ollama Installation & Basic Operations

Method One: Via Docker (Recommended)

```bash
docker-compose up -d
```

Method Two: Direct Host Installation

```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama serve
```

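Whichever method you choose, verify the server is actually up before pointing an agent at it. This sketch probes Ollama's default port and treats any connection failure as "not ready" instead of crashing:

```python
import urllib.request
import urllib.error

def ollama_ready(base_url: str = "http://localhost:11434",
                 timeout: float = 3.0) -> bool:
    """Return True if the Ollama server answers on its root endpoint."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if ollama_ready():
    print("Ollama is up; safe to start the agent.")
else:
    print("Ollama is not reachable on port 11434.")
```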
Essential Ollama Commands

- Download & run a model: ollama run qwen3.5:9b
- List models: ollama list
- Remove a model: ollama rm <model_name>
- Show model info: ollama show <model_name>

API Call Example

```bash
curl http://localhost:11434/api/generate -d '{"model": "qwen3.5:9b", "prompt": "Hello, how are you?"}'
```

Your Turn: What's the most common GPU misconfiguration you've encountered when setting up a local AI agent, and how did you resolve it?

- Advanced setup guides: https://jacksonfire526.gumroad.com?utm_source=devto&utm_medium=article&utm_campaign=2026-04-02-local-agent-playbook
- Free resource: GPU Compatibility Checker Script: https://jacksonfire526.gumroad.com/l/cdliu?utm_source=devto&utm_medium=article&utm_campaign=2026-04-02-local-agent-playbook

