
12 GPU Checks That Cut My Local AI Agent Setup Time by 75%

Running a local AI agent like qwen3.5:9b on a consumer GPU often ends in errors like "Out of VRAM" or "model loading failed". In most cases the cause is misconfiguration, not insufficient power.

Model Weight vs. Actual VRAM Usage

My RTX 5070 Ti 16GB initially seemed like overkill, but testing showed that VRAM needs aren't linear in model size. The weights are only part of the footprint:

- qwen3.5:9b (Q4_K_M): 6.6GB (model weights) + KV cache + working memory + framework overhead (Ollama)
- Peak VRAM usage with a 4K context: easily exceeds 10GB, risking OOM on a 12GB GPU

Code to Check Actual VRAM Usage (NVIDIA)

```bash
nvidia-smi --query-gpu=memory.used --format=csv,noheader
```

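To see why peak usage overshoots the raw file size, you can sketch the arithmetic yourself. The architecture numbers below (layer count, KV heads, head dimension) are illustrative placeholders for a 9B-class model, not published qwen3.5:9b specs; substitute the real values for your model when estimating.

```python
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GiB: 2 tensors (K and V) per layer, fp16 by default."""
    total = 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem
    return total / 2**30

# Illustrative 9B-class architecture (NOT the real qwen3.5:9b numbers).
weights_gib = 6.6    # quantized model file size reported by Ollama
kv = kv_cache_gib(layers=40, kv_heads=8, head_dim=128, context_len=4096)
overhead_gib = 1.5   # rough allowance for working memory + framework overhead

print(f"KV cache:  {kv:.2f} GiB")
print(f"Estimated: {weights_gib + kv + overhead_gib:.2f} GiB total")
```

Even with these placeholder numbers the estimate lands well above the 6.6GB file size, and the KV cache term doubles every time you double the context length, which is exactly why a card that loads the model fine can still hit OOM at longer contexts.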
Bigger Isn't Always Better

Honesty Moment: I initially wasted money on an overpowered GPU before realizing a 12GB mid-range card would have sufficed.

- Mid-range newer GPUs (e.g., RTX 4060 Ti 16GB, RX 7700 XT) often outperform older high-end cards due to better architecture.
- Your use case determines how much VRAM you need:
  - Simple tasks: 6-8GB (e.g., RTX 3060 12GB)
  - Longer contexts: 10-12GB+
  - Near-cloud tasks: 16GB+ (but overkill for most)

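The sizing tiers above fold naturally into a small helper for sanity-checking a card before you buy. The thresholds are just the rules of thumb from this section, nothing more:

```python
def vram_tier(vram_gb: float) -> str:
    """Map available VRAM to the rough capability tiers above."""
    if vram_gb >= 16:
        return "near-cloud tasks (but overkill for most)"
    if vram_gb >= 10:
        return "longer contexts"
    if vram_gb >= 6:
        return "simple tasks"
    return "below the comfortable minimum for 9B-class models"

print(vram_tier(12))  # prints "longer contexts"
```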
GPU Selection Beyond Specs

Driver & Framework Support:

- NVIDIA: solid CUDA support (especially RTX 30/40 series)
- AMD: ROCm support, but limited for advanced features
- Example compatibility issue: qwen3.5:9b Q4_K_M runs on both the RTX 4060 Ti and the RX 7700 XT, but NVIDIA offers better stability.

Quantization Compatibility:

- Q4_K_M: robust (CUDA 11.7+)
- Q5_K_M: requires newer drivers
- Q6_K and extreme quantizations: limited to newer/higher-end cards
- Real-world impact: testing Q2_K on an older GTX 1080 resulted in consistent segfaults.

Safe Quantization Starter

```bash
ollama run qwen3.5:9b --quantization Q4_K_M
```

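You can automate the compatibility check by parsing the CUDA version out of nvidia-smi's banner and comparing it against per-quantization floors. The Q4_K_M floor of CUDA 11.7 comes from the guidance above; the other thresholds are placeholders to illustrate the pattern, so check your framework's release notes before relying on them:

```python
import re

# Minimum CUDA versions per quantization. Q4_K_M's 11.7 floor follows the
# guidance above; the Q5_K_M/Q6_K values are illustrative placeholders.
QUANT_MIN_CUDA = {"Q4_K_M": (11, 7), "Q5_K_M": (12, 0), "Q6_K": (12, 2)}

def cuda_version(nvidia_smi_output: str) -> tuple[int, int]:
    """Extract 'CUDA Version: X.Y' from nvidia-smi's banner line."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if not m:
        raise ValueError("CUDA version not found; is the NVIDIA driver installed?")
    return int(m.group(1)), int(m.group(2))

def supported_quants(version: tuple[int, int]) -> list[str]:
    """List quantizations whose minimum CUDA version is satisfied."""
    return [q for q, floor in QUANT_MIN_CUDA.items() if version >= floor]

banner = "| NVIDIA-SMI 550.54  Driver Version: 550.54  CUDA Version: 12.4 |"
print(supported_quants(cuda_version(banner)))  # ['Q4_K_M', 'Q5_K_M', 'Q6_K']
```

In practice you would feed in the real banner, e.g. the output of `nvidia-smi` captured via `subprocess.run`.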
Pre-Flight Environment Checks

Skip these at your peril; they save hours of debugging:

```bash
# 1. GPU Driver Check
nvidia-smi

# 2. CUDA Version Check
nvidia-smi | grep "CUDA Version"

# 3. OS Type (WSL2 vs. Native Linux)
uname -r

# 4. Free VRAM Check
nvidia-smi --query-gpu=memory.free --format=csv,noheader

# 5. Docker GPU Support Check
docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi
```

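If you run these pre-flight checks often, a tiny wrapper that executes each one and prints a pass/fail summary saves re-typing. This is a sketch, not part of any official tooling; it simply shells out to the same commands and treats a non-zero exit code (or a missing binary) as a failure:

```python
import subprocess

# The five pre-flight checks, as (label, command) pairs.
CHECKS = [
    ("GPU driver", ["nvidia-smi"]),
    ("CUDA version", ["sh", "-c", 'nvidia-smi | grep "CUDA Version"']),
    ("Kernel (WSL2 vs native)", ["uname", "-r"]),
    ("Free VRAM", ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader"]),
    ("Docker GPU access",
     ["sh", "-c", "docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi"]),
]

def run_checks(checks=CHECKS) -> dict[str, bool]:
    """Run every check; a missing binary or non-zero exit counts as failure."""
    results = {}
    for label, cmd in checks:
        try:
            proc = subprocess.run(cmd, capture_output=True, timeout=120)
            results[label] = proc.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            results[label] = False
    return results

if __name__ == "__main__":
    for label, ok in run_checks().items():
        print(f"[{'PASS' if ok else 'FAIL'}] {label}")
```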
Docker Setup for Reproducible Environments

Why Docker?

- Environment isolation
- Easy backup & migration
- Resource limits
- Fast recovery

Minimal Viable docker-compose.yml (NVIDIA)

```yaml
version: '3.8'
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    shm_size: '1gb'
volumes:
  ollama_data:
```

Installing NVIDIA Container Toolkit (Ubuntu 22.04 Example)

```bash
# ... (installation steps as provided in the chapter)
```

Ollama Installation & Basic Operations

Method One: Via Docker (Recommended)

```bash
docker-compose up -d
```

Method Two: Direct Host Installation

```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama serve
```

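Whichever method you choose, verify the server is actually up before pointing an agent at it. This sketch probes Ollama's default port and treats any connection failure as "not ready" instead of crashing:

```python
import urllib.request
import urllib.error

def ollama_ready(base_url: str = "http://localhost:11434",
                 timeout: float = 3.0) -> bool:
    """Return True if the Ollama server answers on its root endpoint."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if ollama_ready():
    print("Ollama is up; safe to start the agent.")
else:
    print("Ollama is not reachable on port 11434.")
```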
Essential Ollama Commands

- Download & run a model: ollama run qwen3.5:9b
- List models: ollama list
- Remove a model: ollama rm <model_name>
- Show model info: ollama show <model_name>

API Call Example

```bash
curl http://localhost:11434/api/generate -d '{"model": "qwen3.5:9b", "prompt": "Hello, how are you?"}'
```

Your Turn: What's the most common GPU misconfiguration you've encountered when setting up a local AI agent, and how did you resolve it?

- Advanced setup guides: https://jacksonfire526.gumroad.com?utm_source=devto&utm_medium=article&utm_campaign=2026-04-02-local-agent-playbook
- Free resource: GPU Compatibility Checker Script: https://jacksonfire526.gumroad.com/l/cdliu?utm_source=devto&utm_medium=article&utm_campaign=2026-04-02-local-agent-playbook

