# How to Run Local LLMs for Coding (No Cloud, No API Keys)

I got tired of paying for API calls. Every time I wanted an AI coding assistant, it was another subscription, another API key, another company reading my code. So I went local. Here's exactly how to do it.

## Why Local LLMs for Coding?

- Privacy - your code never leaves your machine
- Cost - zero ongoing fees after initial setup
- Speed - no network latency, and it works offline

The tradeoff? You need decent hardware. But if you've got 16GB+ RAM and a GPU from the last few years, you're set.

## The Stack: Ollama + Continue

Forget complicated setups. Ollama makes running local models trivially easy, and Continue gives you a VS Code/Cursor-style experience without the cloud dependency.

## Step 1: Install Ollama

```bash
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows: download the installer from ollama.com
```

That's it. No Docker, no Python environments, no dependency hell.

## Step 2: Pull a Coding Model

Not all models are equal for code. Here's what actually works:

```bash
# Best overall for coding (needs 16GB+ RAM)
ollama pull deepseek-coder-v2:16b

# Lighter option (8GB RAM)
ollama pull codellama:7b

# For code review and explanations
ollama pull mistral:7b
```

DeepSeek Coder v2 is genuinely impressive - it rivals GPT-4 for most coding tasks. If you're RAM-constrained, CodeLlama 7B still handles autocomplete and simple generations well.

## Step 3: Test It

```bash
ollama run deepseek-coder-v2:16b
>>> Write a Python function to parse JSON from a file safely
```

You should get a response in seconds. If it's slow, you're probably swapping to disk - try a smaller model.

## Step 4: Connect to Your Editor

Here's where it gets good. Install the Continue extension for VS Code:

1. Open VS Code → Extensions → search "Continue"
2. Open the Continue sidebar (Cmd/Ctrl + L)

Then configure it to use Ollama. Create `~/.continue/config.json`:

```json
{
  "models": [
    {
      "title": "DeepSeek Coder Local",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "CodeLlama",
    "provider": "ollama",
    "model": "codellama:7b"
  }
}
```

Now you have:

- Chat with your codebase (Cmd+L)
- Inline edits (Cmd+I)
- Tab autocomplete

All running locally. Zero API calls.

## Real-World Performance

I've been using this setup for three months. Here's the honest assessment.

What local models handle well:

- Autocomplete (feels like Copilot)
- Explaining code
- Writing boilerplate
- Simple refactoring
- Regex and SQL generation

What still needs cloud models:

- Complex multi-file changes
- Understanding large codebases
- Subtle bug detection
- Cutting-edge reasoning (I still reach for Claude for architecture)
- Very large context windows

For 80% of daily coding tasks, local is enough. For the other 20%, I still use Claude - but my API bill dropped from $80/month to under $15.

## Optimizing Performance

### GPU Acceleration

If you have an NVIDIA GPU:

```bash
# Check if Ollama detects your GPU
ollama ps  # should show CUDA if working
```

For AMD GPUs on Linux, Ollama supports ROCm. M1/M2/M3 Macs get Metal acceleration automatically.

### Multiple Models

```bash
# Terminal 1 - for chat
ollama serve

# Terminal 2 - load models
ollama run deepseek-coder-v2:16b  # stays in memory
```

### Memory Management

First load takes 10-30 seconds. After that, it's instant. Models stay loaded in RAM. To unload:

```bash
ollama stop deepseek-coder-v2:16b
```

Or set automatic unloading via the `OLLAMA_KEEP_ALIVE` setting, which controls how long a model stays resident after its last request.

## Free Copilot Alternative? Yes, Actually

This setup is a legitimate free Copilot alternative. The autocomplete is comparable, the chat is sometimes better (DeepSeek Coder handles Python and TypeScript particularly well), and you own your data.

## Quick Comparison

Is it as good as Copilot Enterprise or Claude? No. But it's free, private, and works offline. For indie devs and privacy-conscious teams, that's the right tradeoff.

## What's Next

Local models are improving fast. Six months ago this wasn't viable. Now it's my daily driver. In another year, the gap with cloud models will shrink further.

Start with Ollama + Continue. See if it fits your workflow. Worst case, you've lost 15 minutes. Best case, you've cut your AI coding costs to zero.

More at dev.to/cumulus
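A postscript on Step 3's test prompt ("parse JSON from a file safely"): for calibration, here's roughly the function I'd expect a good local model to produce. This is my own sketch, not captured model output:

```python
import json
from pathlib import Path
from typing import Any


def parse_json_file(path: str | Path, default: Any = None) -> Any:
    """Parse JSON from a file; return `default` instead of raising on failure."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (OSError, json.JSONDecodeError):
        # Missing/unreadable file, or contents that aren't valid JSON
        return default
```

If the model's answer swallows everything with a bare `except:` instead of catching the specific failure modes, that's a useful quality signal in itself.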
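One more thing worth knowing: Continue talks to Ollama over a plain local HTTP API (port 11434 by default), so you can also script against it directly - handy for batch jobs like generating docstrings across a repo. A minimal standard-library sketch; the `/api/generate` endpoint and its `model`/`prompt`/`stream` fields come from Ollama's API docs, but verify against your installed version:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint


def build_payload(model: str, prompt: str) -> dict:
    # stream=False asks for one complete JSON response instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str) -> str:
    """Send one prompt to the local Ollama server and return the completion text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Usage (requires `ollama serve` running): `generate("deepseek-coder-v2:16b", "Write a one-line palindrome check in Python")`.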