# Tools: How to Run Local LLMs for Coding (No Cloud, No API Keys) - 2025 Update

I stopped sending my code to external APIs six months ago. Not for privacy reasons—though that's a nice bonus—but because local LLMs for coding have gotten genuinely good. Here's how to set up a complete local AI coding assistant in under 20 minutes. No subscriptions. No rate limits. No sending your proprietary code to someone else's servers.

## Why Local LLMs Actually Make Sense Now

The gap between cloud models and local ones has shrunk dramatically. For most coding tasks—autocomplete, explaining code, writing tests, refactoring—a well-tuned 7B or 14B model running locally performs within 80-90% of GPT-4. That remaining 10-20%? It's usually in complex multi-file reasoning or obscure language edge cases. For daily coding, local models handle it fine.

- Zero latency dependency — Works offline, on planes, in cafes with garbage wifi
- No token costs — Run it 1000 times a day, costs nothing
- Privacy — Your code stays on your machine
- Customization — Fine-tune on your codebase if you want

## Step 1: Install Ollama

Ollama is the easiest way to run local LLMs. One binary, handles model downloads, provides an API.

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

Windows: Download from ollama.com and run the installer.

Verify the install:

```bash
ollama --version
```

## Step 2: Pull a Coding Model

Not all models are created equal for code. Here's what actually works:

Best all-rounder (7B, runs on 8GB RAM):

```bash
ollama pull deepseek-coder:6.7b-instruct
```

Better quality, needs 16GB RAM:

```bash
ollama pull codellama:13b-instruct
```

Best local coding model (needs 32GB RAM):

```bash
ollama pull deepseek-coder:33b-instruct
```

My daily driver is deepseek-coder:6.7b-instruct. Fast, accurate, fits in memory alongside my IDE and browser.

## Step 3: Test It Works

```bash
ollama run deepseek-coder:6.7b-instruct "Write a Python function to validate email addresses using regex"
```

You should see it generate code within seconds. If it's slow, you're either memory-constrained or need to close some Chrome tabs.

## Step 4: Connect to Your Editor

### VS Code with Continue

Continue is the best free extension for local LLM integration.

- Install Continue from the VS Code marketplace
- Open settings (Ctrl+Shift+P → "Continue: Open Config")
- Add this config:

```json
{
  "models": [
    {
      "title": "DeepSeek Local",
      "provider": "ollama",
      "model": "deepseek-coder:6.7b-instruct"
    }
  ],
  "tabAutocompleteModel": {
    "title": "DeepSeek Autocomplete",
    "provider": "ollama",
    "model": "deepseek-coder:6.7b-instruct"
  }
}
```

You get:

- Inline autocomplete (like Copilot)
- Chat sidebar for questions
- Cmd+L to explain selected code

### Neovim with gen.nvim

```lua
-- In your lazy.nvim config
{
  "David-Kunz/gen.nvim",
  opts = {
    model = "deepseek-coder:6.7b-instruct",
    host = "localhost",
    port = "11434",
  }
}
```

## Step 5: API Integration for Scripts

Ollama exposes a REST API on port 11434. Use it in your tooling:

```python
import requests

def ask_llm(prompt: str) -> str:
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "deepseek-coder:6.7b-instruct",
            "prompt": prompt,
            "stream": False
        }
    )
    return response.json()["response"]

# Generate a test
code = open("my_module.py").read()
tests = ask_llm(f"Write pytest tests for this code:\n\n{code}")
print(tests)
```

This pattern works for:

- Pre-commit hooks that generate test stubs
- Documentation generators
- Code review bots in CI

## Performance Tuning

If responses are slow:

Use a smaller context window:

```bash
ollama run deepseek-coder:6.7b-instruct --num-ctx 2048
```

Enable GPU acceleration (if you have NVIDIA):

```bash
# Should auto-detect, but verify
nvidia-smi
```

Most 7B models run fine on CPU with 16GB RAM. For 13B+, you really want a GPU.

## Model Recommendations by Use Case

Start small. The 6.7B model handles 90% of daily tasks. Scale up when you hit limits.

## What Local LLMs Won't Do

Be realistic about limitations:

- Large codebase understanding — They can't hold 50 files in context
- Cutting-edge frameworks — Training data has a cutoff
- Complex debugging — Claude and GPT-4 still win here

For those cases, I keep a cloud API as backup. But 80% of my AI-assisted coding now runs locally.

## Wrapping Up

The setup takes 15 minutes. The models are free. The privacy is a bonus.

If you're still paying for Copilot and only use it for autocomplete and simple explanations, try this for a week. You might not go back.

More at dev.to/cumulus
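One more note on the Performance Tuning tip: if you don't want to pass `--num-ctx` on every run, Ollama lets you bake parameters into a named model with a Modelfile. A sketch, assuming the 6.7B model from Step 2 is already pulled (the name `fast-coder` is made up for this example):

```
# Modelfile — smaller default context window for snappier responses
FROM deepseek-coder:6.7b-instruct
PARAMETER num_ctx 2048
```

Then build and run it like any other model:

```bash
ollama create fast-coder -f Modelfile
ollama run fast-coder "Explain this regex: ^\d{3}-\d{4}$"
```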
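Bonus: the Step 5 script sets `"stream": False` and waits for the whole answer. Ollama's `/api/generate` endpoint can also stream, returning one JSON object per line as tokens are generated, with `"done": true` on the final line. Here's a hedged sketch of a streaming variant of the `ask_llm` helper; `join_stream` and `ask_llm_stream` are my own names, not part of Ollama:

```python
import json

import requests

def join_stream(lines) -> str:
    """Combine Ollama's streaming JSON lines (one object per line) into full text."""
    parts = []
    for line in lines:
        if not line:  # keep-alive blank lines
            continue
        payload = json.loads(line)
        parts.append(payload.get("response", ""))
        if payload.get("done"):
            break
    return "".join(parts)

def ask_llm_stream(prompt: str, model: str = "deepseek-coder:6.7b-instruct") -> str:
    """Like ask_llm, but tokens arrive as they are generated."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True,
    )
    response.raise_for_status()
    return join_stream(response.iter_lines())
```

Streaming matters for interactive tools (a chat sidebar, a REPL) where printing tokens as they land feels much faster than a 10-second silent wait for the same answer.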