Tools: Run Claude Code for 99% Less With Ollama and OpenRouter (2026)


What Actually Changed (And Why It Matters)

Approach 1: Ollama — Free, Local, Unlimited

Prerequisites

Step 1: Install Ollama

Step 2: Pull a Coding Model

Step 3: Configure Claude Code to Use Ollama

Step 4: Verify It Works

Ollama Troubleshooting

Approach 2: OpenRouter — Cloud Models, Pennies Per Request

Step 1: Get an OpenRouter API Key

Step 2: Configure Claude Code

Step 3: Pick the Right Model for Your Task

The Tradeoffs: What You Gain and What You Lose

What You Keep ✅

What You Lose ❌

The Pro Setup: Switching Models on the Fly

What the Community Is Building

Which Approach Should You Pick?

The Bigger Picture

At 12 PM Pacific today, Anthropic flipped the switch. Claude Max subscriptions — the $100/month and $200/month plans that gave you unlimited Opus 4.6 — no longer work with third-party tools like OpenClaw, Cline, or any harness outside Anthropic's own apps. If you were running Claude Code through a third-party client on your Max subscription, it stopped working this afternoon. The announcement came from Boris Cherny, Claude Code's creator, and was confirmed across multiple channels.

The reaction was immediate: two separate tutorial videos dropped within hours, the "free Claude Code" community mobilized, and Hugging Face's CEO started posting CLI commands to run open-source models as direct replacements.

⚠️ The April 4 cutoff is real. Starting today at 12 PM PT, Claude Max subscriptions no longer cover usage on third-party tools. You need an API key from here on — which means per-token billing instead of a flat monthly fee. For heavy users, this could mean going from $100/month to $500-2,000/month overnight.

But here's the thing: Claude Code's harness doesn't care which model powers it. The agent framework — the file reading, code writing, git integration, terminal execution — is separate from the language model underneath. Swap the model, keep the workflow. That's exactly what we're going to do.

This guide covers two approaches: Ollama (completely free, runs on your machine) and OpenRouter (pennies per request, cloud-hosted). Both work today. Both are tested. And both will save you 90-99% compared to API pricing.

What Actually Changed (And Why It Matters)

Let's be precise about what happened. Anthropic didn't shut down Claude Code. They didn't change the API. What they did was decouple the Max subscription from third-party tool access. Previously, your $100/month Max plan gave you unlimited Claude Opus 4.6 usage — and that included any tool that could authenticate through your Anthropic account. Power users on OpenClaw were getting hundreds of dollars worth of API calls for a flat fee.
From Anthropic's perspective, these users were "freeloading at scale," as one analyst put it. Now, third-party tools require an API key with per-token billing. For a typical coding session — 50,000 input tokens and 10,000 output tokens — that's roughly $1.50 per session with Opus or $0.30 with Sonnet. Do 10 sessions a day and you're looking at $450/month with Opus. Heavy users report $1,000+ monthly bills on the API.

📊 The cost math that triggered the migration: Light users (5 sessions/day) go from $100/mo on Max to ~$225/mo on API Opus — or $0 with Ollama and ~$5/mo on OpenRouter. Heavy users (20+/day) face ~$900+/mo on API vs. still $0 locally. Power users who coded all day on Max? Looking at $2,000+/mo on the API. Ollama costs = electricity only. OpenRouter costs assume using capable free-tier or low-cost models like Qwen3.5, Gemma 4, or DeepSeek.

The community response has been swift. Nate Herk published two tutorials the same day. Clément Delangue (Hugging Face CEO) posted literal CLI commands to run Gemma 4 locally as a Claude replacement. The "free Claude Code" tutorial is becoming its own genre.

Approach 1: Ollama — Free, Local, Unlimited

Ollama is an open-source tool that runs large language models on your own hardware. No API keys. No billing. No data leaving your machine. You download a model, point Claude Code at it, and you're coding.

Once Ollama is installed, start the server with `ollama serve`. It runs in the background and exposes a local API at http://localhost:11434.

Not all models are equal for code generation. Here's what works well:

⚡ Which model should you pick? If you have 32GB+ RAM (like a MacBook Pro M2/M3/M4), go with qwen3.5:35b — it's the closest to Claude Sonnet quality for code. If you're on 16GB, gemma4:26b is excellent thanks to its MoE architecture (only 4B parameters are active at any time, so it runs fast despite the large model size). On 8GB, stick to qwen3.5:14b.

Claude Code reads its model configuration from environment variables.
Set these before launching (the full export commands are in the listings at the end of this article). To make the setup permanent, add those exports to your ~/.zshrc or ~/.bashrc.

You should see Claude Code launch normally. Try a simple prompt, for example asking for a Python Fibonacci function with type hints and a docstring. If it generates code, reads files, and executes commands — you're running Claude Code for free.

Approach 2: OpenRouter — Cloud Models, Pennies Per Request

If your machine can't run local models (or you want frontier-quality output without the $15/MTok Opus price), OpenRouter is the play. It's a unified API that routes to 100+ models from different providers — many of them free or near-free.

OpenRouter's strength is model selection. Match the model to the work:

💡 The hybrid strategy: Use a cheap model (Qwen 3.5 or Gemma 4) for routine coding, file exploration, and test writing. Switch to Sonnet 4.5 via OpenRouter only when you need frontier reasoning — complex refactors, subtle bugs, architecture decisions. This drops your average cost by 80-90% compared to running Opus for everything.

The Tradeoffs: What You Gain and What You Lose

Let's be honest about what you're giving up. This isn't a free lunch — it's a different lunch at a different price point.

The Pro Setup: Switching Models on the Fly

Power users don't pick one approach. They set up aliases to switch between models depending on the task. Now you can type claude-local for free coding sessions, claude-cheap for daily work, and claude-opus only when you're tackling something that genuinely needs frontier intelligence.

What the Community Is Building

The "free Claude Code" movement isn't just about cost savings — it's about resilience. When your workflow depends on a single provider's pricing decisions, you're one announcement away from a 10x cost increase. Today proved that.

This is a pattern we've seen before. Every time a closed provider tightens access, the open-source alternative gets a growth spike. The difference now is that open-source coding models are genuinely competitive — Gemma 4's 31B dense model ranked #3 on Arena AI's text leaderboard, and Qwen 3.5's coding variants are approaching Sonnet-level quality on SWE-bench.
📊 The open-source quality gap is closing fast: Qwen 3.6-Plus hits SWE-bench 78.8 (vs. Claude Opus 4.5's 80.9). Gemma 4 31B ranks #3 open model globally at ELO ~1452. DeepSeek V3.2 delivers strong reasoning at $0.27/MTok. Six months ago, the best open model scored ~65 on SWE-bench. The gap went from 25% to 3%.

The Bigger Picture

Today's announcement is a business decision, not a technical one. Anthropic is profitable on API usage and losing money on Max subscribers who use third-party tools heavily. The subsidy had to end.

But the unintended consequence is acceleration. Every developer who sets up Ollama today is one more developer who knows how to run local models. Every OpenRouter account created this week is one more developer who understands model routing and cost optimization. The lock-in weakens with every migration guide that gets published.

Claude Code as a harness is still excellent — arguably the best agent framework available. But the model powering it? That's now a commodity. Compare the options, pick the right tool for each task, and don't pay $15/MTok for work that a $0 local model handles just fine.

The 99% cost reduction is real. The tradeoffs are real too. Now you know both sides.

Running Claude Code with alternative models and want to share your setup? We're collecting community configurations — reach out via our GitHub.
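If you want to re-run the per-session cost math from earlier, here is a minimal shell sketch. The token counts and per-MTok rates are the ones quoted in this guide, hard-coded as arguments; adjust them to whatever the current price sheet says.

```shell
#!/bin/sh
# Estimate a session's API cost from token counts and per-million-token rates.
# Usage: session_cost INPUT_TOKENS OUTPUT_TOKENS INPUT_RATE OUTPUT_RATE
session_cost() {
    awk -v in_tok="$1" -v out_tok="$2" -v in_rate="$3" -v out_rate="$4" \
        'BEGIN { printf "%.2f\n", (in_tok * in_rate + out_tok * out_rate) / 1e6 }'
}

# A typical session: 50k input tokens, 10k output tokens.
session_cost 50000 10000 15 75   # Opus 4.6 rates   -> 1.50
session_cost 50000 10000 3 15    # Sonnet 4.5 rates -> 0.30
```

Multiply by sessions per day and ~30 days to get the monthly figures quoted above (10 Opus sessions/day ≈ $450/month).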
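The RAM-based model guidance from the Ollama section can also be written down as a tiny helper. This is a sketch: `pick_model` is a made-up function name, and the thresholds are this guide's sizing advice, not anything Ollama enforces.

```shell
#!/bin/sh
# Suggest an Ollama model tag from available RAM in GB,
# following the sizing guidance in this guide.
pick_model() {
    ram_gb="$1"
    if   [ "$ram_gb" -ge 32 ]; then echo "qwen3.5:35b"  # closest to Sonnet quality
    elif [ "$ram_gb" -ge 16 ]; then echo "gemma4:26b"   # MoE, only 4B active params
    elif [ "$ram_gb" -ge 8  ]; then echo "qwen3.5:14b"  # smaller but capable
    else                            echo "qwen3.5:7b"   # runs on almost anything
    fi
}

pick_model 48   # -> qwen3.5:35b
pick_model 16   # -> gemma4:26b
```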

```shell
# --- Install Ollama ---

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows (via WSL2)
curl -fsSL https://ollama.com/install.sh | sh

# Start the Ollama server (background process, local API at http://localhost:11434)
ollama serve
```

```shell
# --- Pull a coding model ---

# Best overall coding model for local use (35B, needs 24GB+ RAM)
ollama pull qwen3.5:35b

# Great MoE option — only 4B active params, runs on 16GB (26B total)
ollama pull gemma4:26b

# Smaller but capable (needs 8GB+ RAM)
ollama pull qwen3.5:14b

# Budget option — runs on almost anything (needs 4GB+ RAM)
ollama pull qwen3.5:7b
```

```shell
# --- Configure Claude Code for Ollama ---

# Point Claude Code at your local Ollama instance
export ANTHROPIC_BASE_URL="http://localhost:11434/v1"
export ANTHROPIC_API_KEY="ollama"        # Ollama doesn't need a real key
export CLAUDE_CODE_MODEL="qwen3.5:35b"   # Match the model you pulled

# Now launch Claude Code normally
claude

# To make this permanent, append the exports to your shell profile
echo 'export ANTHROPIC_BASE_URL="http://localhost:11434/v1"' >> ~/.zshrc
echo 'export ANTHROPIC_API_KEY="ollama"' >> ~/.zshrc
echo 'export CLAUDE_CODE_MODEL="qwen3.5:35b"' >> ~/.zshrc
source ~/.zshrc
```

A simple test prompt for verifying the setup:

Create a Python function that calculates the Fibonacci sequence using dynamic programming. Include type hints and docstring.
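If you'd rather not write the exports into your rc file at all, the same configuration can live in a function that only affects the current shell. A sketch: `use_ollama` is a made-up name, and the variables are the ones this guide uses.

```shell
# Set the Ollama-backed Claude Code environment for the current shell only.
use_ollama() {
    model="${1:-qwen3.5:35b}"   # default to the model pulled earlier
    export ANTHROPIC_BASE_URL="http://localhost:11434/v1"
    export ANTHROPIC_API_KEY="ollama"   # Ollama ignores the key
    export CLAUDE_CODE_MODEL="$model"
    echo "Claude Code now targets Ollama model: $model"
}

# Example: use_ollama gemma4:26b && claude
```

Open a new terminal and the variables are gone, so you fall back to whatever your default Claude Code configuration is.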
```shell
# --- Configure Claude Code for OpenRouter ---

# Point Claude Code at OpenRouter
export ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1"
export ANTHROPIC_API_KEY="sk-or-v1-your-key-here"

# Pick your model — here are the best options:
export CLAUDE_CODE_MODEL="qwen/qwen3.5-coder-next"        # Strong coder, ~$0.50/MTok
# export CLAUDE_CODE_MODEL="google/gemma-4-31b"           # Free tier available
# export CLAUDE_CODE_MODEL="deepseek/deepseek-v3.2"       # Great reasoning, ~$0.27/MTok
# export CLAUDE_CODE_MODEL="anthropic/claude-sonnet-4.5"  # Full Claude, but cheaper than direct API

claude
```

```shell
# --- Model-switching aliases (add to ~/.zshrc or ~/.bashrc) ---

# Free local model — for exploration, simple tasks
alias claude-local='ANTHROPIC_BASE_URL="http://localhost:11434/v1" ANTHROPIC_API_KEY="ollama" CLAUDE_CODE_MODEL="qwen3.5:35b" claude'

# Cheap cloud model — for feature development
alias claude-cheap='ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1" ANTHROPIC_API_KEY="sk-or-v1-YOUR-KEY" CLAUDE_CODE_MODEL="qwen/qwen3.5-coder-next" claude'

# Full Claude Sonnet — when quality matters
alias claude-sonnet='ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1" ANTHROPIC_API_KEY="sk-or-v1-YOUR-KEY" CLAUDE_CODE_MODEL="anthropic/claude-sonnet-4.5" claude'

# Direct Anthropic API — when you need Opus
alias claude-opus='ANTHROPIC_API_KEY="sk-ant-YOUR-KEY" CLAUDE_CODE_MODEL="claude-opus-4-6" claude'
```

```shell
# Exploring a new codebase? Free.
claude-local

# Building a feature? Pennies.
claude-cheap

# Debugging a race condition in your distributed system? Worth paying for.
claude-opus
```

API pricing (per-token billing):

- Claude Opus 4.6: $15 per million input tokens, $75 per million output tokens
- Claude Sonnet 4.5: $3 per million input tokens, $15 per million output tokens

Ollama prerequisites:

- macOS, Linux, or Windows (with WSL2)
- 16GB+ RAM (32GB recommended for larger models)
- ~20GB free disk space per model
- A reasonably modern CPU — Apple Silicon (M1+) or a recent AMD/Intel with AVX2

Getting an OpenRouter API key:

- Go to openrouter.ai
- Create an account (free)
- Generate an API key from your dashboard
- Add credits — $5 will last weeks for most users

What You Keep ✅

- The Claude Code harness — file reading, code writing, git operations, shell commands, the entire agent workflow
- Multi-file editing — Claude Code's ability to work across your whole project
- CLAUDE.md and hooks — your project context and automation rules still work
- Terminal UI — same interface, same commands, same muscle memory

What You Lose ❌

With Ollama (local models):

- Raw intelligence drops. Qwen 3.5 35B is ~85% of Claude Sonnet on coding benchmarks. For complex multi-step reasoning, you'll notice the gap. The hidden cost of cheaper reasoning models is real — they make more subtle mistakes.
- Context window shrinks. Most local models max out at 32K-128K tokens vs. Claude's 1M. For large codebases, this means Claude Code can't hold your entire project in context simultaneously.
- Speed varies wildly. On an M4 Max, Qwen 3.5 35B runs at ~25 tok/s. On an older Intel MacBook, you might get 3-5 tok/s. Opus via API gives you ~80 tok/s consistently.
- Your machine is busy. Running a 35B model uses 20-30GB of RAM and significant CPU/GPU. Don't expect to run other heavy workloads simultaneously.

With OpenRouter (cloud models):

- Latency is higher. Requests route through OpenRouter's proxy, adding 100-500ms per request compared to direct API calls.
- Free models have rate limits. The free tier on models like Gemma 4 restricts requests per minute. Heavy sessions will hit these.
- Model availability isn't guaranteed. If a provider goes down, that model goes down with it. OpenRouter's routing helps, but it's not immune.

Pick Ollama if:

- You have 16GB+ RAM (32GB ideal)
- Privacy matters — your code never leaves your machine
- You do mostly routine coding (CRUD, scripts, tests, frontend)
- You want zero ongoing costs
- You're comfortable with ~85% of Claude's quality for most tasks

Pick OpenRouter if:

- Your machine can't run large models (8GB laptop, Chromebook)
- You want access to multiple model providers through one API
- You need near-frontier quality but can't justify Opus pricing
- You want the flexibility to switch models per task
- You're OK with $5-25/month instead of $0
- You're a power user who wants the alias-switching setup above

Or combine them:

- Use local models for exploration and simple tasks (free)
- Route to cloud models for complex work (cheap)
- Only pay full Anthropic API rates for genuinely hard problems (rare)
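The aliases above can also be generalized into a single switcher that sets the environment per profile and then runs whatever command you give it. This is a sketch: `claude_with` is a hypothetical helper, the profile names are arbitrary, and the API keys are placeholders you'd replace with your own.

```shell
#!/bin/sh
# claude_with PROFILE [CMD...]: run CMD (default: claude) under a model profile.
claude_with() {
    profile="$1"; shift
    [ "$#" -eq 0 ] && set -- claude   # default to launching Claude Code
    case "$profile" in
        local)   # free local Ollama model
            ANTHROPIC_BASE_URL="http://localhost:11434/v1" \
            ANTHROPIC_API_KEY="ollama" \
            CLAUDE_CODE_MODEL="qwen3.5:35b" "$@" ;;
        cheap)   # low-cost OpenRouter coder
            ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1" \
            ANTHROPIC_API_KEY="sk-or-v1-YOUR-KEY" \
            CLAUDE_CODE_MODEL="qwen/qwen3.5-coder-next" "$@" ;;
        sonnet)  # full Claude Sonnet via OpenRouter
            ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1" \
            ANTHROPIC_API_KEY="sk-or-v1-YOUR-KEY" \
            CLAUDE_CODE_MODEL="anthropic/claude-sonnet-4.5" "$@" ;;
        opus)    # direct Anthropic API
            ANTHROPIC_API_KEY="sk-ant-YOUR-KEY" \
            CLAUDE_CODE_MODEL="claude-opus-4-6" "$@" ;;
        *)
            echo "usage: claude_with {local|cheap|sonnet|opus} [cmd...]" >&2
            return 1 ;;
    esac
}

# Example: claude_with local          # free session
#          claude_with sonnet claude  # explicit command
```

Because the variable assignments prefix the command, they apply only to that invocation and never leak into your shell.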