Claude Code → talks to → Ollama (local server) → runs → Your model (no Anthropic servers involved)
Without a model: Ollama = empty server, useless
With a model: Ollama = fully local AI, free forever
Model fits in VRAM → GPU handles everything → Very fast ✅
Model too big for VRAM → spills into system RAM → Slower ⚠️
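Whether a model fits in VRAM is mostly arithmetic: weight memory is roughly parameter count times bits per weight, plus runtime overhead. A rough sketch (the helper name and the 20% overhead factor are illustrative assumptions, not a spec):

```python
# Rule of thumb: weight memory ≈ parameters × bits-per-weight / 8 bytes,
# plus overhead for the KV cache and runtime buffers.
# The 1.2 overhead factor is an assumption for illustration only.

def approx_model_gb(params_billion: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return round(weight_bytes * overhead / 1e9, 1)

# A ~12B-parameter model at common quantization levels:
print(approx_model_gb(12, 4))   # 4-bit quant
print(approx_model_gb(12, 8))   # 8-bit quant
print(approx_model_gb(12, 16))  # full fp16
```

This is why a 4-bit quant of a 12B model can sit comfortably in ~11GB of VRAM while the fp16 version spills into system RAM.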
ollama pull gemma4
┌─────────────────────────────────────────────────────┐
│ YOUR COMPUTER │
│ │
│ ┌─────────────┐ ┌──────────────┐ │
│ │ Claude Code │───▶│ Ollama │ │
│ │ (terminal) │ │ :11434 (API) │ │
│ └─────────────┘ └──────┬───────┘ │
│ │ │
│ ┌─────────────┐ ┌──────▼───────┐ │
│ │ Open WebUI │───▶│ Gemma4 │ │
│ │ (browser) │ │ (the brain) │ │
│ └─────────────┘ └─────────────┘ │
│ │
│ ┌─────────────┐ │
│ │ Python API │───▶ http://localhost:11434 │
│ │ scripts │ │
│ └─────────────┘ │
└─────────────────────────────────────────────────────┘ Zero data leaves your machine
ollama run gemma4 --ctx-size 32768
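The same context window can be requested per call through Ollama's REST API via the `options` field (`num_ctx` is Ollama's documented option name; `build_payload` is a hypothetical helper for illustration):

```python
# Build a request body for Ollama's /api/generate endpoint with a
# per-call context window. build_payload is a hypothetical helper name.

def build_payload(model: str, prompt: str, num_ctx: int = 32768) -> dict:
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # context window in tokens
    }

payload = build_payload("gemma4", "Summarize this repo")
# POST this to http://localhost:11434/api/generate, e.g. with requests.post(..., json=payload)
```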
Claude (this chat) → Has web search tool → Knows current events ✅
Gemma4 (local) → No internet → Knowledge frozen at training ❌
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=http://localhost:11434
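The `export` lines above are bash syntax. Since the rest of this setup is on Windows, here is a session-scoped PowerShell sketch of the same variables (untested assumption that you are in a PowerShell terminal):

```shell
# PowerShell equivalents, valid for the current session only:
$env:ANTHROPIC_AUTH_TOKEN = "ollama"
$env:ANTHROPIC_API_KEY    = ""
$env:ANTHROPIC_BASE_URL   = "http://localhost:11434"
```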
ollama launch claude
gemma4 · API Usage Billing · [email protected]'s Organization
import requests

def chat(prompt):
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "gemma4", "prompt": prompt, "stream": False},
    )
    return response.json()["response"]

print(chat("Write a hello world in ascii diagram of moon and earth"))
(Gemma4's reply was an ASCII diagram of the Moon orbiting Earth; the layout did not survive copy/paste, and only fragments like the "Orbit Path" label remain.)
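The script above waits for the whole response. With `"stream": true`, Ollama instead returns one JSON object per line, each carrying a `"response"` fragment. A small helper (hypothetical name, demonstrated offline with canned chunks shaped like Ollama's stream) can stitch them back together:

```python
import json

# Reassemble the full text from newline-delimited JSON stream chunks,
# stopping at the chunk marked "done". join_stream is a hypothetical helper.

def join_stream(ndjson_lines):
    parts = []
    for line in ndjson_lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Offline demo with canned chunks:
sample = [
    '{"response": "Hel", "done": false}',
    '{"response": "lo", "done": true}',
]
print(join_stream(sample))  # → Hello
```

Streaming is worth the extra parsing: tokens appear as they are generated instead of after a long silent wait.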
ERROR: Could not find a version that satisfies the requirement open-webui
docker run -d `
  -p 127.0.0.1:3000:8080 `
  --name open-webui `
  -v open-webui:/app/backend/data `
  --add-host=host.docker.internal:host-gateway `
  ghcr.io/open-webui/open-webui:main
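If Open WebUI doesn't find Ollama on its own, you can point it at the host explicitly. `OLLAMA_BASE_URL` is Open WebUI's documented connection setting; combined with the `--add-host` mapping above, the container reaches Ollama on the Windows host (a sketch, not verified on every setup):

```shell
docker run -d `
  -p 127.0.0.1:3000:8080 `
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 `
  --name open-webui `
  -v open-webui:/app/backend/data `
  --add-host=host.docker.internal:host-gateway `
  ghcr.io/open-webui/open-webui:main
```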
netstat -ano | findstr :3000
# TCP    0.0.0.0:3000    LISTENING  ← Docker up and running

curl http://localhost:3000
# StatusCode: 200 OK  ← Server responding
FORK/DOUBLE ATTACK When we attack two or more pieces at the same time then it is known
as fork or double attack Note- Knights are good at making fork.
You upload PDF ↓
Open WebUI splits it into chunks ↓
Converts chunks to embeddings (mathematical vectors) ↓
Stores in ChromaDB (local vector database) ↓
You ask a question ↓
ChromaDB finds the most relevant chunks ↓
Sends chunks to Gemma4 as context ↓
Gemma4 answers based on YOUR document
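The retrieval step in that pipeline boils down to: embed the question, compare it against each stored chunk vector, return the closest chunks as context. A toy sketch with hand-made 3-dimensional "embeddings" (real ones come from an embedding model; `cosine` and `top_k` are illustrative helpers, not Open WebUI's internals):

```python
import math

# Cosine similarity: how aligned two embedding vectors are (1.0 = identical direction).
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Return the k chunk texts whose vectors are closest to the query vector.
def top_k(query_vec, chunks, k=2):
    scored = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["text"] for c in scored[:k]]

chunks = [
    {"text": "A fork attacks two pieces at once.", "vec": [0.9, 0.1, 0.0]},
    {"text": "Castling moves king and rook.",      "vec": [0.0, 0.9, 0.1]},
    {"text": "Knights are good at forking.",       "vec": [0.8, 0.2, 0.1]},
]
print(top_k([1.0, 0.0, 0.0], chunks))  # the two fork-related chunks
```

ChromaDB does exactly this at scale, with approximate nearest-neighbour indexing instead of a brute-force sort.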
C:\Users\lavan\AppData\Roaming\open-webui\data\
  📁 vector_db  ← document embeddings (ChromaDB)
  📁 uploads    ← original files
  📄 webui.db   ← chat history (SQLite)
✅ Ollama — model manager and local API server
✅ Gemma4 — the AI model (multimodal, ~12GB)
✅ Claude Code — agentic coding with local model
✅ Open WebUI — browser-based chat interface with document upload
✅ Python API — scripts calling the model directly
- NVIDIA GPU with ~11GB VRAM
- Core i9 processor

- Gemini/Claude: more recent training data, larger knowledge base, up-to-date tax law changes
- Gemma4 local: good foundational knowledge, may be slightly behind on very recent rule changes, but your documents never leave your machine

- ✅ File reading and editing across your project
- ✅ Terminal command execution
- ✅ Multi-step agentic coding tasks
- ✅ Git operations
- ✅ MCP connectors and plugins
- ✅ Project context awareness
- ⚠️ Intelligence capped at Gemma4's capability (weaker than Claude Sonnet/Opus)

- ✅ Handwritten text extracted accurately
- ✅ Context understood (chess notes)
- ✅ Intelligent follow-up suggested
- ✅ 100% local — image never left my PC

- Ollama: ollama.com
- Open WebUI: openwebui.com
- Claude Code: claude.ai/code
- Ollama + Claude Code docs: docs.ollama.com/integrations/claude-code
- Docker Desktop (free): docker.com/products/docker-desktop