Choosing the Right LLM for Cognee: Local Ollama Setup
2025-12-24
admin
Choosing the best LLM for Cognee means balancing graph-building quality, hallucination rates, and hardware constraints. Cognee works best with larger, low-hallucination models (32B+) served via Ollama, but mid-size options are fine for lighter setups.

## Key Cognee Requirements

Cognee relies on the LLM for entity extraction, relation inference, and metadata generation. Models under 32B often produce noisy graphs, and high hallucination rates (e.g., 90%+) pollute nodes and edges, degrading retrieval. The official docs recommend deepseek-r1:32b or llama3.3-70b-instruct-q3_K_M paired with Mistral embeddings.

## Model Comparison Table

I didn't put much thought into this one, but here is a table I brought together for future reference. The data is synthesized from the Cognee docs, model cards, and hallucination benchmarks; the hallucination figures may look out of whack, but they are probably not far off.

## Embedding models

Match embedding dimensions (e.g., 768 or 1024) across the Cognee config and the vector store. Qwen3 Embedding (unproven in Cognee) could work at 1024-4096 dims if supported by Ollama. Prioritize low-hallucination models for production Cognee pipelines; your graphs will thank you. Test on your own hardware and monitor graph coherence.
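As a concrete illustration of keeping dimensions in sync, here is a minimal sketch. It reuses the environment variables from the setup below and assumes mxbai-embed-large as an alternative Ollama embedding model with 1024-dimensional output; swap in whatever model you actually run.

```bash
# Hypothetical switch from the default 768-dim model to a 1024-dim one.
# The vector store must be (re)created with the same dimension,
# otherwise inserts will fail or silently mismatch.
export EMBEDDING_PROVIDER="ollama"
export EMBEDDING_MODEL="mxbai-embed-large"   # assumed: 1024-dim Ollama embedding model
export EMBEDDING_DIMENSIONS=1024             # must match the model's output size
```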
## Quick Ollama + Cognee Setup

```bash
# 1. Pull a model (e.g., Devstral)
ollama pull devstral-small-2:24b   # or qwen3:14b, etc.

# 2. Install Cognee with Ollama support
pip install "cognee[ollama]"

# 3. Environment variables
export LLM_PROVIDER="ollama"
export LLM_MODEL="devstral-small-2:24b"
export EMBEDDING_PROVIDER="ollama"
export EMBEDDING_MODEL="nomic-embed-text"   # 768 dims
export EMBEDDING_DIMENSIONS=768

# 4. Build a test graph
cognee add --file "your_data.txt" --name "test_graph"
```
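Before running the Cognee step, it can help to confirm that Ollama is actually serving the pulled models. A quick check, assuming the default Ollama port 11434, might look like this:

```bash
# List locally available models
ollama list

# Ask the local Ollama API which models it exposes (default port 11434)
curl http://localhost:11434/api/tags
```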
## Recommendations by Hardware

Not sure which tier applies to you? See the quick VRAM check after this list.

- High-end (32GB+ VRAM): Deepseek-r1:32b or Llama3.3-70b. These yield the cleanest graphs per Cognee guidance.
- Mid-range (16-24GB VRAM): Devstral Small 2. Low hallucination and coding prowess suit structured memory tasks.
- Budget (12-16GB VRAM): Qwen3:14b over gpt-oss:20b, to avoid the 91% hallucination pitfall.
- I'm inclined to avoid gpt-oss:20b for Cognee; there are notes that its errors amplify in unfiltered graph construction, even though its inference speed on my GPU is 2+ times faster.
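To see which tier your GPU falls into, a quick VRAM check is enough; the command below assumes an NVIDIA card with the standard nvidia-smi tool installed.

```bash
# Show GPU name and total VRAM (NVIDIA only)
nvidia-smi --query-gpu=name,memory.total --format=csv
```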
## Useful links

- https://docs.cognee.ai/how_to_guides/local_models
- https://docs.cognee.ai/setup-configuration/embedding-providers
- https://arxiv.org/html/2508.10925v1
- https://github.com/vectara/hallucination-leaderboard
- https://ollama.com/library/nomic-embed-text-v2-moe
- Qwen3 Embedding
- How to Move Ollama Models to Different Drive or Folder
- Ollama cheatsheet
- Using a Neo4j Graph Database to Power the Internet of Things