# llm-tldr: Answering "Where is the authentication?" in 100ms - Accuracy and Limitations of Semantic Search


Source: Dev.to

*Originally published on 2026-01-19. Original article (Japanese): llm-tldr: 「認証はどこ?」に100msで答えるセマンティック検索の精度と限界*

Have you ever searched through a large codebase wondering, "Where is the authentication feature implemented?" Tools like grep and find can't help if you don't know the function name, and IDE symbol search requires exact matches.

llm-tldr is a code analysis tool built around semantic search: it lets you query code in natural language. It can identify functions from vague keywords like "authentication" or "PDF generation" because it indexes their actual behavior. In this article, I report on its accuracy and limitations based on tests in a Next.js project of 269 files.

## What is llm-tldr?

llm-tldr is a tool designed to feed information from a codebase to an LLM (Large Language Model) efficiently. Instead of passing the entire codebase, it extracts structured information, claiming a 95% reduction in tokens and 155x faster processing.
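The token-reduction claim rests on extracting a compact structured summary (signature, docstring, call info) rather than shipping whole files to the LLM. Here is a minimal sketch of that idea; the `summarize` helper is my own toy illustration, not llm-tldr's actual extraction logic:

```python
# Toy source file: a short header followed by a long implementation body.
source = '''
def authenticate(user, password):
    """Check credentials against the user store."""
    # ... dozens of lines of implementation ...
''' + "    step()\n" * 200

def summarize(src: str) -> str:
    """Keep only the signature and docstring lines, dropping the body."""
    lines = src.strip().splitlines()
    return "\n".join(lines[:2])

summary = summarize(source)
reduction = 1 - len(summary) / len(source)
# The structured summary is far smaller than the full source.
print(f"summary is {reduction:.0%} smaller")
```

A real extractor would of course parse the AST rather than slice lines, but the economics are the same: the LLM sees a few descriptive lines per function instead of the whole file.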
We will verify how well these numbers hold up in a real project.

## How Semantic Search Works

The semantic search in llm-tldr encodes each function into a 1024-dimensional vector, together with its signature, call relationships, complexity metrics, data-flow patterns, and dependencies. The embedding model is bge-large-en-v1.5 (1.3 GB), and vector search uses FAISS (Facebook AI Similarity Search).

## Differences from grep

Traditional tools like grep or ripgrep rely on string matching, so they find nothing unless the function name or a comment literally contains "authentication." llm-tldr, by contrast, vectorizes actual behavior (which functions are called, what data is handled), so it does not depend on comments or variable names.

## Testing Environment

I tested on a real project: system-planner, a TypeScript/Next.js codebase of 269 files (full environment details are listed at the end of this article).

## Setup: The Initial Traps

Following the official documentation leads straight into an initial trap.

### Trap 1: Semantic Index Not Created

Running `tldr warm .` as documented produced 0 code units; the semantic index was not created. Explicitly specifying the language with `--lang typescript` fixed it. The first run takes a few minutes to download the 1.3 GB model; after that, 516 code units were indexed.

### Trap 2: Path Aliases Not Resolved

Path aliases commonly used in Next.js projects, such as `@/lib/...`, are not resolved correctly by llm-tldr. Searching for importers of `@/lib/supabase/server` returned `"importers": []` (empty). This is because llm-tldr does not read the `paths` setting from `tsconfig.json`. There is currently no workaround; searches must use absolute paths.

## Validating the Power of Semantic Search

I checked the accuracy with three queries.

### Validation 1: Searching for Authentication Features

Evaluation: ✅ As expected. Components related to login ranked highly for the query "authentication and login", and a score around 0.65 is reasonable for semantic-search similarity.

### Validation 2: Accuracy of Japanese Queries

Result: database-related scripts were appropriately detected for a Japanese query. Evaluation: ✅ Japanese queries work. Although bge-large-en-v1.5 is an English model, it handles multilingual queries.

### Validation 3: Cross-Functional Search

Evaluation: ✅ Exceeds expectations. The query "PDF generation and export" surfaced not only PDF generation but also related document-conversion functions, suggesting that the data-flow and call-graph information pays off.

## Call Graph Expansion Feature

Adding the `--expand` option includes function call relationships (`calls`, `called_by`, `related`) in the results. This is useful for impact analysis during refactoring.
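To make the retrieval mechanics above concrete, here is a minimal sketch of nearest-neighbor search by cosine similarity. The toy 4-dimensional vectors and the `semantic_search` helper are illustrative stand-ins of my own; the real tool uses 1024-dimensional bge-large-en-v1.5 embeddings and FAISS for the search:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "embeddings" standing in for real 1024-dim function vectors.
index = {
    "LoginForm":           [0.9, 0.1, 0.0, 0.2],
    "generateEstimatePDF": [0.1, 0.8, 0.3, 0.0],
    "fetchVendors":        [0.2, 0.1, 0.9, 0.1],
}

def semantic_search(query_vec, k=2):
    """Return the k function names closest to the query vector."""
    scored = [(cosine(query_vec, vec), name) for name, vec in index.items()]
    return [name for _, name in sorted(scored, reverse=True)[:k]]

# A query vector pointing in the "authentication" direction.
query = [0.85, 0.15, 0.1, 0.1]
print(semantic_search(query))  # LoginForm ranks first
```

This is why a query like "authentication and login" can hit `LoginForm` even when neither the name nor any comment contains the query words: proximity in embedding space, not string matching, does the work.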
## Features That Did Not Work

Some features I expected to work did not function correctly.

### Failure 1: Impact Analysis

Result: `Function 'fetchVendors' not found in call graph`.

### Failure 2: Architecture Analysis

Result: all items were empty. A likely cause is that it does not support the specific structure of Next.js projects (App Router, Server Components, and so on); further possible causes are listed at the end of this article.

## Performance: The Power of Daemon Mode

llm-tldr has a mode that starts a daemon in the background. In practice, the first query took about 2 seconds, while subsequent queries took around 100 ms. This is close to the official claim of "155x faster."

## Comparison with Serena (Conclusion Only)

Serena is a Language Server Protocol (LSP) based tool that excels at accurate editing and refactoring using type information and reference resolution. llm-tldr, in contrast, is strong at finding features when you don't know their names, via static analysis with Tree-sitter plus vector search. The two tools complement rather than compete with each other.

## Practicality Evaluation

I evaluated practicality across six scenarios: exploring large codebases, impact investigation before refactoring, providing context to LLMs, architecture analysis, simple edits in single files, and situations requiring real-time responses. The detailed verdicts appear in the lists at the end of this article.

## Reality of Initial Costs

The initial costs of adopting llm-tldr cannot be ignored: a 1.3 GB model download plus index creation.

## Ongoing Maintenance

The index does not update automatically, so it must be maintained in one of two ways.

Method 1: Git hook (recommended). Add a notify script to `.git/hooks/post-commit` (shown below); after about 20 file changes, the daemon automatically rebuilds the semantic index.

Method 2: Manual update with a full rebuild (`tldr warm .`).

## MCP Integration: Collaboration with Claude Code

llm-tldr can run as an MCP (Model Context Protocol) server, integrating with Claude Code / Claude Desktop via `.claude/settings.json` (configuration shown below). This lets Claude use llm-tldr automatically to understand the codebase. In practice, when I asked Claude "Where is the authentication feature in this project?", it answered based on llm-tldr's semantic search results.

## Conclusion: Insights from Two Hours of Testing

llm-tldr's semantic search impressed me by handling natural-language queries beyond my expectations.
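The daemon-mode speedup described above follows a familiar pattern: pay the expensive model-load cost once, then answer subsequent queries from memory. A hedged sketch of that pattern; `load_model` merely simulates a slow startup and is not llm-tldr's actual code:

```python
import functools
import time

def load_model():
    # Stand-in for the expensive one-time cost (loading a 1.3 GB
    # embedding model); here it just simulates a slow startup.
    time.sleep(0.1)
    return {"ready": True}

@functools.lru_cache(maxsize=1)
def get_model():
    # Cached: load_model() runs only on the first call.
    return load_model()

def query(text):
    get_model()  # slow only the first time
    return f"results for {text!r}"

t0 = time.perf_counter(); query("authentication"); first = time.perf_counter() - t0
t0 = time.perf_counter(); query("PDF generation"); second = time.perf_counter() - t0
assert second < first  # warm queries skip the startup cost
```

The real daemon additionally keeps the FAISS index resident and serves requests over a Unix socket, but the latency profile (one slow cold query, then fast warm ones) comes from this same caching structure.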
It is particularly practical in the recommended use cases listed below. Personally, I feel that llm-tldr has real value in large projects as a tool that "changes the experience of exploring codebases," and the combination with Claude Code suggests new possibilities for AI-assisted development. If you're interested, I recommend trying it on a small project first.

## Appendix: Commands, Output, and Notes

```shell
# Executing as per the official documentation
tldr warm .
```

```shell
# Explicitly specify the language
tldr semantic index . --lang typescript
```

```shell
# Searching for importers
tldr importers "@/lib/supabase/server" .
```

```shell
tldr semantic search "authentication and login" --path . --k 5
```
```json
[
  { "name": "LoginForm", "file": "components/auth/LoginForm.tsx", "score": 0.6509 },
  { "name": "LoginPage", "file": "app/auth/login/page.tsx", "score": 0.6506 },
  { "name": "UpdatePasswordForm", "file": "components/auth/UpdatePasswordForm.tsx", "score": 0.6165 }
]
```

```shell
# Japanese query: "database connection to Supabase"
tldr semantic search "Supabaseへのデータベース接続" --path . --k 5
```

```shell
tldr semantic search "PDF generation and export" --path . --k 5
```
```json
[
  { "name": "generateEstimatePDF", "file": "lib/utils/pdf-generator.ts", "score": 0.6862 },
  { "name": "convertMarkdownToDocx", "file": "lib/utils/specification-export.ts", "score": 0.6582 }
]
```

```shell
tldr semantic search "chat message handling" --path . --k 3 --expand
```

```shell
tldr impact fetchVendors .
```

```shell
tldr arch .
```

```json
{
  "entry_layer": [],
  "leaf_layer": [],
  "middle_layer_count": 0,
  "circular_dependencies": []
}
```

```shell
tldr daemon start
```

Git hook for `.git/hooks/post-commit`:

```bash
#!/bin/bash
git diff --name-only HEAD~1 | xargs -I{} tldr daemon notify {} --project .
```
```shell
# Manual full rebuild
tldr warm .
```

MCP configuration in `.claude/settings.json`:

```json
{
  "mcpServers": {
    "tldr": {
      "command": "tldr-mcp",
      "args": ["--project", "."]
    }
  }
}
```

Key features of llm-tldr:

- 5-layer code analysis architecture: AST, call graph, control flow, data flow, and program dependency.
- Semantic search: find functions with natural-language queries (the focus of this article).
- Support for 16 languages, including TypeScript, Python, JavaScript, Go, Rust, and Java.
- MCP integration: works with Claude Code / Claude Desktop.

Information embedded per function:

- Signature + docstring (L1: AST)
- Call relationships (L2: call graph)
- Complexity metrics (L3: control flow)
- Data-flow patterns (L4: data flow)
- Dependencies (L5: program dependency)
- The first roughly 10 lines of code

Testing environment:

- Project: system-planner (TypeScript/Next.js)
- Size: 269 files, 517 edges (function call relationships)
- Language composition: 257 TypeScript files, 12 JavaScript files
- llm-tldr version: 1.5.2

Setup notes:

- The embedding model (bge-large-en-v1.5, 1.3 GB) must be downloaded first.
- Language specification (`--lang`) is mandatory.

Possible causes of the impact-analysis failure:

- An exact function-name match may be required.
- The call graph index may be incomplete.
- TypeScript-specific issues (related to type definitions).

Daemon mode:

- Queries are accelerated to 100 ms (as claimed).
- In-memory caching makes subsequent searches extremely fast.
- Communication happens via Unix socket.

Division of labor:

- Serena: accurately trace known symbols / rename safely.
- llm-tldr: narrow down with natural-language queries like "Where is authentication?" / get an overview of unfamiliar codebases.

### ✅ Highly Effective Cases

- Exploring large codebases: ideal for finding features ("Where is authentication?", "Where is PDF generation?") in unfamiliar projects; more intuitive than grep or find.
- Impact investigation before refactoring: understand dependencies between functions via the call graph and identify the scope of changes with the `--expand` option.
- Providing context to LLMs: the 95% token reduction lets even large projects fit in a context window, and structured JSON output is easier for LLMs to consume.

### 🤔 Limited Effectiveness Cases

- Debugging: program slicing with `tldr slice` is theoretically effective but was not validated this time; the same applies to data-flow analysis (`tldr dfg`).
- Architecture analysis: `tldr arch` would be useful if it worked correctly, but it produced no results this time.

### ❌ Unsuitable Cases

- Simple edits in single files: high overhead; IDE features are sufficient.
- Situations requiring real-time responses: index updates are needed (after every ~20 file changes) and continuous notification setup (such as Git hooks) is required.

Required resources:

- Disk space: 1.3 GB (bge-large-en-v1.5 model) plus index files.
- Initial setup time: model download takes a few minutes (depending on connection speed); index creation takes about 2 minutes for a medium-sized project (269 files).
- Memory: a few hundred MB as a resident process in daemon mode.

Recommended use cases:

- Large projects with over 100,000 lines of code
- Microservice architectures
- Analysis of legacy code
- AI-assisted development (integration with Claude Code, etc.)

Caveats:

- The initial setup cost (1.3 GB download) is unavoidable.
- Some features (`arch`, `impact`) may not work as expected.
- Path-alias resolution remains a challenge.
- Index updates require setup such as Git hooks.

## Reference Links

- llm-tldr GitHub Repository
- Serena GitHub Repository
- bge-large-en-v1.5 Embedding Model
- FAISS: Facebook AI Similarity Search
- Model Context Protocol (MCP)
- Tree-sitter