Tools: When vector search isn't enough: hybrid graph+vector queries in VelesQL - Full Analysis

"Find me the documentation for the function that handles authentication."

Sounds simple. Embed the question, run a similarity search, return the top results. Except here is what pure vector search actually returns: four passages, scored from 0.82 down to 0.68, covering JWT token expiry, the login() function, the OAuth2 flow, and bcrypt password hashing.

All four results are about authentication. All four are semantically relevant. But none of them is the documentation for the function that actually handles it. The vector search found similar text, not the relationship between a function and its documentation.

This is the fundamental limitation of pure vector search: it matches meaning, not structure.

Why vectors alone fail on structured data

Vector embeddings capture semantic similarity. "Dog" is close to "puppy." "Authentication" is close to "login." That works great for finding documents about a topic.

But real-world data has structure. A codebase has files that contain functions. Functions call other functions. Documentation pages describe specific modules. Products belong to categories and have reviews written by users.

When you ask "find the docs for the auth handler," you need two things: semantic similarity, to understand what "auth handler" means, and graph traversal, to follow the DOCUMENTS relationship from the function to its docs.

Pure vector search gives you the first but not the second. You get four results that talk about authentication, but you miss the one document that is structurally connected to the function you care about.

Modeling the problem as a graph

Let's build a concrete example: a knowledge base about a codebase. We have three types of entities (functions, documentation pages, and modules) connected by three types of relationships (DOCUMENTS, BELONGS_TO, CALLS). The setup uses one vector collection and one graph collection, both sized for the 384-dimensional embeddings produced by all-MiniLM-L6-v2.

Now let's populate it with entities. Each entity gets both a vector embedding (for similarity search) and a node in the graph (for traversal).

We now have 8 entities with embeddings and a graph of 9 relationships: four DOCUMENTS edges, three BELONGS_TO edges, and two CALLS edges. This is a miniature version of what a real codebase knowledge graph looks like.

Pure vector search: the wrong answer

Let's search for documentation about the authentication handler with a plain nearest-neighbor query over all entities, limited to five results.

The correct answer (auth-handler-guide, which DOCUMENTS the handle_auth function) happens to rank first here, with a score of 0.724. But notice the problem: the vector search has no idea why that document is relevant. It just matched on text similarity. The OAuth2 doc ranks third, even though it documents the auth module, not the handle_auth function. And login ranks fourth despite being a different function entirely.

In a larger dataset with hundreds of entities, that lucky first-place ranking can disappear fast.

Hybrid graph+vector: the right answer

Now let's combine both signals. First, use vector search to find the function. Then, walk the graph to find its documentation.

One result. The right result. No ambiguity.

The vector search, restricted to entities of type function, found handle_auth (the function semantically closest to "function that handles authentication"). The graph traversal followed the incoming DOCUMENTS edge to the exact documentation page, auth-handler-guide.

Going deeper: multi-hop traversal

The graph lets you answer questions that pure vector search cannot answer on its own. For example: "What functions does login call, and what are their docs?"

Two hops. Starting from the login function (located by a vector query for "login function" pinned with a name filter), the system walked its CALLS edges to handle_auth and hash_password, then walked the DOCUMENTS edges for each callee to reach auth-handler-guide and password-security. This is the kind of structural reasoning that pure vector search simply cannot do.

Side-by-side comparison

On the same question, pure vector search returned five candidates ranked only by text similarity, while the hybrid query returned a single document reached through an explicit DOCUMENTS edge.

When to use what: a decision guide

Pure vector search works well when your data is flat and relationships between items don't matter for retrieval quality. Graph+vector hybrid is the right choice when your data has meaningful relationships and the answer depends on following connections, not just matching text.

The rule of thumb: if you catch yourself writing post-processing code to filter or re-rank vector results based on relationships, you need a graph.

Getting started

The full example from this article runs in under 2 seconds on a laptop. No Docker, no API keys, no cloud. Install the velesdb and sentence-transformers packages with pip and run it.

VelesDB is a source-available (Elastic License 2.0) embedded database that combines vector, graph, and columnar storage in a single ~6MB binary.

GitHub: github.com/cyberlife-coder/VelesDB
Docs: velesdb.com/en
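To make the two-step pattern from "Hybrid graph+vector: the right answer" concrete without installing anything, here is a minimal sketch in plain Python. It is not VelesDB code: the 3-dimensional vectors are invented for illustration (they are not real embeddings), plain dicts and tuples stand in for the vector and graph collections, and the helper names `nearest`, `docs_for`, and `hybrid_lookup` are hypothetical.

```python
import math

# Toy 3-dimensional vectors, invented for illustration (not real embeddings).
ENTITIES = {
    1: {"name": "handle_auth", "type": "function", "vec": [0.9, 0.1, 0.0]},
    2: {"name": "hash_password", "type": "function", "vec": [0.1, 0.9, 0.0]},
    3: {"name": "login", "type": "function", "vec": [0.6, 0.3, 0.4]},
    4: {"name": "auth-handler-guide", "type": "doc", "vec": [0.8, 0.1, 0.2]},
    6: {"name": "password-security", "type": "doc", "vec": [0.1, 0.8, 0.2]},
}

# (source, label, target) triples, a subset of the article's edge list.
EDGES = [
    (4, "DOCUMENTS", 1),
    (6, "DOCUMENTS", 2),
    (3, "CALLS", 1),
    (3, "CALLS", 2),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vec, entity_type):
    """Step 1: id of the most similar entity of the given type."""
    candidates = [i for i, e in ENTITIES.items() if e["type"] == entity_type]
    return max(candidates, key=lambda i: cosine(query_vec, ENTITIES[i]["vec"]))


def docs_for(node_id):
    """Step 2: names of entities with a DOCUMENTS edge pointing at node_id."""
    return [
        ENTITIES[source]["name"]
        for source, label, target in EDGES
        if label == "DOCUMENTS" and target == node_id
    ]


def hybrid_lookup(query_vec):
    """Vector search picks the function; the graph picks its documentation."""
    func_id = nearest(query_vec, "function")
    return ENTITIES[func_id]["name"], docs_for(func_id)


query = [1.0, 0.0, 0.1]  # stands in for the embedded question
print(hybrid_lookup(query))  # -> ('handle_auth', ['auth-handler-guide'])
```

The point of the split is that similarity only has to get the function right; once it does, the documentation comes from an explicit edge rather than from a second ranking.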

Previous article in this series: VelesQL: one query language for vectors, text, and filters

What's your experience with GraphRAG? Are you gluing together separate vector and graph databases, or have you found a single-engine approach that works? I'd love to hear what patterns are working in production.

What pure vector search returns for the opening question:

    [0.82] "Authentication is handled via JWT tokens with a 24h expiry."
    [0.79] "The login() function validates user credentials against the database."
    [0.71] "OAuth2 flow documentation for third-party integrations."
    [0.68] "Password hashing uses bcrypt with a cost factor of 12."

Setup (embedding model, vector collection, graph collection):

    import velesdb
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    db = velesdb.Database("./codebase_kb")

    # Vector collection for semantic search
    collection = db.create_collection("entities", dimension=384, metric="cosine")

    # Graph collection for relationships
    graph = db.create_graph_collection("codebase_graph", dimension=384)

Populating entities, graph nodes, and relationships:

    entities = [
        {"id": 1, "type": "function", "name": "handle_auth",
         "description": "Validates JWT tokens and returns user session"},
        {"id": 2, "type": "function", "name": "hash_password",
         "description": "Hashes passwords using bcrypt with configurable cost factor"},
        {"id": 3, "type": "function", "name": "login",
         "description": "Authenticates user credentials and issues JWT token"},
        {"id": 4, "type": "doc", "name": "auth-handler-guide",
         "description": "Complete guide to the authentication handler: configuration, middleware setup, and error codes"},
        {"id": 5, "type": "doc", "name": "oauth2-integration",
         "description": "OAuth2 flow documentation for third-party integrations"},
        {"id": 6, "type": "doc", "name": "password-security",
         "description": "Password hashing implementation details and security considerations"},
        {"id": 7, "type": "module", "name": "auth",
         "description": "Authentication and authorization module"},
        {"id": 8, "type": "doc", "name": "login-endpoint",
         "description": "API reference for the login endpoint with request and response examples"},
    ]

    # Store vectors
    points = []
    for e in entities:
        embedding = model.encode(e["description"]).tolist()
        payload = {"type": e["type"], "name": e["name"], "description": e["description"]}
        points.append({"id": e["id"], "vector": embedding, "payload": payload})
    collection.upsert(points)

    # Store graph nodes
    for e in entities:
        graph.store_node_payload(e["id"], {
            "name": e["name"],
            "type": e["type"],
        })

    # Add relationships
    edges = [
        {"id": 1, "source": 4, "target": 1, "label": "DOCUMENTS", "properties": {"scope": "full"}},
        {"id": 2, "source": 6, "target": 2, "label": "DOCUMENTS", "properties": {"scope": "full"}},
        {"id": 3, "source": 8, "target": 3, "label": "DOCUMENTS", "properties": {"scope": "api"}},
        {"id": 4, "source": 5, "target": 7, "label": "DOCUMENTS", "properties": {"scope": "integration"}},
        {"id": 5, "source": 1, "target": 7, "label": "BELONGS_TO", "properties": {}},
        {"id": 6, "source": 2, "target": 7, "label": "BELONGS_TO", "properties": {}},
        {"id": 7, "source": 3, "target": 7, "label": "BELONGS_TO", "properties": {}},
        {"id": 8, "source": 3, "target": 1, "label": "CALLS", "properties": {"context": "during login flow"}},
        {"id": 9, "source": 3, "target": 2, "label": "CALLS", "properties": {"context": "password verification"}},
    ]
    for edge in edges:
        graph.add_edge(edge)

Pure vector search query:

    query = "documentation for the function that handles authentication"
    query_vec = model.encode(query).tolist()

    results = collection.query(
        "SELECT * FROM entities WHERE vector NEAR $v LIMIT 5",
        params={"v": query_vec},
    )

    print("=== Pure vector search ===")
    for r in results:
        p = r["bindings"]
        print(f"  [{r['fused_score']:.3f}] ({p['type']}) {p['name']}: {p['description'][:60]}")

Output:

    === Pure vector search ===
      [0.724] (doc) auth-handler-guide: Complete guide to the authentication handler: configuration
      [0.691] (function) handle_auth: Validates JWT tokens and returns user session
      [0.654] (doc) oauth2-integration: OAuth2 flow documentation for third-party integrations
      [0.641] (function) login: Authenticates user credentials and issues JWT token
      [0.589] (doc) login-endpoint: API reference for the login endpoint with request and res

Hybrid query (vector search for the function, then graph traversal to its docs):

    # Step 1: find the function via vector search (filter by type)
    func_results = collection.query(
        "SELECT * FROM entities WHERE vector NEAR $v AND type = 'function' LIMIT 1",
        params={"v": query_vec},
    )

    if func_results:
        func = func_results[0]
        func_id = func["id"]
        func_name = func["bindings"]["name"]
        print(f"Found function: {func_name} (score={func['fused_score']:.3f})")

        # Step 2: traverse graph to find documentation
        incoming = graph.get_incoming(func_id)
        docs = [e for e in incoming if e["label"] == "DOCUMENTS"]

        print("\nDocumentation linked via graph:")
        for edge in docs:
            doc_node = graph.get_node_payload(edge["source"])
            print(f"  -> {doc_node['name']} (relationship: {edge['label']}, scope: {edge['properties']['scope']})")

Output:

    Found function: handle_auth (score=0.691)

    Documentation linked via graph:
      -> auth-handler-guide (relationship: DOCUMENTS, scope: full)

Multi-hop traversal from login():

    # Find login function
    login_results = collection.query(
        "SELECT * FROM entities WHERE vector NEAR $v AND name = 'login' LIMIT 1",
        params={"v": model.encode("login function").tolist()},
    )
    login_id = login_results[0]["id"]

    # Hop 1: what does login() call?
    calls = graph.get_outgoing(login_id)
    call_edges = [e for e in calls if e["label"] == "CALLS"]

    print("login() calls:")
    for edge in call_edges:
        target = graph.get_node_payload(edge["target"])
        print(f"  -> {target['name']} ({edge['properties']['context']})")

        # Hop 2: get documentation for each called function
        incoming = graph.get_incoming(edge["target"])
        doc_edges = [e for e in incoming if e["label"] == "DOCUMENTS"]
        for doc_edge in doc_edges:
            doc = graph.get_node_payload(doc_edge["source"])
            print(f"     docs: {doc['name']}")

Output:

    login() calls:
      -> handle_auth (during login flow)
         docs: auth-handler-guide
      -> hash_password (password verification)
         docs: password-security

Installation:

    pip install velesdb sentence-transformers

To answer "find the docs for the auth handler," you need:

- Semantic similarity to understand what "auth handler" means
- Graph traversal to follow the DOCUMENTS relationship from the function to its docs

Pure vector search works well when:

- Your data is flat (documents, paragraphs, FAQ entries)
- Relationships between items don't matter for retrieval quality
- You need speed over precision (HNSW search is approximately O(log n) per query)

Graph+vector hybrid is the right choice when:

- Your data has meaningful relationships (code dependencies, org charts, knowledge graphs)
- The answer depends on following connections, not just matching text
- You need to combine "find something similar" with "follow its relationships"
- You are building a GraphRAG pipeline that reasons over structured knowledge
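Returning to the multi-hop traversal: the login() listing writes each hop out by hand, which gets repetitive once questions need three or more hops. One way to generalize it, sketched here in plain Python rather than with the VelesDB API, is to describe a question as a path of (label, direction) pairs. The edge triples are copied from the article's edge list; the helpers `step` and `follow` and the "out"/"in" direction strings are hypothetical names introduced for this sketch.

```python
# (source, label, target) triples copied from the article's edge list.
EDGES = [
    (4, "DOCUMENTS", 1), (6, "DOCUMENTS", 2),
    (8, "DOCUMENTS", 3), (5, "DOCUMENTS", 7),
    (1, "BELONGS_TO", 7), (2, "BELONGS_TO", 7), (3, "BELONGS_TO", 7),
    (3, "CALLS", 1), (3, "CALLS", 2),
]

NAMES = {
    1: "handle_auth", 2: "hash_password", 3: "login",
    4: "auth-handler-guide", 5: "oauth2-integration",
    6: "password-security", 7: "auth", 8: "login-endpoint",
}


def step(nodes, label, direction):
    """Advance one hop from every node along edges carrying `label`.

    direction "out" follows source -> target; "in" follows target -> source.
    Order is preserved and duplicates are dropped.
    """
    reached = []
    for node in nodes:
        for source, edge_label, target in EDGES:
            if edge_label != label:
                continue
            if direction == "out" and source == node:
                nxt = target
            elif direction == "in" and target == node:
                nxt = source
            else:
                continue
            if nxt not in reached:
                reached.append(nxt)
    return reached


def follow(start, path):
    """Apply each (label, direction) hop in turn and return the final names."""
    nodes = [start]
    for label, direction in path:
        nodes = step(nodes, label, direction)
    return [NAMES[n] for n in nodes]


# "What functions does login call, and what are their docs?"
print(follow(3, [("CALLS", "out"), ("DOCUMENTS", "in")]))
# -> ['auth-handler-guide', 'password-security']

# "Which docs cover the module that handle_auth belongs to?"
print(follow(1, [("BELONGS_TO", "out"), ("DOCUMENTS", "in")]))
# -> ['oauth2-integration']
```

The second call shows why the OAuth2 doc is related to handle_auth only at module level: it is two hops away through BELONGS_TO, not one hop away through DOCUMENTS.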