# I Built Vector-Only Search First. Here's Why I Had to Rewrite It.
2026-02-20
I spent three weeks building a pure vector search for an e-commerce product catalog. Embedded everything with multilingual-e5-large, loaded it into Qdrant, and ran my first test queries.

"Gift for someone who likes cooking" returned kitchen knives and spice sets. Great. "Nike Air Max 90 black" returned Adidas running shoes. "XJ-4520" (an actual product SKU) returned a random kitchen appliance. I had a semantic search engine that understood meaning but couldn't handle the simplest exact-match lookup.

## What Vector Search Is Good At

Embeddings map text into a high-dimensional space where similar meanings cluster together. When a customer types "gift for someone who likes cooking," the embedding lands near kitchen knives, cookbooks, and spice sets, even though none of those products contain the word "gift." For descriptive queries, it works well.

I tested it across five languages, and the model (intfloat/multilingual-e5-large) mapped them all into the same space. A query in Bulgarian against an English catalog returned correct results. No translation layer, no language detection. Just math.

## Where It Fell Apart

**SKUs and model numbers.** "XJ-4520" is a meaningless string to an embedding model. It gets projected somewhere in vector space, and the nearest neighbors are whatever other meaningless strings happen to be nearby. In my tests, SKU lookups almost never returned the right product.

**Brand + attribute combos.** "Nike Air Max 90 black size 42" should return exactly one product. Vector search returned Nike products, but also Adidas and Puma, because they're all semantically "athletic shoes." The exact match was sometimes on page two.

**Numeric filters.** Embeddings don't understand "under $50" or "500ml bottle" as constraints. They understand that 500ml is semantically related to "bottle" and "liquid," but they won't filter by numeric value.

**Short, specific queries.**
When a customer types just "Bosch" with nothing else, vector search returned random power tools. BM25 would return all Bosch products ranked by relevance.

## The Fix: Add BM25 Back

I ended up running BM25 and vector search in parallel against the same catalog, then merging results with normalized scores. BM25 handles exact matches: SKUs, brand names, specific attributes. Vector search handles everything else: descriptive queries, intent-based searches, cross-language matching.

The merge is the interesting part. Both engines return scored results, but the scores aren't comparable: BM25 scores can run from 0 to 25 or higher, while vector similarity sits between 0 and 1. You have to normalize both to the same range before combining.

After normalization, I combine with configurable weights. The right ratio depends on the store. A parts supplier where customers search by part number needs heavier BM25. A fashion store where customers describe what they want needs heavier vector weighting.

## What I Learned

**Don't start with vector-only.** Every tutorial I read at the time said "just embed your documents and search." None of them mentioned that exact-match queries break completely.

**BM25 is underrated.** It's a 30-year-old algorithm and still the best thing we have for exact token matching. Qdrant added built-in BM25 support, which means you can run both in the same database without maintaining Elasticsearch on the side.

**Test with real queries, not demo queries.** My initial tests all used descriptive sentences like "gift for a coffee lover." Those are the queries vector search is designed for. The moment I tested with what actual customers type (brand names, SKUs, "red shoes"), the problems showed up.

**Cross-encoder reranking cleans up the merge.** After combining BM25 and vector results, I run a cross-encoder (ms-marco-MiniLM-L-6-v2) on the top candidates. It compares each result directly against the query and re-sorts. This catches cases where the merge ranked something incorrectly.

## Edge Cases in the Merge

Tutorials stop at "combine the scores."
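To make the combine step concrete, here is a minimal sketch of a weighted merge over normalized scores. The function names, the `0.6` default weight, and the inline min-max helper are illustrative assumptions, not my production code:

```python
def minmax(scores: dict[str, float]) -> dict[str, float]:
    """Rescale raw scores to [0, 1] so the two engines are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: treat every result as equally relevant.
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def merge_results(
    bm25: dict[str, float],
    vector: dict[str, float],
    vector_weight: float = 0.6,
) -> list[tuple[str, float]]:
    """Weighted sum of normalized scores; a product missing from one
    engine's results simply contributes 0 from that engine."""
    b, v = minmax(bm25), minmax(vector)
    combined = {
        pid: (1 - vector_weight) * b.get(pid, 0.0)
        + vector_weight * v.get(pid, 0.0)
        for pid in set(b) | set(v)
    }
    # Highest combined score first.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```

In this sketch, the parts supplier would run with something like `vector_weight=0.3` and the fashion store closer to `0.8`; the point is that the weight is a per-store tuning knob, not a constant.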
In production:

- What if BM25 returns 50 results and vector returns 3? The merge skews heavily toward BM25.
- What if the query is a single word? Vector search works poorly on single tokens.
- What about queries that are half descriptive, half specific? "Red Nike something for running" needs both engines equally.

I handle some of these with query analysis before search. If the query looks like a SKU (alphanumeric, no spaces), I skip vector search entirely. If it's a long descriptive sentence, I weight vector higher. But it's not clean. There's no universal solution. You tune it per store and keep adjusting.

For reference, the min-max normalization both result sets go through before merging:

```python
def normalize_scores(results: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so BM25 and vector results are comparable."""
    if not results:
        return {}
    min_score = min(results.values())
    max_score = max(results.values())
    if max_score == min_score:
        # All scores identical: treat every result as equally relevant.
        return {k: 1.0 for k in results}
    return {
        k: (v - min_score) / (max_score - min_score)
        for k, v in results.items()
    }
```

## Limitations

- I haven't built image search. A customer can't upload a photo and find matching products.
- Typo handling is basic. Heavy misspellings confuse both BM25 and vector search.
- No personalization. Every query is independent; the system doesn't learn from a customer's browsing history.
- Score caching adds complexity. Embeddings are expensive to compute per request, so I cache them, but cache invalidation on product updates is its own problem.

I built this as part of Emporiqa, a chat assistant for e-commerce stores. If you've hit similar search problems, I'd like to hear how you solved them.

The stack:

- Qdrant for both vector and BM25 (single database, no Elasticsearch)
- intfloat/multilingual-e5-large for embeddings (1024 dims, 100+ languages)
- cross-encoder/ms-marco-MiniLM-L-6-v2 for reranking
- Python with async search execution
- LangGraph for orchestrating search as part of a larger chat agent
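The query-analysis routing mentioned above (skip vector for SKU-shaped queries, lean on vector for long descriptive ones) can be sketched roughly like this. The regex, the five-word threshold, and the exact weight values are illustrative guesses, not the rules Emporiqa actually ships:

```python
import re

# SKU-shaped: only letters, digits, and dashes, with no spaces.
SKU_PATTERN = re.compile(r"^[A-Za-z0-9-]+$")


def route_query(query: str) -> dict[str, float]:
    """Return per-engine weights for a query, e.g. {'bm25': 1.0, 'vector': 0.0}."""
    q = query.strip()
    # SKU-shaped queries must also contain a digit, so a bare brand
    # name like "Bosch" doesn't get misrouted as an exact lookup.
    if SKU_PATTERN.fullmatch(q) and any(ch.isdigit() for ch in q):
        return {"bm25": 1.0, "vector": 0.0}  # exact lookup, skip embeddings
    if len(q.split()) >= 5:
        return {"bm25": 0.3, "vector": 0.7}  # long descriptive sentence
    return {"bm25": 0.5, "vector": 0.5}      # short/ambiguous: split evenly
```

Even a crude router like this catches the worst failure mode from the intro, where a SKU query was handed to the embedding model at all.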