Tools: Your Chatbot Recommends Products You Don't Sell
2026-02-26
admin
I was testing my product agent on a demo store that sells electronics. I typed "do you have leather jackets?" The agent searched, got zero results, and instead of admitting the store doesn't carry jackets, it generated product recommendations from its training data: product names, prices, and descriptions that weren't in the store.

This is what happens when a ReAct agent with a search tool has no concept of what the store actually sells. It searches, gets nothing useful, and fills the gap with whatever the LLM thinks a helpful response should contain.

I tried prompt engineering first. "Only recommend products from search results." "Never invent products." "If search returns no results, say so." It helped with the simple cases but broke down on subtler ones: the agent would find vaguely related products and stretch the recommendation to fit the query. Prompt engineering wasn't going to fix this. The agent needed a model of the store's inventory.

## The Agent Has No Inventory Model

A typical product search agent works like this: the customer asks a question, the agent decides to search, the search returns results (or doesn't), and the agent generates a response. The agent knows nothing about the store until after it searches. If the customer asks about a category that doesn't exist, the agent can't know that in advance. It searches, gets poor results, and has to improvise a response. That's where hallucinations come from.

## Check the Catalog First

I added a preprocessing step between the customer message and the agent. Before the agent runs, the preprocessor loads the store's actual catalog metadata and analyzes the query against it. The context comes from the database, cached per store, as the `StoreContext` model shown at the end of this post. The preprocessor gives that context to a fast LLM along with the customer's query, and the LLM returns a structured response with one of four actions: no_match (the store doesn't carry this; respond directly), show_products (a small category of 10 or fewer products; fetch and display them), qualify (an ambiguous query; ask a clarifying question), or search (pass the query to the full agent with pre-extracted filters).

For "leather jackets" on an electronics store, the preprocessor returns no_match with a response mentioning what the store does carry. The agent never runs, and there's nothing to hallucinate.
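The structured response can be sketched as a tagged record over the four actions. The real pipeline uses a Pydantic model passed as response_model; this self-contained stdlib sketch assumes the field names:

```python
from dataclasses import dataclass, field
from typing import Literal, Optional

# Routing actions the preprocessor can choose between.
Action = Literal["no_match", "show_products", "qualify", "search"]

@dataclass
class QueryAnalysis:
    action: Action
    confidence: float                      # how sure the LLM is about the routing
    direct_response: Optional[str] = None  # used by no_match and qualify
    category: Optional[str] = None         # used by show_products
    filters: dict = field(default_factory=dict)  # used by search

# What the preprocessor might return for an impossible query:
analysis = QueryAnalysis(
    action="no_match",
    confidence=0.95,
    direct_response="We don't carry leather jackets, but we do have headphones and laptops.",
)
```

Each action only populates the fields it needs, so downstream code can branch on `action` without re-parsing free text.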
## The Four Paths

no_match catches impossible queries early. "Do you have ski equipment?" on a bookstore: the preprocessor sees the categories (Fiction, Non-fiction, Children's, Academic), confirms ski equipment isn't among them, and says so. That saves an agent invocation and prevents the agent from trying to stretch book recommendations into ski gear.

show_products handles small categories directly. "What teas do you have?" on a store with 6 teas: instead of running the full search pipeline (embed the query, retrieve candidates, rerank, invoke the agent), the preprocessor fetches all 6 products and presents them. Less latency, and the customer sees everything without the agent deciding what to show.

qualify fires when the query is genuinely ambiguous. "I need a gift" could match every category, so the preprocessor asks what kind of gift, mentioning the actual categories available. But only when the ambiguity is real: "I need headphones" goes straight to search even on a store with 200 headphones, because the intent is clear.

search is the default. The query goes to the full agent with hybrid search, reranking, and tool use, but the preprocessor passes along structured filters it already extracted: "Wireless Sony headphones under $200" becomes a category, a brand, a price cap, and an attribute (the JSON example at the end of this post shows the exact shape). The categories and brands are matched against real store data; the preprocessor won't extract "Nike" as a brand if the store doesn't carry Nike. The agent starts with filters grounded in real inventory.

## The Tradeoff: An Extra LLM Call

Every product query now makes at least two LLM calls: the preprocessor, then the agent. The preprocessor uses a fast, cheap model (GPT-4o-mini or DeepSeek). It costs fractions of a cent and adds under a second of latency.

I considered doing this without an LLM: pattern matching on category names, fuzzy string matching, keyword overlap. But the mapping requires understanding that "phones" means Smartphones, "TVs" means Televisions, and "something for the kitchen" means Kitchen Appliances. String comparison can't do this reliably.
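Wired together, the four paths become a small dispatch on the analysis result. A minimal sketch; the handlers here are stubs, and all names are assumptions:

```python
from collections import namedtuple

# Minimal stand-in for the preprocessor's structured output.
Analysis = namedtuple("Analysis", "action direct_response category filters")

def route(analysis, handlers):
    """Dispatch the preprocessor's decision to the matching handler."""
    return handlers[analysis.action](analysis)

# Stub handlers; real ones would render products or invoke the agent.
handlers = {
    "no_match":      lambda a: a.direct_response,     # answer directly, agent never runs
    "show_products": lambda a: ("SHOW", a.category),  # fetch and display the category
    "qualify":       lambda a: a.direct_response,     # ask the clarifying question
    "search":        lambda a: ("AGENT", a.filters),  # run the full agent with filters
}

result = route(Analysis("no_match", "We don't carry that.", None, {}), handlers)
```

Only the `search` branch pays for the full pipeline; the other three short-circuit before the agent ever starts.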
The structured output (QueryAnalysis as a Pydantic model with response_model) means the LLM returns typed data, not free text. Four possible actions, each with specific fields. No parsing ambiguity.

## What Still Goes Wrong

The preprocessor depends on the LLM's judgment for category matching. "Winter coats" on a store with an "Outerwear" category requires the LLM to know that winter coats are a subset of outerwear. It usually gets this right. Not always.

Conversation context can get lost. If the customer asked about headphones three messages ago and now says "what about the wireless ones?", the preprocessor needs the conversation summary to understand this is still about headphones. The summary is passed in, but summarization sometimes drops details.

The show_products path doesn't handle variation-heavy categories well. A category with 8 products that are all color variants of the same item shows 8 near-identical entries. I haven't solved this.

And the preprocessor can be too aggressive with no_match. If a store sells "Outdoor Gear" and the customer asks for "camping equipment," the LLM might not connect the two. The fallback is to route to search when confidence is low, but some edge cases still produce unhelpful "we don't carry that" responses for products the store actually has under a different name.

## The Result

Before the preprocessor, the agent searched for everything, hallucinated on misses, and had no sense of what the store carried. After: impossible queries get a direct answer immediately. Small categories get shown directly. Ambiguous queries get a focused question. The agent only runs when there's something worth searching for, and it starts with pre-extracted filters instead of raw text.

The preprocessor itself is about 80 lines of logic plus another 80 for the Pydantic models. The prompt template is 50 lines. It added about 200 lines of code to the pipeline and cut hallucination on out-of-stock and wrong-category queries significantly.
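The low-confidence fallback described above can sit as a one-line guard between analysis and routing. A sketch; the 0.7 threshold and field names are assumptions, not values from the article:

```python
from collections import namedtuple

Analysis = namedtuple("Analysis", "action confidence")

def with_fallback(analysis, threshold=0.7):
    """Demote a shaky no_match to search so the agent can still try.

    A confident refusal stands; a borderline one falls through to the
    full search pipeline instead of telling the customer "we don't carry that".
    """
    if analysis.action == "no_match" and analysis.confidence < threshold:
        return analysis._replace(action="search")
    return analysis
```

This doesn't fix the "Outdoor Gear" vs "camping equipment" miss when the LLM is confidently wrong, but it catches the cases where the model itself signals doubt.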
I wrote about the hybrid search pipeline and the parallel agent system in separate posts. The preprocessor sits before both, deciding what kind of handling each query needs. The code is part of Emporiqa, a chat assistant for e-commerce stores. Free sandbox if you want to try it.

For reference, the models and prompt discussed above:
The `StoreContext` model, loaded from the database and cached per store:

```python
from pydantic import BaseModel

class StoreContext(BaseModel):
    categories: list[str]             # ["Smartphones", "Laptops", "Headphones"]
    brands: list[str]                 # ["Apple", "Samsung", "Sony"]
    total_products: int               # 847
    category_counts: dict[str, int]   # {"Smartphones": 234, "Laptops": 156, ...}
```
The prompt the preprocessor sends to the fast LLM:

```
Store categories (product count): Smartphones: 234, Laptops: 156, Headphones: 89, ...
Store brands: Apple, Samsung, Sony, ...
Query: "do you have leather jackets?"
Context: First message
```
The `ExtractedFilters` model the search path receives (the `AttributeFilter` definition is reconstructed from the inline comment):

```python
from pydantic import BaseModel

class AttributeFilter(BaseModel):
    name: str
    value: str

class ExtractedFilters(BaseModel):
    categories: list[str] | None       # Matched to real store categories
    brand: list[str] | None
    price_min: float | None
    price_max: float | None
    attributes: list[AttributeFilter]  # [{"name": "color", "value": "red"}]
```
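The grounding behavior (never extracting "Nike" for a store that doesn't carry Nike) can also be enforced after the LLM call as a validation pass. A hedged sketch over plain dicts; the function name is an assumption:

```python
def ground_filters(filters: dict, store_categories: list[str], store_brands: list[str]) -> dict:
    """Drop any extracted category or brand the store doesn't actually carry."""
    grounded = dict(filters)
    if grounded.get("categories"):
        grounded["categories"] = [c for c in grounded["categories"] if c in store_categories]
    if grounded.get("brand"):
        grounded["brand"] = [b for b in grounded["brand"] if b in store_brands]
    return grounded

out = ground_filters(
    {"categories": ["Headphones"], "brand": ["Nike", "Sony"], "price_max": 200},
    store_categories=["Smartphones", "Headphones"],
    store_brands=["Apple", "Sony"],
)
```

Belt-and-suspenders: even if the LLM ignores the prompt's brand list, a hallucinated filter can't reach the search index.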
The extracted filters for "Wireless Sony headphones under $200":

```json
{
  "categories": ["Headphones"],
  "brand": ["Sony"],
  "price_max": 200,
  "attributes": [{"name": "connectivity", "value": "wireless"}]
}
```