Tools: I built a search engine over 1,600+ cybersecurity articles — here's what I actually learned - Analysis

The stack I chose (and why)

What surprised me: the content problem

The retrieval part: what "RAG" actually means at this scale

Honest numbers

What I'd do differently

The takeaway

A year ago I had a problem: 1,600+ cybersecurity articles sitting behind a Go backend, and a search bar that returned garbage.

The standard MySQL LIKE '%keyword%' approach was embarrassing. Searching "pentest Active Directory" returned articles that happened to contain the word "pentest" in one place and "Directory" somewhere else, so totally unrelated content ranked first.

So I rebuilt it from scratch. Here's the honest version of what happened.

The stack I chose (and why)

My backend is Go Fiber. I needed something that:

- Handled typos (users search "kerberosting" not "kerberoasting")
- Returned results in under 50 ms
- Could be self-hosted (no SaaS dependency for a small site)
- Had a decent Go client

I went with Meilisearch. Not because it's technically the best for every use case, but because it hit every point above and took 20 minutes to set up.

The index auto-syncs on startup and updates via CRUD hooks, so every time an article is created, updated or deleted, Meilisearch stays in sync.

```go
// Sync article index on startup
func SyncMeilisearch(client *meilisearch.Client, articles []Article) error {
	index := client.Index("articles")
	docs := make([]map[string]interface{}, len(articles))
	for i, a := range articles {
		docs[i] = map[string]interface{}{
			"id":           a.ID,
			"title":        a.Title,
			"slug":         a.Slug,
			"excerpt":      a.Excerpt,
			"category":     a.Category,
			"tags":         a.Tags,
			"published_at": a.PublishedAt,
		}
	}
	_, err := index.AddDocuments(docs)
	return err
}
```

What surprised me: the content problem

After the first week, I had a painful realization: search quality is mostly a content problem, not a tooling problem.

Meilisearch was doing its job. But my articles had inconsistent metadata. Some had rich excerpts, others had none. Tags were applied loosely. Category assignments were sometimes wrong.

Three things I fixed that made the biggest difference:

1. Enforce excerpt quality at write time

I added validation that rejects articles without a proper excerpt (minimum length, no boilerplate phrases). This is boring to implement and nobody wants to do it. Do it anyway.

2. Category filtering beats keyword search

For a domain-specific corpus, letting users pre-filter by category (news / guide / analysis / checklist) reduces the search space dramatically. Precision goes up even when relevance ranking isn't perfect.

```
GET /api/search?q=kerberoasting&cat=guide&limit=10
```

3. Add a fallback

Meilisearch goes down. Rarely, but it does. I added a MySQL LIKE fallback that kicks in automatically when Meilisearch returns an error or no results:

```go
results, err := SearchMeilisearch(query, filters)
if err != nil || len(results) == 0 {
	results, err = SearchMySQL(query, filters) // fallback
}
```

Users never noticed the degradation. That's the goal.

The retrieval part: what "RAG" actually means at this scale

I see a lot of articles about building RAG systems with vector embeddings, chunking strategies, cosine similarity, etc. That's the right approach when your questions are complex and open-ended. For a domain-specific article corpus with structured metadata, it's overkill.

What I actually needed was:

- Fast keyword + semantic-ish retrieval (Meilisearch handles this with its ranking rules)
- A way to surface the right article given a user query
- Context injection into LLM prompts when generating summaries or related content

The architecture ended up being:

```
User query
  → Meilisearch (retrieval, ~10-30ms)
  → Top 3-5 articles (slug + title + excerpt)
  → LLM prompt context
  → Generated response / enriched content
```

No vector DB. No embeddings pipeline. No chunking headaches. For 1,600 articles averaging 2,000 words each, this works well.

Honest numbers

The 12 MB index for 1,600+ articles is worth emphasizing: Meilisearch is lean.

What I'd do differently

1. Index full content, not just excerpts

I indexed titles, slugs, excerpts and tags, but not the full article body. This means that searching for a technical term that appears only deep in an article's body returns nothing. I'm fixing this progressively.

2. Add synonyms from day one

Meilisearch has a synonyms API. I should have built a synonyms list for cybersecurity terminology immediately:

```json
{
  "AD": ["Active Directory"],
  "pentest": ["penetration test", "intrusion test"],
  "MFA": ["multi-factor authentication", "2FA"]
}
```

I added these late, after noticing obvious query misses.

3. Log every failed search

The most valuable dataset I have is the list of searches that returned zero results. It tells you exactly what content you're missing and what synonyms to add. I started logging these to a search_misses table. I should have done it from the start.

The takeaway

If you're building a content-heavy site and want good search without a massive infrastructure investment:

- Meilisearch is genuinely good and genuinely easy
- Content quality beats algorithmic cleverness every time
- For domain-specific retrieval, you don't need vector embeddings unless your queries are conversational/open-ended
- Log your zero-result searches: it's free product research

The full search endpoint with category/difficulty/type filters, pagination and Meilisearch/MySQL fallback is about 80 lines of Go. Happy to share if useful.

I run AYI NEDJIMI Consultants, a cybersecurity consulting firm. The corpus covers pentesting, Active Directory, cloud security and compliance, including 17 free security hardening checklists (PDF + Excel).
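
One detail about the synonyms list under "What I'd do differently": as far as I know, Meilisearch treats each key in the synonyms object as a one-way association, so `"AD": ["Active Directory"]` helps a search for "AD" but not a search for "Active Directory". The sketch below expands a one-way map into a mutual one before it is sent to the synonyms setting. `MutualSynonyms` is a hypothetical helper, not part of any Meilisearch client.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// MutualSynonyms expands a one-way synonyms map so that every term in a
// group points at every other term in the same group.
func MutualSynonyms(oneWay map[string][]string) map[string][]string {
	out := make(map[string][]string)
	for term, alts := range oneWay {
		group := append([]string{term}, alts...)
		for _, a := range group {
			for _, b := range group {
				if a != b && !contains(out[a], b) {
					out[a] = append(out[a], b)
				}
			}
		}
	}
	for k := range out {
		sort.Strings(out[k]) // keep the output deterministic
	}
	return out
}

// contains reports whether s is already in list.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

func main() {
	syn := MutualSynonyms(map[string][]string{
		"AD":      {"Active Directory"},
		"pentest": {"penetration test", "intrusion test"},
		"MFA":     {"multi-factor authentication", "2FA"},
	})
	body, _ := json.MarshalIndent(syn, "", "  ")
	fmt.Println(string(body)) // JSON body for the synonyms setting
}
```

Whether full mutual expansion is wanted depends on the term. For an ambiguous abbreviation, a one-way entry may be the safer choice.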
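
The zero-result logging from "Log every failed search" can be sketched roughly as follows. Only the `search_misses` table name comes from the post. The `MissStore` interface, the in-memory stand-in, and the normalization step are assumptions, chosen so the example runs without a database.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// MissStore is a hypothetical abstraction over the search_misses table.
type MissStore interface {
	RecordMiss(query string)
}

// memoryStore is an in-memory stand-in for the real table.
type memoryStore struct {
	mu     sync.Mutex
	counts map[string]int
}

func newMemoryStore() *memoryStore {
	return &memoryStore{counts: make(map[string]int)}
}

func (m *memoryStore) RecordMiss(query string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.counts[query]++
}

// normalize lowercases the query and collapses whitespace so that
// "  Golden  Ticket " and "golden ticket" count as the same miss.
func normalize(q string) string {
	return strings.Join(strings.Fields(strings.ToLower(q)), " ")
}

// LogIfMiss records the query when a search returned no results.
func LogIfMiss(store MissStore, query string, resultCount int) {
	if resultCount > 0 {
		return
	}
	if q := normalize(query); q != "" {
		store.RecordMiss(q)
	}
}

func main() {
	store := newMemoryStore()
	LogIfMiss(store, "  Golden  Ticket ", 0)
	LogIfMiss(store, "golden ticket", 0)
	LogIfMiss(store, "kerberoasting", 7) // had results, so not logged
	fmt.Println(store.counts)
}
```

Normalizing before storing keeps the miss counts meaningful, so the most frequent missing topics and synonym candidates rise to the top. In the search handler this would be called after the search, and any fallback, has run.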