
Stop Stitching Your RAG Stack: Why We Built seekdb

2026-03-04

Source: Dev.to

1. The Stitching Tax Is Real
2. One Engine Instead of Three
3. The Bottleneck Isn't Scale, It's Fragmentation
4. What You Stop Paying
5. Stop Stitching, Start Building

Hi, we're the seekdb team. We're building seekdb, an open-source AI-native hybrid search database. This is our first post here; in the ones that follow, we'll share our story with seekdb.

If your RAG setup looks like this (MySQL for metadata, a vector DB for embeddings, Elasticsearch for full-text, and hundreds of lines of glue code to fuse multi-source retrieval), you're paying the "stitching tax." Industry surveys suggest that a large share of production AI applications still run on multiple databases, with relational, vector, and full-text data in separate systems, because of data diversity and legacy architecture. That pattern remains common even in large enterprises. This article is about why we built seekdb, and what you actually get when you stop stitching.

RAG, semantic search, and agents all need the same kinds of data: who, what, where (relational), what was said or written (full-text), and what it means semantically (vectors). In practice, that means MySQL/PostgreSQL for business data, Milvus/Pinecone/Qdrant for vectors, Elasticsearch for full-text, and a thick layer of application code: multi-path queries, normalization, score fusion, reranking.

The result is obvious: three systems, two sync pipelines, and a pile of glue code. The business DB updates today; the vector store might still hold yesterday's data. You run three backup strategies, three monitoring stacks, three upgrade cycles. Every feature change can touch "DB A + DB B + app logic." This isn't a technology choice problem; it's an architecture tax. Every new AI capability adds another layer.

So we have a simple stance: AI apps shouldn't start by stitching databases. If one engine can handle relational, vector, full-text, and JSON, use one. If one query can express "vector similarity + keywords + filters," don't assemble it in the app.
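To make "assembling it in the app" concrete, here is a minimal sketch of the kind of fusion glue a stitched stack tends to carry. It assumes reciprocal rank fusion (RRF) as the fusion method, which is one common choice and not something this article prescribes. The result lists, the `allowed_ids` set, and the function name are hypothetical stand-ins for responses from a vector DB, a full-text engine, and a relational filter.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in;
    k=60 is a conventional default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical responses from two separate systems (best match first).
vector_hits = ["doc3", "doc1", "doc7"]      # from the vector DB
fulltext_hits = ["doc1", "doc9", "doc3"]    # from the full-text engine

# Hypothetical IDs that pass a WHERE clause in the relational DB.
allowed_ids = {"doc1", "doc7", "doc9"}

# The glue: fuse the two rankings, then apply the relational filter in memory.
fused = [
    doc_id
    for doc_id in reciprocal_rank_fusion([vector_hits, fulltext_hits])
    if doc_id in allowed_ids
]
print(fused)
```

Even this toy version already needs three round trips and an in-memory merge, and it silently depends on all three systems agreeing on document IDs and freshness.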
We're not saying distributed or multi-cluster is useless. We're saying that for most teams, getting "no stitching" right matters more than stacking more systems.

seekdb is an open-source, AI-native hybrid search database under Apache 2.0: commercial use, modification, distribution, and forking are all allowed. No vendor lock-in. Code and design live on GitHub; you can audit it, change it, and deploy it yourself.

In one sentence: relational, vector, full-text, JSON, and GIS live in one store, one transaction model, one write path, so scalar and vector indexes update together. No bugs where the business DB is updated but the vector DB hasn't caught up.

The audience is clear: teams tired of "multi-DB + glue," and people building RAG from scratch who don't want to stitch three systems on day one. You want one database, one query interface, one ops stack. seekdb is built for that.

For a huge slice of AI use cases, the bottleneck isn't "data won't fit on one machine." It's too many systems, too many interfaces, and iteration that is too slow. We tackled that first: one process, one API, one SQL for hybrid search and in-database AI. When you truly need cross-DC or petabyte scale, you can add distribution then. Many teams never get there; they're already slowed down by stitching.

So seekdb's "from complex to simple" isn't about removing features; it's about moving "multi-system + glue" into a single engine. The complexity is still there; it's just inside the database instead of in your code and runbooks. We're not saying stitching is always wrong. At large scale with strong teams, a multi-system setup can work. But for teams that want to ship fast, want consistency, and want fewer footguns, "no stitching" is often the better first step.

Stop stitching, start building. We built seekdb and made it fully open source to give teams that don't want to start from "stitching the databases" an auditable, modifiable, self-hostable option: one engine, one SQL, one ops stack.
Get RAG and semantic search running first; scale later. Open source means the docs and code are the full picture, with no black box. Hit an issue? Open an issue, join the discussion, or send a PR. We iterate with the community.

We would also love to hear your stories, insights, and perspectives on the future of AI and databases. Open source is more than a development model; it’s a mindset. That’s why we choose to build openly, together with the community. Because we truly believe: Great things start when people talk, share, and create freely. And that’s where the magic begins.

From Zero To seekdb · Article 1

Comment from the author:

Really excited to share our first post on DEV about seekdb! As I shared in the article: Open source is more than a development model; it’s a mindset. That’s why we choose to build openly, together with the community. Because we truly believe: Great things start when people talk, share, and create freely. And that’s where the magic begins.

We believe great tools shouldn’t be locked behind closed walls. We believe in transparency, auditability, and the freedom to run, modify, and own your data stack. That’s why we made seekdb fully open source: One engine. One SQL. One ops stack. No glue code. No sync lag. No complexity.

This is just the beginning. We don’t build for the community; we build with the community. If you care about open source, AI databases, or building cleaner, more reliable infrastructure, this one’s for you. Let’s grow the future of data together ❤️.
- Run one SQL statement for vector similarity, full-text match, and relational filters, with no querying three systems and merging in memory.
- Run embedding, reranking, and LLM inference inside the database, so RAG's "retrieve → rerank → generate" has fewer hops and simpler ops.
- Deploy as embedded (a single import in Python), single-node server, or client/server; 1 CPU core and 2 GB of memory (1C2G) is enough, and it plays well with existing MySQL tooling.

- Repo: github.com/oceanbase/seekdb (Apache 2.0; Stars, Issues, PRs welcome)
- Docs: seekdb documentation
- Discord: https://discord.com/channels/1331061822945624085/1331061823465590805
- Press: OceanBase Releases seekdb (MarkTechPost)
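The idea of one SQL statement covering vector similarity, keyword match, and relational filters can be sketched without any seekdb-specific syntax. The example below uses Python's built-in `sqlite3` purely as a stand-in: it is not seekdb's API, `cosine_sim` is a hypothetical function registered for this demo, `LIKE` stands in for real full-text search, and the table and data are invented. For the actual hybrid search syntax, refer to the seekdb documentation.

```python
import json
import math
import sqlite3


def cosine_similarity(a_json, b_json):
    """Cosine similarity between two vectors stored as JSON arrays."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


conn = sqlite3.connect(":memory:")
# Register the similarity function so it can be called from SQL (demo only).
conn.create_function("cosine_sim", 2, cosine_similarity)
conn.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, tenant TEXT, body TEXT, embedding TEXT)"
)

# Invented rows: relational column, text column, and embedding side by side.
rows = [
    (1, "acme", "reset your password from the login page", [0.9, 0.1, 0.0]),
    (2, "acme", "billing and invoice questions", [0.1, 0.9, 0.0]),
    (3, "globex", "password rotation policy", [0.8, 0.2, 0.0]),
    (4, "acme", "password requirements and length", [0.7, 0.3, 0.1]),
]
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?, ?)",
    [(i, tenant, body, json.dumps(emb)) for i, tenant, body, emb in rows],
)

query_embedding = json.dumps([1.0, 0.0, 0.0])

# One statement: relational filter + keyword match + similarity ranking.
hits = conn.execute(
    """
    SELECT id, cosine_sim(embedding, ?) AS score
    FROM docs
    WHERE tenant = ?        -- relational filter
      AND body LIKE ?       -- keyword match (stand-in for full-text search)
    ORDER BY score DESC
    LIMIT 2
    """,
    (query_embedding, "acme", "%password%"),
).fetchall()

print([doc_id for doc_id, _ in hits])
```

Because filtering, matching, and ranking happen in a single statement over a single table, there is no cross-system merge step and no window in which the text and the embedding can disagree.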
