Tools: From Documents to Answers: How RAG Works

Source: Dev.to

The main steps to build a RAG pipeline are divided into two major processes:

- RAG Indexing
- RAG Query

## RAG INDEXING

The indexing phase converts raw documents into structured vector representations so they can be retrieved efficiently with similarity search later.

## Architecture diagram

[Figure: RAG indexing architecture diagram]

## 1) Document ingestion and preprocessing

The first process starts with ingesting the data, cleaning it, and converting it into a proper format. This involves transforming raw data from the Bronze layer to the Gold layer (in medallion-architecture terms). This is the very first and most crucial step, and it requires proper care before moving to the next stages.

Example: suppose the raw data is in bullet points like this:

```
INTRODUCTION TO DATA SCIENCE!!!
• DATA is everywhere in today's world
• MACHINE learning helps in prediction
• tools like PYTHON , R , SQL are used
```

After preprocessing and normalisation:

```
Section: Introduction to Data Science
Content: Data is everywhere in today's world. Machine learning helps in prediction. Tools like Python, R, and SQL are used.
```

## 2) Chunking

Chunking means breaking large text into smaller pieces so the computer can understand and search it more effectively. Imagine you have a 5000-page book and you want to perform Q&A on top of it. To process the context properly, you split the text content based on:

- Paragraphs.
- Recursive patterns using delimiters like "\n\n" and "."
- Other chunking strategies.

Once the chunks are ready, each carries metadata such as:

- chunk_index.

The chunking.json file (or Parquet for large-scale data) is then stored, or the chunks can be fed directly from memory into the embedding model.

Example chunk.json file:

```json
[
  {
    "chunk_id": "ml_intro_chunk_0001",
    "chunk_index": 0,
    "doc_id": "machine_learning_basics",
    "section": "Introduction to Machine Learning",
    "content": "What is AI, Types of Algorithms",
    "page_start": 1,
    "page_end": 1,
    "char_start": 0,
    "word_count": 6,
    "language": "en"
  }
]
```

## 3) Embeddings

The real juice lies here, where all the data is converted into numbers so computers can work with its meaning. Let's say we embed the following sentence into a 3D space (in real-world scenarios, this can be 4000+ dimensions):

The dog and cat are friends

[Figure: 3D plot of the word vectors for "dog", "cat", and "cricket"]

As shown in the image, the vectors for the words "dog" and "cat" point in a similar direction: the angle between them is small, so their cosine similarity is high. The vector for a word like "cricket", however, points in a different direction from both "dog" and "cat".

At this stage, all document chunks are converted into vector embeddings and stored inside the vector database. The indexing phase is now complete.

## RAG QUERY

The system has built a searchable semantic space. Now let's understand how the system responds when a user submits a query.

## Query Processing Diagram

[Figure: RAG query processing diagram]

When a user submits a query, it is first converted into an embedding using the same model used during indexing. That embedding is used to retrieve the most similar chunks from the vector database, and the retrieved results are then passed to an LLM for reasoning and output generation.

## Step 1: Convert User Query to Embedding

Convert the user query into a vector embedding using the same model that was used when the document vectors were stored. Using the same model matters because vectors from different models do not share a space and cannot be compared meaningfully.

## Step 2: Similarity Search

Once the user query is converted into a vector, it is compared with the stored document vectors in the vector database. Using cosine similarity, the system measures how close the query vector is to each document vector. The closest ones (the top-k results) are selected and sent to the LLM.

Example: suppose the user asks:

What are the types of algorithms ?

The system compares this query vector with stored chunks like:

- Types of Algorithms
- History of Computers

The chunk "Types of Algorithms" will have the highest similarity score, so it gets selected and passed to the LLM for generating the final answer.

## Step 3: LLM Response Generation

The LLM receives two inputs:

- The original user query
- The retrieved document chunks (if found)

It appends the retrieved content to the query context and generates the final answer.

I'm currently learning more about RAG and Agentic AI step by step. If this helped you understand the pipeline better, feel free to like or follow for more as I share my journey.
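As a recap, the whole pipeline described above (chunk, embed, similarity search, prompt assembly) can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not how a production system would do it: every function name here is hypothetical, `toy_embed` is a word-count stand-in for a real embedding model, a Python list stands in for the vector database, and the final LLM call is left out so only the prompt is built.

```python
import math
import re
from collections import Counter


def chunk_by_paragraph(text: str) -> list[dict]:
    """Split on blank lines and attach a chunk_index to each piece."""
    parts = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [{"chunk_index": i, "content": p} for i, p in enumerate(parts)]


def toy_embed(text: str) -> Counter:
    """ASSUMPTION: stand-in for a real embedding model (lowercase word counts)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(text: str) -> list[tuple[dict, Counter]]:
    """Indexing phase: chunk the document and embed every chunk."""
    return [(c, toy_embed(c["content"])) for c in chunk_by_paragraph(text)]


def retrieve(query: str, index: list[tuple[dict, Counter]], k: int = 1):
    """Query phase: embed the query with the same function, return the top-k chunks."""
    q = toy_embed(query)
    scored = [(cosine_similarity(q, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, retrieved) -> str:
    """Append the retrieved chunk text to the query as context for an LLM."""
    context = "\n".join(chunk["content"] for _, chunk in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}"


document = "Types of Algorithms\n\nHistory of Computers"
index = build_index(document)
query = "What are the types of algorithms ?"
top = retrieve(query, index, k=1)
prompt = build_prompt(query, top)
print(top[0][1]["content"])  # prints: Types of Algorithms
```

With word counts as the "embedding", the match is purely lexical; a real embedding model would also place paraphrases close together, which is the point of the dog/cat example earlier.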