RAG with Spring AI

Source: Dev.to

Build a Context-Aware Application

## Introduction: Spring AI vs. LangChain

RAG has been around for two to three years now, yet there is still little clarity on the topic in the Java ecosystem; efforts have been made via LangChain4j, but it does not yet feel as polished as a mature framework like Spring. Retrieval-Augmented Generation (RAG) is a pattern that enhances Large Language Models (LLMs) by providing them with external, up-to-date, or proprietary data, which reduces hallucinations and grounds responses in facts. Spring AI provides an idiomatic and seamless way to implement RAG within the Spring Boot ecosystem.

Spring AI is a framework that aims to apply Spring ecosystem design principles, such as portability (across models and vector stores) and modular design, to the AI domain. It is a natural choice for Java/Spring Boot developers because it fully embraces Spring conventions like Dependency Injection, auto-configuration, and POJOs (Plain Old Java Objects).

Spring AI can be a strong alternative to LangChain (and its Java port, LangChain4j) for RAG, especially in an enterprise setting, because:

- Seamless Spring Boot Integration: It uses Spring Boot starters, making setup incredibly fast. You get an automatically configured ChatClient and VectorStore simply by adding dependencies and properties.
- Idiomatic Java: The APIs feel like other Spring APIs (such as WebClient or JdbcTemplate), leveraging patterns familiar to Java developers.
- Enterprise-Grade Features: It is backed by the Spring ecosystem, inheriting robust features like observability, security, and consistent configuration.
- Focus on Abstraction: It provides high-level abstractions like the Advisor API for RAG, which encapsulates the entire retrieval and prompt-augmentation logic, often requiring less boilerplate than manually stitching together a chain.

## Prerequisites

To follow this tutorial, you will need:

- Java 21 or later.
- Maven or Gradle.
- An API key for an LLM provider (e.g., OpenAI, Google Gemini, etc.). We will use OpenAI for this example.
- The latest Spring AI Bill of Materials (BOM). We will assume the latest stable version of Spring AI is used.

## Step 1: Project Setup (Using Maven)

Create a new Spring Boot project (e.g., using start.spring.io) and add the dependencies shown in the Maven listing below. We will use the OpenAI model and the PostgreSQL/PGVector vector store for a robust, production-ready setup.

## Step 2: Configuration

Configure the LLM API key and the PostgreSQL vector store in your application.properties (or application.yml), as shown in the properties listing below.

Note: For the PGVector store, you will need a running PostgreSQL database with the pgvector extension enabled. Using Docker Compose is recommended for local development.

## Step 3: Document Ingestion Service (ETL)

The first part of RAG is the Extract, Transform, Load (ETL) pipeline. We read a document, split it into smaller chunks (documents), generate embeddings for the chunks, and store them in the VectorStore.

Create a service named IngestionService.java (full listing below). Example content for src/main/resources/data/spring-ai-info.txt is also listed below.

## Step 4: Implement the RAG Controller

The RAG logic is greatly simplified by Spring AI's Advisor API, specifically QuestionAnswerAdvisor. This advisor automatically performs the retrieval and prompt augmentation before calling the LLM. Create a REST controller named RagController.java (listing below).

## Step 5: Run and Test the Application

1. Ensure PostgreSQL is running with pgvector enabled (e.g., via Docker).
2. Run the Spring Boot application. The IngestionService will execute on startup, loading your document into the vector store.
3. Test the RAG endpoint using a browser or a tool like cURL (see the cURL listing below), querying based on the ingested context.

## Conclusion and Shortcomings of RAG

RAG with Spring AI is a powerful and convenient pattern. However, the RAG approach itself, regardless of the framework, has inherent shortcomings:

1. The "Garbage In, Garbage Out" Problem: The quality of the final answer is directly dependent on the quality of the retrieved documents.
If the source documents are poorly structured or incomplete, or the chunking is sub-optimal, the LLM will still produce a poor or hallucinated answer. Fix: a robust ETL pipeline for document cleaning and structured chunking.

2. Need for Fine-Tuning Retrieval: Simple vector similarity search is not always enough. Advanced scenarios require:
   - Re-ranking: using a separate model to re-score the top-K retrieved documents for better relevance.
   - Query Transformation: using the LLM to rewrite the user's question into multiple, more specific queries to boost recall (MultiQueryExpander in Spring AI).
   - Hybrid Search: combining vector search with traditional keyword (lexical) search to cover more bases.

3. Context Window Management: The retrieved documents must fit within the LLM's context window. If too many relevant chunks are found, they must be truncated or summarized, which can lead to incomplete answers.

4. Integration Complexity (Spring AI Specific): While simple RAG is easy, more complex agentic workflows or highly customized multi-step reasoning often require more explicit configuration than the high-level Advisor abstraction, potentially leading to more code than in a framework designed primarily for chaining (like LangChain4j).
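The context-window concern can be handled with a simple greedy packing step before prompt assembly. The sketch below is purely illustrative and not a Spring AI API: the 4-characters-per-token estimate and the budget value are assumptions, and a real implementation would use the model's actual tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

public class ContextPacker {

    // Rough heuristic: ~4 characters per token (an assumption, not an exact tokenizer).
    static int estimateTokens(String text) {
        return text.length() / 4;
    }

    // Greedily keep the highest-ranked chunks until the token budget is exhausted.
    static List<String> pack(List<String> rankedChunks, int tokenBudget) {
        List<String> selected = new ArrayList<>();
        int used = 0;
        for (String chunk : rankedChunks) {
            int cost = estimateTokens(chunk);
            if (used + cost > tokenBudget) {
                break; // dropping lower-ranked chunks may lose information
            }
            selected.add(chunk);
            used += cost;
        }
        return selected;
    }

    public static void main(String[] args) {
        // Three 400-character chunks (~100 tokens each) against a 250-token budget:
        List<String> chunks = List.of("a".repeat(400), "b".repeat(400), "c".repeat(400));
        List<String> kept = pack(chunks, 250);
        System.out.println(kept.size()); // prints 2: the third chunk no longer fits
    }
}
```

Because chunks are packed in rank order, whatever is dropped is (hopefully) the least relevant material, which is exactly the trade-off described in point 3.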
Maven dependencies (Step 1):

```xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-pdf-document-reader</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-jpa</artifactId>
    </dependency>
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <scope>runtime</scope>
    </dependency>
</dependencies>
```

Configuration properties (Step 2):

```properties
# LLM Configuration (OpenAI Example)
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.model=gpt-4o-mini
spring.ai.openai.embedding.model=text-embedding-3-small

# PostgreSQL/PGVector Configuration
spring.datasource.url=jdbc:postgresql://localhost:5432/ragdb
spring.datasource.username=user
spring.datasource.password=password
spring.jpa.hibernate.ddl-auto=update

# Spring AI Vector Store Schema Initialization
# This creates the necessary table for the vector store
spring.ai.vectorstore.pgvector.initialize-schema=true
```
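The Step 2 note recommends Docker Compose for the local pgvector database. A minimal compose sketch matching the datasource properties above might look like the following; the image tag, credentials, and database name are assumptions you can change:

```yaml
# docker-compose.yml (sketch): PostgreSQL with the pgvector extension preinstalled
services:
  postgres:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_DB: ragdb
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
    ports:
      - "5432:5432"
```

Start it with `docker compose up -d` before launching the application.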
IngestionService.java (Step 3):

```java
package com.example.ragtutorial;

import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.document.Document;
import org.springframework.ai.reader.TextReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.CommandLineRunner;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;

@Service
public class IngestionService implements CommandLineRunner {

    private static final Logger log = LoggerFactory.getLogger(IngestionService.class);

    private final VectorStore vectorStore;

    // Use a text file for simplicity. Place it in src/main/resources/data/
    @Value("classpath:/data/spring-ai-info.txt")
    private Resource dataResource;

    public IngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    @Override
    public void run(String... args) {
        log.info("Starting RAG document ingestion...");

        // 1. Extract: Read the document content
        TextReader textReader = new TextReader(dataResource);
        List<Document> rawDocuments = textReader.get();

        // 2. Transform: Split the large document into smaller, manageable chunks.
        //    TokenTextSplitter ensures chunks fit within the LLM's context window.
        TokenTextSplitter textSplitter = new TokenTextSplitter();
        List<Document> splitDocuments = textSplitter.apply(rawDocuments);

        // 3. Load: Store the documents (which creates and stores embeddings)
        vectorStore.accept(splitDocuments);

        log.info("Document ingestion complete. {} chunks loaded into VectorStore.", splitDocuments.size());
    }
}
```

Example content for src/main/resources/data/spring-ai-info.txt (Step 3):

```text
Spring AI is an application framework for AI engineering. Its goal is to apply
Spring ecosystem design principles to the AI domain. It connects enterprise data
and APIs with AI Models. It offers a portable API across different AI providers
like OpenAI, Gemini, and Ollama. For RAG, it supports vector stores such as
PGVector, Chroma, and Redis. The ChatClient API is used for communication, and
the Advisor API simplifies patterns like RAG.
```
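At query time, the vector store ranks the stored chunks by embedding similarity, typically cosine similarity. As a toy, framework-free illustration of what that comparison does, here is the cosine formula over made-up 3-dimensional "embeddings" (real embedding models produce hundreds or thousands of dimensions):

```java
public class CosineDemo {

    // Cosine similarity: dot(a, b) / (|a| * |b|); higher means more similar in direction.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        double[] query = {1.0, 0.0, 0.0};
        double[] relatedChunk = {0.9, 0.1, 0.0};   // points roughly the same way as the query
        double[] unrelatedChunk = {0.0, 0.2, 1.0}; // points in a different direction

        // The chunk aligned with the query scores higher and would be retrieved first.
        System.out.println(cosine(query, relatedChunk) > cosine(query, unrelatedChunk)); // prints true
    }
}
```

This is why chunking quality matters: each stored vector summarizes one chunk, so a chunk mixing unrelated topics produces a blurred embedding that matches nothing well.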
RagController.java (Step 4):

```java
package com.example.ragtutorial;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RagController {

    private final ChatClient chatClient;

    public RagController(ChatClient.Builder chatClientBuilder, VectorStore vectorStore) {
        // Configure the ChatClient with the QuestionAnswerAdvisor.
        // The QuestionAnswerAdvisor handles:
        // 1. Retrieving relevant documents from the VectorStore based on the user query.
        // 2. Augmenting the user's prompt with the retrieved documents as context.
        this.chatClient = chatClientBuilder
                // This is the core of the RAG implementation in Spring AI
                .defaultAdvisors(QuestionAnswerAdvisor.builder(vectorStore).build())
                .build();
    }

    @GetMapping("/rag/query")
    public String ragQuery(
            @RequestParam(defaultValue = "What is Spring AI and what are its features?") String query) {
        // The advisor runs before this call, injecting the retrieved context into the prompt
        return this.chatClient.prompt()
                .user(query)
                .call()
                .content();
    }
}
```

Testing the endpoint (Step 5):

```shell
# -G with --data-urlencode URL-encodes the query so the spaces are handled correctly
curl -G 'http://localhost:8080/rag/query' \
  --data-urlencode 'query=What is the primary goal of Spring AI?'

# Expected output (grounded in your document):
# The primary goal of Spring AI is to apply Spring ecosystem design principles
# to the AI domain and to connect enterprise data and APIs with AI Models.
```
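Of the advanced retrieval techniques mentioned in the conclusion, re-ranking is the easiest to sketch without a framework: retrieve a generous top-K by vector similarity, then re-score the candidates with a second scorer and keep the best few. The word-overlap scorer below is a hypothetical stand-in for a real cross-encoder re-ranking model:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RerankDemo {

    // Stand-in scorer: fraction of query words that appear in the chunk.
    // A real re-ranker would use a cross-encoder model, not word overlap.
    static double score(String query, String chunk) {
        Set<String> queryWords = new HashSet<>(Arrays.asList(query.toLowerCase().split("\\s+")));
        Set<String> overlap = new HashSet<>(Arrays.asList(chunk.toLowerCase().split("\\s+")));
        overlap.retainAll(queryWords);
        return (double) overlap.size() / queryWords.size();
    }

    // Re-order the top-K candidates from the vector search by the second scorer.
    static List<String> rerank(String query, List<String> candidates) {
        return candidates.stream()
                .sorted(Comparator.comparingDouble((String c) -> score(query, c)).reversed())
                .toList();
    }

    public static void main(String[] args) {
        List<String> topK = List.of(
                "Spring AI supports vector stores",
                "The goal of Spring AI is portability");
        // The second chunk covers every query word, so it moves to the front.
        System.out.println(rerank("goal of Spring AI", topK).get(0));
    }
}
```

The same two-stage shape applies regardless of the scorer: the vector store optimizes recall cheaply, and the re-ranker spends more compute on precision over a small candidate set.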