# Building a RAG System with Azure OpenAI and Cognitive Search: Complete Guide

Source: Dev.to

## Introduction

Retrieval-Augmented Generation (RAG) is transforming how we build AI applications. Instead of relying solely on what the model learned during training, RAG augments responses with your own data: documents, databases, or any structured information.

In this guide, I'll walk you through building a production-ready RAG system using Azure OpenAI and Azure Cognitive Search. By the end, you'll have a system that can answer questions about your own documents, with citations.

## Why RAG Matters

Traditional LLMs have some well-known limitations:

- Knowledge cutoff dates
- Hallucinations on specific domains
- No access to private data

RAG addresses these by:

- Grounding responses in your data
- Providing source citations
- Keeping data in your control
## Architecture Overview

```
┌─────────────┐     ┌──────────────────┐     ┌─────────────┐
│  Documents  │────>│  Azure Cognitive │────>│    Azure    │
│ (PDF, etc)  │     │      Search      │     │   OpenAI    │
└─────────────┘     └──────────────────┘     └─────────────┘
                             │                      │
                             v                      v
                     ┌─────────────┐        ┌─────────────┐
                     │  Embedding  │        │    GPT-4    │
                     │    Model    │        │    Model    │
                     └─────────────┘        └─────────────┘
```

## Prerequisites

- Azure subscription
- Azure OpenAI resource with GPT-4 deployment
- Azure Cognitive Search resource
- Azure AI services (for embeddings)
- Node.js 18+ or Python 3.9+

## Step 1: Setting Up Azure Resources

### Create Azure OpenAI Resource

```bash
# Create OpenAI resource
az cognitiveservices account create \
  --name openai-rag-demo \
  --resource-group rg-rag-demo \
  --kind OpenAI \
  --sku S0 \
  --location eastus

# Deploy GPT-4
az cognitiveservices account deployment create \
  --name openai-rag-demo \
  --resource-group rg-rag-demo \
  --deployment-name gpt-4 \
  --model-format OpenAI \
  --model-name gpt-4 \
  --model-version "0613" \
  --sku-capacity 1 \
  --sku-name "Standard"

# Deploy text-embedding-ada-002
az cognitiveservices account deployment create \
  --name openai-rag-demo \
  --resource-group rg-rag-demo \
  --deployment-name text-embedding-ada-002 \
  --model-format OpenAI \
  --model-name text-embedding-ada-002 \
  --model-version "2" \
  --sku-capacity 1 \
  --sku-name "Standard"
```

### Create Cognitive Search

```bash
# Create search service
az search service create \
  --name search-rag-demo \
  --resource-group rg-rag-demo \
  --sku free \
  --location eastus
```

## Step 2: Indexing Documents

Here's a complete Python script to index your documents:

```python
import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI
from pypdf import PdfReader
import tiktoken

# Configuration
AZURE_SEARCH_ENDPOINT = os.environ["AZURE_SEARCH_ENDPOINT"]
AZURE_SEARCH_KEY = os.environ["AZURE_SEARCH_KEY"]
AZURE_OPENAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]
AZURE_OPENAI_KEY = os.environ["AZURE_OPENAI_KEY"]
INDEX_NAME = "rag-index"

# Initialize clients
search_client = SearchClient(
    endpoint=AZURE_SEARCH_ENDPOINT,
    index_name=INDEX_NAME,
    credential=AzureKeyCredential(AZURE_SEARCH_KEY),
)
openai_client = AzureOpenAI(
    api_key=AZURE_OPENAI_KEY,
    api_version="2024-02-01",
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
)


def extract_text_from_pdf(pdf_path):
    """Extract text from a PDF document."""
    reader = PdfReader(pdf_path)
    text = ""
    for page in reader.pages:
        text += (page.extract_text() or "") + "\n"
    return text


def chunk_text(text, chunk_size=1000, overlap=100):
    """Split text into overlapping chunks of roughly chunk_size tokens."""
    tokenizer = tiktoken.get_encoding("cl100k_base")
    tokens = tokenizer.encode(text)
    chunks = []
    for i in range(0, len(tokens), chunk_size - overlap):
        chunk_tokens = tokens[i:i + chunk_size]
        chunks.append(tokenizer.decode(chunk_tokens))
    return chunks


def get_embedding(text):
    """Get an embedding for text using Azure OpenAI."""
    response = openai_client.embeddings.create(
        input=text,
        model="text-embedding-ada-002",
    )
    return response.data[0].embedding


def index_documents(folder_path):
    """Index all PDF documents from a folder."""
    documents = []
    for filename in os.listdir(folder_path):
        if filename.endswith(".pdf"):
            filepath = os.path.join(folder_path, filename)
            text = extract_text_from_pdf(filepath)
            chunks = chunk_text(text)
            for i, chunk in enumerate(chunks):
                # Keys must be unique across files; derive them from the
                # filename plus the chunk number
                doc_key = f"{os.path.splitext(filename)[0]}-{i}".replace(".", "_")
                doc = {
                    "id": doc_key,
                    "content": chunk,
                    "source": filename,
                    "chunk_id": i,
                    "embedding": get_embedding(chunk),
                }
                documents.append(doc)
    # Upload to the search index
    search_client.upload_documents(documents)
    print(f"Indexed {len(documents)} document chunks")


if __name__ == "__main__":
    index_documents("./documents")
```
## Step 3: Querying the RAG System

Continuing in the same module (this reuses the clients and the `get_embedding` helper from Step 2):

```python
from azure.search.documents.models import VectorizedQuery


def query_rag_system(query, top_k=5):
    """Query the RAG system and get an augmented response."""
    # Get the query embedding
    query_embedding = get_embedding(query)

    # Search for relevant documents (materialize the results so we can
    # iterate them twice: once for context, once for citations)
    search_results = list(search_client.search(
        search_text=query,
        vector_queries=[VectorizedQuery(
            vector=query_embedding,
            k_nearest_neighbors=top_k,
            fields="embedding",
        )],
        select=["content", "source", "chunk_id"],
        top=top_k,
    ))

    # Build context from the results
    context = "\n\n".join(
        f"[Source: {result['source']}]\n{result['content']}"
        for result in search_results
    )

    # Generate a response grounded in the context
    system_prompt = f"""You are a helpful assistant that answers questions based on the provided context. Always cite your sources.

Context:
{context}
"""
    response = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query},
        ],
        temperature=0.3,
    )

    return {
        "answer": response.choices[0].message.content,
        "sources": [
            {"source": r["source"], "chunk": r["chunk_id"]}
            for r in search_results
        ],
    }


# Example usage
result = query_rag_system("What are the key security considerations?")
print(result["answer"])
print("\nSources:")
for source in result["sources"]:
    print(f"  - {source['source']} (chunk {source['chunk']})")
```
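The retrieval-to-prompt step is easy to unit-test without any Azure calls. Here's a minimal sketch with mocked search results; the `build_context` helper name is mine, introduced for illustration:

```python
def build_context(search_results):
    """Join retrieved chunks into the context string used in the system prompt."""
    return "\n\n".join(
        f"[Source: {result['source']}]\n{result['content']}"
        for result in search_results
    )


# Mocked retrieval output, shaped like the documents uploaded in Step 2
mock_results = [
    {"source": "security.pdf", "content": "Rotate keys every 90 days.", "chunk_id": 0},
    {"source": "ops.pdf", "content": "Enable diagnostic logging.", "chunk_id": 3},
]

context = build_context(mock_results)
print(context)
```

Testing this assembly in isolation makes it easy to verify that every chunk carries its source label before you pay for a single GPT-4 call.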
## Step 4: Semantic Search Configuration

For better results, configure semantic search in Cognitive Search. Enable semantic search on your index:

```json
{
  "semanticConfiguration": {
    "name": "semantic-config",
    "prioritizedFields": {
      "titleField": { "fieldName": "source" },
      "prioritizedContentFields": [
        { "fieldName": "content" }
      ]
    }
  }
}
```

Or apply it with the Python SDK (recent versions of `azure-search-documents` name the models as shown here):

```python
from azure.search.documents.indexes.models import (
    SemanticConfiguration,
    SemanticField,
    SemanticPrioritizedFields,
    SemanticSearch,
)

semantic_config = SemanticConfiguration(
    name="default",
    prioritized_fields=SemanticPrioritizedFields(
        title_field=SemanticField(field_name="source"),
        content_fields=[SemanticField(field_name="content")],
    ),
)

# Apply to the index definition before creating/updating it
index.semantic_search = SemanticSearch(configurations=[semantic_config])
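Once the configuration is applied, individual queries can opt into semantic re-ranking via two parameters on `SearchClient.search`. A sketch of the call shape (the stand-in `client` argument lets you pass either a real `SearchClient` or a test double):

```python
def semantic_search(client, query, config_name="default", top=5):
    """Run a semantically re-ranked query against the RAG index."""
    return client.search(
        search_text=query,
        query_type="semantic",
        semantic_configuration_name=config_name,
        top=top,
    )
```

With a real client this returns results re-ranked by the semantic model; `config_name` must match the name given in the configuration above.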
```

## Cost Optimization Tips

- Use the free tier for Cognitive Search during development
- Implement caching for repeated queries
- Batch embeddings: process multiple documents together
- Monitor usage via Azure Cost Management

```python
# Simple caching implementation
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_query(query):
    return query_rag_system(query)
```

## Testing Your RAG System

```python
# Test cases
test_queries = [
    "What is the main topic of the document?",
    "Summarize the key findings",
    "What are the recommendations?",
]

for query in test_queries:
    print(f"\nQuery: {query}")
    result = query_rag_system(query)
    print(f"Answer: {result['answer'][:200]}...")
```
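Azure OpenAI deployments enforce tokens-per-minute quotas, so bursty traffic is worth throttling client-side before it hits the service. A minimal token-bucket sketch in pure Python; the class name and the example rates are illustrative, not from any Azure SDK:

```python
import time


class TokenBucket:
    """Token-bucket rate limiter: ~`rate` requests/second, bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=5, capacity=10)  # ~5 requests/sec, bursts of 10
```

Wrap calls to `query_rag_system` with `if bucket.allow(): ...` and queue or reject requests that exceed the budget.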
## Production Considerations

- **Security**
  - Use managed identities
  - Implement role-based access
  - Encrypt data at rest
- **Monitoring**
  - Log all queries and responses
  - Track token usage
  - Set up alerts for errors
- **Scalability**
  - Use Azure AD auth
  - Implement rate limiting
  - Consider vector database alternatives

## Key Takeaways

- RAG combines retrieval with generation for accurate, grounded responses
- Azure Cognitive Search provides excellent vector and semantic search
- Proper chunking and embedding are critical for quality results
- Always cite sources in production systems

## Next Steps

- Add support for more document formats (DOCX, PPTX)
- Implement hybrid search (keyword + vector)
- Add user authentication and authorization
- Build a web UI with streaming responses

GitHub Repository: [Link to be created - azure-openai-rag-starter]

Tags: #azureopenai #cognitive-search #rag #ai #tutorial #azure

Have questions or want to see more detailed implementation? Let me know in the comments!