# Building a RAG System with Azure OpenAI and Cognitive Search: Complete Guide
## Introduction

Retrieval-Augmented Generation (RAG) is transforming how we build AI applications. Instead of relying solely on what the model knows, RAG lets us augment responses with our own data: documents, databases, or any structured information.

In this guide, I'll walk you through building a production-ready RAG system using Azure OpenAI and Azure Cognitive Search. By the end, you'll have a system that can answer questions about your own documents, with citations.

## Why RAG Matters

Traditional LLM limitations:

- Knowledge cutoff dates
- Hallucinations on specific domains
- No access to private data

RAG addresses these by:

- Grounding responses in your data
- Providing source citations
- Keeping data in your control
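Before wiring up any Azure services, it helps to see the retrieve-then-generate loop in miniature. In this sketch a naive keyword-overlap scorer stands in for Cognitive Search and a format string stands in for the GPT-4 call; the corpus, file names, and functions here are purely illustrative:

```python
# Toy RAG loop: retrieve the most relevant text, then ground the answer
# in it and cite the source. Everything here is a stand-in for the real
# services built in the rest of this guide.

CORPUS = {
    "policy.pdf": "Employees must rotate passwords every 90 days.",
    "handbook.pdf": "Vacation requests require two weeks notice.",
}

def retrieve(query, k=1):
    """Rank documents by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        CORPUS.items(),
        key=lambda kv: len(q_words & set(kv[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def answer(query):
    """'Generate' a grounded answer by quoting and citing the top hit."""
    source, text = retrieve(query)[0]
    return f"{text} [source: {source}]"

print(answer("how often must passwords rotate"))
```

The real system replaces the overlap scorer with vector search over embeddings and the format string with a GPT-4 call, but the shape of the loop stays the same.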
## Architecture Overview

```
┌─────────────┐     ┌──────────────────┐     ┌─────────────┐
│  Documents  │────>│  Azure Cognitive │────>│    Azure    │
│ (PDF, etc)  │     │      Search      │     │   OpenAI    │
└─────────────┘     └──────────────────┘     └─────────────┘
                            │                       │
                            v                       v
                    ┌─────────────┐         ┌─────────────┐
                    │  Embedding  │         │    GPT-4    │
                    │    Model    │         │    Model    │
                    └─────────────┘         └─────────────┘
```

## Prerequisites

- Azure subscription
- Azure OpenAI resource with GPT-4 deployment
- Azure Cognitive Search resource
- Azure AI services (for embeddings)
- Node.js 18+ or Python 3.9+

## Step 1: Setting Up Azure Resources

### Create Azure OpenAI Resource

```bash
# Create OpenAI resource
az cognitiveservices account create \
  --name openai-rag-demo \
  --resource-group rg-rag-demo \
  --kind OpenAI \
  --sku S0 \
  --location eastus

# Deploy GPT-4
az cognitiveservices account deployment create \
  --name openai-rag-demo \
  --resource-group rg-rag-demo \
  --deployment-name gpt-4 \
  --model-format OpenAI \
  --model-name gpt-4 \
  --model-version "0613" \
  --sku-capacity 1 \
  --sku-name "Standard"

# Deploy text-embedding-ada-002
az cognitiveservices account deployment create \
  --name openai-rag-demo \
  --resource-group rg-rag-demo \
  --deployment-name text-embedding-ada-002 \
  --model-format OpenAI \
  --model-name text-embedding-ada-002 \
  --model-version "2" \
  --sku-capacity 1 \
  --sku-name "Standard"
```

### Create Cognitive Search

```bash
# Create search service
az search service create \
  --name search-rag-demo \
  --resource-group rg-rag-demo \
  --sku free \
  --location eastus
```

## Step 2: Indexing Documents

Here's a complete Python script to index your documents:

```python
import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI
from pypdf import PdfReader
import tiktoken

# Configuration
AZURE_SEARCH_ENDPOINT = os.environ["AZURE_SEARCH_ENDPOINT"]
AZURE_SEARCH_KEY = os.environ["AZURE_SEARCH_KEY"]
AZURE_OPENAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]
AZURE_OPENAI_KEY = os.environ["AZURE_OPENAI_KEY"]
INDEX_NAME = "rag-index"

# Initialize clients
search_client = SearchClient(
    endpoint=AZURE_SEARCH_ENDPOINT,
    index_name=INDEX_NAME,
    credential=AzureKeyCredential(AZURE_SEARCH_KEY)
)

openai_client = AzureOpenAI(
    api_key=AZURE_OPENAI_KEY,
    api_version="2024-02-01",
    azure_endpoint=AZURE_OPENAI_ENDPOINT
)

def extract_text_from_pdf(pdf_path):
    """Extract text from a PDF document."""
    reader = PdfReader(pdf_path)
    text = ""
    for page in reader.pages:
        text += page.extract_text() + "\n"
    return text

def chunk_text(text, chunk_size=1000, overlap=100):
    """Split text into overlapping token chunks."""
    tokenizer = tiktoken.get_encoding("cl100k_base")
    tokens = tokenizer.encode(text)
    chunks = []
    for i in range(0, len(tokens), chunk_size - overlap):
        chunk_tokens = tokens[i:i + chunk_size]
        chunks.append(tokenizer.decode(chunk_tokens))
    return chunks

def get_embedding(text):
    """Get an embedding for text using Azure OpenAI."""
    response = openai_client.embeddings.create(
        input=text,
        model="text-embedding-ada-002"
    )
    return response.data[0].embedding

def index_documents(folder_path):
    """Index all PDF documents from a folder."""
    documents = []
    for filename in os.listdir(folder_path):
        if filename.endswith('.pdf'):
            filepath = os.path.join(folder_path, filename)
            text = extract_text_from_pdf(filepath)
            for i, chunk in enumerate(chunk_text(text)):
                documents.append({
                    "id": f"{filename}-{i}",
                    "content": chunk,
                    "source": filename,
                    "chunk_id": i,
                    "embedding": get_embedding(chunk),
                })
    # Upload to the search index
    search_client.upload_documents(documents)
    print(f"Indexed {len(documents)} document chunks")

if __name__ == "__main__":
    index_documents("./documents")
```
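The key detail in `chunk_text` is the stride: the loop advances by `chunk_size - overlap`, so consecutive chunks share exactly `overlap` tokens and no sentence is lost at a chunk boundary. The same arithmetic over plain words (so no tiktoken dependency is needed) makes the overlap easy to see; `chunk_words` is an illustrative helper, not part of the script above:

```python
def chunk_words(words, chunk_size=5, overlap=2):
    # Same stride logic as chunk_text: step forward by
    # chunk_size - overlap, so each chunk repeats the last
    # `overlap` items of the previous one.
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunks.append(words[i:i + chunk_size])
    return chunks

words = [f"w{i}" for i in range(12)]
for c in chunk_words(words):
    print(c)
# ['w0', 'w1', 'w2', 'w3', 'w4']
# ['w3', 'w4', 'w5', 'w6', 'w7']
# ['w6', 'w7', 'w8', 'w9', 'w10']
# ['w9', 'w10', 'w11']
```

Larger overlaps improve recall at chunk boundaries but increase index size and embedding cost, which is why the script defaults to a modest 100 tokens of overlap per 1000-token chunk.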
## Step 3: Querying the RAG System

```python
from azure.search.documents.models import VectorizedQuery

def query_rag_system(query, top_k=5):
    """Query the RAG system and get an augmented response."""
    # Get the query embedding
    query_embedding = get_embedding(query)

    # Search for relevant documents (materialize the results so we can
    # iterate them twice: once for context, once for the sources list)
    search_results = list(search_client.search(
        search_text=query,
        vector_queries=[VectorizedQuery(
            vector=query_embedding,
            k_nearest_neighbors=top_k,
            fields="embedding"
        )],
        select=["content", "source", "chunk_id"],
        top=top_k
    ))

    # Build context from results
    context = "\n\n".join(
        f"[{result['source']}]\n{result['content']}"
        for result in search_results
    )

    # Generate a response grounded in the context
    system_prompt = f"""You are a helpful assistant that answers questions based on the provided context. Always cite your sources.

Context:
{context}
"""

    response = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query}
        ],
        temperature=0.3
    )

    return {
        "answer": response.choices[0].message.content,
        "sources": [
            {"source": r["source"], "chunk": r["chunk_id"]}
            for r in search_results
        ]
    }

# Example usage
result = query_rag_system("What are the key security considerations?")
print(result["answer"])
print("\nSources:")
for source in result["sources"]:
    print(f"  - {source['source']} (chunk {source['chunk']})")
```
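The vector query above asks for the `k` nearest neighbors of the query embedding. Under the hood that ranking is a vector similarity measure; cosine similarity, the usual choice for embeddings, is simple enough to sketch in plain Python (the two-dimensional vectors and chunk ids here are toy values for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, chunks, k=2):
    """Return the ids of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_vec, c["vec"]),
        reverse=True,
    )
    return [c["id"] for c in ranked[:k]]

chunks = [
    {"id": "a", "vec": [1.0, 0.0]},
    {"id": "b", "vec": [0.9, 0.1]},
    {"id": "c", "vec": [0.0, 1.0]},
]
print(top_k([1.0, 0.0], chunks))  # ['a', 'b']
```

Real embeddings have 1536 dimensions (for text-embedding-ada-002) rather than 2, and Cognitive Search uses approximate nearest-neighbor indexes instead of a full sort, but the ranking principle is the same.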
## Step 4: Semantic Search Configuration

For better results, configure semantic search in Cognitive Search:

```json
{
  "semanticConfiguration": {
    "name": "semantic-config",
    "prioritizedFields": {
      "titleField": { "fieldName": "source" },
      "prioritizedContentFields": [
        { "fieldName": "content" }
      ]
    }
  }
}
```

Enable semantic search on your index:

```python
from azure.search.documents.indexes.models import (
    SemanticConfiguration,
    SemanticField,
    SemanticPrioritizedFields,
    SemanticSearch
)

semantic_config = SemanticConfiguration(
    name="default",
    prioritized_fields=SemanticPrioritizedFields(
        title_field=SemanticField(field_name="source"),
        content_fields=[SemanticField(field_name="content")]
    )
)

# Apply to the index definition
index.semantic_search = SemanticSearch(
    configurations=[semantic_config]
)
```
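With the configuration applied to the index, semantic ranking is requested per query by passing `query_type="semantic"` and the configuration name to `SearchClient.search`. A small helper that assembles those keyword arguments makes the query side explicit; the helper itself is illustrative (the parameter names are from the azure-search-documents SDK, the helper is not):

```python
def build_semantic_query(query, config_name="default", top=5):
    # Keyword arguments for SearchClient.search with semantic ranking
    # enabled. config_name must match the SemanticConfiguration name
    # defined on the index ("default" in this guide).
    return {
        "search_text": query,
        "query_type": "semantic",
        "semantic_configuration_name": config_name,
        "top": top,
    }

kwargs = build_semantic_query("key security considerations")
# search_client.search(**kwargs) would run the semantically ranked query
print(kwargs["query_type"])
```

Semantic ranking re-scores the top results with a language-understanding model, so it pairs well with the vector query from Step 3 rather than replacing it.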
## Cost Optimization Tips

- Use the free tier for Cognitive Search during development
- Implement caching for repeated queries
- Batch embeddings: process multiple documents together
- Monitor usage via Azure Cost Management

```python
# Simple caching implementation
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_query(query):
    return query_rag_system(query)
```

## Testing Your RAG System

```python
# Test cases
test_queries = [
    "What is the main topic of the document?",
    "Summarize the key findings",
    "What are the recommendations?"
]

for query in test_queries:
    print(f"\nQuery: {query}")
    result = query_rag_system(query)
    print(f"Answer: {result['answer'][:200]}...")
```
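One caveat on the caching tip above: `lru_cache` only hits when the query string matches character for character, so "What are the risks?" and "what are the risks? " are cached separately. Normalizing whitespace and case before the cache lookup raises the hit rate considerably. In this sketch a counter stands in for the real `query_rag_system` so the effect is visible; `expensive_query` and `ask` are illustrative names:

```python
from functools import lru_cache

calls = {"n": 0}

def expensive_query(query):
    # Stand-in for query_rag_system: counts how often real work happens.
    calls["n"] += 1
    return f"answer to: {query}"

@lru_cache(maxsize=1000)
def cached_query(normalized_query):
    return expensive_query(normalized_query)

def ask(query):
    # Collapse whitespace and lowercase before the cache lookup so
    # trivial variants of the same question share one cache entry.
    return cached_query(" ".join(query.lower().split()))

ask("What are the key risks?")
ask("  what are THE key   risks? ")
print(calls["n"])  # 1 (the second call was a cache hit)
```

For production you would likely swap `lru_cache` for a shared cache such as Redis, keyed on the same normalized string, so cache hits survive process restarts and are shared across instances.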
## Production Considerations

- Security
  - Use managed identities
  - Implement role-based access
  - Encrypt data at rest
- Monitoring
  - Log all queries and responses
  - Track token usage
  - Set up alerts for errors
- Scalability
  - Use Azure AD auth
  - Implement rate limiting
  - Consider vector database alternatives

## Key Takeaways

- RAG combines retrieval with generation for accurate, grounded responses
- Azure Cognitive Search provides excellent vector and semantic search
- Proper chunking and embedding are critical for quality results
- Always cite sources in production systems

## Next Steps

- Add support for more document formats (DOCX, PPTX)
- Implement hybrid search (keyword + vector)
- Add user authentication and authorization
- Build a web UI with streaming responses

GitHub Repository: [Link to be created - azure-openai-rag-starter]

Tags: #azureopenai #cognitive-search #rag #ai #tutorial #azure

Have questions or want to see more detailed implementation? Let me know in the comments!