Building a Safe AI Database Assistant with Azure OpenAI, LangChain & Function Calling
2025-12-19
From raw CSVs to a production-ready AI assistant that queries data safely — without hallucinating SQL.

In this post, I'll walk through how I built an AI-powered data analyst using:

- Azure OpenAI
- LangChain
- Function Calling

The assistant can:

- Analyze CSV data using pandas
- Query a SQL database safely
- Choose predefined backend functions automatically
- Explain results clearly
- Avoid hallucinations and unsafe SQL

## 🧩 Problem Statement

When working with AI models and databases, common problems include:

- ❌ Hallucinated SQL queries
- ❌ Unsafe `eval` or raw SQL execution
- ❌ No control over what the model can access
- ❌ No explanation of how results were computed

**Goal:** Build an AI assistant that:

- Answers analytical questions about COVID data
- Uses only allowed tools
- Never guesses
- Explains every answer

## 📊 Dataset

We use the COVID all-states history dataset, which includes columns such as:

- `hospitalizedIncrease`
- `positiveIncrease`

The dataset is first used as:

- A pandas DataFrame
- A SQLite database
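Concretely, the first of those two forms might be created like this. This is a minimal, self-contained sketch — the real project reads the full CSV from disk, so the inline sample rows here are stand-ins:

```python
import io

import pandas as pd

# In the real project this would be pd.read_csv("all-states-history.csv");
# a tiny inline sample keeps the snippet self-contained and runnable.
csv_data = io.StringIO(
    "date,state,hospitalizedIncrease,positiveIncrease\n"
    "2021-03-05,NY,100,5000\n"
    "2021-03-05,AK,3,120\n"
)
df = pd.read_csv(csv_data)

# The same columns the assistant's tools will query later.
print(df[["state", "hospitalizedIncrease", "positiveIncrease"]])
```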
## 🧱 Architecture Overview

The model decides WHAT to do. Your backend decides HOW it is done.

```
User Question
    ↓
Azure OpenAI (Assistant / LangChain)
    ↓
Tool Selection (Function / SQL / DataFrame)
    ↓
Safe Backend Execution
    ↓
Result → Final Explanation
```

## 🔹 Part 1: Talking to Azure OpenAI via LangChain

We start by connecting to Azure OpenAI using `AzureChatOpenAI`:

```python
llm = AzureChatOpenAI(
    azure_endpoint="https://<your-endpoint>.cognitiveservices.azure.com/",
    api_key="YOUR_API_KEY",
    api_version="2024-12-01-preview",
    model="gpt-4o-mini"
)
```

A simple sanity check:

```python
response = llm.invoke([
    HumanMessage(content="Hello, Azure OpenAI via LangChain!")
])
print(response.content)
```

## 🔹 Part 2: DataFrame Agent (CSV Analysis)

We load the CSV into pandas and expose controlled computation via a tool.

### DataFrame Tool

```python
@tool
def run_df(query: str) -> str:
    """Run Python code on the global dataframe `df` and return the result."""
    return str(eval(query))
```

⚠️ Note: In production, replace `eval` with a restricted execution layer.

### Enforcing Tool Usage

The prompt forces the model to:

- Use the tool
- Perform actual pandas calculations
- Explain results

```python
llm_with_tools = llm.bind_tools([run_df])
```

## 🔹 Part 3: Moving from CSV → SQL (SQLite)

We convert the CSV into SQLite:

```python
engine = create_engine("sqlite:///./db/test.db")

df.to_sql(
    name="all_states_history",
    con=engine,
    if_exists="replace",
    index=False
)
```

Now the same dataset can be queried via SQL.

## 🔹 Part 4: SQL Agent with LangGraph

Using LangGraph's ReAct agent:

```python
agent_executor_SQL = create_react_agent(
    model=llm,
    tools=toolkit.get_tools()
)
```

The system prompt enforces:

- Only valid tables
- Only specific columns
- No hallucinated values
- Markdown-only output

## 🔹 Part 5: Function Calling (No Raw SQL)

Instead of letting the model generate SQL, we define pre-approved backend functions.

### Example Functions

```python
def get_hospitalized_increase_for_state_on_date(state_abbr, specific_date):
    ...

def get_positive_cases_for_state_on_date(state_abbr, specific_date):
    ...
```
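The function bodies are elided above. As one hedged sketch of how `get_positive_cases_for_state_on_date` might be filled in — assuming the `all_states_history` SQLite table from Part 3, and adding a `db_path` parameter of my own for testability — a parameterized query keeps user input out of the SQL string entirely:

```python
import sqlite3


def get_positive_cases_for_state_on_date(
    state_abbr: str, specific_date: str, db_path: str = "./db/test.db"
) -> dict:
    """Return the positive-case increase for one state on one date.

    Sketch only: assumes the `all_states_history` table created in Part 3.
    The `?` placeholders make this a parameterized query, so the model's
    arguments can never be spliced into the SQL text itself.
    """
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT positiveIncrease FROM all_states_history "
            "WHERE state = ? AND date = ?",
            (state_abbr, specific_date),
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        return {"error": f"No data for {state_abbr} on {specific_date}"}
    return {"state": state_abbr, "date": specific_date, "positiveIncrease": row[0]}
```

Returning a plain dict (including a structured error instead of raising) makes the result easy to serialize back to the model in the function-calling loop.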
### Function Registry (Critical!)

```python
FUNCTION_MAP = {
    "get_hospitalized_increase_for_state_on_date": get_hospitalized_increase_for_state_on_date,
    "get_positive_cases_for_state_on_date": get_positive_cases_for_state_on_date,
}
```

- ✅ Only allowed functions run
- ✅ No arbitrary code execution

## 🔹 Part 6: Azure OpenAI Function Calling (No Assistant API)

Using Chat Completions + functions:

```python
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    functions=functions,
    function_call="auto"
)
```

If the model calls a function, we:

- Extract arguments
- Route via `FUNCTION_MAP`
- Execute backend logic
- Send the result back
- Get a final grounded answer

## 🔹 Part 7: Assistant API (Persistent Context)

The assistant remembers conversation context, but never caches database results.

### Creating the Assistant

```python
assistant = client.beta.assistants.create(
    name="Covid Data Assistant",
    model="gpt-4o-mini",
    tools=[{"type": "function", "function": fn} for fn in functions]
)
```

### Assistant Loop (Key Concept)

```python
while True:
    run_status = client.beta.threads.runs.retrieve(...)

    if run_status.status == "requires_action":
        # extract function name, dispatch via FUNCTION_MAP,
        # then submit the tool output back to the run
        ...
    elif run_status.status == "completed":
        break
```
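The "dispatch via FUNCTION_MAP" step is worth one concrete illustration. A minimal sketch, assuming the tool-call shape used by the OpenAI Python SDK (each call carries an `id`, a `function.name`, and a JSON string of `function.arguments`); the helper name `dispatch_tool_calls` is mine:

```python
import json


def dispatch_tool_calls(tool_calls, function_map):
    """Resolve each model-requested tool call through the registry.

    Unknown function names are rejected instead of executed -- this is
    exactly the safety property FUNCTION_MAP exists to provide.
    """
    outputs = []
    for call in tool_calls:
        name = call.function.name
        args = json.loads(call.function.arguments)
        if name not in function_map:
            result = {"error": f"Function '{name}' is not allowed"}
        else:
            result = function_map[name](**args)
        outputs.append({"tool_call_id": call.id, "output": json.dumps(result)})
    return outputs
```

In the assistant loop, the returned list would then be handed back to the run via `submit_tool_outputs`, after which polling continues until the run completes.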
## 🧠 Key Takeaways

This approach mirrors how real production AI systems are built.

## ✅ What This Design Solves

- Prevents SQL hallucinations
- Enforces backend safety
- Keeps AI answers grounded in data
- Scales cleanly as tools grow

## 🧩 Mental Model

- AI decides **what**
- Backend controls **how**
- Data remains authoritative
- Explanations remain transparent

## 🎯 When to Use What?

- DataFrame tool: quick, ad-hoc analysis of a single CSV
- SQL agent: flexible querying once the data lives in a database
- Function calling: strict safety with pre-approved operations only
- Assistant API: persistent, multi-turn conversations

## 🚀 Final Thoughts

Let's learn and build cool data science and AI projects together!

## Connect With Me

- 💼 LinkedIn: https://www.linkedin.com/in/singla-khushi/
- 🐙 GitHub: https://github.com/KhushiSingla-tech
- 📩 Comments below are always welcome!