Building a Safe AI Database Assistant with Azure OpenAI, LangChain & Function Calling
2025-12-19
From raw CSVs to a production-ready AI assistant that queries data safely — without hallucinating SQL.

In this post, I'll walk through how I built an AI-powered data analyst using:

- Azure OpenAI
- LangChain
- Function Calling

The assistant can:

- Analyze CSV data using pandas
- Query a SQL database safely
- Choose predefined backend functions automatically
- Explain results clearly
- Avoid hallucinations and unsafe SQL

## 🧩 Problem Statement

When working with AI models and databases, common problems include:

- ❌ Hallucinated SQL queries
- ❌ Unsafe `eval` or raw SQL execution
- ❌ No control over what the model can access
- ❌ No explanation of how results were computed

**Goal:** Build an AI assistant that:

- Answers analytical questions about COVID data
- Uses only allowed tools
- Never guesses
- Explains every answer

## 📊 Dataset

We use the COVID all-states history dataset, which includes columns such as:

- `hospitalizedIncrease`
- `positiveIncrease`

The dataset is first used as:

- A pandas DataFrame
- A SQLite database
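Concretely, the first of those two forms might be created like this. This is a minimal, self-contained sketch — the real project reads the full CSV from disk, so the inline sample rows here are stand-ins:

```python
import io

import pandas as pd

# In the real project this would be pd.read_csv("all-states-history.csv");
# a tiny inline sample keeps the snippet self-contained and runnable.
csv_data = io.StringIO(
    "date,state,hospitalizedIncrease,positiveIncrease\n"
    "2021-03-05,NY,100,5000\n"
    "2021-03-05,AK,3,120\n"
)
df = pd.read_csv(csv_data)

# The same columns the assistant's tools will query later.
print(df[["state", "hospitalizedIncrease", "positiveIncrease"]])
```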
## 🧱 Architecture Overview

The model decides WHAT to do. Your backend decides HOW it is done.

```
User Question
    ↓
Azure OpenAI (Assistant / LangChain)
    ↓
Tool Selection (Function / SQL / DataFrame)
    ↓
Safe Backend Execution
    ↓
Result → Final Explanation
```

## 🔹 Part 1: Talking to Azure OpenAI via LangChain

We start by connecting to Azure OpenAI using `AzureChatOpenAI`:

```python
llm = AzureChatOpenAI(
    azure_endpoint="https://<your-endpoint>.cognitiveservices.azure.com/",
    api_key="YOUR_API_KEY",
    api_version="2024-12-01-preview",
    model="gpt-4o-mini"
)
```

A simple sanity check:

```python
response = llm.invoke([
    HumanMessage(content="Hello, Azure OpenAI via LangChain!")
])
print(response.content)
```

## 🔹 Part 2: DataFrame Agent (CSV Analysis)

We load the CSV into pandas and expose controlled computation via a tool.

### DataFrame Tool

```python
@tool
def run_df(query: str) -> str:
    """Run Python code on the global dataframe `df` and return the result."""
    return str(eval(query))
```

⚠️ Note: In production, replace `eval` with a restricted execution layer.

### Enforcing Tool Usage

The prompt forces the model to:

- Use the tool
- Perform actual pandas calculations
- Explain results

```python
llm_with_tools = llm.bind_tools([run_df])
```

## 🔹 Part 3: Moving from CSV → SQL (SQLite)

We convert the CSV into SQLite:

```python
engine = create_engine("sqlite:///./db/test.db")

df.to_sql(
    name="all_states_history",
    con=engine,
    if_exists="replace",
    index=False
)
```

Now the same dataset can be queried via SQL.

## 🔹 Part 4: SQL Agent with LangGraph

Using LangGraph's ReAct agent:

```python
agent_executor_SQL = create_react_agent(
    model=llm,
    tools=toolkit.get_tools()
)
```

The system prompt enforces:

- Only valid tables
- Only specific columns
- No hallucinated values
- Markdown-only output

## 🔹 Part 5: Function Calling (No Raw SQL)

Instead of letting the model generate SQL, we define pre-approved backend functions.

### Example Functions

```python
def get_hospitalized_increase_for_state_on_date(state_abbr, specific_date):
    ...

def get_positive_cases_for_state_on_date(state_abbr, specific_date):
    ...
```
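The function bodies are elided above. As one hedged sketch of how `get_positive_cases_for_state_on_date` might be filled in — assuming the `all_states_history` SQLite table from Part 3, and adding a `db_path` parameter of my own for testability — a parameterized query keeps user input out of the SQL string entirely:

```python
import sqlite3


def get_positive_cases_for_state_on_date(
    state_abbr: str, specific_date: str, db_path: str = "./db/test.db"
) -> dict:
    """Return the positive-case increase for one state on one date.

    Sketch only: assumes the `all_states_history` table created in Part 3.
    The `?` placeholders make this a parameterized query, so the model's
    arguments can never be spliced into the SQL text itself.
    """
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT positiveIncrease FROM all_states_history "
            "WHERE state = ? AND date = ?",
            (state_abbr, specific_date),
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        return {"error": f"No data for {state_abbr} on {specific_date}"}
    return {"state": state_abbr, "date": specific_date, "positiveIncrease": row[0]}
```

Returning a plain dict (including a structured error instead of raising) makes the result easy to serialize back to the model in the function-calling loop.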
### Function Registry (Critical!)

```python
FUNCTION_MAP = {
    "get_hospitalized_increase_for_state_on_date": get_hospitalized_increase_for_state_on_date,
    "get_positive_cases_for_state_on_date": get_positive_cases_for_state_on_date,
}
```

- ✅ Only allowed functions run
- ✅ No arbitrary code execution

## 🔹 Part 6: Azure OpenAI Function Calling (No Assistant API)

Using Chat Completions + functions:

```python
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    functions=functions,
    function_call="auto"
)
```

If the model calls a function, we:

- Extract arguments
- Route via `FUNCTION_MAP`
- Execute backend logic
- Send the result back
- Get a final grounded answer

## 🔹 Part 7: Assistant API (Persistent Context)

The assistant remembers conversation context, but never caches database results.

### Creating the Assistant

```python
assistant = client.beta.assistants.create(
    name="Covid Data Assistant",
    model="gpt-4o-mini",
    tools=[{"type": "function", "function": fn} for fn in functions]
)
```

### Assistant Loop (Key Concept)

```python
while True:
    run_status = client.beta.threads.runs.retrieve(...)

    if run_status.status == "requires_action":
        # extract function name, dispatch via FUNCTION_MAP,
        # then submit the tool output back to the run
        ...
    elif run_status.status == "completed":
        break
```
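The "dispatch via FUNCTION_MAP" step is worth one concrete illustration. A minimal sketch, assuming the tool-call shape used by the OpenAI Python SDK (each call carries an `id`, a `function.name`, and a JSON string of `function.arguments`); the helper name `dispatch_tool_calls` is mine:

```python
import json


def dispatch_tool_calls(tool_calls, function_map):
    """Resolve each model-requested tool call through the registry.

    Unknown function names are rejected instead of executed -- this is
    exactly the safety property FUNCTION_MAP exists to provide.
    """
    outputs = []
    for call in tool_calls:
        name = call.function.name
        args = json.loads(call.function.arguments)
        if name not in function_map:
            result = {"error": f"Function '{name}' is not allowed"}
        else:
            result = function_map[name](**args)
        outputs.append({"tool_call_id": call.id, "output": json.dumps(result)})
    return outputs
```

In the assistant loop, the returned list would then be handed back to the run via `submit_tool_outputs`, after which polling continues until the run completes.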
## 🧠 Key Takeaways

This approach mirrors how real production AI systems are built.

## ✅ What This Design Solves

- Prevents SQL hallucinations
- Enforces backend safety
- Keeps AI answers grounded in data
- Scales cleanly as tools grow

## 🧩 Mental Model

- AI decides **what**
- Backend controls **how**
- Data remains authoritative
- Explanations remain transparent

## 🎯 When to Use What?

- DataFrame tool: quick, ad-hoc analysis of a single CSV
- SQL agent: flexible querying once the data lives in a database
- Function calling: strict safety with pre-approved operations only
- Assistant API: persistent, multi-turn conversations

## 🚀 Final Thoughts

Let's learn and build cool data science and AI projects together!

## Connect With Me

- 💼 LinkedIn: https://www.linkedin.com/in/singla-khushi/
- 🐙 GitHub: https://github.com/KhushiSingla-tech
- 📩 Comments below are always welcome!