Building a Safe AI Database Assistant with Azure OpenAI, LangChain & Function Calling
2025-12-19
admin
From raw CSVs to a production-ready AI assistant that queries data safely, without hallucinating SQL.

In this post, I'll walk through how I built an AI-powered data analyst using:

- Azure OpenAI
- LangChain
- Function Calling

## 🧩 Problem Statement

When working with AI models and databases, common problems include:

- ❌ Hallucinated SQL queries
- ❌ Unsafe eval or raw SQL execution
- ❌ No control over what the model can access
- ❌ No explanation of how results were computed

Goal: build an AI assistant that:

- Answers analytical questions about COVID data
- Uses only allowed tools
- Never guesses
- Explains every answer

## 📊 Dataset

We use the COVID all-states history dataset, which includes columns such as:

- hospitalizedIncrease
- positiveIncrease

The dataset is used in two forms:

- A pandas DataFrame
- A SQLite database

## 🧱 Architecture Overview

The model decides WHAT to do. Your backend decides HOW it is done.
```
User Question
      ↓
Azure OpenAI (Assistant / LangChain)
      ↓
Tool Selection (Function / SQL / DataFrame)
      ↓
Safe Backend Execution
      ↓
Result
      ↓
Final Explanation
```
## 🔹 Part 1: Talking to Azure OpenAI via LangChain

We start by connecting to Azure OpenAI using `AzureChatOpenAI`:

```python
from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    azure_endpoint="https://<your-endpoint>.cognitiveservices.azure.com/",
    api_key="YOUR_API_KEY",
    api_version="2024-12-01-preview",
    model="gpt-4o-mini",
)
```
A simple sanity check:

```python
from langchain_core.messages import HumanMessage

response = llm.invoke([
    HumanMessage(content="Hello, Azure OpenAI via LangChain!")
])
print(response.content)
```
## 🔹 Part 2: DataFrame Agent (CSV Analysis)

We load the CSV into pandas and expose controlled computation via a tool.

## DataFrame Tool

```python
from langchain_core.tools import tool

@tool
def run_df(query: str) -> str:
    """Run Python code on the global dataframe `df` and return the result."""
    return str(eval(query))
```

⚠️ Note: In production, replace `eval` with a restricted execution layer.
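A raw `eval` lets the model execute arbitrary Python, so it should never ship as-is. Here is a minimal sketch of a restricted execution layer (my own illustration, not from the original post): the expression is evaluated with an empty `__builtins__`, so only the dataframe and a few whitelisted helpers are resolvable. Attribute-chain tricks can still escape plain `eval`, so a real deployment should use a proper sandbox or an allow-listed expression parser.

```python
import pandas as pd

def run_df_safely(df: pd.DataFrame, query: str) -> str:
    """Evaluate a pandas expression with no access to builtins or module globals."""
    allowed = {"df": df, "len": len, "min": min, "max": max, "sum": sum}
    # Empty __builtins__ means names like __import__ or open cannot be resolved.
    result = eval(query, {"__builtins__": {}}, allowed)
    return str(result)

demo = pd.DataFrame({"positiveIncrease": [10, 20, 30]})
print(run_df_safely(demo, "df['positiveIncrease'].sum()"))  # → 60

try:
    run_df_safely(demo, "__import__('os').listdir('.')")
except NameError:
    print("blocked")  # __import__ is not resolvable
```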
## Enforcing Tool Usage

```python
llm_with_tools = llm.bind_tools([run_df])
```

The prompt forces the model to:

- Use the tool
- Perform actual pandas calculations
- Explain results
## 🔹 Part 3: Moving from CSV → SQL (SQLite)

We convert the CSV into SQLite:

```python
from sqlalchemy import create_engine

engine = create_engine("sqlite:///./db/test.db")
df.to_sql(
    name="all_states_history",
    con=engine,
    if_exists="replace",
    index=False,
)
```

Now the same dataset can be queried via SQL.
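A quick way to confirm the conversion worked is to read a row back with plain SQL. Below is a self-contained sketch using an in-memory database and a tiny stand-in dataframe (the real post uses the full CSV and `./db/test.db`):

```python
import pandas as pd
from sqlalchemy import create_engine

# Stand-in for the real COVID dataframe, small enough to inspect by eye.
df = pd.DataFrame({
    "state": ["NY", "CA"],
    "hospitalizedIncrease": [50, 30],
})

engine = create_engine("sqlite:///:memory:")
df.to_sql(name="all_states_history", con=engine, if_exists="replace", index=False)

# The same data is now reachable through ordinary SQL.
out = pd.read_sql(
    "SELECT hospitalizedIncrease FROM all_states_history WHERE state = 'NY'",
    engine,
)
print(int(out.iloc[0, 0]))  # → 50
```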
## 🔹 Part 4: SQL Agent with LangGraph

Using LangGraph's ReAct agent:

```python
from langgraph.prebuilt import create_react_agent

agent_executor_SQL = create_react_agent(
    model=llm,
    tools=toolkit.get_tools(),
)
```

The system prompt enforces:

- Only valid tables
- Only specific columns
- No hallucinated values
- Markdown-only output
## 🔹 Part 5: Function Calling (No Raw SQL)

Instead of letting the model generate SQL, we define pre-approved backend functions.

## Example Functions

```python
def get_hospitalized_increase_for_state_on_date(state_abbr, specific_date):
    ...

def get_positive_cases_for_state_on_date(state_abbr, specific_date):
    ...
```
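The post elides the function bodies, so here is one plausible implementation sketch (my own, with an explicit connection argument added so it can be exercised without globals). The key property is that the query is parameterized: the model only supplies a state abbreviation and a date, never SQL text.

```python
import sqlite3

def get_positive_cases_for_state_on_date(conn, state_abbr, specific_date):
    # Parameterized query: model-supplied input never becomes SQL text.
    row = conn.execute(
        "SELECT positiveIncrease FROM all_states_history WHERE state = ? AND date = ?",
        (state_abbr, specific_date),
    ).fetchone()
    return row[0] if row else None

# Tiny in-memory demo database standing in for ./db/test.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE all_states_history (state TEXT, date TEXT, positiveIncrease INTEGER)")
conn.execute("INSERT INTO all_states_history VALUES ('AK', '2021-03-05', 33)")
print(get_positive_cases_for_state_on_date(conn, "AK", "2021-03-05"))  # → 33
```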
## Function Registry (Critical!)

```python
FUNCTION_MAP = {
    "get_hospitalized_increase_for_state_on_date": get_hospitalized_increase_for_state_on_date,
    "get_positive_cases_for_state_on_date": get_positive_cases_for_state_on_date,
}
```

- ✅ Only allowed functions run
- ✅ No arbitrary code execution
## 🔹 Part 6: Azure OpenAI Function Calling (No Assistant API)

Using Chat Completions + functions:

```python
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    functions=functions,
    function_call="auto",
)
```

If the model calls a function:

- Extract arguments
- Route via FUNCTION_MAP
- Execute backend logic
- Send result back
- Get final grounded answer
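When the model returns a function call, the backend must extract the JSON arguments, route the name through `FUNCTION_MAP`, and execute. A self-contained sketch of that dispatcher (the stub body and its return value are mine, for the demo); note that an unknown function name is rejected rather than executed:

```python
import json

def dispatch(function_name, arguments_json, function_map):
    # Route only through the registry; anything else is refused.
    if function_name not in function_map:
        raise ValueError(f"Function not allowed: {function_name}")
    kwargs = json.loads(arguments_json)
    return function_map[function_name](**kwargs)

def get_positive_cases_for_state_on_date(state_abbr, specific_date):
    return 33  # stand-in body for the demo

FUNCTION_MAP = {
    "get_positive_cases_for_state_on_date": get_positive_cases_for_state_on_date,
}

args = '{"state_abbr": "AK", "specific_date": "2021-03-05"}'
print(dispatch("get_positive_cases_for_state_on_date", args, FUNCTION_MAP))  # → 33
```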
## 🔹 Part 7: Assistant API (Persistent Context)

## Creating the Assistant

```python
assistant = client.beta.assistants.create(
    name="Covid Data Assistant",
    model="gpt-4o-mini",
    tools=[{"type": "function", "function": fn} for fn in functions],
)
```
## Assistant Loop (Key Concept)

```python
while True:
    run_status = client.beta.threads.runs.retrieve(...)
    if run_status.status == "requires_action":
        # extract the function name and arguments,
        # dispatch via FUNCTION_MAP,
        # then submit the tool output
        ...
    elif run_status.status == "completed":
        break
```
The assistant remembers conversation context, but never caches database results.

## 🧠 Key Takeaways

This approach mirrors how real production AI systems are built:

- Prevents SQL hallucinations
- Enforces backend safety
- Keeps AI answers grounded in data
- Scales cleanly as tools grow

## ✅ What This Design Solves

- Analyze CSV data using pandas
- Query a SQL database safely
- Choose predefined backend functions automatically
- Explain results clearly
- Avoid hallucinations and unsafe SQL

## 🧩 Mental Model

- AI decides what
- Backend controls how
- Data remains authoritative
- Explanations remain transparent

## 📌 Final Thoughts

Let's learn and build cool data science and AI projects together!

## Connect With Me

- 💼 LinkedIn: https://www.linkedin.com/in/singla-khushi/
- 🐙 GitHub: https://github.com/KhushiSingla-tech
- 📩 Comments below are always welcome!