Tools: How to Dynamically Switch Local LLMs with LangChain

Source: Dev.to

In production RAG applications, you often need to evaluate different models against the same prompts. Does Llama 3.2 handle formatting better? Is Gemma 3 (12B) better at reasoning? Hardcoding model swaps or reloading your application is slow. In this post, I'll show you how to use LangChain's `configurable_alternatives` to build a Streamlit app where you can hot-swap models per request.

## The Architecture

We want a User Interface (Streamlit) that passes a configuration object to our Logic Layer (LangChain). The Logic Layer then routes the request to the appropriate Model Provider (Ollama).

```mermaid
graph TD
    User[User via Streamlit] -->|Selects Model| UI[UI Config]
    UI -->|config='gemma'| Chain[LangChain Runnable]
    subgraph "Swappable Backend"
        Chain -->|Default| Llama[Llama 3.2]
        Chain -->|Alternative| Gemma["Gemma 3 (12B)"]
    end
    Llama --> Output
    Gemma --> Output
```

## The Code Implementation

The core magic lies in the `configurable_alternatives` method, available on any Runnable (including LLMs):

```python
from langchain_ollama import OllamaLLM
from langchain_core.runnables import ConfigurableField

# 1. Base Model
llm = OllamaLLM(model="llama3.2")

# 2. Add Alternatives
llm_swappable = llm.configurable_alternatives(
    ConfigurableField(id="model_provider"),
    default_key="llama",
    gemma=OllamaLLM(model="gemma3:12b"),
)

# 3. Invoke with Config
response = llm_swappable.invoke(
    "Why is the sky blue?",
    config={"configurable": {"model_provider": "gemma"}},
)
```

This approach decouples your Chain Definition from your Execution Configuration: you build the pipeline once and modify its behavior at runtime. This is essential for:

- A/B Testing: randomly routing 50% of traffic to a new model.
- User Preference: letting power users choose their "engine".
- Fallback: if the primary model times out, switch to a backup.

By combining Streamlit for the frontend and LangChain's LCEL for dynamic routing, we built a robust "Model Playground" in under 50 lines of code. The GitHub repo is here: https://github.com/harishkotra/langchain-ollama-cookbook
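As a side note, the A/B-testing case above only needs a small config factory; the chain itself never changes. Here is a minimal sketch of that idea — `choose_config` and `rollout_fraction` are my own illustrative names, not part of LangChain — which produces the same `config` dict shape that `llm_swappable.invoke()` accepts:

```python
import random

def choose_config(rollout_fraction: float = 0.5) -> dict:
    """Route roughly `rollout_fraction` of requests to the alternative model.

    Returns a dict in the shape expected by the `config=` parameter of
    a Runnable built with configurable_alternatives.
    """
    key = "gemma" if random.random() < rollout_fraction else "llama"
    return {"configurable": {"model_provider": key}}

# Edge cases are deterministic: 0.0 always keeps the default model,
# 1.0 always routes to the alternative (random.random() is in [0, 1)).
print(choose_config(0.0))  # {'configurable': {'model_provider': 'llama'}}
print(choose_config(1.0))  # {'configurable': {'model_provider': 'gemma'}}
```

In the real app you would pass the result straight through: `llm_swappable.invoke(prompt, config=choose_config())`.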