How Bifrost Integrates With Your Existing LLM Stack (No Refactoring Required)
2025-12-12
## The Problem

You’ve built your LLM application. It works. Now you want better observability, load balancing, or caching. Most solutions require:

- Rewriting your API calls
- Learning new SDKs
- Refactoring working code
- Testing everything again

We built Bifrost to be different: drop it in, change one URL, done.

## OpenAI-Compatible API

Bifrost speaks OpenAI’s API format. If your code works with OpenAI, it works with Bifrost.

### Before
```python
import openai

openai.api_key = "sk-..."

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
```
### After

```python
import openai

openai.api_base = "http://localhost:8080/openai"  # Only change
openai.api_key = "sk-..."  # Your actual API key

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
```

One line changed. That’s it.

## Works With Every Major Framework

Because Bifrost is OpenAI-compatible, it works with any framework that supports OpenAI.

### LangChain
```python
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(
    openai_api_base="http://localhost:8080/langchain",
    openai_api_key="sk-..."
)
```
### LlamaIndex

```python
from llama_index.llms import OpenAI

llm = OpenAI(
    api_base="http://localhost:8080/openai",
    api_key="sk-..."
)
```
### LiteLLM

```python
import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    base_url="http://localhost:8080/litellm"
)
```
### Anthropic SDK

```python
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:8080/anthropic",
    api_key="sk-ant-..."
)
```

Same pattern everywhere: change the base URL, keep everything else.

## Multiple Providers, One Interface

Bifrost routes to multiple providers through the same API.

### Configuration
{ "providers": [ { "name": "openai", "api_key": "sk-...", "models": ["gpt-4", "gpt-4o-mini"] }, { "name": "anthropic", "api_key": "sk-ant-...", "models": ["claude-sonnet-4", "claude-opus-4"] }, { "name": "azure", "api_key": "...", "endpoint": "https://your-resource.openai.azure.com" } ]
} Enter fullscreen mode Exit fullscreen mode CODE_BLOCK:
{ "providers": [ { "name": "openai", "api_key": "sk-...", "models": ["gpt-4", "gpt-4o-mini"] }, { "name": "anthropic", "api_key": "sk-ant-...", "models": ["claude-sonnet-4", "claude-opus-4"] }, { "name": "azure", "api_key": "...", "endpoint": "https://your-resource.openai.azure.com" } ]
} CODE_BLOCK:
{ "providers": [ { "name": "openai", "api_key": "sk-...", "models": ["gpt-4", "gpt-4o-mini"] }, { "name": "anthropic", "api_key": "sk-ant-...", "models": ["claude-sonnet-4", "claude-opus-4"] }, { "name": "azure", "api_key": "...", "endpoint": "https://your-resource.openai.azure.com" } ]
} COMMAND_BLOCK:
### Your code

```python
# OpenAI
response = client.chat.completions.create(
    model="gpt-4",  # Routes to OpenAI
    messages=[...]
)

# Anthropic (same code structure)
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",  # Routes to Anthropic
    messages=[...]
)
```

Switch providers by changing the model name. No refactoring required.

## Built-In Observability Integration

Bifrost integrates with observability platforms out of the box.

### Maxim AI
{ "plugins": [ { "name": "maxim", "config": { "api_key": "your-maxim-key", "repo_id": "your-repo-id" } } ]
} Enter fullscreen mode Exit fullscreen mode CODE_BLOCK:
{ "plugins": [ { "name": "maxim", "config": { "api_key": "your-maxim-key", "repo_id": "your-repo-id" } } ]
} CODE_BLOCK:
{ "plugins": [ { "name": "maxim", "config": { "api_key": "your-maxim-key", "repo_id": "your-repo-id" } } ]
} CODE_BLOCK:
{ "metrics": { "enabled": true, "port": 9090 }
} Enter fullscreen mode Exit fullscreen mode CODE_BLOCK:
{ "metrics": { "enabled": true, "port": 9090 }
} CODE_BLOCK:
{ "metrics": { "enabled": true, "port": 9090 }
} CODE_BLOCK:
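If you want to confirm the endpoint is live before wiring up a scrape job, a quick check from Python works. This is a minimal sketch assuming Bifrost is running locally with the metrics config above (port 9090, path `/metrics`); adjust the port if yours differs.

```python
# Minimal sanity check for the Prometheus endpoint.
# Assumes Bifrost is running locally with metrics enabled on port 9090, as configured above.
import requests

resp = requests.get("http://localhost:9090/metrics", timeout=5)
resp.raise_for_status()

# Prometheus exposition format: print the first few HELP/TYPE lines and counters.
for line in resp.text.splitlines()[:10]:
    print(line)
```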
{ "otel": { "enabled": true, "endpoint": "http://your-collector:4318" }
} Enter fullscreen mode Exit fullscreen mode CODE_BLOCK:
{ "otel": { "enabled": true, "endpoint": "http://your-collector:4318" }
} CODE_BLOCK:
{ "otel": { "enabled": true, "endpoint": "http://your-collector:4318" }
} CODE_BLOCK:
{ "baseURL": "http://localhost:8080/openai", "provider": "anthropic"
} Enter fullscreen mode Exit fullscreen mode CODE_BLOCK:
{ "baseURL": "http://localhost:8080/openai", "provider": "anthropic"
} CODE_BLOCK:
{ "baseURL": "http://localhost:8080/openai", "provider": "anthropic"
} CODE_BLOCK:
```yaml
custom:
  - name: "Bifrost"
    apiKey: "dummy"
    baseURL: "http://localhost:8080/v1"
    models:
      default: ["openai/gpt-4o"]
```

Universal model access across all configured providers.

## MCP (Model Context Protocol) Support

Bifrost supports MCP for tool calling and context management.

### Configure MCP servers
{ "mcp": { "servers": [ { "name": "filesystem", "command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem"] }, { "name": "brave-search", "command": "npx", "args": ["-y", "@modelcontextprotocol/server-brave-search"], "env": { "BRAVE_API_KEY": "your-key" } } ] }
} Enter fullscreen mode Exit fullscreen mode CODE_BLOCK:
{ "mcp": { "servers": [ { "name": "filesystem", "command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem"] }, { "name": "brave-search", "command": "npx", "args": ["-y", "@modelcontextprotocol/server-brave-search"], "env": { "BRAVE_API_KEY": "your-key" } } ] }
} CODE_BLOCK:
{ "mcp": { "servers": [ { "name": "filesystem", "command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem"] }, { "name": "brave-search", "command": "npx", "args": ["-y", "@modelcontextprotocol/server-brave-search"], "env": { "BRAVE_API_KEY": "your-key" } } ] }
} COMMAND_BLOCK:
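To make that concrete, here is a minimal sketch of a client call once MCP servers are configured: a plain chat completion through the same OpenAI-compatible base URL used earlier, with no `tools` parameter supplied by the client. The prompt is illustrative, and exactly how tool results are surfaced in the response depends on Bifrost's MCP handling.

```python
# Minimal sketch: no client-side tool definitions — the gateway exposes the
# MCP servers configured above (filesystem, brave-search) to the model.
import openai

openai.api_base = "http://localhost:8080/openai"  # same base URL as earlier examples
openai.api_key = "sk-..."

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "List the files in the current project directory."}]
)

print(response["choices"][0]["message"]["content"])
```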
## Deployment Options

### Docker

```bash
docker run -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  maximhq/bifrost:latest
```
### Docker Compose

```yaml
services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    environment:
      - OPENAI_API_KEY=sk-...
    volumes:
      - ./data:/app/data
```
### Kubernetes

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bifrost
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bifrost
  template:
    metadata:
      labels:
        app: bifrost
    spec:
      containers:
        - name: bifrost
          image: maximhq/bifrost:latest
          ports:
            - containerPort: 8080
```

Terraform examples are available in the docs.

## Real Integration Example

### Before (Direct OpenAI)
```python
import openai
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent

openai.api_key = "sk-..."

llm = ChatOpenAI(model="gpt-4")
agent = initialize_agent(tools, llm)

# No observability
# No caching
# No load balancing
# No failover
```
### After (Through Bifrost)

```python
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent

llm = ChatOpenAI(
    model="gpt-4",
    openai_api_base="http://localhost:8080/langchain"
)

agent = initialize_agent(tools, llm)

# Automatic observability ✓
# Semantic caching ✓
# Multi-key load balancing ✓
# Provider failover ✓
```

One line changed. All features enabled.
## Migration Checklist

### 1. Install Bifrost

```bash
docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost
```

### 2. Add API keys

- Visit http://localhost:8080
- Add your provider keys

### 3. Update base URL
```python
openai.api_base = "http://localhost:8080/openai"
```
Or, for LangChain:

```python
openai_api_base = "http://localhost:8080/langchain"
```

### 4. Test one request

Verify it works and check the dashboard.
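For example, a minimal smoke test in the same style as step 3 (this assumes Bifrost is running locally and your OpenAI key is configured in it):

```python
# Smoke test: one request through Bifrost, then check the dashboard for the trace.
import openai

openai.api_base = "http://localhost:8080/openai"
openai.api_key = "sk-..."

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Say hello"}]
)

print(response["choices"][0]["message"]["content"])
```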
### 5. Deploy

Everything else stays the same. Total migration time: ~10 minutes.

## Try It Yourself

```bash
git clone https://github.com/maximhq/bifrost
cd bifrost
docker compose up
```

Full integration examples for LangChain, LiteLLM, and more are available in the GitHub repo.
## The Bottom Line

Bifrost integrates with your existing stack in minutes:

- OpenAI-compatible API (works everywhere)
- Change one URL, keep all your code
- Multi-provider support through one interface
- Built-in observability with zero instrumentation

No refactoring. No new SDKs. Just drop it in.

Built by the team at Maxim AI — we also build evaluation and observability tools for production AI agents.