```yaml
# bifrost-config.yaml
providers:
  - name: openai-primary
    provider: openai
    model: gpt-4o
    weight: 70
    api_key: ${OPENAI_API_KEY}
  - name: anthropic-fallback
    provider: anthropic
    model: claude-sonnet-4-20250514
    weight: 30
    api_key: ${ANTHROPIC_API_KEY}
routing:
  strategy: weighted
  fallback:
    enabled: true
    max_retries: 2
```
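The weighted split in the config above is easy to reason about with a quick sketch. This is an illustration of weighted random selection in general, not Bifrost's actual routing code; the provider names and weights are taken from the config:

```python
import random

# Mirrors the providers in bifrost-config.yaml above; the selection
# logic here is an illustrative sketch, not Bifrost's implementation.
PROVIDERS = [
    {"name": "openai-primary", "weight": 70},
    {"name": "anthropic-fallback", "weight": 30},
]

def pick_provider():
    """Weighted random choice: ~70% of requests hit the primary."""
    names = [p["name"] for p in PROVIDERS]
    weights = [p["weight"] for p in PROVIDERS]
    return random.choices(names, weights=weights, k=1)[0]

counts = {"openai-primary": 0, "anthropic-fallback": 0}
for _ in range(10_000):
    counts[pick_provider()] += 1
print(counts)  # roughly 7000 / 3000
```

Over many requests the observed split converges on the configured 70/30 ratio, which is what makes weighted routing useful for gradual migrations and A/B tests.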
```sh
npx -y @maximhq/bifrost
```
```sh
docker run -p 8080:8080 maximhq/bifrost
```
```yaml
# litellm config.yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: sk-xxx
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key: sk-yyy
router_settings:
  routing_strategy: least-busy
  num_retries: 3
```

- Failover: When OpenAI returns 429s or 500s, traffic should automatically shift to Anthropic or another provider. No manual intervention.
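The `least-busy` strategy in the LiteLLM config routes each request to whichever deployment currently has the fewest requests in flight. A minimal stdlib sketch of that idea, using the two deployments from the config (the counter bookkeeping is illustrative, not LiteLLM's internals):

```python
# Illustrative least-busy routing: pick the deployment with the fewest
# in-flight requests. Not LiteLLM's actual implementation.
in_flight = {"openai/gpt-4": 0, "azure/gpt-4": 0}

def acquire():
    """Pick the least-busy deployment and mark one request in flight."""
    deployment = min(in_flight, key=in_flight.get)
    in_flight[deployment] += 1
    return deployment

def release(deployment):
    """Mark a request as finished."""
    in_flight[deployment] -= 1

first = acquire()   # both idle: ties break by dict order
second = acquire()  # the other deployment is now less busy
print(first, second)
```

Because the counter is decremented on completion, a slow deployment naturally accumulates in-flight requests and stops attracting new traffic until it catches up.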
- Weighted distribution: Split traffic 70/30 across providers for cost optimization or A/B testing model quality.
- Latency-based routing: Send requests to whichever provider responds fastest at that moment.
- Budget-aware routing: Stop sending traffic to a provider when your spend cap is hit.
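The failover and budget-aware behaviors above can be sketched together in a few lines. Everything here is a hypothetical illustration: the provider names, spend figures, status codes, and `fake_send` backend are assumptions, not any gateway's real API:

```python
# Retryable statuses: rate limits and transient server errors.
RETRYABLE = {429, 500, 502, 503}

def route(providers, send, max_retries=2):
    """Try providers in order, skipping any over budget; retry on 429/5xx."""
    for p in providers:
        if p["spent"] >= p["budget"]:    # budget-aware: skip capped providers
            continue
        for _ in range(max_retries + 1):
            status, body = send(p["name"])
            if status not in RETRYABLE:  # success or non-retryable error
                return p["name"], status, body
        # retries exhausted: fail over to the next provider
    raise RuntimeError("all providers exhausted")

providers = [
    {"name": "openai", "spent": 100.0, "budget": 100.0},  # cap hit -> skipped
    {"name": "anthropic", "spent": 2.0, "budget": 50.0},
]

def fake_send(name):
    # Simulated backends: OpenAI would rate-limit, Anthropic answers.
    return (429, "") if name == "openai" else (200, "ok")

print(route(providers, fake_send))  # ('anthropic', 200, 'ok')
```

The same loop covers both bullets: a provider over its spend cap is never attempted, and one that keeps returning retryable errors is retried up to the limit and then failed over, with no manual intervention.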