# The Gateway Pattern: One API, Any Model
2026-03-04
After our Lambda approach fell apart, I needed a new architecture. Something that could handle any AI provider through one clean API. Something that could stream properly. Something that wouldn't fight us at every turn.

The solution was the gateway pattern: one API endpoint that normalizes requests across all providers. You send the same JSON payload whether you're using OpenAI, Claude, or Bedrock, and the gateway handles provider-specific formatting, retries, fallbacks, and streaming.

## Why Not Just Use LiteLLM?

Before building our own, I seriously considered LiteLLM. It's a clever proxy that does exactly this: normalize API calls across providers. But we had specific needs:

- TypeScript-first: Our frontend team lives in TypeScript. I needed strong typing across the entire stack.
- AWS CDK deployment: Everything deploys through CDK. I needed infrastructure as code.
- Cost tracking: Built-in tracking per request, per model, per application. LiteLLM doesn't handle this.
- Custom auth: Integration with our existing auth system and user management.
- Streaming through API Gateway: LiteLLM runs as a separate service. I needed streaming that worked with our existing infrastructure.

LiteLLM is great, but it's designed as a general proxy. We needed something purpose-built for our AWS-native architecture.

## The Gateway Architecture

Here's the high-level architecture:
```
Client Request
      |
API Gateway (with CORS)
      |
Lambda Gateway (routing + auth)
      |
Provider Adapter (OpenAI | Anthropic | Bedrock)
      |
AI Service
      |
Streaming Response (SSE)
```

The key insight: Lambda is perfect for the gateway logic. It's a short-lived proxy that routes requests and formats responses. The actual AI processing happens in the providers' infrastructure, not in Lambda.
## Provider Adapter Pattern

Every AI provider has a different request/response format. The adapter pattern lets us normalize them behind a common interface:

```typescript
// Base interface all providers must implement
interface AIProvider {
  name: string;
  supportsStreaming: boolean;
  createChatCompletion(
    request: NormalizedChatRequest
  ): Promise<NormalizedChatResponse>;
  createStreamingCompletion(
    request: NormalizedChatRequest
  ): AsyncGenerator<NormalizedStreamChunk>;
}

// Normalized request format (what clients send)
interface NormalizedChatRequest {
  model: string;
  messages: Array<{
    role: 'user' | 'assistant' | 'system';
    content: string;
  }>;
  maxTokens?: number;
  temperature?: number;
  stream?: boolean;
}

// Normalized response format (what clients receive)
interface NormalizedChatResponse {
  id: string;
  provider: string;
  model: string;
  content: string;
  usage: {
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
  };
  cost: number;
}
```
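It helps to see how little this interface demands before looking at the real adapters. A minimal in-memory provider (a hypothetical `MockProvider`, not part of the repo) is enough to unit-test gateway logic without any network calls:

```typescript
// Trimmed copies of the normalized types above, just for this example
interface NormalizedChatRequest {
  model: string;
  messages: Array<{ role: 'user' | 'assistant' | 'system'; content: string }>;
}
interface NormalizedChatResponse {
  id: string;
  provider: string;
  model: string;
  content: string;
  usage: { promptTokens: number; completionTokens: number; totalTokens: number };
  cost: number;
}

// A provider that just echoes the last user message -- handy in unit tests
class MockProvider {
  name = 'mock';
  supportsStreaming = false;

  async createChatCompletion(request: NormalizedChatRequest): Promise<NormalizedChatResponse> {
    const last = request.messages[request.messages.length - 1];
    return {
      id: 'mock-1',
      provider: this.name,
      model: request.model,
      content: `echo: ${last.content}`,
      usage: { promptTokens: 1, completionTokens: 1, totalTokens: 2 },
      cost: 0,
    };
  }
}
```

Swapping this in for a real adapter lets you exercise routing, fallback, and cost-tracking code paths deterministically.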
Now each provider implements this interface. Here's the OpenAI adapter:

```typescript
// OpenAI adapter
export class OpenAIProvider implements AIProvider {
  name = 'openai';
  supportsStreaming = true;

  constructor(private apiKey: string) {}

  async createChatCompletion(request: NormalizedChatRequest): Promise<NormalizedChatResponse> {
    const openai = new OpenAI({ apiKey: this.apiKey });

    const response = await openai.chat.completions.create({
      model: request.model,
      messages: request.messages,
      max_tokens: request.maxTokens,
      temperature: request.temperature,
    });

    return {
      id: response.id,
      provider: 'openai',
      model: response.model,
      content: response.choices[0].message.content || '',
      usage: {
        promptTokens: response.usage?.prompt_tokens || 0,
        completionTokens: response.usage?.completion_tokens || 0,
        totalTokens: response.usage?.total_tokens || 0,
      },
      cost: this.calculateCost(request.model, response.usage),
    };
  }

  async *createStreamingCompletion(request: NormalizedChatRequest): AsyncGenerator<NormalizedStreamChunk> {
    const openai = new OpenAI({ apiKey: this.apiKey });

    const stream = await openai.chat.completions.create({
      model: request.model,
      messages: request.messages,
      max_tokens: request.maxTokens,
      temperature: request.temperature,
      stream: true,
    });

    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content || '';
      if (content) {
        yield {
          id: chunk.id,
          content,
          finished: chunk.choices[0]?.finish_reason !== null,
        };
      }
    }
  }

  private calculateCost(model: string, usage: any): number {
    const rates: Record<string, { prompt: number; completion: number }> = {
      'gpt-4': { prompt: 0.03, completion: 0.06 },
      'gpt-3.5-turbo': { prompt: 0.001, completion: 0.002 },
    };
    const rate = rates[model] || rates['gpt-3.5-turbo'];
    return (usage.prompt_tokens * rate.prompt + usage.completion_tokens * rate.completion) / 1000;
  }
}
```
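The arithmetic in `calculateCost` trips people up because the rates are quoted per 1,000 tokens. Pulled out as a standalone function, with the illustrative rates from the adapter above:

```typescript
// Per-1K-token rates in USD, copied from the adapter above (illustrative)
const RATES: Record<string, { prompt: number; completion: number }> = {
  'gpt-4': { prompt: 0.03, completion: 0.06 },
  'gpt-3.5-turbo': { prompt: 0.001, completion: 0.002 },
};

// Cost = (promptTokens * promptRate + completionTokens * completionRate) / 1000
function estimateCost(model: string, promptTokens: number, completionTokens: number): number {
  const rate = RATES[model] ?? RATES['gpt-3.5-turbo'];
  return (promptTokens * rate.prompt + completionTokens * rate.completion) / 1000;
}
```

So a GPT-4 call with 1,000 prompt tokens and 1,000 completion tokens costs (1000 * 0.03 + 1000 * 0.06) / 1000 = $0.09 at these rates.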
The Bedrock adapter looks similar, but handles AWS SDK authentication and different model naming conventions:

```typescript
// Bedrock adapter
export class BedrockProvider implements AIProvider {
  name = 'bedrock';
  supportsStreaming = true;

  constructor(private region: string = 'us-east-1') {}

  async createChatCompletion(request: NormalizedChatRequest): Promise<NormalizedChatResponse> {
    const client = new BedrockRuntimeClient({ region: this.region });

    // Bedrock has different request formats per model
    const modelId = this.mapModelName(request.model);
    const body = this.formatBedrockRequest(modelId, request);

    const command = new InvokeModelCommand({
      modelId,
      body: JSON.stringify(body),
    });

    const response = await client.send(command);
    const result = JSON.parse(new TextDecoder().decode(response.body));

    return this.formatBedrockResponse(result, modelId);
  }

  private mapModelName(model: string): string {
    const modelMap: Record<string, string> = {
      'claude-3-sonnet': 'anthropic.claude-3-sonnet-20240229-v1:0',
      'claude-3-haiku': 'anthropic.claude-3-haiku-20240307-v1:0',
      'claude-3.5-sonnet': 'anthropic.claude-3-5-sonnet-20240620-v1:0',
    };
    return modelMap[model] || model;
  }

  private formatBedrockRequest(modelId: string, request: NormalizedChatRequest): any {
    if (modelId.includes('anthropic')) {
      return {
        anthropic_version: 'bedrock-2023-05-31',
        max_tokens: request.maxTokens || 4096,
        temperature: request.temperature || 0.7,
        messages: request.messages,
      };
    }
    // Handle other model families...
    throw new Error(`Unsupported model: ${modelId}`);
  }

  // formatBedrockResponse (mapping the raw Bedrock payload back into
  // NormalizedChatResponse) omitted for brevity
}
```
## The Gateway Lambda

The main Lambda function routes requests to the appropriate provider:

```typescript
import { APIGatewayProxyHandler } from 'aws-lambda';
import { OpenAIProvider } from './providers/openai';
import { BedrockProvider } from './providers/bedrock';
import { AnthropicProvider } from './providers/anthropic';

const providers = {
  openai: new OpenAIProvider(process.env.OPENAI_API_KEY!),
  bedrock: new BedrockProvider(process.env.AWS_REGION!),
  anthropic: new AnthropicProvider(process.env.ANTHROPIC_API_KEY!),
};

export const handler: APIGatewayProxyHandler = async (event) => {
  try {
    const request = JSON.parse(event.body || '{}');
    const provider = getProvider(request.model);

    if (request.stream) {
      return handleStreamingRequest(request, provider);
    } else {
      return handleNormalRequest(request, provider);
    }
  } catch (error) {
    return {
      statusCode: 500,
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ error: (error as Error).message }),
    };
  }
};

function getProvider(model: string): AIProvider {
  // Model name routing logic
  if (model.startsWith('gpt-') || model.startsWith('text-')) {
    return providers.openai;
  } else if (model.startsWith('claude-')) {
    // Try Anthropic first, fall back to Bedrock
    return providers.anthropic;
  } else if (model.includes('bedrock') || model.includes('titan')) {
    return providers.bedrock;
  }
  // Default to OpenAI
  return providers.openai;
}

async function handleNormalRequest(request: any, provider: AIProvider) {
  const response = await provider.createChatCompletion(request);

  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(response),
  };
}
```
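The prefix routing in `getProvider` is easy to unit-test if you separate the name matching from the provider instances. A sketch mirroring the same rules:

```typescript
type ProviderName = 'openai' | 'anthropic' | 'bedrock';

// Same prefix rules as getProvider above, minus the provider instances
function routeModel(model: string): ProviderName {
  if (model.startsWith('gpt-') || model.startsWith('text-')) return 'openai';
  if (model.startsWith('claude-')) return 'anthropic';
  if (model.includes('bedrock') || model.includes('titan')) return 'bedrock';
  return 'openai'; // default
}
```

Keeping this pure function separate means the routing table can be covered by fast tests, while the provider map stays an implementation detail of the handler.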
## Fallback Chain Implementation

One of the most powerful features is automatic fallback: if one provider fails, we try the next.

```typescript
async function handleRequestWithFallback(request: NormalizedChatRequest): Promise<NormalizedChatResponse> {
  const fallbackChain = [
    { provider: providers.anthropic, models: ['claude-3.5-sonnet', 'claude-3-sonnet'] },
    { provider: providers.openai, models: ['gpt-4', 'gpt-3.5-turbo'] },
    { provider: providers.bedrock, models: ['claude-3-haiku'] },
  ];

  for (const { provider, models } of fallbackChain) {
    for (const model of models) {
      try {
        console.log(`Trying ${provider.name} with model ${model}`);
        const fallbackRequest = { ...request, model };
        const response = await provider.createChatCompletion(fallbackRequest);
        console.log(`Success with ${provider.name}/${model}`);
        return response;
      } catch (error) {
        console.log(`Failed with ${provider.name}/${model}: ${(error as Error).message}`);
        // Continue to the next option
      }
    }
  }

  throw new Error('All providers failed');
}
```

This saved us multiple times when OpenAI had outages or rate limits. Requests automatically failed over to Claude or Bedrock without any client changes.
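The same loop can be factored into a reusable helper that takes any ordered list of async attempts. This is a hedged sketch, not the repo's exact code:

```typescript
// Try each async attempt in order; return the first success, throw if all fail
async function firstSuccessful<T>(attempts: Array<() => Promise<T>>): Promise<T> {
  let lastError: unknown;
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (error) {
      lastError = error; // remember it and move on to the next option
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}
```

With this helper, the fallback chain becomes a flat list of closures over (provider, model) pairs, which keeps retry ordering in one place.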
## Streaming with Server-Sent Events

The streaming implementation was tricky but crucial. API Gateway can deliver server-sent events (SSE), but you have to format the response correctly:

```typescript
async function handleStreamingRequest(request: any, provider: AIProvider): Promise<any> {
  if (!provider.supportsStreaming) {
    // Fall back to non-streaming
    return handleNormalRequest(request, provider);
  }

  const generator = provider.createStreamingCompletion(request);
  let fullContent = '';

  // Note: emitting chunks as they are produced requires Lambda response
  // streaming; a plain proxy integration buffers the whole body.
  return {
    statusCode: 200,
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    },
    body: (async function* () {
      try {
        for await (const chunk of generator) {
          fullContent += chunk.content;
          // SSE format: `data: <json>` followed by a blank line
          yield `data: ${JSON.stringify(chunk)}\n\n`;
          if (chunk.finished) {
            break;
          }
        }
        // Final usage statistics
        yield `data: ${JSON.stringify({
          type: 'complete',
          usage: { totalTokens: estimateTokens(fullContent) },
        })}\n\n`;
      } catch (error) {
        yield `data: ${JSON.stringify({ type: 'error', error: (error as Error).message })}\n\n`;
      }
    })(),
    isBase64Encoded: false,
  };
}
```
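The SSE wire format the handler emits is just `data: <json>` followed by a blank line. A tiny helper makes the framing explicit and keeps it consistent between the content, `complete`, and `error` events:

```typescript
// Frame one JSON-serializable payload as a single server-sent event
function formatSseEvent(payload: unknown): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}
```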
## Configuration-Driven Provider Selection

The best part: you can switch providers with configuration alone. No code changes:

```typescript
// Configuration in DynamoDB or environment variables
const config = {
  defaultProvider: 'anthropic',
  modelMapping: {
    'chat': 'claude-3.5-sonnet',
    'summary': 'gpt-3.5-turbo',
    'analysis': 'claude-3-sonnet',
  },
  fallbackChains: {
    'claude-3.5-sonnet': ['gpt-4', 'claude-3-sonnet'],
    'gpt-4': ['claude-3.5-sonnet', 'gpt-3.5-turbo'],
  },
  costLimits: {
    daily: 100,    // $100 per day
    monthly: 2000, // $2000 per month
  },
};
```

When OpenAI raised prices, we updated the config to prefer Claude. When Bedrock added new models, we added them to the fallback chain. Zero downtime, zero code changes.
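Resolving a logical task name to a concrete model plus its fallback chain is just a couple of lookups over that shape. A sketch of such a resolver (illustrative, not the repo's exact code):

```typescript
interface GatewayConfig {
  modelMapping: Record<string, string>;     // task name -> preferred model
  fallbackChains: Record<string, string[]>; // model -> ordered fallbacks
}

// Returns the preferred model followed by its configured fallbacks
function resolveModels(config: GatewayConfig, task: string): string[] {
  const primary = config.modelMapping[task];
  if (!primary) throw new Error(`Unknown task: ${task}`);
  return [primary, ...(config.fallbackChains[primary] ?? [])];
}
```

The fallback chain handler can then iterate this list directly, so a config update changes routing on the very next request.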
## Cost Tracking Built-In

Every request gets tracked automatically:

```typescript
interface CostRecord {
  requestId: string;
  timestamp: number;
  provider: string;
  model: string;
  promptTokens: number;
  completionTokens: number;
  cost: number;
  userId?: string;
  application?: string;
}

async function logCost(record: CostRecord) {
  await dynamoClient.send(new PutItemCommand({
    TableName: 'ai-costs',
    Item: marshall(record),
  }));
}
```

This gives us real-time visibility into AI spending. We can track costs per user, per feature, and per model. When Anthropic released Claude 3 Haiku (their cheaper model), we could immediately see the cost savings.
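With those records landing in DynamoDB, enforcing the `costLimits` from the config is a sum over the day's records. A simplified in-memory version of the check (the real one would query the `ai-costs` table):

```typescript
interface SpendRecord {
  timestamp: number; // epoch millis
  cost: number;      // USD
}

// Sum costs for records falling within [dayStart, dayStart + 24h)
function dailySpend(records: SpendRecord[], dayStart: number): number {
  const dayEnd = dayStart + 24 * 60 * 60 * 1000;
  return records
    .filter((r) => r.timestamp >= dayStart && r.timestamp < dayEnd)
    .reduce((sum, r) => sum + r.cost, 0);
}

// True when spending has reached or passed the daily limit
function overDailyLimit(records: SpendRecord[], dayStart: number, limitUsd: number): boolean {
  return dailySpend(records, dayStart) >= limitUsd;
}
```

The same aggregation keyed by `userId` or `application` gives per-user and per-feature views.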
## Real-World Usage

Here's how clients use the gateway:

```typescript
// The same API call works with any provider
const response = await fetch('/ai/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'claude-3.5-sonnet', // or 'gpt-4', 'gpt-3.5-turbo', etc.
    messages: [
      { role: 'user', content: 'Summarize this document...' },
    ],
    maxTokens: 150,
    stream: true, // or false
  }),
});

// Reading the streaming response
if (response.body) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value);
    const lines = chunk.split('\n');

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6));
        console.log(data.content); // Stream content to the UI
      }
    }
  }
}
```
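One caveat with the client loop above: a network chunk can end mid-line, splitting a `data:` payload across two reads and breaking `JSON.parse`. A small buffering parser fixes that (a sketch; the repo's TypeScript SDK may do this differently):

```typescript
// Accumulates raw chunks and yields only complete `data:` payloads
class SseBuffer {
  private buffer = '';

  push(chunk: string): string[] {
    this.buffer += chunk;
    const events: string[] = [];
    let idx: number;
    // A complete SSE event ends with a blank line
    while ((idx = this.buffer.indexOf('\n\n')) !== -1) {
      const raw = this.buffer.slice(0, idx);
      this.buffer = this.buffer.slice(idx + 2);
      for (const line of raw.split('\n')) {
        if (line.startsWith('data: ')) events.push(line.slice(6));
      }
    }
    return events;
  }
}
```

Feed each decoded chunk through `push` and only `JSON.parse` what it returns; partial payloads simply wait in the buffer for the next read.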
## What This Solved

The gateway pattern eliminated our major pain points:

- Vendor flexibility: Switch providers with config changes
- Unified API: One integration instead of 7 different patterns
- Automatic fallbacks: Reliability through redundancy
- Streaming support: Real-time responses through SSE
- Cost transparency: Track spending per request
- TypeScript-first: Strong typing across the stack

Most importantly, it let us focus on building features instead of fighting integration complexity.

The real test came during OpenAI's major outage in December. Our gateway automatically failed over to Claude for all requests. Users didn't even notice. That's when I knew we'd built something valuable.

## Example Implementation

You can see the complete gateway implementation in our examples repo at https://github.com/tysoncung/ai-platform-aws-examples/tree/main/01-multi-provider-gateway. It includes:

- Full provider adapters for OpenAI, Anthropic, and Bedrock
- CDK deployment code
- A TypeScript SDK for clients
- Cost tracking and monitoring
- Streaming and non-streaming examples

In the next article, we'll dive into RAG (Retrieval Augmented Generation) and show you how to build a document search pipeline that actually works in production. Most RAG tutorials use toy examples that break on real documents; we'll show you how to handle the edge cases.

This is part 3 of an 8-part series on building a production AI platform. Find the complete code examples at https://github.com/tysoncung/ai-platform-aws-examples.