Tools: Cost Tracking, Auth & Production Hardening

2026-03-06 admin

Source: Dev.to

## The $2,847 Wake-Up Call

$2,847 for six hours of work. That's what I paid when a developer accidentally created an infinite loop in my AI platform: GPT-4 at roughly $3.20 per iteration, running 891 times while I slept.

The agent kept retrying a failed classification call, convinced it could eventually succeed. It never did. But my credit card kept getting charged.

That morning taught me that cost control isn't a nice-to-have feature - it's life support for AI platforms.

Here's exactly what happened. A developer was testing an agent that analyzed customer feedback. The agent was supposed to:

But there was a bug in the ReAct loop. When the classification tool returned an empty result (which happened for non-English reviews), the agent assumed the call had failed and retried. Forever.

The logs told the story:

Each retry: 4,000 tokens of GPT-4. 891 retries at roughly $3.20 each came to $2,847.

That's when I built comprehensive cost tracking and budget controls. Because if you're building an AI platform without cost guardrails, you're building a financial timebomb.

## Per-User Cost Tracking Architecture

Every request now logs detailed cost information to DynamoDB.
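Tracking shows what a loop cost after the fact; the loop in the incident also needed a hard stop. Here is a minimal sketch of such a guard, under stated assumptions: `LoopBudget`, `runGuardedLoop`, and the `ToolResult` shape are hypothetical names made up for illustration, not part of the platform's code, and the tool call is synchronous only to keep the example short (a real tool call would be awaited). It caps both the number of attempts and the estimated spend, and it treats an empty result from a successful call as an answer rather than a failure.

```typescript
// Hypothetical result shape: `ok` says the tool ran; `labels` may be empty.
type ToolResult = { ok: boolean; labels: string[] };

// Tracks attempts and estimated spend for one agent task.
class LoopBudget {
  private attempts = 0;
  private spentUsd = 0;

  constructor(
    private readonly maxAttempts: number,
    private readonly maxSpendUsd: number
  ) {}

  // Records the call and returns true only if it fits under both caps.
  tryConsume(costUsd: number): boolean {
    if (this.attempts >= this.maxAttempts) return false;
    if (this.spentUsd + costUsd > this.maxSpendUsd) return false;
    this.attempts += 1;
    this.spentUsd += costUsd;
    return true;
  }

  get used(): { attempts: number; spentUsd: number } {
    return { attempts: this.attempts, spentUsd: this.spentUsd };
  }
}

// Calls the tool until it succeeds or the budget refuses another call.
function runGuardedLoop(
  callTool: () => ToolResult,
  costPerCallUsd: number,
  budget: LoopBudget
): ToolResult | null {
  while (budget.tryConsume(costPerCallUsd)) {
    const result = callTool();
    // A successful call with no labels is a valid answer, not a reason to retry.
    if (result.ok) return result;
  }
  return null; // caps reached: give up instead of retrying forever
}
```

With a cap of five attempts and $10 at $3.20 per call, a tool that always fails is stopped after three attempts, because a fourth call would exceed the spend cap.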
Here's the tracking middleware:

## Real-Time Budget Management

The budget system prevents runaway costs with soft and hard limits:

## Authentication: Three Layers Deep

I implement three authentication patterns depending on the use case:

## 1. API Keys for External Developers

## 2. JWT for Internal Services

Internal microservices use JWT tokens with short expiration:

## 3. IAM Roles for AWS Services

Lambda functions and ECS tasks use IAM roles for service-to-service authentication:

## Rate Limiting with Token Buckets

I use the token bucket algorithm for smooth rate limiting:

## BYOK: Bring Your Own Key Deep Dive

Many users want to bring their own OpenAI/Anthropic API keys for cost control and compliance. This requires careful security handling:

## Monitoring and Alerting

CloudWatch dashboards show real-time platform health:

## Security Hardening Checklist

Here's my production security configuration:

VPC Configuration for ECS:

API Gateway with WAF:

## Cost Breakdown: What This Actually Costs

After 8 months in production serving 1,500 requests/day:

Fixed Infrastructure Costs (Monthly):

Variable AI Costs (Pass-through):

Cost Optimization Wins:

## Real Production Incidents

Incident 1: Memory Leak in ECS Agent

Incident 2: DynamoDB Throttling

Incident 3: BYOK Key Validation Loop

## The Security Mindset Shift

Building production AI infrastructure changed how I think about security. Traditional web apps have predictable resource usage. AI apps can consume unlimited resources with a single malicious prompt.

Every endpoint needs three guards:

Miss any one, and you're vulnerable.

## What's Next

The complete production setup is documented in my ai-platform-aws-examples repo.

Next week, I'll tie everything together with a complete deployment walkthrough. You'll see how to go from zero to a fully operational AI platform in under an hour.

But more importantly, I'll share the real numbers: what this platform actually costs to run, performance metrics from production, and the roadmap for what's coming next.

Because the best architecture means nothing if you can't deploy it reliably.

This is part 7 of an 8-part series documenting my journey building an AI platform on AWS. Next week: the complete deployment guide and lessons from 8 months in production.
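For readers who want to see the token bucket idea from the rate limiting section in isolation, here is a small sketch with an injectable clock, so refill behavior can be checked without waiting on real time. `DemoTokenBucket`, `clockMs`, and the chosen limits (60 tokens, 1 token per second) are assumptions made for this example only; the limiter the platform actually uses is the `RateLimiter`/`TokenBucket` code in this post.

```typescript
// Token bucket with an injectable clock (milliseconds) for deterministic checks.
class DemoTokenBucket {
  private tokens: number;
  private lastRefillMs: number;

  constructor(
    private readonly capacity: number,
    private readonly refillPerSecond: number,
    private readonly now: () => number = Date.now
  ) {
    this.tokens = capacity; // start with a full bucket
    this.lastRefillMs = now();
  }

  // Takes `amount` tokens if available; otherwise leaves the bucket unchanged.
  consume(amount: number = 1): boolean {
    this.refill();
    if (this.tokens < amount) return false;
    this.tokens -= amount;
    return true;
  }

  // Tokens currently available, after accounting for elapsed time.
  get available(): number {
    this.refill();
    return this.tokens;
  }

  // Adds tokens in proportion to elapsed time, never exceeding capacity.
  private refill(): void {
    const nowMs = this.now();
    const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefillMs = nowMs;
  }
}

// Fake clock: advance `clockMs` by hand instead of sleeping.
let clockMs = 0;
const bucket = new DemoTokenBucket(60, 1, () => clockMs);

// A burst of 61 requests at t=0: the first 60 pass, the last one is rejected.
const burst = Array.from({ length: 61 }, () => bucket.consume());

// Five seconds later, five tokens have been refilled.
clockMs = 5000;
const refilled = bucket.available;
```

The burst is absorbed up to the bucket's capacity, and after that throughput is bounded by the refill rate, which is what makes the limiting feel smooth rather than resetting on a fixed window.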
It will become hidden in your post, but will still be visible via the comment's permalink. Hide child comments as well For further actions, you may consider blocking this person and/or reporting abuse CODE_BLOCK: 2024-01-15 14:23:15 - Agent: Classifying review text... 2024-01-15 14:23:18 - Tool: Classification failed - no result 2024-01-15 14:23:19 - Agent: Let me try classifying again... 2024-01-15 14:23:22 - Tool: Classification failed - no result 2024-01-15 14:23:23 - Agent: Let me try classifying again... ... (repeats 891 times) Enter fullscreen mode Exit fullscreen mode CODE_BLOCK: 2024-01-15 14:23:15 - Agent: Classifying review text... 2024-01-15 14:23:18 - Tool: Classification failed - no result 2024-01-15 14:23:19 - Agent: Let me try classifying again... 2024-01-15 14:23:22 - Tool: Classification failed - no result 2024-01-15 14:23:23 - Agent: Let me try classifying again... ... (repeats 891 times) CODE_BLOCK: 2024-01-15 14:23:15 - Agent: Classifying review text... 2024-01-15 14:23:18 - Tool: Classification failed - no result 2024-01-15 14:23:19 - Agent: Let me try classifying again... 2024-01-15 14:23:22 - Tool: Classification failed - no result 2024-01-15 14:23:23 - Agent: Let me try classifying again... ... 
(repeats 891 times) COMMAND_BLOCK: export interface UsageRecord { userId: string; requestId: string; timestamp: number; provider: string; model: string; promptTokens: number; completionTokens: number; totalTokens: number; estimatedCost: number; actualCost?: number; // Updated when we get actual billing requestType: 'completion' | 'embedding' | 'agent'; metadata: { endpoint: string; userAgent: string; duration: number; byok: boolean; // Bring Your Own Key }; } import { DynamoDBClient } from '@aws-sdk/client-dynamodb'; import { DynamoDBDocumentClient, PutCommand, UpdateCommand } from '@aws-sdk/lib-dynamodb'; export class CostTracker { private readonly pricingTable: Map<string, TokenPricing>; constructor(private dynamoClient: DynamoDBDocumentClient) { this.initializePricing(); } async trackUsage( userId: string, requestId: string, usage: TokenUsage, metadata: RequestMetadata ): Promise<void> { const pricing = this.pricingTable.get(`${usage.provider}:${usage.model}`); if (!pricing) { throw new Error(`No pricing data for ${usage.provider}:${usage.model}`); } const promptCost = (usage.promptTokens / 1000) * pricing.promptPer1K; const completionCost = (usage.completionTokens / 1000) * pricing.completionPer1K; const estimatedCost = promptCost + completionCost; const record: UsageRecord = { userId, requestId, timestamp: Date.now(), provider: usage.provider, model: usage.model, promptTokens: usage.promptTokens, completionTokens: usage.completionTokens, totalTokens: usage.totalTokens, estimatedCost, requestType: metadata.requestType, metadata: { endpoint: metadata.endpoint, userAgent: metadata.userAgent, duration: metadata.duration, byok: metadata.byok } }; await this.dynamoClient.send(new PutCommand({ TableName: 'ai-platform-usage', Item: record })); // Update real-time budget tracking await this.updateUserBudget(userId, estimatedCost); } private initializePricing(): void { // Updated regularly from provider APIs this.pricingTable.set('openai:gpt-4', { promptPer1K: 0.030, 
completionPer1K: 0.060 }); this.pricingTable.set('openai:gpt-4-turbo', { promptPer1K: 0.010, completionPer1K: 0.030 }); this.pricingTable.set('anthropic:claude-3-sonnet', { promptPer1K: 0.003, completionPer1K: 0.015 }); // ... more models } } Enter fullscreen mode Exit fullscreen mode COMMAND_BLOCK: export interface UsageRecord { userId: string; requestId: string; timestamp: number; provider: string; model: string; promptTokens: number; completionTokens: number; totalTokens: number; estimatedCost: number; actualCost?: number; // Updated when we get actual billing requestType: 'completion' | 'embedding' | 'agent'; metadata: { endpoint: string; userAgent: string; duration: number; byok: boolean; // Bring Your Own Key }; } import { DynamoDBClient } from '@aws-sdk/client-dynamodb'; import { DynamoDBDocumentClient, PutCommand, UpdateCommand } from '@aws-sdk/lib-dynamodb'; export class CostTracker { private readonly pricingTable: Map<string, TokenPricing>; constructor(private dynamoClient: DynamoDBDocumentClient) { this.initializePricing(); } async trackUsage( userId: string, requestId: string, usage: TokenUsage, metadata: RequestMetadata ): Promise<void> { const pricing = this.pricingTable.get(`${usage.provider}:${usage.model}`); if (!pricing) { throw new Error(`No pricing data for ${usage.provider}:${usage.model}`); } const promptCost = (usage.promptTokens / 1000) * pricing.promptPer1K; const completionCost = (usage.completionTokens / 1000) * pricing.completionPer1K; const estimatedCost = promptCost + completionCost; const record: UsageRecord = { userId, requestId, timestamp: Date.now(), provider: usage.provider, model: usage.model, promptTokens: usage.promptTokens, completionTokens: usage.completionTokens, totalTokens: usage.totalTokens, estimatedCost, requestType: metadata.requestType, metadata: { endpoint: metadata.endpoint, userAgent: metadata.userAgent, duration: metadata.duration, byok: metadata.byok } }; await this.dynamoClient.send(new PutCommand({ TableName: 
'ai-platform-usage', Item: record })); // Update real-time budget tracking await this.updateUserBudget(userId, estimatedCost); } private initializePricing(): void { // Updated regularly from provider APIs this.pricingTable.set('openai:gpt-4', { promptPer1K: 0.030, completionPer1K: 0.060 }); this.pricingTable.set('openai:gpt-4-turbo', { promptPer1K: 0.010, completionPer1K: 0.030 }); this.pricingTable.set('anthropic:claude-3-sonnet', { promptPer1K: 0.003, completionPer1K: 0.015 }); // ... more models } } COMMAND_BLOCK: export interface UsageRecord { userId: string; requestId: string; timestamp: number; provider: string; model: string; promptTokens: number; completionTokens: number; totalTokens: number; estimatedCost: number; actualCost?: number; // Updated when we get actual billing requestType: 'completion' | 'embedding' | 'agent'; metadata: { endpoint: string; userAgent: string; duration: number; byok: boolean; // Bring Your Own Key }; } import { DynamoDBClient } from '@aws-sdk/client-dynamodb'; import { DynamoDBDocumentClient, PutCommand, UpdateCommand } from '@aws-sdk/lib-dynamodb'; export class CostTracker { private readonly pricingTable: Map<string, TokenPricing>; constructor(private dynamoClient: DynamoDBDocumentClient) { this.initializePricing(); } async trackUsage( userId: string, requestId: string, usage: TokenUsage, metadata: RequestMetadata ): Promise<void> { const pricing = this.pricingTable.get(`${usage.provider}:${usage.model}`); if (!pricing) { throw new Error(`No pricing data for ${usage.provider}:${usage.model}`); } const promptCost = (usage.promptTokens / 1000) * pricing.promptPer1K; const completionCost = (usage.completionTokens / 1000) * pricing.completionPer1K; const estimatedCost = promptCost + completionCost; const record: UsageRecord = { userId, requestId, timestamp: Date.now(), provider: usage.provider, model: usage.model, promptTokens: usage.promptTokens, completionTokens: usage.completionTokens, totalTokens: usage.totalTokens, 
estimatedCost, requestType: metadata.requestType, metadata: { endpoint: metadata.endpoint, userAgent: metadata.userAgent, duration: metadata.duration, byok: metadata.byok } }; await this.dynamoClient.send(new PutCommand({ TableName: 'ai-platform-usage', Item: record })); // Update real-time budget tracking await this.updateUserBudget(userId, estimatedCost); } private initializePricing(): void { // Updated regularly from provider APIs this.pricingTable.set('openai:gpt-4', { promptPer1K: 0.030, completionPer1K: 0.060 }); this.pricingTable.set('openai:gpt-4-turbo', { promptPer1K: 0.010, completionPer1K: 0.030 }); this.pricingTable.set('anthropic:claude-3-sonnet', { promptPer1K: 0.003, completionPer1K: 0.015 }); // ... more models } } COMMAND_BLOCK: export interface UserBudget { userId: string; monthlyLimit: number; currentSpend: number; warningThreshold: number; // Default: 80% lastUpdated: number; status: 'active' | 'warning' | 'blocked'; notifications: { warning: boolean; limit: boolean; lastSent: number; }; } export class BudgetManager { async checkBudget(userId: string, estimatedCost: number): Promise<BudgetCheckResult> { const budget = await this.getUserBudget(userId); const projectedSpend = budget.currentSpend + estimatedCost; if (projectedSpend > budget.monthlyLimit) { return { allowed: false, reason: 'Monthly budget exceeded', currentSpend: budget.currentSpend, limit: budget.monthlyLimit, remainingBudget: 0 }; } if (projectedSpend > (budget.monthlyLimit * budget.warningThreshold / 100)) { await this.sendBudgetWarning(userId, budget); return { allowed: true, warning: true, reason: 'Approaching budget limit', currentSpend: budget.currentSpend, limit: budget.monthlyLimit, remainingBudget: budget.monthlyLimit - projectedSpend }; } return { allowed: true, currentSpend: budget.currentSpend, limit: budget.monthlyLimit, remainingBudget: budget.monthlyLimit - projectedSpend }; } async updateUserBudget(userId: string, cost: number): Promise<void> { const now = 
Date.now(); const monthStart = new Date(new Date().getFullYear(), new Date().getMonth(), 1).getTime(); await this.dynamoClient.send(new UpdateCommand({ TableName: 'ai-platform-budgets', Key: { userId }, UpdateExpression: ` SET currentSpend = if_not_exists(currentSpend, :zero) + :cost, lastUpdated = :now, #status = :status `, ExpressionAttributeNames: { '#status': 'status' }, ExpressionAttributeValues: { ':cost': cost, ':now': now, ':zero': 0, ':status': 'active' } })); // Reset monthly spend if new month if (now > monthStart + (30 * 24 * 60 * 60 * 1000)) { await this.resetMonthlyBudget(userId); } } private async sendBudgetWarning(userId: string, budget: UserBudget): Promise<void> { const timeSinceLastWarning = Date.now() - budget.notifications.lastSent; const hoursSinceWarning = timeSinceLastWarning / (1000 * 60 * 60); // Don't spam warnings - max once per 6 hours if (hoursSinceWarning < 6) return; const percentUsed = (budget.currentSpend / budget.monthlyLimit) * 100; await this.notificationService.send({ userId, type: 'budget_warning', title: 'AI Usage Budget Warning', message: `You've used ${percentUsed.toFixed(1)}% of your monthly AI budget ($${budget.currentSpend.toFixed(2)} of $${budget.monthlyLimit})`, severity: 'warning' }); await this.updateNotificationTime(userId); } } Enter fullscreen mode Exit fullscreen mode COMMAND_BLOCK: export interface UserBudget { userId: string; monthlyLimit: number; currentSpend: number; warningThreshold: number; // Default: 80% lastUpdated: number; status: 'active' | 'warning' | 'blocked'; notifications: { warning: boolean; limit: boolean; lastSent: number; }; } export class BudgetManager { async checkBudget(userId: string, estimatedCost: number): Promise<BudgetCheckResult> { const budget = await this.getUserBudget(userId); const projectedSpend = budget.currentSpend + estimatedCost; if (projectedSpend > budget.monthlyLimit) { return { allowed: false, reason: 'Monthly budget exceeded', currentSpend: budget.currentSpend, limit: 
budget.monthlyLimit, remainingBudget: 0 }; } if (projectedSpend > (budget.monthlyLimit * budget.warningThreshold / 100)) { await this.sendBudgetWarning(userId, budget); return { allowed: true, warning: true, reason: 'Approaching budget limit', currentSpend: budget.currentSpend, limit: budget.monthlyLimit, remainingBudget: budget.monthlyLimit - projectedSpend }; } return { allowed: true, currentSpend: budget.currentSpend, limit: budget.monthlyLimit, remainingBudget: budget.monthlyLimit - projectedSpend }; } async updateUserBudget(userId: string, cost: number): Promise<void> { const now = Date.now(); const monthStart = new Date(new Date().getFullYear(), new Date().getMonth(), 1).getTime(); await this.dynamoClient.send(new UpdateCommand({ TableName: 'ai-platform-budgets', Key: { userId }, UpdateExpression: ` SET currentSpend = if_not_exists(currentSpend, :zero) + :cost, lastUpdated = :now, #status = :status `, ExpressionAttributeNames: { '#status': 'status' }, ExpressionAttributeValues: { ':cost': cost, ':now': now, ':zero': 0, ':status': 'active' } })); // Reset monthly spend if new month if (now > monthStart + (30 * 24 * 60 * 60 * 1000)) { await this.resetMonthlyBudget(userId); } } private async sendBudgetWarning(userId: string, budget: UserBudget): Promise<void> { const timeSinceLastWarning = Date.now() - budget.notifications.lastSent; const hoursSinceWarning = timeSinceLastWarning / (1000 * 60 * 60); // Don't spam warnings - max once per 6 hours if (hoursSinceWarning < 6) return; const percentUsed = (budget.currentSpend / budget.monthlyLimit) * 100; await this.notificationService.send({ userId, type: 'budget_warning', title: 'AI Usage Budget Warning', message: `You've used ${percentUsed.toFixed(1)}% of your monthly AI budget ($${budget.currentSpend.toFixed(2)} of $${budget.monthlyLimit})`, severity: 'warning' }); await this.updateNotificationTime(userId); } } COMMAND_BLOCK: export interface UserBudget { userId: string; monthlyLimit: number; currentSpend: number; 
warningThreshold: number; // Default: 80% lastUpdated: number; status: 'active' | 'warning' | 'blocked'; notifications: { warning: boolean; limit: boolean; lastSent: number; }; } export class BudgetManager { async checkBudget(userId: string, estimatedCost: number): Promise<BudgetCheckResult> { const budget = await this.getUserBudget(userId); const projectedSpend = budget.currentSpend + estimatedCost; if (projectedSpend > budget.monthlyLimit) { return { allowed: false, reason: 'Monthly budget exceeded', currentSpend: budget.currentSpend, limit: budget.monthlyLimit, remainingBudget: 0 }; } if (projectedSpend > (budget.monthlyLimit * budget.warningThreshold / 100)) { await this.sendBudgetWarning(userId, budget); return { allowed: true, warning: true, reason: 'Approaching budget limit', currentSpend: budget.currentSpend, limit: budget.monthlyLimit, remainingBudget: budget.monthlyLimit - projectedSpend }; } return { allowed: true, currentSpend: budget.currentSpend, limit: budget.monthlyLimit, remainingBudget: budget.monthlyLimit - projectedSpend }; } async updateUserBudget(userId: string, cost: number): Promise<void> { const now = Date.now(); const monthStart = new Date(new Date().getFullYear(), new Date().getMonth(), 1).getTime(); await this.dynamoClient.send(new UpdateCommand({ TableName: 'ai-platform-budgets', Key: { userId }, UpdateExpression: ` SET currentSpend = if_not_exists(currentSpend, :zero) + :cost, lastUpdated = :now, #status = :status `, ExpressionAttributeNames: { '#status': 'status' }, ExpressionAttributeValues: { ':cost': cost, ':now': now, ':zero': 0, ':status': 'active' } })); // Reset monthly spend if new month if (now > monthStart + (30 * 24 * 60 * 60 * 1000)) { await this.resetMonthlyBudget(userId); } } private async sendBudgetWarning(userId: string, budget: UserBudget): Promise<void> { const timeSinceLastWarning = Date.now() - budget.notifications.lastSent; const hoursSinceWarning = timeSinceLastWarning / (1000 * 60 * 60); // Don't spam warnings - 
max once per 6 hours if (hoursSinceWarning < 6) return; const percentUsed = (budget.currentSpend / budget.monthlyLimit) * 100; await this.notificationService.send({ userId, type: 'budget_warning', title: 'AI Usage Budget Warning', message: `You've used ${percentUsed.toFixed(1)}% of your monthly AI budget ($${budget.currentSpend.toFixed(2)} of $${budget.monthlyLimit})`, severity: 'warning' }); await this.updateNotificationTime(userId); } } COMMAND_BLOCK: export interface ApiKey { keyId: string; keyPrefix: string; // First 8 chars for display hashedKey: string; // bcrypt hash userId: string; name: string; scopes: string[]; rateLimit: { requestsPerMinute: number; tokensPerMinute: number; }; budget: { monthlyLimit: number; currentSpend: number; }; status: 'active' | 'suspended' | 'revoked'; createdAt: number; lastUsed?: number; expiresAt?: number; } export class ApiKeyAuth { async validateApiKey(rawKey: string): Promise<AuthResult> { // Extract key prefix const keyId = rawKey.substring(0, 12); const keyData = await this.getApiKey(keyId); if (!keyData || keyData.status !== 'active') { return { valid: false, reason: 'Invalid API key' }; } // Check expiration if (keyData.expiresAt && Date.now() > keyData.expiresAt) { return { valid: false, reason: 'API key expired' }; } // Verify hash const isValid = await bcrypt.compare(rawKey, keyData.hashedKey); if (!isValid) { return { valid: false, reason: 'Invalid API key' }; } // Update last used await this.updateLastUsed(keyId); return { valid: true, userId: keyData.userId, scopes: keyData.scopes, rateLimit: keyData.rateLimit, budget: keyData.budget }; } async createApiKey(userId: string, options: CreateKeyOptions): Promise<string> { const rawKey = `sk-${generateId(48)}`; // sk- prefix like OpenAI const hashedKey = await bcrypt.hash(rawKey, 12); const apiKey: ApiKey = { keyId: rawKey.substring(0, 12), keyPrefix: rawKey.substring(0, 8), hashedKey, userId, name: options.name, scopes: options.scopes || ['ai:complete', 'ai:embed'], 
rateLimit: options.rateLimit || { requestsPerMinute: 60, tokensPerMinute: 100000 }, budget: options.budget || { monthlyLimit: 100, currentSpend: 0 }, status: 'active', createdAt: Date.now(), expiresAt: options.expiresAt }; await this.storeApiKey(apiKey); return rawKey; // Only returned once! } } Enter fullscreen mode Exit fullscreen mode COMMAND_BLOCK: export interface ApiKey { keyId: string; keyPrefix: string; // First 8 chars for display hashedKey: string; // bcrypt hash userId: string; name: string; scopes: string[]; rateLimit: { requestsPerMinute: number; tokensPerMinute: number; }; budget: { monthlyLimit: number; currentSpend: number; }; status: 'active' | 'suspended' | 'revoked'; createdAt: number; lastUsed?: number; expiresAt?: number; } export class ApiKeyAuth { async validateApiKey(rawKey: string): Promise<AuthResult> { // Extract key prefix const keyId = rawKey.substring(0, 12); const keyData = await this.getApiKey(keyId); if (!keyData || keyData.status !== 'active') { return { valid: false, reason: 'Invalid API key' }; } // Check expiration if (keyData.expiresAt && Date.now() > keyData.expiresAt) { return { valid: false, reason: 'API key expired' }; } // Verify hash const isValid = await bcrypt.compare(rawKey, keyData.hashedKey); if (!isValid) { return { valid: false, reason: 'Invalid API key' }; } // Update last used await this.updateLastUsed(keyId); return { valid: true, userId: keyData.userId, scopes: keyData.scopes, rateLimit: keyData.rateLimit, budget: keyData.budget }; } async createApiKey(userId: string, options: CreateKeyOptions): Promise<string> { const rawKey = `sk-${generateId(48)}`; // sk- prefix like OpenAI const hashedKey = await bcrypt.hash(rawKey, 12); const apiKey: ApiKey = { keyId: rawKey.substring(0, 12), keyPrefix: rawKey.substring(0, 8), hashedKey, userId, name: options.name, scopes: options.scopes || ['ai:complete', 'ai:embed'], rateLimit: options.rateLimit || { requestsPerMinute: 60, tokensPerMinute: 100000 }, budget: 
options.budget || { monthlyLimit: 100, currentSpend: 0 }, status: 'active', createdAt: Date.now(), expiresAt: options.expiresAt }; await this.storeApiKey(apiKey); return rawKey; // Only returned once! } } COMMAND_BLOCK: export interface ApiKey { keyId: string; keyPrefix: string; // First 8 chars for display hashedKey: string; // bcrypt hash userId: string; name: string; scopes: string[]; rateLimit: { requestsPerMinute: number; tokensPerMinute: number; }; budget: { monthlyLimit: number; currentSpend: number; }; status: 'active' | 'suspended' | 'revoked'; createdAt: number; lastUsed?: number; expiresAt?: number; } export class ApiKeyAuth { async validateApiKey(rawKey: string): Promise<AuthResult> { // Extract key prefix const keyId = rawKey.substring(0, 12); const keyData = await this.getApiKey(keyId); if (!keyData || keyData.status !== 'active') { return { valid: false, reason: 'Invalid API key' }; } // Check expiration if (keyData.expiresAt && Date.now() > keyData.expiresAt) { return { valid: false, reason: 'API key expired' }; } // Verify hash const isValid = await bcrypt.compare(rawKey, keyData.hashedKey); if (!isValid) { return { valid: false, reason: 'Invalid API key' }; } // Update last used await this.updateLastUsed(keyId); return { valid: true, userId: keyData.userId, scopes: keyData.scopes, rateLimit: keyData.rateLimit, budget: keyData.budget }; } async createApiKey(userId: string, options: CreateKeyOptions): Promise<string> { const rawKey = `sk-${generateId(48)}`; // sk- prefix like OpenAI const hashedKey = await bcrypt.hash(rawKey, 12); const apiKey: ApiKey = { keyId: rawKey.substring(0, 12), keyPrefix: rawKey.substring(0, 8), hashedKey, userId, name: options.name, scopes: options.scopes || ['ai:complete', 'ai:embed'], rateLimit: options.rateLimit || { requestsPerMinute: 60, tokensPerMinute: 100000 }, budget: options.budget || { monthlyLimit: 100, currentSpend: 0 }, status: 'active', createdAt: Date.now(), expiresAt: options.expiresAt }; await 
this.storeApiKey(apiKey); return rawKey; // Only returned once! } } COMMAND_BLOCK: export class JWTAuth { constructor(private secretKey: string) {} generateServiceToken(serviceId: string, scopes: string[]): string { return jwt.sign( { sub: serviceId, aud: 'ai-platform', iss: 'ai-platform-auth', scopes, type: 'service' }, this.secretKey, { expiresIn: '1h', algorithm: 'HS256' } ); } async validateJWT(token: string): Promise<JWTAuthResult> { try { const decoded = jwt.verify(token, this.secretKey) as JWTPayload; return { valid: true, serviceId: decoded.sub, scopes: decoded.scopes, type: decoded.type }; } catch (error) { return { valid: false, reason: error.message }; } } } Enter fullscreen mode Exit fullscreen mode COMMAND_BLOCK: export class JWTAuth { constructor(private secretKey: string) {} generateServiceToken(serviceId: string, scopes: string[]): string { return jwt.sign( { sub: serviceId, aud: 'ai-platform', iss: 'ai-platform-auth', scopes, type: 'service' }, this.secretKey, { expiresIn: '1h', algorithm: 'HS256' } ); } async validateJWT(token: string): Promise<JWTAuthResult> { try { const decoded = jwt.verify(token, this.secretKey) as JWTPayload; return { valid: true, serviceId: decoded.sub, scopes: decoded.scopes, type: decoded.type }; } catch (error) { return { valid: false, reason: error.message }; } } } COMMAND_BLOCK: export class JWTAuth { constructor(private secretKey: string) {} generateServiceToken(serviceId: string, scopes: string[]): string { return jwt.sign( { sub: serviceId, aud: 'ai-platform', iss: 'ai-platform-auth', scopes, type: 'service' }, this.secretKey, { expiresIn: '1h', algorithm: 'HS256' } ); } async validateJWT(token: string): Promise<JWTAuthResult> { try { const decoded = jwt.verify(token, this.secretKey) as JWTPayload; return { valid: true, serviceId: decoded.sub, scopes: decoded.scopes, type: decoded.type }; } catch (error) { return { valid: false, reason: error.message }; } } } COMMAND_BLOCK: export class IAMAuth { async 
validateAWSRequest(request: Request): Promise<IAMAuthResult> { const authHeader = request.headers['authorization']; if (!authHeader?.startsWith('AWS4-HMAC-SHA256')) { return { valid: false, reason: 'Missing AWS signature' }; } // Parse AWS Signature V4 const signature = this.parseAWSSignature(authHeader); const isValid = await this.verifyAWSSignature(request, signature); if (!isValid) { return { valid: false, reason: 'Invalid AWS signature' }; } // Get IAM role/user details const identity = await this.getAWSIdentity(signature.accessKeyId); return { valid: true, identity, scopes: this.mapIAMToScopes(identity.policies) }; } } Enter fullscreen mode Exit fullscreen mode COMMAND_BLOCK: export class IAMAuth { async validateAWSRequest(request: Request): Promise<IAMAuthResult> { const authHeader = request.headers['authorization']; if (!authHeader?.startsWith('AWS4-HMAC-SHA256')) { return { valid: false, reason: 'Missing AWS signature' }; } // Parse AWS Signature V4 const signature = this.parseAWSSignature(authHeader); const isValid = await this.verifyAWSSignature(request, signature); if (!isValid) { return { valid: false, reason: 'Invalid AWS signature' }; } // Get IAM role/user details const identity = await this.getAWSIdentity(signature.accessKeyId); return { valid: true, identity, scopes: this.mapIAMToScopes(identity.policies) }; } } COMMAND_BLOCK: export class IAMAuth { async validateAWSRequest(request: Request): Promise<IAMAuthResult> { const authHeader = request.headers['authorization']; if (!authHeader?.startsWith('AWS4-HMAC-SHA256')) { return { valid: false, reason: 'Missing AWS signature' }; } // Parse AWS Signature V4 const signature = this.parseAWSSignature(authHeader); const isValid = await this.verifyAWSSignature(request, signature); if (!isValid) { return { valid: false, reason: 'Invalid AWS signature' }; } // Get IAM role/user details const identity = await this.getAWSIdentity(signature.accessKeyId); return { valid: true, identity, scopes: 
this.mapIAMToScopes(identity.policies) }; } } COMMAND_BLOCK: export class RateLimiter { private buckets = new Map<string, TokenBucket>(); async checkRate( userId: string, requestType: 'request' | 'token', amount: number = 1 ): Promise<RateLimitResult> { const bucketKey = `${userId}:${requestType}`; let bucket = this.buckets.get(bucketKey); if (!bucket) { const limits = await this.getUserLimits(userId); bucket = new TokenBucket( limits[requestType].capacity, limits[requestType].refillRate ); this.buckets.set(bucketKey, bucket); } const allowed = bucket.consume(amount); return { allowed, remainingTokens: bucket.tokens, refillRate: bucket.refillRate, resetTime: bucket.nextRefill }; } } class TokenBucket { private lastRefill: number; constructor( private capacity: number, public refillRate: number, // tokens per second public tokens: number = capacity ) { this.lastRefill = Date.now(); } consume(amount: number): boolean { this.refill(); if (this.tokens >= amount) { this.tokens -= amount; return true; } return false; } private refill(): void { const now = Date.now(); const timePassed = (now - this.lastRefill) / 1000; const tokensToAdd = timePassed * this.refillRate; this.tokens = Math.min(this.capacity, this.tokens + tokensToAdd); this.lastRefill = now; } get nextRefill(): number { const tokensNeeded = this.capacity - this.tokens; const timeToRefill = tokensNeeded / this.refillRate; return this.lastRefill + (timeToRefill * 1000); } } Enter fullscreen mode Exit fullscreen mode COMMAND_BLOCK: export class RateLimiter { private buckets = new Map<string, TokenBucket>(); async checkRate( userId: string, requestType: 'request' | 'token', amount: number = 1 ): Promise<RateLimitResult> { const bucketKey = `${userId}:${requestType}`; let bucket = this.buckets.get(bucketKey); if (!bucket) { const limits = await this.getUserLimits(userId); bucket = new TokenBucket( limits[requestType].capacity, limits[requestType].refillRate ); this.buckets.set(bucketKey, bucket); } const allowed = 
bucket.consume(amount); return { allowed, remainingTokens: bucket.tokens, refillRate: bucket.refillRate, resetTime: bucket.nextRefill }; } } class TokenBucket { private lastRefill: number; constructor( private capacity: number, public refillRate: number, // tokens per second public tokens: number = capacity ) { this.lastRefill = Date.now(); } consume(amount: number): boolean { this.refill(); if (this.tokens >= amount) { this.tokens -= amount; return true; } return false; } private refill(): void { const now = Date.now(); const timePassed = (now - this.lastRefill) / 1000; const tokensToAdd = timePassed * this.refillRate; this.tokens = Math.min(this.capacity, this.tokens + tokensToAdd); this.lastRefill = now; } get nextRefill(): number { const tokensNeeded = this.capacity - this.tokens; const timeToRefill = tokensNeeded / this.refillRate; return this.lastRefill + (timeToRefill * 1000); } } COMMAND_BLOCK: export class RateLimiter { private buckets = new Map<string, TokenBucket>(); async checkRate( userId: string, requestType: 'request' | 'token', amount: number = 1 ): Promise<RateLimitResult> { const bucketKey = `${userId}:${requestType}`; let bucket = this.buckets.get(bucketKey); if (!bucket) { const limits = await this.getUserLimits(userId); bucket = new TokenBucket( limits[requestType].capacity, limits[requestType].refillRate ); this.buckets.set(bucketKey, bucket); } const allowed = bucket.consume(amount); return { allowed, remainingTokens: bucket.tokens, refillRate: bucket.refillRate, resetTime: bucket.nextRefill }; } } class TokenBucket { private lastRefill: number; constructor( private capacity: number, public refillRate: number, // tokens per second public tokens: number = capacity ) { this.lastRefill = Date.now(); } consume(amount: number): boolean { this.refill(); if (this.tokens >= amount) { this.tokens -= amount; return true; } return false; } private refill(): void { const now = Date.now(); const timePassed = (now - this.lastRefill) / 1000; const 
COMMAND_BLOCK:
import * as crypto from 'node:crypto';
import OpenAI from 'openai';
import Anthropic from '@anthropic-ai/sdk';

export class BYOKManager {
  private readonly encryptionKey: Buffer;
  // this.dynamoClient is assumed to be a DynamoDB DocumentClient (AWS SDK v2) set up elsewhere

  constructor() {
    // Derive a 32-byte AES key from the configured password and salt
    this.encryptionKey = crypto.scryptSync(
      process.env.BYOK_PASSWORD!,
      process.env.BYOK_SALT!,
      32
    );
  }

  async storeUserKey(
    userId: string,
    provider: string,
    apiKey: string,
    metadata?: KeyMetadata
  ): Promise<void> {
    // Validate key before storing
    const isValid = await this.validateProviderKey(provider, apiKey);
    if (!isValid) {
      throw new Error('Invalid API key for provider');
    }

    // Encrypt the key with AES-256-GCM; the random IV is passed to the cipher
    const iv = crypto.randomBytes(16);
    const cipher = crypto.createCipheriv('aes-256-gcm', this.encryptionKey, iv);
    cipher.setAAD(Buffer.from(userId)); // Additional authenticated data

    let encrypted = cipher.update(apiKey, 'utf8', 'hex');
    encrypted += cipher.final('hex');
    const authTag = cipher.getAuthTag();

    // Store encrypted key
    await this.dynamoClient.put({
      TableName: 'ai-platform-user-keys',
      Item: {
        userId,
        provider,
        encryptedKey: encrypted,
        iv: iv.toString('hex'),
        authTag: authTag.toString('hex'),
        metadata,
        createdAt: Date.now(),
        lastValidated: Date.now(),
        status: 'active'
      }
    }).promise();
  }

  async getUserKey(userId: string, provider: string): Promise<string | null> {
    const result = await this.dynamoClient.get({
      TableName: 'ai-platform-user-keys',
      Key: { userId, provider }
    }).promise();

    if (!result.Item) return null;

    const { encryptedKey, iv, authTag } = result.Item;

    // Decrypt the key using the stored IV; final() throws if the tag or AAD does not match
    const decipher = crypto.createDecipheriv(
      'aes-256-gcm',
      this.encryptionKey,
      Buffer.from(iv, 'hex')
    );
    decipher.setAAD(Buffer.from(userId));
    decipher.setAuthTag(Buffer.from(authTag, 'hex'));

    let decrypted = decipher.update(encryptedKey, 'hex', 'utf8');
    decrypted += decipher.final('utf8');

    return decrypted;
  }

  private async validateProviderKey(provider: string, apiKey: string): Promise<boolean> {
    try {
      switch (provider) {
        case 'openai': {
          const openai = new OpenAI({ apiKey });
          await openai.models.list();
          return true;
        }
        case 'anthropic': {
          const anthropic = new Anthropic({ apiKey });
          await anthropic.messages.create({
            model: 'claude-3-haiku-20240307',
            messages: [{ role: 'user', content: 'test' }],
            max_tokens: 1
          });
          return true;
        }
        default:
          return false;
      }
    } catch (error) {
      return false;
    }
  }
}
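The role of the additional authenticated data is easiest to see in a standalone round trip. The sketch below is a minimal, assumed example using Node's built-in `crypto` module: `sealKey`, `openKey`, the demo password and salt, and the sample key are all hypothetical names and values. It uses a 12-byte IV, the size commonly recommended for GCM (the article's code uses 16 bytes, which Node also accepts).

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from 'node:crypto';

interface SealedKey {
  ciphertext: string; // hex
  iv: string;         // hex
  authTag: string;    // hex
}

// Demo-only key material; a real deployment would load these from configuration.
const masterKey = scryptSync('demo-password', 'demo-salt', 32);

// Encrypt an API key and bind the ciphertext to one user via the AAD.
function sealKey(userId: string, apiKey: string): SealedKey {
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', masterKey, iv);
  cipher.setAAD(Buffer.from(userId));
  const ciphertext = Buffer.concat([cipher.update(apiKey, 'utf8'), cipher.final()]);
  return {
    ciphertext: ciphertext.toString('hex'),
    iv: iv.toString('hex'),
    authTag: cipher.getAuthTag().toString('hex')
  };
}

// Decrypt; final() throws if the tag, ciphertext, or AAD (userId) does not match.
function openKey(userId: string, sealed: SealedKey): string {
  const decipher = createDecipheriv('aes-256-gcm', masterKey, Buffer.from(sealed.iv, 'hex'));
  decipher.setAAD(Buffer.from(userId));
  decipher.setAuthTag(Buffer.from(sealed.authTag, 'hex'));
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(sealed.ciphertext, 'hex')),
    decipher.final()
  ]);
  return plaintext.toString('utf8');
}

const sealed = sealKey('user-1', 'sk-demo-123');
const roundTrip = openKey('user-1', sealed);

// A record copied onto another user's row fails authentication instead of decrypting.
let rejectedForOtherUser = false;
try {
  openKey('user-2', sealed);
} catch {
  rejectedForOtherUser = true;
}
```

Binding the ciphertext to the `userId` means a stored record cannot be silently moved between users, even by someone with write access to the table.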
COMMAND_BLOCK:
export class MonitoringService {
  async createDashboard(): Promise<void> {
    await this.cloudwatchClient.putDashboard({
      DashboardName: 'ai-platform-production',
      DashboardBody: JSON.stringify({
        widgets: [
          {
            type: 'metric',
            properties: {
              metrics: [
                ['AWS/Lambda', 'Duration', 'FunctionName', 'ai-platform-gateway'],
                ['AWS/Lambda', 'Errors', 'FunctionName', 'ai-platform-gateway'],
                ['AWS/ECS', 'CPUUtilization', 'ServiceName', 'ai-agents'],
                ['AWS/ECS', 'MemoryUtilization', 'ServiceName', 'ai-agents']
              ],
              period: 300,
              stat: 'Average',
              region: 'us-east-1',
              title: 'Infrastructure Health'
            }
          },
          {
            type: 'metric',
            properties: {
              metrics: [
                ['ai-platform', 'RequestCount'],
                ['ai-platform', 'TokensProcessed'],
                ['ai-platform', 'CostPerHour'],
                ['ai-platform', 'ErrorRate']
              ],
              period: 300,
              stat: 'Sum',
              title: 'Business Metrics'
            }
          }
        ]
      })
    }).promise();
  }

  async setupCostAnomalyDetection(): Promise<void> {
    // Alert when hourly cost rises above the band CloudWatch anomaly detection expects.
    // Anomaly alarms are created with putMetricAlarm plus an ANOMALY_DETECTION_BAND
    // expression; the band width (2) is an assumed starting value.
    await this.cloudwatchClient.putMetricAlarm({
      AlarmName: 'ai-platform-cost-anomaly',
      ComparisonOperator: 'GreaterThanUpperThreshold',
      EvaluationPeriods: 2,
      ThresholdMetricId: 'cost_band',
      Metrics: [
        {
          Id: 'cost_per_hour',
          ReturnData: true,
          MetricStat: {
            Metric: {
              MetricName: 'CostPerHour',
              Namespace: 'ai-platform'
            },
            Period: 3600,
            Stat: 'Sum'
          }
        },
        {
          Id: 'cost_band',
          Expression: 'ANOMALY_DETECTION_BAND(cost_per_hour, 2)',
          Label: 'Expected hourly cost',
          ReturnData: true
        }
      ],
      AlarmActions: [process.env.ALERT_SNS_TOPIC!]
    }).promise();
  }
}
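The dashboard and alarm both read a custom `CostPerHour` metric in the `ai-platform` namespace, so something has to publish it. As a rough sketch of one way to prepare those datapoints, the function below groups usage records into hourly totals shaped like CloudWatch metric data entries. `UsageRecord`, `hourlyCostData`, and the sample values are assumptions for illustration; the actual publishing call is left out.

```typescript
// Hypothetical shape of one logged request's cost.
interface UsageRecord {
  timestamp: number; // epoch milliseconds
  costUsd: number;
}

// Subset of the fields a CloudWatch metric datum carries.
interface MetricDatum {
  MetricName: string;
  Timestamp: Date;
  Value: number;
  Unit: 'None';
}

const HOUR_MS = 3600000;

// Sum costs per clock hour and emit one datum per hour, oldest first.
function hourlyCostData(records: UsageRecord[]): MetricDatum[] {
  const totals = new Map<number, number>();
  for (const record of records) {
    const hourStart = Math.floor(record.timestamp / HOUR_MS) * HOUR_MS;
    totals.set(hourStart, (totals.get(hourStart) ?? 0) + record.costUsd);
  }
  return Array.from(totals.entries())
    .sort(([a], [b]) => a - b)
    .map(([hourStart, total]): MetricDatum => ({
      MetricName: 'CostPerHour',
      Timestamp: new Date(hourStart),
      Value: total,
      Unit: 'None'
    }));
}

// Made-up sample: two requests in the first hour, one in the second.
const sample = hourlyCostData([
  { timestamp: 0, costUsd: 1.5 },
  { timestamp: 1000, costUsd: 2.5 },
  { timestamp: HOUR_MS, costUsd: 3 }
]);
```

The resulting array could be passed as the metric data of a publish call under the `ai-platform` namespace; keeping the aggregation pure makes it easy to unit test separately from AWS.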
CODE_BLOCK:
// All secrets in AWS Systems Manager Parameter Store
const config = {
  jwtSecret: await getParameter('/ai-platform/jwt-secret', true),
  encryptionKey: await getParameter('/ai-platform/encryption-key', true),
  openaiKey: await getParameter('/ai-platform/openai-key', true)
};
COMMAND_BLOCK:
// CDK construct for secure ECS
new ecs.FargateService(this, 'AgentService', {
  cluster,
  taskDefinition,
  vpcSubnets: {
    subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS
  },
  securityGroups: [privateSecurityGroup],
  assignPublicIp: false
});

// Security group: only allows outbound HTTPS
// (takes effect only if the group was created with allowAllOutbound: false)
privateSecurityGroup.addEgressRule(
  ec2.Peer.anyIpv4(),
  ec2.Port.tcp(443),
  'HTTPS outbound for API calls'
);

CODE_BLOCK:
// aws-cdk-lib provides WAFv2 as the L1 construct CfnWebACL.
// defaultAction and the metric names are assumed values required by the construct.
new wafv2.CfnWebACL(this, 'ApiWaf', {
  scope: 'REGIONAL',
  defaultAction: { allow: {} },
  visibilityConfig: {
    cloudWatchMetricsEnabled: true,
    metricName: 'ApiWaf',
    sampledRequestsEnabled: true
  },
  rules: [
    {
      name: 'RateLimitRule',
      priority: 1,
      action: { block: {} },
      statement: {
        rateBasedStatement: {
          limit: 1000, // requests per 5 minutes
          aggregateKeyType: 'IP'
        }
      },
      visibilityConfig: { cloudWatchMetricsEnabled: true, metricName: 'RateLimitRule', sampledRequestsEnabled: true }
    },
    {
      name: 'GeoRestrictRule',
      priority: 2,
      action: { block: {} },
      statement: {
        geoMatchStatement: {
          countryCodes: ['CN', 'RU'] // Block certain countries
        }
      },
      visibilityConfig: { cloudWatchMetricsEnabled: true, metricName: 'GeoRestrictRule', sampledRequestsEnabled: true }
    }
  ]
});
The agent was supposed to:
- Extract sentiment from reviews
- Classify issues
- Generate summary reports

Fixed Infrastructure Costs (Monthly):
- API Gateway: $3.50 (1M requests)
- Lambda (Gateway): $8.20 (compute + requests)
- ECS Fargate: $15.40 (avg 2 tasks running)
- DynamoDB: $6.80 (usage tracking + budgets)
- Application Load Balancer: $16.20
- CloudWatch: $4.30
- Total Fixed: $54.40/month

Variable AI Costs (Pass-through):
- OpenAI API: $340-890/month (user-dependent)
- Anthropic API: $180-420/month
- AWS Bedrock: $45-120/month
- Total Variable: User-driven, 2% platform markup

Cost Optimization Wins:
- Moved summarization to Claude Haiku: 60% cost reduction
- Implemented response caching: 25% fewer API calls
- BYOK adoption: 70% of users, zero platform AI costs

Incident 1: Memory Leak in ECS Agent
- Symptom: Tasks consuming 8GB RAM, getting OOM killed
- Root cause: Long conversations not being garbage collected
- Fix: Added conversation pruning after 50 messages
- Prevention: Memory monitoring alerts at 80% usage

Incident 2: DynamoDB Throttling
- Symptom: Budget checks failing, users getting 500 errors
- Root cause: Hot partition on userId during peak traffic
- Fix: Added requestId to partition key for better distribution
- Prevention: DynamoDB on-demand billing mode

Incident 3: BYOK Key Validation Loop
- Symptom: Users unable to update API keys
- Root cause: Validation calling itself recursively
- Fix: Separate validation context from normal request context
- Prevention: Integration tests for all BYOK flows

Every endpoint needs three guards:
- Authentication: Who are you?
- Authorization: What can you do?
- Budget control: How much can you spend?
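The three guards can be sketched as a single check that runs in order before any model call. This is a minimal illustration under stated assumptions: `Caller`, `GuardedRequest`, `checkRequest`, the in-memory `callers` map, and the choice of status 429 for an exhausted budget are all hypothetical, standing in for the API key, JWT, IAM, and budget lookups described in the article.

```typescript
type GuardResult =
  | { ok: true }
  | { ok: false; status: 401 | 403 | 429; reason: string };

interface Caller {
  userId: string;
  scopes: string[];
  spentUsd: number;
  budgetUsd: number;
}

interface GuardedRequest {
  apiKey?: string;
  requiredScope: string;
  estimatedCostUsd: number;
}

// Hypothetical in-memory store; a real system would query its auth and budget tables.
const callers = new Map<string, Caller>([
  ['key-alice', { userId: 'alice', scopes: ['agents:run'], spentUsd: 9.5, budgetUsd: 10 }]
]);

function checkRequest(req: GuardedRequest): GuardResult {
  // 1. Authentication: who are you?
  const caller = req.apiKey ? callers.get(req.apiKey) : undefined;
  if (!caller) {
    return { ok: false, status: 401, reason: 'unknown caller' };
  }

  // 2. Authorization: what can you do?
  if (!caller.scopes.includes(req.requiredScope)) {
    return { ok: false, status: 403, reason: 'missing scope' };
  }

  // 3. Budget control: how much can you spend?
  if (caller.spentUsd + req.estimatedCostUsd > caller.budgetUsd) {
    return { ok: false, status: 429, reason: 'budget exceeded' };
  }

  return { ok: true };
}

// Four sample requests, mapped to the HTTP status each would produce.
const outcomes = [
  checkRequest({ requiredScope: 'agents:run', estimatedCostUsd: 0.25 }),
  checkRequest({ apiKey: 'key-alice', requiredScope: 'admin', estimatedCostUsd: 0.25 }),
  checkRequest({ apiKey: 'key-alice', requiredScope: 'agents:run', estimatedCostUsd: 1 }),
  checkRequest({ apiKey: 'key-alice', requiredScope: 'agents:run', estimatedCostUsd: 0.25 })
].map((result) => (result.ok ? 200 : result.status));
```

Running the cheapest check first and the budget check last keeps unauthenticated traffic from ever touching the budget store, and checking the estimated cost before the call is what stops a retry loop from spending past the limit.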
