Tools: Per-user cost attribution for your AI APP - Full Analysis

Approach 1: Wrap your provider client (5 minutes)

Approach 2: OpenTelemetry telemetry metadata (Vercel AI SDK)

Approach 3: Raw event emission (autonomous bots / non-HTTP)

What you can answer once userId is in your tags

A note on GDPR / multi-tenant safety

Wrapping up

You ship your AI feature. It works. A week later your OpenAI bill is $400 and you have no idea which of your users caused which $0.05.

This is the single most underrated metric in production LLM apps: cost per end-user. It's surprisingly easy to instrument if you know what to do. Here are the three approaches I've found work in practice, ranked by setup time. The code for each approach is listed at the end of this post.

Approach 1: Wrap your provider client (5 minutes)

Works for Express, Next.js Route Handlers, Fastify: anything that has a single OpenAI or Anthropic client instance.

The trick is withTrace({ tags: { userId } }) at the request boundary. Every LLM call inside the block, direct or nested, inherits those tags automatically via AsyncLocalStorage. You don't have to thread userId through every function.

Pros: simplest; works with both OpenAI and Anthropic the same way.
Cons: requires you to use the dedicated wrapper SDKs.

Approach 2: OpenTelemetry telemetry metadata (Vercel AI SDK)

If you're on the Vercel AI SDK, experimental_telemetry.metadata is the equivalent hook. The metadata is lifted onto ai.telemetry.metadata.<key> span attributes that any OpenTelemetry-compatible observability tool (Langfuse, Phoenix, Voight, Braintrust, Datadog) picks up.

Pros: zero coupling. It's pure OTel, so you can swap exporters whenever.
Cons: only works if your SDK emits OTel spans. The AI SDK does. Many others don't yet.

Approach 3: Raw event emission (autonomous bots / non-HTTP)

For background workers, agents calling LLMs in loops, or anything that doesn't have a request boundary, emit events manually. This is more code per call, but you control everything. It's useful when the LLM call doesn't fit cleanly inside a wrapper (e.g. you're proxying through your own router).

Pros: full control over what gets emitted.
Cons: more boilerplate. You're responsible for token counting.

What you can answer once userId is in your tags

Once tags.userId (or whatever you name it) is on every event, the questions you can answer change shape.

You don't need a separate analytics SDK on the client. You don't need to copy userId into LLM messages. You don't need anything custom on top: the tags propagate from the request boundary down to every span.
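To make "which user caused which $0.05" concrete, here is a minimal sketch of rolling tagged events up into a per-user total. The event shape is hypothetical (loosely modeled on the raw-event payload used in this post), and the price table holds placeholder numbers, not real provider rates, so treat both as assumptions to replace with your own.

```typescript
// Hypothetical event shape: token counts plus the tags stamped at the boundary.
interface LlmEvent {
  model: string
  tokens: { input: number; output: number }
  tags: { userId: string; tenantId?: string }
}

// Placeholder prices in USD per 1M tokens. These are NOT real rates;
// substitute your provider's current price sheet.
const PRICE_PER_MILLION: Record<string, { input: number; output: number }> = {
  'example-model': { input: 1.0, output: 2.0 },
}

// Cost of a single event; unknown models count as zero rather than a guess.
function eventCostUsd(e: LlmEvent): number {
  const price = PRICE_PER_MILLION[e.model]
  if (!price) return 0
  return (e.tokens.input * price.input + e.tokens.output * price.output) / 1e6
}

// Sum cost per userId.
function costByUser(events: LlmEvent[]): Record<string, number> {
  const totals: Record<string, number> = {}
  for (const e of events) {
    const id = e.tags.userId
    totals[id] = (totals[id] || 0) + eventCostUsd(e)
  }
  return totals
}

// User ids ordered from most to least expensive.
function rankUsers(totals: Record<string, number>): string[] {
  return Object.keys(totals).sort((a, b) => totals[b] - totals[a])
}

const sampleEvents: LlmEvent[] = [
  { model: 'example-model', tokens: { input: 1000, output: 500 }, tags: { userId: 'user_a' } },
  { model: 'example-model', tokens: { input: 3000, output: 1000 }, tags: { userId: 'user_b' } },
  { model: 'example-model', tokens: { input: 2000, output: 0 }, tags: { userId: 'user_a' } },
]

const totals = costByUser(sampleEvents)
console.log(rankUsers(totals)) // user_b first (0.005 vs 0.004 with the placeholder prices)
```

The same fold works for any other tag (plan, tenantId) by swapping the grouping key.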
A note on GDPR / multi-tenant safety

userId here means your internal stable identifier (user_a3f9c2 or whatever), not the user's email or wallet. Never put PII into telemetry metadata. The good observability tools scrub PII anyway, but garbage-in is still garbage.

For multi-tenant SaaS, add a second tag: tags: { userId, tenantId }. That way you can ask both "which customer is this?" and "which of their users?".

Wrapping up

Three approaches, one mental model: stamp userId at the boundary, let it propagate to every LLM call inside the request.

The wrappers I used here are Apache 2.0; the packages are listed at the end of this post. The same approach works with Langfuse, Phoenix, Braintrust, or your existing OTel pipeline: the metadata.userId pattern is the universal part.

How do you currently track per-user spend in your AI app? Stripe metering? Server logs? Or have you been flying blind?
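The "stamp at the boundary, let it propagate" model can be sketched with Node's built-in AsyncLocalStorage. This is an illustration of the mechanism only, not the @voightxyz implementation: withTags, currentTags, fakeLlmCall and handleRequest are hypothetical names invented for this example.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'

type Tags = Record<string, string>

// One store per process; each request gets its own isolated value.
const tagStore = new AsyncLocalStorage<Tags>()

// Hypothetical stand-in for a withTrace-style helper: runs fn with tags
// attached, merging over any tags set by an outer call.
function withTags<T>(tags: Tags, fn: () => Promise<T>): Promise<T> {
  const parent = tagStore.getStore() || {}
  return tagStore.run({ ...parent, ...tags }, fn)
}

// Anything deeper in the call stack can read the tags without receiving them as arguments.
function currentTags(): Tags {
  return tagStore.getStore() || {}
}

// Simulated LLM call: crosses an async boundary, then reads the ambient tags.
async function fakeLlmCall(prompt: string): Promise<{ prompt: string; tags: Tags }> {
  await new Promise<void>((resolve) => setTimeout(resolve, 1))
  return { prompt, tags: currentTags() }
}

// Request boundary: userId is stamped once and never threaded through fakeLlmCall.
async function handleRequest(userId: string): Promise<{ prompt: string; tags: Tags }> {
  return withTags({ userId }, () => fakeLlmCall('hello'))
}

// Two concurrent "requests" keep their own tags.
const concurrentDemo = Promise.all([handleRequest('user_a'), handleRequest('user_b')])
concurrentDemo.then((results) => {
  console.log(results.map((r) => r.tags.userId))
})
```

Because the store is scoped to the async call chain rather than to a global variable, concurrent requests do not see each other's userId.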

Approach 1: wrapping the provider client

```typescript
import OpenAI from 'openai'
import { wrapOpenAI, withTrace } from '@voightxyz/openai'

const openai = wrapOpenAI(new OpenAI(), {
  agent: 'production-chat-api',
})

// `app` is your Express app; `req.user` is assumed to be set by your auth middleware.
app.post('/api/chat', async (req, res) => {
  await withTrace(
    async () => {
      const r = await openai.chat.completions.create({
        model: 'gpt-4o-mini',
        messages: req.body.messages,
      })
      res.json({ reply: r.choices[0].message })
    },
    {
      routeTag: 'POST /api/chat',
      // Tags stamped here are inherited by every LLM call inside the block.
      tags: {
        userId: req.user.id,
        plan: req.user.plan,
      },
    },
  )
})
```

Approach 2: telemetry metadata with the Vercel AI SDK

```typescript
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'

export async function POST(req: Request) {
  // `session` is assumed to come from your auth layer; it is not defined in this snippet.
  const result = streamText({
    model: openai('gpt-4o-mini'),
    prompt: (await req.json()).prompt,
    experimental_telemetry: {
      isEnabled: true,
      metadata: {
        userId: session.user.id,
        plan: session.user.plan,
      },
    },
  })
  // Method name as in the original post; later AI SDK versions rename it
  // (for example to toDataStreamResponse), so check the version you are on.
  return result.toAIStreamResponse()
}
```

Approach 3: raw event emission

```typescript
import { Voight } from '@voightxyz/sdk'

const voight = new Voight({ agentId: 'my-bot' })

// `job` is the unit of background work being processed; it carries the ids to tag with.
const t0 = Date.now()
const res = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    // Required for a JSON request body.
    'content-type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gpt-4o-mini',
    messages: [...], // elided in the original; supply your own messages
  }),
}).then((r) => r.json())

voight.log({
  type: 'reasoning',
  model: 'gpt-4o-mini',
  durationMs: Date.now() - t0,
  outcome: 'success',
  metadata: {
    // Token counts come from the provider's usage object.
    tokens: {
      input: res.usage.prompt_tokens,
      output: res.usage.completion_tokens,
    },
    tags: {
      userId: job.userId,
      tenantId: job.tenantId,
    },
  },
})
```

The wrapper packages used in this post:

- @voightxyz/openai for OpenAI
- @voightxyz/anthropic for Anthropic
- @voightxyz/vercel-ai for the Vercel AI SDK
- @voightxyz/sdk for library mode
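The raw-emission approach repeats the same bookkeeping (timing, token extraction, tagging) around every call, so it may be worth factoring into one helper. The sketch below is a hypothetical trackedCall function, not part of any SDK: the emit callback stands in for something like voight.log, and the 'error' outcome is an assumption, since the post only shows 'success'.

```typescript
interface Usage { prompt_tokens: number; completion_tokens: number }
interface Tags { userId: string; tenantId?: string }

// Event shape modeled on the raw-event payload in this post.
interface TrackedEvent {
  type: 'reasoning'
  model: string
  durationMs: number
  outcome: 'success' | 'error' // 'error' is an assumed value
  metadata: { tokens?: { input: number; output: number }; tags: Tags }
}

type Emit = (event: TrackedEvent) => void

// Runs `call`, then emits one tagged event with timing and token usage.
// Failures are emitted too, and the original error is rethrown.
async function trackedCall<T extends { usage?: Usage }>(
  model: string,
  tags: Tags,
  emit: Emit,
  call: () => Promise<T>,
): Promise<T> {
  const t0 = Date.now()
  try {
    const res = await call()
    emit({
      type: 'reasoning',
      model,
      durationMs: Date.now() - t0,
      outcome: 'success',
      metadata: {
        tokens: res.usage
          ? { input: res.usage.prompt_tokens, output: res.usage.completion_tokens }
          : undefined,
        tags,
      },
    })
    return res
  } catch (err) {
    emit({ type: 'reasoning', model, durationMs: Date.now() - t0, outcome: 'error', metadata: { tags } })
    throw err
  }
}

// Usage with a stubbed provider call and an in-memory sink.
const emitted: TrackedEvent[] = []
const trackedDemo = trackedCall(
  'gpt-4o-mini',
  { userId: 'user_a3f9c2', tenantId: 'tenant_1' },
  (event) => { emitted.push(event) },
  async () => ({ usage: { prompt_tokens: 12, completion_tokens: 3 } }),
)
```

In a real worker the last argument would be the fetch call from the Approach 3 listing, and emit would forward the event to your telemetry client.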