How to Use Replicate the Right Way in Your Next.js App (And Ship a Real Product With It)

Most tutorials show you how to call Replicate. Few show you how to use it well inside a real production app. This article covers the mistakes I made and the patterns that actually work — using Goodbye Watermark as a real-world case study.

What Is Replicate, Really?

Replicate is a cloud API that lets you run AI models — image generation, video, audio, vision — without owning a single GPU. You send an HTTP request, a model runs on their infrastructure, and you get the result back.

The business model is pay-per-prediction: you're charged for the time the model actually runs, not idle time. That means cold boots don't affect your cost — only your latency.

1. Understand the Prediction Lifecycle Before Writing Any Code

Every Replicate call creates a prediction — an object with a lifecycle:

```
starting → processing → succeeded (or failed / canceled)
```

- starting: the model is booting (cold start happens here)
- processing: predict() is actively running
- succeeded: output is ready — but files are deleted after 1 hour

That last point is critical. If you're not saving outputs immediately, you'll lose them. More on that below.

2. Polling vs. Webhooks: Choose the Right Strategy

Replicate gives you three ways to handle async predictions: blocking on `replicate.run()` until the output is ready, polling, or webhooks. The two async options:

Polling (simplest, fine for most apps)

Works well for short-lived predictions (under ~15s) and is simple to implement. The tradeoff: you're making repeated requests even when nothing has changed.

```ts
// const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN });

// Create the prediction
const prediction = await replicate.predictions.create({
  model: "owner/model-name",
  input: { image: imageUrl },
});

// Poll until done
let result = prediction;
while (result.status !== "succeeded" && result.status !== "failed") {
  await new Promise((r) => setTimeout(r, 1000));
  result = await replicate.predictions.get(result.id);
}
```

Webhooks (better for longer or background tasks)

Replicate POSTs to your URL when the prediction finishes — no polling loop. If there are network issues, Replicate retries automatically.

```ts
const prediction = await replicate.predictions.create({
  model: "owner/model-name",
  input: { image: imageUrl },
  webhook: `${process.env.VERCEL_URL}/api/webhooks`,
  webhook_events_filter: ["completed"], // only fire when done
});
```

Tip: add query params to your webhook URL to carry context:

```
https://yourapp.com/api/webhooks?userId=abc123&predictionType=watermark
```

When to use each

Prefer webhooks when:

- Predictions take more than ~10-15 seconds
- You want to persist results to a database
- You're building background processing flows

Otherwise, polling is fine.

3. Cold Starts Are Real — Here's How to Handle Them

When a model hasn't been used recently, it needs to "boot up." This can add several seconds of latency to the first request after idle time.

For casual traffic: cold boots are fine. You only pay for actual compute, not boot time.

For production apps with consistent traffic: use a Deployment with min_instances = 1:

```
// Via the Replicate dashboard or API:
// Create a deployment for your model with min_instances = 1
// This keeps the model warm 24/7
```

This costs more (you're paying to keep the instance warm) but eliminates cold start latency entirely.

For Goodbye Watermark, I don't use a deployment because traffic is spread across the day and a few seconds of latency on first boot is acceptable. But if you're building something with strict SLA requirements, use deployments.

4. Save Outputs Immediately — They Expire in 1 Hour

This is the gotcha that trips up everyone: input and output files are automatically deleted after 1 hour for any predictions created through the API. If your app doesn't save the result right after succeeded, it's gone.

Your options:

Option A: Stream back to the client immediately

```ts
// Next.js API route
export async function GET(request: Request) {
  const output = await replicate.run("owner/model", { input });
  return new Response(output); // stream back to client
}
```

Option B: Save to your own storage (Supabase Storage, S3, etc.)

```ts
const output = await replicate.run("owner/model", { input });
const response = await fetch(output[0]); // download from Replicate
const buffer = await response.arrayBuffer();
await supabase.storage.from("outputs").upload(`${userId}/${id}.png`, buffer);
```

For Goodbye Watermark, I stream the result directly back to the client. The user downloads it immediately — no storage needed, no expiry problem.

5. Next.js Config: Don't Forget This

If you're displaying output images from Replicate in a Next.js <Image> component, add this to your config or you'll get a domain error:

```ts
// next.config.ts
const nextConfig = {
  images: {
    remotePatterns: [
      { protocol: "https", hostname: "replicate.delivery" },
      { protocol: "https", hostname: "*.replicate.delivery" },
    ],
  },
};
```

Small thing, but it will bite you in production.

6. Error Handling That Doesn't Suck

Real-world Replicate usage needs to handle:

- Network timeouts
- Model errors (bad input format, unsupported file type)
- Rate limits (429)
- Prediction timeouts (30 min hard cap)

Set your own deadline. Replicate's hard limit is 30 minutes, but your users don't want to wait more than ~60 seconds for most tasks.

```ts
import { NextResponse } from "next/server";

try {
  const prediction = await replicate.predictions.create({ ... });

  if (prediction?.error) {
    return NextResponse.json({ error: prediction.error }, { status: 500 });
  }

  // Poll with timeout safety
  let result = prediction;
  const deadline = Date.now() + 60_000; // 60s max wait

  while (result.status !== "succeeded" && result.status !== "failed") {
    if (Date.now() > deadline) {
      return NextResponse.json({ error: "Prediction timed out" }, { status: 504 });
    }
    await new Promise((r) => setTimeout(r, 1500));
    result = await replicate.predictions.get(result.id);
  }

  if (result.status === "failed") {
    return NextResponse.json({ error: "Model failed" }, { status: 500 });
  }

  return NextResponse.json({ output: result.output });
} catch (err) {
  return NextResponse.json({ error: "Unexpected error" }, { status: 500 });
}
```

7. Rate Limits to Know

From Replicate's docs:

- Create prediction: 600 requests/minute
- All other endpoints: 3000 requests/minute

For most indie apps, you won't hit these. If you do, they return a 429 — build retry logic with exponential backoff.

8. Choosing the Right Model

Replicate hosts thousands of models. Two categories matter:

- Official models — maintained by Replicate, always warm, stable API, predictable per-output pricing. Best for production use.
- Community models — more variety, charged by compute time, may have cold starts, and the API can change between versions.

For Goodbye Watermark, I use the Qwen model for watermark removal. The choice came down to output quality and how well it handled semi-transparent watermarks — which are significantly harder than solid text watermarks. Testing a few models on realistic samples before committing to one is worth the extra hour.

Real-World Case Study: Goodbye Watermark

Goodbye Watermark is an AI watermark removal tool built with Next.js + Replicate + Vercel. The full stack:

- Frontend: Next.js + Tailwind CSS
- AI: Replicate (Qwen model)
- Hosting: Vercel
- Payments: Stripe (two credit tiers)

Results so far:

- ~150 weekly organic users
- $0 paid acquisition
- Zero infrastructure management

The entire MVP was built in ~1 hour. The hardest part wasn't the UI — it was getting consistent output quality from the model across different watermark types. Replicate made the difference: running my own GPU inference would have added weeks of setup and ongoing ops overhead. Instead, I spent that time on the UX and monetization.

TL;DR — The Patterns That Matter

- Understand the prediction lifecycle — especially the 1-hour file expiry
- Use polling for short tasks, webhooks for long/background ones
- Use Deployments if cold start latency is a problem for your UX
- Save or stream outputs immediately after succeeded
- Add replicate.delivery to your Next.js image domains
- Set your own deadline — don't wait 30 minutes for a user-facing request
- Test multiple models before committing — quality varies significantly

Replicate is genuinely one of the best tools for indie developers shipping AI products fast. Use it well and you can build something real in a weekend.

Built something with Replicate? Drop it in the comments — always curious to see what people are shipping.
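The polling loop shows up twice above — once bare, once with a deadline in the error-handling section. The two can be folded into one reusable helper. A sketch; the minimal `Prediction` shape and the injected `getPrediction` function are stand-ins for the Replicate client's richer types and `replicate.predictions.get`:

```typescript
// Sketch: a reusable polling loop with a deadline so a stuck prediction
// can't hang the request forever. `getPrediction` is injected so the helper
// stays testable; in an app it would be (id) => replicate.predictions.get(id).
type Prediction = { id: string; status: string; output?: unknown };

async function waitForPrediction(
  initial: Prediction,
  getPrediction: (id: string) => Promise<Prediction>,
  { intervalMs = 1000, timeoutMs = 60_000 } = {}
): Promise<Prediction> {
  const deadline = Date.now() + timeoutMs;
  let result = initial;
  while (result.status !== "succeeded" && result.status !== "failed") {
    if (Date.now() > deadline) throw new Error("Prediction timed out");
    await new Promise((r) => setTimeout(r, intervalMs));
    result = await getPrediction(result.id);
  }
  return result;
}
```

Injecting the fetch function also makes it trivial to swap the interval or deadline per route without copy-pasting the loop.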
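The webhook section above only shows the sending side. The receiving side could look like the sketch below — a Next.js route handler that reads the context query params and the prediction payload. The persistence step is stubbed out as a comment, and in production you would also verify Replicate's webhook signature before trusting the payload:

```typescript
// Sketch: the receiving end of a Replicate webhook in a Next.js route handler.
// Replicate POSTs the prediction object to this URL when the run completes.
export async function POST(request: Request): Promise<Response> {
  const url = new URL(request.url);
  const userId = url.searchParams.get("userId"); // context carried via query params
  const prediction = await request.json(); // Replicate's prediction object

  if (prediction.status === "succeeded" && userId) {
    // e.g. fetch prediction.output here and persist it to your own storage
    // before the 1-hour expiry window closes
  }

  // Return 2xx quickly so Replicate stops retrying the delivery
  return new Response(JSON.stringify({ received: true }), {
    status: 200,
    headers: { "content-type": "application/json" },
  });
}
```

Keeping the handler fast (acknowledge, then do heavy work elsewhere) avoids webhook retries piling up behind a slow storage call.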
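The rate-limits section says to retry 429s with exponential backoff but doesn't show it. One way to sketch that as a generic wrapper — this is not part of the Replicate SDK, and the assumption that the thrown error carries a numeric `status` field is mine; adapt it to however your HTTP layer surfaces rate-limit errors:

```typescript
// Sketch: retry a request-producing function on 429 responses, doubling the
// delay each attempt (baseDelayMs, 2x, 4x, ...). Any other error is rethrown
// immediately, as is a 429 once the retry budget is exhausted.
async function withBackoff<T>(
  fn: () => Promise<T>,
  { retries = 3, baseDelayMs = 500 } = {}
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { status?: number }).status;
      if (status !== 429 || attempt >= retries) throw err;
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```

Wrapping only the `predictions.create` call is usually enough, since that endpoint has the lower limit (600 requests/minute).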