# Why we self-hosted Open-Meteo: AI crawlers, rate limits, and 100 ms we didn't expect to win

**TL;DR** — We run uvi.today (UV index), pollen.today (pollen forecast) and airindex.today (air quality). All three pull from Open-Meteo. AI crawlers pushed us past the free tier; the paid tier worked but still wasn't a fit for crawler-heavy traffic, so we ended up self-hosting Open-Meteo on a single VPS. Disk: ~50 GB. Latency: 90–100 ms faster per request. Cost: less than the paid plan. This is a short write-up of how we got there, what the migration actually looked like, and a couple of things we'd flag for anyone thinking about doing the same.

## The setup

The three sites are simple Next.js apps. Each city page renders server-side and calls Open-Meteo for two datasets:

- The forecast API for temperature, weather code, humidity, etc.
- The CAMS air quality API for UV index, AQI components, and pollen.

We cache responses in an LRU on the Node side (coord-keyed, 60 min TTL; sketched below) and that's it. No queue, no warm cache jobs, no background workers. For the first months, the free public Open-Meteo API was perfect. The free tier is generous: the caps are per-minute, per-hour and per-day, topping out at 10,000 calls/day.
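The coord-keyed LRU is nothing exotic. A minimal sketch, assuming the `lru-cache` npm package (the helper name and sizes here are illustrative, not our real code):

```ts
// lib/om-cache.ts (sketch, assuming the `lru-cache` package)
import { LRUCache } from 'lru-cache';

const cache = new LRUCache<string, object>({
  max: 2000,            // comfortably above the ~900 city/locale pages
  ttl: 60 * 60 * 1000,  // 60 min TTL, as described above
});

// Coordinates are rounded before keying, so nearby lookups share an entry
// and random-precision query strings can't explode the key space.
export async function withCoordCache<T extends object>(
  dataset: string,      // e.g. 'forecast' or 'air-quality'
  lat: number,
  lon: number,
  load: () => Promise<T>,
): Promise<T> {
  const key = `${dataset}:${lat.toFixed(2)},${lon.toFixed(2)}`;
  const hit = cache.get(key) as T | undefined;
  if (hit) return hit;

  const value = await load();
  cache.set(key, value);
  return value;
}
```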

## Then the crawlers showed up

Search Console traffic was modest. Logs were not. Once the sites started ranking on long-tail queries, every AI crawler on the planet decided that a city-by-locale URL grid was an irresistible buffet. In our access logs we kept seeing the same User-Agents on tight loops, Meta-ExternalAgent among them.

The peak we measured before blocking them was around 15,000 requests per hour from a single bot. On a sister project that took longer to get blocking right, we saw bursts close to 200k/day. None of these bots respect any kind of "please slow down". They either get a 200, a 429, or a 403 — pick one. We picked 403, eventually (sketched at the end of this section). But before we did, the public Open-Meteo API started returning 429s during peaks, and our pages started erroring out for real users.

The math is brutal: 100 cities × 9 locales = 900 cacheable URLs. With a 60 minute cache TTL that's 900 origin requests per hour worst case for our own users. Add a single misbehaving crawler that ignores cache headers and asks for ?lat=…&lon=… with random rounding, and the cache hit rate collapses. We were burning through the 10,000 calls/day cap in a few hours.
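The blocking itself is unglamorous. A minimal sketch as Next.js middleware; the User-Agent list here is illustrative, not our full deny list:

```ts
// middleware.ts (sketch; the real deny list is longer and lives in config)
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

// Illustrative sample; Meta-ExternalAgent is the one named in this post.
const BLOCKED_UA = /Meta-ExternalAgent/i;

export function middleware(req: NextRequest) {
  const ua = req.headers.get('user-agent') ?? '';
  if (BLOCKED_UA.test(ua)) {
    // 403, not 429: these bots don't slow down, so don't invite retries.
    return new NextResponse(null, { status: 403 });
  }
  return NextResponse.next();
}
```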

## We tried the paid tier first

The Standard plan removes the per-minute, per-hour and per-day caps and gives you 1,000,000 calls per month on a customer-api.open-meteo.com host. Switching is one env var:

```
# .env
OPENMETEO_API_KEY=...
```

Open-Meteo are upfront in their FAQ that monthly limits aren't being enforced yet — they're still building the usage portal — so in practice the Standard plan is "soft 1M/month, dedicated servers, commercial use OK".

This solved the rate-limit problem immediately. It did not solve two other things:

- Crawler load is wasteful spend. Even if the limit isn't enforced, paying for traffic that produces no revenue (and no useful index entry on most platforms) is irritating.
- Latency. Every page render fans out to two API hosts in another datacenter. We were measuring p50 around 180–220 ms per upstream call from our box. CAMS pollen + forecast = two of those, mostly serial.
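To put rough numbers on the first point: our own users' worst case of 900 origin requests per hour works out to about 21,600 calls/day, which fits comfortably inside a 1M/month budget (roughly 33k/day). A single crawler burst like the 200k/day we saw on the sister project eats about six days of that budget in one.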

## Self-hosting Open-Meteo

This is the part that surprised us: it is genuinely easy. Open-Meteo publishes a single Docker image (ghcr.io/open-meteo/open-meteo) that does both jobs:

- `serve` — runs the API on port 8080. Same query syntax as the public API.
- `sync <model> <variables>` — pulls the latest model run from the upstream provider (DWD, NOAA, ECMWF, MET Norway, …) and writes it to a shared volume.

You run one `serve` and as many sync workers as you have models you care about. Each sync job re-runs on an interval (`--repeat-interval 5` = every 5 minutes) and stores the last N days of past data (`--past-days 3`). For us, the relevant compose file looks roughly like this:

```yaml
services:
  open-meteo:
    image: ghcr.io/open-meteo/open-meteo
    volumes: [open-meteo-data:/app/data]
    expose: ["8080"]
    command: ["serve"]

  sync-dwd:
    image: ghcr.io/open-meteo/open-meteo
    volumes: [open-meteo-data:/app/data]
    command:
      - sync
      - dwd_icon,dwd_icon_eu,dwd_icon_d2
      - temperature_2m,relative_humidity_2m,weather_code,cloud_cover,precipitation,shortwave_radiation
      - --past-days=3
      - --repeat-interval=5

  sync-cams:
    image: ghcr.io/open-meteo/open-meteo
    volumes: [open-meteo-data:/app/data]
    command:
      - sync
      - cams_global,cams_europe
      - uv_index,uv_index_clear_sky,pm10,pm2_5,ozone,alder_pollen,birch_pollen,grass_pollen,ragweed_pollen
      - --past-days=3
      - --repeat-interval=5

volumes:
  open-meteo-data:
```

We sync six model groups in total:

- DWD ICON (11 km global, 7 km EU, 2 km Central EU)
- NCEP (GFS 13 km global, HRRR 3 km CONUS)
- ECMWF IFS 25 km — long-range
- MET Norway / UKMO / BOM / CMC for regional accuracy
- CAMS global + Europe — UV, AQI, pollen
- A one-off copernicus_dem90 sync for elevation data (~10 GB, runs once)

The application-side change is one line: set `OPENMETEO_HOST=http://open-meteo:8080` and the existing client code routes there instead of the public API. No query rewriting needed — that's the nice part of Open-Meteo's design.

```ts
// src/lib/uv-api.ts (excerpt)
const omHost = process.env.OPENMETEO_HOST;
const apiKey = !omHost ? process.env.OPENMETEO_API_KEY : undefined;

const CAMS_BASE = omHost
  ? `${omHost}/v1/air-quality`
  : apiKey
    ? 'https://customer-air-quality-api.open-meteo.com/v1/air-quality'
    : 'https://air-quality-api.open-meteo.com/v1/air-quality';
```
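"Same query syntax" means the existing fetch code really is host-agnostic. A hedged sketch of the shape of a forecast call (standard Open-Meteo query params; not our exact code):

```ts
// Sketch: only the base URL differs between public, customer, and self-hosted.
const FORECAST_BASE = process.env.OPENMETEO_HOST
  ? `${process.env.OPENMETEO_HOST}/v1/forecast`
  : 'https://api.open-meteo.com/v1/forecast';

export async function fetchForecast(lat: number, lon: number) {
  const url = new URL(FORECAST_BASE);
  url.searchParams.set('latitude', lat.toFixed(2));
  url.searchParams.set('longitude', lon.toFixed(2));
  // Request only variables your sync jobs actually pull (see the compose file).
  url.searchParams.set('hourly', 'temperature_2m,relative_humidity_2m,weather_code');
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Open-Meteo responded ${res.status}`);
  return res.json();
}
```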

## What the box actually looks like

Real numbers from a single VPS (8 vCPU, 16 GB RAM, 150 GB disk) running everything — three sites, Caddy, an IP-geo service, monitoring, and the full Open-Meteo stack:

- Disk used by Open-Meteo data: ~50 GB and stable. The DEM is the largest one-time cost (~10 GB). The rolling weather data stays bounded by `--past-days`.
- The open-meteo `serve` container at steady state: ~1.1 GiB RAM, ~4 % of one core. Model files are mmapped, so the kernel page cache does most of the work — it's why `free -h` shows ~13 GiB sitting in buff/cache.
- Sync workers: burst CPU when a new model run lands (every 1–6 hours depending on the model), idle the rest of the time.
- Initial sync: 1–2 hours for the first run. This is the only painful step.

This is a quieter footprint than we expected. Open-Meteo's storage format (here's their write-up) is a custom layout designed for exactly this kind of mmap-friendly point lookup, and you can feel that in the metrics.

## The latency win

We weren't optimising for this — we just wanted the rate limits gone — but it turned into the most visible result. Measured per-call upstream latency from our app container to Open-Meteo:

- Public API (api.open-meteo.com): ~100–110 ms
- Customer API (customer-api.open-meteo.com): comparable, slightly more consistent
- Local container (http://open-meteo:8080): ~10 ms

Per page render that's roughly 90–100 ms shaved off, twice (forecast + CAMS), most of it serial. For a server-rendered Next.js page that has to land HTML before the browser can paint, this is meaningful — we saw it directly in our TTFB numbers.

## What we didn't migrate

Not everything makes sense to host yourself:

- Geocoding (geocoding-api.open-meteo.com). It's a separate service with its own dataset; we kept it on the public API and put a 1-hour LRU cache in front of it.
- Historical / climate / ensemble APIs. We don't use them. If you do, note that the Standard plan also doesn't include them — that's a Professional-tier thing.
- Marine / flood APIs. Same — out of scope for us.
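The cached geocoding path mirrors the earlier LRU sketch, just keyed by query instead of coordinates. Again a sketch, assuming the `lru-cache` package and the public `/v1/search` geocoding endpoint:

```ts
// Sketch of the geocoding lookup we kept on the public API.
import { LRUCache } from 'lru-cache';

const geoCache = new LRUCache<string, object>({
  max: 5000,
  ttl: 60 * 60 * 1000, // the 1-hour cache mentioned above
});

export async function geocode(name: string) {
  const key = name.trim().toLowerCase();
  const hit = geoCache.get(key);
  if (hit) return hit;

  const url = new URL('https://geocoding-api.open-meteo.com/v1/search');
  url.searchParams.set('name', name);
  url.searchParams.set('count', '1');
  const data = (await (await fetch(url)).json()) as object;
  geoCache.set(key, data);
  return data;
}
```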

## The gotchas

A few things to know before you copy the docker-compose:

- Pick variables deliberately. Each sync command takes an explicit list of variables. Adding a variable later means re-syncing — don't be too minimal at first.
- Disk growth is mostly the DEM. The rolling weather data stays small if `--past-days` is small. Set this honestly — we use 3.
- There is no built-in API key / rate limit on the local instance. It binds to a private Docker network in our case; if you expose it to the internet, put a reverse proxy with auth or a rate limiter in front.
- Crawlers will still hit your app. Self-hosting Open-Meteo doesn't solve the crawler problem — it just stops the crawler problem from cascading into a third-party rate-limit problem.
- Attribution still applies. Open-Meteo's data is CC BY 4.0; you keep crediting the underlying data sources (DWD, NOAA, ECMWF, CAMS, …) regardless of how you host it.

## Was it worth it?

For our shape of traffic — small site, three domains sharing the same upstream, lots of automated traffic — yes, comfortably:

- Capacity: effectively unbounded for our scale. We can let crawlers through, if we ever change our mind, without watching a meter.
- Latency: ~10 ms per upstream call, twice per render.
- Cost: one VPS that we already had, instead of a per-domain subscription.
- Operational risk: lower than expected. The image is one container, the syncs are independent, and a failed sync just means stale-but-still-served data for that model.

Written by the team behind uvi.today, pollen.today and airindex.today. We post the engineering side on X — @SimpleMeteo.


Every page render fans out to two API hosts in another datacenter. We were measuring p50 around 180–220 ms per upstream call from our box. CAMS pollen + forecast = two of those, mostly serial. - serve — runs the API on port 8080. Same query syntax as the public API. - sync <model> <variables> — pulls the latest model run from the upstream provider (DWD, NOAA, ECMWF, MET Norway, …) and writes it to a shared volume. - DWD ICON (11 km global, 7 km EU, 2 km Central EU) - NCEP (GFS 13 km global, HRRR 3 km CONUS) - ECMWF IFS 25 km — long-range - MET Norway / UKMO / BOM / CMC for regional accuracy - CAMS global + Europe — UV, AQI, pollen - A one-off copernicus_dem90 sync for elevation data (~10 GB, runs once) - Disk used by Open-Meteo data: ~50 GB and stable. The DEM is the largest one-time cost (~10 GB). The rolling weather data stays bounded by --past-days. - open-meteo serve container at steady state: ~1.1 GiB RAM, ~4 % of one core. Model files are mmapped, so the kernel page cache does most of the work — it's why free -h shows ~13 GiB sitting in buff/cache. - Sync workers: burst CPU when a new model run lands (every 1–6 hours depending on model), idle the rest of the time. - Initial sync: 1–2 hours for the first run. This is the only painful step. - Public API (api.open-meteo.com): ~100-110 ms - Customer API (customer-api.open-meteo.com): comparable, slightly more consistent - Local container (http://open-meteo:8080): ~10 ms - Geocoding (geocoding-api.open-meteo.com). It's a separate -weight: 500;">service with its own dataset; we kept it on the public API and put a 1-hour LRU cache in front of it. - Historical / climate / ensemble APIs. We don't use them. If you do, note that the Standard plan also doesn't include them — that's a Professional-tier thing. - Marine / flood APIs. Same — out of scope for us. - Pick variables deliberately. Each sync command takes an explicit list of variables. Adding a variable later means re-syncing — don't be too minimal at first. - Disk growth is mostly the DEM. The rolling weather data stays small if --past-days is small. Set this honestly — we use 3. - There is no built-in API key / rate limit on the local instance. It binds to a private Docker network in our case; if you expose it to the internet, put a reverse proxy with auth or a rate limiter in front. - Crawlers will still hit your app. Self-hosting Open-Meteo doesn't solve the crawler problem — it just stops the crawler problem from cascading into a third-party rate-limit problem. - Attribution still applies. Open-Meteo's data is CC BY 4.0; you keep crediting the underlying data sources (DWD, NOAA, ECMWF, CAMS, …) regardless of how you host it. - Capacity: effectively unbounded for our scale. We can let crawlers through if we ever change our mind without watching a meter. - Latency: ~10 ms per upstream call, twice per render. - Cost: one VPS that we already had, instead of a per-domain subscription. - Operational risk: lower than expected. The image is one container, the syncs are independent, and a failed sync just means stale-but-still-served data for that model.