Tools: Building Clusterflick: A London Cinema Aggregator

Source: Dev.to

I've been working on a personal project called Clusterflick: a single source for every movie showing across London. Right now it's tracking 240 venues across 5 event platforms, pulling in 1,398 events and over 30,000 showings.

It started simply enough: I just wanted cinema times on my calendar. But it quickly spiralled into a full data pipeline running on GitHub Actions, a statically generated Next.js site, and a cluster of Raspberry Pis in my living room.

Some of the most interesting challenges so far:

- Movie matching is deceptively hard. You'd think title + year would uniquely identify a film. It doesn't. Neither does title + director. Sometimes cinema listings don't even give a human enough to identify the movie.
- Scraping at scale without a budget. GitHub runner IPs get blocked, so now there's a Raspberry Pi cluster handling the tricky ones.
- Using LLMs for data quality. When fuzzy matching falls short, LLMs have been surprisingly useful for resolving ambiguous movie lookups against The Movie DB.
- Keeping it cheap. The whole thing runs on near-zero infrastructure costs: GitHub Actions for orchestration, Releases as storage, static site generation to avoid hosting costs.

The whole project is open source on GitHub. If any of this sounds interesting, I'd love to hear from others working on similar scraping/aggregation/data pipeline projects.
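
To make the movie-matching point concrete, here is a minimal Python sketch of why title + year can fall short. It is not Clusterflick's actual code: the `Candidate` record, the `match_listing` helper, the 0.85 threshold and the one-year tolerance are all assumptions made up for illustration, and the standard library's `difflib` stands in for whatever fuzzy matcher the project really uses.

```python
import re
import unicodedata
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical movie record, e.g. one search result from a movie database."""
    title: str
    year: int


def normalise(title: str) -> str:
    """Strip accents, case and punctuation so listings compare fairly."""
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", ascii_only.lower())
    return " ".join(cleaned.split())


def title_score(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalised titles."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def match_listing(
    title: str,
    year: Optional[int],
    candidates: List[Candidate],
    threshold: float = 0.85,   # assumed cut-off, would need tuning on real data
    year_tolerance: int = 1,   # assumed slack for festival vs. release years
) -> List[Candidate]:
    """Return every candidate that plausibly matches a cinema listing.

    An empty list means no match, one item is a confident match, and more
    than one means the listing is ambiguous and needs another signal
    (a director, a runtime, an LLM, or a human).
    """
    matches = []
    for candidate in candidates:
        # Only filter on year when the listing actually provides one.
        if year is not None and abs(candidate.year - year) > year_tolerance:
            continue
        if title_score(title, candidate.title) >= threshold:
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    known = [
        Candidate("Nosferatu", 1922),
        Candidate("Nosferatu the Vampyre", 1979),
        Candidate("Nosferatu", 2024),
    ]
    print(match_listing("NOSFERATU", 2024, known))  # year narrows it down
    print(match_listing("Nosferatu", None, known))  # no year: still ambiguous
```

With a year on the listing, the lookup narrows to a single film. Without one, both the 1922 and 2024 versions pass the title check, which is the kind of leftover ambiguity that has to be resolved by some other means.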