Tools: Building Clusterflick: A London Cinema Aggregator
I've been working on a personal project called Clusterflick, a single source for every movie showing across London. Right now it's tracking 240 venues across 5 event platforms, pulling in 1,398 events and over 30,000 showings.

It started simply enough: I just wanted cinema times on my calendar. But it quickly spiralled into a full data pipeline running on GitHub Actions, a statically generated Next.js site, and a cluster of Raspberry Pis in my living room.

Some of the most interesting challenges so far:

- Movie matching is deceptively hard. You'd think title + year would uniquely identify a film. It doesn't. Neither does title + director. Sometimes cinema listings don't even give a human enough to identify the movie.
- Scraping at scale without a budget. GitHub runner IPs get blocked, so now there's a Raspberry Pi cluster handling the tricky ones.
- Using LLMs for data quality. When fuzzy matching falls short, LLMs have been surprisingly useful for resolving ambiguous movie lookups against The Movie DB.
- Keeping it cheap. The whole thing runs on near-zero infrastructure costs: GitHub Actions for orchestration, Releases as storage, static site generation to avoid hosting costs.

The whole project is open source on GitHub. If any of this sounds interesting, I'd love to hear from others working on similar scraping/aggregation/data pipeline projects.
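
To make the matching problem a bit more concrete, here is a minimal sketch of fuzzy title + year matching that refuses to guess when the result is ambiguous. It is not Clusterflick's actual code: the `Candidate` type, the `score` and `match` functions, the candidate ids, and the threshold values are all hypothetical, chosen for illustration. It assumes the candidates have already been fetched from somewhere (for example, a movie database search), and it uses Python's standard-library `difflib` rather than any particular fuzzy-matching package.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """A hypothetical catalogue entry, e.g. one search result from a movie database."""
    id: int
    title: str
    year: Optional[int]


def normalize(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def score(listing_title: str, listing_year: Optional[int], candidate: Candidate) -> float:
    """Title similarity in [0, 1], penalised when both years are known and disagree."""
    similarity = SequenceMatcher(
        None, normalize(listing_title), normalize(candidate.title)
    ).ratio()
    if listing_year is not None and candidate.year is not None:
        # Assumption: allow one year of slack, since listings and catalogues
        # can disagree about festival vs. general release years.
        if abs(listing_year - candidate.year) > 1:
            similarity -= 0.3
    return max(similarity, 0.0)


def match(
    listing_title: str,
    listing_year: Optional[int],
    candidates: List[Candidate],
    threshold: float = 0.85,  # illustrative value, not tuned on real data
    margin: float = 0.05,     # illustrative value, not tuned on real data
) -> Optional[Candidate]:
    """Return the best candidate, or None when the match is weak or ambiguous."""
    scored = sorted(
        ((score(listing_title, listing_year, c), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    if not scored or scored[0][0] < threshold:
        return None  # nothing is close enough
    if len(scored) > 1 and scored[0][0] - scored[1][0] < margin:
        return None  # two candidates are too close to call
    return scored[0][1]


# Illustrative data: three films sharing a title stem (ids are made up).
candidates = [
    Candidate(1, "Nosferatu", 1922),
    Candidate(2, "Nosferatu", 2024),
    Candidate(3, "Nosferatu the Vampyre", 1979),
]

print(match("Nosferatu", 2024, candidates))  # the year separates the two exact-title hits
print(match("Nosferatu", None, candidates))  # no year in the listing: ambiguous, so None
```

A `None` result is the interesting case: it marks the listings where title and year alone are not enough, which is where a fallback (an LLM-assisted lookup, or a human) would take over.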