# Tools: Automating Trend Research: How I Built a Pipeline to Track What People Are Saying
2026-02-27
admin
**Contents**

- What This Pipeline Actually Does
- The Tools: All Free, All Local
- Querying Hacker News: The Algolia API
- Querying Reddit: JSON Without OAuth
- Fetching Summaries: The Fetcher Problem
- Writing Commentary That Doesn't Sound Like a Bot
- Organizing Results: Episodic + Semantic
- Email Delivery: ProtonMail CLI > Gmail
- Extending the Workflow: Optional Bluesky Promotion
- Running It on a Schedule
- Why Not Just Use a Third-Party Service?
- Challenges and Gotchas
- What I Use These Reports For
- Make Your Own

I used to spend hours every week manually checking Hacker News and Reddit for trending topics in my niches. Open a tab, search, scroll, copy links, summarize in a doc… repeat. It was mind-numbing and inconsistent.

Then I built the Research & Trend Report Workflow: a fully automated pipeline that scrapes the internet's best discussion hubs, compiles a curated report with my own commentary, and delivers it to my inbox. It has transformed how I stay on top of trends. And the best part? It's all built with simple tools (PowerShell, Python scripts, public APIs) and runs on a schedule. No paid services, no complex infrastructure.

Let me show you how it works.

## What This Pipeline Actually Does

Every time it runs (I have it set to weekly, but it can be run ad hoc too), the flow is: search HN and Reddit, fetch and summarize the linked articles, add commentary, format, store, and deliver (the full step list is at the end of this post). The output is a clean, readable markdown file like the sample report shown below.

## The Tools: All Free, All Local

I'm not using any paid APIs or cloud services. Everything runs on my Windows machine, and the whole orchestration is a PowerShell script that calls a handful of free tools in sequence (all listed at the end of this post). It's not fancy, but it gets the job done.

## Querying Hacker News: The Algolia API

Hacker News provides a fantastic search API via Algolia; the PowerShell snippet I use is shown below. The key insight: Algolia filters on Unix timestamps in `numericFilters`, so you need to convert dates properly. Also, you can combine multiple keyword searches, but you must de-duplicate URLs afterward.

## Querying Reddit: JSON Without OAuth

Reddit's JSON API is refreshingly simple; the snippet below searches a subreddit for a keyword. `restrict_sr=on` keeps results within the subreddit (no r/all).
I sort by `new` to get recent posts. The JSON structure is straightforward: `data.children` is an array of posts, each with a `.data` payload.

## Fetching Summaries: The Fetcher Problem

Here's where it gets tricky. Some article URLs are behind paywalls, require JavaScript, or block automated requests. My approach is a chain of fallbacks: try `web_fetch` first, fall back to a headless browser, and finally fall back to the title plus whatever snippet HN or Reddit provides. The key is having multiple fallbacks. I've found that `web_fetch` works for about 60% of sites, the browser gets another 30%, and the remaining 10% are just inaccessible (looking at you, major news sites with bot detection). A typical summary extraction looks like the pseudocode shown below.

## Writing Commentary That Doesn't Sound Like a Bot

This is where the Humanizer skill pays off. Initially, my commentary was awful: "This post highlights the enduring appeal of classic metroidvanias. The high engagement suggests strong community interest." Yawn.

Now I force myself to have an opinion, add concrete details, use contractions, and vary sentence structure. An example transformation, before: "This discussion demonstrates the sustained cultural relevance of Super Metroid. The high engagement metrics indicate strong community interest in retro gaming classics." After: "This kind of post surfaces periodically and always sparks huge engagement. It's not just nostalgia—Super Metroid literally defined the genre template. The fact that a simple 'must have been incredible' prompt draws over a thousand upvotes? That tells you something."

See the difference? One sounds like a research paper; the other sounds like someone who actually cares about games.

## Organizing Results: Episodic + Semantic

When the report is complete, I save it to two places.

Episodic: `memory/episodic/2026-02-25-research-retro-metroidvania.md`
(The full dated report with all findings, summaries, and commentary.)

Semantic index: an entry appended to `memory/semantic/research-reports-index.md` (format shown below).

This dual storage means I can pull the full report by date and scan the index for what I've researched over time.

## Email Delivery: ProtonMail CLI > Gmail

I initially tried sending these reports via Gmail SMTP, but hit rate limits and spam filters with longer content (these reports can be 5-10 KB of text). ProtonMail CLI handles large bodies reliably, though there's a catch: external delivery to Gmail can take up to 24 hours.

But here's the trick: I don't need instant delivery. I run the report in the morning, and it arrives in my inbox by evening. That's fine; I'm not waiting on it. The reliability trade-off is worth it.

The `pmail send` command reads the message body from stdin when `-b` is omitted. Simple, no temporary files needed.

## Extending the Workflow: Optional Bluesky Promotion

If the research uncovers broadly interesting findings (like the Super Metroid engagement numbers), I'll create a Bluesky post to drive traffic. I only do this for reports with genuinely shareable takeaways; not every research batch needs promotion.

## Running It on a Schedule

Right now I trigger this manually or via cron (or, in OpenClaw, via scheduled tasks). The script is `research-and-trend-report-workflow.ps1` and takes parameters (see the invocation below). I'll probably set up a weekly run soon: every Monday morning, generate last week's trends, and have them land in my inbox by Monday evening. That way I'm always in the loop without lifting a finger.

## Why Not Just Use a Third-Party Service?

You could use tools like Brandwatch, Talkwalker, or even Google Alerts. But cost, lock-in, and inflexibility all count against them (details at the end of this post). For a hobbyist or indie blogger, this DIY approach is more than capable. The quality of results from HN/Reddit is already excellent; you don't need a $500/mo social listening platform to get the pulse of the tech/gaming community.

## Challenges and Gotchas

Reddit rate limits: their JSON API is generous but not unlimited. I keep requests to 10 per subreddit per run and add a 1-second delay between calls. So far, no issues.

Paywalls and bot detection: some sites (looking at you, major news outlets) block non-browser requests. I've learned to recognize the patterns and fall back gracefully.
The report still works without those summaries.

Email deliverability: ProtonMail to Gmail can be slow (up to 24 hours). I've thought about switching to AgentMail for instant delivery, but their API has size limits. For now, the delay is acceptable.

Keyword noise: searching "nes" also returns surveillance-camera posts (Nest). I filter by domain or add negative keywords (`-nest -nests`) to clean the results.

Humanizing commentary: this is the hardest part to automate. I still write the commentary myself (with humanizer assist) because I want the reports to have my voice and opinions. Could I fine-tune a model to write like me? Maybe down the road. For now, it's a 15-minute manual step that makes the reports actually useful.

## What I Use These Reports For

The first report I generated (retro metroidvania) immediately surfaced three blog post ideas. That's ROI right there. It also feeds community engagement, trend tracking, and content strategy (full list at the end of this post).

## Make Your Own

The workflow script lives in `memory/procedural/research-and-trend-report-workflow.md`. It's a PowerShell file with embedded Python or calls to external tools, depending on your setup. The key pieces are the HN and Reddit query functions, a content fetcher with fallbacks, a markdown formatter, storage integration, and an email sender.

You don't need my exact stack; any language that can make HTTP requests and write files will work. The pattern is what matters (shown below).

If you're a blogger, journalist, or just someone who wants to stay on top of niche topics without spending hours a week, this is a solid foundation. Feel free to adapt it, share your version, or drop questions in the comments.

This post is part of my dev-to-diaries series documenting the automation and tooling behind my blogging workflow. See the whole series at https://dev.to/retrorom/series/35977
```markdown
# Trend Report: Retro Metroidvania Games
*Generated: Wednesday, February 25, 2026*
*Research period: Last 30 days*

## Executive Summary
The retro metroidvania community is buzzing with two major conversations:
1. Nostalgia for classics—especially Super Metroid—continues to drive massive engagement.
2. Industry loss: The passing of Shutaro Ida sparked heartfelt tributes.

...

## Top Findings

### 1. Super Metroid: A Legacy That Endures
**Source:** Reddit r/retrogaming
**URL:** https://www.reddit.com/...
**Stats:** 1,124 upvotes | 356 comments
**Summary:** [2-3 sentence summary]
**Commentary:** This kind of post surfaces periodically and always sparks huge engagement...
```
```powershell
$keywords = @("metroidvania", "retro", "castlevania", "super metroid", "nes")
# Algolia filters on Unix timestamps, so compute the cutoff as epoch seconds directly
$cutoffEpoch = [DateTimeOffset]::new((Get-Date).AddDays(-30).ToUniversalTime()).ToUnixTimeSeconds()
$baseUrl = "https://hn.algolia.com/api/v1/search"

foreach ($keyword in $keywords) {
    $params = @{
        query          = [uri]::EscapeDataString($keyword)
        tags           = "story"
        numericFilters = "created_at_i>$cutoffEpoch"
        hitsPerPage    = 10
    }
    $query = ($params.GetEnumerator() | ForEach-Object { "$($_.Key)=$($_.Value)" }) -join "&"
    $response = Invoke-RestMethod -Uri "${baseUrl}?${query}" -Method Get
    foreach ($hit in $response.hits) {
        # Extract: title, url, author, points, comment_count, created_at
        # Filter duplicates by URL
        # Store in results array
    }
}
```
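If you'd rather prototype the same query logic in Python, here's a rough sketch of the URL construction and the URL-based de-duplication step (the function names are mine, not part of the workflow script):

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

ALGOLIA_SEARCH = "https://hn.algolia.com/api/v1/search"

def build_hn_query_url(keyword: str, days_back: int = 30, hits_per_page: int = 10) -> str:
    """Build an Algolia search URL with a Unix-timestamp cutoff in numericFilters."""
    cutoff = int((datetime.now(timezone.utc) - timedelta(days=days_back)).timestamp())
    params = {
        "query": keyword,
        "tags": "story",
        "numericFilters": f"created_at_i>{cutoff}",
        "hitsPerPage": hits_per_page,
    }
    return f"{ALGOLIA_SEARCH}?{urlencode(params)}"

def dedupe_hits(hits: list[dict]) -> list[dict]:
    """Keep the first hit per URL (falling back to objectID for link-less text posts)."""
    seen, unique = set(), []
    for hit in hits:
        key = hit.get("url") or hit.get("objectID")
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    return unique
```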
```powershell
$subreddits = @("retrogaming", "metroidvania", "nintendo")
$keyword = "metroidvania"
# Reddit tends to throttle the default client User-Agent; send a descriptive one
$headers = @{ "User-Agent" = "trend-report-script/1.0" }

foreach ($sub in $subreddits) {
    $url = "https://www.reddit.com/r/$sub/search.json?q=$keyword&restrict_sr=on&sort=new&limit=10"
    $response = Invoke-RestMethod -Uri $url -Headers $headers -Method Get
    foreach ($post in $response.data.children) {
        $data = $post.data
        [PSCustomObject]@{
            Title     = $data.title
            Url       = "https://www.reddit.com" + $data.permalink
            Subreddit = $sub
            Upvotes   = $data.ups
            Comments  = $data.num_comments
            Created   = [DateTimeOffset]::FromUnixTimeSeconds([long]$data.created_utc).UtcDateTime
            Author    = $data.author
        }
    }
    Start-Sleep -Seconds 1  # 1-second delay between calls to stay under the rate limit
}
```
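For testing the parsing half without hitting the network, the flattening of `data.children` can be sketched in Python (a hypothetical helper, not part of the original script):

```python
from datetime import datetime, timezone

def parse_reddit_posts(payload: dict, subreddit: str) -> list[dict]:
    """Flatten Reddit's search.json response: data.children[*].data -> plain dicts."""
    posts = []
    for child in payload["data"]["children"]:
        d = child["data"]
        posts.append({
            "title": d["title"],
            "url": "https://www.reddit.com" + d["permalink"],
            "subreddit": subreddit,
            "upvotes": d["ups"],
            "comments": d["num_comments"],
            "created": datetime.fromtimestamp(d["created_utc"], tz=timezone.utc),
            "author": d["author"],
        })
    return posts
```

Because the function takes a plain dict, you can feed it a hand-built sample payload in tests and the real `search.json` response in production.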
```python
# Pseudocode for summary extraction
def extract_summary(url):
    content = web_fetch(url, extract_mode="markdown")
    if not content or len(content) < 200:
        content = browser_snapshot(url, fullPage=False)
    if content:
        # Take first 2-3 paragraphs
        paragraphs = content.split('\n\n')[:3]
        return ' '.join(paragraphs)[:1000]
    return None
```
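The pseudocode translates to runnable Python if you pass the two fetchers in as plain functions; `primary_fetch` and `fallback_fetch` here stand in for whatever `web_fetch` and `browser_snapshot` tools you actually use:

```python
from typing import Callable, Optional

def extract_summary(
    url: str,
    primary_fetch: Callable[[str], Optional[str]],
    fallback_fetch: Callable[[str], Optional[str]],
    min_length: int = 200,
    max_chars: int = 1000,
) -> Optional[str]:
    """Try the cheap fetcher first; fall back to the heavier one; keep ~3 paragraphs."""
    content = primary_fetch(url)
    if not content or len(content) < min_length:
        content = fallback_fetch(url)
    if not content:
        return None
    paragraphs = content.split("\n\n")[:3]
    return " ".join(paragraphs)[:max_chars]
```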
```markdown
- **Retro Metroidvania Games** — 2026-02-25
  [episodic/2026-02-25-research-retro-metroidvania.md](episodic/...)
  Keywords: metroidvania, retro, castlevania, super metroid, NES
  Sources: HN (3 posts), Reddit r/retrogaming & r/metroidvania (12 posts)
  Top post: "Super Metroid must have been an incredible experience" (1,124 upvotes, 356 comments)
```
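Appending an entry in that format is a few lines of Python; this is a minimal sketch (the function name and argument set are mine):

```python
from datetime import date
from pathlib import Path

def append_index_entry(index_path: Path, topic: str, episodic_file: str,
                       keywords: list[str], top_post: str) -> None:
    """Append one entry to the semantic index, mirroring the format above."""
    entry = (
        f"- **{topic}** — {date.today().isoformat()}\n"
        f"  [episodic/{episodic_file}](episodic/{episodic_file})\n"
        f"  Keywords: {', '.join(keywords)}\n"
        f"  Top post: {top_post}\n"
    )
    with index_path.open("a", encoding="utf-8") as f:
        f.write(entry)
```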
```powershell
cd "tools\protonmail-cli"
Get-Content $reportPath -Raw | python -m uv run pmail send -t [email protected] -s "Trend Report: Retro Metroidvania - $(Get-Date -Format 'yyyy-MM-dd')"
```
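If you orchestrate from Python instead of PowerShell, the same send step is a small `subprocess` call. This sketch only uses the `-t` and `-s` flags shown above, with the body piped over stdin (the helper names are mine, and `pmail` is assumed to be on PATH):

```python
import subprocess
from pathlib import Path

def pmail_args(to_addr: str, subject: str) -> list[str]:
    """argv for `pmail send`; the body goes over stdin when -b is omitted."""
    return ["pmail", "send", "-t", to_addr, "-s", subject]

def send_report(report_path: Path, to_addr: str, subject: str) -> None:
    """Read the finished report and pipe it to pmail, no temp files needed."""
    body = report_path.read_text(encoding="utf-8")
    subprocess.run(pmail_args(to_addr, subject), input=body, text=True, check=True)
```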
```python
# tools/post_to_bluesky.py
message = f"Just researched {topic} trends on HN/Reddit. Top insights: {snippet}. Full report: {memory_file_url}"
```
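One wrinkle worth handling: Bluesky caps posts at 300 characters, so the snippet needs trimming before the template is filled. A sketch, with the limit hard-coded as an assumption and the function name mine:

```python
BSKY_LIMIT = 300  # Bluesky's post length limit

def compose_bsky_post(topic: str, snippet: str, report_url: str) -> str:
    """Fill the promo template, trimming the snippet so the whole post fits the limit."""
    template = "Just researched {topic} trends on HN/Reddit. Top insights: {snippet}. Full report: {url}"
    # Measure the fixed parts first, then give the snippet whatever room is left
    fixed = template.format(topic=topic, snippet="", url=report_url)
    room = BSKY_LIMIT - len(fixed)
    if len(snippet) > room:
        snippet = snippet[: max(room - 1, 0)].rstrip() + "…"
    return template.format(topic=topic, snippet=snippet, url=report_url)
```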
```powershell
.\research-and-trend-report-workflow.ps1 `
    -Topic "retro metroidvania" `
    -Keywords "metroidvania","retro","castlevania","super metroid","nes" `
    -Subreddits "retrogaming","metroidvania","nintendo" `
    -DaysBack 30 `
    -EmailTo "[email protected]"
```
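For the weekly Monday-morning run mentioned above, a crontab entry like this would do it on a box with PowerShell installed (the path and parameters here are placeholders, adjust to your setup):

```shell
# Run every Monday at 08:00; report lands in the inbox by evening
0 8 * * 1 pwsh -File /path/to/research-and-trend-report-workflow.ps1 -Topic "retro metroidvania" -DaysBack 7
```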
```text
search → filter → fetch → summarize → comment → format → store → deliver
```
For reference, here are the full lists referenced throughout the post.

**The pipeline flow:**

- Search Hacker News via Algolia API for recent stories matching my keywords
- Search Reddit via JSON API for posts in target subreddits
- Fetch article content from the URLs (when possible)
- Generate summaries and write insightful commentary (humanized, not robotic)
- Format a markdown report with stats, sources, and executive summary
- Store it in memory (episodic and semantic index updated automatically)
- Email the full report via ProtonMail CLI to my personal inbox
- (Optional) Promote to Bluesky if the findings are broadly interesting

**The tools:**

- Hacker News Algolia API – No auth needed, just HTTP GET with query params
- Reddit JSON API – Same, no OAuth required for public posts
- web_fetch or browser – For pulling article content when needed
- ProtonMail CLI – For reliable email delivery of full reports (avoids Gmail rate limits)
- memory-manager – To categorize and store the reports properly
- Humanizer skill – Applied to commentary so it doesn't sound like a bot wrote it

**Fetcher fallbacks:**

- Try web_fetch (built-in tool that extracts readable content)
- If that fails, try browser with headless mode to render the page
- If still blocked, fall back to the article title + any available snippet from HN/Reddit
- Mark as "summary unavailable" if truly inaccessible

**Commentary rules:**

- Have an opinion: "This kind of post surfaces periodically and always sparks huge engagement."
- Acknowledge mixed feelings: "It's not just nostalgia; it's about Super Metroid establishing the template."
- Add specific, concrete details: "The fact that a simple 'must have been incredible' prompt draws over a thousand upvotes tells us..."
- Use contractions and casual phrasing: "it's", "that's", "I've"
- Vary sentence structure—mix short punches with longer reflective ones

**What dual storage buys me:**

- I can retrieve the full report by date (episodic)
- I can scan the index to see what topics I've researched (semantic)
- The index acts as a quick reference for trends over time

**Why not a third-party service:**

- Cost: Those services charge hundreds per month for decent coverage.
- Lock-in: Your data lives somewhere else; you can't easily add custom commentary.
- Flexibility: My workflow lets me tweak anything—parsers, summarization, commentary style, distribution.
- Ownership: The reports live in my memory system, searchable and indexable forever.

**What I use the reports for:**

- Blog post ideas: "Hey, Super Metroid is trending—maybe write a retrospective?"
- Community engagement: I can jump into Reddit threads with actual context, not just guessing what's hot.
- Trend tracking: Over time, I can see what topics are cyclical vs. one-offs.
- Content strategy: If retro metroidvanias are consistently popular, maybe I should write more about them.
- Staying informed: Even when I'm heads-down coding, I know what the community is talking about.

**Key pieces of the script:**

- Query functions for HN and Reddit
- Content fetcher with fallbacks
- Markdown formatter
- Storage integration (memory-manager categorize)
- Email sender (ProtonMail CLI or AgentMail)