Tools: How I Indexed 172,000+ AI Agent Skills Using Multi-Strategy Discovery
2026-02-03
admin
GitHub's search API has a hard limit: 1,000 results per query. We have 172,000+ skills indexed. Here's how we built a discovery system that found them all, without breaking any rules.

## The Problem: Skills Are Everywhere

AI agents like Claude Code, OpenAI Codex, and GitHub Copilot use SKILL.md files to learn new capabilities. These skills teach agents how to handle PDFs, write Excel formulas, follow brand guidelines, and much more.

The problem? These skills are scattered across thousands of GitHub repositories:

- Some live in ~/.claude/skills/
- Others in .github/skills/
- Many in random skills/ folders
- And countless more in personal dotfiles repos

Finding the right skill is like searching for a needle in a haystack of haystacks.

I tried GitHub's search: filename:SKILL.md. It returned results, but never more than 1,000. The GitHub API documentation confirms this limit, and there's no way around it with a single query.

So I built something different.

## Our Approach: Multi-Strategy Discovery

Instead of fighting the 1,000-result limit, we work with it by running multiple specialized searches. Each strategy targets a different slice of the skill ecosystem.

## Strategy 1: Path-Based Search

Skills follow predictable directory patterns. We search each path separately:

```
filename:SKILL.md path:skills
filename:SKILL.md path:.claude
filename:SKILL.md path:.github
filename:SKILL.md path:.codex
```

Each query can return up to 1,000 results. Four queries = up to 4,000 potential discoveries.

## Strategy 2: File Size Segmentation

GitHub lets you filter by file size. We segment our searches:

```
filename:SKILL.md size:<1000 # Small skills
filename:SKILL.md size:1000..5000 # Medium skills
filename:SKILL.md size:>5000 # Large skills
```

Same filename, different queries, different result sets.

## Strategy 3: Topic-Based Discovery

Many skill repositories use GitHub topics. We search for repos tagged with:

- claude-skills
- agent-skills

Then we deep-scan each repository for SKILL.md files.

## Strategy 4: Awesome List Crawling

The community maintains curated lists of skills:

- awesome-claude-skills
- awesome-agent-skills
- awesome-copilot

We parse these lists and index every linked repository.

## Strategy 5: Fork Network Traversal

When we find a popular skills repository, we also check its forks. Forks often contain additional or modified skills that never made it back to the original repo.

## The Stack

Here's what powers the discovery and search. The indexer runs on a schedule:

- Daily: Incremental crawl (new/updated skills)
- Weekly: Full discovery (all strategies)
- On-demand: Process user-submitted repositories

All queries use authenticated GitHub API requests with proper rate limit handling. We rotate between multiple tokens to stay well within limits.

## Results

After running our multi-strategy discovery, we have 172,000+ skills indexed.

The search is fast. Type "pdf" and get relevant results in milliseconds, ranked by GitHub stars, download count, and security status.

Every skill is scanned for:

- Dangerous shell commands
- Prompt injection patterns
- Data exfiltration attempts

Skills that pass get a green checkmark. Those with issues get flagged.

## Try It Now

```bash
npm install -g skillhub
```

```bash
skillhub search pdf
```

```bash
skillhub install anthropics/skills/pdf
```

Or browse all 172,000+ skills on the web: skills.palebluedot.live

## What's Next

- Native Claude Code integration via MCP
- Skill verification with author confirmation
- Usage analytics so you know which skills actually work

The entire project is open source under the MIT license.

## Your Turn

What skills would you like to see indexed? Any repositories we should add? Drop a comment below. I read every one.

Built with Next.js, PostgreSQL, Meilisearch, and way too much coffee.
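
If you want to see the query fan-out from Strategies 1 and 2 in code, here is a minimal sketch. It is not the actual indexer: `build_queries` and `merge_results` are names made up for illustration, and the sample hits are invented. The only things taken from the post are the path and size qualifiers and the 1,000-result cap. It also shows why deduplication matters, since the same file can come back from more than one query.

```python
from itertools import chain

# Qualifiers copied from the queries shown in the post.
PATHS = ["skills", ".claude", ".github", ".codex"]
SIZES = ["<1000", "1000..5000", ">5000"]
RESULT_CAP = 1000  # per-query result limit described in the post


def build_queries(filename="SKILL.md"):
    """Return one search string per path and one per size bucket (hypothetical helper)."""
    by_path = [f"filename:{filename} path:{p}" for p in PATHS]
    by_size = [f"filename:{filename} size:{s}" for s in SIZES]
    return by_path + by_size


def merge_results(result_sets):
    """Union several capped result sets, keeping the first hit per (repo, path)."""
    seen = {}
    for hit in chain.from_iterable(result_sets):
        seen.setdefault((hit["repo"], hit["path"]), hit)
    return list(seen.values())


queries = build_queries()
print(len(queries), "queries, up to", len(queries) * RESULT_CAP, "raw hits")

# Invented sample hits: the first file appears in both result sets.
a = [{"repo": "octo/demo", "path": "skills/pdf/SKILL.md"}]
b = [
    {"repo": "octo/demo", "path": "skills/pdf/SKILL.md"},
    {"repo": "octo/other", "path": ".claude/skills/xlsx/SKILL.md"},
]
print(len(merge_results([a, b])), "unique skills")
```

The raw-hit figure is only an upper bound. Result sets overlap, so the number of unique skills after merging is usually lower.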
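
Strategy 4 (parsing awesome lists and indexing every linked repository) comes down to pulling repository links out of Markdown. The sketch below is one way that could look, assuming the lists link to repositories with plain `https://github.com/owner/repo` URLs. `extract_repos` and the sample list are hypothetical, and a real crawler would also need to skip non-repository paths such as `github.com/topics/...`.

```python
import re

# Matches https://github.com/<owner>/<repo>; anything after the repo name is ignored.
REPO_LINK = re.compile(r"https?://github\.com/([\w.-]+)/([\w.-]+)")


def extract_repos(markdown: str) -> list[str]:
    """Return unique owner/repo slugs in order of first appearance (hypothetical helper)."""
    repos = []
    for owner, repo in REPO_LINK.findall(markdown):
        # Drop a trailing sentence period and a .git suffix if present.
        slug = f"{owner}/{repo.rstrip('.').removesuffix('.git')}"
        if slug not in repos:
            repos.append(slug)
    return repos


# Invented sample list: deep links and anchors collapse to the same repository.
sample = """
- [PDF skill](https://github.com/example-org/pdf-skill) - handles PDFs
- [Excel skill](https://github.com/example-org/excel-skill/tree/main/skills)
- [PDF again](https://github.com/example-org/pdf-skill#readme)
"""
print(extract_repos(sample))
```

Each slug found this way can then be deep-scanned for SKILL.md files, the same as a repository found through topic search.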