Git Cluster RAG: Semantic Routing for Git History (Copilot CLI Challenge)
2026-02-06
admin
This is a submission for the GitHub Copilot CLI Challenge.
## What I Built
Repository: https://github.com/dxa204/git-cluster-rag.git
I built Git Cluster RAG, a command-line tool that uses K-Means clustering to "route" questions about a repository's history to the right context.
## The Problem
Standard RAG (Retrieval-Augmented Generation) applications are "flat." If you ask a question like "Why did we remove the notes file?", a standard vector search might retrieve unrelated commits just because they share keywords. Flat retrieval struggles to distinguish between code refactoring, documentation updates, and one-off cleanups.
## The Solution
Using the GitHub Copilot CLI, I built a pipeline that:
- Ingests commit history (messages + file diffs).
- Embeds the changes using sentence-transformers.
- Clusters the commits using K-Means.
- Routes user queries to the matching semantic cluster (e.g., "Cluster 0: Maintenance") before retrieving answers.
This "cluster-guided" approach means that when I ask about a deleted file, the system prioritizes "Cleanup" commits over "Feature" commits.
## Demo: Cluster-Guided Routing in Action
In this video, you can see the tool ingesting the git history, identifying the clusters, and then correctly routing a query about a deleted file to the "Maintenance" cluster.
https://youtu.be/FY4GY0uqMxI
## My Experience with GitHub Copilot CLI
Building this project entirely with the Copilot CLI changed my workflow from "Stack Overflow searcher" to "command-line architect."
- Scaffolding with Context
I used the @workspace /new command to generate the entire project structure (ingest.py, cluster.py, chat.py) in one go. Instead of writing boilerplate, I could focus on the logic of the K-Means algorithm.
- The "Agent" Workflow
The standout feature for me was the /init command. By running this, I was able to generate a .github/copilot-instructions.md file that taught Copilot the specific constraints of my project (e.g., "Always use 3 clusters", "Truncate diffs to 500 chars"). This effectively turned Copilot into a specialized teammate that knew my architecture, not just a generic code generator.
- Frictionless Debugging
When I hit syntax errors or needed to generate dummy git data for testing, I didn't leave the terminal. I used gh copilot suggest to generate complex shell commands that created dummy commits, enabling me to test the clustering algorithm in seconds.
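The shell commands that `gh copilot suggest` produced aren't shown in the post, so the following is a hypothetical Python equivalent, not the project's code. It assumes `git` is on the PATH, builds a throwaway repository with a few feature, docs, and cleanup commits (including removing a notes file), and then performs the ingest step: one record per commit with its subject line and its diff truncated to 500 characters, following the constraint from the instructions file. The helper names `make_dummy_repo` and `ingest_commits`, the file names, and the commit messages are all illustrative.

```python
"""Hypothetical test fixture: build a throwaway repo with dummy commits,
then ingest (message, truncated diff) records as a pipeline's first step."""
import subprocess
import tempfile
from pathlib import Path

MAX_DIFF_CHARS = 500  # mirrors the "Truncate diffs to 500 chars" constraint


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its stdout."""
    result = subprocess.run(
        ["git", "-c", "user.name=Dummy", "-c", "user.email=dummy@example.com",
         "-c", "commit.gpgsign=false", *args],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return result.stdout


def make_dummy_repo(repo: Path) -> None:
    """Create feature, docs and cleanup commits for clustering tests."""
    git(repo, "init", "--quiet")
    steps = [
        ("app.py", "print('hello')\n", "feat: add hello-world entry point"),
        ("README.md", "# Demo\n", "docs: add README"),
        ("notes.txt", "scratch notes\n", "chore: add scratch notes"),
    ]
    for name, content, message in steps:
        (repo / name).write_text(content)
        git(repo, "add", name)
        git(repo, "commit", "--quiet", "-m", message)
    # A one-off cleanup commit, like the "removed notes file" example.
    git(repo, "rm", "--quiet", "notes.txt")
    git(repo, "commit", "--quiet", "-m", "chore: remove notes file")


def ingest_commits(repo: Path) -> list[dict]:
    """Return one record per commit (newest first): sha, subject, truncated diff."""
    records = []
    log = git(repo, "log", "--format=%H%x09%s")  # tab-separated hash and subject
    for line in log.splitlines():
        sha, subject = line.split("\t", 1)
        diff = git(repo, "show", "--format=", "--patch", sha)  # diff only, no header
        records.append({"sha": sha, "message": subject, "diff": diff[:MAX_DIFF_CHARS]})
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        make_dummy_repo(repo)
        for record in ingest_commits(repo):
            print(record["message"], len(record["diff"]))
```

Passing the identity and `commit.gpgsign=false` with `-c` on every call keeps the fixture independent of the developer's global Git configuration, so it behaves the same on a laptop and in CI.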
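To make the cluster-guided routing described under The Solution concrete, here is a minimal sketch of the embed, cluster, and route steps. It is not the project's implementation: to stay dependency-free it swaps sentence-transformers for a toy bag-of-words embedder (the hypothetical `ToyEmbedder`) and uses a small hand-rolled K-Means (Lloyd's algorithm with deterministic farthest-point seeding). `k=3` follows the "Always use 3 clusters" constraint; the commit messages are made up, and clusters come out as bare indices, so a name such as "Maintenance" would have to be assigned separately.

```python
"""Stdlib-only sketch of cluster-guided retrieval over commit messages.

Assumptions (not from the original project): a toy bag-of-words embedding
stands in for sentence-transformers, and K-Means is a hand-rolled Lloyd's
loop with deterministic farthest-point seeding.
"""
import math
import re


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def normalize(vec: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Both inputs are normalized first, so the dot product is the cosine.
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


class ToyEmbedder:
    """Hypothetical stand-in for a sentence-transformers model."""

    def __init__(self, corpus: list[str]):
        self.vocab = sorted({tok for doc in corpus for tok in tokenize(doc)})
        self.index = {tok: i for i, tok in enumerate(self.vocab)}

    def embed(self, text: str) -> list[float]:
        vec = [0.0] * len(self.vocab)
        for tok in tokenize(text):
            if tok in self.index:  # out-of-vocabulary words are ignored
                vec[self.index[tok]] += 1.0
        return normalize(vec)


def kmeans(vectors: list[list[float]], k: int, iterations: int = 20):
    """Lloyd's algorithm; returns (centroids, labels)."""
    # Farthest-point seeding: start from the first vector, then repeatedly
    # add the vector least similar to every centroid chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if not members:  # keep an empty cluster's previous centroid
                new_centroids.append(centroids[j])
                continue
            new_centroids.append([sum(col) / len(members) for col in zip(*members)])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, labels


def route_and_retrieve(query, commits, embedder, centroids, labels, top_k=1):
    """Pick the nearest cluster, then rank only that cluster's commits."""
    q = embedder.embed(query)
    cluster = max(range(len(centroids)), key=lambda j: cosine(q, centroids[j]))
    candidates = [c for c, lab in zip(commits, labels) if lab == cluster]
    ranked = sorted(candidates, key=lambda c: cosine(q, embedder.embed(c)), reverse=True)
    return cluster, ranked[:top_k]


commits = [
    "feat: add login form",
    "feat: add signup form",
    "docs: update README install steps",
    "docs: fix README typo",
    "chore: remove notes file",
    "chore: remove unused scratch file",
]
embedder = ToyEmbedder(commits)
vectors = [embedder.embed(c) for c in commits]
centroids, labels = kmeans(vectors, k=3)
cluster, hits = route_and_retrieve(
    "Why did we remove the notes file?", commits, embedder, centroids, labels
)
print(cluster, hits)  # prints 2 ['chore: remove notes file']
```

In a real pipeline, `ToyEmbedder.embed` would be replaced by a sentence-transformers model (for example `SentenceTransformer(...).encode(texts)`, with whichever model the project actually uses), and a library K-Means such as scikit-learn's `KMeans` would normally replace the hand-rolled loop. The point the sketch illustrates is the two-stage lookup: the query is first matched to the nearest centroid, and only that cluster's commits are ranked, so a feature commit that merely shares a keyword never competes with the cleanup commits.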