Tools: RFC: AI agent for validating MRs against acceptance criteria - does this solve your problem?

Source: Dev.to

## Request for Comments: Meridian

I'm building an AI-powered code review agent for the GitLab AI Hackathon and would love feedback from practicing engineers.

## The Hypothesis

Problem 1: MRs get merged without fully implementing acceptance criteria, causing requirement drift and rework.

Problem 2: Developers change code without understanding historical design constraints, causing regressions.

Cost: Estimated 20-30% of merged code needs follow-up work (based on anecdotal observation).

## The Proposed Solution

An autonomous agent that:

## Acceptance Criteria Validation

```
Issue #123:
  criteria:
    - Export to CSV ✓
    - Export to JSON ✗
    - Include all fields ✗

MR Analysis:
  implemented: 1/3 criteria
  action: Block merge
  recommendation: Complete remaining criteria or update issue scope
```

## Historical Context Surfacing

```
File: auth_flow.py
Lines changed: 45-67

Historical Context:
  original_mr: #89 (8 months ago)
  design_decision: "SSO requires token refresh every 30s"
  edge_case: "Enterprise customers need persistent sessions"
  warning: "Your changes remove refresh logic. SSO may break."
```

## Technical Approach

- LLM: Anthropic Claude 3.5 Sonnet (semantic understanding)
- Platform: GitLab Duo Agent Platform
- Architecture: Event-driven (webhooks → async analysis → automated comments)
- Stack: Python, FastAPI, PostgreSQL, Redis

## Questions for You

## 1. Problem Validation

Does this problem exist in your team?

- [ ] Yes, constantly
- [ ] Yes, occasionally
- [ ] No, not a problem

## 2. Solution Validation

Would automated blocking help or create friction?

- MR implements 3/5 criteria → Agent blocks merge
- Dev changes old code → Agent warns about design constraint
- Both scenarios happen

- [ ] This would save us hours
- [ ] This would be annoying
- [ ] Depends on accuracy

## 3. Workflow Fit

Do you document acceptance criteria in a parseable format?

- [ ] Yes (checkboxes, bullet points in issues)
- [ ] Partially (sometimes)
- [ ] No (verbal/Slack/tribal knowledge)

## 4. Alternative Solutions

- PR templates with checklists?
- Manual gating process?
- Code ownership + tribal knowledge?

## 5. False Positive Tolerance

How accurate would this need to be?

- 50% accurate → Would you use it?
- 70% accurate → Would you use it?
- 90% accurate → Would you use it?
- 100% accurate or nothing?

## Why This Matters

Building for GitLab AI Hackathon (45-day timeline). Targeting $10K prize, but more importantly:

- Learning distributed systems
- Leveling up engineering practices
- Building something people actually want

I'd rather pivot now than build something useless.

## How to Give Feedback

- Your role (engineer/lead/manager)
- Answers to questions above
- Any other thoughts

Thanks for your time! 🙏
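To make the acceptance criteria check from the Issue #123 example more concrete, here is a minimal sketch. It assumes criteria are written as Markdown task-list checkboxes in the issue description. The names `parse_criteria`, `evaluate`, and `Verdict` are hypothetical, and the set of implemented criteria is supplied by hand here; in the proposed agent that judgment would come from the LLM step.

```python
import re
from dataclasses import dataclass

# Matches task-list items such as "- [ ] Export to CSV" or "* [x] Done".
CHECKBOX = re.compile(r"^\s*[-*]\s*\[( |x|X)\]\s+(.+?)\s*$")


@dataclass
class Verdict:
    implemented: int
    total: int
    missing: list[str]
    action: str


def parse_criteria(issue_description: str) -> list[str]:
    """Return the text of every task-list item in an issue description."""
    criteria = []
    for line in issue_description.splitlines():
        match = CHECKBOX.match(line)
        if match:
            criteria.append(match.group(2))
    return criteria


def evaluate(criteria: list[str], implemented: set[str]) -> Verdict:
    """Compare the criteria with the set judged implemented (assumed input)."""
    missing = [c for c in criteria if c not in implemented]
    action = "Block merge" if missing else "Allow merge"
    return Verdict(len(criteria) - len(missing), len(criteria), missing, action)


# Hypothetical issue body mirroring the Issue #123 example.
issue = """\
Export feature

- [ ] Export to CSV
- [ ] Export to JSON
- [ ] Include all fields
"""

criteria = parse_criteria(issue)
verdict = evaluate(criteria, implemented={"Export to CSV"})
print(f"implemented: {verdict.implemented}/{verdict.total} criteria")  # implemented: 1/3 criteria
print(f"action: {verdict.action}")  # action: Block merge
```

Parsing is the easy part. The open question from "False Positive Tolerance" is how reliably the `implemented` set can be produced, which this sketch deliberately leaves out.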
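The event-driven flow listed under "Technical Approach" (webhooks → async analysis → automated comments) could look roughly like the sketch below. It uses only the standard library. An in-memory `asyncio.Queue` stands in for whatever queue the real system would use, `analyze` is a placeholder for the LLM call, and the `mr_iid` event field is a hypothetical name rather than a documented payload key.

```python
import asyncio


async def analyze(event: dict) -> str:
    """Stand-in for the slow analysis step; returns the comment text to post."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"MR !{event['mr_iid']}: analysis complete"


async def worker(queue: asyncio.Queue, posted: list[str]) -> None:
    """Consume queued webhook events one at a time and record a comment for each."""
    while True:
        event = await queue.get()
        try:
            posted.append(await analyze(event))
        finally:
            queue.task_done()


async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    posted: list[str] = []
    task = asyncio.create_task(worker(queue, posted))

    # A webhook handler would only enqueue the event and return immediately.
    for iid in (101, 102):
        queue.put_nowait({"mr_iid": iid})

    await queue.join()  # wait until every queued event has been processed
    task.cancel()       # stop the now-idle worker
    await asyncio.gather(task, return_exceptions=True)
    return posted


comments = asyncio.run(main())
print(comments)
```

Keeping the handler to an enqueue-and-return step means a slow analysis never holds up the webhook response. That is the usual reason for putting a queue between the two stages.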