# Building a Prompt Engineering Feedback Loop: The System That Made My AI Prompts 3x More Effective
2026-02-14
Most developers treat prompt engineering like a one-time skill. You read a guide, learn a few tricks, then wing it from there. That is how I started too. It did not work.

I run an AI automation agency. I use the Claude API and Claude Code daily for production systems, everything from generating content at scale to building full-stack features. When your prompts power revenue-generating infrastructure, "good enough" prompts cost real money in wasted tokens, bad outputs, and manual rework.

So I built a feedback loop. After three months of running it, I have 9 reusable prompt templates, 6 saved examples I reference constantly, and a documented list of anti-patterns that would have kept burning me. Here is the system.

## The Rating Schema

After every meaningful AI session, I spend 60 seconds recording a rating. The key is making this fast enough that you actually do it.

What gets recorded per session: the date, a 1-5 score, the task, the prompt type, the model, what worked, what failed, one key insight, and whether a template was updated.

This takes less than a minute. The discipline is not in the writing. It is in doing it every single time, especially when the session goes well. You learn as much from a 5 as you do from a 1.

## The Review Cadence

**Weekly (15 minutes):** Scan the last 7 entries. Look for two things: techniques that keep producing 4s and 5s, and failure modes that keep producing 1s and 2s. If you see the same insight three times, it is a pattern. Write it down.

**Monthly (30 minutes):** Update your templates.
Patterns that proved out across multiple weeks get added to your prompt templates. Repeated failures get added to the anti-patterns list. Delete anything that stopped being useful.

## Anti-Patterns: Prompts That Consistently Fail

These are specific patterns I tracked across dozens of sessions. Each one seemed reasonable but produced reliably poor results.

## 1. The Vague Scope Dump

This fails because "all the features" is unbounded. The model guesses at scope and inevitably picks the wrong features or implements them at the wrong depth. You get a sprawling mess that matches nothing you actually needed.

## 2. The Over-Constrained Micromanager

This looks thorough, but it produces brittle, literal-minded code. The model follows your spec so precisely that it misses obvious improvements. Worse, you spend more time writing the prompt than you would writing the code.

## 3. The Missing Context Assumption

This assumes the model knows your pagination style, your ORM, your response format, your frontend expectations. It does not. You get generic offset/limit pagination when you needed cursor-based, or a completely different response envelope than your other endpoints use.

## 4. The "Be Creative" Trap

"Creative" is not a technical requirement. This produces over-engineered novelty code, often using obscure patterns the model is less reliable at implementing. Your caching layer does not need to be creative. It needs to work.

## Proven Patterns: What Consistently Scores 4-5

## 1. Reference-Based Generation

This works because you are giving the model a concrete target instead of an abstract description. It matches style, structure, and conventions without you having to enumerate every rule.

## 2. Constraint Sandwich (Context, Task, Boundaries)

Context sets the environment. Task defines the deliverable. Boundaries prevent scope creep. This three-part structure consistently produces focused, usable output.

## 3. Iterative Refinement With Explicit Feedback

Instead of one mega-prompt, break the work into steps and give explicit feedback at each one. This consistently outperforms trying to get everything right in a single shot. Each step is small enough that the model gets it right, and your corrections compound.

## 4. Anti-Pattern Fencing

Explicitly stating what you do not want is as valuable as stating what you do.
This eliminates the most common failure mode: the model "helping" by doing more than you asked.

## Prompt Template: v1 vs v3

Here is a real template from my system. The v1 version is what I started with. The v3 version is what three months of feedback produced.

## File Structure for Storing the System

Everything is plain Markdown. No special tooling. I keep it in a git repo so template changes are tracked, but a folder on your desktop works fine as a starting point. The format matters less than the habit.

## Applying This to Production API Calls

The feedback loop becomes even more valuable when you are making programmatic API calls. In a chat session, you can course-correct in real time. In production, a bad prompt runs hundreds of times before you notice. Here is how I apply the system to API calls in Python.

The same feedback loop applies in production. You review the logs weekly, identify which prompts produce the most rework, and update those templates first. The difference is that a 10% improvement in a production prompt template saves hundreds of manual corrections per month.

## Getting Started

You do not need to build all of this on day one. Start with the rating habit. After every AI session, spend 60 seconds recording a score and one key insight. Do that for two weeks.
The patterns will be obvious, and you will naturally start building templates from what works.

The compound effect is real. Three months in, I spend less time prompting and get better results than I did when I started. Not because I memorized tricks, but because I built a system that learns from every session.

I'm Parker Gawne, founder of Syntora. We build custom Python infrastructure for small and mid-size businesses. [syntora.io](https://syntora.io)
A sample entry from my ratings log:

```markdown
## 2026-02-10 | Score: 4 | Task: API endpoint generation

**Prompt type:** Code generation (Express handler)
**Model:** claude-sonnet-4-20250514
**What worked:** Giving it the existing handler pattern as a reference. Output matched project conventions perfectly.
**What failed:** Did not include error handling until I asked.
**Key insight:** Always include "follow the error handling pattern from [reference]" in code gen prompts.
**Template updated:** Yes, added to code-gen-v3.md
```
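The weekly scan over entries in this format can be automated. A minimal sketch, assuming every entry starts with a `## date | Score: N | Task: ...` header line like the one above (`weekly_scan` is my own helper name, not part of any library):

```python
import re
from pathlib import Path

def weekly_scan(log_text: str) -> dict:
    """Tally scores from a ratings log to surface candidate patterns.

    Assumes entry headers like:
    ## 2026-02-10 | Score: 4 | Task: API endpoint generation
    """
    scores = [int(m) for m in re.findall(r"\| Score: (\d) \|", log_text)]
    return {
        "entries": len(scores),
        "wins": sum(1 for s in scores if s >= 4),      # candidate patterns
        "failures": sum(1 for s in scores if s <= 2),  # candidate anti-patterns
    }

# Example usage against a monthly log file:
# summary = weekly_scan(Path("ratings/2026-02.md").read_text())
```

This does not replace reading the entries; it just tells you where to look first.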
```
Bad: "Build me a user authentication system with all the
features a modern app would need."

Score: 1-2, every time.
```
```
Bad: "Write a Python function that takes exactly two arguments,
the first being a string of length 1-255, validates it using
regex pattern ^[a-zA-Z0-9_]+$, raises ValueError with message
'Invalid input: {input}' if validation fails, logs to stdout
using print() not logging, returns a dict with keys 'status'
and 'result'..."
```
```
Bad: "Add pagination to the list endpoint."
```
```
Bad: "Write me a really creative and unique solution for
caching API responses."
```
```
Proven: "Here is an existing handler that follows our project
conventions: [paste 30-50 lines of a real handler]

Write a new handler for the /users/:id/preferences endpoint
that follows the same patterns for error handling, response
format, and input validation."
```
```
Proven: "Context: Python FastAPI service, SQLAlchemy ORM,
Pydantic v2 for schemas. We use repository pattern for
data access.

Task: Write a new endpoint POST /api/v1/reports that accepts
a date range and report type, queries the database, and
returns aggregated results.

Boundaries: Do not add any new dependencies. Use existing
database models. Keep the endpoint under 40 lines. Return
errors as HTTPException with appropriate status codes."
```
```
Step 1: "Write the Pydantic schemas for a report request
and response."

[Review output, then:]

Step 2: "Good. The request schema is right. For the response,
change 'data' to 'rows' and add a 'generated_at' timestamp
field. Now write the repository method that queries the
database using these schemas."
```
```
Proven: "Write a database migration to add a 'status' column
to the orders table.

Do NOT: create a new table, modify existing columns, add
indexes (we will do that separately), or include seed data."
```
**v1 (Starting Point):**

```markdown
# Code Generation Prompt

Write [description of what I need] in [language].
Make it production-ready.
```
**v3 (After Feedback Loop):**

```markdown
# Code Generation Prompt v3

## Reference
[Paste 1 existing file that follows project conventions]

## Context
- Language/framework: [e.g., Python 3.12, FastAPI]
- ORM/DB: [e.g., SQLAlchemy 2.0, PostgreSQL]
- Project patterns: [e.g., repository pattern, dependency injection]

## Task
[One clear deliverable, 1-3 sentences max]

## Boundaries
- Do not add new dependencies
- Do not modify existing files unless specified
- Match the error handling pattern from the reference
- [Any other project-specific constraints]

## Output format
- Single file, ready to save
- Include type hints
- Include docstring with one-line description
```
```
prompt-engineering/
  ratings/
    2026-01.md            # Monthly rating logs
    2026-02.md
  templates/
    code-gen-v3.md        # Code generation (current)
    code-review-v2.md     # Code review checklist
    api-design-v1.md      # API endpoint design
    content-gen-v4.md     # Blog/marketing content
    migration-v2.md       # Database migrations
    bug-fix-v3.md         # Debugging assistance
    refactor-v2.md        # Code refactoring
    test-gen-v2.md        # Test writing
    data-transform-v1.md  # Data pipeline scripts
  examples/
    great-code-gen.md     # Scored 5, reference prompt+output
    great-refactor.md     # Scored 5, complex refactor
    great-migration.md    # Scored 5, zero-downtime migration
    failed-vague.md       # Scored 1, lesson in specificity
    failed-scope.md       # Scored 1, scope explosion
    failed-creative.md    # Scored 2, over-engineered result
  anti-patterns.md        # Documented failure modes
  patterns.md             # Documented success patterns
  CHANGELOG.md            # Template version history
```
```python
import anthropic
import json

client = anthropic.Anthropic()

# Template loaded from your templates/ directory
PROMPT_TEMPLATE = """
## Reference
{reference_example}

## Context
Service: Content generation pipeline
Output format: JSON with keys: title, body, meta_description
Word count target: {word_count}

## Task
{task_description}

## Boundaries
- Output valid JSON only, no markdown wrapping
- Do not include placeholder text
- meta_description must be under 160 characters
"""

def generate_content(task: str, reference: str, word_count: int) -> dict:
    prompt = PROMPT_TEMPLATE.format(
        reference_example=reference,
        task_description=task,
        word_count=word_count,
    )
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    # Parse the JSON the template asked for, matching the -> dict signature
    return json.loads(response.content[0].text)
```
What changed between v1 and v3:

- Added a Reference section. This was the single biggest improvement. Giving the model a real example to match cut my edit time by half.
- Split "make it production-ready" into explicit Boundaries. "Production-ready" means different things to every developer. Spelling out the constraints removed ambiguity.
- Added Output format. Specifying "single file, ready to save" eliminated the model's tendency to split code across multiple files or add explanatory text I had to strip out.
- Removed vague qualifiers. No more "make it clean" or "make it good." Every instruction is specific and verifiable.

Key differences for production API prompts:

- Version your templates. Store them as files, not inline strings. When you update a template, you can diff it against the previous version and measure whether outputs improved.
- Log inputs and outputs. Every API call should log the prompt version, the input variables, and a quality score (automated or manual). This is your ratings/ data at scale.
- A/B test template changes. When you update from v2 to v3, run both versions on the same inputs for a week. Compare output quality before fully switching.
- Set up automated quality checks. For structured outputs, validate the schema. For content, check word count, reading level, and keyword presence. These automated scores supplement your manual ratings.
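An automated check for the content pipeline above can be this small. The field names match the earlier template's JSON contract; the word-count threshold is illustrative:

```python
import json

def check_content(raw: str, min_words: int = 300) -> list[str]:
    """Return a list of quality problems; an empty list means the output passed."""
    problems = []
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    # Schema check: the template promises these three keys
    for key in ("title", "body", "meta_description"):
        if key not in data:
            problems.append(f"missing key: {key}")
    # Boundary checks from the template
    if len(data.get("meta_description", "")) > 160:
        problems.append("meta_description over 160 characters")
    if len(data.get("body", "").split()) < min_words:
        problems.append(f"body under {min_words} words")
    return problems
```

Scores derived from checks like this feed the same weekly review as the manual ratings.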