# Tools: The Day the AI Took My Requirements Literally
2026-02-20
*The AI built exactly what was asked for. That was the problem.*

## Garbage In. AI Out. Repeat. Or: How We Professionally Automated Our Own Confusion

A new project was starting. The proposal described a distributed, multi-domain system responsible for processing financial transactions, aggregating operational metrics, and exposing analytical insights to multiple internal stakeholders. There were compliance considerations. There were integration points with legacy services. There were performance expectations. There was the word “real-time” in bold.

The requirements included:

- An event-driven real-time streaming pipeline.
- CQRS with event sourcing for future scalability.
- A service mesh for observability and traffic control.

It sounded serious. It sounded modern. It sounded like something you would proudly present in a boardroom.
The company had recently rolled out its new AI-enabled development infrastructure. Internal agents. Architecture generators. Prompt templates. “Accelerated delivery pipelines.” The team fed the requirements into it.
The AI did not hesitate. An architecture emerged:

- Message brokers coordinating streams of domain events.
- Dedicated command and query models backed by separate data stores.
- Event sourcing to maintain an immutable audit trail.
- Sidecars injected for traffic control.
- mTLS between services.
- Retries. Circuit breakers. Distributed tracing.

It was the best system. So robust, so scalable. The best enterprise application.
And then a preliminary AWS estimate arrived two days later. Projected monthly infrastructure cost: $42,300. That included managed streaming clusters, multi-AZ databases for command and query models, service mesh overhead, observability tooling, and three separate environments. But there were no hallucinations. No obvious mistakes.
Just a perfectly coherent interpretation of the requirements. The AI built exactly what was asked for. That was the problem. Let’s rewind.
Before the architecture. Before the AWS estimate. Before the “enterprise-grade” diagram. There was a stakeholder.

The stakeholder wanted to build an internal operations platform to monitor transaction processing and generate insights for management. Reasonable goal. They opened ChatGPT. They typed something like:
> What architecture should I use to build a scalable, real-time financial monitoring platform that might grow in the future?

ChatGPT did what it does best. It delivered ambition. The model mentioned event-driven architecture, streaming pipelines, and service meshes for observability and control! “Service mesh,” the stakeholder thought.
That sounds important. They typed a follow-up question:
> How do we make sure the system scales well if usage grows?

ChatGPT responded confidently:
> Modern systems often adopt patterns such as CQRS and event sourcing to separate concerns, improve scalability, and support future growth.

CQRS. The stakeholder had heard that word before. Someone from engineering had mentioned it in a meeting once.
It sounded serious, modern and safe. So they asked:
> Would CQRS make our platform more future-proof?

ChatGPT, still helpful:
> Yes, CQRS is commonly used in systems that anticipate growth and evolving requirements.

Future-proof. There it was again. And it sounded very reassuring.

So the stakeholder wrote a very serious proposal. The requirements were vague. The ambition was abstract. The context was thin.

Garbage in. → AI out.
Then that proposal was fed into another AI-powered system. And it produced a perfectly consistent, technically correct, financially enthusiastic architecture.

Garbage in. → AI out. → Repeat.

By the time it reached the architecture review, the proposal sounded heavy. It looked substantial.

- It contained “real-time streaming architecture.”
- It contained “CQRS with event sourcing.”
- It contained “service mesh for resiliency and governance.”
What no one had written clearly was this:

- There were 25 users. At most.
- “Real-time” meant “an update every hour or so.”
- The “legacy integration” was a REST API. Poorly documented, yes... but still a REST API.

The expected traffic curve could comfortably fit inside a single moderately sized instance without breaking a sweat. The vague ambition travelled further than the concrete constraints. And the AI did exactly what it was asked to do. It optimised for the bold words, for growth, for the future. It was not, however, optimised for reality.

Of course, this is a slightly exaggerated example. But only slightly. I have seen real systems where the architecture slides were more complex than any business case.
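The gap between the ambition and the arithmetic is easy to check. A back-of-envelope sketch using the story’s numbers (25 users, one update per hour); everything here is illustrative, not a real capacity plan:

```python
# Back-of-envelope load check. Illustrative numbers from the story above.
users = 25
updates_per_user_per_hour = 1   # what "real-time" actually meant

requests_per_day = users * updates_per_user_per_hour * 24
requests_per_second = requests_per_day / (24 * 60 * 60)

print(requests_per_day)               # 600 requests a day
print(round(requests_per_second, 4))  # 0.0069 req/s -- a rounding error for one small instance
```

Five minutes of multiplication would have undercut the $42,300 estimate before the first diagram was drawn.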
And to be fair, the opposite happens too. Sometimes the requirement arrives as: “It’s just adding an if to the tool we already have.”

Just an if. Behind that “if”:

- 18,000 lines of legal compliance requirements.
- Regional regulatory variations.
- Audit trail obligations.
- Data retention constraints.
- Twenty bespoke hardware integrations with devices that were configured in 2014 by someone who no longer works here.

But the proposal says: “It’s just adding an if.” The AI will happily believe that too.
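To make the distance concrete, here is a hedged sketch of “just an if” versus a hint of what the hidden requirements imply. All names, fields, and thresholds are invented for illustration:

```python
# What the proposal describes -- genuinely just an if (illustrative).
def check_naive(tx):
    if tx["amount"] > 10_000:
        return "flag"
    return "ok"

# A hint of what the hidden requirements actually imply (still wildly
# simplified; names and thresholds are invented):
def check_real(tx, region_rules, audit_log):
    # Regional regulatory variations: each region has its own rule set.
    rule = region_rules[tx["region"]]
    decision = "flag" if tx["amount"] > rule["threshold"] else "ok"
    # Audit trail obligation: every decision must be recorded.
    audit_log.append((tx["id"], tx["region"], decision))
    # Not shown: retention constraints, 18,000 lines of compliance text,
    # and twenty bespoke hardware integrations from 2014.
    return decision
```

The first function is what the requirement sounds like; the second is still a cartoon of what it means.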
It will produce a neat solution for a simple conditional branch. Again, this is slightly exaggerated. But (again) only slightly. I have genuinely heard someone say, “It’s just an if,”
for a problem that required eight scalable AWS services, ingestion from thousands of devices, and near real-time reporting. It was not, in fact, just an if.

Garbage in → garbage out has always been true. Confusing requirements did not start with AI. The difference now is speed and confidence.

We used to miscommunicate slowly. A developer would question it. A meeting would happen. Someone would sigh. Clarifications would emerge. Now we can take vague ambition, feed it into a model, generate a polished architecture, feed that into another system, and deploy it way before anyone asks whether the original requirement made sense.

Garbage in → AI out → Feed that output back in → Repeat.
Confusion used to stay human-sized. Now it scales. Which brings us to the uncomfortable part.

## Prompt Engineering Is Just (Mis)Communication With Better Marketing

We have been trying to fix human miscommunication for years.
User stories. BDD. Acceptance criteria. Refinement meetings. Specification templates. Diagrams.
Workshops about workshops. Entire methodologies exist because humans are terrible at saying what they want. And if anything has been proven, it’s that we remain terrible at communicating and writing.

Now we have “prompt engineering” — which sounds very fancy and technical — but is essentially communicating by text with a very confident, opinionated rubber duck. That, by itself, will solve nothing.

We say “scalable” and mean “won’t crash.”
We say “real-time” and mean “doesn’t feel slow.”
We say “future-proof” and mean “I don’t want to revisit this.” We assume everyone shares the same mental model. And they don't.
AI does not fix this. It removes the human buffer. There is also the small detail of language. The same prompt written in English, Spanish, or German will not always produce the same output. Nuance shifts. Assumptions shift. Tone shifts.
If “lost in translation” is a problem between humans, it does not disappear with a probabilistic model. It just becomes statistically interesting. Ambiguity is often resolved by questioning. A lot of questioning.
The human (ideally) will ask:

- “Do we actually need real-time?”
- “Is this a CQRS problem or a CRUD problem?”
- “Do we have enough services to justify a service mesh?”
- “What’s the traffic?”
- “What’s the budget?”

Yes. Never trust the requirements on their own. Requirements are optimistic by design.

AI, however, trusts the requirements. If you write:

> Let’s build an event-driven real-time streaming pipeline.

It builds one. If you write:
> Let’s use CQRS with event sourcing to future-proof.

It splits your system in two and prepares you for scale you may never reach. If you write:
> We should introduce a service mesh for observability and control.

It configures sidecars, mTLS, traffic policies, retries. The AI will trust your ambition and will not even ask if you have the budget to match it.

Stakeholders sometimes bring big technical words. They do not always bring matching “big” money.

Let’s be clear: I am very glad AI does not fight back.
I am not prepared to argue with the AI overlords — or with a chatbot that has developed an ego.
But someone should. (Fight back, I mean.) Because AI does not measure necessity. It measures alignment.

It will scale as if the users exist.
It will future-proof as if the roadmap exists.
It will architect as if the money exists.

Prompt engineering is not magic. It is structured communication without interruption. And if we were imprecise before, we are now imprecise with acceleration.

Let’s un-exaggerate this for a second and gently deflate the enterprise dreams: all the previous examples were somewhat optimistic about the capabilities of AI. I have yet to see an AI model or agent that can take a full-blown PDF specification, translate it into a complete system, preserve every nuance, and not quietly drop 15% of the context somewhere between page 12 and Annex C.

I have tried something simpler:
> Here are two versions of a specification as PDFs. Tell me the differences.

What I usually get back is enthusiasm and vibes, instead of a traceable list of changes. A few of the vibes are sometimes correct. Which means I cannot trust it blindly. Which means I have to read the specification myself anyway — to confirm that the AI didn’t miss half of it or hallucinate a requirement that never existed.
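When I need an actually traceable list of changes, I fall back on boring determinism. A minimal sketch with Python’s `difflib` — assuming the text has already been extracted from the PDFs (the extraction step is its own battle, and the spec lines below are invented):

```python
import difflib

# Two spec versions as plain text (illustrative content; in reality this
# would come out of a PDF-to-text extraction step).
spec_v1 = ["Payments are batched hourly.", "Data retention: 5 years."]
spec_v2 = ["Payments are batched hourly.", "Data retention: 7 years.", "Audit log required."]

# A deterministic, line-by-line diff: no vibes, every change traceable.
diff = list(difflib.unified_diff(spec_v1, spec_v2,
                                 fromfile="spec_v1", tofile="spec_v2",
                                 lineterm=""))
print("\n".join(diff))
```

Dull, yes. But every `+` and `-` line is a change I can point at in a review, which is exactly what the enthusiastic summary was not.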
Which means… I still have to do the work. Now let’s imagine a better-case scenario.
Let’s imagine the AI does understand the specification perfectly and generates all the required code. I still need to review it. Line by line; behaviour by behaviour; edge case by edge case. If something doesn’t match expectations, I now have two options:

- tune the prompts, refine the context, restructure the input, clarify assumptions… and try again
- or just redo it myself — or delegate it to someone who will (we still might use AI to help in smaller increments).

And there is another subtle problem. AI depends heavily on familiarity. Imagine you are working with Java 25. If most publicly available examples, blog posts, and Stack Overflow discussions revolve around Java 8, then statistically speaking, that is the world the model understands best.
You can explicitly ask for Java 25. It will try. But models gravitate toward what they have seen most often. So now your review job expands again.

And even in the magical scenario where the AI produces flawless, syntactically perfect, technically coherent, version-adherent code — there is still the one uncomfortable truth that initiated this whole article: If the human requirements were vague, confused, or contradictory…
the output will be vague, confused, and contradictory — just very efficiently so. And no model can compensate for missing intent.

Now, before this turns into “AI is useless,” let me be clear: I am not against AI. Quite the opposite. AI is extremely useful when it is positioned correctly.

You probably shouldn’t let it write your documentation from scratch. But you absolutely can ask it to:

- tighten your writing,
- improve clarity,
- suggest structure,
- highlight ambiguities,
- spot inconsistencies.

It will save you hours of re-reading your own text and wondering why that paragraph “feels off.”
You probably shouldn’t let it replace code reviews entirely. But you can use tools that:

- flag suspicious logic,
- detect edge cases,
- suggest refactors,
- answer follow-up questions in pull request threads,
- tell you that, according to the version of the tool that you are using, there are simpler ways to do something.

Worst case?
You dismiss the comment — just like you would if a colleague misunderstood the context.

Examples of other areas where it shines, if expectations are realistic:

- Suggesting boundary conditions, or pointing out “Have you considered null here?” moments. You still decide what matters. It just saves typing and forgetfulness.
- Generating repetitive scaffolding: templates, boilerplate, basic CRUD layers, migration scripts. The kind of code that is necessary but not intellectually exciting. You still review it. You still adapt it. But you type less.

In all these cases, AI is not replacing the human. It is assisting the human.
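A toy example of the “Have you considered null here?” moment — the code and the reviewer’s comment are both invented, but representative of the nudges these tools give:

```python
# The function as written. An AI review tool might (correctly) ask:
# "Have you considered an empty list here?"
def average_latency(samples):
    return sum(samples) / len(samples)  # ZeroDivisionError on []

# The human-decided fix after the nudge: empty input is a real case,
# and *we* choose what it means (here: no data, no average).
def average_latency_safe(samples):
    if not samples:
        return None
    return sum(samples) / len(samples)
```

The tool spots the hole; the human decides whether the answer is `None`, an exception, or “that input can never happen.”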
That’s the difference.

## Human in the Loop, Not AI in the Loop

When you put AI in the loop, you risk removing accountability and context. AI will happily suggest the architecture, but it will not sit in the budget review defending it. And a person walking into that meeting and saying: “Well… the AI model told me to.”
…is not exactly a career-enhancing move. When you keep humans in the loop, AI becomes leverage.
Every system has loops.
Requirements go in. → Architectures come out. → Costs follow. AI can sit inside that loop. But it cannot, or at least should not, own the loop. AI should not be a replacement for thinking.
If you let that happen, confusion will scale better than your system ever will. And the loop becomes something else entirely:

Garbage in. → AI out. → Repeat.