We are four sprints into a large Oracle Forms modernization engagement. Three hundred forms to convert into a modern Angular web stack. We have BA agents parsing legacy XML and generating business requirement documents (BRDs), Dev agents converting Oracle Forms code into an Angular UI and a Node.js backend and producing sixty to seventy percent of the code on the first pass, and QA agents generating test cases and test scripts from the same BRDs and user stories.
On paper, this looks like exactly what AI-assisted SDLC is supposed to look like.
And yet, when I looked at the bug counts coming out of the first twenty-four completed forms, around twenty bugs per form, roughly five hundred bugs in total, something felt off. Not just the number but the nature of the problem underneath it.
The bugs told a story the team hadn’t read yet
My instinct was this: if an agent is generating sixty to seventy percent of the code using a coherent set of instructions applied consistently across all three hundred forms, then the bugs it introduces should also be coherent.
Not random. Not form-specific. Systemic.
So, I asked for all the defect descriptions across the twenty-four forms in a single spreadsheet and ran a meta-analysis using AI to categorize the bugs systemically, looking for patterns across the full dataset rather than within each form.
The result was striking. Around three hundred of the five hundred bugs, three in every five, fell into just seven systemic categories. These weren’t random implementation errors. They were the same gaps, repeating across form after form, because they traced back to the same root cause: specific gaps in the instructions given to the dev agent.
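A cross-form grouping of this kind can be sketched in a few lines. This is a minimal illustration, not the analysis that was actually run: the form IDs, category labels, and the `min_forms` threshold are all hypothetical, and it assumes each defect has already been given a category label.

```python
from collections import Counter

# Hypothetical defect log: (form_id, category) pairs. In practice the category
# would come from an AI-assisted or manual labeling pass over the spreadsheet.
defects = [
    ("F001", "date-format validation"),
    ("F001", "missing LOV default"),
    ("F001", "label typo"),
    ("F002", "date-format validation"),
    ("F002", "missing LOV default"),
    ("F003", "date-format validation"),
    ("F003", "tab order"),
    ("F003", "missing LOV default"),
]

def systemic_categories(defects, min_forms=2):
    """Return categories seen in at least `min_forms` distinct forms, with bug counts."""
    bug_counts = Counter(category for _, category in defects)
    forms_per_category = {}
    for form_id, category in defects:
        forms_per_category.setdefault(category, set()).add(form_id)
    return {
        category: bug_counts[category]
        for category, forms in forms_per_category.items()
        if len(forms) >= min_forms
    }

systemic = systemic_categories(defects)
share = sum(systemic.values()) / len(defects)
print(systemic)
print(f"systemic share: {share:.0%}")
```

The point of counting distinct forms rather than raw bugs is that a category only qualifies as systemic when it recurs across forms, which is what separates an instruction gap from a one-off error.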
The wrong process for a new paradigm
Here is what the team was doing: the QA team raised a bug list per form, handed it to a developer who fixed those bugs for that form and moved on to the next form.
In a traditional SDLC, this makes complete sense. Different developers write different forms. Each developer interprets user stories slightly differently. The bugs are largely uncorrelated. Fixing them form by form is the right unit of work.
But that logic breaks down entirely in an agent-assisted process.
When the same agent operating on the same instructions generates code for hundreds of forms, the bugs it introduces are not independent. They are correlated by design. Fixing them form by form is the equivalent of treating symptoms one patient at a time while ignoring the virus that is infecting all of them.
The right intervention is upstream, not downstream.
Identify the gaps in the instructions. Fix those gaps. Regenerate. Validate. Repeat.
The meta-analysis also surfaced a process gap in how QA was approaching its work. The QA team was comparing the converted Angular forms against the original Oracle Forms pixel by pixel, interaction by interaction, as if a verbatim functional match were the acceptance criterion. But the user stories the developers worked from describe intent and functionality, not an exact UI reproduction. Oracle Forms were built twenty years ago on a desktop technology stack. Recreating every legacy interaction in a modern web application is not modernization; it is replication.
We are now clarifying this with the client and confirming the appropriate level of scrutiny under the scope of work, so that QA measures against the right standard.
The ML analogy that reframes everything
Having a background in machine learning, and watching this unfold, I kept seeing a familiar pattern.
The speed at which a model converges to acceptable accuracy is almost entirely determined by how fast you can close the feedback loop between output and evaluation, and that speed depends on having automated validation metrics.
Imagine trying to fine-tune a model without automated metrics. Every iteration, you show the outputs to a human evaluator. They review them manually and report back in a few days. You make the adjustments and wait again. The iteration cycle stretches out and convergence becomes impossibly slow. The model never gets good enough fast enough.
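A back-of-the-envelope calculation makes the cost of a slow loop concrete. Every figure below is an illustrative assumption (sprint length, review turnaround, regression run time), not a measurement from the engagement.

```python
# Illustrative assumptions only; none of these figures come from the engagement.
SPRINT_HOURS = 10 * 8          # a two-week sprint of 8-hour working days
MANUAL_CYCLE_HOURS = 3 * 8     # "a few days" of human review per iteration
AUTOMATED_CYCLE_HOURS = 0.5    # an assumed 30-minute automated regression run

def iterations_per_sprint(cycle_hours, sprint_hours=SPRINT_HOURS):
    """Number of complete feedback cycles that fit in one sprint."""
    return int(sprint_hours // cycle_hours)

manual = iterations_per_sprint(MANUAL_CYCLE_HOURS)
automated = iterations_per_sprint(AUTOMATED_CYCLE_HOURS)
print(manual, automated)
```

Under these assumptions the manual loop allows 3 refinement cycles per sprint and the automated loop allows 160. The exact numbers matter less than the order-of-magnitude gap between them.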
This is precisely what is happening in our engagement.
Refining the instructions given to a dev agent is not conceptually different from fine-tuning an ML model. The instructions are the parameters, the generated code is the output, the bug report is the loss function, and the automated regression testing is the validation metric that makes rapid iteration possible.
The closed loop that makes it work
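This is the process: identify the instruction gaps behind the systemic bugs, fix them upstream, regenerate, validate with automated regression tests, and repeat. Without it, agents produce faster, but the system as a whole does not learn faster. The efficiency gain that leadership was promised, and that the technology genuinely can deliver, never materializes.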
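The loop of identify, fix, regenerate, validate, and repeat can be sketched as a toy simulation. Everything here is hypothetical: `generate` and `run_regression` stand in for the real dev agent and an automated regression suite, and instructions are modeled as a simple set of rule names.

```python
# Toy simulation: every name here is hypothetical and stands in for a real
# agent call, regression suite, or defect-triage step.
REQUIRED_RULES = {"date-format", "lov-default", "tab-order"}  # what correct output needs
FORMS = ["F001", "F002", "F003"]

def generate(form_id, instructions):
    """Stand-in for the dev agent: the 'code' is just the set of rules it followed."""
    return set(instructions)

def run_regression(form_id, code):
    """Stand-in for an automated regression suite: one failure per missing rule."""
    return sorted(REQUIRED_RULES - code)

def closed_loop(instructions, max_rounds=5):
    """Generate, validate, trace failures to instruction gaps, fix upstream, repeat."""
    instructions = set(instructions)
    history = []
    for _ in range(max_rounds):
        failures = [
            gap
            for form_id in FORMS
            for gap in run_regression(form_id, generate(form_id, instructions))
        ]
        history.append(len(failures))
        if not failures:
            break
        # Fix the instructions once, not each form: close the most common gap.
        most_common_gap = max(sorted(set(failures)), key=failures.count)
        instructions.add(most_common_gap)
    return instructions, history

final_instructions, failures_per_round = closed_loop({"date-format"})
print(failures_per_round)
```

In this sketch each instruction fix removes the same failure from every form at once, so the failure count drops from 6 to 3 to 0 in three rounds. A form-by-form fix would have needed six separate patches to reach the same state.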
The insight I want every CTO to carry
Ask whether the agents are generating requirements, code, and test cases, and the answer to all of these is yes, and the dashboards look impressive. The harder question is whether the process itself is learning.
If the answer is no, if QA is still manual, if bugs are being fixed form by form rather than traced to instruction gaps, if there is no regression suite enabling rapid iteration, then the agents are producing outputs, but the process is not learning.
Introducing agents into the SDLC without redesigning the feedback loop around them is like installing a faster engine in a car with no steering. You accelerate. But you do not arrive any faster.


