AI-Assisted SDLC is not delivering on its promise. Here is why.

The agents are working. The feedback loop isn’t.

Published: May 20, 2026


Author

Karthic Chandran

Chief Technology Officer


We are four sprints into a large Oracle Forms modernization engagement: three hundred forms to convert to a modern Angular web stack. BA agents parse the legacy XML and generate business requirements documents (BRDs). Dev agents convert Oracle Forms code into an Angular UI and a Node.js backend, producing sixty to seventy percent of the code on the first pass. QA agents generate test cases and test scripts from the same BRDs and user stories.

On paper, this looks like exactly what AI-assisted SDLC is supposed to look like.

And yet, when I looked at the bug counts coming out of the first twenty-four completed forms (around twenty bugs per form, roughly five hundred in total), something felt off. Not just the number, but the nature of the problem underneath it.

The bugs told a story the team hadn’t read yet

My instinct was this: if an agent is generating sixty to seventy percent of the code using a coherent set of instructions applied consistently across all three hundred forms, then the bugs it introduces should also be coherent.

Not random. Not form-specific. Systemic.

So I asked for all the defect descriptions across the twenty-four forms in a single spreadsheet and ran an AI-assisted meta-analysis to categorize the bugs, looking for patterns across the full dataset rather than within each form.

The result was striking. Around three hundred of the five hundred bugs, three in every five, fell into just seven systemic categories. These weren’t random implementation errors. They were the same gaps, repeating across form after form, because they traced back to the same root cause: specific gaps in the instructions given to the dev agent.
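To make the cross-form grouping concrete, here is a minimal Python sketch. It is not the AI-based categorization we ran; it swaps in a simple keyword match purely for illustration. The category names, keywords, sample defects, and the `min_forms` threshold for calling a category systemic are all hypothetical.

```python
from collections import defaultdict

# Hypothetical categories and keywords. The real analysis used an AI model
# to categorize defect descriptions, not keyword matching.
CATEGORY_KEYWORDS = {
    "date-format": ["date format", "date picker"],
    "field-validation": ["mandatory", "validation"],
    "lov-lookup": ["list of values", "lookup"],
}


def categorize(description):
    """Return the first category whose keyword appears in the description."""
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "uncategorized"


def systemic_summary(defects, min_forms=2):
    """Group defects across all forms and flag categories seen in many forms.

    `defects` is a list of (form_id, description) tuples.
    Returns (bug count per systemic category, systemic share of all bugs).
    """
    forms_by_category = defaultdict(set)
    count_by_category = defaultdict(int)
    for form_id, description in defects:
        category = categorize(description)
        forms_by_category[category].add(form_id)
        count_by_category[category] += 1

    # A category counts as systemic when it shows up in several forms.
    systemic = {
        category: count_by_category[category]
        for category, forms in forms_by_category.items()
        if category != "uncategorized" and len(forms) >= min_forms
    }
    total = len(defects)
    share = sum(systemic.values()) / total if total else 0.0
    return systemic, share


# Made-up sample data, for illustration only.
SAMPLE_DEFECTS = [
    ("FORM_001", "Date format differs from the legacy screen"),
    ("FORM_002", "Date picker accepts an invalid date format"),
    ("FORM_003", "Mandatory field is not enforced on save"),
    ("FORM_001", "Validation message missing for mandatory field"),
    ("FORM_002", "Button label is misspelled"),
]

if __name__ == "__main__":
    categories, share = systemic_summary(SAMPLE_DEFECTS)
    print(categories, f"{share:.0%}")
```

The point of the sketch is the unit of analysis: defects are pooled across forms first, and only then grouped, so a gap that repeats in many forms becomes visible as one category rather than as many unrelated tickets.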


“The bugs weren’t a QA problem. They were an instruction problem.”

The wrong process for a new paradigm

Here is what the team was doing: the QA team raised a bug list per form and handed it to a developer, who fixed those bugs for that form and moved on to the next one.

In a traditional SDLC, this makes complete sense. Different developers write different forms. Each developer interprets user stories slightly differently. The bugs are largely uncorrelated. Fixing them form by form is the right unit of work.

But that logic breaks down entirely in an agent-assisted process.

When the same agent, operating on the same instructions, generates code for hundreds of forms, the bugs it introduces are not independent. They are correlated by design. Fixing them form by form is the equivalent of treating symptoms one patient at a time while ignoring the virus that is infecting all of them.

The right intervention is upstream, not downstream.

Identify the gaps in the instructions. Fix those gaps. Regenerate. Validate. Repeat.

The meta-analysis also surfaced a process gap in how QA was approaching its work. The QA team was comparing the converted Angular forms against the original Oracle Forms pixel by pixel, interaction by interaction, as if a verbatim functional match were the acceptance criterion. But the user stories the developers work from describe intent and functionality, not an exact UI reproduction. The Oracle Forms were built twenty years ago on a desktop technology stack. Recreating every legacy interaction in a modern web application is not modernization; it is replication.

We are now clarifying this with the client and confirming the level of scrutiny the scope of work calls for, so that QA measures against the right standard.

The ML analogy that reframes everything

Having a background in machine learning, and watching this unfold, I kept seeing a familiar pattern.

Training a machine learning model is an iterative process.

  • You run the model
  • Measure its accuracy against a validation set
  • Identify where it is failing
  • Adjust the parameters
  • Run it again

The speed at which a model converges to acceptable accuracy is almost entirely determined by how fast you can close that feedback loop, and that speed depends on having automated validation metrics.

Imagine trying to fine-tune a model without automated metrics. Every iteration, you show the outputs to a human evaluator. They review them manually and report back in a few days. You make the adjustments and wait again. The iteration cycle collapses and convergence becomes impossibly slow. The model never gets good enough fast enough.

This is precisely what is happening in our engagement.

Refining the instructions given to a dev agent is not conceptually different from fine-tuning an ML model. The instructions are the parameters, the generated code is the output, the bug report is the loss function, and the automated regression testing is the validation metric that makes rapid iteration possible.
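A rough way to see what feedback latency does to the number of instruction refinements you can attempt is the arithmetic below, written as a small Python sketch. The window length and both cycle times are hypothetical figures chosen for illustration, not measurements from our engagement.

```python
def iterations_in_window(window_days, feedback_days):
    """Number of full generate-and-validate cycles that fit in the window."""
    if feedback_days <= 0:
        raise ValueError("feedback_days must be positive")
    return int(window_days // feedback_days)


# Hypothetical figures, for illustration only.
WINDOW_DAYS = 56            # four two-week sprints
MANUAL_CYCLE_DAYS = 14      # one manual QA cycle per sprint
AUTOMATED_CYCLE_DAYS = 0.5  # automated regression run plus analysis

if __name__ == "__main__":
    manual = iterations_in_window(WINDOW_DAYS, MANUAL_CYCLE_DAYS)
    automated = iterations_in_window(WINDOW_DAYS, AUTOMATED_CYCLE_DAYS)
    print(f"manual feedback: {manual} iterations")
    print(f"automated feedback: {automated} iterations")
```

Under these assumed numbers the manual loop allows 4 refinements in the window and the automated loop allows 112. The exact figures matter less than the ratio: the same team, with the same agent, gets many more chances to correct the instructions.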


“Without automated regression, every instruction change has to wait for a manual QA cycle, potentially a full sprint, before you know whether it worked.”

The closed loop that makes it work

Based on this, here is the process architecture AI-assisted SDLC requires:

  • Generate — the dev agent produces code from current instructions and context.
  • Test — the QA agent generates test cases and scripts from the BRDs and user stories.
  • Automate — those test scripts run as an automated regression suite against the generated code.
  • Analyze — bug reports are analyzed at the meta level, across forms, not within a single form, to identify systemic categories and map them back to instruction gaps.
  • Refine — developers fix the instructions upstream, not the code downstream.
  • Regenerate — the dev agent produces the next iteration.
  • Regress — the automated suite runs again, measuring whether the instruction changes resolved the systemic bugs without introducing new ones.
  • Repeat — the loop closes, and the convergence accelerates.

This is the process. Without this, agents produce faster. But the system as a whole does not learn faster. The efficiency gain that leadership was promised and that the technology genuinely can deliver never materializes.
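The steps above can be sketched as a single control loop. This is a toy under stated assumptions, not our actual tooling: the four callables stand in for the dev agent, the automated regression suite, the cross-form analysis, and the human refinement step, and every name in it (`run_closed_loop`, `REQUIRED_RULES`, the `toy_*` functions) is hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def run_closed_loop(
    instructions: str,
    generate: Callable[[str], str],
    run_regression: Callable[[str], List[str]],
    analyze: Callable[[List[str]], Dict[str, int]],
    refine: Callable[[str, Dict[str, int]], str],
    max_iterations: int = 5,
) -> Tuple[str, List[int]]:
    """Generate, regress, analyze, refine, and repeat until no systemic bugs.

    Returns the final instructions and the systemic bug count per iteration.
    """
    history = []
    for _ in range(max_iterations):
        code = generate(instructions)                   # Generate / Regenerate
        failures = run_regression(code)                 # Automate / Regress
        systemic = analyze(failures)                    # Analyze across forms
        history.append(sum(systemic.values()))
        if not systemic:
            break
        instructions = refine(instructions, systemic)   # Refine upstream
    return instructions, history


# Toy stand-ins so the loop can run end to end. All hypothetical.
REQUIRED_RULES = ["validate dates", "enforce mandatory fields"]


def toy_generate(instructions):
    # The "generated code" simply reflects the rules it was given.
    return instructions


def toy_regression(code):
    # One failure per required rule the generated code does not cover.
    return [rule for rule in REQUIRED_RULES if rule not in code]


def toy_analyze(failures):
    return dict(Counter(failures))


def toy_refine(instructions, systemic):
    # Close the most frequent gap first by adding it to the instructions.
    worst = max(systemic, key=systemic.get)
    return instructions + "; " + worst


if __name__ == "__main__":
    final, history = run_closed_loop(
        "convert the form", toy_generate, toy_regression, toy_analyze, toy_refine
    )
    print(final)
    print(history)
```

Note that nothing in the loop edits generated code. The only thing that changes between iterations is the instructions, and the history of systemic bug counts is what tells you whether those changes are working.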

The insight I want every CTO to carry

Teams adopting AI-assisted SDLC are measuring agent performance at the wrong level. They are asking:

  • Is the agent generating code?
  • Is documentation being produced?
  • Are test cases being created?

The answer to all of these is yes, and the dashboards look impressive.

But the questions that matter are:

  • Is the system converging?
  • Is each sprint producing fewer systemic bugs than the last?
  • Are the instructions getting sharper?
  • Is the feedback loop closing fast enough to make a difference?

If the answer is no, if QA is still manual, if bugs are being fixed form by form rather than traced to instruction gaps, if there is no regression suite enabling rapid iteration, then the agents are producing outputs, but the process is not learning.
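One simple way to put a number on "is the system converging" is to track the systemic bug count per sprint and check that it keeps falling. The sketch below assumes that definition; the function name and the sample counts are hypothetical, and a real dashboard would likely want a more forgiving trend measure than a strict decrease.

```python
def is_converging(systemic_bugs_per_sprint):
    """True if every sprint has fewer systemic bugs than the one before it."""
    counts = list(systemic_bugs_per_sprint)
    if len(counts) < 2:
        return False  # one data point cannot show a trend
    return all(later < earlier for earlier, later in zip(counts, counts[1:]))


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(is_converging([300, 180, 90]))
    print(is_converging([300, 310, 250]))
```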

Introducing agents into the SDLC without redesigning the feedback loop around them is like installing a faster engine in a car with no steering. You accelerate. But you do not arrive any faster.


“The agents are not bottlenecks. The loop is.”
