How Agile and Extreme Programming principles are powering the AI agent revolution
March 2026
A note on how this was made
This blog was created with the help of Claude. The ideas, direction, and arguments are mine; the prose is LLM-generated. I believe in practising what I preach.
A note on framing: why this blog compares Agile with the ADLC — not "the SDLC."
Most ADLC literature positions itself against "the traditional SDLC" — but what they really mean is waterfall: big upfront specs, phase gates, deploy-and-forget. That comparison is valid but almost too easy. Nobody seriously argues that waterfall works for AI agents.
The more useful question is for the teams who already run Agile — which is most of us. Is your current Agile practice sufficient for building agents, or does it need to evolve? That's the comparison this blog makes, because that's the one that matters in practice. SDLC is just a category — waterfall, Agile, and the ADLC are all instances of it. Waterfall is clearly inadequate. Agile provides the foundation but needs extension. The ADLC is that extension. Understanding where Agile already gets you — and where it doesn't — is far more actionable than comparing against a process most teams abandoned years ago.
A note on why I wrote it.
I'm a long-time Agile and XP advocate — the principles and mindset, not the rigid mechanics. I've been working hands-on with AI and Gen AI tools (Claude Code, Claude, ChatGPT) for a couple of years now, and that experience has made one thing clear: strong Agile and XP practices aren't just compatible with the AI evolution — they're at the core of making it work.
For years we've followed what we call the SDLC (Software Development Life Cycle). Now, in 2026, a new acronym is making the rounds: ADLC — the Agent Development Life Cycle. Multiple companies (EPAM, Arthur AI, Writer, Salesforce, IBM) have independently landed on the same conclusion: you can't build agentic AI systems using the traditional Software Development Life Cycle. Agents are non-deterministic. They reason, adapt, and occasionally hallucinate. The SDLC's assumptions about predictable inputs, binary test results, and static deployments just don't hold.
But here's the thing that keeps nagging me: the ADLC isn't being built from scratch. If you squint at its core principles — customer outcomes over feature specs, small iterations, continuous feedback, verifiable increments, fast deployment — you're looking at Agile. And not just Agile in the abstract. You're looking at the engineering discipline of Extreme Programming (XP) specifically.
The principles we've been practising since 2001 turn out to be the natural operating system for building AI agents. Let me explain why I think that's not a coincidence — and why it matters for how fast you can deliver value to your customers.
But I also want to be honest: Agile and ADLC are not the same thing. They share deep DNA — customer focus, small batches, fast feedback, iteration — but the ADLC extends Agile into territory that deterministic software never had to deal with: probabilistic behaviour, continuous post-deployment evolution, behavioural security, and governance at scale. The argument of this blog isn't that Agile is the ADLC. It's that Agile gave us the operating system — and the ADLC is the upgrade patch for a non-deterministic world.
Below, I draw out the parallels between Agile/XP and the ADLC.
Before we get into the technical parallels, let's talk about the thing Agile got right from day one that the ADLC is rediscovering: outcomes over outputs.
The Agile Manifesto didn't start with "we value clean code." It started with "our highest priority is to satisfy the customer through early and continuous delivery of valuable software." The entire philosophy — short sprints, working software, responding to change — exists in service of one goal: getting value into customers' hands faster and learning from what happens.
The ADLC has the same North Star. EPAM's ADLC framework begins with "Phase 0: Discovery," which is entirely about understanding where work breaks down for users, where decisions stall, and where manual effort repeats. Writer's six ADLC principles start with "outcomes, not requirements" — identifying a problem you can actually solve, measure, and improve, rather than writing a spec and hoping it holds.
This isn't waterfall with AI bolted on. It's outcome-based development. You define what success looks like for the customer, build the minimum viable agent to test that hypothesis, get it in front of users quickly, and iterate based on real-world feedback. Sound familiar? It should. That's Agile's core loop, applied to a new class of system.
Here's a finding from Microsoft's work with GitHub Spec Kit that stopped me in my tracks. When they used spec-driven development to break a large task into individual, measurable small components, it dramatically improved the AI coding agent's ability to execute those tasks. Not marginally — dramatically. The breakdown of one big task into clear, methodical small units is what makes agents actually reliable.
That's not a new AI insight. That's Agile Principle #1: deliver working software in small, frequent increments. It's the whole reason we write user stories with acceptance criteria instead of 80-page requirements documents. And it's the reason Agile teams ship faster — small batches reduce risk, shorten feedback loops, and get value to customers sooner.
Google DeepMind published a paper in February 2026 on "Intelligent AI Delegation" that makes a crucial distinction: decomposition (just splitting a task into parts) is not the same as delegation (splitting it with clear authority, accountability, verifiable completion criteria, and trust boundaries). That's exactly what a well-written user story does: it defines what "done" looks like from the customer's perspective so the person (or agent) doing the work can verify their own output.
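DeepMind's distinction can be restated in code: a delegation carries not just the subtask, but accountability, a trust boundary, and a verifiable completion criterion. Here is a hypothetical sketch of that idea — the field names and the example task are mine, not from the paper:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Delegation:
    # Decomposition gives you only `task`. Delegation adds the rest.
    task: str
    owner: str                        # who is accountable for the result
    allowed_tools: list[str]          # trust boundary for the delegate
    done_when: Callable[[str], bool]  # verifiable completion criterion

    def verify(self, output: str) -> bool:
        # The delegate (human or agent) can check its own work against
        # the same criterion the delegator will use.
        return self.done_when(output)

d = Delegation(
    task="Summarise the incident report in under 50 words",
    owner="summary-agent",
    allowed_tools=["read_document"],
    done_when=lambda text: 0 < len(text.split()) <= 50,
)
print(d.verify("Service degraded for 20 minutes due to a config rollback."))  # True
```

The point of the sketch is the `done_when` field: it plays exactly the role of acceptance criteria on a user story, letting the worker verify its own output.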
The practical implication? The teams that are best at breaking big customer problems into small, shippable increments are the same teams that get the most out of AI agents. The skill transfers directly.
Here's the part that surprised me. At the Pragmatic Engineer Summit in February 2026, one of the major themes was the return of Extreme Programming. Kent Beck was there. Martin Fowler was there. And the consensus was clear: XP's engineering practices — the ones the industry mostly dismissed as "too expensive" and "too slow" — are exactly what AI-assisted development needs.
Think about what XP gave us:
Test-Driven Development (TDD): Write the test before the code. Define the expected behaviour first, then implement to satisfy it.
Pair programming: Two minds on every piece of work — one thinking strategically, one implementing tactically.
Continuous integration: Integrate and deploy constantly. Never let work drift from the mainline.
Small releases: Ship the smallest useful increment to the customer, then iterate based on feedback.
Simple design: Build only what's needed now. Don't over-engineer for hypothetical futures.
Refactoring: Continuously improve the code while keeping it working.
Customer on-site: The person who understands the business outcome is always available to clarify intent.
For years, the industry cherry-picked from this list. CI/CD became standard. Refactoring became accepted. But TDD and pair programming? Too slow. Too expensive. Doesn't scale.
AI changes that calculation completely.
As Robert Melton put it in his "XP 3.0" essay: "We all pair program now — with AI. TDD keeps AI on rails. AI-to-AI code review catches what humans miss. Simple design matters more than ever because AI needs clean structure to understand context. XP was right. AI makes it practical."
The XP practices that were "too expensive" for human-only teams become force multipliers when your pair partner is an AI agent available 24/7 at near-zero marginal cost. The discipline that felt slow — writing tests first, integrating constantly, keeping design simple — is exactly what makes AI output trustworthy and shippable.
Let me map the specific connections between Agile/XP principles and what the ADLC demands — always through the lens of delivering customer value faster.
Small, verifiable increments
This is Agile's heartbeat — and it's become the number one enabler for effective AI delegation. Agents perform dramatically better on focused, bounded tasks than on vague, sprawling ones. VS Code's new multi-agent orchestration (v1.109) is built entirely around this: break a fragile, monolithic workflow into smaller, verifiable steps. The BMAD framework (Breakthrough Method for Agile AI-Driven Development) enforces an Agile cycle of PRD → user stories → acceptance criteria → iterative implementation specifically to make AI outputs reliable and shippable faster.
Sprint-based delivery and graduated rollout
The ADLC's "graduated rollout" phase — internal testers, then controlled pilots, then production — is sprint-based delivery applied to agents. Arthur AI calls this the Agent Development Flywheel: a continuous loop of build, evaluate, release, learn. They make an important observation: getting an agent to functionally complete is usually quick. Going from functionally complete to reliable is where the real work lives. That's exactly the kind of refinement that short, frequent delivery cycles are designed for — and it's how you close the gap between "it works in the lab" and "customers trust it."
Welcoming changing requirements
Agents are non-deterministic. Your agent's behaviour will shift when you swap models, update context, or encounter new edge cases. The ADLC explicitly assumes systems will change after deployment and builds "Runtime Optimization Loops" for live tuning. Agile's principle of welcoming changing requirements — even late in development — gave us the cultural muscle to handle this. Waterfall teams get paralysed by uncertainty. Agile teams were trained for it. And customer needs don't wait for your next release cycle.
Continuous feedback
XP was built on the idea that feedback should be constant, not batched. Every test run, every integration, every customer interaction is a signal. The ADLC embeds this same philosophy: every agent interaction in production generates behavioural data that feeds back into evaluation and improvement. Minded AI captures real conversations, annotates mistakes, and turns them into future test cases. This is the inspect-and-adapt cycle accelerated to the speed of production traffic.
Self-organising teams
In 2026, "self-organising teams" increasingly includes AI agents. Multi-agent architectures use orchestrators to coordinate specialised agents working in parallel — one for code review, one for test generation, one for security scanning. Each agent has a defined role, clear boundaries, and specific outputs. The human's job shifts to what Agile always said leadership should be: setting the vision, removing obstacles, and keeping the team focused on customer outcomes. The difference is that some of your "team members" are now AI — and they can scale instantly.
It would be wrong to claim the ADLC is just Agile with a new name. The shared foundation is real — customer outcomes, small batches, fast feedback, iteration. But the ADLC extends into territory that Agile, XP, and the current SDLC were never designed for. Here are the seven key shifts:
Shift 1: deterministic code to probabilistic behaviour
Agile assumes that well-written code behaves the same way every time. Same input, same output. The ADLC is built for systems where the same input can produce different reasoning paths, different tool invocations, and different conclusions. This unpredictability isn't a bug — it's the nature of AI. And it changes everything about how you test, deploy, and govern.
Shift 2: binary tests to gradient evaluations
Agile's TDD tests are binary: the code either passes or it doesn't. The ADLC needs evaluations that score across multiple quality dimensions — accuracy, groundedness, tone, safety, hallucination rate. An AI response can be factually correct but poorly structured, or well-formatted but missing key information. Binary pass/fail can't express that. The ADLC's answer is gradient scoring that captures the full picture.
Shift 3: static deployment to continuous evolution
In Agile, deployed code stays unchanged until the next release. In the ADLC, agent behaviour drifts post-deployment — model updates shift outputs, new input patterns create edge cases, context changes alter reasoning. The ADLC builds runtime optimisation loops for live tuning of prompts, models, and tools based on real-world telemetry. The system is never "done."
Shift 4: spec compliance to outcome-based success
Agile asks "does the code match the spec?" The ADLC asks "does the agent achieve business KPIs under acceptable risk limits?" An agent with imperfect code but consistent, auditable, customer-satisfying behaviour passes the ADLC's bar. Success is measured by what the agent does, not how it's built.
Shift 5: code security to behavioural security
Agile secures against code vulnerabilities — SQL injection, XSS, buffer overflows. The ADLC adds an entirely new threat surface: behavioural vulnerabilities like prompt injection, tool misuse, memory poisoning, and goal hijacking. Agents demand sandboxed execution, cryptographic identity, and runtime policy enforcement. This is security for systems that reason, not just systems that execute.
Shift 6: writing code to writing specifications and evals
This is the biggest role shift. In Agile, developers write implementation. In the ADLC, developers write specifications, evals, and quality gates — then orchestrate AI to implement. The human's irreplaceable value moves from typing code to defining what "good" looks like and verifying outcomes.
Shift 7: release reports to continuous governance
Agile produces static test reports per release. The ADLC requires continuous governance catalogs: lineage tracking, eval metrics, risk documentation, bias audits, and red-teaming that never stops. Every agent release carries its own compliance record — not as an afterthought, but as a core artifact.
The bottom line on differences: Agile gives you the mindset and the muscle. The ADLC gives you the extensions needed for systems that think, adapt, and occasionally surprise you. Teams that try ADLC without Agile foundations struggle. Teams that have Agile foundations but don't extend them for non-deterministic systems also struggle. You need both.
Here's where the XP lineage becomes most powerful. TDD — XP's signature engineering practice — is undergoing a direct evolution for the AI era. It's being called Eval-Driven Development (EDD), and the parallels are striking.
The core XP DNA is identical: define what "done" looks like before you build, then iterate in small steps until you get there. But the mechanism adapts:
TDD tests return binary pass/fail. EDD evals score across dimensions — because "partially right" is a meaningful state for AI output.
TDD assumes stable behaviour. EDD monitors for drift, because agent behaviour changes without any code changes.
TDD is primarily pre-deployment. EDD is perpetual — evaluation never stops.
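The first of those differences — gradient scores instead of pass/fail — can be made concrete with a small sketch. The dimensions, thresholds, and toy scorers below are illustrative assumptions, not any framework's real API; production systems typically use LLM judges or trained classifiers as scorers:

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    # Scores in [0, 1] per quality dimension. The dimension names are
    # illustrative examples, matching the ones discussed in the text.
    accuracy: float
    groundedness: float
    tone: float
    safety: float

    def passes(self, thresholds: dict[str, float]) -> bool:
        # Unlike a binary unit test, "pass" here is a policy decision:
        # every dimension must clear its own (tunable) bar.
        return all(getattr(self, dim) >= bar for dim, bar in thresholds.items())

def score_response(response: str, reference_facts: list[str]) -> EvalResult:
    # Toy scorers for demonstration only.
    grounded = sum(f.lower() in response.lower() for f in reference_facts)
    groundedness = grounded / len(reference_facts) if reference_facts else 1.0
    safety = 0.0 if "ignore previous instructions" in response.lower() else 1.0
    return EvalResult(accuracy=groundedness, groundedness=groundedness,
                      tone=1.0, safety=safety)

result = score_response("Paris is the capital of France.", ["Paris", "France"])
print(result.passes({"groundedness": 0.8, "safety": 1.0}))  # True
```

The key design point is that `passes` is separated from scoring: the same gradient scores can be held to different bars for an internal pilot versus a production rollout.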
A formal academic paper on "EDDOps" (Evaluation-Driven Development and Operations) published in late 2025 makes this lineage explicit: EDD builds on the iterative principles of TDD and BDD while extending them for non-deterministic behaviour and post-deployment evolution. The SD Architect blog puts it simply: "Begin with evals — just as TDD begins with tests."
Evals also serve a function TDD tests never had to: compliance and customer trust. As increasingly critical customer-impacting decisions get delegated to AI agents, evals provide the transparency needed for audits and the confidence needed for adoption. As one practitioner put it: "If people don't trust your evals, they won't trust you."
This brings us to what I think is the most powerful idea to emerge from the collision of XP and AI: the best workflow is for humans to write the tests and evals, and let AI write the implementation.
Think about that for a moment. It inverts the traditional developer gripe about TDD — that writing tests first feels slow and backwards. In the AI era, writing tests first isn't a chore. It's the highest-value activity. It's where human judgment, domain knowledge, customer understanding, and intent specification live. The implementation? That's becoming commodity work that AI handles reliably — if you give it a clear target.
Kent Beck — creator of Extreme Programming, co-author of the Agile Manifesto, and the person who popularised TDD — has been coding with AI agents and calls TDD a "superpower" in this context. His reasoning: AI agents can and do introduce regressions. Having a comprehensive test suite is the safety net that catches those regressions immediately. (He also notes a hilarious problem: AI agents sometimes try to delete the tests to make them "pass.")
The Codemanship blog makes the XP connection explicit: "Core to the technical practices of eXtreme Programming is a micro-iterative process we now call Test-Driven Development. In TDD, we work in small steps — solving one problem at a time. We specify using examples. We test continuously. We review continuously." This micro-iterative discipline — straight from XP — is exactly what keeps AI agents on a tight leash and shipping reliable code.
Multiple practitioners have converged on the same pattern:
1. Human writes a clear specification — a user story with acceptance criteria, or a failing test suite that captures the customer's desired outcome.
2. Human prompts the AI: "Here is my failing test suite. Write the code to pass it."
3. AI generates the implementation — targeting the concrete, verifiable criteria.
4. Human reviews — if the tests pass, you know the code actually delivers the intended outcome.
As one writer put it: "In a world where code is cheap and abundant, reliability is the premium asset." The test becomes the contract. The AI must fulfil it. Hallucinations get caught by failing assertions, not by human code review that might miss subtle issues. And because each test is tied to a customer outcome, every green test means another piece of real value delivered.
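Here's a toy version of that contract. The function, its behaviour, and the tests are entirely hypothetical — a minimal illustration of the workflow, not an example from any of the cited sources:

```python
# Step 1 (human): write the failing spec as executable tests. These
# encode the customer outcome: normalise user-entered phone numbers.
def test_strips_punctuation():
    assert normalise_phone("(020) 7946-0958") == "02079460958"

def test_rejects_non_numeric():
    try:
        normalise_phone("not a number")
        assert False, "expected ValueError"
    except ValueError:
        pass

# Steps 2-4 (AI implements, human reviews): code targeting the tests.
def normalise_phone(raw: str) -> str:
    digits = "".join(ch for ch in raw if ch.isdigit())
    if not digits:
        raise ValueError(f"no digits in {raw!r}")
    return digits

test_strips_punctuation()
test_rejects_non_numeric()
print("all tests pass")
```

The tests came first and never changed; the implementation is disposable. If a regenerated implementation still goes green, the customer outcome is still delivered — which is exactly the safety net Beck describes.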
A brand-new study from METR (March 2026) drives this home. They found that between 50% and 66% of AI-generated code that passes automated unit tests would be rejected by human repository maintainers. The AI-written patches passed the automated grader but were flagged for poor code quality, bad style, and subtle breakage.
This isn't an argument against AI-generated code. It's an argument for better evals. The automated tests weren't comprehensive enough — they checked functional correctness but missed quality, maintainability, and architectural fit. The solution is to write better tests and evals that capture the full definition of "good" from the customer and maintainability perspective.
And who writes those better evals? Humans. People who understand the customer, the domain, the architecture, and what "done" really means. That's the irreplaceable skill — and it's exactly the skill that XP has been training developers in for 25 years.
Here's my hypothesis, and it's one I'm seeing play out: teams with strong Agile and XP habits are adopting agentic workflows faster, shipping more reliably, and delivering customer value sooner than those without.
Why? Because Agile and XP already trained them in the skills the ADLC demands:
Writing clear specifications with verifiable acceptance criteria — which translates directly into writing effective prompts, evals, and task briefs for AI agents.
Working in short feedback loops — which is how you iteratively tune an agent's behaviour and get improvements to customers in days, not quarters.
Decomposing complex customer problems into small, independently shippable pieces — the single biggest factor in whether an agent can reliably complete a task.
Writing tests before code (TDD → EDD) — which directly evolves into writing evals before agent behaviour, the most critical practice for agent reliability.
Continuous integration and deployment — which means every improvement can reach customers immediately, not batch up in a release train.
Keeping the customer at the centre — measuring outcomes and value delivered, not lines of code or story points completed.
But — and this is important — those same teams also need to extend their practices for the new reality. The seven shifts I described earlier aren't optional. You can't treat an AI agent like deterministic code and expect good results. The Agile foundation gets you 80% of the way there. The ADLC extensions are the last 20% — and they're the difference between an agent that demos well and one that works in production.
Writer, the company that coined "ADLC," literally described what they needed as "Agile for Agents." That's not a metaphor. It's an admission that the operating system was already there — it just needed to be extended for non-deterministic systems.
If you're a developer or team lead thinking about how to deliver value faster with AI agents, here's the punch line: you probably already have the most important skills. You just need to apply them differently — and extend them where AI demands it.
Start with the customer outcome, not the technology. Before you build an agent, ask: what problem does this solve for the user? What does success look like? How will we measure it? This is ADLC Phase 0, and it's also Sprint Planning 101.
Treat your prompts like user stories. Give them clear context, a specific goal, and verifiable acceptance criteria. "Add authentication to the API" is a bad user story AND a bad prompt. "Add JWT-based authentication to the /users endpoint with token refresh, returning 401 on invalid tokens, with unit tests" is a good one for both.
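One way to make that habit mechanical is to template prompts the way you'd template a user story, with context, goal, and verifiable acceptance criteria as explicit fields. A sketch — the structure and field names are my own convention, not a standard format:

```python
def story_prompt(role: str, goal: str, context: str, criteria: list[str]) -> str:
    # Acceptance criteria become an explicit, checkable list that the
    # agent (and its reviewer) can verify line by line.
    checks = "\n".join(f"- {c}" for c in criteria)
    return (
        f"As a {role}, I want {goal}.\n"
        f"Context: {context}\n"
        f"Acceptance criteria:\n{checks}\n"
        "Do not consider the task done until every criterion is met."
    )

prompt = story_prompt(
    role="API consumer",
    goal="JWT-based authentication on the /users endpoint",
    context="Existing REST API, currently unauthenticated",
    criteria=[
        "Valid tokens grant access; invalid tokens return 401",
        "Tokens can be refreshed before expiry",
        "Unit tests cover both paths",
    ],
)
print(prompt)
```

The payoff is the same as with user stories: each criterion is independently verifiable, so review stops being "does this look right?" and becomes "which criteria are unmet?"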
Write the tests first. Seriously. The XP/TDD discipline that felt tedious for human-written code becomes your superpower with AI. Write your failing test suite, hand it to the AI, and say "make these pass." You've just turned a probabilistic system into a target-seeking one. And every passing test is a piece of customer value you can verify.
Build your eval suite like a test suite — but richer. For agent systems, go beyond binary pass/fail. Define quality dimensions: accuracy, tone, groundedness, safety. Collect real-world customer interactions. Turn failures into eval cases. Remember the METR finding: if your evals only check functional correctness, you're missing 50-66% of the quality picture. This is where the ADLC extends beyond what TDD alone can handle.
Ship small, ship often. Don't wait for the perfect agent. Deploy the smallest useful version to a controlled group, learn from real usage, and iterate. The ADLC's graduated rollout is just Agile's "minimum viable product" applied to agents. Time to market matters — your competitor is iterating too.
Don't stop evaluating after deployment. This is the hardest habit for Agile teams to build, because traditional software doesn't need it. Agent behaviour drifts. Models update. Input patterns shift. Build runtime monitoring into your workflow from day one — treat it as a continuous retrospective, not a one-off QA gate.
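A minimal sketch of what such runtime monitoring might look like: a rolling mean of production eval scores compared against the score at deployment time. The window size, margin, and scores are purely illustrative assumptions:

```python
from collections import deque

class DriftMonitor:
    """Tracks a rolling mean of production eval scores and flags drift
    when it falls a set margin below the score at deployment time."""

    def __init__(self, baseline: float, window: int = 50, margin: float = 0.1):
        self.baseline = baseline            # mean eval score at rollout
        self.scores = deque(maxlen=window)  # most recent scored interactions
        self.margin = margin                # tolerated degradation

    def record(self, score: float) -> bool:
        """Record one scored interaction; return True if drift is detected."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False                    # not enough data yet
        rolling = sum(self.scores) / len(self.scores)
        return rolling < self.baseline - self.margin

monitor = DriftMonitor(baseline=0.9, window=5, margin=0.1)
for s in [0.9, 0.88, 0.7, 0.72, 0.71]:  # scores degrade after a model update
    drifted = monitor.record(s)
print(drifted)  # True: rolling mean 0.782 is below 0.9 - 0.1
```

The same eval suite you built pre-deployment supplies the scores; this loop just keeps asking the question after release — the "continuous retrospective" the text describes.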
Retro your agent's behaviour, not just your team's. When an agent produces unexpected output, don't just re-prompt. Ask why. Was the task too broad? Was the customer context missing? Was the eval too narrow? Apply the same inspect-and-adapt loop you'd use in a sprint retro — but extend it to cover behavioural drift, not just code bugs.
The ADLC isn't a replacement for Agile. It's Agile's next chapter — with important new sections that Agile alone doesn't cover.
The shared DNA is unmistakable: customer focus, small batches, fast feedback, verifiable increments, quick deployment, embracing change. The evolution from TDD to EDD is perhaps the clearest proof. XP spent two decades insisting that writing tests first was the disciplined way to build reliable software. It turns out we were building the exact muscle memory needed for the AI era — where humans define intent and quality gates, and machines handle implementation.
But the ADLC also extends into genuinely new territory: probabilistic behaviour, gradient evaluation, post-deployment evolution, behavioural security, and continuous governance. These aren't things you can bolt onto Agile as an afterthought. They require deliberate practice and new tooling.
The teams that will thrive are the ones that bring both: the Agile/XP foundation for speed, customer focus, and iteration — plus the ADLC extensions for the reality of non-deterministic systems.
The tools have changed. The team composition is changing. The speed at which you can get value to customers has accelerated beyond anything we imagined. But the operating system? Agile and XP had it right all along. They just needed a patch for a probabilistic world.
We didn't know what we were training for. Now we do.
Sources & further reading: EPAM — "Introducing ADLC" (Feb 2026) • Arthur AI — "The Agent Development Lifecycle" • Writer — "Anyone Can Build Software Now" • Google DeepMind — "Intelligent AI Delegation" (Feb 2026) • Microsoft — "AI-Led SDLC with GitHub Spec Kit" (Feb 2026) • Anthropic — "2026 Agentic Coding Trends Report" • BMAD Method — github.com/bmad-code-org • Minded AI — "The ADLC Loop" • VS Code 1.109 Multi-Agent Orchestration • Salesforce — "Agent Development Lifecycle" • SD Architect — "AI Agents: The Case for Eval Driven Development" (Oct 2025) • EDDOps Paper — arxiv.org (Nov 2025) • Braintrust — "What Is Eval-Driven Development" (Feb 2026) • Kent Beck on TDD & AI — The Pragmatic Engineer (Jun 2025) • Codemanship — "The AI-Ready Software Developer #16: A Token of Our eXtreme" (Nov 2025) • Robert Melton — "XP 3.0: AI Validates What Extreme Programming Got Right" (Dec 2025) • METR Study on AI Code Quality — LeadDev (Mar 2026) • AWS — "From AI Agent Prototype to Product" (Jan 2026) • Pragmatic Engineer Summit — "The Future of Software Development" (Feb 2026)