How one engineering leader designed a repeatable AI-augmented development system — and what it means for your team.
By Adam Goss, Founder — Iter8 IT Consulting
Executive Summary
I wanted to know if I could build something real.
Not a prototype. Not a proof of concept that gets abandoned after the demo. A production-grade web application (with auth, a database, a CI/CD pipeline, multiple deployment environments, payment processing, and a badge system) built solo, without cutting corners on the engineering discipline I'd spent 25 years developing while leading teams.
The question wasn't whether AI could write code. It can. The question was whether a structured, professional workflow could be built around AI that produced consistent, reviewable, maintainable output — the kind of output a senior engineer would be proud of, not just code that mostly worked.
Duck Duck Jeep Tracker (duckduckj.com) is that answer. It's a mobile-first web application that gamifies the Jeep community tradition of leaving rubber ducks on strangers' Jeeps and tracking their journeys. But the app is almost beside the point. What matters is how it was built: 12 development phases, 200+ user stories, every one of them flowing through a structured pipeline from architectural planning to code review — with an AI pair at every stage.
The process I built works. And it is fully transferable.
The tools I chose — Claude for architecture and planning, Claude Code for execution, Azure DevOps for work item tracking, Supabase for the database, Vercel for hosting — are implementation details. The same system runs on Jira and GitHub Actions. It runs on AWS. It runs on whatever your team already uses. What doesn't transfer automatically is the discipline required to run it well. That's what this case study is about.
If you lead an engineering team, manage a product organization, or are trying to figure out how to use AI without losing the quality standards your customers depend on — this is written for you.
The Problem This Was Designed to Solve
Most teams adopting AI development tools are doing it ad hoc. A developer uses GitHub Copilot for autocomplete. Another pastes functions into ChatGPT when they're stuck. A product manager uses an AI to draft tickets. These are genuine productivity gains, but they aren't a system. They don't compound. They don't produce the kind of predictable, reviewable output that a professional software delivery pipeline requires.
The result is a familiar pattern: AI-assisted code that works in isolation but creates integration problems, architectural drift that nobody catches until it's expensive to fix, and a growing gap between "what AI generated" and "what the team can actually own and maintain."
The question I set out to answer was: what does it look like when you apply real SDLC discipline — structured work items, defined acceptance criteria, branch-per-story development, mandatory PR review — to an AI-augmented workflow? Does the discipline make the AI more useful, or does it get in the way?
The answer, definitively, is that discipline makes AI dramatically more useful. Not less.
The System: How It Works
The workflow I designed has five components, each with a clear role.
The Architect (human + AI planning session). Every phase begins with a planning conversation. Architecture decisions are locked in before any code is written. This is where technology choices are made, schema designs are debated, scope boundaries are drawn, and user stories are written with explicit acceptance criteria and test cases. The AI acts as a senior technical peer — proposing options, flagging risks, asking clarifying questions. The human makes every significant decision. The output of each planning session is a structured set of user stories imported into a work tracking system.
The Work Tracking System. Every story lives in a work tracker — in this project, Azure DevOps. Stories have a defined state progression: To Do → In Progress → In Review → Done. They are stack-ranked so execution order is always unambiguous. A script queries the tracker to identify the next story, creates a feature branch, and writes the story context to a file the AI executor can read. The tracker is the single source of truth. Nothing gets built that isn't in a story. Any work tracking tool with an API supports this pattern — Jira, Linear, GitHub Issues, or anything else your team already runs.
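The tracker-to-executor handoff can be sketched in a few lines of TypeScript. This is a minimal illustration, not the project's actual script: the Story shape, the field names, the "lower rank runs first" convention, and the feature/ branch prefix are all assumptions, and the call to the tracker's API is left out so the selection logic stands on its own.

```typescript
// Hypothetical shapes: field names are illustrative, not the Azure DevOps API.
type StoryState = "To Do" | "In Progress" | "In Review" | "Done";

interface Story {
  id: number;
  title: string;
  state: StoryState;
  rank: number; // assumed convention: lower rank = higher priority
  acceptanceCriteria: string[];
}

// Pick the highest-priority story that has not been started yet.
function nextStory(stories: Story[]): Story | undefined {
  return stories
    .filter((s) => s.state === "To Do")
    .sort((a, b) => a.rank - b.rank)[0];
}

// Build a branch name from the story id and a slug of its title.
function branchNameFor(story: Story): string {
  const slug = story.title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `feature/${story.id}-${slug}`;
}

// Render the context text the executor would read for this story.
function storyContext(story: Story): string {
  const criteria = story.acceptanceCriteria.map((c) => `- ${c}`).join("\n");
  return `Story ${story.id}: ${story.title}\n\nAcceptance criteria:\n${criteria}\n`;
}
```

Because the selection is a pure function over story data, the same logic would apply whether the stories come from Azure DevOps, Jira, or another tracker; only the fetch step would differ.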
The Executor (AI coding agent). Claude Code reads the current story file, implements the acceptance criteria, and creates a pull request. It works within rails defined by a CLAUDE.md architectural rules document — a living specification that enforces patterns like the Repository Pattern for all data access, TypeScript strict mode, no inline database calls in components, and consistent file structure. The executor doesn't make architectural decisions. It executes within the boundaries the architect established.
The PR Review (human quality gate). Every pull request is reviewed by a human before merge. This is non-negotiable and is where the engineering discipline pays off most visibly. Because stories have explicit acceptance criteria, review is structured: does this implementation satisfy the criteria? The reviewer isn't starting from scratch trying to understand intent — the intent is documented. Issues found in review are either handled inline via PR comments for small fixes, or written as new stories for larger concerns.
The Architectural Rules Document. The CLAUDE.md file is the connective tissue of the whole system. It contains the technology stack, folder structure conventions, data access patterns, hard prohibitions, and the step-by-step workflow the executor follows for every story. It is updated as decisions are made — not a one-time setup document, but a living record of every architectural choice the project has accumulated. This document is what makes the executor's output consistent across dozens of stories and months of development.
What Was Built — and What That Proves
Duck Duck Jeep Tracker shipped 12 development phases over the course of the project. Each phase had a defined scope, a set of user stories, and a clear definition of done.
The phases tell the story of a real product maturing:
Phases 0–1 established the pipeline itself — the ADO scripts, the CLAUDE.md rails document, the branching conventions, and the first nine application stories covering UI components and core layout. By the end of the first sprint, the pipeline was running end to end. Nine stories that would have taken days each in traditional development were completed in a single session.
Phase 2 introduced architectural hardening that most side projects never bother with: a Repository Pattern abstracting all database access, a typed RepositoryResult<T> discriminated union replacing try/catch error handling, unit testing infrastructure, and a full brand identity system. This phase established the quality floor that every subsequent phase built on.
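For readers unfamiliar with the idea, here is one plausible shape for a RepositoryResult<T> discriminated union. The project's actual definition is not shown in this case study, so the ok/data/error field names, the Duck type, and the in-memory lookup are assumptions made for illustration.

```typescript
// One plausible shape for the union (assumed, not the project's definition).
type RepositoryResult<T> =
  | { ok: true; data: T }
  | { ok: false; error: string };

interface Duck {
  id: string;
  name: string;
}

// Hypothetical in-memory store standing in for a real database call.
const duckStore = new Map<string, Duck>([
  ["d1", { id: "d1", name: "Sir Quacks" }],
]);

// The repository returns a result value instead of throwing.
function findDuck(id: string): RepositoryResult<Duck> {
  const duck = duckStore.get(id);
  return duck
    ? { ok: true, data: duck }
    : { ok: false, error: `Duck ${id} not found` };
}

// Checking `ok` narrows the type, so `data` and `error` are only
// reachable on the branch where they exist.
function summarize(result: RepositoryResult<Duck>): string {
  return result.ok ? `Found ${result.data.name}` : `Failed: ${result.error}`;
}
```

The practical effect is that the compiler forces every caller to handle the failure branch, which a try/catch convention cannot guarantee.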
Phase 3 delivered the core duck lifecycle — the product's reason for existing. Ducks could be hatched, placed on Jeeps, found, and tracked. The QR code entry point, the geolocation pipeline, the color picker, the auto-generated duck names — all of it was designed and built within a single focused phase. The routing architecture decision from this phase is worth noting: a smart redirect that sent users straight to an action screen was rejected in favor of always showing the duck's journey page first. That's a product judgment call, made in the planning session, that improved the user experience. AI didn't make that call. Engineering discipline in the planning process made it possible to catch it before the code was written.
Phases 4–5 built the public-facing identity of the app — jeep profiles, public duck journeys, and a PDF sticker generation system that produced print-ready Avery label sheets with branded QR codes. Server-side PDF generation was chosen over client-side specifically because it produced a real URL the browser could GET — a decision that also anticipated future re-download needs without adding complexity.
Phase 6 is the one that separates projects that get to production from projects that don't. The entire phase was devoted to deployment infrastructure: a three-environment model (local Docker, TEST cloud, PROD cloud), a CI pipeline running type checks and tests on every PR, an automated TEST deployment triggered by the release/test branch, and a PowerShell migration script for database schema promotion. No user-facing features. Just the infrastructure to ship reliably. This phase also surfaced and resolved a significant RLS (Row Level Security) policy gap that had accumulated across the prior phases — a real security issue caught before anything was exposed to real users.
Phases 7–8 added community features — leaderboards backed by PostgreSQL RPC functions, and a full earned badge system with four categories, milestone thresholds, automatic showcase management, and badge artwork. The leaderboard phase also resolved a dashboard architecture problem: authenticated users without a jeep had been silently gated from community content. Fixing it required tracing the root cause to the login flow, not the dashboard component. That kind of diagnosis requires understanding the system, not just the symptoms.
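A milestone-threshold badge check can be quite small. The sketch below is illustrative only: the badge names and threshold values are invented, and the text says the real leaderboards run as PostgreSQL RPC functions, so this shows the idea rather than the project's implementation.

```typescript
interface Milestone {
  badge: string;
  threshold: number;
}

// Hypothetical milestones for a single badge category.
const ducksFoundMilestones: Milestone[] = [
  { badge: "First Find", threshold: 1 },
  { badge: "Duck Spotter", threshold: 10 },
  { badge: "Duck Hunter", threshold: 50 },
];

// Return every badge whose threshold the current count has reached.
function earnedBadges(count: number, milestones: Milestone[]): string[] {
  return milestones
    .filter((m) => count >= m.threshold)
    .map((m) => m.badge);
}
```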
Phase 9 wired up Stripe payment processing, not just for the donation flow that shipped, but as a reusable payment foundation explicitly designed to support future features. The table was named payments rather than donations because a future badge purchase is a payment, not a donation. A naming decision made in a planning session saved a schema migration later.
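To make the naming point concrete, here is a hypothetical record type for a generic payments table. The column names and the purpose values are assumptions, not the project's schema; the point is only that a new kind of payment becomes a new value rather than a new table.

```typescript
// Hypothetical purposes; a future feature adds a value here, not a table.
type PaymentPurpose = "donation" | "badge_purchase";

// Illustrative record shape, not the project's actual columns.
interface Payment {
  id: string;
  userId: string;
  amountCents: number;
  purpose: PaymentPurpose;
}

// Sum the amounts for one purpose across a list of payments.
function totalFor(payments: Payment[], purpose: PaymentPurpose): number {
  return payments
    .filter((p) => p.purpose === purpose)
    .reduce((sum, p) => sum + p.amountCents, 0);
}
```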
Phase 10 shipped the app to production. The PROD pipeline — migrations before deploy, release/prod merging from release/test rather than from main, so promotion always reflects what was actually tested — ran cleanly. The app went live at duckduckj.com. Post-launch bugs (two, both caught and fixed the same day) were tracked as stories, not handled ad hoc.
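The promotion rule can be expressed as a small guard. This is a sketch under stated assumptions: the text confirms that release/prod merges from release/test, but the main to release/test step and the function name are my guesses at how such a check might look, not the project's pipeline code.

```typescript
type Branch = "main" | "release/test" | "release/prod";

// Allowed source for each release branch.
// release/prod <- release/test comes from the text;
// release/test <- main is an assumption.
const allowedSource: Record<Exclude<Branch, "main">, Branch> = {
  "release/test": "main",
  "release/prod": "release/test",
};

// True only when `from` is the single permitted source for `to`.
function canPromote(from: Branch, to: Branch): boolean {
  if (to === "main") return false; // main is not a promotion target here
  return allowedSource[to] === from;
}
```

A check like this would reject a direct main to release/prod promotion, so production only ever receives what was deployed to TEST.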
Phase 11 addressed discoverability: SEO metadata infrastructure, a tradition page written for search intent around the Jeep duck community, and a real-time activity feed bar that shows live duck events to all visitors. The feed animation — a float-up/pause/fade pattern rather than a horizontal ticker — was a product decision made in the planning session. Small detail. Right call.
Phase 12 designed the premium membership tier: subscription infrastructure, multiple jeep support, and an AI-powered photo moderation and cartoonization pipeline using OpenAI's image APIs. This phase was scoped and storified entirely in planning — 31 user stories ready for execution — before a line of code was written.
The Human Role: Where Judgment Lives
The most important thing to understand about this system is what the human does — and what the human does not do.
The human does not write code in this workflow. But the human is not absent from the technical work. The opposite is true.
Every significant architectural decision in this project was made by a human, often by overriding or redirecting an AI recommendation. A few examples from the project:
When the executor followed CLAUDE.md rules about import patterns so faithfully that a story designed to establish those patterns had nothing left to do, the human noticed, asked about it, and understood the explanation. The story was still valid — the pattern had been correctly anticipated and pre-applied. A reviewer without that understanding might have flagged it as a bug.
When the three-environment database model was being designed, the human asked — before writing a single story — whether the planned PROD deployment approach would still be scriptable from ADO in a future phase. That question, asked early, prevented a design that would have painted the project into a corner by Phase 10.
When the payment table was named, the human caught that calling it donations was too narrow. That's not a technical catch. That's product thinking — recognizing that the data model should reflect the domain's future, not just its present.
When the support banner behavior was being designed, the AI proposed showing it once per session. The human said: keep it visible until the user dismisses it. The AI agreed it was a better call. One of them was thinking about conversion. One was thinking about honesty.
This is what the human's role looks like in an AI-augmented workflow. Not writing the code. Not even reviewing every line in detail. Maintaining judgment over the decisions that accumulate into architecture, and staying present enough to catch them when they arise.
Scope Discipline: The Skill That Multiplies Everything Else
Across 12 phases of development, the most consistent pattern in the planning sessions was not what was built. It was what was deliberately not built.
The Supabase local development environment was deferred from Phase 0 to Phase 6, where it actually belonged. Unit testing was introduced in Phase 2 rather than Phase 0, after the repository pattern was established and there was something worth testing. The monthly_winners database table was deferred from Phase 7 (leaderboards) to Phase 8 (badges), where its design could be informed by what the badge system actually needed. Payment infrastructure was given its own phase rather than being bundled with the badge system that would eventually depend on it.
None of these deferrals were failures. They were decisions. Each one kept the current phase coherent, prevented complexity bleed, and ensured that when the deferred work eventually ran, it was built with better information than it would have had earlier.
Scope discipline is the skill that makes a structured workflow faster, not slower. When a planning session produces clean, bounded stories, the executor makes fewer wrong turns. When a phase has a clear definition of done, review is faster and merges are cleaner. The time invested in saying "not this phase" is recovered many times over in the execution that follows.
This is also where engineering leadership experience has the most direct impact on AI-augmented development. An AI will happily help you plan more features. It takes a disciplined human to say: not yet.
What This Means for Your Team
Duck Duck Jeep Tracker was a solo project, but the process it demonstrates is not a solo process. It scales.
A team of five engineers running this workflow assigns the architect role to whoever owns the technical direction for a given sprint — usually a tech lead or principal engineer. The executor role distributes across the team. The PR review gate remains human. The work tracking system, the architectural rules document, and the story-as-contract between planning and execution all become shared infrastructure the whole team operates against.
A team of one — a solo founder, a consultant building a client's MVP, an engineering leader validating an idea before hiring — runs exactly the workflow described in this case study.
In both cases, the critical variables are the same:
Stories with real acceptance criteria. Not "add a payment page." A story that specifies what the payment page does, what it does not do, what the success state looks like, and what edge cases are explicitly out of scope. The AI executor is only as good as the story it is given. Vague stories produce vague code.
An architectural rules document that reflects real decisions. Not a boilerplate style guide. A living document that accumulates the actual choices the project has made — which patterns are required, which are prohibited, and why. Every rule in that document is a conversation the executor doesn't have to have again.
A human in the review seat who understands the system. PR review is not a rubber stamp. It is where the human's understanding of the whole system catches the things that look correct in isolation but create problems in context. This role cannot be delegated to another AI — at least not yet.
Discipline about scope. The planning session is where scope is set and where the temptation to add "just one more thing" is most dangerous. A planning process that produces clean, bounded work is the foundation everything else rests on.
The tools you use to implement this process are genuinely interchangeable. Azure DevOps, Jira, Linear — any work tracker with an API and a concept of story states and ordering. Supabase, AWS RDS, PlanetScale — any managed database that your migration tooling can target. Vercel, AWS Amplify, Azure Static Web Apps — any hosting platform your deploy scripts can reach. The process doesn't belong to any of them.
What the process does require is someone who has built software systems before — someone who can make the architectural calls, write the acceptance criteria with enough precision to be actionable, and review the output with enough understanding to catch what the AI missed. That person doesn't need to write every line of code. They need to own the system.
The Question Behind the Question
When I started this project, I said my biggest interest was whether I could build a pattern stable enough to develop applications consistently — maybe even as a primary business.
That question got answered. But the more interesting question it surfaced is one that matters to every engineering organization right now: what does engineering leadership actually mean in a world where code generation is largely solved?
The answer this project points to is: the same thing it always meant. Understanding systems deeply enough to make good decisions about them. Knowing what to build and what not to build. Reviewing output critically rather than optimistically. Keeping quality standards when the speed of execution makes it tempting to let them slip.
Those skills don't become less valuable when AI can write the code. They become the only thing that determines whether the code is any good.
What Good Systems Have in Common
Looking back across 12 phases of development, something becomes clear that wasn't explicitly planned at the outset. The decisions that held up best — the ones that made each subsequent phase faster and cleaner rather than slower and more tangled — all reflect the same underlying instincts about how good systems behave.
Change was cheap. The Repository Pattern meant that swapping or extending the database layer never touched a component. The typed RepositoryResult<T> meant that error handling worked the same way everywhere. Architectural decisions made in Phase 2 were still paying dividends in Phase 12. Systems that welcome change don't just survive — they compound.
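A minimal sketch of why the Repository Pattern keeps change cheap follows. Everything in it is illustrative: the DuckRepository interface, the in-memory class, and the result field names are assumptions, not the project's code. The caller depends only on the interface, so a database-backed implementation could replace the in-memory one without the caller changing.

```typescript
// Assumed result shape; the project's actual definition is not shown.
type RepositoryResult<T> =
  | { ok: true; data: T }
  | { ok: false; error: string };

interface Duck {
  id: string;
  name: string;
}

// Callers depend on this interface, never on a database client directly.
interface DuckRepository {
  getById(id: string): Promise<RepositoryResult<Duck>>;
  save(duck: Duck): Promise<RepositoryResult<Duck>>;
}

// In-memory implementation; a database-backed class could implement
// the same interface without any caller changing.
class InMemoryDuckRepository implements DuckRepository {
  private readonly ducks = new Map<string, Duck>();

  async getById(id: string): Promise<RepositoryResult<Duck>> {
    const duck = this.ducks.get(id);
    return duck
      ? { ok: true, data: duck }
      : { ok: false, error: `Duck ${id} not found` };
  }

  async save(duck: Duck): Promise<RepositoryResult<Duck>> {
    this.ducks.set(duck.id, duck);
    return { ok: true, data: duck };
  }
}

// Caller code sees only the interface and the result type.
async function duckDisplayName(repo: DuckRepository, id: string): Promise<string> {
  const result = await repo.getById(id);
  return result.ok ? result.data.name : "Unknown duck";
}
```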
Progress came from small steps. Two hundred user stories, each one a bounded, reviewable unit of work. No phase tried to do everything. Each one did one thing well and handed off cleanly to the next. Complexity that would have been overwhelming as a single effort became manageable as a sequence of small, measurable increments.
Direction mattered more than detailed plans. The CLAUDE.md architectural rules document wasn't a specification for every feature. It was a statement of direction — the patterns, principles, and hard limits the system would be held to. Within that direction, the executor had enormous latitude. Without it, the output would have drifted phase by phase into something no one could maintain.
If it didn't move the project forward, it didn't belong. The scope discipline section of this case study is really just this principle applied consistently. Every deferral, every explicit "not this phase," was the same judgment: does this move us forward right now, or does it slow us down to do something we could do better later?
The next step was always obvious. Not because it was planned in advance — the phase map evolved throughout the project. Payment infrastructure earned its own phase once the badge system revealed it was a prerequisite. A growth phase emerged after go-live because shipping to production made the gaps in discoverability impossible to ignore. The inline story format appeared when a class of small, targeted fixes didn't fit cleanly into the sprint model. None of these were scheduled. They surfaced because the process was moving in a clear direction, and a system with clear direction tends to show you what it needs next. That's what this principle is really about — not having a complete plan, but building with enough discipline and momentum that the right next move becomes hard to miss.
Predictability came from cadence, not estimation. No phase had a deadline. Each phase had a definition of done. The question was never "when will this be finished?" It was "is this phase complete?" That shift — from time-boxing to scope-boxing — is what made the delivery rhythm sustainable across a year of development.
Go-live was not the finish line. Phase 10 shipped the app to production. Phase 11 started the growth work. Phase 12 is designing the premium tier. The system was built to evolve, and it is evolving. Production is where the real learning begins.
These aren't observations unique to this project. They're the principles that show up in every well-run engineering system — the ones that make teams fast, output maintainable, and delivery predictable. The reason this project surfaced them so cleanly is that AI-augmented development, done with discipline, makes good and bad system design unusually visible. Good decisions multiply. Bad ones surface quickly. There's nowhere to hide.
Working With Iter8
Iter8 IT Consulting exists to help engineering teams build the systems and practices that make delivery predictable. AI-augmented development is one of the most significant workflow changes engineering teams are navigating right now — and most teams are navigating it without a map.
We've walked this path. We know where the discipline pays off and where teams typically stumble. If your organization is trying to figure out how to adopt AI development tools without sacrificing the engineering standards your customers depend on, that's exactly the work we do.
The hero of this story isn't the tooling. It's the team that learns to use it well. We'd be glad to help yours get there.
Reach us at iter8itconsulting.com
Duck Duck Jeep Tracker is live at duckduckj.com. Adam Goss is the founder of Iter8 IT Consulting, LLC, an engineering leadership and SDLC optimization practice based in Indiana.

