Spec-Driven Development: Let AI Read the Boring Stuff For You

A senior architect at a healthcare tech company stared at her third cup of coffee, wondering if she'd somehow developed the ability to read 47 pages of FHIR specification backwards. It was 2 AM, and she was trying to figure out why her team's patient data integration kept failing validation. Somewhere between "Resource.meta.versionId" and "Bundle.entry.resource.identifier.system," she had an epiphany: she was doing this completely wrong.
The problem is that specifications are genuinely hard to work with at scale. The deeper issue is that we've been treating specs like reference manuals we consult when things break, rather than living documents that should guide development from day one. We read them when we're stuck, skim them when we're confident, and miss them entirely when we're in a hurry. Meanwhile, AI assistants can ingest thousands of pages of technical documentation without needing coffee breaks or developing eye strain.
A Thoughtworks blog post published in December 2025 called spec-driven development one of the most important practices to emerge in 2025, and the practice appeared in the Assess ring of the Thoughtworks Technology Radar Volume 33. The industry has converged on a clear definition: spec-driven development (SDD) is a paradigm that uses well-crafted software requirement specifications as prompts, aided by AI coding agents, to generate executable code. Where "vibe coding" lets the AI drive the design and accepts whatever emerges, SDD keeps humans firmly in control of design and delegates only the implementation to the agent.
Wait, Aren't We Already Using Specs?
Think of traditional development like building a house by looking at photos of finished homes and trying to reverse-engineer the blueprints. You'll eventually figure out that walls need studs and roofs need trusses, but you're going to make a lot of expensive mistakes along the way. Spec-driven development with AI is like having an architect who's memorized every building code, can instantly recall exact requirements for load-bearing walls, and never gets tired of answering questions about plumbing regulations.
Most development teams interact with specifications in three painful ways:
- The Crisis Reader: Only opens the spec when something's broken. Searches frantically for error codes, tries to match cryptic validation messages to documentation, and usually ends up on Stack Overflow anyway.
- The Optimistic Skimmer: Reads the introduction and conclusion, maybe glances at a few examples, assumes the rest is "probably fine." Ships code that technically works but violates half the spec's best practices.
- The Martyr: Actually reads the entire specification, takes detailed notes, creates comprehensive documentation, and burns out three weeks later when the spec gets updated and invalidates half their work.
The core insight from SDD practitioners is that most AI coding failures happen in the first few seconds, when a developer gives a vague prompt and an eager agent builds in the wrong direction. AI doesn't replace the need to understand specifications. It transforms how we engage with them throughout the development lifecycle.
What Makes a Specification "Spec-Driven Ready"
Before diving into workflow, it is worth being precise about what SDD actually means by a "specification." It is not a requirements document, a Jira ticket, or a design doc. According to Augment Code's practitioner guide, a spec is a contract between you and the AI agent. It defines what should be built, what should not be built, and how the work will be verified.
Every effective specification needs six elements:
- Outcomes: Concrete, measurable results. "User receives verification email and logs in without error" beats "User can log in."
- Scope boundaries: Both what is in scope and, critically, what is explicitly out of scope.
- Constraints and assumptions: Tech stack decisions, API versions, security requirements.
- Prior decisions: Already-chosen database schemas, libraries, or architectural patterns the AI must respect.
- Task breakdown: Discrete, ordered sub-tasks that can be executed and verified independently.
- Verification criteria: Acceptance criteria and edge cases the implementation must satisfy.
The specificity bar is higher than most teams expect. "Implement JWT validation" is not a spec. "Implement JWT validation according to RFC 7519, specifically sections 4.1.4 through 4.1.7 for registered claim validation, with a 24-hour token expiry and 5-attempt-per-minute rate limiting" is a spec.
As a rough sizing guide: a basic function needs 100-200 words of specification, an API endpoint needs 300-500, a module needs 500-800, and a system architecture needs 1,000-2,000.
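Put together, a minimal spec that hits all six elements might look like the sketch below. The headings and the verification-email example are illustrative, not a required format:

```markdown
# Email verification at sign-up

## Outcome
User receives a verification email within 60 seconds and logs in without error.

## Scope
In scope: the sign-up flow. Out of scope: SSO accounts, resending expired links.

## Constraints and assumptions
.NET 8; tokens expire after 24 hours; no new third-party dependencies.

## Prior decisions
Use the existing users table; verification state is a column, not a new table.

## Tasks
1. Generate a single-use verification token.
2. Send the verification email.
3. Validate and consume the token on first login.

## Verification criteria
An expired token is rejected; a consumed token cannot be reused;
a valid token logs the user in exactly once.
```

At roughly 100 words, this sits at the bottom of the sizing range, which is appropriate for a single endpoint-sized task.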
The New Development Paradigm
The SDD workflow inverts the traditional approach. Instead of treating your AI assistant as a fancy autocomplete, you use it as a specification interpreter that sits between intent and implementation.
Before: The Traditional Approach
Without a spec as the anchor, development becomes a loop of assumptions and corrections. The team writes code based on what they think the requirements mean, discovers the gaps through testing, consults the specification too late, and rewrites. Each cycle burns time without advancing the design.
After: Spec-Driven AI Development
With SDD, the spec is written and reviewed before any code is generated. The AI works from that agreed source of truth, validates its own output against it, and the CI/CD pipeline enforces compliance as a final safety net. The spec stays alive alongside the code rather than drifting away from it.
The difference goes beyond efficiency. This is a complete shift in where authority lives in the development process. Specs become the source of truth; code becomes a generated artifact that must satisfy them.
The Three Levels of Commitment
Birgitta Böckeler, Distinguished Engineer at Thoughtworks, identified three distinct levels of how teams adopt SDD in an October 2025 article on martinfowler.com:
- Spec-first: Write the spec, use it to guide development, then discard it. The spec is a planning tool, not an ongoing artifact.
- Spec-anchored: Maintain specs alongside code as the system evolves. Changes to code require corresponding spec updates. This is where most teams land after initial adoption.
- Spec-as-source: Specs become the primary maintained artifact; code is generated and not manually edited. Tessl pioneered this approach, and by 2026 tools like Intent are pursuing similar positions, though all of them echo earlier Model-Driven Development patterns with their associated trade-offs. Tessl's framework remains in closed beta as of mid-2026.
Most architects should start at spec-first to build the habit, then migrate to spec-anchored once the workflow is established.
The Four Pillars of Spec-Driven AI Development
SDD is built on four mutually reinforcing practices. Each pillar is useful on its own, but the real leverage comes from applying all four together.
Pillar 1: Context Engineering
Your AI assistant is only as good as the context you provide. This is about strategic context loading, not dumping a 500-page PDF into a chat window and hoping for magic.
- Specification chunking: Break large specs into logical sections. If you're implementing OAuth 2.0, you don't need the entire RFC loaded when you're working on token refresh. Load the relevant sections, keep them in context, and swap them out as needed. Modern AI models with 200K+ token context windows can process comprehensive specifications, but selective loading still produces better results.
- Version pinning: Always specify which version of the spec you're working with. "Implement JWT validation" is vague. "Implement JWT validation according to RFC 7519, specifically sections 4.1.4 through 4.1.7 for registered claim validation" gives your AI assistant something concrete to work with. This matters especially now that RFC 9700 (published January 2025 as BCP 240) has formalized OAuth 2.0 security best practices, including making PKCE required for all client types and deprecating the Implicit Grant and Resource Owner Password Credentials flows as insecure.
- Related standards: Specs rarely exist in isolation. If you're implementing FHIR, you're probably also dealing with OAuth 2.0, HL7 v2, or SMART on FHIR. Load the intersection points between specs to catch integration gotchas early.
- MCP servers for live documentation: One of the most significant developments in 2025 has been the rise of Model Context Protocol (MCP) servers that give AI agents real-time access to structured documentation. Tools like Context7 (developed by Upstash) pull current library and framework documentation directly into your AI's context, preventing hallucinated API calls against outdated versions. Context7 indexes over 9,000 libraries and integrates with Cursor, VS Code, Claude, and Windsurf. For codebases that depend on rapidly evolving APIs and standards, this is no longer optional.
- Steering rules and AGENTS.md: Before your AI generates any code, establish non-negotiable constraints in a steering rules file or AGENTS.md. Security policies, architectural constraints, coding standards, and compliance requirements belong here. These constraints propagate to every generation step without requiring you to repeat them in every prompt.
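To make the MCP point concrete: most MCP-capable clients register servers through a JSON entry like the one below. The mcpServers shape is the common client configuration format, and @upstash/context7-mcp is Upstash's published package name; treat both as details to verify against your specific client's documentation.

```json
{
  "mcpServers": {
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"]
    }
  }
}
```

With this in place, the agent can resolve current library documentation at generation time instead of relying on training data.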
Pillar 2: Prompt Engineering for Specifications
Generic prompts produce generic code. Spec-driven prompts produce compliant implementations:
| Generic Prompt | Spec-Driven Prompt | Why It Matters |
|---|---|---|
| "Create a REST API" | "Create a REST API following OpenAPI 3.1 spec, implementing HAL for hypermedia controls as defined in draft-kelly-json-hal-08" | Produces self-documenting, standards-compliant APIs |
| "Add authentication" | "Implement OAuth 2.0 authorization code flow per RFC 6749 section 4.1, with PKCE extension per RFC 7636, and mandatory security requirements per RFC 9700" | Ensures security best practices and interoperability |
| "Parse this XML" | "Parse HL7 CDA R2 documents according to the consolidated CDA implementation guide, validating against templateId requirements" | Catches validation errors before they reach production |
The pattern is specificity: point your AI at exact sections, reference specific requirements, and ask for validation against stated constraints.
One advanced technique worth adopting is the adversarial agent pattern. Use a three-role structure: a coordinator that breaks down the specification and delegates to implementors, implementors that execute focused tasks from sub-specifications, and a verifier that independently checks output against criteria. The separation matters because implementing agents reliably miss their own errors through self-verification bias.
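A minimal sketch of the three roles as prompt templates; the wording and the specs/feature.md path are illustrative assumptions, not a fixed protocol:

```text
Coordinator: "Read specs/feature.md. Split it into ordered sub-tasks, each with
  its own verification criteria. Do not implement anything."
Implementor: "Implement sub-task 2 only, per its sub-specification.
  Stop when its verification criteria are met."
Verifier:    "You did not write this code. Check it against sub-task 2's
  verification criteria and the constraints in AGENTS.md. List every
  violation; do not fix anything."
```

The verifier's opening line matters: framing it as an independent reviewer, run in a fresh session, is what breaks the self-verification bias.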
Pillar 3: Validation Loops
AI can hallucinate, and specifications can be ambiguous. The solution is continuous, layered validation.
- Requirement extraction: Before writing any code, ask your AI to extract the specific requirements from the relevant spec sections and cite each one. Review this list against your own understanding. If the AI misinterprets something, you catch it before it becomes code.
- Compliance checking: After generating implementation code, ask the AI to review its own output against the specification. The AI in "generation mode" and "review mode" uses meaningfully different reasoning patterns. This is not redundant; it is a different pass.
- Edge case discovery: Specifications often include subtle edge cases buried in normative language. Ask your AI: "What edge cases does RFC XXXX section Y.Z identify that this implementation needs to handle?" You will be surprised what surfaces.
- CI/CD enforcement: Thoughtworks researchers are emphatic on this point: spec drift and hallucinations are inherently difficult to avoid, so highly deterministic CI/CD practices are required as the final safety net. Unit tests catch functional bugs but miss architectural violations. Your pipeline should include SAST tools, dependency scanning, secrets detection, and spec compliance checks. This is the layer that enforces what the AI promised.
Research from Yan et al. (2025, George Mason University) shows that LLMs generate vulnerable code at rates ranging from 9.8% to 42.1% across two security benchmarks (SecCodePLT and SecurityEval), which makes automated security validation in the pipeline non-negotiable, not optional.
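As a sketch of what that deterministic layer can look like in GitHub Actions (the action versions and the scripts/check_spec_refs.py compliance script are assumptions to adapt, not a canonical pipeline):

```yaml
# Deterministic safety net: runs on every PR, independent of the AI's self-review
name: spec-compliance
on: pull_request
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: gitleaks/gitleaks-action@v2          # secrets detection
      - uses: actions/dependency-review-action@v4  # vulnerable-dependency scan
      - run: dotnet test                           # functional and constraint tests
      - run: python scripts/check_spec_refs.py     # hypothetical spec-compliance check
```

The point is not this exact toolset but that every gate here is deterministic: the same code either passes or fails, regardless of what the AI believed about it.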
Pillar 4: Living Documentation
Spec-driven development with AI turns documentation from a separate chore into a natural byproduct of the development process.
- Traceability: Every implementation decision can be traced back to a specific section of a specification. Your AI can generate comments and documentation that include these references automatically. A comment like "// FHIR R4 Patient constraint pat-1" is not just documentation; it is a traceable link between implementation and the standard that requires it.
- Change impact analysis: When a specification updates, ask your AI to analyze which parts of your codebase are affected. "Compare our current JWT implementation against the changes between RFC 7519 and the proposed updates in draft-ietf-oauth-jwt-bcp" becomes a structured conversation rather than a research project. This capability alone justifies the SDD investment for teams maintaining compliance-sensitive systems.
- Onboarding acceleration: New team members can ask why things are implemented certain ways and get answers that reference actual specifications, not just "that's how we've always done it." The specifications become institutional knowledge made queryable.
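Traceability comments also make spec-code alignment mechanically checkable. A minimal sketch in Python, where the constraint-code regex and the inline file contents are illustrative assumptions rather than a standard tool:

```python
import re

# Constraint codes like "pat-1" or "inv-1": lowercase letters, a hyphen, digits.
# Pattern and file contents below are illustrative assumptions.
CODE_PATTERN = re.compile(r"\b([a-z]+-\d+)\b")

def constraint_codes(text: str) -> set[str]:
    """Collect every constraint code mentioned in a spec or source file."""
    return set(CODE_PATTERN.findall(text))

spec = "Tasks: implement pat-1 and inv-1 against FHIR R4."
source = (
    "// pat-1: at least one of identifier, name, telecom, address\n"
    "// pat-2: not in the spec, should be flagged"
)

# Codes the code claims to implement that the spec never asks for
unspecified = constraint_codes(source) - constraint_codes(spec)
# Codes the spec requires that nothing in the code references
unimplemented = constraint_codes(spec) - constraint_codes(source)

print(sorted(unspecified), sorted(unimplemented))  # ['pat-2'] ['inv-1']
```

Run against real spec and source directories, a check like this turns "have we drifted from the spec?" into a CI failure instead of a code-review guess.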
Practical Implementation
Let's make this concrete with a real-world scenario: implementing FHIR patient resource validation using a proper SDD workflow. The steps below use Claude Code as the AI tool, but the same pattern applies to Cursor, GitHub Copilot, or any other agentic coding assistant.
Prerequisites: a project directory open in your terminal, Claude Code installed (npm install -g @anthropic-ai/claude-code), and a .NET project scaffolded (dotnet new classlib -n FhirValidation).
Step 1: Establish Steering Rules
Create an AGENTS.md file in the root of your project. This file is read automatically by Claude Code at the start of every session and encodes your non-negotiable constraints so you never have to repeat them in prompts.
touch AGENTS.md
Add the following content:
# Architecture constraints for FHIR integration
- All FHIR resources must validate against R4 specification (not STU3 or R5)
- Validation errors must include the constraint code, human-readable message, and FHIRPath expression
- Terminology bindings: treat "required" bindings as hard failures, "extensible" as warnings
- Do not use third-party FHIR validation libraries; implement against the spec directly
- All public methods must have corresponding unit tests
If you use Cursor, put the same content in .cursorrules. In Kiro, add it as a Steering Rule via the IDE sidebar.
Step 2: Write the Specification
Create a specs/ directory and write the task specification before opening your AI tool. This is the document your AI will treat as its source of truth.
mkdir specs
touch specs/fhir-patient-validation.md
Populate specs/fhir-patient-validation.md:
# FHIR R4 Patient Resource Validation
## Outcome
A FhirPatientValidator class that validates Patient resources against the FHIR R4
specification and returns structured errors with constraint codes, messages, and paths.
## Scope
In scope: structural constraints (pat-1), Identifier data type constraints (inv-1),
required terminology binding validation for Identifier.use, Patient.gender, HumanName.use.
Out of scope: network calls, FHIR server interaction, STU3/R5 compatibility,
third-party library usage.
## Constraints
- Target: .NET 8, C# 12
- Error format: ValidationError record with Code (string), Message (string), Path (string)
- ValidationResult must expose IsValid (bool) and Errors (List<ValidationError>)
- Use a single error code "terminology" for all terminology binding violations; include the FHIRPath in the Path field to distinguish them (e.g. "Identifier[0].use", "Patient.gender")
- Patient.gender is optional; only validate its value if one is present — a missing gender is not an error
## Tasks
1. Implement pat-1 constraint: Patient must have at least one of identifier, name,
telecom, or address
2. Implement inv-1 constraint: If Identifier.Value is present, Identifier.System must
be present
3. Validate Identifier.use against the required IdentifierUse value set
(usual | official | temp | secondary | old)
4. Validate Patient.gender against AdministrativeGender value set
(male | female | other | unknown)
5. Validate HumanName.use against the required NameUse value set
(usual | official | temp | nickname | anonymous | old | maiden)
## Verification criteria
- A Patient with no identifier, name, telecom, or address fails with code "pat-1"
- An Identifier with a value but no system fails with code "inv-1"
- An invalid Identifier.use value fails with code "terminology"
- An invalid HumanName.use value fails with code "terminology"
- A valid Patient with name only passes validation
- Each test name includes the constraint code it covers
Step 3: Extract and Review Requirements
Open Claude Code in your project directory:
claude
Give it this prompt, referencing your spec file rather than pasting its contents:
Read specs/fhir-patient-validation.md and AGENTS.md. Before writing any code,
extract all the validation rules we need to implement and list them with their
constraint codes. Do not generate any implementation yet.
Claude Code will return a structured requirements list. Review it carefully against your own reading of the spec. This human checkpoint is where you catch misunderstandings before they become code. If anything looks wrong, correct it in the spec file and re-run the prompt.
Expected output will look something like:
Validation rules to implement:
1. pat-1 — Patient must have at least one of: identifier, name, telecom, address
Source: specs/fhir-patient-validation.md, Structural constraints
2. inv-1 — If Identifier.Value is provided, Identifier.System must also be provided
Source: specs/fhir-patient-validation.md, Identifier constraints
3. terminology/Identifier.use — Value must be one of: usual, official, temp, secondary, old
Source: specs/fhir-patient-validation.md, Required terminology bindings
4. terminology/Patient.gender — Value must be one of: male, female, other, unknown
Source: specs/fhir-patient-validation.md, Required terminology bindings
5. terminology/HumanName.use — Value must be one of: usual, official, temp, nickname, anonymous, old, maiden
Source: specs/fhir-patient-validation.md, Required terminology bindings
Only move to the next step once you are satisfied this list is complete and correct.
Step 4: Generate Implementation
With requirements confirmed, ask Claude Code to generate the implementation. Keep the prompt tight and spec-referenced:
Implement the FhirPatientValidator class based on specs/fhir-patient-validation.md
and the requirements we just extracted. Follow all constraints in AGENTS.md.
Include constraint code references in comments. Write the output to
FhirValidation/FhirPatientValidator.cs.
Claude Code will create the file. Open it and check that the comments trace back to specific constraints, like this:
// pat-1: Patient must have at least one of identifier, name, telecom, or address
private static void ValidatePat1(FhirPatient patient, ValidationResult result)
{
if (patient.Identifier.Count == 0 &&
patient.Name.Count == 0 &&
patient.Telecom.Count == 0 &&
patient.Address.Count == 0)
{
result.Errors.Add(new ValidationError(
"pat-1",
"Patient must have at least one of: identifier, name, telecom, or address.",
"Patient"));
}
}
These comments serve a purpose beyond documentation: they provide traceability between implementation and the standard that requires it.
Step 5: Validate the Implementation
Start a fresh prompt — this is the verifier pass. Switching context forces a different reasoning mode than generation:
Review FhirValidation/FhirPatientValidator.cs against specs/fhir-patient-validation.md.
Check:
1. Are all five rules from our requirements list implemented?
2. Does the error format match the spec (Code, Message, Path)?
3. Are there edge cases in the spec that are not handled?
4. Does the code comply with all constraints in AGENTS.md?
List any gaps. Do not generate fixes yet.
Read the gaps list and decide which to address before continuing.
Step 6: Generate Test Cases from the Spec
The specification defines the ground truth for tests, so generate them from it directly:
Based on specs/fhir-patient-validation.md and the validated implementation, generate
NUnit test cases covering:
1. Each constraint violation (one test per constraint code)
2. A valid Patient that passes all rules
3. Boundary conditions for terminology bindings
Include the constraint code in each test method name. Write tests to
FhirValidation.Tests/FhirPatientValidatorTests.cs.
The result will include tests that tie directly back to spec constraints:
[Test]
public void Pat1_EmptyPatient_FailsWithPat1()
{
var result = _validator.Validate(new FhirPatient());
Assert.That(result.IsValid, Is.False);
Assert.That(result.Errors, Has.One.Matches<ValidationError>(e => e.Code == "pat-1"));
}
[Test]
public void Pat1_PatientWithIdentifierOnly_NoPat1Error()
{
var patient = new FhirPatient
{
Identifier = [new FhirIdentifier { System = "http://example.com", Value = "1" }]
};
Assert.That(_validator.Validate(patient).Errors,
Has.None.Matches<ValidationError>(e => e.Code == "pat-1"));
}
[Test]
public void Pat1_PatientWithNameOnly_NoPat1Error()
{
var patient = new FhirPatient { Name = [new HumanName()] };
Assert.That(_validator.Validate(patient).Errors,
Has.None.Matches<ValidationError>(e => e.Code == "pat-1"));
}
[Test]
public void Pat1_PatientWithTelecomOnly_NoPat1Error()
{
var patient = new FhirPatient { Telecom = ["+1-555-0100"] };
Assert.That(_validator.Validate(patient).Errors,
Has.None.Matches<ValidationError>(e => e.Code == "pat-1"));
}
[Test]
public void Pat1_PatientWithAddressOnly_NoPat1Error()
{
var patient = new FhirPatient { Address = ["1 Example Road"] };
Assert.That(_validator.Validate(patient).Errors,
Has.None.Matches<ValidationError>(e => e.Code == "pat-1"));
}
Run the tests to confirm:
dotnet test
If any fail, feed the failure output back to Claude Code and ask it to fix the implementation against the spec, not against the test.
The Plot Twist: When Specs Contradict Reality
Specifications are written by humans, which means they're sometimes ambiguous, occasionally contradictory, and rarely updated as fast as the real world moves. This is where AI becomes particularly valuable as a navigation tool.
When you encounter ambiguity, use your AI assistant to:
- Compare interpretations: "Section 4.2 says X, but section 6.1 seems to imply Y. How do these reconcile?" The AI can often spot cross-references you missed or identify which section takes precedence.
- Check implementation patterns: "How have other implementations interpreted this requirement?" While AI shouldn't replace reading actual implementation guides, it can surface common patterns and known gotchas.
- Document decisions: When you make an interpretation call, have the AI help you document it formally: "Generate an Architecture Decision Record explaining why we interpreted FHIR constraint pat-1 to mean X rather than Y, including specification references." This is the kind of institutional knowledge that survives team turnover.
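The output of that prompt can land in a standard lightweight ADR file. The structure below follows the widely used Nygard ADR format; the file number and decision content are illustrative:

```markdown
# ADR-012: Interpretation of FHIR R4 constraint pat-1

## Status
Accepted

## Context
pat-1 requires at least one of identifier, name, telecom, or address. The spec
does not say whether an empty collection counts as "present".

## Decision
Treat an empty collection as absent: a Patient whose Name list is empty and
that has no other qualifying element fails pat-1.

## Consequences
Validation is stricter than some reference implementations. The verification
criteria in specs/fhir-patient-validation.md encode this interpretation.
```

Because the ADR cites the constraint code, the interpretation stays discoverable from the same traceability comments the code already carries.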
One honest warning from the field: AI agents frequently ignore instructions or over-interpret specifications, especially in long sessions. Birgitta Böckeler at Thoughtworks flagged this as a persistent challenge: despite detailed specs, agents sometimes decide they know better. This is not a reason to avoid SDD; it is a reason to invest in the validation loop and CI/CD enforcement that catches deviations before they reach production.
Tooling Landscape in 2026
The SDD tooling ecosystem has matured considerably. The main options fall into three categories:
- AI-native IDEs with built-in SDD workflows: Amazon Kiro (launched July 2025) is built on VS Code's open-source core and guides teams through a three-document workflow of requirements.md, design.md, and tasks.md, with requirements written in EARS notation and persistent Steering Rules for non-negotiable constraints. GitHub Spec Kit (open source, MIT, v0.8.4 with 92,000+ stars as of May 2026) provides a four-phase Specify, Plan, Tasks, Implement workflow and supports 30+ AI coding agents including Copilot, Claude Code, Gemini CLI, Cursor, and many others. Both appear in the Thoughtworks Technology Radar Assess ring and are strong choices for greenfield projects.
- Agentic coding tools requiring manual discipline: Cursor and Claude Code are powerful but require you to impose SDD structure yourself through rules files, AGENTS.md, and prompt discipline. The upside is flexibility. The downside is that the structure lives in your habits, not in the tool.
- MCP-based workflow servers: A growing number of MCP servers now provide spec-driven workflow tooling that integrates directly into your AI's context. The spec-workflow-mcp project provides structured SDD workflow tools (Requirements, Design, Tasks phases) with a real-time web dashboard and VS Code sidebar extension. The community has also produced OpenSpec, a lightweight agent-agnostic framework specifically designed for brownfield iteration, and BMAD-METHOD, which orchestrates multiple specialised AI agent personas across the full SDLC.
For most architecture teams, the practical advice is: start with GitHub Spec Kit or Kiro to learn the workflow, then graduate to Claude Code or Cursor once the discipline is internalized.
Honest Limitations
No architecture blog post about a new practice should skip the honest assessment. SDD has real limitations:
- Specification overhead: Expect 20-40% more upfront time writing and reviewing specifications compared to jumping straight to implementation. This investment pays off over a sprint or two, but the break-even point is real. Industry benchmarks suggest 2-3 hours of weekly time savings per developer once established, reaching 6+ hours for top users.
- Brownfield friction: SDD works best on greenfield projects and new features. Retrofitting specifications onto existing code is possible but significantly more difficult. Legacy modernization is a viable use case, but plan for the additional effort.
- Exploratory work resists specification: Genuinely exploratory work, proof-of-concepts, and rapidly evolving requirements fight against the structure SDD requires. The hybrid approach makes sense: use SDD for production systems and team collaboration, use conversational prompting for exploration and prototyping.
- Spec drift is real: When code and specs diverge, you lose most of the benefit. Tools like Intent address this with living, bidirectional specifications that update continuously as agents implement changes. For teams using simpler tools, maintaining spec-code alignment requires deliberate process discipline that tooling alone cannot substitute.
Wrapping Up the Spec-Driven Journey
Spec-driven development with AI is about amplifying your ability to work with complex specifications effectively, not replacing human judgment:
- Front-load specification writing with outcomes, scope boundaries, constraints, task breakdown, and verification criteria before any code generation begins
- Establish steering rules in AGENTS.md or equivalent to encode security policies, architectural constraints, and compliance requirements once rather than repeating them in every prompt
- Use MCP servers to give your AI real-time access to current specification versions rather than relying on potentially stale training data
- Build traceability into your code by having AI generate comments and documentation that reference specific spec sections and constraint codes
- Enforce compliance in CI/CD because AI hallucinations and spec drift require a deterministic safety net beyond the AI's own self-review
- Start spec-first, grow to spec-anchored rather than attempting spec-as-source before the team has the workflow discipline
The next time you're staring at a 200-page specification at 2 AM, ask yourself: should you really be reading this alone? Or should you have an AI assistant that's already processed it, can answer precise questions about it, and will flag when your code drifts from what it requires?
What specifications are currently slowing down your team's development? How would your architecture decisions change if you could instantly validate them against relevant standards? And more importantly, what are you going to do with all that time you're not spending squinting at RFC documents?