
Implement Phase - AI-Driven Code Generation and Validation

This is it: Implementation. Everything you've done so far (specification, planning, tasking) leads to this moment.

/sp.implement orchestrates AI code generation. The agent generates code; you review it, validate it against the acceptance criteria, and commit. Then you move on to the next task.

This lesson teaches two critical skills:

  1. Code validation - How to review AI-generated code
  2. PHR auto-creation - Understanding automatic documentation of AI collaboration

What Does /sp.implement Do?

The Implement Command

/sp.implement analyzes your tasks and generates:

  • Code implementing each task
  • Tests validating the code
  • Documentation (docstrings, comments)

It works task-by-task, respecting your checkpoint pattern.

How Implementation Works

Input: Your specifications, plans, tasks

Agent's Process:

  1. Read spec, plan, and current task
  2. Generate code matching the specification
  3. Include type hints, docstrings, error handling
  4. Generate tests verifying acceptance criteria
  5. Output code + tests, ready for human review

Your Process:

  1. Review generated code
  2. Understand what it does
  3. Verify acceptance criteria
  4. Approve or request changes
  5. Ask the agent to commit to git
  6. Tell the agent: "Next task"
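
To make this concrete, here is a minimal sketch of the kind of code the agent might produce for the "add" task, assuming the calculator/operations.py layout referenced later in this lesson; your actual output will differ:

# calculator/operations.py (illustrative sketch, not real agent output)

def add(a: float, b: float) -> float:
    """Return the sum of a and b.

    Args:
        a: First operand.
        b: Second operand.

    Returns:
        The sum of a and b as a float.
    """
    return float(a + b)

The sketch shows the Constitution standards you will check in review: type hints on every parameter and return value, a docstring, and behavior that maps directly to the spec.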

💬 AI Colearning Prompt

"Why is the 5-step validation protocol systematic (Read → Check Spec → Run Tests → Manual Test → Approve) instead of just 'run tests and commit'? What issues does this prevent?"


The Validation Protocol

Validation is NOT just "does it work?" It's systematic verification against your specification.

The 5-Step Validation Process

Step 1: Read and Understand

Read the generated code without running anything:

  • Do you understand what it does?
  • Does it follow your Constitution (type hints, docstrings)?
  • Is the logic clear or does it seem hacky?

RED FLAG: If you don't understand the code, don't approve it. Ask the agent to explain or simplify.
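
For example, both functions below compute a subtraction (a hypothetical comparison, not code from your project), but only the second is easy to verify at a glance:

# Hard to review: clever but opaque -- ask the agent to simplify
def subtract(a: float, b: float) -> float:
    return sum((a, -b)) * 1.0

# Easy to review: does exactly what the spec says
def subtract(a: float, b: float) -> float:
    """Return the difference a - b."""
    return a - b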

Step 2: Check Against Specification

Compare code to your specification:

  • Does it do what the spec says?
  • Does it handle the edge cases you specified?
  • Does it match the error handling strategy (exceptions)?

RED FLAG: If code does something the spec doesn't mention, question it.
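
For instance, if your spec never mentions rounding, a line like this (hypothetical) is exactly the kind of unrequested behavior to question:

def add(a: float, b: float) -> float:
    """Return the sum of a and b."""
    return round(a + b, 2)  # rounding to 2 decimal places is not in the spec -- ask why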

Step 3: Run Acceptance Criteria Tests

Run the generated tests:

  • All tests pass?
  • Coverage adequate?
  • Edge cases included?

RED FLAG: Any failing tests = don't approve. Agent fixes and retries.
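
Acceptance criteria should map directly to test cases. A sketch of what the generated tests for add() might look like, assuming pytest and the tests/test_operations.py layout used elsewhere in this lesson:

# tests/test_operations.py (illustrative sketch)
import pytest

from calculator.operations import add


def test_add_positive_numbers() -> None:
    assert add(5, 3) == 8.0


def test_add_negative_and_positive() -> None:
    assert add(-2, 5) == 3.0


def test_add_floats_uses_approximate_comparison() -> None:
    assert add(0.1, 0.2) == pytest.approx(0.3)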

Step 4: Manual Testing (Optional)

Optionally run the code yourself and confirm the behavior matches your specification before approving.
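
A minimal example of what this can look like, assuming the calculator package from this lesson (a quick interactive check, not a substitute for the test suite):

>>> from calculator.operations import add
>>> add(2.5, 0.5)
3.0
>>> add(-2, 5)
3.0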

Step 5: Review and Approve

If all checks pass:

  • Mark as approved
  • Ask the agent to commit to git
  • Provide feedback to agent on quality
  • Request next task

PHRs - Automatic Documentation

While ADRs capture architectural decisions, PHRs capture collaboration and implementation decisions. Together, they form the project’s explainable memory.

What Are PHRs?

PHR = Prompt History Record

A PHR automatically documents:

  • What prompt you gave the agent
  • What the agent responded with
  • What decision was made
  • When it happened

PHRs are auto-created for all /sp.* commands and for important clarifications made during coding.

Where Are PHRs Stored?

history/prompts/
├── calculator/
│   ├── 001-specify-phase.md       (auto-created by /sp.specify)
│   ├── 002-clarify-phase.md       (auto-created by /sp.clarify)
│   ├── 003-plan-phase.md          (auto-created by /sp.plan)
│   ├── 004-tasks-phase.md         (auto-created by /sp.tasks)
│   └── 005-implement-phase-pt1.md (auto-created by /sp.implement)
└── general/
    └── [Other non-feature PHRs]

What You Do With PHRs

You don't create them. You:

  1. Know they exist (understand they're being created automatically)
  2. Know where to find them (history/prompts/<feature>/)
  3. Review them later (for learning, compliance, debugging)
  4. Request explicit PHRs (only when the system might miss something)

When to Request Explicit PHRs

Normally, the system auto-creates PHRs for every /sp.* command and for major decisions. But occasionally you might ask:

Agent, this debugging session was complex and taught me something important
about floating-point precision. Can you record this as a PHR for future reference?

[Describe what you learned]

When to request:

  • ✅ Novel problem-solving approach
  • ✅ Non-obvious error resolution
  • ✅ Complex tradeoff decision
  • ✅ Learning moment worth preserving

When NOT to request:

  • ❌ Routine coding (PHRs already auto-created)
  • ❌ Simple bug fixes (already captured in git history)
  • ❌ Repeated issues (first occurrence captured, repeats unnecessary)

Your Interaction With PHRs

During Implementation:

  • You don't think about PHRs; the agent creates them automatically
  • Focus on reviewing code and validating

After Implementation:

  • Browse history/prompts/calculator/ to see all implementation decisions
  • Review PHRs to understand "why" decisions were made
  • Use for documentation, compliance, or learning

If System Misses Something:

  • "Record this debugging session as PHR"
  • Agent creates explicit PHR for that decision
  • Rare; most things are auto-captured

🎓 Expert Insight

In AI-native development, validation is where human judgment creates value. AI can generate syntactically correct code that completely misses the specification intent. The 5-step protocol (Read → Spec Check → Test → Manual → Approve) catches three critical failure modes: (1) AI misunderstood requirements, (2) tests pass but don't cover edge cases, (3) code works but violates Constitution standards. Professional developers NEVER skip validation—the 10 minutes you invest prevents hours of debugging production failures.

🤝 Practice Exercise

Ask your AI: "I'm about to implement Task 1 (add operation) using /sp.implement. Walk me through what the 5-step validation protocol should check: (1) What Constitution standards should I verify in the generated code? (2) What specific acceptance criteria from my spec should the tests cover? (3) What edge cases must be tested (negative numbers, zero, floats vs ints)? (4) If tests pass but I don't understand the code, should I approve it? Why or why not? Then explain how PHRs will automatically capture this implementation session."

Expected Outcome: Your AI should identify Constitution standards (type hints, docstrings, error handling), map spec acceptance criteria to test cases (e.g., "add(5, 3) = 8.0", "add(-2, 5) = 3.0"), list calculator-specific edge cases, explain why understanding code is mandatory before approval (can't maintain what you don't understand), and clarify that PHRs auto-capture this workflow without manual intervention.


Implementing Your Calculator (50 minutes)

Now let's implement your calculator using the checkpoint pattern.

  • Step 1: Run /sp.implement

In Claude Code, from your calculator-project directory:

/sp.implement

My calculator tasks are at specs/calculator/tasks.md

Please implement tasks 1-3 (core operations: add, subtract, multiply):

1. Implement each operation with full type hints and docstrings
2. Generate comprehensive tests (unit + edge case + error handling)
3. Verify 100% code coverage for each operation
4. Output ready for my review

After I review and approve, I'll request the next tasks.

  • Step 2: Review Generated Code

Your Review Checklist:

  • Code is understandable (clear variable names, readable logic)

  • Type hints present on all functions

  • Docstrings present and clear

  • Follows Constitution standards

  • Handles edge cases specified

  • Error handling matches your error strategy

  • Tests cover all acceptance criteria

  • Step 3: Validate Acceptance Criteria

Verification Steps

Step 1: Run Complete Test Suite

Your Prompt:

Run the complete test suite and show me the results.
Include coverage report to verify we meet the constitution requirements.

Agent Does:

  • Runs uv run pytest -v --cov=calculator --cov-report=term-missing
  • Shows all tests passing
  • Displays coverage report (should be 100%)
  • Confirms constitution requirements met

Step 2: Type Checking

Your Prompt:

Run mypy to verify all type hints are correct.

Agent Does:

  • Runs uv run mypy src/
  • Shows type checking results
  • Confirms no type errors
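
Why this matters: with type hints in place, mypy catches call-site mistakes before the code ever runs. A hypothetical example of the kind of error it flags:

from calculator.operations import add

total = add("three", 4)  # mypy rejects this call: "three" is a str, but add() expects a float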

Step 3: Code Quality Check

Your Prompt:

Run ruff to check code quality and formatting.

Agent Does:

  • Runs uv run ruff check src/ tests/
  • Shows linting results
  • Confirms code follows standards

Step 4: Approve and Commit

  • If all checks pass, run /sp.git.commit_pr
  • Continue implementation (divide, power, tests, docs)
  • Repeat the checkpoint pattern for the remaining tasks.

Common Mistakes

Mistake 1: Accepting AI Code Without Reading It First

The Error: AI generates code → You immediately commit without review

Why It's Wrong: AI makes mistakes (missing error handling, hardcoded values, security issues). Blind trust leads to bugs.

The Fix: Validation protocol (5-step checklist):

  1. Read without running - Understand what code does
  2. Ask questions - "Why this approach?" "What does this line do?"
  3. Check against spec - Does it match acceptance criteria?
  4. Run tests - Do all tests pass?
  5. Review security - Any hardcoded secrets? Input validation?
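
As an illustration, here is the kind of problem steps 1 and 5 catch that passing tests alone might not (a hypothetical divide(), assuming your spec's error strategy calls for raising an exception on division by zero):

# Generated code that "works" but hides errors -- reject this in review
def divide(a: float, b: float) -> float:
    if b == 0:
        return 0.0  # silently masks division by zero instead of raising
    return a / b

# Spec-aligned version: fail loudly, as the error-handling strategy requires
def divide(a: float, b: float) -> float:
    """Return a divided by b.

    Raises:
        ZeroDivisionError: If b is zero.
    """
    if b == 0:
        raise ZeroDivisionError("Cannot divide by zero")
    return a / b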

Mistake 2: Requesting Too Many Features at Once

The Error: "Implement all 5 operations + tests + error handling + logging in one go"

Why It's Wrong: Violates checkpoint pattern. No opportunity to review incrementally.

The Fix: One task at a time:

  • Implement add() → Review → Commit → Next task
  • Not: Implement everything → Review 1000 lines → Hope it works

Try With AI

Ready to validate your implementation and understand intelligence capture? Reflect on your work:

🔍 Explore Implementation Quality:

"Review my calculator implementation at calculator/operations.py and tests at tests/test_operations.py. Verify: (1) Does the code match my specification exactly? (2) Are all acceptance criteria met? (3) Do I follow my Constitution rules (type hints, docstrings, error handling)? (4) Are there code quality issues (complexity, duplication, unclear naming)? Provide a detailed quality assessment."

🎯 Practice Checkpoint Review:

"I just completed Task 3: Implement divide() function. Walk me through the checkpoint review: (1) Run the tests—what should I look for beyond 'all passing'? (2) Review the code—what security issues might I miss (like division by zero handling)? (3) Check Constitution compliance—did I follow all project rules? (4) Verify acceptance criteria—did I meet every requirement from the spec? Create a review checklist."

🧪 Test PHR Understanding:

"Explain how PHRs capture implementation decisions: (1) What PHRs were auto-created during my /sp.implement run? (2) Where are they stored (history/prompts/calculator/)? (3) What information do they contain? (4) If I discover floating-point precision issues, should I create an explicit PHR? When is manual PHR creation warranted vs automatic?"

🚀 Apply to Your Implementation:

"I'm implementing [describe your feature]. Help me plan the implementation checkpoints: (1) How many tasks should I complete before committing? (2) What should I review at each checkpoint? (3) What tests prove the task is truly done? (4) How do I balance 'perfect code' vs 'good enough to move forward'? Give me practical checkpoint guidelines."