Tasks Phase - Atomic Work Units and Checkpoints

You now have:

  • ✅ A clear specification
  • ✅ A detailed implementation plan
  • ✅ Documented architecture decisions (ADRs)

Next: Break the plan into atomic work units (tasks) that you'll implement.

This lesson teaches the checkpoint pattern, the critical workflow practice that keeps YOU in control. The pattern is:

Agent: "Here's Phase 1 code"
You: "Review... looks good!"
You: "Commit to git"
You: "Tell me what's next"
Agent: "Phase 2"

NOT:

Agent: "Here's everything" (no human control)

The difference is huge. Checkpoints keep you in control and catch issues early.


What Are Tasks?

A task is a unit of work that:

  • Takes 1-2 hours to complete
  • Has a single, clear acceptance criterion
  • Declares its dependencies on other tasks explicitly
  • Can be reviewed and approved individually

Task Properties

Size: 1-2 hours

  • Too small (a few minutes) = too many micro-tasks to track and review
  • Too large (8+ hours) = hard to review, hard to fix if wrong

Criterion: Single, testable

  • "Write add operation" ✅
  • "Write add operation and all tests" ❌ (two things)
  • "Write something" ❌ (untestable)

Independence: Can be reviewed individually

  • Needs no other tasks completed first, or
  • Clearly declares which specific tasks it depends on
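
For example, a well-formed task entry in tasks.md might look like this (a sketch; the exact format your /sp.tasks run generates may differ, and the file name tests/test_add.py is illustrative):

Task 2: Implement add()
  • Size: ~30 minutes
  • Depends on: Task 1 (RED test for add)
  • Acceptance criterion: all tests in tests/test_add.py pass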

💬 AI Colearning Prompt

"Why are tasks sized at 1-2 hours instead of larger chunks like 'full day' or smaller chunks like '15 minutes'? What's the advantage of this granularity for checkpoint-driven development?"


The Checkpoint Pattern (CRITICAL)

This is the most important concept in this lesson. The checkpoint pattern is how you maintain control of the workflow.

Pattern Definition

Loop:
1. Agent: "I've completed Phase X"
2. Human: "Review the work"
3. Human: "APPROVE" → Commit to git
4. Human: "Tell me next step"

Why Checkpoints Matter

Without Checkpoints (dangerous):

You: "Build my calculator"
Agent: "Done! 5000 lines of code, 47 files. All automated. You're welcome."
You: "Uh... wait, I need to review this..."
Agent: "Too late, already committed and deployed!"

With Checkpoints (controlled):

You: "Start implementation"
Agent: "Phase 1 (Core Operations) complete. 200 lines, ready for review."
You: "Read code... looks good. Commits. What's next?"
Agent: "Phase 2 (Tests) starting"
You: "Review tests... found a bug in edge case handling"
You: "Tell agent, agent fixes, re-reviews, commits"
Agent: "Phase 3..."

Your Role in Each Checkpoint

Step 1: Human Reviews

  • Read the generated code/tests
  • Ask: "Does this match the spec?"
  • Ask: "Are there bugs or edge cases missed?"
  • Ask: "Is the code understandable?"

Step 2: Human Decides

  • Approve ("Looks good, commit")
  • Reject ("Fix this issue")
  • Request clarification ("Explain this code")

Step 3: Human Directs

  • "What's next?"
  • You initiate next phase
  • Agent doesn't autonomously continue
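
Concretely, one checkpoint cycle might look like this at the command line (a sketch assuming a Python project tested with pytest; adapt the commands to your stack):

git diff            # Step 1: read exactly what the agent changed
pytest              # confirm the tests pass before approving
git add -A          # Step 2: approve by staging the work
git commit -m "Phase 1: core operations (reviewed and approved)"
# Step 3: direct the agent yourself: "What's next?"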

Generating Your Tasks

Step 1: Run /sp.tasks

In Claude Code, from your calculator-project directory:

/sp.tasks

My calculator specification is at specs/calculator/spec.md
My implementation plan is at specs/calculator/plan.md

Please decompose the plan into atomic work units (tasks), each ≤ 2 hours,
testable, reversible, and with clear dependencies.

Use a TDD approach: for each operation (add, subtract, etc.),
1️⃣ Write RED tests → 2️⃣ Implement → 3️⃣ Refactor.
Pause after each group for human review before committing.

Also:
- Use Context7 MCP server for documentation lookups.
- Prefer CLI automation where possible.
- Ensure easy rollback and traceability.

Step 2: Review Generated Tasks

The tasks.md should show:

  • Task 1: [Description] - 1-2 hours - Depends on: Nothing
  • Task 2: [Description] - 1.5 hours - Depends on: Task 1
  • Task 3: [Description] - 2 hours - Depends on: Task 1, Task 2
  • ...

Understanding Your Task Breakdown (15 minutes)

Review your tasks and verify:

Dependency Graph

Here's how your calculator tasks depend on each other:

TDD Workflow: 🔴 RED (test) → 🟢 GREEN (implement) → 🔵 REFACTOR/DOCS

Task 1:  🔴 Write RED test: add()
    ↓
Task 2:  🟢 Implement add()
    ↓
Task 3:  🔴 Write RED test: subtract()
    ↓
Task 4:  🟢 Implement subtract()
    ↓
Task 5:  🔴 Write RED test: multiply()
    ↓
Task 6:  🟢 Implement multiply()
    ↓
Task 7:  🔴 Write RED test: divide() + error cases
    ↓
Task 8:  🟢 Implement divide() + error handling
    ↓
Task 9:  🔴 Write RED test: power() + edge cases
    ↓
Task 10: 🟢 Implement power() + edge case handling
    ↓
Task 11: 🔵 Write documentation + finalize

Pattern: Each operation follows RED → GREEN cycle
Tests MUST exist before implementation

Legend:

  • 🔴 Red tasks = Write failing tests first (TDD)
  • 🟢 Green tasks = Implement code to make tests pass
  • 🔵 Blue tasks = Documentation and polish

Key Insight: Tests MUST exist before implementation. You cannot implement Task 2 (add function) without Task 1 (add tests) being complete. This is the TDD (Test-Driven Development) pattern.
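
To make the RED step concrete, here is a minimal sketch of what Task 1 might produce, assuming a Python project using pytest (the file and module names, tests/test_add.py and calculator, are illustrative):

# tests/test_add.py - Task 1: RED tests, written BEFORE add() exists
import pytest

from calculator import add  # this import fails until Task 2 implements add()

def test_add_positive_numbers():
    assert add(5, 3) == 8.0

def test_add_negative_numbers():
    assert add(-2, 5) == 3.0

def test_add_rejects_non_numeric_input():
    with pytest.raises(TypeError):
        add("5", 3)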

Lineage Traceability

Pick one task. Can you trace it back?

Specification: "Calculator must add two numbers"

Plan: "Phase 1: Core Operations - Implement basic arithmetic"

Task 1.1: "Implement add(a, b) returning float, handling negative inputs"

Acceptance Criterion: "add(5, 3) = 8.0, add(-2, 5) = 3.0, add('5', 3) raises TypeError"

If you can trace this lineage, your tasks are well-connected to your specification.
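
A minimal implementation satisfying that acceptance criterion might look like this (a sketch in Python; in practice the agent generates the code and you review it at the checkpoint):

# calculator.py - Task 2 (GREEN): make the Task 1 tests pass
def add(a, b):
    """Add two numbers and return a float; non-numeric input raises TypeError."""
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise TypeError("add() requires numeric arguments")
    return float(a + b)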

🎓 Expert Insight

In AI-native development, the checkpoint pattern transforms risk management. Without checkpoints, AI generates 5000 lines of code and you review it all at once (high risk: bugs are expensive to fix). With checkpoints, AI generates 200 lines, you review immediately, catch issues early (low risk: bugs are cheap to fix). Professional teams NEVER skip checkpoints—the cost of catching bugs late (in production) is 100x the cost of catching them at checkpoint review.

🤝 Practice Exercise

Ask your AI: "I've generated tasks for my calculator implementation. Can you review specs/calculator/tasks.md and tell me: (1) Are task sizes appropriate (1-2 hours, single testable criterion)? (2) Is the dependency graph correct (tests before implementation, TDD pattern followed)? (3) Can I trace Task 2 (implement add) back through the plan to the specification? (4) Are there tasks that are too large or too small? Then suggest improvements to the task breakdown."

Expected Outcome: Your AI should validate task granularity (e.g., "Implement all operations" = too large → split into per-operation tasks), confirm TDD pattern (tests exist before implementation), verify lineage traceability (specification → plan → task), and suggest optimal sizing for checkpoint-driven development.


Commit Your Tasks

Commit the generated tasks to git:

/sp.git_commit_pr commit the current work in the same branch
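
If you prefer plain git over the slash command (assuming the standard git CLI), the equivalent is roughly:

git add specs/calculator/tasks.md
git commit -m "Add task breakdown for calculator (atomic, TDD-ordered)"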

Common Mistakes

Mistake 1: Tasks Too Large (8+ Hours)

The Error: "Task: Implement entire calculator (8-16 hours)"

Why It's Wrong: Large tasks hide complexity, delay feedback, and make checkpoints meaningless.

The Fix: Break into atomic units (1-2 hours each):

  • ❌ Large: "Implement all operations"
  • ✅ Atomic: "Implement add()" (30 min), "Implement multiply()" (30 min), "Implement divide() with error handling" (1 hour)

Mistake 2: Ignoring Dependencies

The Error: Planning to implement functions before writing their tests

Why It's Wrong: Tasks have natural dependencies. In this TDD workflow, implementation tasks depend on their RED tests existing first.

The Fix: Map dependencies explicitly:

  • Task 1: Write RED test for add() → Task 2: Implement add() (depends on Task 1)
  • Task 3: Write RED test for divide() → Task 4: Implement divide() (depends on Task 3)

Try With AI

Ready to validate your task breakdown and prepare for implementation? Test your tasks:

🔍 Explore Task Atomicity:

"Review my task breakdown at specs/calculator/tasks.md. For each task, evaluate: (1) Is it atomic (does ONE thing with ONE acceptance criterion)? (2) Is it sized right (1-2 hours, not days or minutes)? (3) Can it be tested independently? Identify any tasks that are too large (need splitting) or too small (should be combined)."

🎯 Practice Dependency Analysis:

"Analyze the dependencies in my task list. Are they correct and logical? What's the critical path (minimum sequence to reach 'done')? Which tasks could run in parallel? If I had 3 developers, how would you distribute these tasks? Draw me a dependency graph showing which tasks block others."

🧪 Test Checkpoint Readiness:

"I'm about to implement Task 1: [describe your first task]. Walk me through the checkpoint pattern: (1) What should AI generate? (2) What should I review for (not just 'does it work')? (3) What makes a good commit message? (4) How do I know I'm ready for the next task? Give me a checklist for each checkpoint phase."

🚀 Apply to Your Project:

"I need to break down [describe your project] into atomic tasks. Help me apply the task decomposition principles: Show me how to decompose ONE complex feature (like 'user authentication') into 5-8 atomic tasks with clear dependencies and acceptance criteria. Explain your reasoning for each split."