Tasks Phase - Atomic Work Units and Checkpoints
You now have:
- ✅ A clear specification
- ✅ A detailed implementation plan
- ✅ Documented architecture decisions (ADRs)
Next: Break the plan into atomic work units (tasks) that you'll implement.
This lesson teaches the checkpoint pattern, the critical workflow practice that keeps YOU in control. The pattern is:
Agent: "Here's Phase 1 code"
You: "Review... looks good!"
You: "Commit to git"
You: "Tell me what's next"
Agent: "Phase 2"
NOT:
Agent: "Here's everything" (no human control)
The difference is huge. Checkpoints keep you in control and catch issues early.
What Are Tasks?
A task is a unit of work that:
- Takes 1-2 hours to complete
- Has a single, clear acceptance criterion
- Depends on specific other tasks
- Can be reviewed and approved individually
Task Properties
Size: 1-2 hours
- Too small (under ~15 minutes) = too many micro-tasks, constant checkpoint overhead
- Too large (4+ hours) = hard to review, hard to fix if wrong
Criterion: Single, testable
- "Write add operation" ✅
- "Write add operation and all tests" ❌ (two things)
- "Write something" ❌ (untestable)
Independence: Can be reviewed individually
- Doesn't require other tasks to be done first
- Or clearly depends on specific other tasks
💬 AI Colearning Prompt
"Why are tasks sized at 1-2 hours instead of larger chunks like 'full day' or smaller chunks like '15 minutes'? What's the advantage of this granularity for checkpoint-driven development?"
The Checkpoint Pattern (CRITICAL)
This is the most important concept in this lesson. The checkpoint pattern is how you maintain control of the workflow.
Pattern Definition
Loop:
1. Agent: "I've completed Phase X"
2. Human: "Review the work"
3. Human: "APPROVE" → Commit to git
4. Human: "Tell me next step"
Why Checkpoints Matter
Without Checkpoints (dangerous):
You: "Build my calculator"
Agent: "Done! 5000 lines of code, 47 files. All automated. You're welcome."
You: "Uh... wait, I need to review this..."
Agent: "Too late, already committed and deployed!"
With Checkpoints (controlled):
You: "Start implementation"
Agent: "Phase 1 (Core Operations) complete. 200 lines, ready for review."
You: "Read the code... looks good. Commit. What's next?"
Agent: "Phase 2 (Tests) starting"
You: "Review tests... found a bug in edge case handling"
You: "Fix the edge case bug" → Agent fixes → You re-review, approve, commit
Agent: "Phase 3..."
Your Role in Each Checkpoint
Step 1: Human Reviews
- Read the generated code/tests
- Ask: "Does this match the spec?"
- Ask: "Are there bugs or edge cases missed?"
- Ask: "Is the code understandable?"
Step 2: Human Decides
- Approve ("Looks good, commit")
- Reject ("Fix this issue")
- Request clarification ("Explain this code")
Step 3: Human Directs
- "What's next?"
- You initiate next phase
- Agent doesn't autonomously continue
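The three steps above can be sketched as a toy loop. This is only a mental model of the workflow, not a real agent or git API; every name here (Phase, review, commit) is illustrative:

```python
# Toy model of the checkpoint loop; Phase, review, and commit are
# illustrative names, not a real agent or git API.

class Phase:
    def __init__(self, name, output):
        self.name = name
        self.output = output

    def generate(self):
        # Stands in for the agent producing one phase of work
        return f"{self.name}: {self.output}"

def checkpoint_loop(phases, review, commit):
    """Advance one phase at a time; nothing is committed without approval."""
    approved = []
    for phase in phases:
        work = phase.generate()   # agent completes ONE phase
        if not review(work):      # human reviews BEFORE anything is committed
            break                 # rejected work halts the loop until fixed
        commit(work)              # approved work goes into git
        approved.append(work)     # only now does the next phase start
    return approved

history = []
phases = [Phase("Phase 1", "core operations"), Phase("Phase 2", "tests")]
approved = checkpoint_loop(phases, review=lambda w: True, commit=history.append)
```

The key structural point: `commit` and the next iteration are unreachable until `review` returns approval, which is exactly the property the checkpoint pattern enforces.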
Generating Your Tasks
Step 1: Run /sp.tasks
In Claude Code, from your calculator-project directory:
/sp.tasks
My calculator specification is at specs/calculator/spec.md
My implementation plan is at specs/calculator/plan.md
Please decompose the plan into atomic work units (tasks), each ≤ 2 hours,
testable, reversible, and with clear dependencies.
Use a TDD approach: for each operation (add, subtract, etc.),
1️⃣ Write RED tests → 2️⃣ Implement → 3️⃣ Refactor.
Pause after each group for human review before committing.
Also:
- Use Context7 MCP server for documentation lookups.
- Prefer CLI automation where possible.
- Ensure easy rollback and traceability.
Step 2: Review Generated Tasks
The tasks.md should show:
- Task 1: [Description] - 1-2 hours - Depends on: Nothing
- Task 2: [Description] - 1.5 hours - Depends on: Task 1
- Task 3: [Description] - 2 hours - Depends on: Task 1, Task 2
- ...
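To make the format concrete, here is one hypothetical tasks.md entry. The sizing, file names, and criteria are illustrative, not actual generated output:

```markdown
## Task 1: Write RED tests for add()

- Size: ~1 hour
- Depends on: Nothing
- Acceptance criterion: tests/test_add.py exists and currently fails
  (no implementation yet), covering add(5, 3) == 8.0, add(-2, 5) == 3.0,
  and add("5", 3) raising TypeError
```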
Understanding Your Task Breakdown (15 minutes)
Review your tasks and verify:
Dependency Graph
Here's how your calculator tasks depend on each other:
TDD Workflow: 🔴 RED (test) → 🟢 GREEN (implement) → 🔵 REFACTOR/DOCS
┌─────────────────────────────────────────────────────────────────┐
│ │
│ Task 1: 🔴 Write RED test: add() │
│ ↓ │
│ Task 2: 🟢 Implement add() │
│ ↓ │
│ Task 3: 🔴 Write RED test: subtract() │
│ ↓ │
│ Task 4: 🟢 Implement subtract() │
│ ↓ │
│ Task 5: 🔴 Write RED test: multiply() │
│ ↓ │
│ Task 6: 🟢 Implement multiply() │
│ ↓ │
│ Task 7: 🔴 Write RED test: divide() + error cases │
│ ↓ │
│ Task 8: 🟢 Implement divide() + error handling │
│ ↓ │
│ Task 9: 🔴 Write RED test: power() + edge cases │
│ ↓ │
│ Task 10: 🟢 Implement power() + edge case handling │
│ ↓ │
│ Task 11: 🔵 Write documentation + finalize │
│ │
└─────────────────────────────────────────────────────────────────┘
Pattern: Each operation follows RED → GREEN cycle
Tests MUST exist before implementation
Legend:
- 🔴 Red tasks = Write failing tests first (TDD)
- 🟢 Green tasks = Implement code to make tests pass
- 🔵 Blue tasks = Documentation and polish
Key Insight: Tests MUST exist before implementation. You cannot implement Task 2 (add function) without Task 1 (add tests) being complete. This is the TDD (Test-Driven Development) pattern.
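Here is a minimal sketch of that RED → GREEN pair for add(), collapsed into one file for illustration. In the real breakdown the tests live in their own file and are seen to fail before any implementation exists; the type check is inferred from the spec's TypeError criterion:

```python
# RED -> GREEN for add(), collapsed into one file for illustration only.
# In the actual task breakdown, the RED tests are written first (Task 1)
# and fail until the implementation (Task 2) exists.

def add(a, b):
    """GREEN step: just enough code to satisfy the RED tests below."""
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise TypeError("add() accepts only numbers")
    return float(a + b)

# RED step (written first): these checks fail until add() above exists.
assert add(5, 3) == 8.0
assert add(-2, 5) == 3.0
try:
    add("5", 3)
except TypeError:
    pass  # expected: string inputs are rejected
else:
    raise AssertionError("expected TypeError for string input")
```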
Lineage Traceability
Pick one task. Can you trace it back?
Specification: "Calculator must add two numbers"
↓
Plan: "Phase 1: Core Operations - Implement basic arithmetic"
↓
Task 2: "Implement add(a, b) returning float, handling negative inputs"
↓
Acceptance Criterion: "add(5, 3) = 8.0, add(-2, 5) = 3.0, add('5', 3) raises TypeError"
If you can trace this lineage, your tasks are well-connected to your specification.
🎓 Expert Insight
In AI-native development, the checkpoint pattern transforms risk management. Without checkpoints, AI generates 5000 lines of code and you review it all at once (high risk: bugs are expensive to fix). With checkpoints, AI generates 200 lines, you review immediately, catch issues early (low risk: bugs are cheap to fix). Professional teams NEVER skip checkpoints—the cost of catching bugs late (in production) is 100x the cost of catching them at checkpoint review.
🤝 Practice Exercise
Ask your AI: "I've generated tasks for my calculator implementation. Can you review
specs/calculator/tasks.md and tell me: (1) Are task sizes appropriate (1-2 hours, single testable criterion)? (2) Is the dependency graph correct (tests before implementation, TDD pattern followed)? (3) Can I trace Task 2 (implement add) back through the plan to the specification? (4) Are there tasks that are too large or too small? Then suggest improvements to the task breakdown."
Expected Outcome: Your AI should validate task granularity (e.g., "Implement all operations" = too large → split into per-operation tasks), confirm TDD pattern (tests exist before implementation), verify lineage traceability (specification → plan → task), and suggest optimal sizing for checkpoint-driven development.
Commit Your Tasks
Commit the generated tasks to git:
/sp.git_commit_pr commit the current work in same branch
Common Mistakes
Mistake 1: Tasks Too Large (8+ Hours)
The Error: "Task: Implement entire calculator (8-16 hours)"
Why It's Wrong: Large tasks hide complexity, delay feedback, and make checkpoints meaningless.
The Fix: Break into atomic units (1-2 hours each):
- ❌ Large: "Implement all operations"
- ✅ Atomic: "Implement add()" (1 hour), "Implement multiply()" (1 hour), "Implement divide() with error handling" (1.5 hours)
Mistake 2: Ignoring Dependencies
The Error: Planning to implement functions before writing their tests
Why It's Wrong: Tasks have natural dependencies. In the TDD workflow, implementation depends on failing (RED) tests existing first.
The Fix: Map dependencies explicitly, tests first:
- Task 1: Write RED test for add() → Task 2: Implement add() (depends on Task 1)
- Task 3: Write RED test for divide() → Task 4: Implement divide() (depends on Task 3)
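For the divide() pair from the dependency graph (Tasks 7-8), here is a minimal sketch of the error-handling criterion. The division-by-zero behavior is an assumption about the spec; names and messages are illustrative:

```python
# Sketch of divide() with error handling (Tasks 7-8 in the dependency
# graph). The division-by-zero requirement is an assumption about the
# spec; names and messages are illustrative.

def divide(a, b):
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise TypeError("divide() accepts only numbers")
    if b == 0:
        raise ZeroDivisionError("cannot divide by zero")
    return float(a / b)

# The RED tests for these cases are written before the function above.
assert divide(10, 4) == 2.5
try:
    divide(1, 0)
except ZeroDivisionError:
    pass  # expected: division by zero is rejected with a clear message
else:
    raise AssertionError("expected ZeroDivisionError")
```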
Try With AI
Ready to validate your task breakdown and prepare for implementation? Test your tasks:
🔍 Explore Task Atomicity:
"Review my task breakdown at
specs/calculator/tasks.md. For each task, evaluate: (1) Is it atomic (does ONE thing with ONE acceptance criterion)? (2) Is it sized right (1-2 hours, not days or minutes)? (3) Can it be tested independently? Identify any tasks that are too large (need splitting) or too small (should be combined)."
🎯 Practice Dependency Analysis:
"Analyze the dependencies in my task list. Are they correct and logical? What's the critical path (minimum sequence to reach 'done')? Which tasks could run in parallel? If I had 3 developers, how would you distribute these tasks? Draw me a dependency graph showing which tasks block others."
🧪 Test Checkpoint Readiness:
"I'm about to implement Task 1: [describe your first task]. Walk me through the checkpoint pattern: (1) What should AI generate? (2) What should I review for (not just 'does it work')? (3) What makes a good commit message? (4) How do I know I'm ready for the next task? Give me a checklist for each checkpoint phase."
🚀 Apply to Your Project:
"I need to break down [describe your project] into atomic tasks. Help me apply the task decomposition principles: Show me how to decompose ONE complex feature (like 'user authentication') into 5-8 atomic tasks with clear dependencies and acceptance criteria. Explain your reasoning for each split."