Tasks Phase - Atomic Work Units and Checkpoints

You now have:

  • ✅ A clear specification
  • ✅ A detailed implementation plan
  • ✅ Documented architecture decisions (ADRs)

Next: Break the plan into atomic work units (tasks) that you'll implement.

This lesson teaches the checkpoint pattern: the critical workflow practice that keeps YOU in control. The pattern is:

Agent: "Here's Phase 1 code"
You: "Review... looks good!"
You: "Commit to git"
You: "Tell me what's next"
Agent: "Phase 2"

NOT:

Agent: "Here's everything" (no human control)

The difference is huge. Checkpoints keep you in control and catch issues early.


What Are Tasks?

A task is a unit of work that:

  • Takes 1-2 hours to complete
  • Has a single, clear acceptance criterion
  • Declares which specific tasks it depends on, if any
  • Can be reviewed and approved individually

Task Properties

Size: 1-2 hours

  • Too small (a few minutes) = too many micro-tasks
  • Too large (4+ hours) = hard to review, hard to fix if wrong

Criterion: Single, testable

  • "Write add operation" ✅
  • "Write add operation and all tests" ❌ (two things)
  • "Write something" ❌ (untestable)

Independence: Can be reviewed individually

  • Doesn't require other tasks to be done first
  • Or clearly depends on specific other tasks

The Checkpoint Pattern (CRITICAL)

This is the most important concept in this lesson. The checkpoint pattern is how you maintain control of the workflow.

Pattern Definition

Loop:
1. Agent: "I've completed Phase X"
2. Human: "Review the work"
3. Human: "APPROVE" → Commit to git
4. Human: "Tell me next step"

Why Checkpoints Matter

Without Checkpoints (dangerous):

You: "Build my calculator"
Agent: "Done! 5000 lines of code, 47 files. All automated. You're welcome."
You: "Uh... wait, I need to review this..."
Agent: "Too late, already committed and deployed!"

With Checkpoints (controlled):

You: "Start implementation"
Agent: "Phase 1 (Core Operations) complete. 200 lines, ready for review."
You: "Read code... looks good. Commits. What's next?"
Agent: "Phase 2 (Tests) starting"
You: "Review tests... found a bug in edge case handling"
You: "Tell agent, agent fixes, re-reviews, commits"
Agent: "Phase 3..."

Your Role in Each Checkpoint

Step 1: Human Reviews

  • Read the generated code/tests
  • Ask: "Does this match the spec?"
  • Ask: "Are there bugs or edge cases missed?"
  • Ask: "Is the code understandable?"

Step 2: Human Decides

  • Approve ("Looks good, commit")
  • Reject ("Fix this issue")
  • Request clarification ("Explain this code")

Step 3: Human Directs

  • "What's next?"
  • You initiate next phase
  • Agent doesn't autonomously continue

Generating Your Tasks

Step 1: Run /sp.tasks

In Claude Code, from your calculator-project directory:

/sp.tasks

My calculator specification is at specs/calculator/spec.md
My implementation plan is at specs/calculator/plan.md

Please decompose the plan into atomic work units (tasks), each ≤ 2 hours,
testable, reversible, and with clear dependencies.

Use a TDD approach: for each operation (add, subtract, etc.),
1️⃣ Write RED tests → 2️⃣ Implement → 3️⃣ Refactor.
Pause after each group for human review before committing.

Also:
- Use Context7 MCP server for documentation lookups.
- Prefer CLI automation where possible.
- Ensure easy rollback and traceability.

Step 2: Review Generated Tasks

The tasks.md should show:

  • Task 1: [Description] - 1-2 hours - Depends on: Nothing
  • Task 2: [Description] - 1.5 hours - Depends on: Task 1
  • Task 3: [Description] - 2 hours - Depends on: Task 1, Task 2
  • ...
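
A hypothetical tasks.md entry (the exact format /sp.tasks generates may differ) could look like:

Task 1: Write RED test for add()
  - Estimate: 30 min
  - Depends on: Nothing
  - Acceptance: test_add fails because add() does not exist yet

Task 2: Implement add()
  - Estimate: 30 min
  - Depends on: Task 1
  - Acceptance: test_add passes; add(5, 3) returns 8.0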

Understanding Your Task Breakdown (15 minutes)

Review your tasks and verify:

Dependency Graph

Here's how your calculator tasks depend on each other:

TDD Workflow: 🔴 RED (test) → 🟢 GREEN (implement) → 🔵 REFACTOR/DOCS

Task 1: 🔴 Write RED test: add()
   ↓
Task 2: 🟢 Implement add()
   ↓
Task 3: 🔴 Write RED test: subtract()
   ↓
Task 4: 🟢 Implement subtract()
   ↓
Task 5: 🔴 Write RED test: multiply()
   ↓
Task 6: 🟢 Implement multiply()
   ↓
Task 7: 🔴 Write RED test: divide() + error cases
   ↓
Task 8: 🟢 Implement divide() + error handling
   ↓
Task 9: 🔴 Write RED test: power() + edge cases
   ↓
Task 10: 🟢 Implement power() + edge case handling
   ↓
Task 11: 🔵 Write documentation + finalize

Pattern: Each operation follows RED → GREEN cycle
Tests MUST exist before implementation

Legend:

  • 🔴 Red tasks = Write failing tests first (TDD)
  • 🟢 Green tasks = Implement code to make tests pass
  • 🔵 Blue tasks = Documentation and polish

Key Insight: Tests MUST exist before implementation. You cannot implement Task 2 (add function) without Task 1 (add tests) being complete. This is the TDD (Test-Driven Development) pattern.
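
Here's what one RED→GREEN cycle might look like in Python with pytest (a minimal sketch; file and function names are illustrative, and your generated tasks may word things differently):

# Task 3 (RED): test_subtract.py - written before subtract() exists.
# Running pytest now fails (the import itself errors), which is the RED state.
from calculator import subtract

def test_subtract():
    assert subtract(5, 3) == 2.0    # basic case
    assert subtract(3, 5) == -2.0   # results can be negative

Only after seeing the test fail do you move to the GREEN task:

# Task 4 (GREEN): calculator.py - the minimal code that makes the test pass
def subtract(a, b):
    return float(a - b)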

Lineage Traceability

Pick one task. Can you trace it back?

Specification: "Calculator must add two numbers"

Plan: "Phase 1: Core Operations - Implement basic arithmetic"

Task 2: "Implement add(a, b) returning float, handling negative inputs"

Acceptance Criterion: "add(5, 3) = 8.0, add(-2, 5) = 3.0, add('5', 3) raises TypeError"

If you can trace this lineage, your tasks are well-connected to your specification.
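
As a sketch of where this lineage lands, the acceptance criterion above could be encoded as a pytest test (assuming a calculator module; the TypeError check reflects the spec's rejection of non-numeric inputs):

# test_add.py - each assertion traces back to the acceptance criterion
import pytest
from calculator import add

def test_add_acceptance():
    assert add(5, 3) == 8.0          # adds two numbers, returns float
    assert add(-2, 5) == 3.0         # handles negative inputs
    with pytest.raises(TypeError):   # rejects non-numeric inputs
        add('5', 3)

A matching implementation might look like:

def add(a, b):
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise TypeError("add() expects numeric inputs")
    return float(a + b)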


Commit Your Tasks

Commit the generated tasks to git:

/sp.git_commit_pr commit the current work in the same branch

Common Mistakes

Mistake 1: Tasks Too Large (8+ Hours)

The Error: "Task: Implement entire calculator (8-16 hours)"

Why It's Wrong: Large tasks hide complexity, delay feedback, and make checkpoints meaningless.

The Fix: Break into atomic units (1-2 hours each):

  • ❌ Large: "Implement all operations"
  • ✅ Atomic: "Implement add()" (30 min), "Implement multiply()" (30 min), "Implement divide() with error handling" (1 hour)
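
As a sketch, the atomic divide() task might produce code like this (assuming the spec requires a clear error on division by zero; names are illustrative):

# calculator.py - divide() with explicit error handling (the 1-hour task above)
def divide(a, b):
    if b == 0:
        raise ZeroDivisionError("divide() cannot divide by zero")
    return float(a / b)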

Mistake 2: Ignoring Dependencies

The Error: Planning to implement a function before writing its test

Why It's Wrong: Tasks have natural dependencies. In this TDD workflow, each implementation task depends on its RED test task being complete first.

The Fix: Map dependencies explicitly:

  • Task 1: 🔴 Write RED test for add() → Task 2: 🟢 Implement add() (depends on Task 1)
  • Task 7: 🔴 Write RED test for divide() → Task 8: 🟢 Implement divide() (depends on Task 7)

Try With AI: Validate Task Breakdown

Use your AI companion to confirm your tasks are well-decomposed and ready for implementation.

Setup

Tool: Claude Code (or your configured AI orchestrator)

Context: Your tasks.md file

Goal: Confirm tasks are atomic, dependencies are correct, and checkpoint pattern will work

⚠️ Learning WITH AI (Not Generating FROM AI)

What this exercise teaches:

  • DON'T ask: "Implement all these tasks for me"
  • DON'T ask: "Write the code for Task 1"
  • DO ask: "Are my tasks atomic (1-2 hours each)?"
  • DO ask: "Are the dependencies correct?"
  • DO ask: "Which tasks could I run in parallel?"

Your role: Validate task breakdown, understand dependencies, plan execution strategy

AI's role: Review atomicity, validate dependencies, suggest improvements

Prompt Set (Copy-Paste Ready)

Prompt 1 - Task Atomicity Check

Copy and paste this into Claude Code:

I've broken my calculator plan into tasks. 

For each task, is it:
1. Atomic? (Does it do one thing with one acceptance criterion?)
2. Sized appropriately? (1-2 hours, not too small or large?)
3. Independent? (Can it be reviewed and tested separately?)

Any tasks that are too big, too small, or trying to do multiple things?

Prompt 2 - Dependency Validation

After task validation, ask:

Looking at my task dependencies, are they correct?

Are these dependencies logical? Would you change the order?
What's the critical path (minimum tasks to complete before "done")?

Prompt 3 - Checkpoint Pattern Understanding

Finally, ask:

I'm about to start the implementation phase, and I want to use the
checkpoint pattern (Agent→Human Review→Commit→Next).
Is this the right approach? Any guidance on what to look for during review?

Expected Outcomes

  • Tasks are atomic and appropriately sized
  • Dependencies are correct
  • You understand the checkpoint pattern
  • Ready for implementation (Lesson 7)