Error Handling Strategies – Defensive Programming

Catching exceptions prevents crashes. But what happens after you catch an error? How do you recover? This lesson teaches you defensive programming patterns—strategic decisions about how to handle different types of errors. You'll learn when to retry, when to use fallback values, when to degrade gracefully, and how to log errors for debugging.

Professional developers don't just catch errors and move on. They think strategically: What kind of error is this? Is it temporary? Can I recover? Should I notify the user? What should I log?

Beyond Catching – The Four Error Handling Strategies

By now, you've learned try/except/finally blocks and custom exceptions. But catching an exception is just the first step. The real skill is deciding what to do when an error occurs.

Let's explore four defensive programming strategies:

Strategy 1: Retry Logic – "Try Again"

When to use: Errors that are transient (temporary). Network hiccups, briefly unavailable services, temporary file locks.

The idea: If an operation fails, wait a moment and try again. Many transient errors resolve themselves on retry.

Pattern:

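A minimal sketch of the retry pattern, assuming the operation signals a transient failure by raising the built-in `ConnectionError`. The helper name `retry` and its parameters are illustrative, not part of any library:

```python
import time


def retry(operation, max_attempts=3, delay_seconds=1.0):
    """Call operation(); on ConnectionError, wait and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError as error:
            if attempt == max_attempts:
                # Last attempt: re-raise so the caller knows it failed completely.
                raise
            print(f"Attempt {attempt} failed ({error}); retrying...")
            time.sleep(delay_seconds)
```

Only the exception type assumed to be transient is caught; anything else propagates immediately instead of being retried.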

Key idea: Count attempts. On the last attempt, re-raise the exception so the caller knows it failed completely.

💬 AI Colearning Prompt

"Show me the difference between retry logic and just catching an exception and returning a default value. When would you choose retry vs. fallback?"

🎓 Expert Insight

In AI-native development, you don't guess at error handling—you analyze the error type. Transient errors (network timeouts) demand retry. Permanent errors (file not found) demand fallback or graceful degradation.

Strategy 2: Fallback Values – "Use a Default"

When to use: Errors that are permanent or unrecoverable. Missing files, invalid data, unavailable services with no option to retry.

The idea: When an operation fails, use a sensible default value instead.

Pattern:

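A minimal sketch of the fallback pattern, assuming a hypothetical text file that holds a welcome message. `DEFAULT_MESSAGE` and `load_welcome_message` are names invented for illustration:

```python
DEFAULT_MESSAGE = "Welcome!"


def load_welcome_message(path):
    """Return the message stored at path, or a default if it cannot be read."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read().strip()
    except (FileNotFoundError, UnicodeDecodeError) as error:
        # Retrying will not help here, so fall back to a sensible default.
        print(f"Notice: could not read {path} ({type(error).__name__}); using default")
        return DEFAULT_MESSAGE
```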

Key idea: The program continues with sensible defaults. The user gets a notice but doesn't lose functionality.

🚀 CoLearning Challenge

Ask your AI Co-Teacher:

"For a social media app, if loading the user's profile picture fails, what's a good fallback? Why? If loading the feed fails, is fallback appropriate?"

Expected Outcome: You'll understand that some features are essential (the feed must load), while others can fail gracefully (a profile picture can fall back to a placeholder).

Strategy 3: Graceful Degradation – "Keep Going With Less"

When to use: Non-critical features fail, but the system should continue. Feature unavailability doesn't mean complete failure.

The idea: When a secondary feature fails, skip it and continue. The user loses functionality but doesn't lose the whole program.

Pattern:

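A minimal sketch of graceful degradation, assuming CSV input with hypothetical `item` and `price` columns. The function name `parse_prices` is illustrative:

```python
import csv
import io


def parse_prices(csv_text):
    """Parse item/price rows, skipping any row that cannot be converted."""
    prices = {}
    skipped = 0
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        try:
            prices[row["item"]] = float(row["price"])
        except (KeyError, TypeError, ValueError) as error:
            # A bad row is logged and skipped; the remaining rows still load.
            skipped += 1
            print(f"Skipping line {line_number}: {type(error).__name__}: {error}")
    return prices, skipped
```

`TypeError` is caught because `csv.DictReader` fills a missing column with `None`, which `float()` rejects.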

Key idea: The program doesn't crash on bad data. Invalid rows are skipped with logging. Valid rows are processed. User gets partial results.

✨ Teaching Tip

Use Claude Code to test graceful degradation: "Create a CSV file with 10 rows: 7 valid, 3 with bad data. Run my parser and show what happens. Does it handle all error types?"

Strategy 4: Logging Errors – "Keep a Record"

When to use: All the above strategies benefit from logging. Record errors with context for debugging without crashing the program.

The idea: Print or record error details: what failed, when, why, and what data was involved. This record helps diagnose issues later.

Pattern:

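A minimal sketch of the logging pattern using plain `print`. The helpers `log_error` and `safe_divide` are hypothetical names chosen for illustration:

```python
from datetime import datetime


def log_error(error, context):
    """Print one log line with timestamp, error type, message, and context."""
    timestamp = datetime.now().isoformat(timespec="seconds")
    line = f"[{timestamp}] {type(error).__name__}: {error} | context={context}"
    print(line)
    return line


def safe_divide(a, b):
    """Divide a by b; on division by zero, log the inputs and return None."""
    try:
        return a / b
    except ZeroDivisionError as error:
        log_error(error, {"a": a, "b": b})
        return None
```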

Key idea: Each log entry includes a timestamp, the error type, context (the actual values involved), and a human-readable message. Together these make the failure reproducible when you debug it later.

💬 AI Colearning Prompt

"What should error logs include to be useful for debugging? What context is essential?"


Code Example 1: Retry Logic with Exponential Backoff

Many production systems use exponential backoff: the wait doubles after each failed attempt. The first retry comes after 1 second, the second after 2 seconds, the third after 4 seconds. Spacing retries out this way avoids overwhelming a service that is already struggling.

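A sketch of retry with exponential backoff, written to match the output shown for this example. It assumes a hypothetical `NetworkTimeout` exception and a simulated fetch that fails twice before succeeding. None of these names come from a real library:

```python
import time


class NetworkTimeout(Exception):
    """Hypothetical transient error used only for this demonstration."""


def make_flaky_fetch(failures_before_success):
    """Build a fake fetch function that fails a fixed number of times."""
    calls = {"count": 0}

    def fetch(url):
        calls["count"] += 1
        if calls["count"] <= failures_before_success:
            raise NetworkTimeout("Network timeout")
        return {"url": url, "status": "ok"}

    return fetch


def fetch_with_backoff(url, fetch, max_attempts=3, base_delay=1, sleep=time.sleep):
    """Call fetch(url), doubling the wait after each failed attempt."""
    for attempt in range(1, max_attempts + 1):
        print(f"Attempt {attempt}: Fetching {url}")
        try:
            data = fetch(url)
        except NetworkTimeout as error:
            if attempt == max_attempts:
                print(f"Failed: {error}. No retries left")
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 1, 2, 4, ...
            print(f"Failed: {error}. Waiting {delay}s before retry {attempt + 1}")
            sleep(delay)
        else:
            print("Success: Data retrieved")
            return data


if __name__ == "__main__":
    fetch_with_backoff("https://api.example.com/data", make_flaky_fetch(2))
```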

Output:

Attempt 1: Fetching https://api.example.com/data
Failed: Network timeout. Waiting 1s before retry 2
Attempt 2: Fetching https://api.example.com/data
Failed: Network timeout. Waiting 2s before retry 3
Attempt 3: Fetching https://api.example.com/data
Success: Data retrieved

Specification Reference: This code demonstrates B1 application—student applies retry pattern to realistic network scenario.


Code Example 2: Combining Fallback with Logging

Real code often combines multiple strategies. Here's fallback + logging:

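A sketch of fallback combined with logging, assuming a JSON configuration file. The names `load_config` and `DEFAULT_CONFIG`, the logger name, and the default values are illustrative assumptions:

```python
import json
import logging

# asctime puts a timestamp on every log line.
logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s", level=logging.INFO)
logger = logging.getLogger("config")

DEFAULT_CONFIG = {"theme": "light", "language": "en"}


def load_config(path):
    """Load JSON config from path; log the failure and return defaults on error."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        logger.warning("Config file %s not found; using defaults", path)
    except json.JSONDecodeError as error:
        logger.error(
            "Config file %s is not valid JSON (line %d: %s); using defaults",
            path, error.lineno, error.msg,
        )
    except Exception:
        # Catch-all for anything unexpected; logger.exception adds the traceback.
        logger.exception("Unexpected error loading %s; using defaults", path)
    return dict(DEFAULT_CONFIG)
```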

Key details:

  • Catches specific exceptions first (FileNotFoundError, JSONDecodeError)
  • Falls back to sensible defaults
  • Logs each failure with timestamp and context
  • Generic catch-all for unexpected errors
  • All errors are logged—none are silent

Code Example 3: Graceful Degradation in Data Processing

This example processes data, skipping invalid entries:

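A sketch written to match the output shown for this example. The input records are assumed: the name and age of the two invalid users are invented, since only the valid users appear in the output. `validate_user` and `process_users` are illustrative names:

```python
def validate_user(raw):
    """Return a cleaned user dict, or raise ValueError/KeyError if invalid."""
    name = (raw.get("name") or "").strip()
    if not name:
        raise ValueError("Name is required")
    age = int(raw["age"])  # raises ValueError for non-numeric text
    return {"name": name, "age": age}


def process_users(raw_users):
    """Validate each record in turn, collecting valid users and error messages."""
    valid, errors = [], []
    for number, raw in enumerate(raw_users, start=1):
        try:
            valid.append(validate_user(raw))
        except (KeyError, ValueError) as error:
            # Skip the bad record but keep processing the rest.
            print(f"WARN: User {number}: {error}, skipping")
            errors.append(f"User {number}: {error}")
    return valid, errors


raw_users = [
    {"name": "Alice", "age": "30"},
    {"name": "", "age": "25"},
    {"name": "Charlie", "age": "invalid"},
    {"name": "Diana", "age": "28"},
]

valid, errors = process_users(raw_users)
print(f"Valid users: {len(valid)}, Errors: {len(errors)}")
for user in valid:
    print(f"- {user['name']} (age {user['age']})")
for message in errors:
    print(f"ERROR: {message}")
```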

Output:

WARN: User 2: Name is required, skipping
WARN: User 3: invalid literal for int() with base 10: 'invalid', skipping
Valid users: 2, Errors: 2
- Alice (age 30)
- Diana (age 28)
ERROR: User 2: Name is required
ERROR: User 3: invalid literal for int() with base 10: 'invalid'

This demonstrates:

  • Validation happens row-by-row
  • Invalid rows are skipped gracefully
  • Valid data is processed completely
  • Errors are reported without crashing

Code Example 4: Logging with Structured Context

Professional systems log with enough detail to diagnose issues:

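A sketch of structured logging with Python's standard `logging` module. The order-parsing scenario, the field names, and the helpers `build_error_record` and `parse_order` are assumptions made for illustration:

```python
import json
import logging

# asctime supplies the timestamp for each log line.
logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s", level=logging.INFO)
logger = logging.getLogger("orders")


def build_error_record(error, operation, user_id, data, max_sample=50):
    """Collect error details and context into one dictionary."""
    sample = repr(data)
    if len(sample) > max_sample:
        sample = sample[:max_sample] + "..."  # truncate large or sensitive payloads
    return {
        "error_type": type(error).__name__,
        "error_message": str(error),
        "operation": operation,
        "user_id": user_id,
        "data_sample": sample,
    }


def parse_order(user_id, payload):
    """Parse a JSON order; log a structured record and return None on failure."""
    try:
        order = json.loads(payload)
        return order["item"], int(order["quantity"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as error:
        record = build_error_record(error, "parse_order", user_id, payload)
        logger.error(json.dumps(record))
        return None
```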

Logging includes:

  • Timestamp: When the error occurred
  • Error type: Specific exception class
  • Error message: Details from exception
  • Context: Related information (user ID, operation)
  • Data sample: What data caused the problem (truncated for safety)

This information helps whoever is debugging understand what failed, when, why, and what triggered it.


Choosing the Right Strategy

Here's a decision matrix for choosing error handling strategies:

| Error Type | Best Strategy | Why | Example |
| --- | --- | --- | --- |
| Transient (temporary) | Retry | Error resolves on retry | Network timeout, service briefly down |
| Permanent, predictable | Fallback | Operation will always fail | File not found, invalid format |
| Non-critical feature | Graceful degradation | Core function continues | Thumbnail loading fails, app still works |
| All errors | Logging | Track for debugging | Record what happened for diagnosis |
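
The matrix can also be read as a small decision function. This is only a sketch: which exception types count as transient is an assumption that depends on your application, and `choose_strategy` and `TRANSIENT_ERRORS` are hypothetical names:

```python
import logging

logger = logging.getLogger("strategy")

# Assumption: these built-in exceptions usually indicate a temporary problem.
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)


def choose_strategy(error, critical=True):
    """Return 'retry', 'degrade', or 'fallback' for the given exception."""
    # Every error is logged, whichever strategy is chosen.
    logger.warning("%s: %s", type(error).__name__, error)
    if isinstance(error, TRANSIENT_ERRORS):
        return "retry"
    if not critical:
        return "degrade"
    return "fallback"
```

For example, `choose_strategy(TimeoutError("slow"))` returns `"retry"`, while a `FileNotFoundError` in a critical operation returns `"fallback"`.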

Exercise 1: Implement Retry Logic

Write a function that simulates a network request, fails twice, succeeds on third attempt. Add logging.


Expected outcome: Your function retries the operation, logs each attempt, and succeeds when the underlying function succeeds.


Exercise 2: Implement Fallback for Missing Configuration

Write a function that loads configuration from a file, but falls back to defaults if missing or corrupted.


Expected outcome: Function gracefully handles missing or corrupted files, returning sensible defaults while logging what happened.


Exercise 3: Combine Strategies – Robust Data Processing

Write a function that processes a list of user data, implementing all four strategies:

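  • Retry: Re-attempts a failed step a limited number of times (useful only when the input can change between attempts, such as a record that is still being written; re-validating identical data will fail the same way)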
  • Fallback: Uses default values for missing fields
  • Graceful degradation: Skips invalid rows, continues processing others
  • Logging: Records errors with context


Expected outcome: Function processes valid data, skips invalid, uses defaults where possible, logs all issues.


Key Takeaway: Defensive Programming Mindset

Professional developers ask:

  1. What errors are possible? (What exceptions could this raise?)
  2. Are they transient or permanent? (Will retry help?)
  3. What's the right recovery strategy? (Retry, fallback, degrade?)
  4. What should I log? (What context helps debugging?)

This systematic thinking transforms error handling from "catch and hope" to "anticipate and recover."


Try With AI

Master strategic error handling through decision frameworks for production systems.

🔍 Explore Strategy Selection:

"Explain the four error handling strategies: retry (transient errors), fallback (use defaults), graceful degradation (feature unavailable, core works), and logging (record errors with context). Show weather app examples for API fetch (critical), cached history (optional), location upload (background), and profile picture (UI)."

🎯 Practice Error Classification:

"I have a weather app with 4 operations. For each, identify: what errors occur (API timeout, file missing, network failure), are they transient or permanent, which strategy fits best, and what to log. Walk me through decision criteria for each operation."

🧪 Test Strategy Edge Cases:

"Analyze three scenarios: (A) Database connection retry with exponential backoff, (B) Corrupted image upload retry, (C) Silent fallback for corrupted JSON preferences. For each, critique my strategy choice and explain when retry vs fail-fast vs fallback is appropriate."

🚀 Apply Production Decision Framework:

"Build error_strategies.py with four decorator functions: @retry_on_failure(max_attempts=3, backoff=exponential), @fallback_on_error(default_value=None), @degrade_gracefully(feature_name), and @log_errors(log_level). Include decision tree for error classification (transient vs permanent, critical vs non-critical) and logging best practices."


Safety and Responsible Error Handling

As you implement error handling, remember:

  • Log securely: Don't log passwords, API keys, or personal data
  • User-friendly messages: Tell users what went wrong and what to do (not internal error codes)
  • Graceful failure: Never silently ignore errors—log and inform
  • Test error paths: Intentionally trigger errors to verify recovery works
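
The "log securely" point can be sketched as a filter applied to context before it is logged. The key list and the names `SENSITIVE_KEYS` and `redact` are assumptions for illustration; a real application would need its own list:

```python
# Assumption: context keys with these names hold secrets or personal data.
SENSITIVE_KEYS = {"password", "api_key", "token", "email"}


def redact(context):
    """Return a copy of context with sensitive values masked."""
    return {
        key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else value
        for key, value in context.items()
    }


safe_context = redact({"user_id": 7, "password": "hunter2", "operation": "login"})
print(f"Login failed | context={safe_context}")
```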