Lesson 4: Structured Data Formats (CSV and JSON)
Real-world applications rarely work with data in isolation. You need to import employee lists from Excel, export analysis results to share with colleagues, integrate with APIs that return JSON, and store application data for later retrieval. This lesson teaches you how to work with two of the most common structured data formats: CSV (Comma-Separated Values) for tabular data and JSON (JavaScript Object Notation) for nested, hierarchical data.
By the end of this lesson, you'll understand when to use each format, how to read and write both with Python's standard library modules, and how to handle encoding correctly so international characters (emoji, accented letters, non-Latin scripts) are preserved.
Why Structured Data Formats Matter
Before diving into code, let's understand the problem these formats solve. When you have data—a list of contacts, inventory records, or application configuration—you need to:
- Store it persistently — Save data to disk so it survives when your program closes
- Share it with others — Export to Excel, send to an API, or hand off to a colleague
- Integrate with systems — Read data that other applications created (database exports, web APIs, configuration files)
- Structure it meaningfully — Preserve relationships between data (customer has multiple orders, organization has departments)
Structured formats give you a standard way to solve all these problems. CSV and JSON are ubiquitous because they're human-readable, language-independent, and (with UTF-8) handle international characters correctly.
Key decision: Use CSV for tabular data (rows and columns, like spreadsheets) and JSON for hierarchical data (nested objects, mixed types, API responses).
Understanding CSV: Simple, Tabular Data
CSV (Comma-Separated Values) represents data as rows and columns, just like a spreadsheet. Here's what a CSV file looks like:
name,age,department
Alice,28,Engineering
Bob,35,Sales
Carol,32,Marketing
Structure: Each row is one record. Each column is a field. Commas separate values. The first row (usually) contains headers—field names.
Strengths:
- Simple, human-readable format
- Native support in Excel, Google Sheets, databases
- Compact (no extra formatting syntax)
- Perfect for tabular data (employees, contacts, inventory)
Limitations:
- Only handles flat data (rows and columns)
- Can't represent nested structures (customer with multiple orders)
- Requires conventions for special cases such as commas in values or newlines in cells (the csv module handles these by quoting, as the sketch below shows)
When to use CSV: Sales data, employee directories, scientific measurements, any rectangular dataset
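That quoting convention is worth seeing once. Here's a minimal sketch using an in-memory buffer (`io.StringIO` stands in for a real file; the rows are invented for illustration):

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "notes"])
writer.writerow(["Alice", "Lead, Platform Team"])  # value contains a comma

print(buffer.getvalue())
# name,notes
# Alice,"Lead, Platform Team"   <- quoted, so the comma survives

# Reading it back restores the original value, quotes removed:
buffer.seek(0)
print(list(csv.reader(buffer))[1])  # ['Alice', 'Lead, Platform Team']
```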
Understanding JSON: Flexible, Hierarchical Data
JSON (JavaScript Object Notation) represents data as nested objects and arrays, with support for strings, numbers, booleans, and null. Here's a JSON example:
```json
{
  "id": "1",
  "title": "Python Basics",
  "tags": ["learning", "python"],
  "metadata": {
    "created": "2025-11-09",
    "updated": "2025-11-09"
  }
}
```
Structure: Data is organized as key-value pairs (objects) and ordered lists (arrays). Values can be strings, numbers, booleans, null, objects, or arrays.
Strengths:
- Handles nested, hierarchical data naturally
- Supports multiple data types (strings, numbers, booleans, null)
- Standard format for web APIs (REST, GraphQL)
- Self-documenting (keys describe values)
- Readable with proper indentation
Limitations:
- More verbose than CSV (extra syntax)
- Overkill for simple tabular data
- Requires strict syntax (trailing commas cause errors; see the sketch below)
When to use JSON: API responses, configuration files, application state, nested data structures, anything with hierarchy or mixed types
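To make the mixed-types and strict-syntax points concrete, here's a small sketch (the record is invented for illustration):

```python
import json

# A record mixing types: JSON keeps booleans, null, and numbers distinct
record = {"name": "Alice", "active": True, "manager": None, "scores": [88, 92.5]}
text = json.dumps(record)
print(text)
# {"name": "Alice", "active": true, "manager": null, "scores": [88, 92.5]}

restored = json.loads(text)
print(restored["active"], restored["manager"])  # True None (types survive the round trip)

# JSON syntax is strict: a trailing comma is a parse error
try:
    json.loads('{"name": "Alice",}')
except json.JSONDecodeError as e:
    print(f"Invalid JSON: {e.msg}")
```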
Core Concept: The csv Module
Python's csv module provides two main readers:
csv.reader — Simple list-based access

```python
import csv

with open('employees.csv', 'r', encoding='utf-8') as f:
    reader = csv.reader(f)
    header = next(reader)  # Get first row as list of headers
    for row in reader:
        print(row)     # row is a list: ['Alice', '28', 'Engineering']
        name = row[0]  # Access by index
```
Access method: By index (row[0], row[1], row[2])
Advantage: Lightweight, works with any CSV structure
Disadvantage: Requires memorizing column positions (error-prone)
csv.DictReader — Dictionary-based access with headers
```python
import csv

with open('employees.csv', 'r', encoding='utf-8') as f:
    reader = csv.DictReader(f)  # Automatically reads first row as headers
    for row in reader:
        print(row)  # row is a dict: {'name': 'Alice', 'age': '28', 'department': 'Engineering'}
        name = row['name']  # Access by key (more readable!)
```
Access method: By column name (row['name'], row['age'])
Advantage: Self-documenting, column names clear, robust to column reordering
Disadvantage: Slightly more overhead (creates dicts for each row)
Professional recommendation: Use csv.DictReader for production code—the clarity is worth the minimal overhead.
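The "robust to column reordering" advantage is easy to demonstrate. In this sketch the two CSV snippets are hypothetical, and `io.StringIO` stands in for files:

```python
import csv
import io

# Same columns, different order
original = "name,age,department\nAlice,28,Engineering\n"
reordered = "department,age,name\nEngineering,28,Alice\n"

for label, text in [("original", original), ("reordered", reordered)]:
    row = next(csv.DictReader(io.StringIO(text)))
    # row['name'] works either way; index-based access (row[0]) would not
    print(f"{label}: {row['name']} ({row['department']})")
# original: Alice (Engineering)
# reordered: Alice (Engineering)
```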
Core Concept: The json Module
Python's json module handles serialization (Python → JSON) and deserialization (JSON → Python):
json.load() — Read JSON from file
```python
import json

with open('notes.json', 'r', encoding='utf-8') as f:
    data = json.load(f)  # data is now a Python object (dict or list)
```
json.dump() — Write JSON to file
```python
import json

notes = [
    {"id": "1", "title": "Python Basics", "tags": ["learning", "python"]},
    {"id": "2", "title": "File I/O Guide", "tags": ["io", "files"]}
]

with open('notes.json', 'w', encoding='utf-8') as f:
    json.dump(notes, f, indent=2, ensure_ascii=False)
```
Key parameters:
- `indent=2` — Pretty-print with 2-space indentation (makes the file readable)
- `ensure_ascii=False` — Preserve Unicode characters (emoji, accents, non-Latin scripts)
- `encoding='utf-8'` — Always use UTF-8 for text files
Core Concept: Character Encoding (UTF-8)
Problem: Without proper encoding, international characters become corrupted:
```python
import json

# WRONG — omits encoding
with open('messages.json', 'w') as f:
    json.dump({"message": "Hello 👋"}, f)  # Emoji might be escaped or corrupted

# RIGHT — explicit UTF-8
with open('messages.json', 'w', encoding='utf-8') as f:
    json.dump({"message": "Hello 👋"}, f, ensure_ascii=False)
```
UTF-8: Universal, variable-width encoding that handles every human language plus emoji. It's the standard for web and modern applications.
ensure_ascii=False: By default, JSON escapes non-ASCII characters as \uXXXX (ugly and hard to read). Setting ensure_ascii=False keeps emoji and international characters visible in the file.
Best practice: Always use encoding='utf-8' and ensure_ascii=False for JSON files that contain international text.
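You can compare the two settings without touching a file, since `json.dumps()` returns a string. A quick sketch:

```python
import json

message = {"text": "Hello 👋"}

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes
print(json.dumps(message))
# {"text": "Hello \ud83d\udc4b"}

# ensure_ascii=False keeps the emoji readable
print(json.dumps(message, ensure_ascii=False))
# {"text": "Hello 👋"}

# Both strings parse back to the identical Python object
assert json.loads(json.dumps(message)) == json.loads(json.dumps(message, ensure_ascii=False))
```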
💬 AI Colearning Prompt
"Show me the difference between JSON with
ensure_ascii=True(default) andensure_ascii=Falsewhen saving emoji. Which is more readable and why?"
Reading CSV Files: Pattern and Practice
Here's the professional pattern for reading CSV files:
Code Example 4.1: Reading CSV Files
```python
import csv
from pathlib import Path

# Create sample CSV file
csv_file: Path = Path("employees.csv")
csv_file.write_text("""name,age,department
Alice,28,Engineering
Bob,35,Sales
Carol,32,Marketing
""", encoding='utf-8')

# ========== Pattern 1: csv.reader (lists) ==========
print("Using csv.reader (by index):")
with open(csv_file, 'r', encoding='utf-8') as f:
    reader = csv.reader(f)
    header = next(reader)  # Manually get first row
    print(f"Columns: {header}")
    for row in reader:
        # Access by index: row[0] = name, row[1] = age, row[2] = department
        print(f"  {row[0]} ({row[2]})")

# ========== Pattern 2: csv.DictReader (dictionaries) ==========
print("\nUsing csv.DictReader (by name):")
with open(csv_file, 'r', encoding='utf-8') as f:
    reader = csv.DictReader(f)  # Automatically reads headers
    for row in reader:
        # Access by column name: row['name'], row['age'], row['department']
        print(f"  {row['name']} - {row['department']}")
```
Specification Reference: This example demonstrates FR-019 (reading CSV with csv.reader and csv.DictReader)
AI Prompts Used:
- "Show me how to read CSV files with both csv.reader and csv.DictReader"
- "Explain the difference between accessing by index vs by column name"
Validation Steps:
- Created sample CSV with headers and 3 data rows
- Read with csv.reader and printed rows as tuples
- Read with csv.DictReader and accessed columns by name
- Verified that DictReader automatically handled headers
Expected Outcome: You see csv.DictReader is clearer—you use column names instead of remembering positions.
🎓 Instructor Commentary
In AI-native development, you don't memorize CSV module parameters—you understand your DATA STRUCTURE. Need column access by name? csv.DictReader. Working with headerless data? csv.reader. Your AI generates the boilerplate; your job is specifying your intent.
Writing CSV Files: Preserving Structure
Writing CSV requires careful handling of headers and row formatting:
Code Example 4.2: Writing CSV Files
```python
import csv
from pathlib import Path

# Data to write (list of dictionaries)
employees: list[dict[str, str | int]] = [
    {"name": "Alice", "age": 28, "department": "Engineering"},
    {"name": "Bob", "age": 35, "department": "Sales"},
    {"name": "Carol", "age": 32, "department": "Marketing"},
]

csv_file: Path = Path("output.csv")

# ========== Writing with csv.DictWriter ==========
with open(csv_file, 'w', newline='', encoding='utf-8') as f:
    fieldnames = ["name", "age", "department"]  # Define column order
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()          # Write header row
    writer.writerows(employees)   # Write data rows

# ========== Verify by reading back ==========
print("Written CSV content:")
with open(csv_file, 'r', encoding='utf-8') as f:
    print(f.read())
# Output:
# name,age,department
# Alice,28,Engineering
# Bob,35,Sales
# Carol,32,Marketing
```
Key detail: newline='' in open() — This prevents extra blank lines in CSV output (platform-specific newline handling).
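To see why `newline=''` matters, look at what the csv module itself writes. A small sketch with an in-memory buffer makes the line endings visible:

```python
import csv
import io

# In-memory sketch: the csv module supplies its own \r\n line endings
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "age"])
writer.writerow(["Alice", "28"])
print(repr(buffer.getvalue()))
# 'name,age\r\nAlice,28\r\n'

# On Windows, text mode without newline='' would translate each \n to \r\n,
# yielding \r\r\n -- which spreadsheets display as a blank line after every row.
# newline='' turns that translation off, so the csv module's endings pass through.
```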
Specification Reference: This example demonstrates FR-020 (writing CSV with proper headers)
Validation Steps:
- Created list of dictionaries with employee data
- Opened file in write mode with newline=''
- Used csv.DictWriter to write headers and rows
- Read file back to verify output format
- Confirmed headers and data rows are correctly formatted
Expected Outcome: You understand csv.DictWriter handles header order and formatting automatically.
🚀 CoLearning Challenge
Ask your AI Co-Teacher:
"I have employee data with fields: id, name, email, hire_date. Write code using csv.DictWriter to export this to a CSV file. Then explain why we use DictWriter instead of manually writing strings with commas."
Expected Outcome: You'll understand csv.DictWriter handles edge cases (commas in values, special characters) automatically.
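DictWriter also has options for imperfect data. This minimal sketch (the field names and rows are hypothetical) shows `restval`, which fills in missing fields, and `extrasaction`, which controls what happens to unexpected ones:

```python
import csv
import io

fieldnames = ["id", "name", "email"]
rows = [
    {"id": "1", "name": "Alice"},                     # missing 'email'
    {"id": "2", "name": "Bob", "nickname": "Bobby"},  # extra 'nickname'
]

buffer = io.StringIO()
# restval fills missing fields; extrasaction='ignore' drops unexpected ones
# (the default, extrasaction='raise', raises ValueError instead)
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", extrasaction="ignore")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
# id,name,email
# 1,Alice,
# 2,Bob,
```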
Reading and Writing JSON: Serialization/Deserialization
JSON is Python's preferred format for storing application data:
Code Example 4.3: Reading and Writing JSON
```python
import json
from pathlib import Path

# ========== Create Python data structure ==========
notes: list[dict[str, str | list[str]]] = [
    {"id": "1", "title": "Python Basics", "tags": ["learning", "python"]},
    {"id": "2", "title": "File I/O Guide", "tags": ["io", "files", "python"]},
]

json_file: Path = Path("notes.json")

# ========== Write to JSON (serialization) ==========
with open(json_file, 'w', encoding='utf-8') as f:
    json.dump(notes, f, indent=2, ensure_ascii=False)

print(f"Written {json_file}")

# ========== Display raw file content ==========
print("\nFile content (human-readable):")
print(json_file.read_text(encoding='utf-8'))
# Output:
# [
#   {
#     "id": "1",
#     "title": "Python Basics",
#     "tags": [
#       "learning",
#       "python"
#     ]
#   },
#   {
#     "id": "2",
#     "title": "File I/O Guide",
#     "tags": [
#       "io",
#       "files",
#       "python"
#     ]
#   }
# ]

# ========== Read from JSON (deserialization) ==========
with open(json_file, 'r', encoding='utf-8') as f:
    loaded_notes: list[dict] = json.load(f)

print(f"\nLoaded {len(loaded_notes)} notes:")
for note in loaded_notes:
    print(f"  - {note['title']} (tags: {', '.join(note['tags'])})")
# Output:
# Loaded 2 notes:
#   - Python Basics (tags: learning, python)
#   - File I/O Guide (tags: io, files, python)
```
Key points:
- `json.dump()` converts Python objects → JSON (serialization)
- `json.load()` converts JSON → Python objects (deserialization)
- `indent=2` makes output human-readable (not required, but professional)
- Nested structures (dicts, lists) are preserved perfectly
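The same pair of operations exists for strings: `json.dumps()` and `json.loads()` work on `str` values instead of files, which is handy for quick round-trip checks. A short sketch:

```python
import json

notes = [{"id": "1", "title": "Python Basics", "tags": ["learning", "python"]}]

# json.dumps()/json.loads() are the string-based siblings of dump()/load()
text = json.dumps(notes, indent=2, ensure_ascii=False)
restored = json.loads(text)

assert restored == notes  # the round trip preserves structure exactly
print(restored[0]["tags"])  # ['learning', 'python']
```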
Specification Reference: This example demonstrates FR-021 and FR-022 (JSON load and dump with encoding)
Validation Steps:
- Created list of dicts with nested structure
- Used json.dump() with indent=2 for readability
- Displayed raw file content to show pretty-printing
- Used json.load() to read back into Python
- Verified structure is preserved exactly
Expected Outcome: You see JSON preserves nested structures perfectly—Python dicts become JSON objects, lists become arrays.
Handling Encoding for International Text
One of the biggest real-world problems: international characters get corrupted. Here's how to prevent it:
Code Example 4.4: Handling Encoding for International Text
```python
import json
from pathlib import Path

# ========== Data with international characters ==========
messages: list[dict[str, str]] = [
    {"user": "Alice", "text": "Hello! 👋"},
    {"user": "Bob", "text": "Привет (Russian for hello)"},
    {"user": "Carol", "text": "你好 (Chinese for hello)"},
]

json_file: Path = Path("messages.json")

# ========== WRONG: omits ensure_ascii=False ==========
print("WITHOUT ensure_ascii=False (default):")
with open(json_file, 'w', encoding='utf-8') as f:
    json.dump(messages, f, indent=2)  # ensure_ascii=True by default

# Looks ugly: emoji and non-Latin characters become escaped
content = json_file.read_text(encoding='utf-8')
print(content)
# Output shows: \ud83d\udc4b (escaped emoji) and \u041f... (escaped Cyrillic)

# ========== CORRECT: use ensure_ascii=False ==========
print("\n\nWITH ensure_ascii=False:")
with open(json_file, 'w', encoding='utf-8') as f:
    json.dump(messages, f, indent=2, ensure_ascii=False)

# Looks great: emoji and international characters visible
content = json_file.read_text(encoding='utf-8')
print(content)
# Output shows readable emoji 👋 and text "Привет", "你好"

# ========== Read back and verify ==========
print("\n\nLoaded messages:")
with open(json_file, 'r', encoding='utf-8') as f:
    loaded = json.load(f)

for msg in loaded:
    print(f"  {msg['user']}: {msg['text']}")
# Output:
#   Alice: Hello! 👋
#   Bob: Привет (Russian for hello)
#   Carol: 你好 (Chinese for hello)
```
The lesson: Always use both:
- `encoding='utf-8'` in `open()`
- `ensure_ascii=False` in `json.dump()`
Without both, emoji and international characters become either corrupted or escaped.
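What "corrupted" actually looks like is easiest to see at the bytes level, no file needed:

```python
# UTF-8 bytes decoded with the wrong codec become mojibake
raw = "Hello 👋".encode("utf-8")
print(repr(raw.decode("latin-1")))
# 'Hello ð\x9f\x91\x8b' -- garbage characters instead of the emoji

# Stricter codecs refuse outright instead of garbling
try:
    raw.decode("ascii")
except UnicodeDecodeError as e:
    print(f"Decode failed: {e.reason}")  # 'ordinal not in range(128)'
```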
Specification Reference: This example demonstrates FR-024 (understanding encoding and ensure_ascii)
Validation Steps:
- Created data with emoji and international scripts
- Demonstrated problem: escaped characters without ensure_ascii=False
- Showed solution: both encoding='utf-8' and ensure_ascii=False
- Read back and verified characters preserved
- Confirmed file is both readable (in text editor) and parseable (by json.load())
Expected Outcome: You understand why UTF-8 + ensure_ascii=False matter for international applications.
✨ Teaching Tip
Use Claude Code to explore encoding issues: "What happens if I save JSON with emoji but use `encoding='latin-1'` or omit `ensure_ascii=False`? Show me the broken output and how to fix it."
Error Handling for Format Errors
Real-world files are often corrupted, incomplete, or malformed. Here's professional error handling:
Code Example 4.5: Error Handling for Format Errors
```python
import json
import csv
from pathlib import Path

# ========== Safe JSON loading ==========
def safe_load_json(filename: str) -> list[dict] | None:
    """Load JSON with comprehensive error handling."""
    try:
        path = Path(filename)
        with open(path, 'r', encoding='utf-8') as f:
            return json.load(f)
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found")
        return None
    except json.JSONDecodeError as e:
        print(f"Error: Invalid JSON format - {e.msg} at line {e.lineno}")
        return None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None

# ========== Safe CSV loading ==========
def safe_load_csv(filename: str) -> list[dict[str, str]] | None:
    """Load CSV with error handling."""
    try:
        path = Path(filename)
        with open(path, 'r', encoding='utf-8') as f:
            reader = csv.DictReader(f)
            if reader.fieldnames is None:
                print("Error: CSV file is empty or has no headers")
                return None
            return list(reader)
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found")
        return None
    except ValueError as e:
        print(f"Error: Invalid CSV format - {e}")
        return None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None

# ========== Test with valid files ==========
valid_json = Path("valid.json")
valid_json.write_text('[{"name": "Alice"}, {"name": "Bob"}]')
data = safe_load_json("valid.json")
print(f"Loaded JSON: {data}")

valid_csv = Path("valid.csv")
valid_csv.write_text("name,age\nAlice,28\nBob,35")
data = safe_load_csv("valid.csv")
print(f"Loaded CSV: {data}")

# ========== Test with invalid files ==========
invalid_json = Path("invalid.json")
invalid_json.write_text('[{"name": "Alice"} BROKEN')  # Missing closing bracket
print("\nAttempting to load invalid JSON:")
data = safe_load_json("invalid.json")
# Output: Error: Invalid JSON format - Expecting ',' delimiter at line 1

# Missing file
print("\nAttempting to load missing file:")
data = safe_load_json("nonexistent.json")
# Output: Error: File 'nonexistent.json' not found
```
What this teaches:
- JSONDecodeError includes line and position information (helps debug)
- FileNotFoundError for missing files
- Return None on error (allows graceful handling)
- Check for empty CSV (no headers = invalid)
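One way calling code can use that `None` sentinel (this reuses `safe_load_json` from the example above; the empty-list fallback is just one reasonable choice):

```python
# Branch on the None sentinel instead of crashing
notes = safe_load_json("notes.json")
if notes is None:
    notes = []  # fall back to an empty collection
print(f"Working with {len(notes)} notes")
```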
Specification Reference: This example demonstrates FR-023 (handling JSONDecodeError and ValueError)
Validation Steps:
- Wrote safe_load_json() with specific exception handling
- Wrote safe_load_csv() with header validation
- Tested both with valid files
- Tested with invalid JSON (syntax error)
- Tested with missing file
- Verified error messages are helpful, not crashes
Expected Outcome: You see professional error handling prevents crashes and provides actionable error messages.
Python 3.14 Improved Error Messages
Python 3.14 enhanced JSON error reporting with exception notes that identify exactly where errors occur:
```python
import json

# Python 3.14+ provides richer error context
try:
    data = json.loads('{"name": "Alice", "age": }')  # Missing value
except json.JSONDecodeError as e:
    print(f"Error: {e}")
    # Python 3.14+ adds a note showing the exact issue:
    # "Expecting value: line 1 column 26 (char 25)"
    # The note highlights the missing value location
```
What improved: In Python 3.13 and earlier, you got basic line/column numbers. In Python 3.14+, exception notes provide contextual hints about what's wrong (missing value, trailing comma, unclosed string, etc.).
Why this matters for AI-native development: When JSON parsing fails, your AI companion gets better error messages to diagnose the issue. This means faster debugging—your AI can pinpoint "missing closing brace on line 12" instead of generic "invalid JSON" errors.
For this chapter: The error handling patterns we teach work in all Python versions. When you encounter JSON errors, Python 3.14+ will just give you more helpful messages automatically.
Choosing Between CSV and JSON: Decision Framework
When should you use each format? Here's the decision tree:
| Question | Answer | Use CSV | Use JSON |
|---|---|---|---|
| Is the data tabular (rows and columns)? | Yes | ✓ | |
| Does the data have nested structure (objects within objects)? | Yes | | ✓ |
| Will it open in Excel? | Required | ✓ | |
| Is it an API response? | Yes | | ✓ |
| Configuration file? | Yes | | ✓ |
| Only strings and numbers? | No (has booleans, nulls) | | ✓ |
Real-world examples:
- CSV: Employee directory, sales data, scientific measurements, database exports, anything from Excel
- JSON: API responses, application configuration, note storage (Lesson 5), any hierarchical data, web application state
Professional guideline: When in doubt, use JSON. It's more flexible, handles edge cases better, and integrates naturally with web APIs.
Practice Exercises
Exercise 1: Read and Display CSV
Write a program that:
- Reads a CSV file with at least 3 columns and 5 rows
- Displays each row in a formatted table with column headers
- Uses csv.DictReader for clean column access
Validation: Test with a file containing special characters (commas in values, quotes)
Success: All rows display correctly, headers shown
Exercise 2: Convert CSV to JSON
Write a program that:
- Reads employee data from a CSV file
- Converts to a Python list of dictionaries
- Saves to JSON with `indent=2` and `ensure_ascii=False`
- Reads back and verifies structure matches
Validation: Test with data containing international characters (names with accents, emoji in descriptions)
Success: CSV and JSON contain same data, international characters preserved
Exercise 3: Error Handling for Both Formats
Write error-handling code that:
- Attempts to load both CSV and JSON files
- Catches FileNotFoundError, JSONDecodeError, and ValueError
- Provides helpful error messages (not just crashes)
- Returns None on failure so calling code can handle gracefully
Validation: Test with:
- Valid CSV and JSON files
- Corrupted JSON (missing bracket)
- Missing file
- Empty file
Success: No crashes, appropriate error messages for each case
Try With AI
These prompts help you synthesize everything this lesson taught: reading/writing both formats, handling encoding, managing errors, and choosing appropriate formats.
Prompt 1: Understand Format Differences
Ask your AI: "What's the difference between CSV and JSON?
When would you use each format for real-world applications?
Show me examples."
Expected Outcome: You understand when tabular (CSV) is appropriate
vs hierarchical/nested (JSON) data, and can justify format selection.
Prompt 2: Apply Format Conversion
Ask your AI: "Write code that reads a CSV file containing contacts
(name, email, phone), filters for a specific email domain, and saves
the results to JSON with proper UTF-8 encoding."
Expected Outcome: You see CSV→JSON transformation with filtering,
and understand how csv.DictReader and json.dump work together.
Prompt 3: Analyze Performance Tradeoffs
Ask your AI: "Compare reading a large CSV file (10,000 rows) with
csv.reader vs csv.DictReader. What's the memory and performance
difference? When would you use each?"
Expected Outcome: You understand tradeoffs between convenience
(DictReader) and efficiency (reader), and when each is appropriate.
Prompt 4: Synthesize Real-World Scenario
Ask your AI: "Design a data import system that reads CSV from a
user's export, validates required fields, handles encoding issues
(international names), converts to JSON, saves to file, and
implements comprehensive error handling. What error cases must
you handle? What does the data structure look like?"
Expected Outcome: You've integrated CSV/JSON with validation and
error handling in a realistic workflow, demonstrating mastery of
structured data formats. This prepares you for Lesson 5's Note-Taking
App capstone that persists data as JSON files.
Safety & Ethics Note: When importing data from external sources, always validate and sanitize inputs. Don't assume CSV/JSON files are trustworthy—implement bounds checking, type validation, and error handling. This protects your application and data integrity.