Structured Data Formats (CSV and JSON)
Real-world applications rarely work with data in isolation. You need to import employee lists from Excel, export analysis results to share with colleagues, integrate with APIs that return JSON, and store application data for later retrieval. This lesson teaches you how to work with two of the most common structured data formats: CSV (Comma-Separated Values) for tabular data and JSON (JavaScript Object Notation) for nested, hierarchical data.
By the end of this lesson, you'll understand when to use each format, how to read and write both with Python's standard library modules, and how to handle encoding correctly so international characters (emoji, accented letters, non-Latin scripts) are preserved.
Why Structured Data Formats Matter
Before diving into code, let's understand the problem these formats solve. When you have data—a list of contacts, inventory records, or application configuration—you need to:
- Store it persistently — Save data to disk so it survives when your program closes
- Share it with others — Export to Excel, send to an API, or hand off to a colleague
- Integrate with systems — Read data that other applications created (database exports, web APIs, configuration files)
- Structure it meaningfully — Preserve relationships between data (customer has multiple orders, organization has departments)
Structured formats give you a standard way to solve all these problems. CSV and JSON are ubiquitous because they're human-readable, language-independent, and handle international characters correctly.
Key decision: Use CSV for tabular data (rows and columns, like spreadsheets) and JSON for hierarchical data (nested objects, mixed types, API responses).
Understanding CSV: Simple, Tabular Data
CSV (Comma-Separated Values) represents data as rows and columns, just like a spreadsheet. Here's what a CSV file looks like:
```csv
name,age,department
Alice,28,Engineering
Bob,35,Sales
Carol,32,Marketing
```
Structure: Each row is one record. Each column is a field. Commas separate values. The first row (usually) contains headers—field names.
Strengths:
- Simple, human-readable format
- Native support in Excel, Google Sheets, databases
- Compact (no extra formatting syntax)
- Perfect for tabular data (employees, contacts, inventory)
Limitations:
- Only handles flat data (rows and columns)
- Can't represent nested structures (customer with multiple orders)
- Requires conventions for special cases (commas in values, newlines in cells)
When to use CSV: Sales data, employee directories, scientific measurements, any rectangular dataset
Understanding JSON: Flexible, Hierarchical Data
JSON (JavaScript Object Notation) represents data as nested objects and arrays, with support for strings, numbers, booleans, and null. Here's a JSON example:
```json
{
  "id": "1",
  "title": "Python Basics",
  "tags": ["learning", "python"],
  "metadata": {
    "created": "2025-11-09",
    "updated": "2025-11-09"
  }
}
```
Structure: Data is organized as key-value pairs (objects) and ordered lists (arrays). Values can be strings, numbers, booleans, null, objects, or arrays.
Strengths:
- Handles nested, hierarchical data naturally
- Supports multiple data types (strings, numbers, booleans, null)
- Standard format for web APIs (REST, GraphQL)
- Self-documenting (keys describe values)
- Readable with proper indentation
Limitations:
- More verbose than CSV (extra syntax)
- Overkill for simple tabular data
- Requires proper syntax (trailing commas cause errors)
When to use JSON: API responses, configuration files, application state, nested data structures, anything with hierarchy or mixed types
Core Concept: The csv Module
Python's csv module provides two main readers:
csv.reader — Simple tuple-based access
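A minimal sketch of `csv.reader`. For self-containment it reads from an in-memory string; with a real file you would use `open(path, newline='')` instead:

```python
import csv
import io

# Illustrative sample data; a real program would open a file on disk
data = "name,age,department\nAlice,28,Engineering\nBob,35,Sales\n"

reader = csv.reader(io.StringIO(data))
rows = list(reader)

header = rows[0]            # first row holds the field names
for row in rows[1:]:
    # Access by position: row[0] is name, row[1] is age
    print(row[0], row[1])
```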
Access method: By index (row[0], row[1], row[2])
Advantage: Lightweight, works with any CSV structure
Disadvantage: Requires memorizing column positions (error-prone)
csv.DictReader — Dictionary-based access with headers
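The same data read with `csv.DictReader`, which uses the header row to build a dictionary per record (again reading from an in-memory string for brevity):

```python
import csv
import io

data = "name,age,department\nAlice,28,Engineering\nBob,35,Sales\n"

reader = csv.DictReader(io.StringIO(data))
people = list(reader)

for person in people:
    # Access by column name -- no positions to memorize
    print(person["name"], person["age"])
```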
Access method: By column name (row['name'], row['age'])
Advantage: Self-documenting, column names clear, robust to column reordering
Disadvantage: Slightly more overhead (creates dicts for each row)
Professional recommendation: Use csv.DictReader for production code—the clarity is worth the minimal overhead.
Core Concept: The json Module
Python's json module handles serialization (Python → JSON) and deserialization (JSON → Python):
json.load() — Read JSON from file
json.dump() — Write JSON to file
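A sketch of `json.dump()` with the parameters this lesson recommends (the file path is illustrative):

```python
import json
import os
import tempfile

note = {"id": "1", "title": "Python Basics", "tags": ["learning", "python"]}

path = os.path.join(tempfile.gettempdir(), "note_out.json")

# indent=2 pretty-prints; ensure_ascii=False preserves non-ASCII text
with open(path, "w", encoding="utf-8") as f:
    json.dump(note, f, indent=2, ensure_ascii=False)

with open(path, encoding="utf-8") as f:
    text = f.read()
print(text)  # human-readable, indented JSON
```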
Key parameters:
- `indent=2` — Pretty-print with 2-space indentation (makes the file readable)
- `ensure_ascii=False` — Preserve Unicode characters (emoji, accents, non-Latin scripts)
- `encoding='utf-8'` — Always use UTF-8 for text files
Core Concept: Character Encoding (UTF-8)
Problem: Without proper encoding, international characters become corrupted:
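A small demonstration of the corruption ("mojibake") that happens when UTF-8 bytes are decoded with the wrong codec:

```python
# A name containing a non-ASCII character
name = "José"

utf8_bytes = name.encode("utf-8")        # b'Jos\xc3\xa9'

# Decoding with the wrong codec garbles the text
wrong = utf8_bytes.decode("latin-1")
print(wrong)        # 'JosÃ©' -- classic mojibake

# Decoding with the correct codec restores it
roundtrip = utf8_bytes.decode("utf-8")
print(roundtrip)    # 'José'
```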
UTF-8: Universal, variable-width encoding that handles every human language plus emoji. It's the standard for web and modern applications.
ensure_ascii=False: By default, JSON escapes non-ASCII characters as \uXXXX (ugly and hard to read). Setting ensure_ascii=False keeps emoji and international characters visible in the file.
Best practice: Always use encoding='utf-8' and ensure_ascii=False for JSON files that contain international text.
💬 AI Colearning Prompt
"Show me the difference between JSON with ensure_ascii=True (default) and ensure_ascii=False when saving emoji. Which is more readable and why?"
Reading CSV Files: Pattern and Practice
Here's the professional pattern for reading CSV files:
Code Example 4.1: Reading CSV Files
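A sketch of the pattern, showing both readers side by side on the same sample data (in-memory for self-containment; with a real file, use `open(path, newline='')`):

```python
import csv
import io

csv_text = (
    "name,age,department\n"
    "Alice,28,Engineering\n"
    "Bob,35,Sales\n"
    "Carol,32,Marketing\n"
)

# --- csv.reader: positional access ---
rows = list(csv.reader(io.StringIO(csv_text)))
header, data_rows = rows[0], rows[1:]
for row in data_rows:
    print(row[0], row[2])        # must remember: 0 = name, 2 = department

# --- csv.DictReader: access by column name ---
records = list(csv.DictReader(io.StringIO(csv_text)))
for rec in records:
    print(rec["name"], rec["department"])   # self-documenting
```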
Specification Reference: This example demonstrates FR-019 (reading CSV with csv.reader and csv.DictReader)
AI Prompts Used:
- "Show me how to read CSV files with both csv.reader and csv.DictReader"
- "Explain the difference between accessing by index vs by column name"
Validation Steps:
- Created sample CSV with headers and 3 data rows
- Read with csv.reader and printed rows as tuples
- Read with csv.DictReader and accessed columns by name
- Verified that DictReader automatically handled headers
Expected Outcome: You see csv.DictReader is clearer—you use column names instead of remembering positions.
🎓 Expert Insight
In AI-native development, you don't memorize CSV module parameters—you understand your DATA STRUCTURE. Need column access by name? csv.DictReader. Dealing with unheadered data? csv.reader. Your AI generates the boilerplate; your job is specifying your intent.
Writing CSV Files: Preserving Structure
Writing CSV requires careful handling of headers and row formatting:
Code Example 4.2: Writing CSV Files
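A sketch of writing with `csv.DictWriter` and reading the file back to verify (the file path and employee data are illustrative):

```python
import csv
import os
import tempfile

employees = [
    {"name": "Alice", "age": "28", "department": "Engineering"},
    {"name": "Bob", "age": "35", "department": "Sales"},
]

path = os.path.join(tempfile.gettempdir(), "employees.csv")

# newline='' prevents extra blank lines on some platforms
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "age", "department"])
    writer.writeheader()          # writes: name,age,department
    writer.writerows(employees)   # writes one row per dict

# Read back to verify headers and rows survived the round trip
with open(path, newline="", encoding="utf-8") as f:
    back = list(csv.DictReader(f))
print(back)
```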
Key detail: newline='' in open() — This prevents extra blank lines in CSV output (platform-specific newline handling).
Specification Reference: This example demonstrates FR-020 (writing CSV with proper headers)
Validation Steps:
- Created list of dictionaries with employee data
- Opened file in write mode with newline=''
- Used csv.DictWriter to write headers and rows
- Read file back to verify output format
- Confirmed headers and data rows are correctly formatted
Expected Outcome: You understand csv.DictWriter handles header order and formatting automatically.
🚀 CoLearning Challenge
Ask your AI Co-Teacher:
"I have employee data with fields: id, name, email, hire_date. Write code using csv.DictWriter to export this to a CSV file. Then explain why we use DictWriter instead of manually writing strings with commas."
Expected Outcome: You'll understand csv.DictWriter handles edge cases (commas in values, special characters) automatically.
Reading and Writing JSON: Serialization/Deserialization
JSON is Python's preferred format for storing application data:
Code Example 4.3: Reading and Writing JSON
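A sketch of the full round trip with nested data (the structure mirrors the note example earlier in this lesson; the path is illustrative):

```python
import json
import os
import tempfile

notes = [
    {
        "id": "1",
        "title": "Python Basics",
        "tags": ["learning", "python"],
        "metadata": {"created": "2025-11-09", "updated": "2025-11-09"},
    }
]

path = os.path.join(tempfile.gettempdir(), "notes.json")

# Serialization: Python -> JSON
with open(path, "w", encoding="utf-8") as f:
    json.dump(notes, f, indent=2, ensure_ascii=False)

# Deserialization: JSON -> Python
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded == notes)  # nested dicts and lists survive the round trip
```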
Key points:
- json.dump() converts Python objects → JSON (serialization)
- json.load() converts JSON → Python objects (deserialization)
- indent=2 makes output human-readable (not required, but professional)
- Nested structures (dicts, lists) are preserved perfectly
Specification Reference: This example demonstrates FR-021 and FR-022 (JSON load and dump with encoding)
Validation Steps:
- Created list of dicts with nested structure
- Used json.dump() with indent=2 for readability
- Displayed raw file content to show pretty-printing
- Used json.load() to read back into Python
- Verified structure is preserved exactly
Expected Outcome: You see JSON preserves nested structures perfectly—Python dicts become JSON objects, lists become arrays.
Handling Encoding for International Text
One of the biggest real-world problems: international characters get corrupted. Here's how to prevent it:
Code Example 4.4: Handling Encoding for International Text
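A sketch contrasting the default (escaped) output with the recommended settings; the data and file path are illustrative:

```python
import json
import os
import tempfile

data = {"title": "Café notes ☕", "author": "José"}

path = os.path.join(tempfile.gettempdir(), "intl.json")

# Solution: UTF-8 encoding + ensure_ascii=False keeps characters visible
with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2, ensure_ascii=False)

with open(path, encoding="utf-8") as f:
    raw = f.read()
print(raw)                  # contains Café, ☕, José verbatim

# Problem (for contrast): the default escapes non-ASCII as \uXXXX
escaped = json.dumps(data)  # ensure_ascii=True by default
print(escaped)

# Both forms parse back to identical Python data
restored = json.loads(raw)
```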
The lesson: Always use both:
- encoding='utf-8' in open()
- ensure_ascii=False in json.dump()
Without both, emoji and international characters become either corrupted or escaped.
Specification Reference: This example demonstrates FR-024 (understanding encoding and ensure_ascii)
Validation Steps:
- Created data with emoji and international scripts
- Demonstrated problem: escaped characters without ensure_ascii=False
- Showed solution: both encoding='utf-8' and ensure_ascii=False
- Read back and verified characters preserved
- Confirmed file is both readable (in text editor) and parseable (by json.load())
Expected Outcome: You understand why UTF-8 + ensure_ascii=False matter for international applications.
✨ Teaching Tip
Use Claude Code to explore encoding issues: "What happens if I save JSON with emoji but use encoding='latin-1' or omit ensure_ascii=False? Show me the broken output and how to fix it."
Error Handling for Format Errors
Real-world files are often corrupted, incomplete, or malformed. Here's professional error handling:
Code Example 4.5: Error Handling for Format Errors
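A sketch of the two loader functions described below. The function names (`safe_load_json`, `safe_load_csv`) and messages are illustrative; the pattern is what matters: catch specific exceptions, print something actionable, return None:

```python
import csv
import json

def safe_load_json(path):
    """Load JSON, returning None (with a message) instead of crashing."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        print(f"Error: {path} not found")
    except json.JSONDecodeError as e:
        # JSONDecodeError carries line/column info for debugging
        print(f"Invalid JSON in {path}: {e.msg} (line {e.lineno}, column {e.colno})")
    return None

def safe_load_csv(path):
    """Load CSV rows as dicts; None if the file is missing or has no header."""
    try:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            if reader.fieldnames is None:   # empty file -> no header row
                print(f"Error: {path} has no header row")
                return None
            return list(reader)
    except FileNotFoundError:
        print(f"Error: {path} not found")
        return None

print(safe_load_json("does_not_exist.json"))  # prints a message, then None
```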
What this teaches:
- JSONDecodeError includes line and position information (helps debug)
- FileNotFoundError for missing files
- Return None on error (allows graceful handling)
- Check for empty CSV (no headers = invalid)
Specification Reference: This example demonstrates FR-023 (handling JSONDecodeError and ValueError)
Validation Steps:
- Wrote safe_load_json() with specific exception handling
- Wrote safe_load_csv() with header validation
- Tested both with valid files
- Tested with invalid JSON (syntax error)
- Tested with missing file
- Verified error messages are helpful, not crashes
Expected Outcome: You see professional error handling prevents crashes and provides actionable error messages.
Python 3.14 Improved Error Messages
Python 3.14 enhanced JSON error reporting with exception notes that identify exactly where errors occur:
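A version-agnostic sketch of inspecting a JSONDecodeError. The attributes shown exist in all supported Python versions; the exact message text varies by version, and per the note below, Python 3.14+ may attach additional contextual notes to the exception:

```python
import json

bad = '{"name": "Alice", "age": 30,}'  # trailing comma -> invalid JSON

try:
    json.loads(bad)
except json.JSONDecodeError as exc:
    error = exc

# Line/column info is available in every supported version
print(error.msg, "at line", error.lineno, "column", error.colno)

# Exception notes (PEP 678, Python 3.11+); 3.14+ may populate these
# with contextual hints about what went wrong
print(getattr(error, "__notes__", []))
```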
What improved: In Python 3.13 and earlier, you got basic line/column numbers. In Python 3.14+, exception notes provide contextual hints about what's wrong (missing value, trailing comma, unclosed string, etc.).
Why this matters for AI-native development: When JSON parsing fails, your AI companion gets better error messages to diagnose the issue. This means faster debugging—your AI can pinpoint "missing closing brace on line 12" instead of generic "invalid JSON" errors.
For this chapter: The error handling patterns we teach work in all Python versions. When you encounter JSON errors, Python 3.14+ will just give you more helpful messages automatically.
Choosing Between CSV and JSON: Decision Framework
When should you use each format? Here's the decision tree:
| Question | Answer | Use CSV | Use JSON |
|---|---|---|---|
| Is data tabular? (rows and columns) | Yes | ✓ CSV | |
| Does data have nested structure? (objects within objects) | Yes | ✓ JSON | |
| Will it open in Excel? | Required | ✓ CSV | |
| Is it an API response? | Yes | ✓ JSON | |
| Configuration file? | Yes | ✓ JSON | |
| Only strings and numbers? | No (has booleans, nulls) | | ✓ JSON |
Real-world examples:
- CSV: Employee directory, sales data, scientific measurements, database exports, anything from Excel
- JSON: API responses, application configuration, note storage (Lesson 5), any hierarchical data, web application state
Professional guideline: When in doubt, use JSON. It's more flexible, handles edge cases better, and integrates naturally with web APIs.
Practice Exercises
Exercise 1: Read and Display CSV
Write a program that:
- Reads a CSV file with at least 3 columns and 5 rows
- Displays each row in a formatted table with column headers
- Uses csv.DictReader for clean column access
Validation: Test with a file containing special characters (commas in values, quotes)
Success: All rows display correctly, headers shown
Exercise 2: Convert CSV to JSON
Write a program that:
- Reads employee data from a CSV file
- Converts to a Python list of dictionaries
- Saves to JSON with indent=2 and ensure_ascii=False
- Reads back and verifies the structure matches
Validation: Test with data containing international characters (names with accents, emoji in descriptions)
Success: CSV and JSON contain same data, international characters preserved
Exercise 3: Error Handling for Both Formats
Write error-handling code that:
- Attempts to load both CSV and JSON files
- Catches FileNotFoundError, JSONDecodeError, and ValueError
- Provides helpful error messages (not just crashes)
- Returns None on failure so calling code can handle gracefully
Validation: Test with:
- Valid CSV and JSON files
- Corrupted JSON (missing bracket)
- Missing file
- Empty file
Success: No crashes, appropriate error messages for each case
Try With AI
Master data format conversion between CSV and JSON with validation and encoding handling.
🔍 Explore Format Tradeoffs:
"Compare CSV vs JSON for employee data (name, email, department, start_year). Explain: CSV pros (compact, Excel-compatible, simple) vs JSON pros (nested data, type preservation, metadata). Show manual conversion and what's lost in each direction. When to choose each format?"
🎯 Practice Format Handling:
"Demonstrate csv.DictReader reading CSV rows as dictionaries and json.dump() writing with indentation. Show edge cases: a comma in a name field ('Alice, Jr.'), a newline in a description (which breaks naive CSV parsing), international characters (café, José with ensure_ascii=False), and performance for 100K records."
🧪 Test Special Characters:
"Create test data with problematic characters: CSV with commas ('Johnson, Jr.'), quotes, and newlines. JSON with international characters (café, José) and emojis. Show how CSV quoting handles commas, when parsing breaks, and proper encoding settings (utf-8, ensure_ascii=False) for international data."
🚀 Apply Migration Tool:
"Build bi-directional converter: CSV to JSON with validation (required fields, email format check, start_year range), add metadata (UUID, timestamp, data_source), handle errors (missing fields skip with warning, invalid email skip with report), save employees.json. Reverse: JSON to CSV. Show 'Imported 47 employees, skipped 3 with errors' report."
Safety & Ethics Note: When importing data from external sources, always validate and sanitize inputs. Don't assume CSV/JSON files are trustworthy—implement bounds checking, type validation, and error handling. This protects your application and data integrity.