
Structured Data Formats (CSV and JSON)

Real-world applications rarely work with data in isolation. You need to import employee lists from Excel, export analysis results to share with colleagues, integrate with APIs that return JSON, and store application data for later retrieval. This lesson teaches you how to work with two of the most common structured data formats: CSV (Comma-Separated Values) for tabular data and JSON (JavaScript Object Notation) for nested, hierarchical data.

By the end of this lesson, you'll understand when to use each format, how to read and write both with Python's standard library modules, and how to handle encoding correctly so international characters (emoji, accented letters, non-Latin scripts) are preserved.


Why Structured Data Formats Matter

Before diving into code, let's understand the problem these formats solve. When you have data—a list of contacts, inventory records, or application configuration—you need to:

  1. Store it persistently — Save data to disk so it survives when your program closes
  2. Share it with others — Export to Excel, send to an API, or hand off to a colleague
  3. Integrate with systems — Read data that other applications created (database exports, web APIs, configuration files)
  4. Structure it meaningfully — Preserve relationships between data (customer has multiple orders, organization has departments)

Structured formats give you a standard way to solve all these problems. CSV and JSON are ubiquitous because they're human-readable, language-independent, and handle international characters correctly.

Key decision: Use CSV for tabular data (rows and columns, like spreadsheets) and JSON for hierarchical data (nested objects, mixed types, API responses).


Understanding CSV: Simple, Tabular Data

CSV (Comma-Separated Values) represents data as rows and columns, just like a spreadsheet. Here's what a CSV file looks like:

name,age,department
Alice,28,Engineering
Bob,35,Sales
Carol,32,Marketing

Structure: Each row is one record. Each column is a field. Commas separate values. The first row (usually) contains headers—field names.

Strengths:

  • Simple, human-readable format
  • Native support in Excel, Google Sheets, databases
  • Compact (no extra formatting syntax)
  • Perfect for tabular data (employees, contacts, inventory)

Limitations:

  • Only handles flat data (rows and columns)
  • Can't represent nested structures (customer with multiple orders)
  • Requires conventions for special cases (commas in values, newlines in cells)

When to use CSV: Sales data, employee directories, scientific measurements, any rectangular dataset


Understanding JSON: Flexible, Hierarchical Data

JSON (JavaScript Object Notation) represents data as nested objects and arrays, with support for strings, numbers, booleans, and null. Here's a JSON example:

{
  "id": "1",
  "title": "Python Basics",
  "tags": ["learning", "python"],
  "metadata": {
    "created": "2025-11-09",
    "updated": "2025-11-09"
  }
}

Structure: Data is organized as key-value pairs (objects) and ordered lists (arrays). Values can be strings, numbers, booleans, null, objects, or arrays.

Strengths:

  • Handles nested, hierarchical data naturally
  • Supports multiple data types (strings, numbers, booleans, null)
  • Standard format for web APIs (REST, GraphQL)
  • Self-documenting (keys describe values)
  • Readable with proper indentation

Limitations:

  • More verbose than CSV (extra syntax)
  • Overkill for simple tabular data
  • Requires proper syntax (trailing commas cause errors)

When to use JSON: API responses, configuration files, application state, nested data structures, anything with hierarchy or mixed types


Core Concept: The csv Module

Python's csv module provides two main readers:

csv.reader — Simple list-based access

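A minimal sketch, assuming a file named employees.csv shaped like the sample above (the filename is illustrative):

import csv

# Each row comes back as a list of strings, accessed by position.
with open("employees.csv", encoding="utf-8", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)  # the first row is the header
    for row in reader:
        print(row[0], row[1], row[2])  # name, age, department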

Access method: By index (row[0], row[1], row[2])

Advantage: Lightweight, works with any CSV structure

Disadvantage: Requires memorizing column positions (error-prone)

csv.DictReader — Dictionary-based access with headers

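A matching sketch against the same assumed employees.csv:

import csv

# Each row comes back as a dictionary keyed by the header names.
with open("employees.csv", encoding="utf-8", newline="") as f:
    for row in csv.DictReader(f):
        print(row["name"], row["age"], row["department"])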

Access method: By column name (row['name'], row['age'])

Advantage: Self-documenting, column names clear, robust to column reordering

Disadvantage: Slightly more overhead (creates dicts for each row)

Professional recommendation: Use csv.DictReader for production code—the clarity is worth the minimal overhead.


Core Concept: The json Module

Python's json module handles serialization (Python → JSON) and deserialization (JSON → Python):

json.load() — Read JSON from file

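A minimal sketch, assuming a notes.json file with the structure shown earlier (the filename is illustrative):

import json

# Deserialize: JSON text in the file becomes Python dicts, lists, and strings.
with open("notes.json", encoding="utf-8") as f:
    note = json.load(f)

print(note["title"])  # "Python Basics"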

json.dump() — Write JSON to file

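And the reverse direction, a sketch that writes the same kind of structure back out:

import json

note = {
    "id": "1",
    "title": "Python Basics",
    "tags": ["learning", "python"],
}

# Serialize: Python objects become JSON text in the file.
with open("notes.json", "w", encoding="utf-8") as f:
    json.dump(note, f, indent=2, ensure_ascii=False)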

Key parameters:

  • indent=2 — Pretty-print with 2-space indentation (makes file readable)
  • ensure_ascii=False — Preserve Unicode characters (emoji, accents, non-Latin scripts)
  • encoding='utf-8' in open() — Always use UTF-8 when opening text files

Core Concept: Character Encoding (UTF-8)

Problem: Without proper encoding, international characters become corrupted:

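A small sketch of the failure mode: with default settings, non-ASCII characters are escaped into \uXXXX sequences (the sample strings are illustrative).

import json

data = {"name": "José", "mood": "😀"}

# Default (ensure_ascii=True): characters are escaped and hard to read.
print(json.dumps(data))
# {"name": "Jos\u00e9", "mood": "\ud83d\ude00"}

# With ensure_ascii=False, the characters stay visible.
print(json.dumps(data, ensure_ascii=False))
# {"name": "José", "mood": "😀"}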

UTF-8: Universal, variable-width encoding that handles every human language plus emoji. It's the standard for web and modern applications.

ensure_ascii=False: By default, json.dump() escapes non-ASCII characters as \uXXXX (ugly and hard to read). Setting ensure_ascii=False keeps emoji and international characters visible in the file.

Best practice: Always use encoding='utf-8' and ensure_ascii=False for JSON files that contain international text.

💬 AI Colearning Prompt

"Show me the difference between JSON with ensure_ascii=True (default) and ensure_ascii=False when saving emoji. Which is more readable and why?"


Reading CSV Files: Pattern and Practice

Here's the professional pattern for reading CSV files:

Code Example 4.1: Reading CSV Files

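A self-contained sketch of this example; it writes a small sample file first so you can run it as-is (the filename is illustrative):

import csv

# Create a sample CSV so the example is self-contained.
with open("employees.csv", "w", encoding="utf-8", newline="") as f:
    f.write("name,age,department\n")
    f.write("Alice,28,Engineering\n")
    f.write("Bob,35,Sales\n")
    f.write("Carol,32,Marketing\n")

# csv.reader: rows are lists, accessed by position.
with open("employees.csv", encoding="utf-8", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        print(f"{row[0]} works in {row[2]}")

# csv.DictReader: rows are dicts keyed by the header names.
with open("employees.csv", encoding="utf-8", newline="") as f:
    for row in csv.DictReader(f):
        print(f"{row['name']} works in {row['department']}")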

Specification Reference: This example demonstrates FR-019 (reading CSV with csv.reader and csv.DictReader)

AI Prompts Used:

  1. "Show me how to read CSV files with both csv.reader and csv.DictReader"
  2. "Explain the difference between accessing by index vs by column name"

Validation Steps:

  1. Created sample CSV with headers and 3 data rows
  2. Read with csv.reader and printed rows as lists
  3. Read with csv.DictReader and accessed columns by name
  4. Verified that DictReader automatically handled headers

Expected Outcome: You see csv.DictReader is clearer—you use column names instead of remembering positions.

🎓 Expert Insight

In AI-native development, you don't memorize csv module parameters—you understand your data structure. Need column access by name? Use csv.DictReader. Working with headerless data? Use csv.reader. Your AI generates the boilerplate; your job is specifying your intent.


Writing CSV Files: Preserving Structure

Writing CSV requires careful handling of headers and row formatting:

Code Example 4.2: Writing CSV Files

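A sketch of the DictWriter pattern (the field names and output filename are illustrative):

import csv

employees = [
    {"name": "Alice", "age": 28, "department": "Engineering"},
    {"name": "Bob", "age": 35, "department": "Sales"},
]

# newline='' lets the csv module control line endings itself.
with open("employees_out.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "age", "department"])
    writer.writeheader()         # header row comes from fieldnames
    writer.writerows(employees)  # one row per dictionary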

Key detail: newline='' in open() — This prevents extra blank lines in CSV output on Windows by letting the csv module control line endings instead of the platform.

Specification Reference: This example demonstrates FR-020 (writing CSV with proper headers)

Validation Steps:

  1. Created list of dictionaries with employee data
  2. Opened file in write mode with newline=''
  3. Used csv.DictWriter to write headers and rows
  4. Read file back to verify output format
  5. Confirmed headers and data rows are correctly formatted

Expected Outcome: You understand csv.DictWriter handles header order and formatting automatically.

🚀 CoLearning Challenge

Ask your AI Co-Teacher:

"I have employee data with fields: id, name, email, hire_date. Write code using csv.DictWriter to export this to a CSV file. Then explain why we use DictWriter instead of manually writing strings with commas."

Expected Outcome: You'll understand csv.DictWriter handles edge cases (commas in values, special characters) automatically.


Reading and Writing JSON: Serialization/Deserialization

JSON is Python's preferred format for storing application data:

Code Example 4.3: Reading and Writing JSON

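A round-trip sketch, reusing the note structure from earlier (the filename is illustrative):

import json

notes = [
    {
        "id": "1",
        "title": "Python Basics",
        "tags": ["learning", "python"],
        "metadata": {"created": "2025-11-09", "updated": "2025-11-09"},
    }
]

# Serialize: Python -> JSON file, pretty-printed.
with open("notes.json", "w", encoding="utf-8") as f:
    json.dump(notes, f, indent=2, ensure_ascii=False)

# Deserialize: JSON file -> Python objects.
with open("notes.json", encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded == notes)  # True: the nested structure round-trips exactly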

Key points:

  • json.dump() converts Python objects → JSON (serialization)
  • json.load() converts JSON → Python objects (deserialization)
  • indent=2 makes output human-readable (not required, but professional)
  • Nested structures (dicts, lists) are preserved perfectly

Specification Reference: This example demonstrates FR-021 and FR-022 (JSON load and dump with encoding)

Validation Steps:

  1. Created list of dicts with nested structure
  2. Used json.dump() with indent=2 for readability
  3. Displayed raw file content to show pretty-printing
  4. Used json.load() to read back into Python
  5. Verified structure is preserved exactly

Expected Outcome: You see JSON preserves nested structures perfectly—Python dicts become JSON objects, lists become arrays.


Handling Encoding for International Text

One of the biggest real-world problems: international characters get corrupted. Here's how to prevent it:

Code Example 4.4: Handling Encoding for International Text

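A sketch contrasting the two settings (the sample strings and filenames are illustrative):

import json

data = {"greeting": "café ☕", "name": "José", "script": "日本語"}

# Without ensure_ascii=False, the file is full of \uXXXX escapes.
with open("intl_escaped.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)

# With utf-8 AND ensure_ascii=False, the file shows the real characters.
with open("intl.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2, ensure_ascii=False)

# Round trip: the characters are parsed back intact.
with open("intl.json", encoding="utf-8") as f:
    assert json.load(f) == data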

The lesson: Always use both:

  1. encoding='utf-8' in open()
  2. ensure_ascii=False in json.dump()

Without both, emoji and international characters become either corrupted or escaped.

Specification Reference: This example demonstrates FR-024 (understanding encoding and ensure_ascii)

Validation Steps:

  1. Created data with emoji and international scripts
  2. Demonstrated problem: escaped characters without ensure_ascii=False
  3. Showed solution: both encoding='utf-8' and ensure_ascii=False
  4. Read back and verified characters preserved
  5. Confirmed file is both readable (in text editor) and parseable (by json.load())

Expected Outcome: You understand why UTF-8 + ensure_ascii=False matter for international applications.

✨ Teaching Tip

Use Claude Code to explore encoding issues: "What happens if I save JSON with emoji but use encoding='latin-1' or omit ensure_ascii=False? Show me the broken output and how to fix it."


Error Handling for Format Errors

Real-world files are often corrupted, incomplete, or malformed. Here's professional error handling:

Code Example 4.5: Error Handling for Format Errors

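A sketch of the two helpers the validation steps describe (names like safe_load_json are from this lesson, not the standard library):

import csv
import json

def safe_load_json(path):
    """Load JSON from path, returning None instead of crashing on bad input."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        print(f"Error: {path} not found")
    except json.JSONDecodeError as e:
        print(f"Error: invalid JSON in {path} (line {e.lineno}, column {e.colno}): {e.msg}")
    return None

def safe_load_csv(path):
    """Load CSV rows as dicts, returning None on a missing or empty file."""
    try:
        with open(path, encoding="utf-8", newline="") as f:
            reader = csv.DictReader(f)
            if reader.fieldnames is None:  # empty file: no header row
                print(f"Error: {path} is empty (no headers)")
                return None
            return list(reader)
    except FileNotFoundError:
        print(f"Error: {path} not found")
        return None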

What this teaches:

  • JSONDecodeError includes line and position information (helps debug)
  • FileNotFoundError for missing files
  • Return None on error (allows graceful handling)
  • Check for empty CSV (no headers = invalid)

Specification Reference: This example demonstrates FR-023 (handling JSONDecodeError and ValueError)

Validation Steps:

  1. Wrote safe_load_json() with specific exception handling
  2. Wrote safe_load_csv() with header validation
  3. Tested both with valid files
  4. Tested with invalid JSON (syntax error)
  5. Tested with missing file
  6. Verified error messages are helpful, not crashes

Expected Outcome: You see professional error handling prevents crashes and provides actionable error messages.

Python 3.14 Improved Error Messages

Python 3.14 enhanced JSON error reporting with exception notes that identify exactly where errors occur:

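A version-neutral sketch: the line and column attributes below exist in all supported Python versions, while the extra notes described above appear only where the interpreter attaches them.

import json

broken = '{"name": "Alice", "age": 28'  # missing the closing brace

try:
    json.loads(broken)
except json.JSONDecodeError as e:
    print(f"{e.msg} (line {e.lineno}, column {e.colno})")
    # Exception notes, when present, carry extra context; on interpreters
    # that don't attach any, this list is simply empty.
    for note in getattr(e, "__notes__", []):
        print("note:", note)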

What improved: In Python 3.13 and earlier, you got basic line/column numbers. In Python 3.14+, exception notes provide contextual hints about what's wrong (missing value, trailing comma, unclosed string, etc.).

Why this matters for AI-native development: When JSON parsing fails, your AI companion gets better error messages to diagnose the issue. This means faster debugging—your AI can pinpoint "missing closing brace on line 12" instead of generic "invalid JSON" errors.

For this chapter: The error handling patterns we teach work in all Python versions. When you encounter JSON errors, Python 3.14+ will just give you more helpful messages automatically.


Choosing Between CSV and JSON: Decision Framework

When should you use each format? Here's the decision tree:

  • Is the data tabular (rows and columns)? Yes → use CSV
  • Does the data have nested structure (objects within objects)? Yes → use JSON
  • Must it open in Excel? Yes → use CSV
  • Is it an API response? Yes → use JSON
  • Is it a configuration file? Yes → use JSON
  • Does it hold only strings and numbers? No (it has booleans or nulls) → use JSON

Real-world examples:

  • CSV: Employee directory, sales data, scientific measurements, database exports, anything from Excel
  • JSON: API responses, application configuration, note storage (Lesson 5), any hierarchical data, web application state

Professional guideline: When in doubt, use JSON. It's more flexible, handles edge cases better, and integrates naturally with web APIs.


Practice Exercises

Exercise 1: Read and Display CSV

Write a program that:

  1. Reads a CSV file with at least 3 columns and 5 rows
  2. Displays each row in a formatted table with column headers
  3. Uses csv.DictReader for clean column access

Validation: Test with a file containing special characters (commas in values, quotes)

Success: All rows display correctly, headers shown

Exercise 2: Convert CSV to JSON

Write a program that:

  1. Reads employee data from a CSV file
  2. Converts to a Python list of dictionaries
  3. Saves to JSON with indent=2 and ensure_ascii=False
  4. Reads back and verifies structure matches

Validation: Test with data containing international characters (names with accents, emoji in descriptions)

Success: CSV and JSON contain same data, international characters preserved

Exercise 3: Error Handling for Both Formats

Write error-handling code that:

  1. Attempts to load both CSV and JSON files
  2. Catches FileNotFoundError, JSONDecodeError, and ValueError
  3. Provides helpful error messages (not just crashes)
  4. Returns None on failure so calling code can handle gracefully

Validation: Test with:

  • Valid CSV and JSON files
  • Corrupted JSON (missing bracket)
  • Missing file
  • Empty file

Success: No crashes, appropriate error messages for each case


Try With AI

Master data format conversion between CSV and JSON with validation and encoding handling.

🔍 Explore Format Tradeoffs:

"Compare CSV vs JSON for employee data (name, email, department, start_year). Explain: CSV pros (compact, Excel-compatible, simple) vs JSON pros (nested data, type preservation, metadata). Show manual conversion and what's lost in each direction. When to choose each format?"

🎯 Practice Format Handling:

"Demonstrate csv.DictReader reading CSV rows as dictionaries and json.dump() writing with indentation. Show edge cases: comma in name field ('Alice, Jr.'), newline in description (CSV breaks), emoji encoding (café, José with ensure_ascii=False), and performance for 100K records."

🧪 Test Special Characters:

"Create test data with problematic characters: CSV with commas ('Johnson, Jr.'), quotes, and newlines. JSON with international characters (café, José) and emojis. Show how CSV quoting handles commas, when parsing breaks, and proper encoding settings (utf-8, ensure_ascii=False) for international data."

🚀 Apply Migration Tool:

"Build bi-directional converter: CSV to JSON with validation (required fields, email format check, start_year range), add metadata (UUID, timestamp, data_source), handle errors (missing fields skip with warning, invalid email skip with report), save employees.json. Reverse: JSON to CSV. Show 'Imported 47 employees, skipped 3 with errors' report."


Safety & Ethics Note: When importing data from external sources, always validate and sanitize inputs. Don't assume CSV/JSON files are trustworthy—implement bounds checking, type validation, and error handling. This protects your application and data integrity.