Hybrid Workloads — Combining I/O and CPU for Peak Performance
Here's a realistic problem: You're building an AI system that needs to fetch data from 1,000 API endpoints and process each response with heavy analysis.
You have three choices:
- Sequential approach: Fetch one, process it, then fetch the next. Total time = (1000 × fetch_time) + (1000 × process_time). On a typical system: 1000 × (0.1s fetch + 2s processing) = 2,100 seconds ≈ 35 minutes.
- Concurrent fetching only (from Lesson 2): Fetch all 1,000 at once. But now you hit memory limits, and API rate limits crash your system.
- Hybrid approach: Fetch in batches of 10 while processing previous batches in parallel. Total time ≈ 300 seconds ≈ 5 minutes. That's 7x faster, and you never run out of resources.
This is the hybrid workload pattern. It's what powers production AI systems that need both concurrent I/O (API calls) and parallel processing (inference, data transformation).
By the end of this lesson, you'll understand why hybrid workloads matter for AI applications, how to architect them efficiently, and how to optimize them for your specific hardware and network conditions.
Understanding the Hybrid Pattern: I/O + CPU Together
Why Separate I/O and CPU?
From Lesson 4, you learned that asyncio helps with I/O-bound work (waiting for network, disk, etc.) but doesn't help with CPU-bound work (the GIL prevents true parallelism). So what do you do when your workload has both?
Answer: You use both tools simultaneously.
- asyncio.TaskGroup() — Run multiple I/O operations concurrently (while one waits, another runs)
- InterpreterPoolExecutor — Run multiple CPU operations in true parallelism (separate interpreters = separate GILs)
💬 AI Colearning Prompt
"In a system that fetches data from APIs and processes it with heavy computation, explain how I/O and CPU work could overlap efficiently. What's the benefit of overlap versus sequential execution?"
The Core Pattern
Imagine stages in a pipeline:
Time:     0s          2s            4s            6s
Stage 1:  [Fetch #1]  [Fetch #2]    [Fetch #3]
Stage 2:              [Process #1]  [Process #2]  [Process #3]
Stage 3:                            [Store #1]    [Store #2]
While you're fetching item #2, you're processing item #1 (CPU cores stay busy). While you're processing item #3, you're storing item #1 (I/O pipeline keeps flowing).
This is what hybrid workloads achieve: parallel execution of fundamentally different types of work.
🎓 Expert Insight
In AI-native development, you don't choose between concurrency and parallelism—you use both. This is why Python 3.14's InterpreterPoolExecutor paired with asyncio creates such powerful systems. The pattern is: "concurrent I/O boundaries" around "parallel CPU cores."
Real-World Example: AI Workload Characteristics
Most AI applications look like this:
- Fetch — Call an API, get data (I/O-bound, waiting on network)
- Process — Run inference, transform data (CPU-bound, heavy computation)
- Store — Write to database (I/O-bound, waiting on storage)
Each stage has different resource constraints:
- Fetching is limited by network bandwidth (often 100-1000 requests in flight is fine)
- Processing is limited by CPU cores (4-8 workers for a typical machine)
- Storing is limited by database connection pool (often 5-20 connections)
A naive approach would run all three sequentially. A better approach would batch them: fetch 10, process in parallel, store, repeat.
🚀 CoLearning Challenge
Ask your AI Co-Teacher:
"Describe how you would optimize a system that needs to fetch 10,000 items from an API, analyze each with CPU-intensive text processing, and store results in a database. What would your batch size be? How many workers for each stage? Why?"
Expected Outcome: You'll understand how to think about bottlenecks, resource constraints, and throughput optimization.
Code Example 1: Simple Hybrid Pattern (Fetch → Process)
Let's start with the simplest hybrid case: fetch one item, process it in parallel, repeat.
Specification Reference: Basic hybrid pattern
Prompt Used: "Create a system that fetches N items from a mock API concurrently and processes each with CPU-intensive work, with proper type hints and error handling"
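Here's a minimal sketch of the pattern. The item count, the helper names (fetch_item, process_item, simple_hybrid), and the sleep/busy-loop durations are illustrative assumptions, chosen so the timing lands near the ~0.7s mentioned in the validation steps below. It uses ThreadPoolExecutor for simplicity; on Python 3.14 you could swap in InterpreterPoolExecutor for true parallelism.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


async def fetch_item(item_id: int) -> str:
    """Simulate an I/O-bound API call (~0.1s of network waiting)."""
    await asyncio.sleep(0.1)
    return f"data-{item_id}"


def process_item(data: str) -> str:
    """Simulate CPU-bound work with a ~0.1s busy loop."""
    deadline = time.perf_counter() + 0.1
    while time.perf_counter() < deadline:
        pass  # burn CPU; a real workload would run inference or a transform
    return data.upper()


async def simple_hybrid(n_items: int = 5) -> list[str]:
    # Stage 1: fetch all items concurrently -- total wait is roughly one fetch, not n.
    async with asyncio.TaskGroup() as tg:
        fetch_tasks = [tg.create_task(fetch_item(i)) for i in range(n_items)]
    fetched = [task.result() for task in fetch_tasks]

    # Stage 2: push CPU work into an executor via run_in_executor.
    # ThreadPoolExecutor keeps the sketch simple; with InterpreterPoolExecutor
    # (Python 3.14) the same structure gets true CPU parallelism.
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = await asyncio.gather(
            *(loop.run_in_executor(pool, process_item, data) for data in fetched)
        )
    return list(results)


if __name__ == "__main__":
    start = time.perf_counter()
    processed = asyncio.run(simple_hybrid())
    print(f"Processed {len(processed)} items in {time.perf_counter() - start:.2f}s")
```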
Validation Steps:
- Run the code and observe total time (~0.7s for fetch+process)
- Notice how fetching completes quickly, then processing happens in parallel
- With ThreadPoolExecutor shown here (for simplicity), CPU doesn't parallelize; with InterpreterPoolExecutor (Python 3.14), you'd see true parallelism
Key Insight: The loop.run_in_executor() bridges sync CPU work into async I/O flow. While one CPU task runs, the event loop can handle other I/O operations.
✨ Teaching Tip
Use your AI co-teacher to explore: "In the simple_hybrid function, why do we fetch all items first, then process all items second? What would happen if we fetched and processed in a mixed order?"
Code Example 2: Batch Processing Pattern
Real-world systems rarely have enough memory to fetch all items at once. Instead, you fetch in batches.
Specification Reference: Batch processing with resource limits
Prompt Used: "Create a batch processing system that fetches N items in groups of B, processes in parallel, then repeats"
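Here's one way to sketch the batch loop; the batch size, durations, and helper names are illustrative assumptions. It prefetches the next batch while the current one is processing, which is where the pipeline overlap comes from.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


async def fetch_item(item_id: int) -> str:
    await asyncio.sleep(0.1)  # simulated network wait
    return f"data-{item_id}"


def process_item(data: str) -> str:
    time.sleep(0.2)  # stand-in for CPU-heavy work
    return data.upper()


async def process_in_batches(n_items: int = 20, batch_size: int = 5) -> list[str]:
    loop = asyncio.get_running_loop()
    results: list[str] = []

    async def fetch_batch(start: int) -> list[str]:
        ids = range(start, min(start + batch_size, n_items))
        return list(await asyncio.gather(*(fetch_item(i) for i in ids)))

    with ThreadPoolExecutor(max_workers=4) as pool:
        next_batch = asyncio.create_task(fetch_batch(0))
        for start in range(0, n_items, batch_size):
            fetched = await next_batch

            # Kick off the NEXT fetch before processing the current batch,
            # so network waiting overlaps with CPU work.
            if start + batch_size < n_items:
                next_batch = asyncio.create_task(fetch_batch(start + batch_size))

            processed = await asyncio.gather(
                *(loop.run_in_executor(pool, process_item, d) for d in fetched)
            )
            results.extend(processed)
            print(f"Batch starting at item {start}: {len(processed)} items done")
    return results


if __name__ == "__main__":
    t0 = time.perf_counter()
    out = asyncio.run(process_in_batches())
    print(f"{len(out)} items in {time.perf_counter() - t0:.2f}s")
```

Only one batch (plus the prefetched one) is held in memory at a time, no matter how large n_items grows.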
Why Batching Matters:
- Memory: Fetch 5 items, process, store. Not 1,000 in memory at once.
- Throughput: Process while fetching next batch (pipeline overlap)
- Resource Control: Never overwhelm the server or database
💬 AI Colearning Prompt
"In batch processing, why is batch size important? How would you determine the optimal batch size for your specific system?"
Code Example 3: Pipeline Pattern (Fetch → Transform → Store)
The most powerful hybrid pattern has three stages running in parallel.
Specification Reference: Three-stage pipeline with overlapping execution
Prompt Used: "Implement a fetch-transform-store pipeline where stages overlap using asyncio queues"
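A minimal sketch of the three-stage pipeline, assuming illustrative queue sizes, item counts, and stage durations. The key idea is that bounded asyncio.Queue objects connect the stages, so all three run at once.

```python
import asyncio
import time


def transform(data: str) -> str:
    time.sleep(0.2)  # stand-in for CPU-heavy transformation
    return data.upper()


async def fetch_stage(out_q: asyncio.Queue, n_items: int) -> None:
    for i in range(n_items):
        await asyncio.sleep(0.1)        # simulated API call
        await out_q.put(f"raw-{i}")     # suspends if the queue is full (backpressure)
    await out_q.put(None)               # sentinel: no more items


async def transform_stage(in_q: asyncio.Queue, out_q: asyncio.Queue) -> None:
    loop = asyncio.get_running_loop()
    while (item := await in_q.get()) is not None:
        # CPU work goes to the default executor so the event loop stays free.
        result = await loop.run_in_executor(None, transform, item)
        await out_q.put(result)
    await out_q.put(None)


async def store_stage(in_q: asyncio.Queue) -> int:
    stored = 0
    while (item := await in_q.get()) is not None:
        await asyncio.sleep(0.05)       # simulated database write
        stored += 1
    return stored


async def run_pipeline(n_items: int = 20) -> None:
    fetch_to_transform: asyncio.Queue = asyncio.Queue(maxsize=5)
    transform_to_store: asyncio.Queue = asyncio.Queue(maxsize=5)

    start = time.perf_counter()
    async with asyncio.TaskGroup() as tg:
        tg.create_task(fetch_stage(fetch_to_transform, n_items))
        tg.create_task(transform_stage(fetch_to_transform, transform_to_store))
        store_task = tg.create_task(store_stage(transform_to_store))
    print(f"Stored {store_task.result()} items in {time.perf_counter() - start:.2f}s")


if __name__ == "__main__":
    asyncio.run(run_pipeline())
```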
Pipeline Benefits:
- Overlap: While fetching item #20, transform item #10, store item #5
- Throughput: Queues decouple stages, each stage keeps busy
- Resource Control: Queues have max sizes, preventing runaway memory
🎓 Expert Insight
Pipeline patterns are everywhere in production systems: data ETL, stream processing, real-time inference. The pattern is simple: decouple stages with queues, run concurrently, and watch throughput improve. This is why asyncio + queues are so powerful for backend systems.
Code Example 4: AI Workload Simulation (API Calls + Inference)
Now let's simulate a realistic AI workload: fetch data from multiple APIs, then run "inference" on each.
Specification Reference: Realistic AI pattern with multiple data sources
Prompt Used: "Create a system that concurrently fetches from 3 APIs and processes with simulated inference, showing what production AI pipelines look like"
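A sketch of the workload, assuming three made-up API sources (with illustrative latencies) and a busy-loop stand-in for inference.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

API_SOURCES = {"news": 0.3, "weather": 0.2, "prices": 0.4}  # name -> latency (s)


async def fetch_from_api(name: str, latency: float) -> list[str]:
    await asyncio.sleep(latency)  # simulated network wait
    return [f"{name}-record-{i}" for i in range(5)]


def run_inference(record: str) -> str:
    """Stand-in for a CPU-bound model call (~0.05s busy loop)."""
    deadline = time.perf_counter() + 0.05
    while time.perf_counter() < deadline:
        pass
    return f"label({record})"


async def ai_workload() -> list[str]:
    # Phase 1: hit all three APIs concurrently; total wait is roughly the slowest API.
    async with asyncio.TaskGroup() as tg:
        tasks = [tg.create_task(fetch_from_api(n, s)) for n, s in API_SOURCES.items()]
    records = [r for t in tasks for r in t.result()]

    # Phase 2: run "inference" on every record via an executor.
    # ThreadPoolExecutor keeps the sketch simple; InterpreterPoolExecutor
    # (Python 3.14) would give true parallelism across cores.
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        labels = await asyncio.gather(
            *(loop.run_in_executor(pool, run_inference, r) for r in records)
        )
    return list(labels)


if __name__ == "__main__":
    t0 = time.perf_counter()
    labels = asyncio.run(ai_workload())
    print(f"Labelled {len(labels)} records in {time.perf_counter() - t0:.2f}s")
```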
Key Insight for AI Applications:
- Fetch phase is I/O-bound (network waiting)
- Inference phase is CPU-bound (computation)
- Hybrid pattern overlaps them: fetch next batch while processing current batch
- This is broadly how production AI backends structure their data flow
💬 AI Colearning Prompt
"In this AI workload example, could we process inference results WHILE still fetching from the APIs? How would you structure that? What would the advantage be?"
Code Example 5: Resource Limiting with Semaphores
Real-world systems have limits: you can't fetch infinitely from an API, can't run infinitely many inference tasks.
Specification Reference: Controlling concurrency with Semaphores
Prompt Used: "Create a system that limits concurrent I/O to N requests and CPU workers to M processes using Semaphores"
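A sketch of both limits, assuming 5 concurrent fetches and 4 CPU workers; the numbers and helper names are illustrative.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_FETCHES = 5   # API rate-limit budget
MAX_CPU_WORKERS = 4          # roughly one per core


async def limited_fetch(item_id: int, sem: asyncio.Semaphore) -> str:
    # Only MAX_CONCURRENT_FETCHES coroutines are inside this block at once.
    async with sem:
        await asyncio.sleep(0.1)  # simulated API call
        return f"data-{item_id}"


def cpu_work(data: str) -> str:
    time.sleep(0.05)  # stand-in for CPU-heavy analysis
    return data.upper()


async def main(n_items: int = 50) -> None:
    start = time.perf_counter()
    fetch_sem = asyncio.Semaphore(MAX_CONCURRENT_FETCHES)

    # I/O: launch every fetch, but the semaphore caps how many are in flight.
    fetched = await asyncio.gather(
        *(limited_fetch(i, fetch_sem) for i in range(n_items))
    )

    # CPU: the executor's max_workers caps parallel CPU tasks.
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=MAX_CPU_WORKERS) as pool:
        results = await asyncio.gather(
            *(loop.run_in_executor(pool, cpu_work, d) for d in fetched)
        )

    print(f"{len(results)} items in {time.perf_counter() - start:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
```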
Why Semaphores Matter:
- API Rate Limiting: Don't send 1000 requests at once; send 5-10 at a time
- CPU Worker Limits: Don't spawn 100 workers on a 4-core machine; spawn 4
- Database Connections: Don't open 1000 connections; use connection pool (5-20)
✨ Teaching Tip
Use your AI co-teacher to explore: "If you have 1000 items to process and a 4-core machine, is max_concurrent_processes = 4 always optimal? When might you want 8 or 2 instead?"
Code Example 6: Production Hybrid System (Complete Example)
Here's a complete example combining all patterns: realistic system with error handling, timeouts, and monitoring.
Specification Reference: Production-grade hybrid system architecture
Prompt Used: "Design a production hybrid system: fetch 100 items with timeouts, process in batches, handle partial failures, log progress"
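A condensed sketch of such a system. The item count, timeouts, simulated ~5% failure rate, and helper names are assumptions, and the database/fallback stage is omitted to keep it short; the focus here is timeouts, partial failure handling, batching, and progress logging.

```python
import asyncio
import logging
import random
import time
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("pipeline")


async def fetch_item(item_id: int) -> str:
    await asyncio.sleep(random.uniform(0.05, 0.2))  # simulated network wait
    if random.random() < 0.05:                      # ~5% of calls fail
        raise ConnectionError(f"fetch failed for item {item_id}")
    return f"data-{item_id}"


def process_item(data: str) -> str:
    time.sleep(0.05)  # stand-in for CPU work
    return data.upper()


async def safe_fetch(item_id: int, timeout: float = 1.0) -> str | None:
    """Fetch with a timeout; return None on failure instead of crashing the run."""
    try:
        return await asyncio.wait_for(fetch_item(item_id), timeout=timeout)
    except (ConnectionError, asyncio.TimeoutError) as exc:
        log.warning("item %s skipped: %s", item_id, exc)
        return None


async def run(n_items: int = 100, batch_size: int = 10) -> None:
    loop = asyncio.get_running_loop()
    succeeded: list[str] = []
    failed = 0
    start = time.perf_counter()

    with ThreadPoolExecutor(max_workers=4) as pool:
        for batch_start in range(0, n_items, batch_size):
            ids = range(batch_start, min(batch_start + batch_size, n_items))

            # Fetch a batch concurrently; failures come back as None.
            fetched = await asyncio.gather(*(safe_fetch(i) for i in ids))
            good = [d for d in fetched if d is not None]
            failed += len(fetched) - len(good)

            # Process the surviving items in the executor.
            processed = await asyncio.gather(
                *(loop.run_in_executor(pool, process_item, d) for d in good)
            )
            succeeded.extend(processed)
            log.info("batch %d-%d done (%d ok so far, %d failed)",
                     batch_start, batch_start + len(fetched) - 1,
                     len(succeeded), failed)

    elapsed = time.perf_counter() - start
    log.info("finished: %d ok, %d failed, %.2fs total, %.1f items/s",
             len(succeeded), failed, elapsed, len(succeeded) / elapsed)


if __name__ == "__main__":
    asyncio.run(run())
```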
Production-Ready Features:
- Error handling: Try/except at each stage, graceful failures
- Timeouts: Each operation has timeout to prevent hanging
- Fallback strategies: If database fails, log to file instead
- Partial success: If 1-2 items fail, system continues
- Monitoring: Log progress, measure timing, calculate metrics
- Batching: Process in chunks for memory efficiency
Identifying and Optimizing Bottlenecks
Real hybrid systems don't work optimally by accident. You need to measure where the bottleneck is.
Common Bottleneck Scenarios
Scenario 1: I/O-Bound Bottleneck
Time: 0s 5s 10s 15s
Fetch: [-------fetch-10s-------]
Proc: [--process--] [--process--]
Time to completion: 15s (waiting for fetches)
Solution: Increase fetch concurrency (launch more concurrent fetch tasks in the TaskGroup)
Scenario 2: CPU-Bound Bottleneck
Time: 0s 2s 4s 6s 8s 10s
Fetch: [fetch]
Proc: [----process--] [----process--] [----process--]
Time to completion: 10s (waiting for CPU processing)
Solution: Increase CPU workers (more InterpreterPoolExecutor workers)
Scenario 3: Storage Bottleneck
Time: 0s 2s 4s 6s 8s 10s 12s
Fetch: [fetch]
Proc: [proc]
Store: [store----] [store----] [store----]
Time to completion: 12s (waiting for database)
Solution: Increase database connections, or batch stores
How to Identify Your Bottleneck
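One simple approach is to accumulate wall-clock time per stage and compare totals. The sketch below uses illustrative stage durations and runs items sequentially on purpose, so each stage's cost is isolated; in real code the process stage would go through run_in_executor.

```python
import asyncio
import time
from collections import defaultdict

stage_totals: dict[str, float] = defaultdict(float)


async def timed(stage: str, coro):
    """Await a coroutine and add its wall-clock time to the stage total."""
    start = time.perf_counter()
    result = await coro
    stage_totals[stage] += time.perf_counter() - start
    return result


async def fetch(i: int) -> str:
    await asyncio.sleep(0.1)
    return f"raw-{i}"


async def process(data: str) -> str:
    await asyncio.sleep(0.5)  # pretend CPU stage
    return data.upper()


async def store(data: str) -> None:
    await asyncio.sleep(0.05)


async def main(n_items: int = 10) -> None:
    for i in range(n_items):  # sequential on purpose: isolates per-stage cost
        data = await timed("fetch", fetch(i))
        result = await timed("process", process(data))
        await timed("store", store(result))

    bottleneck = max(stage_totals, key=stage_totals.get)
    for stage, total in stage_totals.items():
        print(f"{stage:8s} {total:6.2f}s total")
    print(f"Bottleneck: {bottleneck} -- add workers/concurrency there first")


if __name__ == "__main__":
    asyncio.run(main())
```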
🚀 CoLearning Challenge
Ask your AI Co-Teacher:
"You have a system where fetch takes 0.1s per item, process takes 0.5s per item, and store takes 0.05s per item. You have 1000 items to process and a 4-core machine. What's the optimal batch size and worker allocation?"
Expected Outcome: You'll think about bottlenecks systematically—identifying which stage limits throughput, then optimizing resource allocation accordingly.
Challenge 5: The Hybrid Workload Builder
This challenge teaches you to architect complete production systems combining asyncio, executors, and intelligent orchestration.
Initial Exploration
Your Challenge: Experience the power of pipelining without AI guidance.
Deliverable: Create /tmp/pipeline_discovery.py containing:
- A 3-stage pipeline (fetch, process, store) each taking 1 second
- Sequential version: run all 3 for 5 items — measure time (should be ~15 seconds)
- Pipelined version: start fetching item 2 while processing item 1 and storing item 0 — measure time (should be ~7 seconds)
Expected Observation:
- Sequential (stage after stage): ~15 seconds
- Pipelined (stages overlap): ~7 seconds
- Lesson: Pipelining nearly halves the time by overlapping stages
Self-Validation:
- How does pipelining differ from just running stages in parallel?
- What's the minimum time needed with pipelining? (roughly one item's trip through all stages plus (N − 1) × the slowest stage time, not N × the sum of all stages)
- When does pipelining help? (When stages have different speeds, creating bottlenecks)
Understanding Pipeline Architecture
💬 AI Colearning Prompt: "I have a system: fetch documents (2s), extract text (3s), generate embeddings (4s), store results (1s). I run these sequentially: 4 documents = 40 seconds. I tried running them in parallel but wasted resources. Teach me about pipelining. How would I fetch doc 2 while extracting doc 1 while generating embeddings for doc 0? Show me the architecture using asyncio + executors. What's the optimal pipeline depth?"
What You'll Learn: Pipelining concept, queue-based architecture, how to balance stage speeds, and resource tradeoffs.
Clarifying Question: Deepen your understanding:
"You used a Queue to buffer between stages. What happens if the extract stage is much slower than fetch? Does the queue grow infinitely? How do I prevent memory exhaustion? What's backpressure?"
Expected Outcome: AI clarifies queue dynamics and backpressure concepts. You understand that pipelines need flow control, not just throughput maximization.
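If you want to see backpressure concretely before asking, here's a tiny sketch: a fast producer feeding a slow consumer through a bounded asyncio.Queue. The timings and maxsize are illustrative assumptions; the queue depth never exceeds maxsize because put() suspends the producer until space frees up.

```python
import asyncio


async def fast_producer(q: asyncio.Queue) -> None:
    for i in range(10):
        await q.put(i)  # suspends here whenever the queue is full
        print(f"produced {i:2d} | queue depth = {q.qsize()}")
    await q.put(None)   # sentinel: done


async def slow_consumer(q: asyncio.Queue) -> None:
    while (item := await q.get()) is not None:
        await asyncio.sleep(0.5)  # consumer is much slower than the producer
        print(f"consumed {item:2d}")


async def main() -> None:
    q: asyncio.Queue = asyncio.Queue(maxsize=3)  # bounded queue = backpressure
    async with asyncio.TaskGroup() as tg:
        tg.create_task(fast_producer(q))
        tg.create_task(slow_consumer(q))


if __name__ == "__main__":
    asyncio.run(main())
```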
Identifying and Optimizing Bottlenecks
Activity: Work with AI to identify bottlenecks and optimize pipeline performance.
First, ask AI to generate a simple pipeline:
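Something like the following deliberately sequential version works as a starting point. The stage durations (1s fetch, 2s process, 0.5s store) are assumptions chosen so 10 items take roughly 35 seconds end to end.

```python
import asyncio
import time


async def fetch(i: int) -> str:
    await asyncio.sleep(1.0)  # simulated API call
    return f"doc-{i}"


def process(data: str) -> str:
    time.sleep(2.0)  # simulated CPU-heavy analysis
    return data.upper()


async def store(data: str) -> None:
    await asyncio.sleep(0.5)  # simulated database write


async def main(n_items: int = 10) -> None:
    start = time.perf_counter()
    for i in range(n_items):       # one item at a time, stage after stage
        data = await fetch(i)
        result = process(data)     # blocks the event loop -- intentional here
        await store(result)
    print(f"Sequential: {n_items} items in {time.perf_counter() - start:.1f}s")


if __name__ == "__main__":
    asyncio.run(main())
```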
Your Task:
- Run this code. Measure time. (Should be sequential: ~35 seconds for 10 items)
- Identify the problem: stages run one-at-a-time, no parallelism
- Teach AI:
"This takes 35 seconds sequentially. I want it pipelined: fetch multiple items while processing previous ones. Show me how to use asyncio.Queue to buffer between fetch and process, and between process and store. How deep should the queues be?"
Your Edge Case Discovery: Ask AI:
"What if fetch is fast (0.5s) but process is slow (3s)? Process becomes the bottleneck. How many concurrent process workers should I have? What if I can't increase workers due to memory limits? How do I measure which stage is the bottleneck?"
Expected Outcome: You discover bottleneck analysis—profiling each stage, identifying slowest link, optimizing for throughput (not latency). You teach AI how production systems think about optimization.
Building an Optimized Data Pipeline
Capstone Activity: Build an optimized end-to-end data pipeline.
Specification:
- 3-stage pipeline: Fetch, Process, Store
- Fetch 12 items from API simulation (asyncio.sleep 0.5-1.5s each)
- Process each with CPU work (simulated with time.sleep, 1-2s each)
- Store results (asyncio.sleep 0.1-0.3s each)
- Stages connected with asyncio.Queue (implement backpressure)
- Measure: throughput (items/second), latency (item start to finish), bottleneck identification
- Return: {item: (fetch_ms, process_ms, store_ms, e2e_ms)}, plus throughput metrics
- Type hints throughout
Deliverable: Save to /tmp/data_pipeline.py
Testing Your Work:
python /tmp/data_pipeline.py
# Expected output:
# Processed 12 items in ~12 seconds (pipelined, not ~24s sequential)
# Throughput: ~1 item/second
# Bottleneck: Process stage (slowest at 1-2s per item)
# Fetch queue depth: varies (shows backpressure)
# Pipeline efficiency: ~75% (good sign of effective overlap)
Validation Checklist:
- Code runs without deadlocks or race conditions
- Stages run concurrently (not sequentially)
- Queues implement backpressure (don't grow unbounded)
- Total time < sequential time by at least 20%
- Bottleneck identified correctly (slowest stage)
- Type hints complete
- Production-ready: proper cleanup, error handling, queue drains
Time Estimate: 35-42 minutes (5 min discover, 8 min teach/learn, 10 min edge cases, 12-19 min build artifact)
Key Takeaway: You've mastered production system design. Pipelining, queues, backpressure, and bottleneck analysis are the foundation of real-world AI data systems.
Try With AI
How does a 3-stage pipeline (Fetch → Process → Store) achieve higher throughput than sequential execution?
🔍 Explore Pipeline Architecture:
"Explain pipelining with a 3-stage example: stage 1 takes 1s, stage 2 takes 2s, stage 3 takes 1s. For 10 items, show sequential time (40s) vs pipelined time (~22s). What enables the speedup?"
🎯 Practice Queue-Based Coordination:
"Implement asyncio.Queue connecting Fetch (producer) and Process (consumer) stages. Show how maxsize=5 implements backpressure. What happens when Process is slower than Fetch?"
🧪 Test Bottleneck Identification:
"Create a pipeline where Fetch=0.5s, Process=2s, Store=0.3s per item. Measure stage utilization and queue depths. Which stage is the bottleneck? How would you optimize throughput?"
🚀 Apply to Data Ingestion:
"Design a production data pipeline: fetch from 5 APIs (asyncio), transform each item (CPU, ProcessPoolExecutor), store to database (asyncio). Include backpressure, error handling, and throughput measurement."