Hybrid Workloads — Combining I/O and CPU for Peak Performance
Here's a realistic problem: You're building an AI system that needs to fetch data from 1,000 API endpoints and process each response with heavy analysis.
You have three choices:
- Sequential approach: Fetch one, process it, then fetch the next. Total time = (1000 × fetch_time) + (1000 × process_time). On a typical system: 1000 × (0.1s fetch + 2s processing) = 2,100 seconds ≈ 35 minutes.
- Concurrent fetching only (from Lesson 2): Fetch all 1,000 at once. But now you hit memory limits, and API rate limits crash your system.
- Hybrid approach: Fetch in batches of 10 while processing previous batches in parallel. Total time ≈ 300 seconds ≈ 5 minutes. That's 7x faster, and you never run out of resources.
This is the hybrid workload pattern. It's what powers production AI systems that need both concurrent I/O (API calls) and parallel processing (inference, data transformation).
By the end of this lesson, you'll understand why hybrid workloads matter for AI applications, how to architect them efficiently, and how to optimize them for your specific hardware and network conditions.
Understanding the Hybrid Pattern: I/O + CPU Together
Why Separate I/O and CPU?
From Lesson 4, you learned that asyncio helps with I/O-bound work (waiting for network, disk, etc.) but doesn't help with CPU-bound work (the GIL prevents true parallelism). So what do you do when your workload has both?
Answer: You use both tools simultaneously.
- asyncio.TaskGroup() — Run multiple I/O operations concurrently (while one waits, another runs)
- InterpreterPoolExecutor — Run multiple CPU operations in true parallelism (separate interpreters = separate GILs)
💬 AI Colearning Prompt
"In a system that fetches data from APIs and processes it with heavy computation, explain how I/O and CPU work could overlap efficiently. What's the benefit of overlap versus sequential execution?"
The Core Pattern
Imagine stages in a pipeline:
Time:     0s          2s            4s            6s
Stage 1:  [Fetch #1]  [Fetch #2]    [Fetch #3]
Stage 2:              [Process #1]  [Process #2]  [Process #3]
Stage 3:                            [Store #1]    [Store #2]
While you're fetching item #2, you're processing item #1 (CPU cores stay busy). While you're processing item #3, you're storing item #1 (I/O pipeline keeps flowing).
This is what hybrid workloads achieve: parallel execution of fundamentally different types of work.
🎓 Expert Insight
In AI-native development, you don't choose between concurrency and parallelism—you use both. This is why Python 3.14's InterpreterPoolExecutor paired with asyncio creates such powerful systems. The pattern is: "concurrent I/O boundaries" around "parallel CPU cores."
Real-World Example: AI Workload Characteristics
Most AI applications look like this:
- Fetch — Call an API, get data (I/O-bound, waiting on network)
- Process — Run inference, transform data (CPU-bound, heavy computation)
- Store — Write to database (I/O-bound, waiting on storage)
Each stage has different resource constraints:
- Fetching is limited by network bandwidth (often 100-1000 requests in flight is fine)
- Processing is limited by CPU cores (4-8 workers for a typical machine)
- Storing is limited by database connection pool (often 5-20 connections)
A naive approach would run all three sequentially. A better approach would batch them: fetch 10, process in parallel, store, repeat.
🚀 CoLearning Challenge
Ask your AI Co-Teacher:
"Describe how you would optimize a system that needs to fetch 10,000 items from an API, analyze each with CPU-intensive text processing, and store results in a database. What would your batch size be? How many workers for each stage? Why?"
Expected Outcome: You'll understand how to think about bottlenecks, resource constraints, and throughput optimization.
Code Example 1: Simple Hybrid Pattern (Fetch → Process)
Let's start with the simplest hybrid case: fetch one item, process it in parallel, repeat.
Specification Reference: Basic hybrid pattern
Prompt Used: "Create a system that fetches N items from a mock API concurrently and processes each with CPU-intensive work, with proper type hints and error handling"
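Here's a minimal sketch of the pattern. The item count, the helper names (fetch_item, process_item, simple_hybrid), and the sleep/busy-loop durations are illustrative assumptions, chosen so the timing lands near the ~0.7s mentioned in the validation steps below. It uses ThreadPoolExecutor for simplicity; on Python 3.14 you could swap in InterpreterPoolExecutor for true parallelism.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


async def fetch_item(item_id: int) -> str:
    """Simulate an I/O-bound API call (~0.1s of network waiting)."""
    await asyncio.sleep(0.1)
    return f"data-{item_id}"


def process_item(data: str) -> str:
    """Simulate CPU-bound work with a ~0.1s busy loop."""
    deadline = time.perf_counter() + 0.1
    while time.perf_counter() < deadline:
        pass  # burn CPU; a real workload would run inference or a transform
    return data.upper()


async def simple_hybrid(n_items: int = 5) -> list[str]:
    # Stage 1: fetch all items concurrently -- total wait is roughly one fetch, not n.
    async with asyncio.TaskGroup() as tg:
        fetch_tasks = [tg.create_task(fetch_item(i)) for i in range(n_items)]
    fetched = [task.result() for task in fetch_tasks]

    # Stage 2: push CPU work into an executor via run_in_executor.
    # ThreadPoolExecutor keeps the sketch simple; with InterpreterPoolExecutor
    # (Python 3.14) the same structure gets true CPU parallelism.
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = await asyncio.gather(
            *(loop.run_in_executor(pool, process_item, data) for data in fetched)
        )
    return list(results)


if __name__ == "__main__":
    start = time.perf_counter()
    processed = asyncio.run(simple_hybrid())
    print(f"Processed {len(processed)} items in {time.perf_counter() - start:.2f}s")
```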
Validation Steps:
- Run the code and observe total time (~0.7s for fetch+process)
- Notice how fetching completes quickly, then processing happens in parallel
- With ThreadPoolExecutor shown here (for simplicity), CPU doesn't parallelize; with InterpreterPoolExecutor (Python 3.14), you'd see true parallelism
Key Insight: The loop.run_in_executor() bridges sync CPU work into async I/O flow. While one CPU task runs, the event loop can handle other I/O operations.
✨ Teaching Tip
Use your AI co-teacher to explore: "In the simple_hybrid function, why do we fetch all items first, then process all items second? What would happen if we fetched and processed in a mixed order?"
Code Example 2: Batch Processing Pattern
Real-world systems rarely have enough memory to fetch all items at once. Instead, you fetch in batches.
Specification Reference: Batch processing with resource limits
Prompt Used: "Create a batch processing system that fetches N items in groups of B, processes in parallel, then repeats"
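Here's one way to sketch the batch loop; the batch size, durations, and helper names are illustrative assumptions. It prefetches the next batch while the current one is processing, which is where the pipeline overlap comes from.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


async def fetch_item(item_id: int) -> str:
    await asyncio.sleep(0.1)  # simulated network wait
    return f"data-{item_id}"


def process_item(data: str) -> str:
    time.sleep(0.2)  # stand-in for CPU-heavy work
    return data.upper()


async def process_in_batches(n_items: int = 20, batch_size: int = 5) -> list[str]:
    loop = asyncio.get_running_loop()
    results: list[str] = []

    async def fetch_batch(start: int) -> list[str]:
        ids = range(start, min(start + batch_size, n_items))
        return list(await asyncio.gather(*(fetch_item(i) for i in ids)))

    with ThreadPoolExecutor(max_workers=4) as pool:
        next_batch = asyncio.create_task(fetch_batch(0))
        for start in range(0, n_items, batch_size):
            fetched = await next_batch

            # Kick off the NEXT fetch before processing the current batch,
            # so network waiting overlaps with CPU work.
            if start + batch_size < n_items:
                next_batch = asyncio.create_task(fetch_batch(start + batch_size))

            processed = await asyncio.gather(
                *(loop.run_in_executor(pool, process_item, d) for d in fetched)
            )
            results.extend(processed)
            print(f"Batch starting at item {start}: {len(processed)} items done")
    return results


if __name__ == "__main__":
    t0 = time.perf_counter()
    out = asyncio.run(process_in_batches())
    print(f"{len(out)} items in {time.perf_counter() - t0:.2f}s")
```

Only one batch (plus the prefetched one) is held in memory at a time, no matter how large n_items grows.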
Why Batching Matters:
- Memory: Fetch 5 items, process, store. Not 1,000 in memory at once.
- Throughput: Process while fetching next batch (pipeline overlap)
- Resource Control: Never overwhelm the server or database
💬 AI Colearning Prompt
"In batch processing, why is batch size important? How would you determine the optimal batch size for your specific system?"
Code Example 3: Pipeline Pattern (Fetch → Transform → Store)
The most powerful hybrid pattern has three stages running in parallel.
Specification Reference: Three-stage pipeline with overlapping execution
Prompt Used: "Implement a fetch-transform-store pipeline where stages overlap using asyncio queues"
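A minimal sketch of the three-stage pipeline, assuming illustrative queue sizes, item counts, and stage durations. The key idea is that bounded asyncio.Queue objects connect the stages, so all three run at once.

```python
import asyncio
import time


def transform(data: str) -> str:
    time.sleep(0.2)  # stand-in for CPU-heavy transformation
    return data.upper()


async def fetch_stage(out_q: asyncio.Queue, n_items: int) -> None:
    for i in range(n_items):
        await asyncio.sleep(0.1)        # simulated API call
        await out_q.put(f"raw-{i}")     # suspends if the queue is full (backpressure)
    await out_q.put(None)               # sentinel: no more items


async def transform_stage(in_q: asyncio.Queue, out_q: asyncio.Queue) -> None:
    loop = asyncio.get_running_loop()
    while (item := await in_q.get()) is not None:
        # CPU work goes to the default executor so the event loop stays free.
        result = await loop.run_in_executor(None, transform, item)
        await out_q.put(result)
    await out_q.put(None)


async def store_stage(in_q: asyncio.Queue) -> int:
    stored = 0
    while (item := await in_q.get()) is not None:
        await asyncio.sleep(0.05)       # simulated database write
        stored += 1
    return stored


async def run_pipeline(n_items: int = 20) -> None:
    fetch_to_transform: asyncio.Queue = asyncio.Queue(maxsize=5)
    transform_to_store: asyncio.Queue = asyncio.Queue(maxsize=5)

    start = time.perf_counter()
    async with asyncio.TaskGroup() as tg:
        tg.create_task(fetch_stage(fetch_to_transform, n_items))
        tg.create_task(transform_stage(fetch_to_transform, transform_to_store))
        store_task = tg.create_task(store_stage(transform_to_store))
    print(f"Stored {store_task.result()} items in {time.perf_counter() - start:.2f}s")


if __name__ == "__main__":
    asyncio.run(run_pipeline())
```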
Pipeline Benefits:
- Overlap: While fetching item #20, transform item #10, store item #5
- Throughput: Queues decouple stages, each stage keeps busy
- Resource Control: Queues have max sizes, preventing runaway memory
🎓 Expert Insight
Pipeline patterns are everywhere in production systems: data ETL, stream processing, real-time inference. The pattern is simple: decouple stages with queues, run concurrently, and watch throughput improve. This is why asyncio + queues are so powerful for backend systems.
Code Example 4: AI Workload Simulation (API Calls + Inference)
Now let's simulate a realistic AI workload: fetch data from multiple APIs, then run "inference" on each.
Specification Reference: Realistic AI pattern with multiple data sources
Prompt Used: "Create a system that concurrently fetches from 3 APIs and processes with simulated inference, showing what production AI pipelines look like"
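A sketch of the workload, assuming three made-up API sources (with illustrative latencies) and a busy-loop stand-in for inference.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

API_SOURCES = {"news": 0.3, "weather": 0.2, "prices": 0.4}  # name -> latency (s)


async def fetch_from_api(name: str, latency: float) -> list[str]:
    await asyncio.sleep(latency)  # simulated network wait
    return [f"{name}-record-{i}" for i in range(5)]


def run_inference(record: str) -> str:
    """Stand-in for a CPU-bound model call (~0.05s busy loop)."""
    deadline = time.perf_counter() + 0.05
    while time.perf_counter() < deadline:
        pass
    return f"label({record})"


async def ai_workload() -> list[str]:
    # Phase 1: hit all three APIs concurrently; total wait is roughly the slowest API.
    async with asyncio.TaskGroup() as tg:
        tasks = [tg.create_task(fetch_from_api(n, s)) for n, s in API_SOURCES.items()]
    records = [r for t in tasks for r in t.result()]

    # Phase 2: run "inference" on every record via an executor.
    # ThreadPoolExecutor keeps the sketch simple; InterpreterPoolExecutor
    # (Python 3.14) would give true parallelism across cores.
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        labels = await asyncio.gather(
            *(loop.run_in_executor(pool, run_inference, r) for r in records)
        )
    return list(labels)


if __name__ == "__main__":
    t0 = time.perf_counter()
    labels = asyncio.run(ai_workload())
    print(f"Labelled {len(labels)} records in {time.perf_counter() - t0:.2f}s")
```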
Key Insight for AI Applications:
- Fetch phase is I/O-bound (network waiting)
- Inference phase is CPU-bound (computation)
- Hybrid pattern overlaps them: fetch next batch while processing current batch
- This is broadly how production AI backends structure their data flow
💬 AI Colearning Prompt
"In this AI workload example, could we process inference results WHILE still fetching from the APIs? How would you structure that? What would the advantage be?"
Code Example 5: Resource Limiting with Semaphores
Real-world systems have limits: you can't fetch infinitely from an API, can't run infinitely many inference tasks.
Specification Reference: Controlling concurrency with Semaphores
Prompt Used: "Create a system that limits concurrent I/O to N requests and CPU workers to M processes using Semaphores"
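A sketch of both limits, assuming 5 concurrent fetches and 4 CPU workers; the numbers and helper names are illustrative.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_FETCHES = 5   # API rate-limit budget
MAX_CPU_WORKERS = 4          # roughly one per core


async def limited_fetch(item_id: int, sem: asyncio.Semaphore) -> str:
    # Only MAX_CONCURRENT_FETCHES coroutines are inside this block at once.
    async with sem:
        await asyncio.sleep(0.1)  # simulated API call
        return f"data-{item_id}"


def cpu_work(data: str) -> str:
    time.sleep(0.05)  # stand-in for CPU-heavy analysis
    return data.upper()


async def main(n_items: int = 50) -> None:
    start = time.perf_counter()
    fetch_sem = asyncio.Semaphore(MAX_CONCURRENT_FETCHES)

    # I/O: launch every fetch, but the semaphore caps how many are in flight.
    fetched = await asyncio.gather(
        *(limited_fetch(i, fetch_sem) for i in range(n_items))
    )

    # CPU: the executor's max_workers caps parallel CPU tasks.
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=MAX_CPU_WORKERS) as pool:
        results = await asyncio.gather(
            *(loop.run_in_executor(pool, cpu_work, d) for d in fetched)
        )

    print(f"{len(results)} items in {time.perf_counter() - start:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
```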
Why Semaphores Matter:
- API Rate Limiting: Don't send 1000 requests at once; send 5-10 at a time
- CPU Worker Limits: Don't spawn 100 workers on a 4-core machine; spawn 4
- Database Connections: Don't open 1000 connections; use connection pool (5-20)
✨ Teaching Tip
Use your AI co-teacher to explore: "If you have 1000 items to process and a 4-core machine, is max_concurrent_processes = 4 always optimal? When might you want 8 or 2 instead?"
Code Example 6: Production Hybrid System (Complete Example)
Here's a complete example combining all patterns: realistic system with error handling, timeouts, and monitoring.
Specification Reference: Production-grade hybrid system architecture
Prompt Used: "Design a production hybrid system: fetch 100 items with timeouts, process in batches, handle partial failures, log progress"
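A condensed sketch of such a system. The item count, timeouts, simulated ~5% failure rate, and helper names are assumptions, and the database/fallback stage is omitted to keep it short; the focus here is timeouts, partial failure handling, batching, and progress logging.

```python
import asyncio
import logging
import random
import time
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("pipeline")


async def fetch_item(item_id: int) -> str:
    await asyncio.sleep(random.uniform(0.05, 0.2))  # simulated network wait
    if random.random() < 0.05:                      # ~5% of calls fail
        raise ConnectionError(f"fetch failed for item {item_id}")
    return f"data-{item_id}"


def process_item(data: str) -> str:
    time.sleep(0.05)  # stand-in for CPU work
    return data.upper()


async def safe_fetch(item_id: int, timeout: float = 1.0) -> str | None:
    """Fetch with a timeout; return None on failure instead of crashing the run."""
    try:
        return await asyncio.wait_for(fetch_item(item_id), timeout=timeout)
    except (ConnectionError, asyncio.TimeoutError) as exc:
        log.warning("item %s skipped: %s", item_id, exc)
        return None


async def run(n_items: int = 100, batch_size: int = 10) -> None:
    loop = asyncio.get_running_loop()
    succeeded: list[str] = []
    failed = 0
    start = time.perf_counter()

    with ThreadPoolExecutor(max_workers=4) as pool:
        for batch_start in range(0, n_items, batch_size):
            ids = range(batch_start, min(batch_start + batch_size, n_items))

            # Fetch a batch concurrently; failures come back as None.
            fetched = await asyncio.gather(*(safe_fetch(i) for i in ids))
            good = [d for d in fetched if d is not None]
            failed += len(fetched) - len(good)

            # Process the surviving items in the executor.
            processed = await asyncio.gather(
                *(loop.run_in_executor(pool, process_item, d) for d in good)
            )
            succeeded.extend(processed)
            log.info("batch %d-%d done (%d ok so far, %d failed)",
                     batch_start, batch_start + len(fetched) - 1,
                     len(succeeded), failed)

    elapsed = time.perf_counter() - start
    log.info("finished: %d ok, %d failed, %.2fs total, %.1f items/s",
             len(succeeded), failed, elapsed, len(succeeded) / elapsed)


if __name__ == "__main__":
    asyncio.run(run())
```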
Production-Ready Features:
- Error handling: Try/except at each stage, graceful failures
- Timeouts: Each operation has timeout to prevent hanging
- Fallback strategies: If database fails, log to file instead
- Partial success: If 1-2 items fail, system continues
- Monitoring: Log progress, measure timing, calculate metrics
- Batching: Process in chunks for memory efficiency
Identifying and Optimizing Bottlenecks
Real hybrid systems don't work optimally by accident. You need to measure where the bottleneck is.
Common Bottleneck Scenarios
Scenario 1: I/O-Bound Bottleneck
Time: 0s 5s 10s 15s
Fetch: [-------fetch-10s-------]
Proc: [--process--] [--process--]
Time to completion: 15s (waiting for fetches)
Solution: Increase fetch concurrency (launch more concurrent fetch tasks in the TaskGroup)
Scenario 2: CPU-Bound Bottleneck
Time: 0s 2s 4s 6s 8s 10s
Fetch: [fetch]
Proc: [----process--] [----process--] [----process--]
Time to completion: 10s (waiting for CPU processing)
Solution: Increase CPU workers (more InterpreterPoolExecutor workers)
Scenario 3: Storage Bottleneck
Time: 0s 2s 4s 6s 8s 10s 12s
Fetch: [fetch]
Proc: [proc]
Store: [store----] [store----] [store----]
Time to completion: 12s (waiting for database)
Solution: Increase database connections, or batch stores
How to Identify Your Bottleneck
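One simple approach is to accumulate wall-clock time per stage and compare totals. The sketch below uses illustrative stage durations and runs items sequentially on purpose, so each stage's cost is isolated; in real code the process stage would go through run_in_executor.

```python
import asyncio
import time
from collections import defaultdict

stage_totals: dict[str, float] = defaultdict(float)


async def timed(stage: str, coro):
    """Await a coroutine and add its wall-clock time to the stage total."""
    start = time.perf_counter()
    result = await coro
    stage_totals[stage] += time.perf_counter() - start
    return result


async def fetch(i: int) -> str:
    await asyncio.sleep(0.1)
    return f"raw-{i}"


async def process(data: str) -> str:
    await asyncio.sleep(0.5)  # pretend CPU stage
    return data.upper()


async def store(data: str) -> None:
    await asyncio.sleep(0.05)


async def main(n_items: int = 10) -> None:
    for i in range(n_items):  # sequential on purpose: isolates per-stage cost
        data = await timed("fetch", fetch(i))
        result = await timed("process", process(data))
        await timed("store", store(result))

    bottleneck = max(stage_totals, key=stage_totals.get)
    for stage, total in stage_totals.items():
        print(f"{stage:8s} {total:6.2f}s total")
    print(f"Bottleneck: {bottleneck} -- add workers/concurrency there first")


if __name__ == "__main__":
    asyncio.run(main())
```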
🚀 CoLearning Challenge
Ask your AI Co-Teacher:
"You have a system where fetch takes 0.1s per item, process takes 0.5s per item, and store takes 0.05s per item. You have 1000 items to process and a 4-core machine. What's the optimal batch size and worker allocation?"
Expected Outcome: You'll think about bottlenecks systematically—identifying which stage limits throughput, then optimizing resource allocation accordingly.
Challenge 5: The Hybrid Workload Builder
This challenge teaches you to architect complete production systems combining asyncio, executors, and intelligent orchestration.
Initial Exploration
Your Challenge: Experience the power of pipelining without AI guidance.
Deliverable: Create /tmp/pipeline_discovery.py containing:
- A 3-stage pipeline (fetch, process, store) each taking 1 second
- Sequential version: run all 3 for 5 items — measure time (should be ~15 seconds)
- Pipelined version: start fetching item 2 while processing item 1 and storing item 0 — measure time (should be ~7 seconds)
Expected Observation:
- Sequential (stage after stage): ~15 seconds
- Pipelined (stages overlap): ~7 seconds
- Lesson: Pipelining nearly halves the time by overlapping stages
Self-Validation:
- How does pipelining differ from just running stages in parallel?
- What's the minimum time needed with pipelining? (roughly one item's trip through all stages plus (N − 1) × the slowest stage time, not N × the sum of all stages)
- When does pipelining help? (When stages have different speeds, creating bottlenecks)
Understanding Pipeline Architecture
💬 AI Colearning Prompt: "I have a system: fetch documents (2s), extract text (3s), generate embeddings (4s), store results (1s). I run these sequentially: 4 documents = 40 seconds. I tried running them in parallel but wasted resources. Teach me about pipelining. How would I fetch doc 2 while extracting doc 1 while generating embeddings for doc 0? Show me the architecture using asyncio + executors. What's the optimal pipeline depth?"
What You'll Learn: Pipelining concept, queue-based architecture, how to balance stage speeds, and resource tradeoffs.
Clarifying Question: Deepen your understanding:
"You used a Queue to buffer between stages. What happens if the extract stage is much slower than fetch? Does the queue grow infinitely? How do I prevent memory exhaustion? What's backpressure?"
Expected Outcome: AI clarifies queue dynamics and backpressure concepts. You understand that pipelines need flow control, not just throughput maximization.
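If you want to see backpressure concretely before asking, here's a tiny sketch: a fast producer feeding a slow consumer through a bounded asyncio.Queue. The timings and maxsize are illustrative assumptions; the queue depth never exceeds maxsize because put() suspends the producer until space frees up.

```python
import asyncio


async def fast_producer(q: asyncio.Queue) -> None:
    for i in range(10):
        await q.put(i)  # suspends here whenever the queue is full
        print(f"produced {i:2d} | queue depth = {q.qsize()}")
    await q.put(None)   # sentinel: done


async def slow_consumer(q: asyncio.Queue) -> None:
    while (item := await q.get()) is not None:
        await asyncio.sleep(0.5)  # consumer is much slower than the producer
        print(f"consumed {item:2d}")


async def main() -> None:
    q: asyncio.Queue = asyncio.Queue(maxsize=3)  # bounded queue = backpressure
    async with asyncio.TaskGroup() as tg:
        tg.create_task(fast_producer(q))
        tg.create_task(slow_consumer(q))


if __name__ == "__main__":
    asyncio.run(main())
```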
Identifying and Optimizing Bottlenecks
Activity: Work with AI to identify bottlenecks and optimize pipeline performance.
First, ask AI to generate a simple pipeline:
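Something like the following deliberately sequential version works as a starting point. The stage durations (1s fetch, 2s process, 0.5s store) are assumptions chosen so 10 items take roughly 35 seconds end to end.

```python
import asyncio
import time


async def fetch(i: int) -> str:
    await asyncio.sleep(1.0)  # simulated API call
    return f"doc-{i}"


def process(data: str) -> str:
    time.sleep(2.0)  # simulated CPU-heavy analysis
    return data.upper()


async def store(data: str) -> None:
    await asyncio.sleep(0.5)  # simulated database write


async def main(n_items: int = 10) -> None:
    start = time.perf_counter()
    for i in range(n_items):       # one item at a time, stage after stage
        data = await fetch(i)
        result = process(data)     # blocks the event loop -- intentional here
        await store(result)
    print(f"Sequential: {n_items} items in {time.perf_counter() - start:.1f}s")


if __name__ == "__main__":
    asyncio.run(main())
```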
Your Task:
- Run this code. Measure time. (Should be sequential: ~35 seconds for 10 items)
- Identify the problem: stages run one-at-a-time, no parallelism
- Teach AI:
"This takes 35 seconds sequentially. I want it pipelined: fetch multiple items while processing previous ones. Show me how to use asyncio.Queue to buffer between fetch and process, and between process and store. How deep should the queues be?"
Your Edge Case Discovery: Ask AI:
"What if fetch is fast (0.5s) but process is slow (3s)? Process becomes the bottleneck. How many concurrent process workers should I have? What if I can't increase workers due to memory limits? How do I measure which stage is the bottleneck?"
Expected Outcome: You discover bottleneck analysis—profiling each stage, identifying slowest link, optimizing for throughput (not latency). You teach AI how production systems think about optimization.
Building an Optimized Data Pipeline
Capstone Activity: Build an optimized end-to-end data pipeline.
Specification:
- 3-stage pipeline: Fetch, Process, Store
- Fetch 12 items from API simulation (asyncio.sleep 0.5-1.5s each)
- Process each with CPU work (simulated with time.sleep, 1-2s each)
- Store results (asyncio.sleep 0.1-0.3s each)
- Stages connected with asyncio.Queue (implement backpressure)
- Measure: throughput (items/second), latency (item start to finish), bottleneck identification
- Return: {item: (fetch_ms, process_ms, store_ms, e2e_ms)}, plus throughput metrics
- Type hints throughout
Deliverable: Save to /tmp/data_pipeline.py
Testing Your Work:
python /tmp/data_pipeline.py
# Expected output:
# Processed 12 items in ~12 seconds (pipelined, not ~24s sequential)
# Throughput: ~1 item/second
# Bottleneck: Process stage (slowest at 1-2s per item)
# Fetch queue depth: varies (shows backpressure)
# Pipeline efficiency: ~75% (good sign of effective overlap)
Validation Checklist:
- Code runs without deadlocks or race conditions
- Stages run concurrently (not sequentially)
- Queues implement backpressure (don't grow unbounded)
- Total time < sequential time by at least 20%
- Bottleneck identified correctly (slowest stage)
- Type hints complete
- Production-ready: proper cleanup, error handling, queue drains
Time Estimate: 35-42 minutes (5 min discover, 8 min teach/learn, 10 min edge cases, 12-19 min build artifact)
Key Takeaway: You've mastered production system design. Pipelining, queues, backpressure, and bottleneck analysis are the foundation of real-world AI data systems.
Try With AI
How does a 3-stage pipeline (Fetch → Process → Store) achieve higher throughput than sequential execution?
🔍 Explore Pipeline Architecture:
"Explain pipelining with a 3-stage example: stage 1 takes 1s, stage 2 takes 2s, stage 3 takes 1s. For 10 items, show sequential time (40s) vs pipelined time (~22s). What enables the speedup?"
🎯 Practice Queue-Based Coordination:
"Implement asyncio.Queue connecting Fetch (producer) and Process (consumer) stages. Show how maxsize=5 implements backpressure. What happens when Process is slower than Fetch?"
🧪 Test Bottleneck Identification:
"Create a pipeline where Fetch=0.5s, Process=2s, Store=0.3s per item. Measure stage utilization and queue depths. Which stage is the bottleneck? How would you optimize throughput?"
🚀 Apply to Data Ingestion:
"Design a production data pipeline: fetch from 5 APIs (asyncio), transform each item (CPU, ProcessPoolExecutor), store to database (asyncio). Include backpressure, error handling, and throughput measurement."