Multi-Stage Builds & Optimization
Container images that work are good. Container images that work AND are small are great. This lesson teaches you why—and more importantly, how.
When you build a Docker image for a Python service, you typically need two things during the build process: a compiler and development libraries to install dependencies. But in production, you only need the installed packages themselves. A naive Dockerfile includes everything—build tools, development headers, cache files—all of which add hundreds of megabytes of unnecessary weight.
Multi-stage builds solve this elegantly. You perform the heavy lifting (dependency installation, compilation) in a large build image, then copy only the artifacts you need into a small production image. The result: images that are 85-90% smaller, faster to push, faster to pull, and have smaller attack surfaces.
In this lesson, you'll start with a bloated Dockerfile and progressively optimize it through multiple iterations. You'll measure the size reduction at each step, understand the tradeoffs, and learn techniques you'll use in every production Dockerfile going forward.
The Problem: Bloated Images
Let's start with a naive Dockerfile that doesn't think about image size at all.
File: Dockerfile.naive
FROM python:3.12
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY main.py .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
And a minimal FastAPI application to containerize:
File: requirements.txt
fastapi==0.115.0
uvicorn==0.30.0
pydantic==2.6.0
File: main.py
from fastapi import FastAPI
app = FastAPI()
@app.get("/")
def read_root():
return {"message": "Hello from Docker!"}
@app.get("/health")
def health_check():
return {"status": "healthy"}
Build this image and check its size:
docker build -t naive-image:latest -f Dockerfile.naive .
docker images naive-image:latest
Output:
REPOSITORY TAG IMAGE ID CREATED SIZE
naive-image latest a1b2c3d4e5f6 10 seconds ago 1.2GB
1.2 gigabytes for a tiny FastAPI app. That bloat comes from:
- Full Python image: ~900MB (includes compilers, development headers, build tools)
- Pip cache: ~150MB (stored in /root/.cache/pip)
- Development dependencies: ~150MB (libraries needed only during installation)
None of that is needed to RUN the application. You only need the installed Python packages themselves—maybe 100-150MB total.
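You can verify where the weight lives by measuring directories inside the naive image (a quick check; the paths match the python:3.12 defaults used above):
docker run --rm naive-image:latest du -sh /root/.cache/pip /usr/local/lib/python3.12/site-packages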
Iteration 1: Use a Slim Base Image
The python:3.12 image is the full-featured version. Docker provides alternatives:
- python:3.12 (full) — ~900MB, includes build tools, development headers, compilers
- python:3.12-slim (slim) — ~150MB, includes essentials but no build tools
- python:3.12-alpine (alpine) — ~50MB, minimal Linux, tiny footprint
- gcr.io/distroless/python3 (distroless) — ~50MB, runtime only, no shell or package manager
For most cases, slim strikes the right balance: small enough to matter, but large enough to have essential libraries for most Python packages.
File: Dockerfile.v1-slim
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY main.py .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Build and measure:
docker build -t slim-image:latest -f Dockerfile.v1-slim .
docker images slim-image:latest
Output:
REPOSITORY TAG IMAGE ID CREATED SIZE
slim-image latest f6e5d4c3b2a1 5 seconds ago 450MB
Progress: 1.2GB → 450MB (62% reduction)
Better, but we're still carrying unnecessary files. In a single-stage build, everything created during installation persists in the final image: the pip cache, temporary build artifacts, and anything else pip leaves behind. There's no way to discard it.
Iteration 2: Multi-Stage Builds (Separate Build & Runtime)
Multi-stage builds use multiple FROM instructions in a single Dockerfile. Each stage can use a different base image. You build dependencies in a large stage, then copy only what you need into a small stage.
File: Dockerfile.v2-multistage
# Stage 1: Build stage (installation work happens here)
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
# Install dependencies, but keep them in the builder stage
RUN pip install --user --no-cache-dir -r requirements.txt
# Stage 2: Runtime stage (small image with only what's needed)
FROM python:3.12-slim
WORKDIR /app
# Copy installed packages from builder stage
COPY --from=builder /root/.local /root/.local
# Set PATH so Python finds installed packages
ENV PATH=/root/.local/bin:$PATH \
PYTHONUNBUFFERED=1
# Copy application code
COPY main.py .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Let's understand what's happening:
Stage 1 (builder):
- Starts with python:3.12-slim and does all the installation work
- Installs dependencies with pip install --user (stored in /root/.local)
- This stage is used only for building; it's discarded when the build finishes
Stage 2 (runtime):
- Also starts with python:3.12-slim (a fresh, clean slate)
- Copies /root/.local from the builder stage (all installed packages)
- Copies application code
- Does NOT include build tools, development headers, or pip cache
- This is the final image Docker keeps
Build and measure:
docker build -t multistage-image:latest -f Dockerfile.v2-multistage .
docker images multistage-image:latest
Output:
REPOSITORY TAG IMAGE ID CREATED SIZE
multistage-image latest d3c2b1a0f9e8 3 seconds ago 180MB
Progress: 1.2GB → 180MB (85% reduction from original)
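A quick smoke test confirms that the copied packages actually resolve in the runtime stage (a minimal check; the imports match the requirements above):
docker run --rm multistage-image:latest python -c "import fastapi, uvicorn; print('imports OK')"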
Iteration 3: Use Alpine Base Image + UV Package Manager
Alpine is a minimal Linux distribution (the python:3.12-alpine image is ~50MB versus ~150MB for slim). It's tiny, but it uses musl libc rather than glibc, so many prebuilt wheels (built for manylinux) won't install and C extensions may have to compile from source.
More importantly, we'll introduce UV, a Rust-based Python package manager that's 10-100x faster than pip while using less memory.
File: Dockerfile.v3-alpine-uv
# Stage 1: Build stage with Alpine (minimal)
FROM python:3.12-alpine AS builder
WORKDIR /app
# Install UV package manager
RUN pip install uv
COPY requirements.txt .
# UV installs 10-100x faster than pip and uses less memory
# --system installs to system Python instead of virtual environment
RUN uv pip install --system --no-cache -r requirements.txt
# Stage 2: Runtime stage with Alpine (minimal)
FROM python:3.12-alpine
WORKDIR /app
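# site-packages holds the installed libraries; /usr/local/bin holds console scripts such as uvicorn (and uv itself, since it was pip-installed in the builder)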
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
ENV PYTHONUNBUFFERED=1
COPY main.py .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Key changes:
- Alpine base: 50MB instead of 150MB
- UV package manager: Installs dependencies 10-100x faster
- --system flag: UV installs to system Python, not a virtual environment (simpler copying)
- --no-cache: Skips UV's download cache, so nothing is cached in the builder stage
Build and measure:
docker build -t alpine-uv-image:latest -f Dockerfile.v3-alpine-uv .
docker images alpine-uv-image:latest
Output:
REPOSITORY TAG IMAGE ID CREATED SIZE
alpine-uv-image latest e4f5a6b7c8d9 2 seconds ago 120MB
Progress: 1.2GB → 120MB (90% reduction from original)
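As an aside, Astral also publishes container images with a standalone UV binary, so you can skip the pip install uv step entirely. A variant sketch of the builder stage (the ghcr.io path follows UV's own Docker documentation; pin a specific tag rather than latest in real projects):
FROM python:3.12-alpine AS builder
# Copy the standalone UV binary instead of installing it via pip
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
WORKDIR /app
COPY requirements.txt .
RUN uv pip install --system --no-cache -r requirements.txt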
Iteration 4: Combine RUN Commands & Clean Caches
Docker builds images in layers. Each RUN instruction creates a new layer. Layers are cached, which speeds up subsequent builds, but it also means intermediate files are retained.
By combining RUN commands, you reduce layers and can clean up intermediate files in the same layer (so they don't persist).
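To see why the cleanup has to happen in the same RUN, consider this anti-pattern (an illustrative fragment, not one of the lesson's Dockerfiles):
# Anti-pattern: cleanup in a separate layer saves nothing
RUN pip install -r requirements.txt   # layer 1: the pip cache is written here
RUN rm -rf /root/.cache/pip           # layer 2: hides the files, but layer 1 keeps its full size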
File: Dockerfile.v4-optimized
# Stage 1: Build stage
FROM python:3.12-alpine AS builder
WORKDIR /app
# Install UV and purge pip's cache in the same layer
RUN pip install uv && \
pip cache purge
COPY requirements.txt .
RUN uv pip install --system --no-cache -r requirements.txt && \
rm -rf /root/.cache && \
find /usr/local -type d -name '__pycache__' -exec rm -rf {} + 2>/dev/null || true
# Stage 2: Runtime stage
FROM python:3.12-alpine
WORKDIR /app
# Minimal runtime: copy only what's needed
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
# ENV sets runtime variables in a metadata-only layer (adds 0B)
ENV PYTHONUNBUFFERED=1
COPY main.py .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Optimizations:
- Combined RUN commands: Fewer layers = smaller overhead
- Explicit cache cleanup: pip cache purge and removal of __pycache__ directories
- Clean intermediate artifacts: anything created during build but not needed is discarded
Build and measure:
docker build -t optimized-image:latest -f Dockerfile.v4-optimized .
docker images optimized-image:latest
Output:
REPOSITORY TAG IMAGE ID CREATED SIZE
optimized-image latest f7g8h9i0j1k2 2 seconds ago 118MB
Progress: Nearly the same (118MB vs 120MB). Cleanup saved only 2MB because UV already doesn't cache. But the technique is important for other package managers and dependencies.
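With plain pip, where the cache savings are much larger, the same single-layer idea looks like this (a sketch using standard pip flags):
RUN pip install --no-cache-dir -r requirements.txt && \
    find /usr/local -type d -name '__pycache__' -exec rm -rf {} + || true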
Understanding Base Image Tradeoffs
Let's summarize the three base image options and when to use each:
| Base Image | Size | Use Case | Tradeoff |
|---|---|---|---|
| python:3.12-slim | 150MB | Default choice for most apps | Includes build tools you might not need |
| python:3.12-alpine | 50MB | Size-critical deployments (Kubernetes, edge) | Missing some standard libraries; C extensions may not compile |
| gcr.io/distroless/python3 | 50MB | Maximum security (no shell, no package manager) | Can't debug in container; requires all dependencies pre-installed |
For this lesson, slim is the safest, and alpine is the best for containerized AI services where size matters. Distroless is advanced—useful for production security but harder to debug.
Handling Large Model Files (>1GB)
A Dockerfile with COPY model.bin . would embed a 4GB model file into the image itself. That's wasteful: the image would be 4GB+, slow to push, slow to pull, and duplicated on every machine.
Instead, use volume mounts to inject model files at runtime:
File: Dockerfile.v4-optimized (no model file)
# Same as before—NO COPY of model files
FROM python:3.12-alpine AS builder
# ... rest same ...
FROM python:3.12-alpine
# ... rest same ...
Run with volume mount:
docker run -p 8000:8000 -v $(pwd)/models:/app/models optimized-image:latest
Or in docker-compose.yaml:
services:
app:
image: optimized-image:latest
volumes:
- ./models:/app/models
ports:
- "8000:8000"
Application code (main.py):
from fastapi import FastAPI
from pathlib import Path
app = FastAPI()
# Models loaded from volume-mounted directory at runtime
models_dir = Path("/app/models")
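# Note: on_event is deprecated in recent FastAPI releases; a lifespan handler is the modern equivalent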
@app.on_event("startup")
async def load_model():
global model
model_path = models_dir / "model.bin"
# Load model from file
print(f"Loading model from {model_path}")
# model = load_model_function(model_path)
@app.get("/")
def read_root():
return {"message": "Model loaded from volume mount"}
Benefits:
- Image stays small (no model embedded)
- Models can be shared across containers (single volume mount point)
- Models can be updated without rebuilding image
- Perfect for AI services with large model files
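One caveat: even without a COPY instruction, the build context still uploads ./models to the Docker daemon unless you exclude it. A .dockerignore sketch (entries assume the project layout above):
File: .dockerignore
models/
__pycache__/
*.pyc
.git/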
Measuring Progress: docker images and docker history
Docker provides commands to inspect image size and layers:
docker images shows the final image size:
docker images
Output:
REPOSITORY TAG IMAGE ID CREATED SIZE
optimized-image latest f7g8h9i0j1k2 2 seconds ago 118MB
naive-image latest a1b2c3d4e5f6 15 mins ago 1.2GB
slim-image latest f6e5d4c3b2a1 10 mins ago 450MB
docker history shows what each layer contains:
docker history optimized-image:latest
Output:
IMAGE CREATED CREATED BY SIZE
f7g8h9i0j1k2 2 seconds ago CMD ["uvicorn" "main:app" "--host" "0.0.0.0"] 0B
<missing> 2 seconds ago COPY main.py . # buildkit 5.2kB
<missing> 2 seconds ago ENV PYTHONUNBUFFERED=1 0B
<missing> 2 seconds ago COPY --from=builder /usr/local/bin... 15MB
<missing> 2 seconds ago COPY --from=builder /usr/local/lib/python... 103MB
Each line is a layer. The SIZE column shows how much that layer added. If you see a large layer, that's where to optimize.
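For long Dockerfiles, docker history also accepts Go-template formatting, which makes big layers easier to spot (a sketch; --no-trunc and --format are standard flags, but check your Docker version's docs for supported fields):
docker history --no-trunc --format "table {{.Size}}\t{{.CreatedBy}}" optimized-image:latest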
BuildKit: Faster, Smarter Builds
Docker's modern build system uses BuildKit, which offers advanced caching and parallel stage execution.
BuildKit is enabled by default in Docker Desktop and in modern Docker Engine releases. You can explicitly enable it:
export DOCKER_BUILDKIT=1
docker build -t optimized-image:latest -f Dockerfile.v4-optimized .
Output:
[+] Building 8.2s (12/12) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.2kB 0.0s
=> [builder 1/3] FROM python:3.12-alpine 2.3s
=> [builder 2/3] RUN pip install uv && pip cache purge 4.1s
=> [builder 3/3] RUN uv pip install --system --no-cache... 1.5s <<---- Fast!
=> [stage-1 1/4] FROM python:3.12-alpine 0.0s (reused)
=> [stage-1 2/4] COPY --from=builder /usr/local/lib/python... 0.2s
=> [stage-1 3/4] COPY --from=builder /usr/local/bin ... 0.1s
=> [stage-1 4/4] COPY main.py . 0.1s
=> exporting to image 0.3s
BuildKit advantages:
- Parallel stage execution: Stages with no dependencies run in parallel
- Advanced caching: Smarter cache invalidation
- Faster builds: Noticeably quicker for large Dockerfiles
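BuildKit also enables cache mounts: the package manager's download cache persists on the build host between builds but never lands in an image layer. A builder-stage sketch (assumes the dockerfile:1 syntax directive and UV's default Linux cache path, /root/.cache/uv):
# syntax=docker/dockerfile:1
FROM python:3.12-alpine AS builder
WORKDIR /app
RUN pip install uv
COPY requirements.txt .
# The cache lives in the mount, outside the image, so --no-cache isn't needed here
RUN --mount=type=cache,target=/root/.cache/uv \
    uv pip install --system -r requirements.txt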
Practice: Optimize Your Own Dockerfile
You now have a template for optimized multi-stage builds. Here's the pattern to apply to any Python service:
Pattern: Multi-stage Dockerfile for Python AI Services
# Stage 1: Build (large, has build tools)
FROM python:3.12-alpine AS builder
WORKDIR /app
# Install UV for fast dependency installation
RUN pip install uv
COPY requirements.txt .
# Install dependencies with UV
RUN uv pip install --system --no-cache -r requirements.txt
# Stage 2: Runtime (small, only what's needed)
FROM python:3.12-alpine
WORKDIR /app
# Copy installed packages from builder
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
# Set environment
ENV PYTHONUNBUFFERED=1
# Copy application code
COPY . .
# For AI services with models: models come via volume mount, not COPY
# COPY models /app/models <<---- Don't do this for large files!
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
When to deviate from this pattern:
- Alpine doesn't work for your stack (missing wheels, musl incompatibilities): Use python:3.12-slim
- Static files needed at build time: COPY them in builder stage, but only if under 100MB
- Model files required: Use volume mounts, never COPY large files
- Security-critical: Consider distroless base image (but lose debugging capability)
- Need specific system libraries: Add RUN apk add ... in the builder stage and copy the results into the runtime stage (sketched below)
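For that last case, the builder stage picks up the compilers while the runtime stage stays clean (a sketch; build-base is Alpine's meta-package for gcc, make, and musl-dev, and the runtime stage may additionally need apk add for any shared libraries your extensions link against):
FROM python:3.12-alpine AS builder
RUN apk add --no-cache build-base    # compilers exist only in this stage
WORKDIR /app
RUN pip install uv
COPY requirements.txt .
RUN uv pip install --system --no-cache -r requirements.txt

FROM python:3.12-alpine
WORKDIR /app
# Compiled packages arrive via COPY; no compilers in the final image
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages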
Try With AI
Setup: You have a FastAPI agent service from Part 6. Now you'll containerize it with an optimized multi-stage Dockerfile.
Prompts:
Part 1: Initial Dockerfile Ask AI: "I have a FastAPI service with dependencies [list your requirements.txt]. Create a multi-stage Dockerfile that minimizes image size. Use python:3.12-alpine as base image and UV for package installation."
Part 2: Critical Evaluation Review AI's response. Ask yourself:
- Are there two stages (build and runtime)?
- Does the runtime stage copy only necessary files from builder?
- Does it use UV with the --no-cache flag?
- Is alpine used as the final base image?
Part 3: Size Validation Build the image and check its size:
docker build -t my-agent:optimized .
docker images my-agent:optimized
Compare to a naive version:
# Naive: single stage, full Python image
docker build -t my-agent:naive -f Dockerfile.naive .
docker images
Measure the size reduction. You should see 70-85% reduction from naive to optimized.
Part 4: Layer Analysis Examine the layers:
docker history my-agent:optimized
Ask yourself:
- What's the largest layer?
- Are RUN commands combined where possible?
- Could you remove any intermediate files?
Part 5: Production Readiness Consider:
- If you have model files >1GB, did you remove COPY and plan for volume mount instead?
- Is the image size now suitable for pushing to a registry?
- Does the image run your service correctly (test locally first)?
Compare your final optimized image to the initial naive version. Document the size reduction and the key optimizations that made the biggest difference.