Wan2.1 Docker Setup Guide

Professional-grade instructions for running Wan2.1 video generation models in Docker containers with GPU support.


Table of Contents

  • Prerequisites
  • Installation Steps
  • Quick Start
  • Model Download
  • Running Inference
  • Gradio Web Interface
  • Advanced Configuration
  • Troubleshooting
  • Performance Optimization
  • Container Management
  • Security Best Practices
  • Support and Resources
  • License

Prerequisites

Required Software

  1. Docker Engine (version 20.10+)

  2. NVIDIA Docker Runtime (for GPU support)

  3. NVIDIA Drivers (version 525.60.13+)

    • CUDA 12.1 compatible drivers
    • Check with: nvidia-smi
  4. Docker Compose (version 2.0+)

Optional Software

  • Git - For cloning the repository
  • Make - For using convenience commands
  • NVIDIA Container Toolkit - For multi-GPU support
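
If the NVIDIA Container Toolkit is not yet installed, the sketch below covers Ubuntu, assuming NVIDIA's apt repository has already been configured (see NVIDIA's official installation guide for the repository setup):

# Install the toolkit and register the NVIDIA runtime with Docker
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker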

System Requirements

Minimum Requirements (T2V-1.3B at 480P)

  • GPU: NVIDIA GPU with 8GB+ VRAM (e.g., RTX 4060 Ti)
  • RAM: 16GB system memory
  • Storage: 50GB free space (for models and cache)
  • OS: Linux (Ubuntu 20.04+), Windows 10/11 with WSL2

Recommended Requirements (T2V-14B at 720P)

  • GPU: NVIDIA GPU with 24GB+ VRAM (e.g., RTX 4090, A5000)
  • RAM: 32GB+ system memory
  • Storage: 100GB+ free space
  • OS: Linux (Ubuntu 22.04+)

Multi-GPU Setup (8x GPUs)

  • GPUs: 8x NVIDIA GPUs (A100, H100, etc.)
  • RAM: 128GB+ system memory
  • Storage: 200GB+ free space
  • Network: High-bandwidth GPU interconnect (NVLink preferred)
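
To check how your GPUs are interconnected, nvidia-smi can print the topology matrix; NV# entries indicate NVLink, while PHB/PIX indicate PCIe paths:

# Show the GPU interconnect topology
nvidia-smi topo -m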

Installation Steps

Step 1: Verify Docker and NVIDIA Runtime

# Check Docker installation
docker --version
docker compose version

# Check NVIDIA driver
nvidia-smi

# Test NVIDIA Docker runtime
docker run --rm --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi

Expected output: You should see your GPU(s) listed in the nvidia-smi output.

Step 2: Clone the Repository

git clone https://github.com/Wan-Video/Wan2.1.git
cd Wan2.1

Step 3: Create Required Directories

# Create directories for models, outputs, and cache
mkdir -p models outputs cache examples

Step 4: Set Environment Variables (Optional)

For prompt extension with Dashscope API:

# Create a .env file
cat > .env << EOF
DASH_API_KEY=your_dashscope_api_key_here
DASH_API_URL=https://dashscope.aliyuncs.com/api/v1
EOF

For international Alibaba Cloud users:

DASH_API_URL=https://dashscope-intl.aliyuncs.com/api/v1
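
Docker Compose reads the .env file automatically. With plain docker run, pass it explicitly; the quick check below is a sketch assuming the image from Step 5 has been built and allows its command to be overridden:

# Forward the .env file and confirm the variables are visible in the container
docker run --rm --env-file .env wan2.1:latest env | grep DASH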

Step 5: Build the Docker Image

# Build using Docker Compose (recommended)
docker compose build

# OR build manually
docker build -t wan2.1:latest .

Build time: Approximately 10-20 minutes depending on your internet connection.


Quick Start

Option 1: Using Docker Compose (Recommended)

# Start the container with GPU support
docker compose up -d wan2-1

# Check container status
docker compose ps

# View logs
docker compose logs -f wan2-1

# Access the container shell
docker compose exec wan2-1 bash

Option 2: Using Docker Run

docker run -it --gpus all \
  --name wan2.1-container \
  -v $(pwd)/models:/app/models \
  -v $(pwd)/outputs:/app/outputs \
  -v $(pwd)/cache:/app/cache \
  -p 7860:7860 \
  --shm-size=16g \
  wan2.1:latest bash

CPU-Only Mode

# Using Docker Compose
docker compose --profile cpu up -d wan2-1-cpu

# Using Docker Run
docker run -it \
  --name wan2.1-cpu \
  -e CUDA_VISIBLE_DEVICES="" \
  -v $(pwd)/models:/app/models \
  -v $(pwd)/outputs:/app/outputs \
  -v $(pwd)/cache:/app/cache \
  -p 7860:7860 \
  wan2.1:latest bash

Model Download

Download models before running inference. Models should be placed in the ./models directory.

Using Hugging Face CLI (Inside Container)

# Enter the container
docker compose exec wan2-1 bash

# Download T2V-14B model
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir /app/models/Wan2.1-T2V-14B

# Download T2V-1.3B model
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir /app/models/Wan2.1-T2V-1.3B

# Download I2V-14B-720P model
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P --local-dir /app/models/Wan2.1-I2V-14B-720P

# Download I2V-14B-480P model
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P --local-dir /app/models/Wan2.1-I2V-14B-480P

# Download FLF2V-14B model
huggingface-cli download Wan-AI/Wan2.1-FLF2V-14B-720P --local-dir /app/models/Wan2.1-FLF2V-14B-720P

# Download VACE models
huggingface-cli download Wan-AI/Wan2.1-VACE-1.3B --local-dir /app/models/Wan2.1-VACE-1.3B
huggingface-cli download Wan-AI/Wan2.1-VACE-14B --local-dir /app/models/Wan2.1-VACE-14B

Using ModelScope (Alternative for Chinese Users)

pip install modelscope
modelscope download Wan-AI/Wan2.1-T2V-14B --local_dir /app/models/Wan2.1-T2V-14B

Download from Host Machine

You can also download models on your host machine; the volume mount makes them accessible inside the container:

# On host machine (outside Docker)
cd Wan2.1/models
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./Wan2.1-T2V-1.3B
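
On fast connections, downloads can optionally be accelerated with the hf_transfer backend; this is an optional extra step, and the standard commands above work without it:

# Optional: Rust-based transfer backend for higher download throughput
pip install hf_transfer
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./Wan2.1-T2V-1.3B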

Running Inference

All commands below should be run inside the container.

Text-to-Video Generation

1.3B Model (480P) - Consumer GPU Friendly

python generate.py \
  --task t2v-1.3B \
  --size 832*480 \
  --ckpt_dir /app/models/Wan2.1-T2V-1.3B \
  --offload_model True \
  --t5_cpu \
  --sample_shift 8 \
  --sample_guide_scale 6 \
  --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

14B Model (720P) - High-End GPU

python generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

With Prompt Extension (Better Quality)

# Using local Qwen model
python generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --use_prompt_extend \
  --prompt_extend_method 'local_qwen' \
  --prompt "A beautiful sunset over the ocean"

# Using Dashscope API (requires DASH_API_KEY)
DASH_API_KEY=your_key python generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --use_prompt_extend \
  --prompt_extend_method 'dashscope' \
  --prompt "A beautiful sunset over the ocean"

Image-to-Video Generation

python generate.py \
  --task i2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-I2V-14B-720P \
  --image /app/examples/i2v_input.JPG \
  --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard."

First-Last-Frame-to-Video

python generate.py \
  --task flf2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-FLF2V-14B-720P \
  --first_frame /app/examples/flf2v_input_first_frame.png \
  --last_frame /app/examples/flf2v_input_last_frame.png \
  --prompt "CG animation style, a small blue bird takes off from the ground"

Text-to-Image Generation

python generate.py \
  --task t2i-14B \
  --size 1024*1024 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --prompt "A serene mountain landscape at dawn"

VACE (Video Creation and Editing)

python generate.py \
  --task vace-1.3B \
  --size 832*480 \
  --ckpt_dir /app/models/Wan2.1-VACE-1.3B \
  --src_ref_images /app/examples/girl.png,/app/examples/snake.png \
  --prompt "Your detailed prompt here"

Gradio Web Interface

Start Gradio Interface

Text-to-Video (14B)

cd gradio
python t2v_14B_singleGPU.py \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --prompt_extend_method 'local_qwen'

Image-to-Video (14B)

cd gradio
python i2v_14B_singleGPU.py \
  --ckpt_dir_720p /app/models/Wan2.1-I2V-14B-720P \
  --prompt_extend_method 'local_qwen'

VACE (All-in-One)

cd gradio
python vace.py --ckpt_dir /app/models/Wan2.1-VACE-1.3B

Access the Web Interface

  1. Open your web browser
  2. Navigate to: http://localhost:7860
  3. Use the intuitive interface to generate videos

For Remote Access

If running on a remote server:

# Start with public URL (Gradio share feature)
python gradio/t2v_14B_singleGPU.py \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --server_name 0.0.0.0 \
  --server_port 7860 \
  --share

Then access via: http://your-server-ip:7860
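
If you prefer not to expose the port publicly, an SSH tunnel is a common alternative (user@your-server-ip is a placeholder for your own credentials):

# Forward remote port 7860 to localhost, then browse to http://localhost:7860
ssh -L 7860:localhost:7860 user@your-server-ip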


Advanced Configuration

Multi-GPU Inference (FSDP + xDiT)

For 8-GPU setup using Ulysses or Ring attention strategies:

# Install xDiT
pip install "xfuser>=0.4.1"

# Run with Ulysses strategy (8 GPUs)
torchrun --nproc_per_node=8 generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --dit_fsdp \
  --t5_fsdp \
  --ulysses_size 8 \
  --prompt "Your prompt here"

# Run with Ring strategy (for sequence parallelism)
torchrun --nproc_per_node=8 generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --dit_fsdp \
  --t5_fsdp \
  --ring_size 8 \
  --prompt "Your prompt here"

Memory Optimization Flags

For limited VRAM:

# --offload_model True: offload model weights to CPU when not in use
# --t5_cpu: keep the T5 text encoder on CPU
python generate.py \
  --task t2v-1.3B \
  --size 832*480 \
  --ckpt_dir /app/models/Wan2.1-T2V-1.3B \
  --offload_model True \
  --t5_cpu \
  --sample_shift 8 \
  --sample_guide_scale 6 \
  --prompt "Your prompt"

Custom Output Directory

python generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --output_dir /app/outputs/my_generation \
  --prompt "Your prompt"

Batch Generation

Generate multiple variations:

# --num_samples 4: generate 4 variations from the same prompt
python generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --base_seed 0 \
  --num_samples 4 \
  --prompt "Your prompt"
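
If your checkout of generate.py does not support --num_samples, a shell loop over seeds is a simple alternative sketch:

# Generate four variations by varying the seed
for seed in 0 1 2 3; do
  python generate.py \
    --task t2v-14B \
    --size 1280*720 \
    --ckpt_dir /app/models/Wan2.1-T2V-14B \
    --base_seed "$seed" \
    --prompt "Your prompt"
done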

Troubleshooting

Issue: "CUDA out of memory"

Solutions:

  1. Use smaller model (1.3B instead of 14B)
  2. Reduce resolution (480P instead of 720P)
  3. Enable memory optimization flags:
    --offload_model True --t5_cpu
    
  4. Increase Docker shared memory:
    docker run --shm-size=32g ...
    

Issue: "nvidia-smi not found" inside container

Solutions:

  1. Verify NVIDIA Docker runtime is installed on host
  2. Check Docker daemon configuration:
    # Edit /etc/docker/daemon.json
    {
      "runtimes": {
        "nvidia": {
          "path": "nvidia-container-runtime",
          "runtimeArgs": []
        }
      },
      "default-runtime": "nvidia"
    }
    
  3. Restart Docker daemon:
    sudo systemctl restart docker
    

Issue: "Flash attention installation failed"

Solution: Flash attention is optional; the Dockerfile continues the build even if its installation fails. For better performance, install it manually:

# Inside container
pip install flash-attn --no-build-isolation

Issue: Model download fails

Solutions:

  1. Check internet connection
  2. Use mirror sites (ModelScope for Chinese users)
  3. Download models on host machine and mount them
  4. Increase Docker download timeout

Issue: "RuntimeError: CUDA error: device-side assert triggered"

Solutions:

  1. Check CUDA compatibility:
    python -c "import torch; print(torch.cuda.is_available())"
    
  2. Update NVIDIA drivers
  3. Rebuild Docker image with matching CUDA version
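
A slightly fuller one-liner prints the PyTorch build's CUDA version alongside the detected GPU, which makes driver/toolkit mismatches easier to spot:

# Compare torch.version.cuda against the driver version reported by nvidia-smi
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.get_device_name(0))"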

Issue: Gradio interface not accessible

Solutions:

  1. Check if port is exposed:
    docker ps | grep 7860
    
  2. Ensure firewall allows port 7860
  3. Try binding to all interfaces:
    python gradio/app.py --server_name 0.0.0.0
    

Issue: Permission denied errors

Solution:

# Fix ownership of mounted volumes
sudo chown -R $(id -u):$(id -g) models outputs cache
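
To avoid the problem in the first place, the container can be started as your host user so files written to the mounted volumes are owned by you. This is a sketch based on the docker run command from the Quick Start; note that some images assume root, so fall back to fixing ownership if tools fail to write to $HOME:

docker run -it --gpus all \
  --user "$(id -u):$(id -g)" \
  -v $(pwd)/models:/app/models \
  -v $(pwd)/outputs:/app/outputs \
  -v $(pwd)/cache:/app/cache \
  -p 7860:7860 \
  --shm-size=16g \
  wan2.1:latest bash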

Performance Optimization

1. Use SSD Storage

  • Store models and cache on SSD for faster loading
  • Use NVMe for best performance

2. Increase Shared Memory

# In docker-compose.yml
shm_size: '32gb'

3. Use Mixed Precision

  • The model uses bfloat16 by default (optimal for modern GPUs)

4. Enable Xformers (if available)

pip install xformers

5. Multi-GPU Best Practices

  • Use NVLink/NVSwitch for GPU communication
  • Balance model sharding with Ulysses + Ring strategies
  • Monitor GPU utilization: watch -n 1 nvidia-smi

6. Optimize Inference Parameters

# For T2V-1.3B
--sample_shift 8          # Adjust 8-12 based on quality
--sample_guide_scale 6    # Lower = faster, higher = better quality

# For T2V-14B
--sample_guide_scale 5.0  # Default recommended

7. Use Persistent Cache

# Models and transformers will be cached in ./cache
# Reusing the cache speeds up subsequent runs
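
If downloads are not landing in the mounted cache, the Hugging Face cache location can be pointed at it explicitly. This assumes /app/cache is the mount from the Quick Start commands; the Dockerfile may already set this:

# Route Hugging Face downloads and transformer caches into the persistent volume
export HF_HOME=/app/cache/huggingface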

Container Management

Stop Container

docker compose down

Restart Container

docker compose restart wan2-1

View Logs

docker compose logs -f wan2-1

Clean Up

# Remove containers and associated volumes
docker compose down -v

# Remove images
docker rmi wan2.1:latest

# Clean up Docker system
docker system prune -a

Update Container

# Pull latest code
git pull origin main

# Rebuild image
docker compose build --no-cache

# Restart containers
docker compose up -d

Security Best Practices

  1. Do not commit API keys to version control
  2. Use .env files for sensitive environment variables
  3. Limit container privileges: Avoid running as root
  4. Keep Docker updated for security patches
  5. Scan images for vulnerabilities:
    docker scan wan2.1:latest
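
Note: recent Docker releases replaced docker scan with Docker Scout. If the command above is unavailable, the equivalent check is:

docker scout cves wan2.1:latest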
    

Support and Resources

  • GitHub repository: https://github.com/Wan-Video/Wan2.1
  • Issues and questions: use the repository's issue tracker

License

This Docker setup follows the same Apache 2.0 License as the Wan2.1 project. See LICENSE.txt for details.


Last Updated: 2025-10-26 | Version: 1.0.0 | Maintainer: Wan2.1 Community