Wan2.1 Docker Setup Guide

Professional-grade instructions for running Wan2.1 video generation models in Docker containers with GPU support.


Table of Contents

  • Prerequisites
  • Installation Steps
  • Quick Start
  • Model Download
  • Running Inference
  • Gradio Web Interface
  • Advanced Configuration
  • Troubleshooting
  • Performance Optimization
  • Container Management
  • Security Best Practices
  • Support and Resources
  • License

Prerequisites

Required Software

  1. Docker Engine (version 20.10+)

  2. NVIDIA Docker Runtime (for GPU support)

  3. NVIDIA Drivers (version 525.60.13+)

    • CUDA 12.1 compatible drivers
    • Check with: nvidia-smi
  4. Docker Compose (version 2.0+)

Optional Software

  • Git - For cloning the repository
  • Make - For using convenience commands
  • NVIDIA Container Toolkit - For multi-GPU support
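
If the NVIDIA Container Toolkit is not yet installed, the sketch below covers Ubuntu, assuming NVIDIA's apt repository has already been configured (see NVIDIA's official installation guide for the repository setup):

# Install the toolkit and register the NVIDIA runtime with Docker
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker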

System Requirements

Minimum Requirements (T2V-1.3B at 480P)

  • GPU: NVIDIA GPU with 8GB+ VRAM (e.g., RTX 4060 Ti)
  • RAM: 16GB system memory
  • Storage: 50GB free space (for models and cache)
  • OS: Linux (Ubuntu 20.04+), Windows 10/11 with WSL2

Recommended Requirements (T2V-14B at 720P)

  • GPU: NVIDIA GPU with 24GB+ VRAM (e.g., RTX 4090, A5000)
  • RAM: 32GB+ system memory
  • Storage: 100GB+ free space
  • OS: Linux (Ubuntu 22.04+)

Multi-GPU Setup (8x GPUs)

  • GPUs: 8x NVIDIA GPUs (A100, H100, etc.)
  • RAM: 128GB+ system memory
  • Storage: 200GB+ free space
  • Network: High-bandwidth GPU interconnect (NVLink preferred)
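
To check how your GPUs are interconnected, nvidia-smi can print the topology matrix; NV# entries indicate NVLink, while PHB/PIX indicate PCIe paths:

# Show the GPU interconnect topology
nvidia-smi topo -m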

Installation Steps

Step 1: Verify Docker and NVIDIA Runtime

# Check Docker installation
docker --version
docker compose version

# Check NVIDIA driver
nvidia-smi

# Test NVIDIA Docker runtime
docker run --rm --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi

Expected output: You should see your GPU(s) listed in the nvidia-smi output.

Step 2: Clone the Repository

git clone https://github.com/Wan-Video/Wan2.1.git
cd Wan2.1

Step 3: Create Required Directories

# Create directories for models, outputs, and cache
mkdir -p models outputs cache examples

Step 4: Set Environment Variables (Optional)

For prompt extension with Dashscope API:

# Create a .env file
cat > .env << EOF
DASH_API_KEY=your_dashscope_api_key_here
DASH_API_URL=https://dashscope.aliyuncs.com/api/v1
EOF

For international Alibaba Cloud users:

DASH_API_URL=https://dashscope-intl.aliyuncs.com/api/v1
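
Docker Compose reads the .env file automatically. With plain docker run, pass it explicitly; the quick check below is a sketch assuming the image from Step 5 has been built and allows its command to be overridden:

# Forward the .env file and confirm the variables are visible in the container
docker run --rm --env-file .env wan2.1:latest env | grep DASH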

Step 5: Build the Docker Image

# Build using Docker Compose (recommended)
docker compose build

# OR build manually
docker build -t wan2.1:latest .

Build time: Approximately 10-20 minutes depending on your internet connection.


Quick Start

Option 1: Using Docker Compose (Recommended)

# Start the container with GPU support
docker compose up -d wan2-1

# Check container status
docker compose ps

# View logs
docker compose logs -f wan2-1

# Access the container shell
docker compose exec wan2-1 bash

Option 2: Using Docker Run

docker run -it --gpus all \
  --name wan2.1-container \
  -v $(pwd)/models:/app/models \
  -v $(pwd)/outputs:/app/outputs \
  -v $(pwd)/cache:/app/cache \
  -p 7860:7860 \
  --shm-size=16g \
  wan2.1:latest bash

CPU-Only Mode

# Using Docker Compose
docker compose --profile cpu up -d wan2-1-cpu

# Using Docker Run
docker run -it \
  --name wan2.1-cpu \
  -e CUDA_VISIBLE_DEVICES="" \
  -v $(pwd)/models:/app/models \
  -v $(pwd)/outputs:/app/outputs \
  -v $(pwd)/cache:/app/cache \
  -p 7860:7860 \
  wan2.1:latest bash

Model Download

Download models before running inference. Models should be placed in the ./models directory.

Using Hugging Face CLI (Inside Container)

# Enter the container
docker compose exec wan2-1 bash

# Download T2V-14B model
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir /app/models/Wan2.1-T2V-14B

# Download T2V-1.3B model
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir /app/models/Wan2.1-T2V-1.3B

# Download I2V-14B-720P model
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P --local-dir /app/models/Wan2.1-I2V-14B-720P

# Download I2V-14B-480P model
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P --local-dir /app/models/Wan2.1-I2V-14B-480P

# Download FLF2V-14B model
huggingface-cli download Wan-AI/Wan2.1-FLF2V-14B-720P --local-dir /app/models/Wan2.1-FLF2V-14B-720P

# Download VACE models
huggingface-cli download Wan-AI/Wan2.1-VACE-1.3B --local-dir /app/models/Wan2.1-VACE-1.3B
huggingface-cli download Wan-AI/Wan2.1-VACE-14B --local-dir /app/models/Wan2.1-VACE-14B

Using ModelScope (Alternative for Chinese Users)

pip install modelscope
modelscope download Wan-AI/Wan2.1-T2V-14B --local_dir /app/models/Wan2.1-T2V-14B

Download from Host Machine

You can also download models on your host machine; the volume mount makes them accessible inside the container:

# On host machine (outside Docker)
cd Wan2.1/models
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./Wan2.1-T2V-1.3B
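
On fast connections, downloads can optionally be accelerated with the hf_transfer backend; this is an optional extra step, and the standard commands above work without it:

# Optional: Rust-based transfer backend for higher download throughput
pip install hf_transfer
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./Wan2.1-T2V-1.3B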

Running Inference

All commands below should be run inside the container.

Text-to-Video Generation

1.3B Model (480P) - Consumer GPU Friendly

python generate.py \
  --task t2v-1.3B \
  --size 832*480 \
  --ckpt_dir /app/models/Wan2.1-T2V-1.3B \
  --offload_model True \
  --t5_cpu \
  --sample_shift 8 \
  --sample_guide_scale 6 \
  --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

14B Model (720P) - High-End GPU

python generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

With Prompt Extension (Better Quality)

# Using local Qwen model
python generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --use_prompt_extend \
  --prompt_extend_method 'local_qwen' \
  --prompt "A beautiful sunset over the ocean"

# Using Dashscope API (requires DASH_API_KEY)
DASH_API_KEY=your_key python generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --use_prompt_extend \
  --prompt_extend_method 'dashscope' \
  --prompt "A beautiful sunset over the ocean"

Image-to-Video Generation

python generate.py \
  --task i2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-I2V-14B-720P \
  --image /app/examples/i2v_input.JPG \
  --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard."

First-Last-Frame-to-Video

python generate.py \
  --task flf2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-FLF2V-14B-720P \
  --first_frame /app/examples/flf2v_input_first_frame.png \
  --last_frame /app/examples/flf2v_input_last_frame.png \
  --prompt "CG animation style, a small blue bird takes off from the ground"

Text-to-Image Generation

python generate.py \
  --task t2i-14B \
  --size 1024*1024 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --prompt "A serene mountain landscape at dawn"

VACE (Video Creation and Editing)

python generate.py \
  --task vace-1.3B \
  --size 832*480 \
  --ckpt_dir /app/models/Wan2.1-VACE-1.3B \
  --src_ref_images /app/examples/girl.png,/app/examples/snake.png \
  --prompt "Your detailed prompt here"

Gradio Web Interface

Start Gradio Interface

Text-to-Video (14B)

cd gradio
python t2v_14B_singleGPU.py \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --prompt_extend_method 'local_qwen'

Image-to-Video (14B)

cd gradio
python i2v_14B_singleGPU.py \
  --ckpt_dir_720p /app/models/Wan2.1-I2V-14B-720P \
  --prompt_extend_method 'local_qwen'

VACE (All-in-One)

cd gradio
python vace.py --ckpt_dir /app/models/Wan2.1-VACE-1.3B

Access the Web Interface

  1. Open your web browser
  2. Navigate to: http://localhost:7860
  3. Use the intuitive interface to generate videos

For Remote Access

If running on a remote server:

# Start with public URL (Gradio share feature)
python gradio/t2v_14B_singleGPU.py \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --server_name 0.0.0.0 \
  --server_port 7860 \
  --share

Then access via: http://your-server-ip:7860
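
If you prefer not to expose the port publicly, an SSH tunnel is a common alternative (user@your-server-ip is a placeholder for your own credentials):

# Forward remote port 7860 to localhost, then browse to http://localhost:7860
ssh -L 7860:localhost:7860 user@your-server-ip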


Advanced Configuration

Multi-GPU Inference (FSDP + xDiT)

For 8-GPU setup using Ulysses or Ring attention strategies:

# Install xDiT
pip install "xfuser>=0.4.1"

# Run with Ulysses strategy (8 GPUs)
torchrun --nproc_per_node=8 generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --dit_fsdp \
  --t5_fsdp \
  --ulysses_size 8 \
  --prompt "Your prompt here"

# Run with Ring strategy (for sequence parallelism)
torchrun --nproc_per_node=8 generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --dit_fsdp \
  --t5_fsdp \
  --ring_size 8 \
  --prompt "Your prompt here"

Memory Optimization Flags

For limited VRAM:

# --offload_model True: offload model weights to CPU when not in use
# --t5_cpu: keep the T5 text encoder on CPU
python generate.py \
  --task t2v-1.3B \
  --size 832*480 \
  --ckpt_dir /app/models/Wan2.1-T2V-1.3B \
  --offload_model True \
  --t5_cpu \
  --sample_shift 8 \
  --sample_guide_scale 6 \
  --prompt "Your prompt"

Custom Output Directory

python generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --output_dir /app/outputs/my_generation \
  --prompt "Your prompt"

Batch Generation

Generate multiple variations:

# --num_samples 4: generate 4 variations from the same prompt
python generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --base_seed 0 \
  --num_samples 4 \
  --prompt "Your prompt"
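
If your checkout of generate.py does not support --num_samples, a shell loop over seeds is a simple alternative sketch:

# Generate four variations by varying the seed
for seed in 0 1 2 3; do
  python generate.py \
    --task t2v-14B \
    --size 1280*720 \
    --ckpt_dir /app/models/Wan2.1-T2V-14B \
    --base_seed "$seed" \
    --prompt "Your prompt"
done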

Troubleshooting

Issue: "CUDA out of memory"

Solutions:

  1. Use smaller model (1.3B instead of 14B)
  2. Reduce resolution (480P instead of 720P)
  3. Enable memory optimization flags:
    --offload_model True --t5_cpu
    
  4. Increase Docker shared memory:
    docker run --shm-size=32g ...
    

Issue: "nvidia-smi not found" inside container

Solutions:

  1. Verify NVIDIA Docker runtime is installed on host
  2. Check Docker daemon configuration:
    # Edit /etc/docker/daemon.json
    {
      "runtimes": {
        "nvidia": {
          "path": "nvidia-container-runtime",
          "runtimeArgs": []
        }
      },
      "default-runtime": "nvidia"
    }
    
  3. Restart Docker daemon:
    sudo systemctl restart docker
    

Issue: "Flash attention installation failed"

Solution: Flash attention is optional; the Dockerfile continues the build even if its installation fails. For better performance, install it manually:

# Inside container
pip install flash-attn --no-build-isolation

Issue: Model download fails

Solutions:

  1. Check internet connection
  2. Use mirror sites (ModelScope for Chinese users)
  3. Download models on host machine and mount them
  4. Increase Docker download timeout

Issue: "RuntimeError: CUDA error: device-side assert triggered"

Solutions:

  1. Check CUDA compatibility:
    python -c "import torch; print(torch.cuda.is_available())"
    
  2. Update NVIDIA drivers
  3. Rebuild Docker image with matching CUDA version
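
A slightly fuller one-liner prints the PyTorch build's CUDA version alongside the detected GPU, which makes driver/toolkit mismatches easier to spot:

# Compare torch.version.cuda against the driver version reported by nvidia-smi
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.get_device_name(0))"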

Issue: Gradio interface not accessible

Solutions:

  1. Check if port is exposed:
    docker ps | grep 7860
    
  2. Ensure firewall allows port 7860
  3. Try binding to all interfaces:
    python gradio/app.py --server_name 0.0.0.0
    

Issue: Permission denied errors

Solution:

# Fix ownership of mounted volumes
sudo chown -R $(id -u):$(id -g) models outputs cache
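
To avoid the problem in the first place, the container can be started as your host user so files written to the mounted volumes are owned by you. This is a sketch based on the docker run command from the Quick Start; note that some images assume root, so fall back to fixing ownership if tools fail to write to $HOME:

docker run -it --gpus all \
  --user "$(id -u):$(id -g)" \
  -v $(pwd)/models:/app/models \
  -v $(pwd)/outputs:/app/outputs \
  -v $(pwd)/cache:/app/cache \
  -p 7860:7860 \
  --shm-size=16g \
  wan2.1:latest bash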

Performance Optimization

1. Use SSD Storage

  • Store models and cache on SSD for faster loading
  • Use NVMe for best performance

2. Increase Shared Memory

# In docker-compose.yml
shm_size: '32gb'

3. Use Mixed Precision

  • The model uses bfloat16 by default (optimal for modern GPUs)

4. Enable Xformers (if available)

pip install xformers

5. Multi-GPU Best Practices

  • Use NVLink/NVSwitch for GPU communication
  • Balance model sharding with Ulysses + Ring strategies
  • Monitor GPU utilization: watch -n 1 nvidia-smi

6. Optimize Inference Parameters

# For T2V-1.3B
--sample_shift 8          # Adjust 8-12 based on quality
--sample_guide_scale 6    # Lower = faster, higher = better quality

# For T2V-14B
--sample_guide_scale 5.0  # Default recommended

7. Use Persistent Cache

# Models and transformers will be cached in ./cache
# Reusing the cache speeds up subsequent runs
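
If downloads are not landing in the mounted cache, the Hugging Face cache location can be pointed at it explicitly. This assumes /app/cache is the mount from the Quick Start commands; the Dockerfile may already set this:

# Route Hugging Face downloads and transformer caches into the persistent volume
export HF_HOME=/app/cache/huggingface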

Container Management

Stop Container

docker compose down

Restart Container

docker compose restart wan2-1

View Logs

docker compose logs -f wan2-1

Clean Up

# Remove containers and associated volumes
docker compose down -v

# Remove images
docker rmi wan2.1:latest

# Clean up Docker system
docker system prune -a

Update Container

# Pull latest code
git pull origin main

# Rebuild image
docker compose build --no-cache

# Restart containers
docker compose up -d

Security Best Practices

  1. Do not commit API keys to version control
  2. Use .env files for sensitive environment variables
  3. Limit container privileges: Avoid running as root
  4. Keep Docker updated for security patches
  5. Scan images for vulnerabilities:
    docker scan wan2.1:latest
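
Note: recent Docker releases replaced docker scan with Docker Scout. If the command above is unavailable, the equivalent check is:

docker scout cves wan2.1:latest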
    

Support and Resources

  • GitHub repository: https://github.com/Wan-Video/Wan2.1
  • Issues and questions: use the repository's issue tracker

License

This Docker setup follows the same Apache 2.0 License as the Wan2.1 project. See LICENSE.txt for details.


Last Updated: 2025-10-26 | Version: 1.0.0 | Maintainer: Wan2.1 Community