	Wan2.1 Docker Setup Guide
Professional-grade instructions for running Wan2.1 video generation models in Docker containers with GPU support.
Table of Contents
- Prerequisites
- System Requirements
- Installation Steps
- Quick Start
- Model Download
- Running Inference
- Gradio Web Interface
- Advanced Configuration
- Troubleshooting
- Performance Optimization
- Container Management
- Security Best Practices
- Support and Resources
- License
Prerequisites
Required Software
- Docker Engine (version 20.10+)
- NVIDIA Docker Runtime (for GPU support) - required for GPU acceleration; see the official Installation Guide or the install sketch after this list
- NVIDIA Drivers (version 525.60.13+) - CUDA 12.1-compatible drivers; check with nvidia-smi
- Docker Compose (version 2.0+) - typically included with Docker Desktop; see the official Installation Guide
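A hedged install sketch for the NVIDIA Container Toolkit on Ubuntu/Debian hosts, following NVIDIA's apt-based instructions (consult the official guide if your distribution differs):
# Add NVIDIA's package repository and signing key
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install the toolkit and register the NVIDIA runtime with Docker
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker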
 
 
Optional Software
- Git - for cloning the repository
- Make - for using convenience commands
- NVIDIA Container Toolkit - for multi-GPU support
System Requirements
Minimum Requirements (T2V-1.3B at 480P)
- GPU: NVIDIA GPU with 8GB+ VRAM (e.g., RTX 4060 Ti)
- RAM: 16GB system memory
- Storage: 50GB free space (for models and cache)
- OS: Linux (Ubuntu 20.04+) or Windows 10/11 with WSL2

Recommended Requirements (T2V-14B at 720P)
- GPU: NVIDIA GPU with 24GB+ VRAM (e.g., RTX 4090, A5000)
- RAM: 32GB+ system memory
- Storage: 100GB+ free space
- OS: Linux (Ubuntu 22.04+)

Multi-GPU Setup (8x GPUs)
- GPUs: 8x NVIDIA GPUs (A100, H100, etc.)
- RAM: 128GB+ system memory
- Storage: 200GB+ free space
- Network: high-bandwidth GPU interconnect (NVLink preferred)
Installation Steps
Step 1: Verify Docker and NVIDIA Runtime
# Check Docker installation
docker --version
docker compose version
# Check NVIDIA driver
nvidia-smi
# Test NVIDIA Docker runtime
docker run --rm --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi
Expected output: You should see your GPU(s) listed in the nvidia-smi output.
Step 2: Clone the Repository
git clone https://github.com/Wan-Video/Wan2.1.git
cd Wan2.1
Step 3: Create Required Directories
# Create directories for models, outputs, and cache
mkdir -p models outputs cache examples
Step 4: Set Environment Variables (Optional)
For prompt extension with Dashscope API:
# Create a .env file
cat > .env << EOF
DASH_API_KEY=your_dashscope_api_key_here
DASH_API_URL=https://dashscope.aliyuncs.com/api/v1
EOF
For international Alibaba Cloud users:
DASH_API_URL=https://dashscope-intl.aliyuncs.com/api/v1
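If you use Docker Compose, a hedged way to load these variables automatically (using the wan2-1 service name that appears throughout this guide) is an env_file entry:
# docker-compose.yml (excerpt)
services:
  wan2-1:
    env_file:
      - .env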
Step 5: Build the Docker Image
# Build using Docker Compose (recommended)
docker compose build
# OR build manually
docker build -t wan2.1:latest .
Build time: Approximately 10-20 minutes depending on your internet connection.
Quick Start
Option 1: Using Docker Compose (Recommended)
# Start the container with GPU support
docker compose up -d wan2-1
# Check container status
docker compose ps
# View logs
docker compose logs -f wan2-1
# Access the container shell
docker compose exec wan2-1 bash
Option 2: Using Docker Run
docker run -it --gpus all \
  --name wan2.1-container \
  -v $(pwd)/models:/app/models \
  -v $(pwd)/outputs:/app/outputs \
  -v $(pwd)/cache:/app/cache \
  -p 7860:7860 \
  --shm-size=16g \
  wan2.1:latest bash
For CPU-only Mode
# Using Docker Compose
docker compose --profile cpu up -d wan2-1-cpu
# Using Docker Run
docker run -it \
  --name wan2.1-cpu \
  -e CUDA_VISIBLE_DEVICES="" \
  -v $(pwd)/models:/app/models \
  -v $(pwd)/outputs:/app/outputs \
  -v $(pwd)/cache:/app/cache \
  -p 7860:7860 \
  wan2.1:latest bash
Model Download
Download models before running inference. Models should be placed in the ./models directory.
Using Hugging Face CLI (Inside Container)
# Enter the container
docker compose exec wan2-1 bash
# Download T2V-14B model
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir /app/models/Wan2.1-T2V-14B
# Download T2V-1.3B model
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir /app/models/Wan2.1-T2V-1.3B
# Download I2V-14B-720P model
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P --local-dir /app/models/Wan2.1-I2V-14B-720P
# Download I2V-14B-480P model
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P --local-dir /app/models/Wan2.1-I2V-14B-480P
# Download FLF2V-14B model
huggingface-cli download Wan-AI/Wan2.1-FLF2V-14B-720P --local-dir /app/models/Wan2.1-FLF2V-14B-720P
# Download VACE models
huggingface-cli download Wan-AI/Wan2.1-VACE-1.3B --local-dir /app/models/Wan2.1-VACE-1.3B
huggingface-cli download Wan-AI/Wan2.1-VACE-14B --local-dir /app/models/Wan2.1-VACE-14B
Using ModelScope (Alternative for Chinese Users)
pip install modelscope
modelscope download Wan-AI/Wan2.1-T2V-14B --local_dir /app/models/Wan2.1-T2V-14B
Download from Host Machine
You can also download models on your host machine and they will be accessible in the container:
# On host machine (outside Docker)
cd Wan2.1/models
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./Wan2.1-T2V-1.3B
Running Inference
All commands below should be run inside the container.
Text-to-Video Generation
1.3B Model (480P) - Consumer GPU Friendly
python generate.py \
  --task t2v-1.3B \
  --size 832*480 \
  --ckpt_dir /app/models/Wan2.1-T2V-1.3B \
  --offload_model True \
  --t5_cpu \
  --sample_shift 8 \
  --sample_guide_scale 6 \
  --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
14B Model (720P) - High-End GPU
python generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
With Prompt Extension (Better Quality)
# Using local Qwen model
python generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --use_prompt_extend \
  --prompt_extend_method 'local_qwen' \
  --prompt "A beautiful sunset over the ocean"
# Using Dashscope API (requires DASH_API_KEY)
DASH_API_KEY=your_key python generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --use_prompt_extend \
  --prompt_extend_method 'dashscope' \
  --prompt "A beautiful sunset over the ocean"
Image-to-Video Generation
python generate.py \
  --task i2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-I2V-14B-720P \
  --image /app/examples/i2v_input.JPG \
  --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard."
First-Last-Frame-to-Video
python generate.py \
  --task flf2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-FLF2V-14B-720P \
  --first_frame /app/examples/flf2v_input_first_frame.png \
  --last_frame /app/examples/flf2v_input_last_frame.png \
  --prompt "CG animation style, a small blue bird takes off from the ground"
Text-to-Image Generation
python generate.py \
  --task t2i-14B \
  --size 1024*1024 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --prompt "A serene mountain landscape at dawn"
VACE (Video Creation and Editing)
python generate.py \
  --task vace-1.3B \
  --size 832*480 \
  --ckpt_dir /app/models/Wan2.1-VACE-1.3B \
  --src_ref_images /app/examples/girl.png,/app/examples/snake.png \
  --prompt "Your detailed prompt here"
Gradio Web Interface
Start Gradio Interface
Text-to-Video (14B)
cd gradio
python t2v_14B_singleGPU.py \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --prompt_extend_method 'local_qwen'
Image-to-Video (14B)
cd gradio
python i2v_14B_singleGPU.py \
  --ckpt_dir_720p /app/models/Wan2.1-I2V-14B-720P \
  --prompt_extend_method 'local_qwen'
VACE (All-in-One)
cd gradio
python vace.py --ckpt_dir /app/models/Wan2.1-VACE-1.3B
Access the Web Interface
- Open your web browser
- Navigate to http://localhost:7860
- Use the interface to generate videos
For Remote Access
If running on a remote server:
# Start with public URL (Gradio share feature)
python gradio/t2v_14B_singleGPU.py \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --server_name 0.0.0.0 \
  --server_port 7860 \
  --share
Then access via: http://your-server-ip:7860
Advanced Configuration
Multi-GPU Inference (FSDP + xDiT)
For 8-GPU setup using Ulysses or Ring attention strategies:
# Install xDiT
pip install "xfuser>=0.4.1"
# Run with Ulysses strategy (8 GPUs)
torchrun --nproc_per_node=8 generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --dit_fsdp \
  --t5_fsdp \
  --ulysses_size 8 \
  --prompt "Your prompt here"
# Run with Ring strategy (for sequence parallelism)
torchrun --nproc_per_node=8 generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --dit_fsdp \
  --t5_fsdp \
  --ring_size 8 \
  --prompt "Your prompt here"
Memory Optimization Flags
For limited VRAM:
# Offload model weights to CPU when idle (--offload_model) and keep the
# T5 text encoder on CPU (--t5_cpu) to reduce peak VRAM usage
python generate.py \
  --task t2v-1.3B \
  --size 832*480 \
  --ckpt_dir /app/models/Wan2.1-T2V-1.3B \
  --offload_model True \
  --t5_cpu \
  --sample_shift 8 \
  --sample_guide_scale 6 \
  --prompt "Your prompt"
Custom Output Directory
python generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --output_dir /app/outputs/my_generation \
  --prompt "Your prompt"
Batch Generation
Generate multiple variations:
# --num_samples 4 generates four variations from the same prompt
python generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --base_seed 0 \
  --num_samples 4 \
  --prompt "Your prompt"
Troubleshooting
Issue: "CUDA out of memory"
Solutions:
- Use a smaller model (1.3B instead of 14B)
- Reduce the resolution (480P instead of 720P)
- Enable memory optimization flags: --offload_model True --t5_cpu
- Increase Docker shared memory: docker run --shm-size=32g ...
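If out-of-memory errors persist after these flags, a hedged extra option (assuming a PyTorch 2.x build inside the container) is to relax the CUDA caching allocator, which can reduce fragmentation-related failures:
# Inside the container, before running generate.py
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True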
Issue: "nvidia-smi not found" inside container
Solutions:
- Verify the NVIDIA Docker runtime is installed on the host
- Check the Docker daemon configuration in /etc/docker/daemon.json:

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}

- Restart the Docker daemon: sudo systemctl restart docker
Issue: "Flash attention installation failed"
Solution: Flash attention is optional. The Dockerfile continues even if it fails. For better performance, install manually:
# Inside container
pip install flash-attn --no-build-isolation
Issue: Model download fails
Solutions:
- Check internet connection
 - Use mirror sites (ModelScope for Chinese users)
 - Download models on host machine and mount them
 - Increase Docker download timeout
 
Issue: "RuntimeError: CUDA error: device-side assert triggered"
Solutions:
- Check CUDA availability: python -c "import torch; print(torch.cuda.is_available())"
- Update NVIDIA drivers
- Rebuild the Docker image with a matching CUDA version
Issue: Gradio interface not accessible
Solutions:
- Check that the port is exposed: docker ps | grep 7860
- Ensure the firewall allows port 7860
- Try binding to all interfaces: python gradio/app.py --server_name 0.0.0.0
Issue: Permission denied errors
Solution:
# Fix ownership of mounted volumes
sudo chown -R $(id -u):$(id -g) models outputs cache
Performance Optimization
1. Use SSD Storage
- Store models and cache on SSD for faster loading
- Use NVMe for best performance
2. Increase Shared Memory
# In docker-compose.yml
shm_size: '32gb'
3. Use Mixed Precision
- The model uses bfloat16 by default (optimal for modern GPUs)
 
4. Enable Xformers (if available)
pip install xformers
5. Multi-GPU Best Practices
- Use NVLink/NVSwitch for GPU communication
- Balance model sharding by combining the Ulysses and Ring strategies (see the sketch below)
- Monitor GPU utilization: watch -n 1 nvidia-smi
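A hedged example of combining both strategies on 8 GPUs; this assumes xDiT's usual constraint that ulysses_size x ring_size equals the number of processes:
torchrun --nproc_per_node=8 generate.py \
  --task t2v-14B \
  --size 1280*720 \
  --ckpt_dir /app/models/Wan2.1-T2V-14B \
  --dit_fsdp \
  --t5_fsdp \
  --ulysses_size 4 \
  --ring_size 2 \
  --prompt "Your prompt here"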
6. Optimize Inference Parameters
# For T2V-1.3B: adjust sample_shift in the 8-12 range based on quality;
# a lower guide scale is faster, a higher one favors prompt adherence
--sample_shift 8 \
--sample_guide_scale 6
# For T2V-14B, the default guide scale of 5.0 is recommended
--sample_guide_scale 5.0
7. Use Persistent Cache
# Models and transformers will be cached in ./cache
# Reusing the cache speeds up subsequent runs
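A hedged way to point Hugging Face downloads at the mounted cache (HF_HOME is a standard huggingface_hub variable; the /app/cache path matches the volume mounts used earlier, while the environment entry is a hypothetical addition):
# docker-compose.yml (excerpt)
services:
  wan2-1:
    environment:
      - HF_HOME=/app/cache/huggingface
    volumes:
      - ./cache:/app/cache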
Container Management
Stop Container
docker compose down
Restart Container
docker compose restart wan2-1
View Logs
docker compose logs -f wan2-1
Clean Up
# Remove containers and their named volumes
docker compose down -v
# Remove images
docker rmi wan2.1:latest
# Clean up Docker system
docker system prune -a
Update Container
# Pull latest code
git pull origin main
# Rebuild image
docker compose build --no-cache
# Restart containers
docker compose up -d
Security Best Practices
- Do not commit API keys to version control
- Use .env files for sensitive environment variables
- Limit container privileges: avoid running as root (see the sketch after this list)
- Keep Docker updated for security patches
- Scan images for vulnerabilities: docker scout cves wan2.1:latest (the older docker scan command has been deprecated)
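A hedged sketch for dropping root inside the container; the 1000:1000 UID/GID is an assumption, so match it to your host user and make sure the mounted models, outputs, and cache directories are writable by that user:
# docker-compose.yml (excerpt) - hypothetical addition
services:
  wan2-1:
    user: "1000:1000"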
Support and Resources
- GitHub Issues: https://github.com/Wan-Video/Wan2.1/issues
- Discord: join the community
- Technical Report: arXiv:2503.20314
- Docker Documentation: https://docs.docker.com/
- NVIDIA Container Toolkit: https://github.com/NVIDIA/nvidia-docker
License
This Docker setup follows the same Apache 2.0 License as the Wan2.1 project. See LICENSE.txt for details.
Last Updated: 2025-10-26
Version: 1.0.0
Maintainer: Wan2.1 Community