# Wan2.1 Docker Setup Guide
Step-by-step instructions for running Wan2.1 video generation models in Docker containers with GPU support.

---
## Table of Contents
- [Prerequisites](#prerequisites)
- [System Requirements](#system-requirements)
- [Installation Steps](#installation-steps)
- [Quick Start](#quick-start)
- [Model Download](#model-download)
- [Running Inference](#running-inference)
- [Gradio Web Interface](#gradio-web-interface)
- [Advanced Configuration](#advanced-configuration)
- [Troubleshooting](#troubleshooting)
- [Performance Optimization](#performance-optimization)
---
## Prerequisites
### Required Software
1. **Docker Engine** (version 20.10+)
- [Installation Guide](https://docs.docker.com/engine/install/)
2. **NVIDIA Container Toolkit** (for GPU support)
- Required for GPU acceleration; see the install sketch after this list
- [Installation Guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)
3. **NVIDIA Drivers** (version 525.60.13+)
- CUDA 12.1 compatible drivers
- Check with: `nvidia-smi`
4. **Docker Compose** (version 2.0+)
- Typically included with Docker Desktop
- [Installation Guide](https://docs.docker.com/compose/install/)
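On Ubuntu hosts, installing the Container Toolkit typically boils down to the following sketch (it assumes NVIDIA's apt repository is already configured per the install guide linked above):
```bash
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
# Register the NVIDIA runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```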
### Optional Software
- **Git** - For cloning the repository
- **Make** - For using convenience commands
---
## System Requirements
### Minimum Requirements (T2V-1.3B at 480P)
- **GPU**: NVIDIA GPU with 8GB+ VRAM (e.g., RTX 4060 Ti)
- **RAM**: 16GB system memory
- **Storage**: 50GB free space (for models and cache)
- **OS**: Linux (Ubuntu 20.04+), Windows 10/11 with WSL2
### Recommended Requirements (T2V-14B at 720P)
- **GPU**: NVIDIA GPU with 24GB+ VRAM (e.g., RTX 4090, A5000)
- **RAM**: 32GB+ system memory
- **Storage**: 100GB+ free space
- **OS**: Linux (Ubuntu 22.04+)
### Multi-GPU Setup (for 8x GPU)
- **GPUs**: 8x NVIDIA GPUs (A100, H100, etc.)
- **RAM**: 128GB+ system memory
- **Storage**: 200GB+ free space
- **Network**: High-bandwidth GPU interconnect (NVLink preferred)
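Before committing to a multi-GPU run, check how the GPUs are linked; `nvidia-smi topo -m` prints the interconnect matrix:
```bash
# NV1/NV2/... entries indicate NVLink; PHB/PIX indicate PCIe paths
nvidia-smi topo -m
```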
---
## Installation Steps
### Step 1: Verify Docker and NVIDIA Runtime
```bash
# Check Docker installation
docker --version
docker compose version
# Check NVIDIA driver
nvidia-smi
# Test NVIDIA Docker runtime
docker run --rm --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi
```
**Expected output**: You should see your GPU(s) listed in the nvidia-smi output.
### Step 2: Clone the Repository
```bash
git clone https://github.com/Wan-Video/Wan2.1.git
cd Wan2.1
```
### Step 3: Create Required Directories
```bash
# Create directories for models, outputs, and cache
mkdir -p models outputs cache examples
```
### Step 4: Set Environment Variables (Optional)
For prompt extension with the Dashscope API:
```bash
# Create a .env file
cat > .env << EOF
DASH_API_KEY=your_dashscope_api_key_here
DASH_API_URL=https://dashscope.aliyuncs.com/api/v1
EOF
```
For users of Alibaba Cloud's international endpoint, set this in `.env` instead:
```bash
DASH_API_URL=https://dashscope-intl.aliyuncs.com/api/v1
```
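Docker Compose substitutes `.env` values into `docker-compose.yml` automatically; whether they reach the container depends on the compose file using `env_file:` or `environment:` (an assumption about this setup). With plain `docker run`, pass the file explicitly. A minimal sketch:
```bash
# Inject variables from .env into the container environment
docker run --rm --gpus all --env-file .env wan2.1:latest \
    bash -c 'echo "DASH_API_URL=$DASH_API_URL"'   # quick sanity check
```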
### Step 5: Build the Docker Image
```bash
# Build using Docker Compose (recommended)
docker compose build
# OR build manually
docker build -t wan2.1:latest .
```
**Build time**: Approximately 10-20 minutes depending on your internet connection.
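Once the build finishes, you can confirm the image was created and check its size:
```bash
docker images wan2.1:latest
```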
---
## Quick Start
### Option 1: Using Docker Compose (Recommended)
```bash
# Start the container with GPU support
docker compose up -d wan2-1
# Check container status
docker compose ps
# View logs
docker compose logs -f wan2-1
# Access the container shell
docker compose exec wan2-1 bash
```
### Option 2: Using Docker Run
```bash
docker run -it --gpus all \
--name wan2.1-container \
-v $(pwd)/models:/app/models \
-v $(pwd)/outputs:/app/outputs \
-v $(pwd)/cache:/app/cache \
-p 7860:7860 \
--shm-size=16g \
wan2.1:latest bash
```
### For CPU-only Mode
```bash
# Using Docker Compose
docker compose --profile cpu up -d wan2-1-cpu
# Using Docker Run
docker run -it \
--name wan2.1-cpu \
-e CUDA_VISIBLE_DEVICES="" \
-v $(pwd)/models:/app/models \
-v $(pwd)/outputs:/app/outputs \
-v $(pwd)/cache:/app/cache \
-p 7860:7860 \
wan2.1:latest bash
```
---
## Model Download
Download models **before** running inference. Models should be placed in the `./models` directory.
### Using Hugging Face CLI (Inside Container)
```bash
# Enter the container
docker compose exec wan2-1 bash
# Download T2V-14B model
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir /app/models/Wan2.1-T2V-14B
# Download T2V-1.3B model
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir /app/models/Wan2.1-T2V-1.3B
# Download I2V-14B-720P model
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P --local-dir /app/models/Wan2.1-I2V-14B-720P
# Download I2V-14B-480P model
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P --local-dir /app/models/Wan2.1-I2V-14B-480P
# Download FLF2V-14B model
huggingface-cli download Wan-AI/Wan2.1-FLF2V-14B-720P --local-dir /app/models/Wan2.1-FLF2V-14B-720P
# Download VACE models
huggingface-cli download Wan-AI/Wan2.1-VACE-1.3B --local-dir /app/models/Wan2.1-VACE-1.3B
huggingface-cli download Wan-AI/Wan2.1-VACE-14B --local-dir /app/models/Wan2.1-VACE-14B
```
### Using ModelScope (Alternative for Chinese Users)
```bash
pip install modelscope
modelscope download Wan-AI/Wan2.1-T2V-14B --local_dir /app/models/Wan2.1-T2V-14B
```
### Download from Host Machine
You can also download models on the host machine; the `./models` volume mount makes them visible inside the container:
```bash
# On host machine (outside Docker)
cd Wan2.1/models
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./Wan2.1-T2V-1.3B
```
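The 14B checkpoints run to tens of gigabytes, so verify each directory is complete before running inference:
```bash
# Rough sanity check: 14B models should be tens of GB, 1.3B a few GB
du -sh models/*
```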
---
## Running Inference
All commands below should be run **inside the container**.
### Text-to-Video Generation
#### 1.3B Model (480P) - Consumer GPU Friendly
```bash
python generate.py \
--task t2v-1.3B \
--size 832*480 \
--ckpt_dir /app/models/Wan2.1-T2V-1.3B \
--offload_model True \
--t5_cpu \
--sample_shift 8 \
--sample_guide_scale 6 \
--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
```
#### 14B Model (720P) - High-End GPU
```bash
python generate.py \
--task t2v-14B \
--size 1280*720 \
--ckpt_dir /app/models/Wan2.1-T2V-14B \
--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
```
#### With Prompt Extension (Better Quality)
```bash
# Using local Qwen model
python generate.py \
--task t2v-14B \
--size 1280*720 \
--ckpt_dir /app/models/Wan2.1-T2V-14B \
--use_prompt_extend \
--prompt_extend_method 'local_qwen' \
--prompt "A beautiful sunset over the ocean"
# Using Dashscope API (requires DASH_API_KEY)
DASH_API_KEY=your_key python generate.py \
--task t2v-14B \
--size 1280*720 \
--ckpt_dir /app/models/Wan2.1-T2V-14B \
--use_prompt_extend \
--prompt_extend_method 'dashscope' \
--prompt "A beautiful sunset over the ocean"
```
### Image-to-Video Generation
```bash
python generate.py \
--task i2v-14B \
--size 1280*720 \
--ckpt_dir /app/models/Wan2.1-I2V-14B-720P \
--image /app/examples/i2v_input.JPG \
--prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard."
```
### First-Last-Frame-to-Video
```bash
python generate.py \
--task flf2v-14B \
--size 1280*720 \
--ckpt_dir /app/models/Wan2.1-FLF2V-14B-720P \
--first_frame /app/examples/flf2v_input_first_frame.png \
--last_frame /app/examples/flf2v_input_last_frame.png \
--prompt "CG animation style, a small blue bird takes off from the ground"
```
### Text-to-Image Generation
```bash
python generate.py \
--task t2i-14B \
--size 1024*1024 \
--ckpt_dir /app/models/Wan2.1-T2V-14B \
--prompt "A serene mountain landscape at dawn"
```
### VACE (Video Creation and Editing)
```bash
python generate.py \
--task vace-1.3B \
--size 832*480 \
--ckpt_dir /app/models/Wan2.1-VACE-1.3B \
--src_ref_images /app/examples/girl.png,/app/examples/snake.png \
--prompt "Your detailed prompt here"
```
---
## Gradio Web Interface
### Start Gradio Interface
#### Text-to-Video (14B)
```bash
cd gradio
python t2v_14B_singleGPU.py \
--ckpt_dir /app/models/Wan2.1-T2V-14B \
--prompt_extend_method 'local_qwen'
```
#### Image-to-Video (14B)
```bash
cd gradio
python i2v_14B_singleGPU.py \
--ckpt_dir_720p /app/models/Wan2.1-I2V-14B-720P \
--prompt_extend_method 'local_qwen'
```
#### VACE (All-in-One)
```bash
cd gradio
python vace.py --ckpt_dir /app/models/Wan2.1-VACE-1.3B
```
### Access the Web Interface
1. Open your web browser
2. Navigate to: `http://localhost:7860`
3. Enter a prompt and generate videos from the interface
### For Remote Access
If running on a remote server:
```bash
# Start with public URL (Gradio share feature)
python gradio/t2v_14B_singleGPU.py \
--ckpt_dir /app/models/Wan2.1-T2V-14B \
--server_name 0.0.0.0 \
--server_port 7860 \
--share
```
Then access via: `http://your-server-ip:7860`
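If the firewall blocks port 7860, or you prefer not to expose the interface publicly, an SSH tunnel is a safer alternative to `--share`:
```bash
# Forward local port 7860 to the server, then browse http://localhost:7860
ssh -L 7860:localhost:7860 user@your-server-ip
```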
---
## Advanced Configuration
### Multi-GPU Inference (FSDP + xDiT)
For an 8-GPU setup using the Ulysses or Ring attention strategy:
```bash
# Install xDiT
pip install "xfuser>=0.4.1"
# Run with Ulysses strategy (8 GPUs)
torchrun --nproc_per_node=8 generate.py \
--task t2v-14B \
--size 1280*720 \
--ckpt_dir /app/models/Wan2.1-T2V-14B \
--dit_fsdp \
--t5_fsdp \
--ulysses_size 8 \
--prompt "Your prompt here"
# Run with Ring strategy (for sequence parallelism)
torchrun --nproc_per_node=8 generate.py \
--task t2v-14B \
--size 1280*720 \
--ckpt_dir /app/models/Wan2.1-T2V-14B \
--dit_fsdp \
--t5_fsdp \
--ring_size 8 \
--prompt "Your prompt here"
```
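The two strategies can also be combined; in xDiT the product of `--ulysses_size` and `--ring_size` must match the process count. A sketch of a hybrid 8-GPU split (4-way Ulysses × 2-way Ring), assuming your GPU topology benefits from it:
```bash
torchrun --nproc_per_node=8 generate.py \
    --task t2v-14B \
    --size 1280*720 \
    --ckpt_dir /app/models/Wan2.1-T2V-14B \
    --dit_fsdp \
    --t5_fsdp \
    --ulysses_size 4 \
    --ring_size 2 \
    --prompt "Your prompt here"
```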
### Memory Optimization Flags
For limited VRAM:
```bash
# --offload_model True : offload model weights to CPU when not in use
# --t5_cpu             : keep the T5 text encoder on the CPU
python generate.py \
    --task t2v-1.3B \
    --size 832*480 \
    --ckpt_dir /app/models/Wan2.1-T2V-1.3B \
    --offload_model True \
    --t5_cpu \
    --sample_shift 8 \
    --sample_guide_scale 6 \
    --prompt "Your prompt"
```
### Custom Output Directory
```bash
python generate.py \
--task t2v-14B \
--size 1280*720 \
--ckpt_dir /app/models/Wan2.1-T2V-14B \
--output_dir /app/outputs/my_generation \
--prompt "Your prompt"
```
### Batch Generation
Generate multiple variations:
```bash
# --num_samples 4 : generate 4 variations from the base seed
python generate.py \
    --task t2v-14B \
    --size 1280*720 \
    --ckpt_dir /app/models/Wan2.1-T2V-14B \
    --base_seed 0 \
    --num_samples 4 \
    --prompt "Your prompt"
```
---
## Troubleshooting
### Issue: "CUDA out of memory"
**Solutions:**
1. Use smaller model (1.3B instead of 14B)
2. Reduce resolution (480P instead of 720P)
3. Enable memory optimization flags:
```bash
--offload_model True --t5_cpu
```
4. Increase Docker shared memory:
```bash
docker run --shm-size=32g ...
```
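5. If the error stems from allocator fragmentation rather than true exhaustion, PyTorch 2.x exposes a tuning knob (a general PyTorch setting, not specific to Wan2.1):
```bash
# Let the CUDA caching allocator grow segments instead of fragmenting
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
```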
### Issue: "nvidia-smi not found" inside container
**Solutions:**
1. Verify NVIDIA Docker runtime is installed on host
2. Check the Docker daemon configuration in `/etc/docker/daemon.json`:
```json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
```
3. Restart Docker daemon:
```bash
sudo systemctl restart docker
```
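4. Verify that Docker now reports the NVIDIA runtime:
```bash
# "nvidia" should appear among the listed runtimes
docker info | grep -i runtimes
```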
### Issue: "Flash attention installation failed"
**Solution:**
Flash attention is optional. The Dockerfile continues even if it fails. For better performance, install manually:
```bash
# Inside container
pip install flash-attn --no-build-isolation
```
### Issue: Model download fails
**Solutions:**
1. Check internet connection
2. Use mirror sites (ModelScope for Chinese users)
3. Download models on host machine and mount them
4. Increase Docker download timeout
### Issue: "RuntimeError: CUDA error: device-side assert triggered"
**Solutions:**
1. Check CUDA compatibility:
```bash
python -c "import torch; print(torch.cuda.is_available())"
```
2. Update NVIDIA drivers
3. Rebuild Docker image with matching CUDA version
### Issue: Gradio interface not accessible
**Solutions:**
1. Check if port is exposed:
```bash
docker ps | grep 7860
```
2. Ensure firewall allows port 7860
3. Try binding to all interfaces:
```bash
python gradio/t2v_14B_singleGPU.py --ckpt_dir /app/models/Wan2.1-T2V-14B --server_name 0.0.0.0
```
### Issue: Permission denied errors
**Solution:**
```bash
# Fix ownership of mounted volumes
sudo chown -R $(id -u):$(id -g) models outputs cache
```
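Alternatively, run the container with your host UID/GID so new files in the mounted volumes are owned by you (a sketch; if the image expects root, test your workload afterwards):
```bash
docker run -it --gpus all \
    --user "$(id -u):$(id -g)" \
    -v $(pwd)/models:/app/models \
    -v $(pwd)/outputs:/app/outputs \
    wan2.1:latest bash
```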
---
## Performance Optimization
### 1. Use SSD Storage
- Store models and cache on SSD for faster loading
- Use NVMe for best performance
### 2. Increase Shared Memory
```yaml
# In docker-compose.yml, under the wan2-1 service:
services:
  wan2-1:
    shm_size: '32gb'
```
### 3. Use Mixed Precision
- The model uses bfloat16 by default (optimal for modern GPUs)
### 4. Enable Xformers (if available)
```bash
pip install xformers
```
### 5. Multi-GPU Best Practices
- Use NVLink/NVSwitch for GPU communication
- Balance model sharding with Ulysses + Ring strategies
- Monitor GPU utilization: `watch -n 1 nvidia-smi`
### 6. Optimize Inference Parameters
```bash
# For T2V-1.3B
--sample_shift 8          # adjust in the 8-12 range based on quality
--sample_guide_scale 6    # lower = faster, higher = better prompt adherence
# For T2V-14B
--sample_guide_scale 5.0  # recommended default
```
### 7. Use Persistent Cache
```bash
# Models and Hugging Face downloads are cached in ./cache (mounted at
# /app/cache); reusing the cache speeds up subsequent runs. If hub downloads
# are not landing there, point HF_HOME at the mount (a standard Hugging Face
# environment variable):
export HF_HOME=/app/cache/huggingface
```
---
## Container Management
### Stop Container
```bash
docker compose down
```
### Restart Container
```bash
docker compose restart wan2-1
```
### View Logs
```bash
docker compose logs -f wan2-1
```
### Clean Up
```bash
# Remove containers
docker compose down -v
# Remove images
docker rmi wan2.1:latest
# Clean up Docker system
docker system prune -a
```
### Update Container
```bash
# Pull latest code
git pull origin main
# Rebuild image
docker compose build --no-cache
# Restart containers
docker compose up -d
```
---
## Security Best Practices
1. **Do not commit API keys** to version control
2. **Use .env files** for sensitive environment variables
3. **Limit container privileges**: Avoid running as root
4. **Keep Docker updated** for security patches
5. **Scan images** for vulnerabilities (Docker Scout has replaced the older `docker scan` command):
```bash
docker scout cves wan2.1:latest
```
---
## Support and Resources
- **GitHub Issues**: [https://github.com/Wan-Video/Wan2.1/issues](https://github.com/Wan-Video/Wan2.1/issues)
- **Discord**: [Join the community](https://discord.gg/AKNgpMK4Yj)
- **Technical Report**: [arXiv:2503.20314](https://arxiv.org/abs/2503.20314)
- **Docker Documentation**: [https://docs.docker.com/](https://docs.docker.com/)
- **NVIDIA Container Toolkit**: [https://github.com/NVIDIA/nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit)
---
## License
This Docker setup follows the same Apache 2.0 License as the Wan2.1 project. See [LICENSE.txt](LICENSE.txt) for details.

---
**Last Updated**: 2025-10-26
**Version**: 1.0.0
**Maintainer**: Wan2.1 Community