# Wan2.1 Docker Setup Guide
Step-by-step instructions for running Wan2.1 video generation models in Docker containers with GPU support.

---
## Table of Contents
- [Prerequisites](#prerequisites)
- [System Requirements](#system-requirements)
- [Installation Steps](#installation-steps)
- [Quick Start](#quick-start)
- [Model Download](#model-download)
- [Running Inference](#running-inference)
- [Gradio Web Interface](#gradio-web-interface)
- [Advanced Configuration](#advanced-configuration)
- [Troubleshooting](#troubleshooting)
- [Performance Optimization](#performance-optimization)
---
## Prerequisites
### Required Software
1. **Docker Engine** (version 20.10+)
- [Installation Guide](https://docs.docker.com/engine/install/)
2. **NVIDIA Container Toolkit** (for GPU support)
- Required for GPU acceleration; see the install sketch after this list
- [Installation Guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)
3. **NVIDIA Drivers** (version 525.60.13+)
- CUDA 12.1 compatible drivers
- Check with: `nvidia-smi`
4. **Docker Compose** (version 2.0+)
- Typically included with Docker Desktop
- [Installation Guide](https://docs.docker.com/compose/install/)
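On Ubuntu hosts, installing the Container Toolkit typically boils down to the following sketch (it assumes NVIDIA's apt repository is already configured per the install guide linked above):
```bash
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
# Register the NVIDIA runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```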
### Optional Software
- **Git** - For cloning the repository
- **Make** - For using convenience commands
---
## System Requirements
### Minimum Requirements (T2V-1.3B at 480P)
- **GPU**: NVIDIA GPU with 8GB+ VRAM (e.g., RTX 4060 Ti)
- **RAM**: 16GB system memory
- **Storage**: 50GB free space (for models and cache)
- **OS**: Linux (Ubuntu 20.04+), Windows 10/11 with WSL2
### Recommended Requirements (T2V-14B at 720P)
- **GPU**: NVIDIA GPU with 24GB+ VRAM (e.g., RTX 4090, A5000)
- **RAM**: 32GB+ system memory
- **Storage**: 100GB+ free space
- **OS**: Linux (Ubuntu 22.04+)
### Multi-GPU Setup (for 8x GPU)
- **GPUs**: 8x NVIDIA GPUs (A100, H100, etc.)
- **RAM**: 128GB+ system memory
- **Storage**: 200GB+ free space
- **Network**: High-bandwidth GPU interconnect (NVLink preferred)
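Before committing to a multi-GPU run, check how the GPUs are linked; `nvidia-smi topo -m` prints the interconnect matrix:
```bash
# NV1/NV2/... entries indicate NVLink; PHB/PIX indicate PCIe paths
nvidia-smi topo -m
```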
---
## Installation Steps
### Step 1: Verify Docker and NVIDIA Runtime
```bash
# Check Docker installation
docker --version
docker compose version
# Check NVIDIA driver
nvidia-smi
# Test NVIDIA Docker runtime
docker run --rm --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi
```
**Expected output**: You should see your GPU(s) listed in the nvidia-smi output.
### Step 2: Clone the Repository
```bash
git clone https://github.com/Wan-Video/Wan2.1.git
cd Wan2.1
```
### Step 3: Create Required Directories
```bash
# Create directories for models, outputs, and cache
mkdir -p models outputs cache examples
```
### Step 4: Set Environment Variables (Optional)
For prompt extension with the Dashscope API:
```bash
# Create a .env file
cat > .env << EOF
DASH_API_KEY=your_dashscope_api_key_here
DASH_API_URL=https://dashscope.aliyuncs.com/api/v1
EOF
```
For users of Alibaba Cloud's international endpoint, set this in `.env` instead:
```bash
DASH_API_URL=https://dashscope-intl.aliyuncs.com/api/v1
```
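Docker Compose substitutes `.env` values into `docker-compose.yml` automatically; whether they reach the container depends on the compose file using `env_file:` or `environment:` (an assumption about this setup). With plain `docker run`, pass the file explicitly. A minimal sketch:
```bash
# Inject variables from .env into the container environment
docker run --rm --gpus all --env-file .env wan2.1:latest \
    bash -c 'echo "DASH_API_URL=$DASH_API_URL"'   # quick sanity check
```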
### Step 5: Build the Docker Image
```bash
# Build using Docker Compose (recommended)
docker compose build
# OR build manually
docker build -t wan2.1:latest .
```
**Build time**: Approximately 10-20 minutes depending on your internet connection.
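Once the build finishes, you can confirm the image was created and check its size:
```bash
docker images wan2.1:latest
```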
---
## Quick Start
### Option 1: Using Docker Compose (Recommended)
```bash
# Start the container with GPU support
docker compose up -d wan2-1
# Check container status
docker compose ps
# View logs
docker compose logs -f wan2-1
# Access the container shell
docker compose exec wan2-1 bash
```
### Option 2: Using Docker Run
```bash
docker run -it --gpus all \
--name wan2.1-container \
-v $(pwd)/models:/app/models \
-v $(pwd)/outputs:/app/outputs \
-v $(pwd)/cache:/app/cache \
-p 7860:7860 \
--shm-size=16g \
wan2.1:latest bash
```
### For CPU-only Mode
```bash
# Using Docker Compose
docker compose --profile cpu up -d wan2-1-cpu
# Using Docker Run
docker run -it \
--name wan2.1-cpu \
-e CUDA_VISIBLE_DEVICES="" \
-v $(pwd)/models:/app/models \
-v $(pwd)/outputs:/app/outputs \
-v $(pwd)/cache:/app/cache \
-p 7860:7860 \
wan2.1:latest bash
```
---
## Model Download
Download models **before** running inference. Models should be placed in the `./models` directory.
### Using Hugging Face CLI (Inside Container)
```bash
# Enter the container
docker compose exec wan2-1 bash
# Download T2V-14B model
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir /app/models/Wan2.1-T2V-14B
# Download T2V-1.3B model
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir /app/models/Wan2.1-T2V-1.3B
# Download I2V-14B-720P model
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P --local-dir /app/models/Wan2.1-I2V-14B-720P
# Download I2V-14B-480P model
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P --local-dir /app/models/Wan2.1-I2V-14B-480P
# Download FLF2V-14B model
huggingface-cli download Wan-AI/Wan2.1-FLF2V-14B-720P --local-dir /app/models/Wan2.1-FLF2V-14B-720P
# Download VACE models
huggingface-cli download Wan-AI/Wan2.1-VACE-1.3B --local-dir /app/models/Wan2.1-VACE-1.3B
huggingface-cli download Wan-AI/Wan2.1-VACE-14B --local-dir /app/models/Wan2.1-VACE-14B
```
### Using ModelScope (Alternative for Chinese Users)
```bash
pip install modelscope
modelscope download Wan-AI/Wan2.1-T2V-14B --local_dir /app/models/Wan2.1-T2V-14B
```
### Download from Host Machine
You can also download models on the host machine; the `./models` volume mount makes them visible inside the container:
```bash
# On host machine (outside Docker)
cd Wan2.1/models
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./Wan2.1-T2V-1.3B
```
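The 14B checkpoints run to tens of gigabytes, so verify each directory is complete before running inference:
```bash
# Rough sanity check: 14B models should be tens of GB, 1.3B a few GB
du -sh models/*
```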
---
## Running Inference
All commands below should be run **inside the container**.
### Text-to-Video Generation
#### 1.3B Model (480P) - Consumer GPU Friendly
```bash
python generate.py \
--task t2v-1.3B \
--size 832*480 \
--ckpt_dir /app/models/Wan2.1-T2V-1.3B \
--offload_model True \
--t5_cpu \
--sample_shift 8 \
--sample_guide_scale 6 \
--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
```
#### 14B Model (720P) - High-End GPU
```bash
python generate.py \
--task t2v-14B \
--size 1280*720 \
--ckpt_dir /app/models/Wan2.1-T2V-14B \
--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
```
#### With Prompt Extension (Better Quality)
```bash
# Using local Qwen model
python generate.py \
--task t2v-14B \
--size 1280*720 \
--ckpt_dir /app/models/Wan2.1-T2V-14B \
--use_prompt_extend \
--prompt_extend_method 'local_qwen' \
--prompt "A beautiful sunset over the ocean"
# Using Dashscope API (requires DASH_API_KEY)
DASH_API_KEY=your_key python generate.py \
--task t2v-14B \
--size 1280*720 \
--ckpt_dir /app/models/Wan2.1-T2V-14B \
--use_prompt_extend \
--prompt_extend_method 'dashscope' \
--prompt "A beautiful sunset over the ocean"
```
### Image-to-Video Generation
```bash
python generate.py \
--task i2v-14B \
--size 1280*720 \
--ckpt_dir /app/models/Wan2.1-I2V-14B-720P \
--image /app/examples/i2v_input.JPG \
--prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard."
```
### First-Last-Frame-to-Video
```bash
python generate.py \
--task flf2v-14B \
--size 1280*720 \
--ckpt_dir /app/models/Wan2.1-FLF2V-14B-720P \
--first_frame /app/examples/flf2v_input_first_frame.png \
--last_frame /app/examples/flf2v_input_last_frame.png \
--prompt "CG animation style, a small blue bird takes off from the ground"
```
### Text-to-Image Generation
```bash
python generate.py \
--task t2i-14B \
--size 1024*1024 \
--ckpt_dir /app/models/Wan2.1-T2V-14B \
--prompt "A serene mountain landscape at dawn"
```
### VACE (Video Creation and Editing)
```bash
python generate.py \
--task vace-1.3B \
--size 832*480 \
--ckpt_dir /app/models/Wan2.1-VACE-1.3B \
--src_ref_images /app/examples/girl.png,/app/examples/snake.png \
--prompt "Your detailed prompt here"
```
---
## Gradio Web Interface
### Start Gradio Interface
#### Text-to-Video (14B)
```bash
cd gradio
python t2v_14B_singleGPU.py \
--ckpt_dir /app/models/Wan2.1-T2V-14B \
--prompt_extend_method 'local_qwen'
```
#### Image-to-Video (14B)
```bash
cd gradio
python i2v_14B_singleGPU.py \
--ckpt_dir_720p /app/models/Wan2.1-I2V-14B-720P \
--prompt_extend_method 'local_qwen'
```
#### VACE (All-in-One)
```bash
cd gradio
python vace.py --ckpt_dir /app/models/Wan2.1-VACE-1.3B
```
### Access the Web Interface
1. Open your web browser
2. Navigate to: `http://localhost:7860`
3. Enter a prompt and generate videos from the interface
### For Remote Access
If running on a remote server:
```bash
# Start with public URL (Gradio share feature)
python gradio/t2v_14B_singleGPU.py \
--ckpt_dir /app/models/Wan2.1-T2V-14B \
--server_name 0.0.0.0 \
--server_port 7860 \
--share
```
Then access via: `http://your-server-ip:7860`
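If the firewall blocks port 7860, or you prefer not to expose the interface publicly, an SSH tunnel is a safer alternative to `--share`:
```bash
# Forward local port 7860 to the server, then browse http://localhost:7860
ssh -L 7860:localhost:7860 user@your-server-ip
```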
---
## Advanced Configuration
### Multi-GPU Inference (FSDP + xDiT)
For an 8-GPU setup using the Ulysses or Ring attention strategy:
```bash
# Install xDiT
pip install "xfuser>=0.4.1"
# Run with Ulysses strategy (8 GPUs)
torchrun --nproc_per_node=8 generate.py \
--task t2v-14B \
--size 1280*720 \
--ckpt_dir /app/models/Wan2.1-T2V-14B \
--dit_fsdp \
--t5_fsdp \
--ulysses_size 8 \
--prompt "Your prompt here"
# Run with Ring strategy (for sequence parallelism)
torchrun --nproc_per_node=8 generate.py \
--task t2v-14B \
--size 1280*720 \
--ckpt_dir /app/models/Wan2.1-T2V-14B \
--dit_fsdp \
--t5_fsdp \
--ring_size 8 \
--prompt "Your prompt here"
```
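The two strategies can also be combined; in xDiT the product of `--ulysses_size` and `--ring_size` must match the process count. A sketch of a hybrid 8-GPU split (4-way Ulysses × 2-way Ring), assuming your GPU topology benefits from it:
```bash
torchrun --nproc_per_node=8 generate.py \
    --task t2v-14B \
    --size 1280*720 \
    --ckpt_dir /app/models/Wan2.1-T2V-14B \
    --dit_fsdp \
    --t5_fsdp \
    --ulysses_size 4 \
    --ring_size 2 \
    --prompt "Your prompt here"
```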
### Memory Optimization Flags
For limited VRAM:
```bash
# --offload_model True : offload model weights to CPU when not in use
# --t5_cpu             : keep the T5 text encoder on the CPU
python generate.py \
    --task t2v-1.3B \
    --size 832*480 \
    --ckpt_dir /app/models/Wan2.1-T2V-1.3B \
    --offload_model True \
    --t5_cpu \
    --sample_shift 8 \
    --sample_guide_scale 6 \
    --prompt "Your prompt"
```
### Custom Output Directory
```bash
python generate.py \
--task t2v-14B \
--size 1280*720 \
--ckpt_dir /app/models/Wan2.1-T2V-14B \
--output_dir /app/outputs/my_generation \
--prompt "Your prompt"
```
### Batch Generation
Generate multiple variations:
```bash
# --num_samples 4 : generate 4 variations from the base seed
python generate.py \
    --task t2v-14B \
    --size 1280*720 \
    --ckpt_dir /app/models/Wan2.1-T2V-14B \
    --base_seed 0 \
    --num_samples 4 \
    --prompt "Your prompt"
```
---
## Troubleshooting
### Issue: "CUDA out of memory"
**Solutions:**
1. Use smaller model (1.3B instead of 14B)
2. Reduce resolution (480P instead of 720P)
3. Enable memory optimization flags:
```bash
--offload_model True --t5_cpu
```
4. Increase Docker shared memory:
```bash
docker run --shm-size=32g ...
```
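5. If the error stems from allocator fragmentation rather than true exhaustion, PyTorch 2.x exposes a tuning knob (a general PyTorch setting, not specific to Wan2.1):
```bash
# Let the CUDA caching allocator grow segments instead of fragmenting
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
```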
### Issue: "nvidia-smi not found" inside container
**Solutions:**
1. Verify NVIDIA Docker runtime is installed on host
2. Check the Docker daemon configuration in `/etc/docker/daemon.json`:
```json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
```
3. Restart Docker daemon:
```bash
sudo systemctl restart docker
```
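4. Verify that Docker now reports the NVIDIA runtime:
```bash
# "nvidia" should appear among the listed runtimes
docker info | grep -i runtimes
```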
### Issue: "Flash attention installation failed"
**Solution:**
Flash attention is optional. The Dockerfile continues even if it fails. For better performance, install manually:
```bash
# Inside container
pip install flash-attn --no-build-isolation
```
### Issue: Model download fails
**Solutions:**
1. Check internet connection
2. Use mirror sites (ModelScope for Chinese users)
3. Download models on host machine and mount them
4. Increase Docker download timeout
### Issue: "RuntimeError: CUDA error: device-side assert triggered"
**Solutions:**
1. Check CUDA compatibility:
```bash
python -c "import torch; print(torch.cuda.is_available())"
```
2. Update NVIDIA drivers
3. Rebuild Docker image with matching CUDA version
### Issue: Gradio interface not accessible
**Solutions:**
1. Check if port is exposed:
```bash
docker ps | grep 7860
```
2. Ensure firewall allows port 7860
3. Try binding to all interfaces:
```bash
python gradio/t2v_14B_singleGPU.py --ckpt_dir /app/models/Wan2.1-T2V-14B --server_name 0.0.0.0
```
### Issue: Permission denied errors
**Solution:**
```bash
# Fix ownership of mounted volumes
sudo chown -R $(id -u):$(id -g) models outputs cache
```
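Alternatively, run the container with your host UID/GID so new files in the mounted volumes are owned by you (a sketch; if the image expects root, test your workload afterwards):
```bash
docker run -it --gpus all \
    --user "$(id -u):$(id -g)" \
    -v $(pwd)/models:/app/models \
    -v $(pwd)/outputs:/app/outputs \
    wan2.1:latest bash
```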
---
## Performance Optimization
### 1. Use SSD Storage
- Store models and cache on SSD for faster loading
- Use NVMe for best performance
### 2. Increase Shared Memory
```yaml
# In docker-compose.yml, under the wan2-1 service:
services:
  wan2-1:
    shm_size: '32gb'
```
### 3. Use Mixed Precision
- The model uses bfloat16 by default (optimal for modern GPUs)
### 4. Enable Xformers (if available)
```bash
pip install xformers
```
### 5. Multi-GPU Best Practices
- Use NVLink/NVSwitch for GPU communication
- Balance model sharding with Ulysses + Ring strategies
- Monitor GPU utilization: `watch -n 1 nvidia-smi`
### 6. Optimize Inference Parameters
```bash
# For T2V-1.3B
--sample_shift 8          # adjust in the 8-12 range based on quality
--sample_guide_scale 6    # lower = faster, higher = better prompt adherence
# For T2V-14B
--sample_guide_scale 5.0  # recommended default
```
### 7. Use Persistent Cache
```bash
# Models and Hugging Face downloads are cached in ./cache (mounted at
# /app/cache); reusing the cache speeds up subsequent runs. If hub downloads
# are not landing there, point HF_HOME at the mount (a standard Hugging Face
# environment variable):
export HF_HOME=/app/cache/huggingface
```
---
## Container Management
### Stop Container
```bash
docker compose down
```
### Restart Container
```bash
docker compose restart wan2-1
```
### View Logs
```bash
docker compose logs -f wan2-1
```
### Clean Up
```bash
# Remove containers
docker compose down -v
# Remove images
docker rmi wan2.1:latest
# Clean up Docker system
docker system prune -a
```
### Update Container
```bash
# Pull latest code
git pull origin main
# Rebuild image
docker compose build --no-cache
# Restart containers
docker compose up -d
```
---
## Security Best Practices
1. **Do not commit API keys** to version control
2. **Use .env files** for sensitive environment variables
3. **Limit container privileges**: Avoid running as root
4. **Keep Docker updated** for security patches
5. **Scan images** for vulnerabilities (Docker Scout has replaced the older `docker scan` command):
```bash
docker scout cves wan2.1:latest
```
---
## Support and Resources
- **GitHub Issues**: [https://github.com/Wan-Video/Wan2.1/issues](https://github.com/Wan-Video/Wan2.1/issues)
- **Discord**: [Join the community](https://discord.gg/AKNgpMK4Yj)
- **Technical Report**: [arXiv:2503.20314](https://arxiv.org/abs/2503.20314)
- **Docker Documentation**: [https://docs.docker.com/](https://docs.docker.com/)
- **NVIDIA Container Toolkit**: [https://github.com/NVIDIA/nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit)
---
## License
This Docker setup follows the same Apache 2.0 License as the Wan2.1 project. See [LICENSE.txt](LICENSE.txt) for details.

---
**Last Updated**: 2025-10-26
**Version**: 1.0.0
**Maintainer**: Wan2.1 Community