mirror of
				https://github.com/Wan-Video/Wan2.1.git
				synced 2025-11-04 06:15:17 +00:00 
			
		
		
		
	Add --vae_cpu argument to enable VAE offloading for consumer GPUs with limited VRAM. When enabled, VAE initializes on CPU and moves to GPU only when needed for encoding/decoding operations. Key changes: - Add --vae_cpu argument to generate.py (mirrors --t5_cpu pattern) - Update all 4 pipelines (T2V, I2V, FLF2V, VACE) with conditional VAE offloading - Fix DiT offloading to free VRAM before T5 loading when offload_model=True - Handle VAE scale tensors (mean/std) during device transfers Benefits: - Saves ~100-200MB VRAM without performance degradation - Enables T2V-1.3B on more consumer GPUs (tested on 11.49GB GPU) - Backward compatible (default=False) - Consistent with existing --t5_cpu flag Test results on 11.49 GiB VRAM GPU: - Baseline: OOM (needed 80MB, only 85MB free) - With --vae_cpu: Success - With --t5_cpu: Success - With both flags: Success (maximum VRAM savings) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
		
			
				
	
	
		
			6 lines
		
	
	
		
			362 B
		
	
	
	
		
			Bash
		
	
	
		
			Executable File
		
	
	
	
	
			
		
		
	
	
			6 lines
		
	
	
		
			362 B
		
	
	
	
		
			Bash
		
	
	
		
			Executable File
		
	
	
	
	
#!/usr/bin/bash
 | 
						|
# Test 2: VAE CPU offloading only
 | 
						|
echo "=== TEST 2: VAE offloading enabled (--vae_cpu) ==="
 | 
						|
echo "Expected: Success - should save 100-200MB VRAM"
 | 
						|
python ../generate.py --task t2v-1.3B --size 480*832 --ckpt_dir ./t2v-1.3b --offload_model True --vae_cpu --prompt "happy the dwarf and sneezy the dwarf wrestle to the death at madison square garden"
 |