- Introduced `--cpu_offload` argument in `generate.py` for enabling CPU offload.
- Updated `WanI2V` class in `image2video.py` to handle CPU offload during model initialization and sharding.
- Added new functions in `fsdp.py` for CPU initialization and for sharding with CPU offload (see the sketch after this list for how the flag, `WanI2V`, and these helpers fit together).
- Expanded supported sizes in `configs/__init__.py` to include additional resolutions.
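A minimal sketch, assuming hypothetical names, of how these pieces could fit together. The helper `shard_model_with_cpu_offload`, the `cpu_offload` constructor argument, and the simplified `WanI2V` body are illustrative rather than taken from the repository; only the `--cpu_offload` flag, the file names, and the public `torch.distributed.fsdp` API (`FSDP`, `CPUOffload`) come from the changes above.

```python
# Sketch only: assumes torch.distributed has already been initialized
# (e.g. via torch.distributed.init_process_group) before sharding.
import argparse

from torch.distributed.fsdp import CPUOffload
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


# generate.py (hypothetical excerpt): expose CPU offload as a CLI flag.
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--cpu_offload",
        action="store_true",
        default=False,
        help="Keep FSDP-sharded parameters on CPU to reduce GPU memory.",
    )
    return parser.parse_args()


# fsdp.py (hypothetical helper): shard a CPU-initialized model with CPU offload.
def shard_model_with_cpu_offload(model, device_id):
    """Wrap a model in FSDP; parameters stay on CPU and are paged in per shard."""
    return FSDP(
        model,
        device_id=device_id,
        cpu_offload=CPUOffload(offload_params=True),
    )


# image2video.py (hypothetical excerpt): WanI2V picks a sharding path.
class WanI2V:
    def __init__(self, model, device_id, cpu_offload=False):
        if cpu_offload:
            # Model was built on CPU; FSDP moves each shard to the GPU on demand.
            self.model = shard_model_with_cpu_offload(model, device_id)
        else:
            self.model = FSDP(model, device_id=device_id)
```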
* Update text2video.py to reduce GPU memory usage by emptying the CUDA cache
If `offload_model` is set, `empty_cache()` must be called after the model is moved to the CPU to actually free GPU memory. I verified on an RTX 4090 that without calling `empty_cache()` the model stays resident in GPU memory and the subsequent VAE decoding never finishes.
* Update text2video.py: only one `empty_cache()` call is needed before VAE decode (sketched below)
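A sketch of the offload-then-decode pattern described above, assuming a simplified structure: the helper name `decode_with_offload` and its arguments are illustrative, not the actual `text2video.py` code; `torch.cuda.synchronize()` and `torch.cuda.empty_cache()` are the standard PyTorch calls.

```python
import torch


def decode_with_offload(model, vae, latents, offload_model=True):
    """Illustrative helper: free the diffusion model's GPU memory before VAE decode."""
    if offload_model:
        model.cpu()                # move the DiT weights off the GPU
        torch.cuda.synchronize()   # wait for any kernels still using them
        # One empty_cache() call is enough: it returns the cached blocks held by
        # PyTorch's allocator to the driver so the VAE decode can use that memory.
        torch.cuda.empty_cache()
    with torch.no_grad():
        return vae.decode(latents)
```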