Mirror of https://github.com/Wan-Video/Wan2.1.git (synced 2025-12-15 11:43:21 +00:00)

Commit 56261f3cec (parent 15da9cd980): v6

README.md (13 lines changed)
@@ -20,14 +20,19 @@ WanGP supports the Wan (and derived models), Hunyuan Video and LTX Video models

**Follow DeepBeepMeep on Twitter/X to get the Latest News**: https://x.com/deepbeepmeep

## 🔥 Latest Updates

### June 12 2025: WanGP v5.6

👋 *Finetune models*: Do you find the 20 models supported by WanGP not sufficient? Too impatient to wait for the next release to get support for a newly released model? Your prayers have been answered: if a new model is compatible with a model architecture supported by WanGP, you can add support for it yourself simply by creating a finetune model definition. You can then store the model in the cloud (for instance on Huggingface), and the very light finetune definition file can easily be shared with other users. WanGP will download the finetuned model for them automatically.

### June 12 2025: WanGP v6.0

👋 *Finetune models*: Do you find the 20 models supported by WanGP not sufficient? Too impatient to wait for the next release to get support for a newly released model? Your prayers have been answered: if a new model is compatible with a model architecture supported by WanGP, you can add support for it yourself simply by creating a finetune model definition. You can then store the model in the cloud (for instance on Huggingface), and the very light finetune definition file can easily be shared with other users. WanGP will download the finetuned model for them automatically.

To celebrate this new feature, I have provided 4 finetuned model definitions:

To celebrate the new finetune support, here are a few finetune gifts (directly accessible from the model selection menu):

- *Fast Hunyuan Video*: a t2v model that generates videos in only 6 steps
- *Hunyuan Video AccVideo*: a t2v model that generates videos in only 5 steps
- *Wan FusioniX*: a combo of AccVideo / CausVid and other models that can generate high quality Wan videos in only 8 steps
- *Vace FusioniX*: the ultimate Vace model, a combo of Vace / AccVideo / CausVid and other models that can generate high quality Wan controlled videos in only 10 steps

One more thing...

The new finetune system can be used to combine complementary models: what happens when you combine FusioniX Text2Video and Vace Control Net?

You get **Vace FusioniX**: the ultimate Vace model, fast (10 steps, no need for guidance) and with much better video quality than the original, slower model (even though that one is the best Control Net out there). Here goes one more finetune...

Check the *Finetune Guide* to create finetune model definitions and share them on the WanGP Discord server.
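To make the announcement above a little more concrete, here is a minimal, hypothetical sketch of what such a finetune definition could look like, written as a Python dict and saved into the **finetunes** folder described in the Finetune Guide changes further down. The model name, description, repository URL and file name are placeholders, not a real checkpoint; the fields follow the *model* subtree documented later in this commit (*name*, *architecture*, *description*, *URLs*, *auto_quantize*).

```python
import json
from pathlib import Path

# Hypothetical finetune definition for the Wan 2.1 text-to-video architecture ("t2v").
# Everything below is a placeholder example, not an official WanGP definition.
finetune_definition = {
    "model": {
        "name": "Wan text2video MyFast 14B",   # label shown in the model selection menu
        "architecture": "t2v",                 # architecture id of the base model
        "description": "Hypothetical example of a finetune definition.",
        "URLs": [
            "https://huggingface.co/SomeUser/SomeRepo/resolve/main/my_fast_t2v_14B_fp16.safetensors"
        ],
        "auto_quantize": True                  # let WanGP quantize on the fly if needed
    }
}

# Per the guide, the file lives under "finetunes/" and its name becomes the finetune id.
out_path = Path("finetunes") / "t2v_my_fast.json"
out_path.parent.mkdir(exist_ok=True)
out_path.write_text(json.dumps(finetune_definition, indent=4))
print(f"finetune id: {out_path.stem}")  # -> "t2v_my_fast"
```

After restarting WanGP (step 4 of the guide below), a definition saved this way would be picked up from the **finetunes** folder and, according to the announcement, the referenced checkpoint would be downloaded automatically for users who receive the definition file.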
configs/fantasy.json (new file, 15 lines)

@@ -0,0 +1,15 @@
{
    "_class_name": "WanModel",
    "_diffusers_version": "0.30.0",
    "dim": 5120,
    "eps": 1e-06,
    "ffn_dim": 13824,
    "freq_dim": 256,
    "in_dim": 36,
    "model_type": "i2v",
    "num_heads": 40,
    "num_layers": 40,
    "out_dim": 16,
    "text_len": 512,
    "fantasytalking_dim": 2048
}
@@ -1,6 +1,22 @@
# Changelog

## 🔥 Latest News

### June 12 2025: WanGP v5.6

👋 *Finetune models*: Do you find the 20 models supported by WanGP not sufficient? Too impatient to wait for the next release to get support for a newly released model? Your prayers have been answered: if a new model is compatible with a model architecture supported by WanGP, you can add support for it yourself simply by creating a finetune model definition. You can then store the model in the cloud (for instance on Huggingface), and the very light finetune definition file can easily be shared with other users. WanGP will download the finetuned model for them automatically.

To celebrate the new finetune support, here are a few finetune gifts (directly accessible from the model selection menu):
- *Fast Hunyuan Video*: a t2v model that generates videos in only 6 steps
- *Hunyuan Video AccVideo*: a t2v model that generates videos in only 5 steps
- *Wan FusioniX*: a combo of AccVideo / CausVid and other models that can generate high quality Wan videos in only 8 steps

One more thing...

The new finetune system can be used to combine complementary models: what happens when you combine FusioniX Text2Video and Vace Control Net?

You get **Vace FusioniX**: the ultimate Vace model, fast (10 steps, no need for guidance) and with much better video quality than the original, slower model (even though that one is the best Control Net out there). Here goes one more finetune...

Check the *Finetune Guide* to create finetune model definitions and share them on the WanGP Discord server.

### June 11 2025: WanGP v5.5

👋 *Hunyuan Video Custom Audio*: similar to Hunyuan Video Avatar, except there is no lower limit on the number of frames and you can use your reference images in a different context than the image itself\
*Hunyuan Video Custom Edit*: Hunyuan Video ControlNet; use it to do inpainting and replace a person in a video while still keeping their pose. Similar to Vace, but less restricted than the Wan models in terms of content...
@@ -25,8 +25,8 @@ Here are the steps:

3) Save this file in the subfolder **finetunes**. The name used for the file will be used as its id, so it is good practice to prefix the file name with the base model: for instance, for a finetune named **Fast** based on the Hunyuan Text 2 Video model, use *hunyuan_t2v_fast.json*. In this example the id is *hunyuan_t2v_fast* (a small sketch of this id convention follows these steps).
4) Restart WanGP
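The id convention from step 3 can be sketched in a few lines. This is not WanGP's actual discovery code, only an illustration of the rule that the file name (without the *.json* extension) becomes the finetune id; the helper name is made up.

```python
import json
from pathlib import Path

def discover_finetunes(folder: str = "finetunes") -> dict:
    """Map each finetune id (file name without .json) to its parsed definition."""
    finetunes = {}
    for definition_file in sorted(Path(folder).glob("*.json")):
        finetune_id = definition_file.stem  # e.g. "hunyuan_t2v_fast"
        finetunes[finetune_id] = json.loads(definition_file.read_text())
    return finetunes

# e.g. ["hunyuan_t2v_fast", "t2v_my_fast", ...]
print(list(discover_finetunes().keys()))
```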
## Base Models Ids
A finetune is derived from a base model and will inherit all the user interface and corresponding model capabilities; here are the Ids:
## Architecture Models Ids
A finetune is derived from a base model and will inherit all the user interface and corresponding model capabilities; here are the architecture Ids:
- *t2v*: Wan 2.1 Video text 2 video
- *i2v*: Wan 2.1 Video image 2 video 480p
- *i2v_720p*: Wan 2.1 Video image 2 video 720p
@@ -36,9 +36,10 @@ A finetune is derived from a base model and will inherit all the user interface

## The Model Subtree
- *name*: name of the finetune, used to select it
- *base*: Id of the base model of the finetune (see previous section)
- *architecture*: architecture Id of the base model of the finetune (see previous section)
- *description*: description of the finetune that will appear at the top
- *URLs*: URLs of all the finetune versions (quantized / non quantized). WanGP will pick the version that is the closest to the user's preferences. You will need to follow a naming convention to help WanGP identify the content of each version (see next section; a rough sketch of such a selection also follows this list). Right now WanGP supports only 8-bit quantized models that have been quantized using **quanto**. WanGP offers a command switch to easily build such a quantized model (see below). *URLs* can also contain paths to local files to allow testing.
- *modules*: a list of modules to be combined with the models referenced by the URLs. A module is a model extension that is merged with a model to expand its capabilities. So far the only module supported is Vace 14B (its id is *vace_14B*). For instance, the full Vace model is the fusion of a Wan text 2 video model and the Vace module.
- *preload_URLs*: URLs of files to download no matter what (used for instance to load quantization maps)
- *auto_quantize*: if set to True and no quantized model URL is provided, WanGP will perform on-the-fly quantization if the user expects a quantized model
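The *URLs* naming convention is visible in the definitions later in this commit (e.g. *Wan14BT2VFusioniX_fp16.safetensors* versus *Wan14BT2VFusioniX_quanto_fp16_int8.safetensors*). As a rough sketch only, and not WanGP's actual selection logic, choosing the version closest to the user's preference could look like this:

```python
def pick_url(urls: list[str], prefer_quantized: bool) -> str:
    """Return the first URL matching the user's quantization preference.

    Assumes the naming convention seen in this commit: quantized checkpoints
    carry "quanto" / "int8" in their file name. Falls back to the first URL.
    """
    def is_quantized(url: str) -> bool:
        file_name = url.rsplit("/", 1)[-1].lower()
        return "quanto" in file_name or "int8" in file_name

    for url in urls:
        if is_quantized(url) == prefer_quantized:
            return url
    return urls[0]  # nothing matched the preference exactly

urls = [
    "https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/Wan14BT2VFusioniX_fp16.safetensors",
    "https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/Wan14BT2VFusioniX_quanto_fp16_int8.safetensors",
]
print(pick_url(urls, prefer_quantized=True))  # -> the quanto int8 version
```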
@@ -47,7 +48,7 @@ Example of **model** subtree
"model":
{
    "name": "Wan text2video FusioniX 14B",
    "base" : "t2v",
    "architecture" : "t2v",
    "description": "A powerful merged text-to-video model based on the original WAN 2.1 T2V model, enhanced using multiple open-source components and LoRAs to boost motion realism, temporal consistency, and expressive detail.",
    "URLs": [
        "https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/Wan14BT2VFusioniX_fp16.safetensors",
@@ -1,7 +1,7 @@
{
    "model": {
        "name": "Hunyuan AccVideo 720p 13B",
        "base": "hunyuan",
        "architecture": "hunyuan",
        "description": "AccVideo is a novel efficient distillation method to accelerate video diffusion models with a synthetic dataset. Our method is 8.5x faster than HunyuanVideo.",
        "URLs": [
            "https://huggingface.co/DeepBeepMeep/HunyuanVideo/resolve/main/accvideo_hunyuan_video_720_quanto_int8.safetensors"
@@ -1,7 +1,7 @@
{
    "model": {
        "name": "Hunyuan Fast Video 720p 13B",
        "base": "hunyuan",
        "architecture": "hunyuan",
        "description": "Fast Hunyuan is an accelerated HunyuanVideo model. It can sample high quality videos with 6 diffusion steps.",
        "URLs": [
            "https://huggingface.co/DeepBeepMeep/HunyuanVideo/resolve/main/fast_hunyuan_video_720_quanto_int8.safetensors"
@@ -2,7 +2,7 @@
    "model":
    {
        "name": "Wan text2video FusioniX 14B",
        "base" : "t2v",
        "architecture" : "t2v",
        "description": "A powerful merged text-to-video model based on the original WAN 2.1 T2V model, enhanced using multiple open-source components and LoRAs to boost motion realism, temporal consistency, and expressive detail.",
        "URLs": [
            "https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/Wan14BT2VFusioniX_fp16.safetensors",
@@ -2,12 +2,13 @@
    "model":
    {
        "name": "Vace FusioniX 14B",
        "base" : "vace_14B",
        "architecture" : "vace_14B",
        "modules" : ["vace_14B"],
        "description": "Vace control model enhanced using multiple open-source components and LoRAs to boost motion realism, temporal consistency, and expressive detail.",
        "URLs": [
            "https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/wan2.1_Vace_FusioniX_14B_mfp16.safetensors",
            "https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/wan2.1_Vace_FusioniX_14B_quanto_mfp16_int8.safetensors",
            "https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/wan2.1_Vace_FusioniX_14B_quanto_mbf16_int8.safetensors"
            "https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/Wan14BT2VFusioniX_fp16.safetensors",
            "https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/Wan14BT2VFusioniX_quanto_fp16_int8.safetensors",
            "https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/Wan14BT2VFusioniX_quanto_bf16_int8.safetensors"
        ],
        "auto_quantize": true
    },
@@ -387,7 +387,7 @@ class Inference(object):
        # model = Inference.load_state_dict(args, model, model_filepath)

        # model_filepath ="c:/temp/hc/mp_rank_00_model_states_video.pt"
        offload.load_model_data(model, model_filepath, quantizeTransformer = quantizeTransformer and not save_quantized, pinToMemory = pinToMemory, partialPinning = partialPinning)
        offload.load_model_data(model, model_filepath, do_quantize= quantizeTransformer and not save_quantized, pinToMemory = pinToMemory, partialPinning = partialPinning)
        pass
        # offload.save_model(model, "hunyuan_video_avatar_edit_720_bf16.safetensors")
        # offload.save_model(model, "hunyuan_video_avatar_edit_720_quanto_bf16_int8.safetensors", do_quantize= True)
@@ -493,8 +493,8 @@ class MMSingleStreamBlock(nn.Module):
        return img, txt


class HYVideoDiffusionTransformer(ModelMixin, ConfigMixin):
    def preprocess_loras(self, model_filename, sd):
        if not "i2v" in model_filename:
    def preprocess_loras(self, model_type, sd):
        if model_type != "i2v":
            return sd
        new_sd = {}
        for k,v in sd.items():
@@ -17,7 +17,7 @@ gradio==5.23.0
numpy>=1.23.5,<2
einops
moviepy==1.0.3
mmgp==3.4.8
mmgp==3.4.9
peft==0.14.0
mutagen
pydantic==2.10.6
@@ -64,7 +64,8 @@ class DTT2V:
        # model_filename = "model.safetensors"
        # model_filename = "c:/temp/diffusion_pytorch_model-00001-of-00006.safetensors"
        base_config_file = f"configs/{base_model_type}.json"
        self.model = offload.fast_load_transformers_model(model_filename, modelClass=WanModel, do_quantize= quantizeTransformer, writable_tensors= False) # , forcedConfigPath="c:/temp/config _df720.json")
        forcedConfigPath = base_config_file if len(model_filename) > 1 else None
        self.model = offload.fast_load_transformers_model(model_filename, modelClass=WanModel, do_quantize= quantizeTransformer, writable_tensors= False, forcedConfigPath=forcedConfigPath)
        # offload.load_model_data(self.model, "recam.ckpt")
        # self.model.cpu()
        # dtype = torch.float16
@@ -104,7 +104,8 @@ class WanI2V:
        # model_filename = "c:/temp/i2v480p/diffusion_pytorch_model-00001-of-00007.safetensors"
        # dtype = torch.float16
        base_config_file = f"configs/{base_model_type}.json"
        self.model = offload.fast_load_transformers_model(model_filename, modelClass=WanModel, do_quantize= quantizeTransformer and not save_quantized, writable_tensors= False, defaultConfigPath= base_config_file) #, forcedConfigPath= "c:/temp/i2v720p/config.json")
        forcedConfigPath = base_config_file if len(model_filename) > 1 else None
        self.model = offload.fast_load_transformers_model(model_filename, modelClass=WanModel, do_quantize= quantizeTransformer and not save_quantized, writable_tensors= False, defaultConfigPath= base_config_file, forcedConfigPath= forcedConfigPath)
        self.model.lock_layers_dtypes(torch.float32 if mixed_precision_transformer else dtype)
        offload.change_dtype(self.model, dtype, True)
        # offload.save_model(self.model, "wan2.1_image2video_720p_14B_mbf16.safetensors", config_file_path="c:/temp/i2v720p/config.json")
@@ -589,7 +589,7 @@ class MLPProj(torch.nn.Module):


class WanModel(ModelMixin, ConfigMixin):
    def preprocess_loras(self, model_filename, sd):
    def preprocess_loras(self, model_type, sd):

        first = next(iter(sd), None)
        if first == None:

@@ -634,7 +634,7 @@ class WanModel(ModelMixin, ConfigMixin):
            new_sd.update(new_alphas)
            sd = new_sd
        from wgp import test_class_i2v
        if not test_class_i2v(model_filename):
        if not test_class_i2v(model_type):
            new_sd = {}
            # convert loras for i2v to t2v
            for k,v in sd.items():
@@ -84,10 +84,11 @@ class WanT2V:
        from mmgp import offload
        # model_filename = "c:/temp/vace1.3/diffusion_pytorch_model.safetensors"
        # model_filename = "Vacefusionix_quanto_fp16_int8.safetensors"
        # model_filename = "c:/temp/phantom/Phantom_Wan_14B-00001-of-00006.safetensors"
        # config_filename= "c:/temp/phantom/config.json"
        # model_filename = "c:/temp/t2v/diffusion_pytorch_model-00001-of-00006.safetensors"
        # config_filename= "c:/temp/t2v/t2v.json"
        base_config_file = f"configs/{base_model_type}.json"
        self.model = offload.fast_load_transformers_model(model_filename, modelClass=WanModel, do_quantize= quantizeTransformer and not save_quantized, writable_tensors= False, defaultConfigPath=base_config_file) #, forcedConfigPath= config_filename)
        forcedConfigPath = base_config_file if len(model_filename) > 1 else None
        self.model = offload.fast_load_transformers_model(model_filename, modelClass=WanModel, do_quantize= quantizeTransformer and not save_quantized, writable_tensors= False, defaultConfigPath=base_config_file, forcedConfigPath= forcedConfigPath)
        # offload.load_model_data(self.model, "c:/temp/Phantom-Wan-1.3B.pth")
        # self.model.to(torch.bfloat16)
        # self.model.cpu()

@@ -95,8 +96,8 @@ class WanT2V:
        # dtype = torch.bfloat16
        # offload.load_model_data(self.model, "ckpts/Wan14BT2VFusioniX_fp16.safetensors")
        offload.change_dtype(self.model, dtype, True)
        # offload.save_model(self.model, "wanfusionix_fp16.safetensors", config_file_path=base_config_file)
        # offload.save_model(self.model, "wanfusionix_quanto_fp16_int8.safetensors", do_quantize=True, config_file_path=base_config_file)
        # offload.save_model(self.model, "wan2.1_text2video_14B_mbf16.safetensors", config_file_path=base_config_file)
        # offload.save_model(self.model, "wan2.1_text2video_14B_quanto_mfp16_int8.safetensors", do_quantize=True, config_file_path=base_config_file)
        self.model.eval().requires_grad_(False)
        if save_quantized:
            from wan.utils.utils import save_quantized_model