mirror of
https://github.com/Wan-Video/Wan2.1.git
synced 2025-11-04 14:16:57 +00:00
Merge branch 'main' into queue_editor
This commit is contained in:
commit
e69a406808
25
README.md
@ -20,6 +20,31 @@ WanGP supports the Wan (and derived models), Hunyuan Video and LTX Video models

**Follow DeepBeepMeep on Twitter/X to get the Latest News**: https://x.com/deepbeepmeep

## 🔥 Latest Updates:

### September 11 2025: WanGP v8.5/8.55 - Wanna be a Cropper or a Painter?

I have done some intensive internal refactoring of the generation pipeline to make it easier to support existing models or add new ones. Nothing really visible, but this makes WanGP a little more future proof.

Otherwise in the news:

- **Cropped Input Image Prompts**: quite often the *Image Prompts* you provide (*Start Image, Input Video, Reference Image, Control Video, ...*) do not match your requested *Output Resolution*. In that case I used the resolution you gave either as a *Pixels Budget* or as an *Outer Canvas* for the Generated Video. However, on some occasions you really want the requested Output Resolution and nothing else. Besides, some models deliver much better Generations if you stick to one of their supported resolutions. To address this need I have added a new Output Resolution choice in the *Configuration Tab*: **Dimensions correspond to the Output Width & Height and the Prompt Images will be cropped to fit exactly these dimensions**. In short, if needed the *Input Prompt Images* will be cropped (center cropped for the moment). You will see this can make quite a difference for some models; a minimal sketch of this kind of center crop follows below.
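A hedged illustration only (not the actual WanGP code; the helper name and the resampling choice are assumptions): scale the image so it fully covers the requested output size, then crop the excess around the center.

```python
from PIL import Image

def center_crop_to(img: Image.Image, out_w: int, out_h: int) -> Image.Image:
    """Scale img so it covers out_w x out_h, then crop the excess around the center."""
    w, h = img.size
    scale = max(out_w / w, out_h / h)      # "cover" scaling, not "fit"
    new_w, new_h = round(w * scale), round(h * scale)
    img = img.resize((new_w, new_h), resample=Image.Resampling.LANCZOS)
    left = (new_w - out_w) // 2
    top = (new_h - out_h) // 2
    return img.crop((left, top, left + out_w, top + out_h))

# e.g. force a Start Image to the exact requested output resolution
# start = center_crop_to(Image.open("start.png"), 832, 480)
```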
- *Qwen Edit* now has a new sub tab called **Inpainting** that lets you target with a brush which part of the *Image Prompt* you want to modify. This is quite convenient if you find that Qwen Edit usually modifies too many things. Of course, as this puts more constraints on Qwen Edit, don't be surprised if it sometimes returns the original image unchanged. A piece of advice: describe in your *Text Prompt* where the parts you want to modify are located (for instance *left of the man*, *top*, ...).

The mask inpainting is fully compatible with the *Matanyone Mask generator*: first generate an *Image Mask* with Matanyone, transfer it to the current Image Generator and modify the mask with the *Paint Brush*. Talking about Matanyone, I have fixed a bug that caused mask degradation with long videos (WanGP's Matanyone is now as good as the original app and still requires 3 times less VRAM). The short sketch below shows how such a mask is blended back into the output.
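A simplified sketch of the blend only (tensor shapes and the helper name are assumptions, not the pipeline's API); it mirrors the `original * (1 - mask) + generated * mask` formula used later in this commit for both the Flux and Qwen inpainting paths:

```python
import torch

def blend_with_mask(original: torch.Tensor, generated: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """original, generated: (C, H, W) images in [-1, 1]; mask: (1, H, W), 1 = editable area, 0 = keep the original."""
    return original * (1 - mask) + generated * mask
```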
- This **Inpainting Mask Editor** has also been added to *Vace Image Mode*. Vace is probably still one of the best Image Editors today. Here is a very simple and efficient workflow that does marvels with Vace:

Select *Vace Cocktail > Control Image Process = Perform Inpainting & Area Processed = Masked Area > Upload a Control Image, then draw your mask directly on top of the image & enter a text Prompt that describes the expected change > Generate > Below the Video Gallery click 'To Control Image' > Keep on doing more changes*.

For more sophisticated things the Vace Image Editor works very well too: try Image Outpainting, Pose Transfer, ...

For the best quality I recommend setting in the *Quality Tab* the option: "*Generate a 9 Frames Long video...*"

**update 8.55**: Flux Festival

- **Inpainting Mode** also added for *Flux Kontext*

- **Flux SRPO**: new finetune with 3x better quality vs Flux Dev according to its authors. I have also created a *Flux SRPO USO* finetune which is certainly the best open source *Style Transfer* tool available

- **Flux UMO**: model specialized in combining multiple reference objects / people together. Works quite well at 768x768

Good luck finding your way through all the Flux model names!
### September 5 2025: WanGP v8.4 - Take me to Outer Space

You have probably seen those short AI generated movies created using *Nano Banana* and the *First Frame - Last Frame* feature of *Kling 2.0*. The idea is to generate an image, modify a part of it with Nano Banana and give these two images to Kling, which will generate the Video between them; then use the previous Last Frame as the new First Frame, rinse and repeat, and you get a full movie. A rough sketch of this loop is given below.
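Purely as an illustration of that loop (the three callables are hypothetical placeholders for whatever image generator, image editor and first/last-frame video model you use; nothing here is a WanGP, Nano Banana or Kling API):

```python
def chain_shots(initial_prompt, edit_prompts, generate_image, edit_image, first_last_frame_video):
    """Hypothetical sketch: each edited image is the last frame of one clip and the first frame of the next."""
    clips = []
    first = generate_image(initial_prompt)                   # any text-to-image model
    for edit_prompt in edit_prompts:
        last = edit_image(first, edit_prompt)                # e.g. an image editor in the spirit of Nano Banana
        clips.append(first_last_frame_video(first, last))    # e.g. a first/last-frame video generator
        first = last                                         # rinse and repeat
    return clips
```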
@ -7,8 +7,6 @@
|
||||
"https://huggingface.co/DeepBeepMeep/Flux/resolve/main/flux1_kontext_dev_bf16.safetensors",
|
||||
"https://huggingface.co/DeepBeepMeep/Flux/resolve/main/flux1_kontext_dev_quanto_bf16_int8.safetensors"
|
||||
],
|
||||
"image_outputs": true,
|
||||
"reference_image": true,
|
||||
"flux-model": "flux-dev-kontext"
|
||||
},
|
||||
"prompt": "add a hat",
|
||||
|
||||
24
defaults/flux_dev_umo.json
Normal file
@ -0,0 +1,24 @@
|
||||
{
|
||||
"model": {
|
||||
"name": "Flux 1 Dev UMO 12B",
|
||||
"architecture": "flux",
|
||||
"description": "FLUX.1 Dev UMO is a model that can Edit Images with a specialization in combining multiple image references (resized internally at 512x512 max) to produce an Image output. Best Image preservation at 768x768 Resolution Output.",
|
||||
"URLs": "flux",
|
||||
"flux-model": "flux-dev-umo",
|
||||
"loras": ["https://huggingface.co/DeepBeepMeep/Flux/resolve/main/flux1-dev-UMO_dit_lora_bf16.safetensors"],
|
||||
"resolutions": [ ["1024x1024 (1:1)", "1024x1024"],
|
||||
["768x1024 (3:4)", "768x1024"],
|
||||
["1024x768 (4:3)", "1024x768"],
|
||||
["512x1024 (1:2)", "512x1024"],
|
||||
["1024x512 (2:1)", "1024x512"],
|
||||
["768x768 (1:1)", "768x768"],
|
||||
["768x512 (3:2)", "768x512"],
|
||||
["512x768 (2:3)", "512x768"]]
|
||||
},
|
||||
"prompt": "the man is wearing a hat",
|
||||
"embedded_guidance_scale": 4,
|
||||
"resolution": "768x768",
|
||||
"batch_size": 1
|
||||
}
|
||||
|
||||
|
||||
@ -2,12 +2,10 @@
|
||||
"model": {
|
||||
"name": "Flux 1 Dev USO 12B",
|
||||
"architecture": "flux",
|
||||
"description": "FLUX.1 Dev USO is a model specialized to Edit Images with a specialization in Style Transfers (up to two).",
|
||||
"description": "FLUX.1 Dev USO is a model that can Edit Images with a specialization in Style Transfers (up to two).",
|
||||
"modules": [ ["https://huggingface.co/DeepBeepMeep/Flux/resolve/main/flux1-dev-USO_projector_bf16.safetensors"]],
|
||||
"URLs": "flux",
|
||||
"loras": ["https://huggingface.co/DeepBeepMeep/Flux/resolve/main/flux1-dev-USO_dit_lora_bf16.safetensors"],
|
||||
"image_outputs": true,
|
||||
"reference_image": true,
|
||||
"flux-model": "flux-dev-uso"
|
||||
},
|
||||
"prompt": "the man is wearing a hat",
|
||||
|
||||
15
defaults/flux_srpo.json
Normal file
@ -0,0 +1,15 @@
|
||||
{
|
||||
"model": {
|
||||
"name": "Flux 1 SRPO Dev 12B",
|
||||
"architecture": "flux",
|
||||
"description": "By fine-tuning the FLUX.1.dev model with optimized denoising and online reward adjustment, SRPO improves its human-evaluated realism and aesthetic quality by over 3x.",
|
||||
"URLs": [
|
||||
"https://huggingface.co/DeepBeepMeep/Flux/resolve/main/flux1-srpo-dev_bf16.safetensors",
|
||||
"https://huggingface.co/DeepBeepMeep/Flux/resolve/main/flux1-srpo-dev_quanto_bf16_int8.safetensors"
|
||||
],
|
||||
"flux-model": "flux-dev"
|
||||
},
|
||||
"prompt": "draw a hat",
|
||||
"resolution": "1024x1024",
|
||||
"batch_size": 1
|
||||
}
|
||||
17
defaults/flux_srpo_uso.json
Normal file
@ -0,0 +1,17 @@
|
||||
{
|
||||
"model": {
|
||||
"name": "Flux 1 SRPO USO 12B",
|
||||
"architecture": "flux",
|
||||
"description": "FLUX.1 SRPO USO is a model that can Edit Images with a specialization in Style Transfers (up to two). It leverages the improved Image quality brought by the SRPO process",
|
||||
"modules": [ "flux_dev_uso"],
|
||||
"URLs": "flux_srpo",
|
||||
"loras": "flux_dev_uso",
|
||||
"flux-model": "flux-dev-uso"
|
||||
},
|
||||
"prompt": "the man is wearing a hat",
|
||||
"embedded_guidance_scale": 4,
|
||||
"resolution": "1024x1024",
|
||||
"batch_size": 1
|
||||
}
|
||||
|
||||
|
||||
@ -9,9 +9,7 @@
|
||||
],
|
||||
"attention": {
|
||||
"<89": "sdpa"
|
||||
},
|
||||
"reference_image": true,
|
||||
"image_outputs": true
|
||||
}
|
||||
},
|
||||
"prompt": "add a hat",
|
||||
"resolution": "1280x720",
|
||||
|
||||
@ -4,7 +4,7 @@
|
||||
"name": "Wan2.1 Standin 14B",
|
||||
"modules": [ ["https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/Stand-In_wan2.1_T2V_14B_ver1.0_bf16.safetensors"]],
|
||||
"architecture" : "standin",
|
||||
"description": "The original Wan Text 2 Video model combined with the StandIn module to improve Identity Preservation. You need to provide a Reference Image with white background which is a close up of person face to transfer this person in the Video.",
|
||||
"description": "The original Wan Text 2 Video model combined with the StandIn module to improve Identity Preservation. You need to provide a Reference Image with white background which is a close up of a person face to transfer this person in the Video.",
|
||||
"URLs": "t2v"
|
||||
}
|
||||
}
|
||||
@ -13,28 +13,52 @@ class family_handler():
|
||||
flux_schnell = flux_model == "flux-schnell"
|
||||
flux_chroma = flux_model == "flux-chroma"
|
||||
flux_uso = flux_model == "flux-dev-uso"
|
||||
model_def_output = {
|
||||
flux_umo = flux_model == "flux-dev-umo"
|
||||
flux_kontext = flux_model == "flux-dev-kontext"
|
||||
|
||||
extra_model_def = {
|
||||
"image_outputs" : True,
|
||||
"no_negative_prompt" : not flux_chroma,
|
||||
}
|
||||
if flux_chroma:
|
||||
model_def_output["guidance_max_phases"] = 1
|
||||
extra_model_def["guidance_max_phases"] = 1
|
||||
elif not flux_schnell:
|
||||
model_def_output["embedded_guidance"] = True
|
||||
extra_model_def["embedded_guidance"] = True
|
||||
if flux_uso :
|
||||
model_def_output["any_image_refs_relative_size"] = True
|
||||
model_def_output["no_background_removal"] = True
|
||||
|
||||
model_def_output["image_ref_choices"] = {
|
||||
extra_model_def["any_image_refs_relative_size"] = True
|
||||
extra_model_def["no_background_removal"] = True
|
||||
extra_model_def["image_ref_choices"] = {
|
||||
"choices":[("No Reference Image", ""),("First Image is a Reference Image, and then the next ones (up to two) are Style Images", "KI"),
|
||||
("Up to two Images are Style Images", "KIJ")],
|
||||
"default": "KI",
|
||||
"letters_filter": "KIJ",
|
||||
"label": "Reference Images / Style Images"
|
||||
}
|
||||
model_def_output["lock_image_refs_ratios"] = True
|
||||
|
||||
return model_def_output
|
||||
if flux_kontext:
|
||||
extra_model_def["inpaint_support"] = True
|
||||
extra_model_def["image_ref_choices"] = {
|
||||
"choices": [
|
||||
("None", ""),
|
||||
("Conditional Images is first Main Subject / Landscape and may be followed by People / Objects", "KI"),
|
||||
("Conditional Images are People / Objects", "I"),
|
||||
],
|
||||
"letters_filter": "KI",
|
||||
}
|
||||
extra_model_def["background_removal_label"]= "Remove Backgrounds only behind People / Objects except main Subject / Landscape"
|
||||
elif flux_umo:
|
||||
extra_model_def["image_ref_choices"] = {
|
||||
"choices": [
|
||||
("Conditional Images are People / Objects", "I"),
|
||||
],
|
||||
"letters_filter": "I",
|
||||
"visible": False
|
||||
}
|
||||
|
||||
|
||||
extra_model_def["lock_image_refs_ratios"] = True
|
||||
|
||||
return extra_model_def
|
||||
|
||||
@staticmethod
|
||||
def query_supported_types():
|
||||
@ -118,15 +142,28 @@ class family_handler():
|
||||
video_prompt_type = video_prompt_type.replace("I", "KI")
|
||||
ui_defaults["video_prompt_type"] = video_prompt_type
|
||||
|
||||
if settings_version < 2.34:
|
||||
ui_defaults["denoising_strength"] = 1.
|
||||
|
||||
@staticmethod
|
||||
def update_default_settings(base_model_type, model_def, ui_defaults):
|
||||
flux_model = model_def.get("flux-model", "flux-dev")
|
||||
flux_uso = flux_model == "flux-dev-uso"
|
||||
flux_umo = flux_model == "flux-dev-umo"
|
||||
flux_kontext = flux_model == "flux-dev-kontext"
|
||||
ui_defaults.update({
|
||||
"embedded_guidance": 2.5,
|
||||
})
|
||||
if model_def.get("reference_image", False):
|
||||
|
||||
if flux_kontext or flux_uso:
|
||||
ui_defaults.update({
|
||||
"video_prompt_type": "KI",
|
||||
"denoising_strength": 1.,
|
||||
})
|
||||
elif flux_umo:
|
||||
ui_defaults.update({
|
||||
"video_prompt_type": "I",
|
||||
"remove_background_images_ref": 0,
|
||||
})
|
||||
|
||||
|
||||
|
||||
@ -23,44 +23,35 @@ from .util import (
|
||||
)
|
||||
|
||||
from PIL import Image
|
||||
def preprocess_ref(raw_image: Image.Image, long_size: int = 512):
|
||||
# get the width and height of the original image
|
||||
image_w, image_h = raw_image.size
|
||||
|
||||
def resize_and_centercrop_image(image, target_height_ref1, target_width_ref1):
|
||||
target_height_ref1 = int(target_height_ref1 // 64 * 64)
|
||||
target_width_ref1 = int(target_width_ref1 // 64 * 64)
|
||||
h, w = image.shape[-2:]
|
||||
if h < target_height_ref1 or w < target_width_ref1:
|
||||
# compute the aspect ratio
|
||||
aspect_ratio = w / h
|
||||
if h < target_height_ref1:
|
||||
new_h = target_height_ref1
|
||||
new_w = new_h * aspect_ratio
|
||||
if new_w < target_width_ref1:
|
||||
new_w = target_width_ref1
|
||||
new_h = new_w / aspect_ratio
|
||||
else:
|
||||
new_w = target_width_ref1
|
||||
new_h = new_w / aspect_ratio
|
||||
if new_h < target_height_ref1:
|
||||
new_h = target_height_ref1
|
||||
new_w = new_h * aspect_ratio
|
||||
# determine the long and the short side
|
||||
if image_w >= image_h:
|
||||
new_w = long_size
|
||||
new_h = int((long_size / image_w) * image_h)
|
||||
else:
|
||||
aspect_ratio = w / h
|
||||
tgt_aspect_ratio = target_width_ref1 / target_height_ref1
|
||||
if aspect_ratio > tgt_aspect_ratio:
|
||||
new_h = target_height_ref1
|
||||
new_w = new_h * aspect_ratio
|
||||
else:
|
||||
new_w = target_width_ref1
|
||||
new_h = new_w / aspect_ratio
|
||||
# resize the image with TVF.resize
|
||||
image = TVF.resize(image, (math.ceil(new_h), math.ceil(new_w)))
|
||||
# compute the center-crop parameters
|
||||
top = (image.shape[-2] - target_height_ref1) // 2
|
||||
left = (image.shape[-1] - target_width_ref1) // 2
|
||||
# center-crop with TVF.crop
|
||||
image = TVF.crop(image, top, left, target_height_ref1, target_width_ref1)
|
||||
return image
|
||||
new_h = long_size
|
||||
new_w = int((long_size / image_h) * image_w)
|
||||
|
||||
# proportionally resize to the new width and height
|
||||
raw_image = raw_image.resize((new_w, new_h), resample=Image.LANCZOS)
|
||||
target_w = new_w // 16 * 16
|
||||
target_h = new_h // 16 * 16
|
||||
|
||||
# compute the crop origin for a center crop
|
||||
left = (new_w - target_w) // 2
|
||||
top = (new_h - target_h) // 2
|
||||
right = left + target_w
|
||||
bottom = top + target_h
|
||||
|
||||
# perform the center crop
|
||||
raw_image = raw_image.crop((left, top, right, bottom))
|
||||
|
||||
# convert to RGB mode
|
||||
raw_image = raw_image.convert("RGB")
|
||||
return raw_image
|
||||
|
||||
def stitch_images(img1, img2):
|
||||
# Resize img2 to match img1's height
|
||||
@ -105,7 +96,7 @@ class model_factory:
|
||||
# self.name= "flux-schnell"
|
||||
source = model_def.get("source", None)
|
||||
self.model = load_flow_model(self.name, model_filename[0] if source is None else source, torch_device)
|
||||
|
||||
self.model_def = model_def
|
||||
self.vae = load_ae(self.name, device=torch_device)
|
||||
|
||||
siglip_processor = siglip_model = feature_embedder = None
|
||||
@ -151,6 +142,8 @@ class model_factory:
|
||||
n_prompt: str = None,
|
||||
sampling_steps: int = 20,
|
||||
input_ref_images = None,
|
||||
image_guide= None,
|
||||
image_mask= None,
|
||||
width= 832,
|
||||
height=480,
|
||||
embedded_guidance_scale: float = 2.5,
|
||||
@ -162,6 +155,7 @@ class model_factory:
|
||||
video_prompt_type = "",
|
||||
joint_pass = False,
|
||||
image_refs_relative_size = 100,
|
||||
denoising_strength = 1.,
|
||||
**bbargs
|
||||
):
|
||||
if self._interrupt:
|
||||
@ -170,10 +164,16 @@ class model_factory:
|
||||
if n_prompt is None or len(n_prompt) == 0: n_prompt = "low quality, ugly, unfinished, out of focus, deformed, disfigure, blurry, smudged, restricted palette, flat colors"
|
||||
device="cuda"
|
||||
flux_dev_uso = self.name in ['flux-dev-uso']
|
||||
image_stiching = not self.name in ['flux-dev-uso'] #and False
|
||||
# image_refs_relative_size = 100
|
||||
crop = False
|
||||
flux_dev_umo = self.name in ['flux-dev-umo']
|
||||
latent_stiching = self.name in ['flux-dev-uso', 'flux-dev-umo']
|
||||
|
||||
lock_dimensions= False
|
||||
|
||||
input_ref_images = [] if input_ref_images is None else input_ref_images[:]
|
||||
if flux_dev_umo:
|
||||
ref_long_side = 512 if len(input_ref_images) <= 1 else 320
|
||||
input_ref_images = [preprocess_ref(img, ref_long_side) for img in input_ref_images]
|
||||
lock_dimensions = True
|
||||
ref_style_imgs = []
|
||||
if "I" in video_prompt_type and len(input_ref_images) > 0:
|
||||
if flux_dev_uso :
|
||||
@ -183,43 +183,26 @@ class model_factory:
|
||||
elif len(input_ref_images) > 1 :
|
||||
ref_style_imgs = input_ref_images[-1:]
|
||||
input_ref_images = input_ref_images[:-1]
|
||||
if image_stiching:
|
||||
|
||||
if latent_stiching:
|
||||
# latent stitching with resize
|
||||
if not lock_dimensions :
|
||||
for i in range(len(input_ref_images)):
|
||||
w, h = input_ref_images[i].size
|
||||
image_height, image_width = calculate_new_dimensions(int(height*image_refs_relative_size/100), int(width*image_refs_relative_size/100), h, w, 0)
|
||||
input_ref_images[i] = input_ref_images[i].resize((image_width, image_height), resample=Image.Resampling.LANCZOS)
|
||||
else:
|
||||
# image stitching method
|
||||
stiched = input_ref_images[0]
|
||||
if "K" in video_prompt_type :
|
||||
w, h = input_ref_images[0].size
|
||||
height, width = calculate_new_dimensions(height, width, h, w, fit_into_canvas)
|
||||
# actual rescale will happen in prepare_kontext
|
||||
for new_img in input_ref_images[1:]:
|
||||
stiched = stitch_images(stiched, new_img)
|
||||
input_ref_images = [stiched]
|
||||
else:
|
||||
first_ref = 0
|
||||
if "K" in video_prompt_type:
|
||||
# image latents tiling method
|
||||
w, h = input_ref_images[0].size
|
||||
if crop :
|
||||
img = convert_image_to_tensor(input_ref_images[0])
|
||||
img = resize_and_centercrop_image(img, height, width)
|
||||
input_ref_images[0] = convert_tensor_to_image(img)
|
||||
else:
|
||||
height, width = calculate_new_dimensions(height, width, h, w, fit_into_canvas)
|
||||
input_ref_images[0] = input_ref_images[0].resize((width, height), resample=Image.Resampling.LANCZOS)
|
||||
first_ref = 1
|
||||
|
||||
for i in range(first_ref,len(input_ref_images)):
|
||||
w, h = input_ref_images[i].size
|
||||
if crop:
|
||||
img = convert_image_to_tensor(input_ref_images[i])
|
||||
img = resize_and_centercrop_image(img, int(height*image_refs_relative_size/100), int(width*image_refs_relative_size/100))
|
||||
input_ref_images[i] = convert_tensor_to_image(img)
|
||||
else:
|
||||
image_height, image_width = calculate_new_dimensions(int(height*image_refs_relative_size/100), int(width*image_refs_relative_size/100), h, w, fit_into_canvas)
|
||||
input_ref_images[i] = input_ref_images[i].resize((image_width, image_height), resample=Image.Resampling.LANCZOS)
|
||||
elif image_guide is not None:
|
||||
input_ref_images = [image_guide]
|
||||
else:
|
||||
input_ref_images = None
|
||||
|
||||
if flux_dev_uso :
|
||||
if self.name in ['flux-dev-uso', 'flux-dev-umo'] :
|
||||
inp, height, width = prepare_multi_ip(
|
||||
ae=self.vae,
|
||||
img_cond_list=input_ref_images,
|
||||
@ -238,6 +221,7 @@ class model_factory:
|
||||
bs=batch_size,
|
||||
seed=seed,
|
||||
device=device,
|
||||
img_mask=image_mask,
|
||||
)
|
||||
|
||||
inp.update(prepare_prompt(self.t5, self.clip, batch_size, input_prompt))
|
||||
@ -259,13 +243,19 @@ class model_factory:
|
||||
return unpack(x.float(), height, width)
|
||||
|
||||
# denoise initial noise
|
||||
x = denoise(self.model, **inp, timesteps=timesteps, guidance=embedded_guidance_scale, real_guidance_scale =guide_scale, callback=callback, pipeline=self, loras_slists= loras_slists, unpack_latent = unpack_latent, joint_pass = joint_pass)
|
||||
x = denoise(self.model, **inp, timesteps=timesteps, guidance=embedded_guidance_scale, real_guidance_scale =guide_scale, callback=callback, pipeline=self, loras_slists= loras_slists, unpack_latent = unpack_latent, joint_pass = joint_pass, denoising_strength = denoising_strength)
|
||||
if x==None: return None
|
||||
# decode latents to pixel space
|
||||
x = unpack_latent(x)
|
||||
with torch.autocast(device_type=device, dtype=torch.bfloat16):
|
||||
x = self.vae.decode(x)
|
||||
|
||||
if image_mask is not None:
|
||||
from shared.utils.utils import convert_image_to_tensor
|
||||
img_msk_rebuilt = inp["img_msk_rebuilt"]
|
||||
img= convert_image_to_tensor(image_guide)
|
||||
x = img.squeeze(2) * (1 - img_msk_rebuilt) + x.to(img) * img_msk_rebuilt
|
||||
|
||||
x = x.clamp(-1, 1)
|
||||
x = x.transpose(0, 1)
|
||||
return x
|
||||
|
||||
@ -190,6 +190,21 @@ class Flux(nn.Module):
|
||||
v = swap_scale_shift(v)
|
||||
k = k.replace("norm_out.linear", "final_layer.adaLN_modulation.1")
|
||||
new_sd[k] = v
|
||||
# elif not first_key.startswith("diffusion_model.") and not first_key.startswith("transformer."):
|
||||
# for k,v in sd.items():
|
||||
# if "double" in k:
|
||||
# k = k.replace(".processor.proj_lora1.", ".img_attn.proj.lora_")
|
||||
# k = k.replace(".processor.proj_lora2.", ".txt_attn.proj.lora_")
|
||||
# k = k.replace(".processor.qkv_lora1.", ".img_attn.qkv.lora_")
|
||||
# k = k.replace(".processor.qkv_lora2.", ".txt_attn.qkv.lora_")
|
||||
# else:
|
||||
# k = k.replace(".processor.qkv_lora.", ".linear1_qkv.lora_")
|
||||
# k = k.replace(".processor.proj_lora.", ".linear2.lora_")
|
||||
|
||||
# k = "diffusion_model." + k
|
||||
# new_sd[k] = v
|
||||
# from mmgp import safetensors2
|
||||
# safetensors2.torch_write_file(new_sd, "fff.safetensors")
|
||||
else:
|
||||
new_sd = sd
|
||||
return new_sd
|
||||
|
||||
@ -138,10 +138,12 @@ def prepare_kontext(
|
||||
target_width: int | None = None,
|
||||
target_height: int | None = None,
|
||||
bs: int = 1,
|
||||
|
||||
img_mask = None,
|
||||
) -> tuple[dict[str, Tensor], int, int]:
|
||||
# load and encode the conditioning image
|
||||
|
||||
res_match_output = img_mask is not None
|
||||
|
||||
img_cond_seq = None
|
||||
img_cond_seq_ids = None
|
||||
if img_cond_list == None: img_cond_list = []
|
||||
@ -150,9 +152,11 @@ def prepare_kontext(
|
||||
for cond_no, img_cond in enumerate(img_cond_list):
|
||||
width, height = img_cond.size
|
||||
aspect_ratio = width / height
|
||||
|
||||
# Kontext is trained on specific resolutions, using one of them is recommended
|
||||
_, width, height = min((abs(aspect_ratio - w / h), w, h) for w, h in PREFERED_KONTEXT_RESOLUTIONS)
|
||||
if res_match_output:
|
||||
width, height = target_width, target_height
|
||||
else:
|
||||
# Kontext is trained on specific resolutions, using one of them is recommended
|
||||
_, width, height = min((abs(aspect_ratio - w / h), w, h) for w, h in PREFERED_KONTEXT_RESOLUTIONS)
|
||||
width = 2 * int(width / 16)
|
||||
height = 2 * int(height / 16)
|
||||
|
||||
@ -193,6 +197,19 @@ def prepare_kontext(
|
||||
"img_cond_seq": img_cond_seq,
|
||||
"img_cond_seq_ids": img_cond_seq_ids,
|
||||
}
|
||||
if img_mask is not None:
|
||||
from shared.utils.utils import convert_image_to_tensor, convert_tensor_to_image
|
||||
# image_height, image_width = calculate_new_dimensions(ref_height, ref_width, image_height, image_width, False, block_size=multiple_of)
|
||||
image_mask_latents = convert_image_to_tensor(img_mask.resize((target_width // 16, target_height // 16), resample=Image.Resampling.LANCZOS))
|
||||
image_mask_latents = torch.where(image_mask_latents>-0.5, 1., 0. )[0:1]
|
||||
image_mask_rebuilt = image_mask_latents.repeat_interleave(16, dim=-1).repeat_interleave(16, dim=-2).unsqueeze(0)
|
||||
# convert_tensor_to_image( image_mask_rebuilt.squeeze(0).repeat(3,1,1)).save("mmm.png")  # debug output, disabled
|
||||
image_mask_latents = image_mask_latents.reshape(1, -1, 1).to(device)
|
||||
return_dict.update({
|
||||
"img_msk_latents": image_mask_latents,
|
||||
"img_msk_rebuilt": image_mask_rebuilt,
|
||||
})
|
||||
|
||||
img = get_noise(
|
||||
bs,
|
||||
target_height,
|
||||
@ -264,6 +281,9 @@ def denoise(
|
||||
loras_slists=None,
|
||||
unpack_latent = None,
|
||||
joint_pass= False,
|
||||
img_msk_latents = None,
|
||||
img_msk_rebuilt = None,
|
||||
denoising_strength = 1,
|
||||
):
|
||||
|
||||
kwargs = {'pipeline': pipeline, 'callback': callback, "img_len" : img.shape[1], "siglip_embedding": siglip_embedding, "siglip_embedding_ids": siglip_embedding_ids}
|
||||
@ -271,6 +291,21 @@ def denoise(
|
||||
if callback != None:
|
||||
callback(-1, None, True)
|
||||
|
||||
original_image_latents = None if img_cond_seq is None else img_cond_seq.clone()
|
||||
|
||||
morph, first_step = False, 0
|
||||
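# inpainting with denoising_strength < 1: start from a partially re-noised copy of the original image latents and skip the corresponding early timesteps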
if img_msk_latents is not None:
|
||||
randn = torch.randn_like(original_image_latents)
|
||||
if denoising_strength < 1.:
|
||||
first_step = int(len(timesteps) * (1. - denoising_strength))
|
||||
if not morph:
|
||||
latent_noise_factor = timesteps[first_step]
|
||||
latents = original_image_latents * (1.0 - latent_noise_factor) + randn * latent_noise_factor
|
||||
img = latents.to(img)
|
||||
latents = None
|
||||
timesteps = timesteps[first_step:]
|
||||
|
||||
|
||||
updated_num_steps= len(timesteps) -1
|
||||
if callback != None:
|
||||
from shared.utils.loras_mutipliers import update_loras_slists
|
||||
@ -280,10 +315,14 @@ def denoise(
|
||||
# this is ignored for schnell
|
||||
guidance_vec = torch.full((img.shape[0],), guidance, device=img.device, dtype=img.dtype)
|
||||
for i, (t_curr, t_prev) in enumerate(zip(timesteps[:-1], timesteps[1:])):
|
||||
offload.set_step_no_for_lora(model, i)
|
||||
offload.set_step_no_for_lora(model, first_step + i)
|
||||
if pipeline._interrupt:
|
||||
return None
|
||||
|
||||
if img_msk_latents is not None and denoising_strength <1. and i == first_step and morph:
|
||||
latent_noise_factor = t_curr/1000
|
||||
img = original_image_latents * (1.0 - latent_noise_factor) + img * latent_noise_factor
|
||||
|
||||
t_vec = torch.full((img.shape[0],), t_curr, dtype=img.dtype, device=img.device)
|
||||
img_input = img
|
||||
img_input_ids = img_ids
|
||||
@ -333,6 +372,14 @@ def denoise(
|
||||
pred = neg_pred + real_guidance_scale * (pred - neg_pred)
|
||||
|
||||
img += (t_prev - t_curr) * pred
|
||||
|
||||
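# inpainting: after each step, reset the area outside the mask to the original latents re-noised to the upcoming timestep's level, so only the masked region is actually denoised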
if img_msk_latents is not None:
|
||||
latent_noise_factor = t_prev
|
||||
# noisy_image = original_image_latents * (1.0 - latent_noise_factor) + torch.randn_like(original_image_latents) * latent_noise_factor
|
||||
noisy_image = original_image_latents * (1.0 - latent_noise_factor) + randn * latent_noise_factor
|
||||
img = noisy_image * (1-img_msk_latents) + img_msk_latents * img
|
||||
noisy_image = None
|
||||
|
||||
if callback is not None:
|
||||
preview = unpack_latent(img).transpose(0,1)
|
||||
callback(i, preview, False)
|
||||
|
||||
@ -640,6 +640,38 @@ configs = {
|
||||
shift_factor=0.1159,
|
||||
),
|
||||
),
|
||||
"flux-dev-umo": ModelSpec(
|
||||
repo_id="",
|
||||
repo_flow="",
|
||||
repo_ae="ckpts/flux_vae.safetensors",
|
||||
params=FluxParams(
|
||||
in_channels=64,
|
||||
out_channels=64,
|
||||
vec_in_dim=768,
|
||||
context_in_dim=4096,
|
||||
hidden_size=3072,
|
||||
mlp_ratio=4.0,
|
||||
num_heads=24,
|
||||
depth=19,
|
||||
depth_single_blocks=38,
|
||||
axes_dim=[16, 56, 56],
|
||||
theta=10_000,
|
||||
qkv_bias=True,
|
||||
guidance_embed=True,
|
||||
eso= True,
|
||||
),
|
||||
ae_params=AutoEncoderParams(
|
||||
resolution=256,
|
||||
in_channels=3,
|
||||
ch=128,
|
||||
out_ch=3,
|
||||
ch_mult=[1, 2, 4, 4],
|
||||
num_res_blocks=2,
|
||||
z_channels=16,
|
||||
scale_factor=0.3611,
|
||||
shift_factor=0.1159,
|
||||
),
|
||||
),
|
||||
}
|
||||
|
||||
|
||||
|
||||
@ -861,11 +861,6 @@ class HunyuanVideoSampler(Inference):
|
||||
freqs_cos, freqs_sin = self.get_rotary_pos_embed(target_frame_num, target_height, target_width, enable_RIFLEx)
|
||||
else:
|
||||
if self.avatar:
|
||||
w, h = input_ref_images.size
|
||||
target_height, target_width = calculate_new_dimensions(target_height, target_width, h, w, fit_into_canvas)
|
||||
if target_width != w or target_height != h:
|
||||
input_ref_images = input_ref_images.resize((target_width,target_height), resample=Image.Resampling.LANCZOS)
|
||||
|
||||
concat_dict = {'mode': 'timecat', 'bias': -1}
|
||||
freqs_cos, freqs_sin = self.get_rotary_pos_embed_new(129, target_height, target_width, concat_dict)
|
||||
else:
|
||||
|
||||
@ -51,6 +51,23 @@ class family_handler():
|
||||
extra_model_def["tea_cache"] = True
|
||||
extra_model_def["mag_cache"] = True
|
||||
|
||||
if base_model_type in ["hunyuan_custom_edit"]:
|
||||
extra_model_def["guide_preprocessing"] = {
|
||||
"selection": ["MV", "PV"],
|
||||
}
|
||||
|
||||
extra_model_def["mask_preprocessing"] = {
|
||||
"selection": ["A", "NA"],
|
||||
"default" : "NA"
|
||||
}
|
||||
|
||||
if base_model_type in ["hunyuan_custom_audio", "hunyuan_custom_edit", "hunyuan_custom"]:
|
||||
extra_model_def["image_ref_choices"] = {
|
||||
"choices": [("Reference Image", "I")],
|
||||
"letters_filter":"I",
|
||||
"visible": False,
|
||||
}
|
||||
|
||||
if base_model_type in ["hunyuan_avatar"]: extra_model_def["no_background_removal"] = True
|
||||
|
||||
if base_model_type in ["hunyuan_custom", "hunyuan_custom_edit", "hunyuan_custom_audio", "hunyuan_avatar"]:
|
||||
@ -141,6 +158,18 @@ class family_handler():
|
||||
|
||||
return hunyuan_model, pipe
|
||||
|
||||
@staticmethod
|
||||
def fix_settings(base_model_type, settings_version, model_def, ui_defaults):
|
||||
if settings_version<2.33:
|
||||
if base_model_type in ["hunyuan_custom_edit"]:
|
||||
video_prompt_type= ui_defaults["video_prompt_type"]
|
||||
if "P" in video_prompt_type and "M" in video_prompt_type:
|
||||
video_prompt_type = video_prompt_type.replace("M","")
|
||||
ui_defaults["video_prompt_type"] = video_prompt_type
|
||||
|
||||
|
||||
pass
|
||||
|
||||
@staticmethod
|
||||
def update_default_settings(base_model_type, model_def, ui_defaults):
|
||||
ui_defaults["embedded_guidance_scale"]= 6.0
|
||||
|
||||
@ -300,9 +300,6 @@ class LTXV:
|
||||
prefix_size, height, width = input_video.shape[-3:]
|
||||
else:
|
||||
if image_start != None:
|
||||
frame_width, frame_height = image_start.size
|
||||
if fit_into_canvas != None:
|
||||
height, width = calculate_new_dimensions(height, width, frame_height, frame_width, fit_into_canvas, 32)
|
||||
conditioning_media_paths.append(image_start.unsqueeze(1))
|
||||
conditioning_start_frames.append(0)
|
||||
conditioning_control_frames.append(False)
|
||||
|
||||
@ -26,6 +26,15 @@ class family_handler():
|
||||
extra_model_def["sliding_window"] = True
|
||||
extra_model_def["image_prompt_types_allowed"] = "TSEV"
|
||||
|
||||
extra_model_def["guide_preprocessing"] = {
|
||||
"selection": ["", "PV", "DV", "EV", "V"],
|
||||
"labels" : { "V": "Use LTXV raw format"}
|
||||
}
|
||||
|
||||
extra_model_def["mask_preprocessing"] = {
|
||||
"selection": ["", "A", "NA", "XA", "XNA"],
|
||||
}
|
||||
|
||||
return extra_model_def
|
||||
|
||||
@staticmethod
|
||||
|
||||
@ -28,7 +28,7 @@ from transformers import Qwen2_5_VLForConditionalGeneration, Qwen2Tokenizer, Aut
|
||||
from .autoencoder_kl_qwenimage import AutoencoderKLQwenImage
|
||||
from diffusers import FlowMatchEulerDiscreteScheduler
|
||||
from PIL import Image
|
||||
from shared.utils.utils import calculate_new_dimensions
|
||||
from shared.utils.utils import calculate_new_dimensions, convert_image_to_tensor, convert_tensor_to_image
|
||||
|
||||
XLA_AVAILABLE = False
|
||||
|
||||
@ -563,6 +563,8 @@ class QwenImagePipeline(): #DiffusionPipeline
|
||||
callback_on_step_end_tensor_inputs: List[str] = ["latents"],
|
||||
max_sequence_length: int = 512,
|
||||
image = None,
|
||||
image_mask = None,
|
||||
denoising_strength = 0,
|
||||
callback=None,
|
||||
pipeline=None,
|
||||
loras_slists=None,
|
||||
@ -683,6 +685,7 @@ class QwenImagePipeline(): #DiffusionPipeline
|
||||
device = "cuda"
|
||||
|
||||
prompt_image = None
|
||||
image_mask_latents = None
|
||||
if image is not None and not (isinstance(image, torch.Tensor) and image.size(1) == self.latent_channels):
|
||||
image = image[0] if isinstance(image, list) else image
|
||||
image_height, image_width = self.image_processor.get_default_height_width(image)
|
||||
@ -694,14 +697,32 @@ class QwenImagePipeline(): #DiffusionPipeline
|
||||
image_width = image_width // multiple_of * multiple_of
|
||||
image_height = image_height // multiple_of * multiple_of
|
||||
ref_height, ref_width = 1568, 672
|
||||
if height * width < ref_height * ref_width: ref_height , ref_width = height , width
|
||||
if image_height * image_width > ref_height * ref_width:
|
||||
image_height, image_width = calculate_new_dimensions(ref_height, ref_width, image_height, image_width, False, block_size=multiple_of)
|
||||
|
||||
image = image.resize((image_width,image_height), resample=Image.Resampling.LANCZOS)
|
||||
if image_mask is None:
|
||||
if height * width < ref_height * ref_width: ref_height , ref_width = height , width
|
||||
if image_height * image_width > ref_height * ref_width:
|
||||
image_height, image_width = calculate_new_dimensions(ref_height, ref_width, image_height, image_width, False, block_size=multiple_of)
|
||||
if (image_width,image_height) != image.size:
|
||||
image = image.resize((image_width,image_height), resample=Image.Resampling.LANCZOS)
|
||||
else:
|
||||
# _, image_width, image_height = min(
|
||||
# (abs(aspect_ratio - w / h), w, h) for w, h in PREFERRED_QWENIMAGE_RESOLUTIONS
|
||||
# )
|
||||
image_height, image_width = calculate_new_dimensions(height, width, image_height, image_width, False, block_size=multiple_of)
|
||||
# image_height, image_width = calculate_new_dimensions(ref_height, ref_width, image_height, image_width, False, block_size=multiple_of)
|
||||
height, width = image_height, image_width
|
||||
image_mask_latents = convert_image_to_tensor(image_mask.resize((width // 16, height // 16), resample=Image.Resampling.LANCZOS))
|
||||
image_mask_latents = torch.where(image_mask_latents>-0.5, 1., 0. )[0:1]
|
||||
image_mask_rebuilt = image_mask_latents.repeat_interleave(16, dim=-1).repeat_interleave(16, dim=-2).unsqueeze(0)
|
||||
# convert_tensor_to_image( image_mask_rebuilt.squeeze(0).repeat(3,1,1)).save("mmm.png")
|
||||
image_mask_latents = image_mask_latents.reshape(1, -1, 1).to(device)
|
||||
|
||||
prompt_image = image
|
||||
image = self.image_processor.preprocess(image, image_height, image_width)
|
||||
image = image.unsqueeze(2)
|
||||
if image.size != (image_width, image_height):
|
||||
image = image.resize((image_width, image_height), resample=Image.Resampling.LANCZOS)
|
||||
|
||||
# image.save("nnn.png")
|
||||
image = convert_image_to_tensor(image).unsqueeze(0).unsqueeze(2)
|
||||
|
||||
has_neg_prompt = negative_prompt is not None or (
|
||||
negative_prompt_embeds is not None and negative_prompt_embeds_mask is not None
|
||||
@ -744,6 +765,8 @@ class QwenImagePipeline(): #DiffusionPipeline
|
||||
generator,
|
||||
latents,
|
||||
)
|
||||
original_image_latents = None if image_latents is None else image_latents.clone()
|
||||
|
||||
if image is not None:
|
||||
img_shapes = [
|
||||
[
|
||||
@ -788,6 +811,18 @@ class QwenImagePipeline(): #DiffusionPipeline
|
||||
negative_txt_seq_lens = (
|
||||
negative_prompt_embeds_mask.sum(dim=1).tolist() if negative_prompt_embeds_mask is not None else None
|
||||
)
|
||||
morph, first_step = False, 0
|
||||
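# same partial-denoise setup as the Flux path: with a mask and denoising_strength < 1, start from re-noised original latents and drop the first timesteps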
if image_mask_latents is not None:
|
||||
randn = torch.randn_like(original_image_latents)
|
||||
if denoising_strength < 1.:
|
||||
first_step = int(len(timesteps) * (1. - denoising_strength))
|
||||
if not morph:
|
||||
latent_noise_factor = timesteps[first_step]/1000
|
||||
# latents = original_image_latents * (1.0 - latent_noise_factor) + torch.randn_like(original_image_latents) * latent_noise_factor
|
||||
latents = original_image_latents * (1.0 - latent_noise_factor) + randn * latent_noise_factor
|
||||
timesteps = timesteps[first_step:]
|
||||
self.scheduler.timesteps = timesteps
|
||||
self.scheduler.sigmas= self.scheduler.sigmas[first_step:]
|
||||
|
||||
# 6. Denoising loop
|
||||
self.scheduler.set_begin_index(0)
|
||||
@ -797,10 +832,16 @@ class QwenImagePipeline(): #DiffusionPipeline
|
||||
update_loras_slists(self.transformer, loras_slists, updated_num_steps)
|
||||
callback(-1, None, True, override_num_inference_steps = updated_num_steps)
|
||||
|
||||
|
||||
for i, t in enumerate(timesteps):
|
||||
offload.set_step_no_for_lora(self.transformer, first_step + i)
|
||||
if self.interrupt:
|
||||
continue
|
||||
|
||||
if image_mask_latents is not None and denoising_strength <1. and i == first_step and morph:
|
||||
latent_noise_factor = t/1000
|
||||
latents = original_image_latents * (1.0 - latent_noise_factor) + latents * latent_noise_factor
|
||||
|
||||
self._current_timestep = t
|
||||
# broadcast to batch dimension in a way that's compatible with ONNX/Core ML
|
||||
timestep = t.expand(latents.shape[0]).to(latents.dtype)
|
||||
@ -865,6 +906,13 @@ class QwenImagePipeline(): #DiffusionPipeline
|
||||
# compute the previous noisy sample x_t -> x_t-1
|
||||
latents_dtype = latents.dtype
|
||||
latents = self.scheduler.step(noise_pred, t, latents, return_dict=False)[0]
|
||||
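# inpainting: keep the original image latents (re-noised to the next timestep's level) outside the mask; only the masked latents keep evolving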
if image_mask_latents is not None:
|
||||
next_t = timesteps[i+1] if i<len(timesteps)-1 else 0
|
||||
latent_noise_factor = next_t / 1000
|
||||
# noisy_image = original_image_latents * (1.0 - latent_noise_factor) + torch.randn_like(original_image_latents) * latent_noise_factor
|
||||
noisy_image = original_image_latents * (1.0 - latent_noise_factor) + randn * latent_noise_factor
|
||||
latents = noisy_image * (1-image_mask_latents) + image_mask_latents * latents
|
||||
noisy_image = None
|
||||
|
||||
if latents.dtype != latents_dtype:
|
||||
if torch.backends.mps.is_available():
|
||||
@ -878,7 +926,7 @@ class QwenImagePipeline(): #DiffusionPipeline
|
||||
|
||||
self._current_timestep = None
|
||||
if output_type == "latent":
|
||||
image = latents
|
||||
output_image = latents
|
||||
else:
|
||||
latents = self._unpack_latents(latents, height, width, self.vae_scale_factor)
|
||||
latents = latents.to(self.vae.dtype)
|
||||
@ -891,7 +939,9 @@ class QwenImagePipeline(): #DiffusionPipeline
|
||||
latents.device, latents.dtype
|
||||
)
|
||||
latents = latents / latents_std + latents_mean
|
||||
image = self.vae.decode(latents, return_dict=False)[0][:, :, 0]
|
||||
output_image = self.vae.decode(latents, return_dict=False)[0][:, :, 0]
|
||||
if image_mask is not None:
|
||||
output_image = image.squeeze(2) * (1 - image_mask_rebuilt) + output_image.to(image) * image_mask_rebuilt
|
||||
|
||||
|
||||
return image
|
||||
return output_image
|
||||
|
||||
@ -9,7 +9,7 @@ def get_qwen_text_encoder_filename(text_encoder_quantization):
|
||||
class family_handler():
|
||||
@staticmethod
|
||||
def query_model_def(base_model_type, model_def):
|
||||
model_def_output = {
|
||||
extra_model_def = {
|
||||
"image_outputs" : True,
|
||||
"sample_solvers":[
|
||||
("Default", "default"),
|
||||
@ -18,8 +18,19 @@ class family_handler():
|
||||
"lock_image_refs_ratios": True,
|
||||
}
|
||||
|
||||
if base_model_type in ["qwen_image_edit_20B"]:
|
||||
extra_model_def["inpaint_support"] = True
|
||||
extra_model_def["image_ref_choices"] = {
|
||||
"choices": [
|
||||
("None", ""),
|
||||
("Conditional Images is first Main Subject / Landscape and may be followed by People / Objects", "KI"),
|
||||
("Conditional Images are People / Objects", "I"),
|
||||
],
|
||||
"letters_filter": "KI",
|
||||
}
|
||||
extra_model_def["background_removal_label"]= "Remove Backgrounds only behind People / Objects except main Subject / Landscape"
|
||||
|
||||
return model_def_output
|
||||
return extra_model_def
|
||||
|
||||
@staticmethod
|
||||
def query_supported_types():
|
||||
@ -75,14 +86,18 @@ class family_handler():
|
||||
if ui_defaults.get("sample_solver", "") == "":
|
||||
ui_defaults["sample_solver"] = "default"
|
||||
|
||||
if settings_version < 2.32:
|
||||
ui_defaults["denoising_strength"] = 1.
|
||||
|
||||
@staticmethod
|
||||
def update_default_settings(base_model_type, model_def, ui_defaults):
|
||||
ui_defaults.update({
|
||||
"guidance_scale": 4,
|
||||
"sample_solver": "default",
|
||||
})
|
||||
if model_def.get("reference_image", False):
|
||||
if base_model_type in ["qwen_image_edit_20B"]:
|
||||
ui_defaults.update({
|
||||
"video_prompt_type": "KI",
|
||||
"denoising_strength" : 1.,
|
||||
})
|
||||
|
||||
|
||||
@ -103,6 +103,8 @@ class model_factory():
|
||||
n_prompt = None,
|
||||
sampling_steps: int = 20,
|
||||
input_ref_images = None,
|
||||
image_guide= None,
|
||||
image_mask= None,
|
||||
width= 832,
|
||||
height=480,
|
||||
guide_scale: float = 4,
|
||||
@ -114,6 +116,7 @@ class model_factory():
|
||||
VAE_tile_size = None,
|
||||
joint_pass = True,
|
||||
sample_solver='default',
|
||||
denoising_strength = 1.,
|
||||
**bbargs
|
||||
):
|
||||
# Generate with different aspect ratios
|
||||
@ -174,8 +177,9 @@ class model_factory():
|
||||
|
||||
if n_prompt is None or len(n_prompt) == 0:
|
||||
n_prompt= "text, watermark, copyright, blurry, low resolution"
|
||||
|
||||
if input_ref_images is not None:
|
||||
if image_guide is not None:
|
||||
input_ref_images = [image_guide]
|
||||
elif input_ref_images is not None:
|
||||
# image stitching method
|
||||
stiched = input_ref_images[0]
|
||||
if "K" in video_prompt_type :
|
||||
@ -190,6 +194,7 @@ class model_factory():
|
||||
prompt=input_prompt,
|
||||
negative_prompt=n_prompt,
|
||||
image = input_ref_images,
|
||||
image_mask = image_mask,
|
||||
width=width,
|
||||
height=height,
|
||||
num_inference_steps=sampling_steps,
|
||||
@ -199,6 +204,7 @@ class model_factory():
|
||||
pipeline=self,
|
||||
loras_slists=loras_slists,
|
||||
joint_pass = joint_pass,
|
||||
denoising_strength=denoising_strength,
|
||||
generator=torch.Generator(device="cuda").manual_seed(seed)
|
||||
)
|
||||
if image is None: return None
|
||||
|
||||
@ -261,7 +261,7 @@ class WanAny2V:
|
||||
def vace_latent(self, z, m):
|
||||
return [torch.cat([zz, mm], dim=0) for zz, mm in zip(z, m)]
|
||||
|
||||
def fit_image_into_canvas(self, ref_img, image_size, canvas_tf_bg, device, fill_max = False, outpainting_dims = None, return_mask = False):
|
||||
def fit_image_into_canvas(self, ref_img, image_size, canvas_tf_bg, device, full_frame = False, outpainting_dims = None, return_mask = False):
|
||||
from shared.utils.utils import save_image
|
||||
ref_width, ref_height = ref_img.size
|
||||
if (ref_height, ref_width) == image_size and outpainting_dims == None:
|
||||
@ -270,18 +270,23 @@ class WanAny2V:
|
||||
else:
|
||||
if outpainting_dims != None:
|
||||
final_height, final_width = image_size
|
||||
canvas_height, canvas_width, margin_top, margin_left = get_outpainting_frame_location(final_height, final_width, outpainting_dims, 8)
|
||||
canvas_height, canvas_width, margin_top, margin_left = get_outpainting_frame_location(final_height, final_width, outpainting_dims, 1)
|
||||
else:
|
||||
canvas_height, canvas_width = image_size
|
||||
scale = min(canvas_height / ref_height, canvas_width / ref_width)
|
||||
new_height = int(ref_height * scale)
|
||||
new_width = int(ref_width * scale)
|
||||
if fill_max and (canvas_height - new_height) < 16:
|
||||
if full_frame:
|
||||
new_height = canvas_height
|
||||
if fill_max and (canvas_width - new_width) < 16:
|
||||
new_width = canvas_width
|
||||
top = (canvas_height - new_height) // 2
|
||||
left = (canvas_width - new_width) // 2
|
||||
top = left = 0
|
||||
else:
|
||||
# if fill_max and (canvas_height - new_height) < 16:
|
||||
# new_height = canvas_height
|
||||
# if fill_max and (canvas_width - new_width) < 16:
|
||||
# new_width = canvas_width
|
||||
scale = min(canvas_height / ref_height, canvas_width / ref_width)
|
||||
new_height = int(ref_height * scale)
|
||||
new_width = int(ref_width * scale)
|
||||
top = (canvas_height - new_height) // 2
|
||||
left = (canvas_width - new_width) // 2
|
||||
ref_img = ref_img.resize((new_width, new_height), resample=Image.Resampling.LANCZOS)
|
||||
ref_img = TF.to_tensor(ref_img).sub_(0.5).div_(0.5).unsqueeze(1)
|
||||
if outpainting_dims != None:
|
||||
@ -302,7 +307,7 @@ class WanAny2V:
|
||||
canvas = canvas.to(device)
|
||||
return ref_img.to(device), canvas
|
||||
|
||||
def prepare_source(self, src_video, src_mask, src_ref_images, total_frames, image_size, device, keep_video_guide_frames= [], start_frame = 0, fit_into_canvas = None, pre_src_video = None, inject_frames = [], outpainting_dims = None, any_background_ref = False):
|
||||
def prepare_source(self, src_video, src_mask, src_ref_images, total_frames, image_size, device, keep_video_guide_frames= [], start_frame = 0, pre_src_video = None, inject_frames = [], outpainting_dims = None, any_background_ref = False):
|
||||
image_sizes = []
|
||||
trim_video_guide = len(keep_video_guide_frames)
|
||||
def conv_tensor(t, device):
|
||||
@ -533,22 +538,16 @@ class WanAny2V:
|
||||
any_end_frame = False
|
||||
if image_start is None:
|
||||
if infinitetalk:
|
||||
new_shot = "Q" in video_prompt_type
|
||||
if input_frames is not None:
|
||||
image_ref = input_frames[:, 0]
|
||||
if input_video is None: input_video = input_frames[:, 0:1]
|
||||
new_shot = "Q" in video_prompt_type
|
||||
else:
|
||||
if pre_video_frame is None:
|
||||
new_shot = True
|
||||
else:
|
||||
if input_ref_images is None:
|
||||
input_ref_images, new_shot = [pre_video_frame], False
|
||||
else:
|
||||
input_ref_images, new_shot = [img.resize(pre_video_frame.size, resample=Image.Resampling.LANCZOS) for img in input_ref_images], "Q" in video_prompt_type
|
||||
if input_ref_images is None: raise Exception("Missing Reference Image")
|
||||
if input_ref_images is None:
|
||||
if pre_video_frame is None: raise Exception("Missing Reference Image")
|
||||
input_ref_images, new_shot = [pre_video_frame], False
|
||||
new_shot = new_shot and window_no <= len(input_ref_images)
|
||||
image_ref = convert_image_to_tensor(input_ref_images[ min(window_no, len(input_ref_images))-1 ])
|
||||
if new_shot:
|
||||
if new_shot or input_video is None:
|
||||
input_video = image_ref.unsqueeze(1)
|
||||
else:
|
||||
color_correction_strength = 0 #disable color correction as transition frames between shots may have a completely different color level than the colors of the new shot
|
||||
@ -847,7 +846,7 @@ class WanAny2V:
|
||||
for i, t in enumerate(tqdm(timesteps)):
|
||||
guide_scale, guidance_switch_done, trans, denoising_extra = update_guidance(i, t, guide_scale, guide2_scale, guidance_switch_done, switch_threshold, trans, 2, denoising_extra)
|
||||
guide_scale, guidance_switch2_done, trans, denoising_extra = update_guidance(i, t, guide_scale, guide3_scale, guidance_switch2_done, switch2_threshold, trans, 3, denoising_extra)
|
||||
offload.set_step_no_for_lora(trans, i)
|
||||
offload.set_step_no_for_lora(trans, start_step_no + i)
|
||||
timestep = torch.stack([t])
|
||||
|
||||
if timestep_injection:
|
||||
|
||||
@ -35,7 +35,7 @@ class family_handler():
|
||||
"label" : "Generation Type"
|
||||
}
|
||||
|
||||
extra_model_def["image_prompt_types_allowed"] = "TSEV"
|
||||
extra_model_def["image_prompt_types_allowed"] = "TSV"
|
||||
|
||||
|
||||
return extra_model_def
|
||||
@ -66,7 +66,11 @@ class family_handler():
|
||||
def query_family_infos():
|
||||
return {}
|
||||
|
||||
|
||||
@staticmethod
|
||||
def get_rgb_factors(base_model_type ):
|
||||
from shared.RGB_factors import get_rgb_factors
|
||||
latent_rgb_factors, latent_rgb_factors_bias = get_rgb_factors("wan", base_model_type)
|
||||
return latent_rgb_factors, latent_rgb_factors_bias
|
||||
|
||||
@staticmethod
|
||||
def query_model_files(computeList, base_model_type, model_filename, text_encoder_quantization):
|
||||
|
||||
@ -110,18 +110,79 @@ class family_handler():
|
||||
"tea_cache" : not (base_model_type in ["i2v_2_2", "ti2v_2_2" ] or multiple_submodels),
|
||||
"mag_cache" : True,
|
||||
"keep_frames_video_guide_not_supported": base_model_type in ["infinitetalk"],
|
||||
"convert_image_guide_to_video" : True,
|
||||
"sample_solvers":[
|
||||
("unipc", "unipc"),
|
||||
("euler", "euler"),
|
||||
("dpm++", "dpm++"),
|
||||
("flowmatch causvid", "causvid"), ]
|
||||
})
|
||||
|
||||
|
||||
if base_model_type in ["t2v"]:
|
||||
extra_model_def["guide_custom_choices"] = {
|
||||
"choices":[("Use Text Prompt Only", ""),("Video to Video guided by Text Prompt", "GUV")],
|
||||
"default": "",
|
||||
"letters_filter": "GUV",
|
||||
"label": "Video to Video"
|
||||
}
|
||||
|
||||
if base_model_type in ["infinitetalk"]:
|
||||
extra_model_def["no_background_removal"] = True
|
||||
# extra_model_def["at_least_one_image_ref_needed"] = True
|
||||
extra_model_def["all_image_refs_are_background_ref"] = True
|
||||
extra_model_def["guide_custom_choices"] = {
|
||||
"choices":[
|
||||
("Images to Video, each Reference Image will start a new shot with a new Sliding Window - Sharp Transitions", "QKI"),
|
||||
("Images to Video, each Reference Image will start a new shot with a new Sliding Window - Smooth Transitions", "KI"),
|
||||
("Sparse Video to Video, one Image will by extracted from Video for each new Sliding Window - Sharp Transitions", "QRUV"),
|
||||
("Sparse Video to Video, one Image will by extracted from Video for each new Sliding Window - Smooth Transitions", "RUV"),
|
||||
("Video to Video, amount of motion transferred depends on Denoising Strength - Sharp Transitions", "GQUV"),
|
||||
("Video to Video, amount of motion transferred depends on Denoising Strength - Smooth Transitions", "GUV"),
|
||||
],
|
||||
"default": "KI",
|
||||
"letters_filter": "RGUVQKI",
|
||||
"label": "Video to Video",
|
||||
"show_label" : False,
|
||||
}
|
||||
|
||||
# extra_model_def["at_least_one_image_ref_needed"] = True
|
||||
if vace_class:
|
||||
extra_model_def["guide_preprocessing"] = {
|
||||
"selection": ["", "UV", "PV", "DV", "SV", "LV", "CV", "MV", "V", "PDV", "PSV", "PLV" , "DSV", "DLV", "SLV"],
|
||||
"labels" : { "V": "Use Vace raw format"}
|
||||
}
|
||||
extra_model_def["mask_preprocessing"] = {
|
||||
"selection": ["", "A", "NA", "XA", "XNA", "YA", "YNA", "WA", "WNA", "ZA", "ZNA"],
|
||||
}
|
||||
|
||||
extra_model_def["image_ref_choices"] = {
|
||||
"choices": [("None", ""),
|
||||
("Inject only People / Objects", "I"),
|
||||
("Inject Landscape and then People / Objects", "KI"),
|
||||
("Inject Frames and then People / Objects", "FI"),
|
||||
],
|
||||
"letters_filter": "KFI",
|
||||
}
|
||||
|
||||
if base_model_type in ["standin"] or vace_class:
|
||||
extra_model_def["lock_image_refs_ratios"] = True
|
||||
extra_model_def["background_removal_label"]= "Remove Backgrounds behind People / Objects, keep it for Landscape or positioned Frames"
|
||||
|
||||
if base_model_type in ["standin"]:
|
||||
extra_model_def["lock_image_refs_ratios"] = True
|
||||
extra_model_def["image_ref_choices"] = {
|
||||
"choices": [
|
||||
("No Reference Image", ""),
|
||||
("Reference Image is a Person Face", "I"),
|
||||
],
|
||||
"letters_filter":"I",
|
||||
}
|
||||
|
||||
if base_model_type in ["phantom_1.3B", "phantom_14B"]:
|
||||
extra_model_def["image_ref_choices"] = {
|
||||
"choices": [("Reference Image", "I")],
|
||||
"letters_filter":"I",
|
||||
"visible": False,
|
||||
}
|
||||
|
||||
if base_model_type in ["recam_1.3B"]:
|
||||
extra_model_def["keep_frames_video_guide_not_supported"] = True
|
||||
@ -141,6 +202,12 @@ class family_handler():
|
||||
"default": 1,
|
||||
"label" : "Camera Movement Type"
|
||||
}
|
||||
extra_model_def["guide_preprocessing"] = {
|
||||
"selection": ["UV"],
|
||||
"labels" : { "UV": "Control Video"},
|
||||
"visible" : False,
|
||||
}
|
||||
|
||||
if vace_class or base_model_type in ["infinitetalk"]:
|
||||
image_prompt_types_allowed = "TVL"
|
||||
elif base_model_type in ["ti2v_2_2"]:
|
||||
|
||||
@ -7,7 +7,6 @@ import psutil
|
||||
# import ffmpeg
|
||||
import imageio
|
||||
from PIL import Image
|
||||
|
||||
import cv2
|
||||
import torch
|
||||
import torch.nn.functional as F
|
||||
@ -33,6 +32,8 @@ model_in_GPU = False
|
||||
matanyone_in_GPU = False
|
||||
bfloat16_supported = False
|
||||
# SAM generator
|
||||
import copy
|
||||
|
||||
class MaskGenerator():
|
||||
def __init__(self, sam_checkpoint, device):
|
||||
global args_device
|
||||
@ -89,6 +90,7 @@ def get_frames_from_image(image_input, image_state):
|
||||
"last_frame_numer": 0,
|
||||
"fps": None
|
||||
}
|
||||
|
||||
image_info = "Image Name: N/A,\nFPS: N/A,\nTotal Frames: {},\nImage Size:{}".format(len(frames), image_size)
|
||||
set_image_encoder_patch()
|
||||
select_SAM()
|
||||
@ -717,27 +719,33 @@ def load_unload_models(selected):
|
||||
def get_vmc_event_handler():
|
||||
return load_unload_models
|
||||
|
||||
def export_to_vace_video_input(foreground_video_output):
|
||||
gr.Info("Masked Video Input transferred to Vace For Inpainting")
|
||||
return "V#" + str(time.time()), foreground_video_output
|
||||
|
||||
|
||||
def export_image(image_refs, image_output):
|
||||
gr.Info("Masked Image transferred to Current Video")
|
||||
def export_image(state, image_output):
|
||||
ui_settings = get_current_model_settings(state)
|
||||
image_refs = ui_settings["image_refs"]
|
||||
if image_refs == None:
|
||||
image_refs =[]
|
||||
image_refs.append( image_output)
|
||||
return image_refs
|
||||
ui_settings["image_refs"] = image_refs
|
||||
gr.Info("Masked Image transferred to Current Image Generator")
|
||||
return time.time()
|
||||
|
||||
def export_image_mask(image_input, image_mask):
|
||||
gr.Info("Input Image & Mask transferred to Current Video")
|
||||
return Image.fromarray(image_input), image_mask
|
||||
def export_image_mask(state, image_input, image_mask):
|
||||
ui_settings = get_current_model_settings(state)
|
||||
ui_settings["image_guide"] = Image.fromarray(image_input)
|
||||
ui_settings["image_mask"] = image_mask
|
||||
|
||||
gr.Info("Input Image & Mask transferred to Current Image Generator")
|
||||
return time.time()
|
||||
|
||||
|
||||
def export_to_current_video_engine( foreground_video_output, alpha_video_output):
|
||||
def export_to_current_video_engine(state, foreground_video_output, alpha_video_output):
|
||||
ui_settings = get_current_model_settings(state)
|
||||
ui_settings["video_guide"] = foreground_video_output
|
||||
ui_settings["video_mask"] = alpha_video_output
|
||||
|
||||
gr.Info("Original Video and Full Mask have been transferred")
|
||||
# return "MV#" + str(time.time()), foreground_video_output, alpha_video_output
|
||||
return foreground_video_output, alpha_video_output
|
||||
return time.time()
|
||||
|
||||
|
||||
def teleport_to_video_tab(tab_state):
|
||||
@ -746,15 +754,29 @@ def teleport_to_video_tab(tab_state):
|
||||
return gr.Tabs(selected="video_gen")
|
||||
|
||||
|
||||
def display(tabs, tab_state, server_config, vace_video_input, vace_image_input, vace_video_mask, vace_image_mask, vace_image_refs):
|
||||
def display(tabs, tab_state, state, refresh_form_trigger, server_config, get_current_model_settings_fn): #, vace_video_input, vace_image_input, vace_video_mask, vace_image_mask, vace_image_refs):
|
||||
# my_tab.select(fn=load_unload_models, inputs=[], outputs=[])
|
||||
global image_output_codec, video_output_codec
|
||||
global image_output_codec, video_output_codec, get_current_model_settings
|
||||
get_current_model_settings = get_current_model_settings_fn
|
||||
|
||||
image_output_codec = server_config.get("image_output_codec", None)
|
||||
video_output_codec = server_config.get("video_output_codec", None)
|
||||
|
||||
media_url = "https://github.com/pq-yang/MatAnyone/releases/download/media/"
|
||||
|
||||
click_brush_js = """
|
||||
() => {
|
||||
setTimeout(() => {
|
||||
const brushButton = document.querySelector('button[aria-label="Brush"]');
|
||||
if (brushButton) {
|
||||
brushButton.click();
|
||||
console.log('Brush button clicked');
|
||||
} else {
|
||||
console.log('Brush button not found');
|
||||
}
|
||||
}, 1000);
|
||||
} """
|
||||
|
||||
# download assets
|
||||
|
||||
gr.Markdown("<B>Mast Edition is provided by MatAnyone and VRAM optimized by DeepBeepMeep</B>")
|
||||
@@ -871,7 +893,7 @@ def display(tabs, tab_state, server_config, vace_video_input, vace_image_input,
template_frame = gr.Image(label="Start Frame", type="pil",interactive=True, elem_id="template_frame", visible=False, elem_classes="image")
with gr.Row():
clear_button_click = gr.Button(value="Clear Clicks", interactive=True, visible=False, min_width=100)
add_mask_button = gr.Button(value="Set Mask", interactive=True, visible=False, min_width=100)
add_mask_button = gr.Button(value="Add Mask", interactive=True, visible=False, min_width=100)
remove_mask_button = gr.Button(value="Remove Mask", interactive=True, visible=False, min_width=100) # no use
matting_button = gr.Button(value="Generate Video Matting", interactive=True, visible=False, min_width=100)
with gr.Row():
@@ -892,7 +914,7 @@ def display(tabs, tab_state, server_config, vace_video_input, vace_image_input,
with gr.Row(visible= True):
export_to_current_video_engine_btn = gr.Button("Export to Control Video Input and Video Mask Input", visible= False)

export_to_current_video_engine_btn.click( fn=export_to_current_video_engine, inputs= [foreground_video_output, alpha_video_output], outputs= [vace_video_input, vace_video_mask]).then( #video_prompt_video_guide_trigger,
export_to_current_video_engine_btn.click( fn=export_to_current_video_engine, inputs= [state, foreground_video_output, alpha_video_output], outputs= [refresh_form_trigger]).then( #video_prompt_video_guide_trigger,
fn=teleport_to_video_tab, inputs= [tab_state], outputs= [tabs])

@@ -1089,10 +1111,10 @@ def display(tabs, tab_state, server_config, vace_video_input, vace_image_input,
# with gr.Column(scale=2, visible= True):
export_image_mask_btn = gr.Button(value="Set to Control Image & Mask", visible=False, elem_classes="new_button")

export_image_btn.click( fn=export_image, inputs= [vace_image_refs, foreground_image_output], outputs= [vace_image_refs]).then( #video_prompt_video_guide_trigger,
fn=teleport_to_video_tab, inputs= [tab_state], outputs= [tabs])
export_image_mask_btn.click( fn=export_image_mask, inputs= [image_input, alpha_image_output], outputs= [vace_image_input, vace_image_mask]).then( #video_prompt_video_guide_trigger,
export_image_btn.click( fn=export_image, inputs= [state, foreground_image_output], outputs= [refresh_form_trigger]).then( #video_prompt_video_guide_trigger,
fn=teleport_to_video_tab, inputs= [tab_state], outputs= [tabs])
export_image_mask_btn.click( fn=export_image_mask, inputs= [state, image_input, alpha_image_output], outputs= [refresh_form_trigger]).then( #video_prompt_video_guide_trigger,
fn=teleport_to_video_tab, inputs= [tab_state], outputs= [tabs]).then(fn=None, inputs=None, outputs=None, js=click_brush_js)

# first step: get the image information
extract_frames_button.click(
@@ -1148,5 +1170,21 @@ def display(tabs, tab_state, server_config, vace_video_input, vace_image_input,
outputs=[foreground_image_output, alpha_image_output,foreground_image_output, alpha_image_output,bbox_info, export_image_btn, export_image_mask_btn]
)

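The "Set to Control Image & Mask" button above chains two server-side steps and then a pure client-side step whose `js=` payload auto-clicks the Brush tool, so the editor is ready to paint right after the transfer. A standalone hedged sketch of that chain; the selector and the 1-second delay mirror `click_brush_js`, while the tiny demo UI and handler are illustrative:

```python
# Sketch of a Gradio click chain ending in a client-side JS step (assumed demo components).
import gradio as gr

click_brush_js = """
() => {
  setTimeout(() => {
    const brushButton = document.querySelector('button[aria-label="Brush"]');
    if (brushButton) { brushButton.click(); }
  }, 1000);
}"""

def transfer(state):
    # ... copy image & mask into the current model settings here ...
    return "refreshed"

with gr.Blocks() as demo:
    state = gr.State({})
    refresh = gr.Text(visible=False)
    editor = gr.ImageEditor(label="Inpainting mask")
    btn = gr.Button("Set to Control Image & Mask")
    # 1) server-side transfer, 2) no-op step whose js= payload clicks the Brush button
    btn.click(fn=transfer, inputs=[state], outputs=[refresh]).then(
        fn=None, inputs=None, outputs=None, js=click_brush_js)
```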
nada = gr.State({})
# clear input
gr.on(
triggers=[image_input.clear], #image_input.change,
fn=restart,
inputs=[],
outputs=[
image_state,
interactive_state,
click_state,
foreground_image_output, alpha_image_output,
template_frame,
image_selection_slider, image_selection_slider, track_pause_number_slider,point_prompt, export_image_btn, export_image_mask_btn, bbox_info, clear_button_click,
add_mask_button, matting_button, template_frame, foreground_image_output, alpha_image_output, remove_mask_button, export_image_btn, export_image_mask_btn, mask_dropdown, nada, step2_title
],
queue=False,
show_progress=False)

@@ -2,7 +2,6 @@ import math
import torch
from typing import Optional, Union, Tuple


# @torch.jit.script
def get_similarity(mk: torch.Tensor,
ms: torch.Tensor,
@@ -59,6 +58,7 @@ def get_similarity(mk: torch.Tensor,
del two_ab
# similarity = (-a_sq + two_ab)

similarity =similarity.float()
if ms is not None:
similarity *= ms
similarity /= math.sqrt(CK)

@@ -73,5 +73,5 @@ def matanyone(processor, frames_np, mask, r_erode=0, r_dilate=0, n_warmup=10):
if ti > (n_warmup-1):
frames.append((com_np*255).astype(np.uint8))
phas.append((pha*255).astype(np.uint8))

# phas.append(np.clip(pha * 255, 0, 255).astype(np.uint8))
return frames, phas
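The hunk above converts the composited frame and alpha matte to uint8 with a plain `* 255` cast, while the commented-out line hints at a clipped variant. A tiny hedged NumPy sketch of the difference, with made-up sample values:

```python
# Illustrative only: float mattes slightly outside [0, 1] overflow with a bare cast,
# while the clipped variant from the commented-out line stays in range.
import numpy as np

pha = np.array([[-0.01, 0.5, 1.02]], dtype=np.float32)   # hypothetical alpha matte values

bare = (pha * 255).astype(np.uint8)                      # out-of-range values wrap around
clipped = np.clip(pha * 255, 0, 255).astype(np.uint8)    # clamped to 0 and 255

print(bare, clipped)
```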
@@ -23,7 +23,7 @@ librosa==0.11.0
speechbrain==1.0.3

# UI & interaction
gradio==5.23.0
gradio==5.29.0
dashscope
loguru

@@ -4,6 +4,7 @@ from typing import Any, Dict, List, Optional, Sequence, Tuple, Union, Literal

import gradio as gr
import PIL
import time
from PIL import Image as PILImage

FilePath = str
@@ -20,6 +21,9 @@ def get_list( objs):
return []
return [ obj[0] if isinstance(obj, tuple) else obj for obj in objs]

def record_last_action(st, last_action):
st["last_action"] = last_action
st["last_time"] = time.time()
class AdvancedMediaGallery:
def __init__(
self,
@@ -60,9 +64,10 @@ class AdvancedMediaGallery:
self.state: Optional[gr.State] = None
self._initial_state: Dict[str, Any] = {
"items": items,
"selected": (len(items) - 1) if items else None,
"selected": (len(items) - 1) if items else 0, # None,
"single": bool(single_image_mode),
"mode": self.media_mode,
"last_action": "",
}

# ---------------- helpers ----------------
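The gallery keeps all of its server-side bookkeeping in one plain dict stored in a `gr.State`, and `record_last_action` now stamps every mutation with a name and a timestamp so later handlers can tell what just happened. A small hedged sketch of that state shape; the keys mirror the diff, the sample values are made up:

```python
# Sketch of the gallery's per-session state dict as used in the diff.
import time

state = {
    "items": ["ref1.png", "ref2.png"],   # gallery contents (paths or (path, caption) tuples)
    "selected": 1,                       # server-side selected index, 0 when empty
    "single": False,                     # single-image mode hides reorder/clear buttons
    "mode": "image",                     # "image" or "video" filters accepted file types
    "last_action": "",                   # last mutation: "add", "remove", "move", "clear", ...
}

def record_last_action(st, last_action):
    st["last_action"] = last_action
    st["last_time"] = time.time()        # used to debounce spurious select events

record_last_action(state, "add")
```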
@@ -210,6 +215,13 @@ class AdvancedMediaGallery:

def _on_select(self, state: Dict[str, Any], gallery, evt: gr.SelectData) :
# Mirror the selected index into state and the gallery (server-side selected_index)

st = get_state(state)
last_time = st.get("last_time", None)
if last_time is not None and abs(time.time()- last_time)< 0.5: # crappy trick to detect if onselect is unwanted (buggy gallery)
# print(f"ignored:{time.time()}, real {st['selected']}")
return gr.update(selected_index=st["selected"]), st

idx = None
if evt is not None and hasattr(evt, "index"):
ix = evt.index
@@ -220,17 +232,28 @@ class AdvancedMediaGallery:
idx = ix[0] * max(1, int(self.columns)) + ix[1]
else:
idx = ix[0]
st = get_state(state)
n = len(get_list(gallery))
sel = idx if (idx is not None and 0 <= idx < n) else None
# print(f"image selected evt index:{sel}/{evt.selected}")
st["selected"] = sel
# return gr.update(selected_index=sel), st
# return gr.update(), st
return st
return gr.update(), st

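`_on_select` now ignores any select event that arrives within half a second of the last recorded mutation, which works around the Gallery re-firing select after programmatic value updates. A minimal, self-contained version of that debounce; the state dict and handler names are simplified stand-ins:

```python
# Minimal sketch of the 0.5 s select-event debounce used in _on_select.
import time

DEBOUNCE_S = 0.5

def on_select(st: dict, clicked_index: int) -> int:
    """Return the index the UI should show as selected."""
    last_time = st.get("last_time")
    if last_time is not None and abs(time.time() - last_time) < DEBOUNCE_S:
        # A programmatic update just happened: treat this select as spurious.
        return st.get("selected", 0)
    st["selected"] = clicked_index
    return clicked_index

st = {"selected": 2, "last_time": time.time()}
assert on_select(st, 0) == 2          # ignored, too close to the last mutation
time.sleep(DEBOUNCE_S)
assert on_select(st, 0) == 0          # accepted once the window has passed
```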
def _on_upload(self, value: List[Any], state: Dict[str, Any]) :
# Fires when users upload via the Gallery itself.
# items_filtered = self._filter_items_by_mode(list(value or []))
items_filtered = list(value or [])
st = get_state(state)
new_items = self._paths_from_payload(items_filtered)
st["items"] = new_items
new_sel = len(new_items) - 1
st["selected"] = new_sel
record_last_action(st,"add")
return gr.update(selected_index=new_sel), st

def _on_gallery_change(self, value: List[Any], state: Dict[str, Any]) :
# Fires when users add/drag/drop/delete via the Gallery itself.
items_filtered = self._filter_items_by_mode(list(value or []))
# items_filtered = self._filter_items_by_mode(list(value or []))
items_filtered = list(value or [])
st = get_state(state)
st["items"] = items_filtered
# Keep selection if still valid, else default to last
@@ -240,10 +263,9 @@ class AdvancedMediaGallery:
else:
new_sel = old_sel
st["selected"] = new_sel
# return gr.update(value=items_filtered, selected_index=new_sel), st
# return gr.update(value=items_filtered), st

return gr.update(), st
st["last_action"] ="gallery_change"
# print(f"gallery change: set sel {new_sel}")
return gr.update(selected_index=new_sel), st

def _on_add(self, files_payload: Any, state: Dict[str, Any], gallery):
"""
@@ -252,7 +274,8 @@ class AdvancedMediaGallery:
and re-selects the last inserted item.
"""
# New items (respect image/video mode)
new_items = self._filter_items_by_mode(self._paths_from_payload(files_payload))
# new_items = self._filter_items_by_mode(self._paths_from_payload(files_payload))
new_items = self._paths_from_payload(files_payload)

st = get_state(state)
cur: List[Any] = get_list(gallery)
@@ -298,30 +321,6 @@ class AdvancedMediaGallery:
if k is not None:
seen_new.add(k)

# Remove any existing occurrences of the incoming items from current list,
# BUT keep the currently selected item even if it's also in incoming.
cur_clean: List[Any] = []
# sel_item = cur[sel] if (sel is not None and 0 <= sel < len(cur)) else None
# for idx, it in enumerate(cur):
# k = key_of(it)
# if it is sel_item:
# cur_clean.append(it)
# continue
# if k is not None and k in seen_new:
# continue # drop duplicate; we'll reinsert at the target spot
# cur_clean.append(it)

# # Compute insertion position: right AFTER the (possibly shifted) selected item
# if sel_item is not None:
# # find sel_item's new index in cur_clean
# try:
# pos_sel = cur_clean.index(sel_item)
# except ValueError:
# # Shouldn't happen, but fall back to end
# pos_sel = len(cur_clean) - 1
# insert_pos = pos_sel + 1
# else:
# insert_pos = len(cur_clean) # no selection -> append at end
insert_pos = min(sel, len(cur) -1)
cur_clean = cur
# Build final list and selection
@@ -330,6 +329,8 @@ class AdvancedMediaGallery:

st["items"] = merged
st["selected"] = new_sel
record_last_action(st,"add")
# print(f"gallery add: set sel {new_sel}")
return gr.update(value=merged, selected_index=new_sel), st

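The simplified `_on_add` path now inserts the uploaded files right after the currently selected item and re-selects the last one added. The hunk only shows fragments of the merge step, so the sketch below fills the elided part with an assumed but straightforward implementation:

```python
# Hedged reconstruction of the add-after-selection behaviour described above;
# the merge step itself is not visible in the hunk, so it is an assumption.
from typing import Any, List

def add_after_selection(cur: List[Any], new_items: List[Any], sel: int):
    """Insert new_items right after index `sel` and select the last inserted one."""
    insert_pos = min(sel, len(cur) - 1) if cur else -1
    merged = cur[: insert_pos + 1] + new_items + cur[insert_pos + 1 :]
    new_sel = insert_pos + len(new_items)        # last inserted item
    return merged, new_sel

merged, new_sel = add_after_selection(["a.png", "b.png", "c.png"], ["x.png", "y.png"], sel=1)
assert merged == ["a.png", "b.png", "x.png", "y.png", "c.png"]
assert merged[new_sel] == "y.png"
```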
def _on_remove(self, state: Dict[str, Any], gallery) :
@@ -342,8 +343,9 @@ class AdvancedMediaGallery:
return gr.update(value=[], selected_index=None), st
new_sel = min(sel, len(items) - 1)
st["items"] = items; st["selected"] = new_sel
# return gr.update(value=items, selected_index=new_sel), st
return gr.update(value=items), st
record_last_action(st,"remove")
# print(f"gallery del: new sel {new_sel}")
return gr.update(value=items, selected_index=new_sel), st

def _on_move(self, delta: int, state: Dict[str, Any], gallery) :
st = get_state(state); items: List[Any] = get_list(gallery); sel = st.get("selected", None)
@@ -354,11 +356,15 @@ class AdvancedMediaGallery:
return gr.update(value=items, selected_index=sel), st
items[sel], items[j] = items[j], items[sel]
st["items"] = items; st["selected"] = j
record_last_action(st,"move")
# print(f"gallery move: set sel {j}")
return gr.update(value=items, selected_index=j), st

def _on_clear(self, state: Dict[str, Any]) :
st = {"items": [], "selected": None, "single": get_state(state).get("single", False), "mode": self.media_mode}
return gr.update(value=[], selected_index=0), st
record_last_action(st,"clear")
# print(f"Clear all")
return gr.update(value=[], selected_index=None), st

def _on_toggle_single(self, to_single: bool, state: Dict[str, Any]) :
st = get_state(state); st["single"] = bool(to_single)
@@ -382,30 +388,38 @@ class AdvancedMediaGallery:
def mount(self, parent: Optional[gr.Blocks | gr.Group | gr.Row | gr.Column] = None, update_form = False):
if parent is not None:
with parent:
col = self._build_ui()
col = self._build_ui(update_form)
else:
col = self._build_ui()
col = self._build_ui(update_form)
if not update_form:
self._wire_events()
return col

def _build_ui(self) -> gr.Column:
def _build_ui(self, update = False) -> gr.Column:
with gr.Column(elem_id=self.elem_id, elem_classes=self.elem_classes) as col:
self.container = col

self.state = gr.State(dict(self._initial_state))

self.gallery = gr.Gallery(
label=self.label,
value=self._initial_state["items"],
height=self.height,
columns=self.columns,
show_label=self.show_label,
preview= True,
# type="pil",
file_types= list(IMAGE_EXTS) if self.media_mode == "image" else list(VIDEO_EXTS),
selected_index=self._initial_state["selected"], # server-side selection
)
if update:
self.gallery = gr.update(
value=self._initial_state["items"],
selected_index=self._initial_state["selected"], # server-side selection
label=self.label,
show_label=self.show_label,
)
else:
self.gallery = gr.Gallery(
value=self._initial_state["items"],
label=self.label,
height=self.height,
columns=self.columns,
show_label=self.show_label,
preview= True,
# type="pil", # very slow
file_types= list(IMAGE_EXTS) if self.media_mode == "image" else list(VIDEO_EXTS),
selected_index=self._initial_state["selected"], # server-side selection
)

# One-line controls
exts = sorted(IMAGE_EXTS if self.media_mode == "image" else VIDEO_EXTS) if self.accept_filter else None
@@ -418,10 +432,10 @@ class AdvancedMediaGallery:
size="sm",
min_width=1,
)
self.btn_remove = gr.Button("Remove", size="sm", min_width=1)
self.btn_remove = gr.Button(" Remove ", size="sm", min_width=1)
self.btn_left = gr.Button("◀ Left", size="sm", visible=not self._initial_state["single"], min_width=1)
self.btn_right = gr.Button("Right ▶", size="sm", visible=not self._initial_state["single"], min_width=1)
self.btn_clear = gr.Button("Clear", variant="secondary", size="sm", visible=not self._initial_state["single"], min_width=1)
self.btn_clear = gr.Button(" Clear ", variant="secondary", size="sm", visible=not self._initial_state["single"], min_width=1)

return col

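`_build_ui` can now be called in two modes: on a normal build it constructs the components, while with `update` set it emits `gr.update(...)` payloads for components that already exist, which is how the gallery participates in an in-place form refresh. A small hedged sketch of that build-or-update idiom with illustrative names:

```python
# Sketch of the build-vs-update idiom used by _build_ui; component names,
# sample items and the demo wiring are illustrative, not the gallery's real attributes.
import gradio as gr

def build_gallery(items, selected, label="Reference Images", update=False):
    if update:
        # Re-render an existing gr.Gallery: only send the changed properties.
        return gr.update(value=items, selected_index=selected, label=label)
    # First render: construct the component itself.
    return gr.Gallery(value=items, selected_index=selected, label=label, preview=True)

with gr.Blocks() as demo:
    gallery = build_gallery(["a.png"], 0)                      # initial build
    refresh = gr.Button("Refresh form")
    refresh.click(lambda: build_gallery(["a.png", "b.png"], 1, update=True),
                  inputs=None, outputs=[gallery])
```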
@@ -430,14 +444,24 @@ class AdvancedMediaGallery:
self.gallery.select(
self._on_select,
inputs=[self.state, self.gallery],
outputs=[self.state],
outputs=[self.gallery, self.state],
trigger_mode="always_last",
)

# Gallery value changed by user actions (click-to-add, drag-drop, internal remove, etc.)
self.gallery.change(
self.gallery.upload(
self._on_upload,
inputs=[self.gallery, self.state],
outputs=[self.gallery, self.state],
trigger_mode="always_last",
)

# Gallery value changed by user actions (click-to-add, drag-drop, internal remove, etc.)
self.gallery.upload(
self._on_gallery_change,
inputs=[self.gallery, self.state],
outputs=[self.gallery, self.state],
trigger_mode="always_last",
)

# Add via UploadButton
@@ -445,6 +469,7 @@ class AdvancedMediaGallery:
self._on_add,
inputs=[self.upload_btn, self.state, self.gallery],
outputs=[self.gallery, self.state],
trigger_mode="always_last",
)

# Remove selected
@@ -452,6 +477,7 @@ class AdvancedMediaGallery:
self._on_remove,
inputs=[self.state, self.gallery],
outputs=[self.gallery, self.state],
trigger_mode="always_last",
)

# Reorder using selected index, keep same item selected
@@ -459,11 +485,13 @@ class AdvancedMediaGallery:
lambda st, gallery: self._on_move(-1, st, gallery),
inputs=[self.state, self.gallery],
outputs=[self.gallery, self.state],
trigger_mode="always_last",
)
self.btn_right.click(
lambda st, gallery: self._on_move(+1, st, gallery),
inputs=[self.state, self.gallery],
outputs=[self.gallery, self.state],
trigger_mode="always_last",
)

# Clear all
@@ -471,6 +499,7 @@ class AdvancedMediaGallery:
self._on_clear,
inputs=[self.state],
outputs=[self.gallery, self.state],
trigger_mode="always_last",
)

# ---------------- public API ----------------

@@ -19,6 +19,7 @@ import tempfile
import subprocess
import json
from functools import lru_cache
os.environ["U2NET_HOME"] = os.path.join(os.getcwd(), "ckpts", "rembg")


from PIL import Image
@@ -188,6 +189,14 @@ def get_outpainting_full_area_dimensions(frame_height,frame_width, outpainting_d
frame_width = int(frame_width * (100 + outpainting_left + outpainting_right) / 100)
return frame_height, frame_width

def rgb_bw_to_rgba_mask(img, thresh=127):
a = img.convert('L').point(lambda p: 255 if p > thresh else 0) # alpha
out = Image.new('RGBA', img.size, (255, 255, 255, 0)) # white, transparent
out.putalpha(a) # white where alpha=255
return out
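`rgb_bw_to_rgba_mask` turns a black-and-white mask into a white-on-transparent RGBA layer, the format an image-editor brush layer expects. A hedged usage sketch of the function as defined in the hunk above; the file names are hypothetical:

```python
# Usage sketch for rgb_bw_to_rgba_mask; input/output file names are made up.
from PIL import Image

def rgb_bw_to_rgba_mask(img, thresh=127):
    a = img.convert('L').point(lambda p: 255 if p > thresh else 0)   # alpha channel
    out = Image.new('RGBA', img.size, (255, 255, 255, 0))            # fully transparent white
    out.putalpha(a)                                                  # opaque white where mask > thresh
    return out

bw_mask = Image.open("matanyone_mask.png")        # black background, white masked area
overlay = rgb_bw_to_rgba_mask(bw_mask)
overlay.save("editor_layer.png")                  # drop-in layer for the inpainting brush editor
```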


def get_outpainting_frame_location(final_height, final_width, outpainting_dims, block_size = 8):
outpainting_top, outpainting_bottom, outpainting_left, outpainting_right= outpainting_dims
raw_height = int(final_height / ((100 + outpainting_top + outpainting_bottom) / 100))
@@ -207,30 +216,62 @@ def get_outpainting_frame_location(final_height, final_width, outpainting_dims
if (margin_left + width) > final_width or outpainting_right == 0: margin_left = final_width - width
return height, width, margin_top, margin_left

def calculate_new_dimensions(canvas_height, canvas_width, image_height, image_width, fit_into_canvas, block_size = 16):
if fit_into_canvas == None:
def rescale_and_crop(img, w, h):
ow, oh = img.size
target_ratio = w / h
orig_ratio = ow / oh

if orig_ratio > target_ratio:
# Crop width first
nw = int(oh * target_ratio)
img = img.crop(((ow - nw) // 2, 0, (ow + nw) // 2, oh))
else:
# Crop height first
nh = int(ow / target_ratio)
img = img.crop((0, (oh - nh) // 2, ow, (oh + nh) // 2))

return img.resize((w, h), Image.LANCZOS)

def calculate_new_dimensions(canvas_height, canvas_width, image_height, image_width, fit_into_canvas, block_size = 16):
if fit_into_canvas == None or fit_into_canvas == 2:
# return image_height, image_width
return canvas_height, canvas_width
if fit_into_canvas:
if fit_into_canvas == 1:
scale1 = min(canvas_height / image_height, canvas_width / image_width)
scale2 = min(canvas_width / image_height, canvas_height / image_width)
scale = max(scale1, scale2)
else:
else: #0 or #2 (crop)
scale = (canvas_height * canvas_width / (image_height * image_width))**(1/2)

new_height = round( image_height * scale / block_size) * block_size
new_width = round( image_width * scale / block_size) * block_size
return new_height, new_width

def resize_and_remove_background(img_list, budget_width, budget_height, rm_background, ignore_first, fit_into_canvas = False ):
def calculate_dimensions_and_resize_image(image, canvas_height, canvas_width, fit_into_canvas, fit_crop, block_size = 16):
if fit_crop:
image = rescale_and_crop(image, canvas_width, canvas_height)
new_width, new_height = image.size
else:
image_width, image_height = image.size
new_height, new_width = calculate_new_dimensions(canvas_height, canvas_width, image_height, image_width, fit_into_canvas, block_size = block_size )
image = image.resize((new_width, new_height), resample=Image.Resampling.LANCZOS)
return image, new_height, new_width

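`rescale_and_crop` together with `calculate_dimensions_and_resize_image` implements the new "crop Input Prompt Images to the exact output resolution" option: with `fit_crop` set, the image is center-cropped to the target aspect ratio and then resized, otherwise the previous pixel-budget/outer-canvas logic applies. A hedged usage sketch; the sample file and the 832x480 target are arbitrary:

```python
# Usage sketch of the center-crop path added above; file name and target size are made up.
from PIL import Image

def rescale_and_crop(img, w, h):
    ow, oh = img.size
    target_ratio, orig_ratio = w / h, ow / oh
    if orig_ratio > target_ratio:                      # wider than target: trim left/right
        nw = int(oh * target_ratio)
        img = img.crop(((ow - nw) // 2, 0, (ow + nw) // 2, oh))
    else:                                              # taller than target: trim top/bottom
        nh = int(ow / target_ratio)
        img = img.crop((0, (oh - nh) // 2, ow, (oh + nh) // 2))
    return img.resize((w, h), Image.LANCZOS)

start_image = Image.open("start_image.png")            # e.g. a 1920x1080 start frame
cropped = rescale_and_crop(start_image, 832, 480)      # exactly the requested output resolution
assert cropped.size == (832, 480)
```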
def resize_and_remove_background(img_list, budget_width, budget_height, rm_background, any_background_ref, fit_into_canvas = 0, block_size= 16, outpainting_dims = None ):
if rm_background:
session = new_session()

output_list =[]
for i, img in enumerate(img_list):
width, height = img.size

if fit_into_canvas:
if fit_into_canvas == None or any_background_ref == 1 and i==0 or any_background_ref == 2:
if outpainting_dims is not None:
resized_image =img
elif img.size != (budget_width, budget_height):
resized_image= img.resize((budget_width, budget_height), resample=Image.Resampling.LANCZOS)
else:
resized_image =img
elif fit_into_canvas == 1:
white_canvas = np.ones((budget_height, budget_width, 3), dtype=np.uint8) * 255
scale = min(budget_height / height, budget_width / width)
new_height = int(height * scale)
@@ -242,10 +283,10 @@ def resize_and_remove_background(img_list, budget_width, budget_height, rm_backg
resized_image = Image.fromarray(white_canvas)
else:
scale = (budget_height * budget_width / (height * width))**(1/2)
new_height = int( round(height * scale / 16) * 16)
new_width = int( round(width * scale / 16) * 16)
new_height = int( round(height * scale / block_size) * block_size)
new_width = int( round(width * scale / block_size) * block_size)
resized_image= img.resize((new_width,new_height), resample=Image.Resampling.LANCZOS)
if rm_background and not (ignore_first and i == 0) :
if rm_background and not (any_background_ref and i==0 or any_background_ref == 2) :
# resized_image = remove(resized_image, session=session, alpha_matting_erode_size = 1,alpha_matting_background_threshold = 70, alpha_foreground_background_threshold = 100, alpha_matting = True, bgcolor=[255, 255, 255, 0]).convert('RGB')
resized_image = remove(resized_image, session=session, alpha_matting_erode_size = 1, alpha_matting = True, bgcolor=[255, 255, 255, 0]).convert('RGB')
output_list.append(resized_image) #alpha_matting_background_threshold = 30, alpha_foreground_background_threshold = 200,
