Vace improvements

This commit is contained in:
DeepBeepMeep 2025-05-23 21:51:00 +02:00
parent 6706709230
commit 86725a65d4
8 changed files with 631 additions and 343 deletions

View File

@ -21,6 +21,7 @@ WanGP supports the Wan (and derived models), Hunyuan Video and LTV Video models
## 🔥 Latest News!!
* May 23 2025: 👋 Wan 2.1GP v5.21 : Improvements for Vace: better transitions between Sliding Windows, support for Image masks in Matanyone, a new Extend Video option for Vace, and different types of automated background removal
* May 20 2025: 👋 Wan 2.1GP v5.2 : Added support for Wan CausVid which is a distilled Wan model that can generate nice looking videos in only 4 to 12 steps.
The great thing is that Kijai (kudos to him!) has created a CausVid Lora that can be combined with any existing Wan 14B t2v model such as Wan Vace 14B.
See instructions below on how to use CausVid.\
@ -307,17 +308,20 @@ You can define multiple lines of macros. If there is only one macro line, the ap
### VACE ControlNet introduction
Vace is a ControlNet 1.3B text2video model that allows you to do Video to Video and Reference to Video (inject your own images into the output video). So with Vace you can inject in the scene people or objects of your choice, animate a person, perform inpainting or outpainting, continue a video, ...
Vace is a ControlNet that allows you to do Video to Video and Reference to Video (inject your own images into the output video). It is probably one of the most powerful Wan models and you will be able to do amazing things once you master it: inject people or objects of your choice into the scene, animate a person, perform inpainting or outpainting, continue a video, ...
First you need to select the Vace 1.3B model in the Drop Down box at the top. Please note that Vace works well for the moment only with videos up to 5s (81 frames).
First you need to select the Vace 1.3B model or the Vace 14B model in the Drop Down box at the top. Please note that for the moment Vace works well only with videos up to 7s, with the RIFLEx option turned on.
Besides the usual Text Prompt, three new types of visual hints can be provided (and combined!):
- a Control Video: Based on your choice, you can decide to transfer the motion, the depth in a new Video. You can tell WanGP to use only the first n frames of Control Video and to extrapolate the rest. You can also do inpainting ). If the video contains area of grey color 127, they will be considered as masks and will be filled based on the Text prompt of the reference Images.
- *a Control Video*\
Based on your choice, you can decide to transfer the motion or the depth into a new Video. You can tell WanGP to use only the first n frames of the Control Video and to extrapolate the rest. You can also do inpainting: if the video contains areas of grey (color 127), they will be considered as masks and will be filled based on the Text Prompt and the Reference Images.
- reference Images: Use this to inject people or objects of your choice in the video. You can select multiple reference Images. The integration of the image is more efficient if the background is replaced by the full white color. You can do that with your preferred background remover or use the built in background remover by checking the box *Remove background*
- *Reference Images*\
A Reference Image can be either a background that you want to use as a setting for the video, or people or objects of your choice that you want to inject in the video. You can select multiple Reference Images. The integration of an object / person image is more efficient if its background is replaced by pure white. For complex background removal you can use the Image version of the Matanyone tool embedded in WanGP, or the fast on-the-fly background remover available in the *Remove background* drop-down box. Be careful not to remove the background of a reference image that is a landscape or setting (always the first reference image) which you want to use as a start image / background for the video. It helps greatly to reference and describe explicitly the injected objects / people of the Reference Images in the text prompt.
- *a Video Mask*\
This offers a stronger mechanism to tell Vace which parts should be kept (black) or replaced (white). You can also do inpainting / outpainting, or fill the missing parts of a video, more efficiently than with the Control Video hint alone. For instance, if a video mask is white except at the beginning and at the end where it is black, the first and last frames will be kept and everything in between will be generated (see the sketch after this list).
- a Video Mask
This offers a stronger mechanism to tell Vace which parts should be kept (black) or replaced (white). You can do as well inpainting / outpainting, fill the missing part of a video more efficientlty with just the video hint. If a video mask is white, it will be generated so with black frames at the beginning and at the end and the rest white, you could generate the missing frames in between.
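As a concrete illustration, here is a minimal sketch (not part of WanGP's code) of how such a video mask could be produced; it assumes numpy and imageio (with its ffmpeg plugin) are installed, and the resolution, frame count and file name are arbitrary:

```python
# Minimal sketch: build a Vace video mask where the first and last 16 frames are
# black (keep the original content) and everything in between is white (regenerate).
import numpy as np
import imageio

frames, height, width, keep = 81, 480, 832, 16
mask = np.full((frames, height, width, 3), 255, dtype=np.uint8)  # white = replace
mask[:keep] = 0      # keep the original first frames
mask[-keep:] = 0     # keep the original last frames
imageio.mimwrite("video_mask.mp4", mask, fps=16)
```

Load the resulting file in the *Video Mask* field together with the matching Control Video.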
Examples:
@ -340,9 +344,25 @@ Other recommended setttings for Vace:
- Set a medium-sized overlap window: long enough to give the model a sense of the motion but short enough so that any blurred overlapped frames do not turn the rest of the video into a blurred video
- Truncate at least the last 4 frames of each generated window, as Vace's last frames tend to be blurry
**WanGP integrates the Matanyone tool, which is tuned to work with Vace**.
### VACE and Sky Reels v2 Diffusion Forcing Slidig Window
With this mode (that works for the moment only with Vace and Sky Reels v2) you can merge mutiple Videos to form a very long video (up to 1 min).
This can be very useful to create, at the same time, a control video and a mask video that go together.\
For example, if you want to replace the face of a person in a video:
- load the video in the Matanyone tool
- click the face on the first frame and create a mask for it (if you have trouble selecting only the face, look at the tips below)
- generate both the control video and the mask video by clicking *Generate Video Matting*
- click *Export to current Video Input and Video Mask*
- in the *Reference Image* field of the Vace screen, load a picture of the replacement face
Please note that sometimes it may be useful to create *Background Masks*, for instance if you want to replace everything but a character that is in the video. You can do that by selecting *Background Mask* in the *Matanyone settings*
If you have some trouble creating the perfect mask, be aware of these tips:
- Using the Matanyone Settings you can also define Negative Point Prompts to remove parts of the current selection.
- Sometimes it is very hard to fit everything you want in a single mask; it may be much easier to combine multiple independent sub masks before producing the matting: each sub mask is created by selecting an area of an image and clicking the Add Mask button. Sub masks can then be enabled / disabled in the Matanyone settings.
### VACE, Sky Reels v2 Diffusion Forcing Sliding Window and LTX Video
With this mode (which works for the moment only with Vace, Sky Reels v2 and LTX Video) you can merge multiple videos to form a very long video (up to 1 min).
When combined with Vace, this feature can use the same control video to generate the full video that results from concatenating the different windows. For instance, the first 0-4s of the control video will be used to generate the first window, then the next 4-8s of the control video will be used to generate the second window, and so on. So if your control video contains a person walking, your generated video could contain up to one minute of this person walking.
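As a rough sketch of how the control video is consumed (assuming a window size of 81 frames, 5 overlapped frames and 4 discarded frames, see the next section), each new window re-reads the overlapped and discarded frames of the control video:

```python
# Illustrative only: which control-video frames feed each sliding window
window_size, overlap, discard = 81, 5, 4
step = window_size - overlap - discard
for w in range(3):
    start = w * step
    print(f"window {w + 1}: control frames {start}..{start + window_size - 1}")
# window 1: control frames 0..80
# window 2: control frames 72..152
# window 3: control frames 144..224
```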
@ -352,12 +372,16 @@ Sliding Windows are turned on by default and are triggered as soon as you try to
Although the window duration is set by the *Sliding Window Size* form field, the actual number of frames generated by each iteration will be less, because of the *overlap frames* and *discard last frames*:
- *overlap frames* : the first frames of a new window are filled with the last frames of the previous window in order to ensure continuity between the two windows
- *discard last frames* : quite often (Vace model Only) the last frames of a window have a worse quality. You can decide here how many ending frames of a new window should be dropped.
- *discard last frames* : sometimes (Vace 1.3B model only) the last frames of a window have worse quality. You can decide here how many ending frames of each new window should be dropped.
There is some inevitable quality degradation over time due to accumulated calculation errors. One trick to reduce or hide it is to add some noise (usually not noticeable) to the overlapped frames using the *add overlapped noise* option.
Number of Generated Frames = [Number of Windows - 1] * ([Window Size] - [Overlap Frames] - [Discard Last Frames]) + [Window Size]
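For example, with 3 windows of size 81, 5 overlapped frames and 4 discarded frames per window: (3 - 1) * (81 - 5 - 4) + 81 = 225 frames, i.e. roughly 14s at 16 fps. A minimal sketch of the same arithmetic (illustrative only, not WanGP code):

```python
def generated_frames(num_windows, window_size, overlap_frames, discard_last_frames):
    # Number of Generated Frames formula from above
    return (num_windows - 1) * (window_size - overlap_frames - discard_last_frames) + window_size

print(generated_frames(3, 81, 5, 4))  # 225
```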
Experimental: if your prompt is broken into multiple lines (each line separated by a carriage return), then each line of the prompt will be used for a new window. If there are more windows to generate than prompt lines, the last prompt line will be repeated.
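For instance (an illustrative prompt, not taken from the repository), with the two-line prompt below the first line drives the first window and the second line drives every remaining window:

```
A man walks along the beach at sunset
The man sits down and watches the waves
```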
### Command line parameters for Gradio Server
--i2v : launch the image to video generator\
--t2v : launch the text to video generator (default defined in the configuration)\

View File

@ -85,7 +85,7 @@ def get_frames_from_image(image_input, image_state):
model.samcontroler.sam_controler.reset_image()
model.samcontroler.sam_controler.set_image(image_state["origin_images"][0])
return image_state, image_info, image_state["origin_images"][0], \
gr.update(visible=True, maximum=10, value=10), gr.update(visible=True, maximum=len(frames), value=len(frames)), gr.update(visible=False, maximum=len(frames), value=len(frames)), \
gr.update(visible=True, maximum=10, value=10), gr.update(visible=False, maximum=len(frames), value=len(frames)), \
gr.update(visible=True), gr.update(visible=True), \
gr.update(visible=True), gr.update(visible=True),\
gr.update(visible=True), gr.update(visible=True), \
@ -273,6 +273,57 @@ def save_video(frames, output_path, fps):
return output_path
# image matting
def image_matting(video_state, interactive_state, mask_dropdown, erode_kernel_size, dilate_kernel_size, refine_iter):
matanyone_processor = InferenceCore(matanyone_model, cfg=matanyone_model.cfg)
if interactive_state["track_end_number"]:
following_frames = video_state["origin_images"][video_state["select_frame_number"]:interactive_state["track_end_number"]]
else:
following_frames = video_state["origin_images"][video_state["select_frame_number"]:]
if interactive_state["multi_mask"]["masks"]:
if len(mask_dropdown) == 0:
mask_dropdown = ["mask_001"]
mask_dropdown.sort()
template_mask = interactive_state["multi_mask"]["masks"][int(mask_dropdown[0].split("_")[1]) - 1] * (int(mask_dropdown[0].split("_")[1]))
for i in range(1,len(mask_dropdown)):
mask_number = int(mask_dropdown[i].split("_")[1]) - 1
template_mask = np.clip(template_mask+interactive_state["multi_mask"]["masks"][mask_number]*(mask_number+1), 0, mask_number+1)
video_state["masks"][video_state["select_frame_number"]]= template_mask
else:
template_mask = video_state["masks"][video_state["select_frame_number"]]
# operation error
if len(np.unique(template_mask))==1:
template_mask[0][0]=1
foreground, alpha = matanyone(matanyone_processor, following_frames, template_mask*255, r_erode=erode_kernel_size, r_dilate=dilate_kernel_size, n_warmup=refine_iter)
foreground_mat = False
output_frames = []
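# foreground_mat is hard-coded to False here, so the binarized alpha marks the
# background as 255 and the foreground as 0: the bitwise AND with the inverted mask
# keeps the original foreground pixels, and adding frame_grey fills the background
# with white, producing a white-background reference image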
for frame_origin, frame_alpha in zip(following_frames, alpha):
if foreground_mat:
frame_alpha[frame_alpha > 127] = 255
frame_alpha[frame_alpha <= 127] = 0
else:
frame_temp = frame_alpha.copy()
frame_alpha[frame_temp > 127] = 0
frame_alpha[frame_temp <= 127] = 255
output_frame = np.bitwise_and(frame_origin, 255-frame_alpha)
frame_grey = frame_alpha.copy()
frame_grey[frame_alpha == 255] = 255
output_frame += frame_grey
output_frames.append(output_frame)
foreground = output_frames
foreground_output = Image.fromarray(foreground[-1])
alpha_output = Image.fromarray(alpha[-1][:,:,0])
return foreground_output, gr.update(visible=True)
# video matting
def video_matting(video_state, end_slider, matting_type, interactive_state, mask_dropdown, erode_kernel_size, dilate_kernel_size):
matanyone_processor = InferenceCore(matanyone_model, cfg=matanyone_model.cfg)
@ -397,7 +448,7 @@ def restart():
"inference_times": 0,
"negative_click_times" : 0,
"positive_click_times": 0,
"mask_save": arg_mask_save,
"mask_save": False,
"multi_mask": {
"mask_names": [],
"masks": []
@ -457,6 +508,15 @@ def export_to_vace_video_input(foreground_video_output):
gr.Info("Masked Video Input transferred to Vace For Inpainting")
return "V#" + str(time.time()), foreground_video_output
def export_image(image_refs, image_output):
gr.Info("Masked Image transferred to Current Video")
# return "MV#" + str(time.time()), foreground_video_output, alpha_video_output
if image_refs == None:
image_refs =[]
image_refs.append( image_output)
return image_refs
def export_to_current_video_engine(foreground_video_output, alpha_video_output):
gr.Info("Masked Video Input and Full Mask transferred to Current Video Engine For Inpainting")
# return "MV#" + str(time.time()), foreground_video_output, alpha_video_output
@ -471,14 +531,17 @@ def teleport_to_vace_1_3B():
def teleport_to_vace_14B():
return gr.Tabs(selected="video_gen"), gr.Dropdown(value="vace_14B")
def display(tabs, model_choice, vace_video_input, vace_video_mask, video_prompt_video_guide_trigger):
def display(tabs, model_choice, vace_video_input, vace_video_mask, vace_image_refs, video_prompt_video_guide_trigger):
# my_tab.select(fn=load_unload_models, inputs=[], outputs=[])
media_url = "https://github.com/pq-yang/MatAnyone/releases/download/media/"
# download assets
gr.Markdown("Mast Edition is provided by MatAnyone")
gr.Markdown("<B>Mast Edition is provided by MatAnyone</B>")
gr.Markdown("If you have some trouble creating the perfect mask, be aware of these tips:")
gr.Markdown("- Using the Matanyone Settings you can also define Negative Point Prompts to remove parts of the current selection.")
gr.Markdown("- Sometime it is very hard to fit everything you want in a single mask, it may be much easier to combine multiple independent sub Masks before producing the Matting : each sub Mask is created by selecting an area of an image and by clicking the Add Mask button. Sub masks can then be enabled / disabled in the Matanyone settings.")
with gr.Column( visible=True):
with gr.Row():
@ -493,6 +556,11 @@ def display(tabs, model_choice, vace_video_input, vace_video_mask, video_prompt_
gr.Video(value="preprocessing/matanyone/tutorial_multi_targets.mp4", elem_classes="video")
with gr.Tabs():
with gr.TabItem("Video"):
click_state = gr.State([[],[]])
interactive_state = gr.State({
@ -568,9 +636,6 @@ def display(tabs, model_choice, vace_video_input, vace_video_mask, video_prompt_
scale=1)
mask_dropdown = gr.Dropdown(multiselect=True, value=[], label="Mask Selection", info="Choose 1~all mask(s) added in Step 2", visible=False, scale=2)
gr.Markdown("---")
with gr.Column():
# input video
with gr.Row(equal_height=True):
with gr.Column(scale=2):
@ -613,6 +678,7 @@ def display(tabs, model_choice, vace_video_input, vace_video_mask, video_prompt_
export_to_current_video_engine_btn.click( fn=export_to_current_video_engine, inputs= [foreground_video_output, alpha_video_output], outputs= [vace_video_input, vace_video_mask]).then( #video_prompt_video_guide_trigger,
fn=teleport_to_video_tab, inputs= [], outputs= [tabs])
# first step: get the video information
extract_frames_button.click(
fn=get_frames_from_video,
@ -706,3 +772,152 @@ def display(tabs, model_choice, vace_video_input, vace_video_mask, video_prompt_
inputs = [video_state, click_state,],
outputs = [template_frame,click_state],
)
with gr.TabItem("Image"):
click_state = gr.State([[],[]])
interactive_state = gr.State({
"inference_times": 0,
"negative_click_times" : 0,
"positive_click_times": 0,
"mask_save": False,
"multi_mask": {
"mask_names": [],
"masks": []
},
"track_end_number": None,
}
)
image_state = gr.State(
{
"user_name": "",
"image_name": "",
"origin_images": None,
"painted_images": None,
"masks": None,
"inpaint_masks": None,
"logits": None,
"select_frame_number": 0,
"fps": 30
}
)
with gr.Group(elem_classes="gr-monochrome-group", visible=True):
with gr.Row():
with gr.Accordion('MatAnyone Settings (click to expand)', open=False):
with gr.Row():
erode_kernel_size = gr.Slider(label='Erode Kernel Size',
minimum=0,
maximum=30,
step=1,
value=10,
info="Erosion on the added mask",
interactive=True)
dilate_kernel_size = gr.Slider(label='Dilate Kernel Size',
minimum=0,
maximum=30,
step=1,
value=10,
info="Dilation on the added mask",
interactive=True)
with gr.Row():
image_selection_slider = gr.Slider(minimum=1, maximum=100, step=1, value=1, label="Num of Refinement Iterations", info="More iterations → More details & More time", visible=False)
track_pause_number_slider = gr.Slider(minimum=1, maximum=100, step=1, value=1, label="Track end frame", visible=False)
with gr.Row():
point_prompt = gr.Radio(
choices=["Positive", "Negative"],
value="Positive",
label="Point Prompt",
info="Click to add positive or negative point for target mask",
interactive=True,
visible=False,
min_width=100,
scale=1)
mask_dropdown = gr.Dropdown(multiselect=True, value=[], label="Mask Selection", info="Choose 1~all mask(s) added in Step 2", visible=False)
with gr.Column():
# input image
with gr.Row(equal_height=True):
with gr.Column(scale=2):
gr.Markdown("## Step1: Upload image")
with gr.Column(scale=2):
step2_title = gr.Markdown("## Step2: Add masks <small>(Several clicks then **`Add Mask`** <u>one by one</u>)</small>", visible=False)
with gr.Row(equal_height=True):
with gr.Column(scale=2):
image_input = gr.Image(label="Input Image", elem_classes="image")
extract_frames_button = gr.Button(value="Load Image", interactive=True, elem_classes="new_button")
with gr.Column(scale=2):
image_info = gr.Textbox(label="Image Info", visible=False)
template_frame = gr.Image(type="pil", label="Start Frame", interactive=True, elem_id="template_frame", visible=False, elem_classes="image")
with gr.Row(equal_height=True, elem_classes="mask_button_group"):
clear_button_click = gr.Button(value="Clear Clicks", interactive=True, visible=False, elem_classes="new_button", min_width=100)
add_mask_button = gr.Button(value="Add Mask", interactive=True, visible=False, elem_classes="new_button", min_width=100)
remove_mask_button = gr.Button(value="Remove Mask", interactive=True, visible=False, elem_classes="new_button", min_width=100)
matting_button = gr.Button(value="Image Matting", interactive=True, visible=False, elem_classes="green_button", min_width=100)
# output image
with gr.Row(equal_height=True):
foreground_image_output = gr.Image(type="pil", label="Foreground Output", visible=False, elem_classes="image")
with gr.Row():
with gr.Row():
export_image_btn = gr.Button(value="Add to current Reference Images", visible=False, elem_classes="new_button")
with gr.Column(scale=2, visible= False):
alpha_image_output = gr.Image(type="pil", label="Alpha Output", visible=False, elem_classes="image")
alpha_output_button = gr.Button(value="Alpha Mask Output", visible=False, elem_classes="new_button")
export_image_btn.click( fn=export_image, inputs= [vace_image_refs, foreground_image_output], outputs= [vace_image_refs]).then( #video_prompt_video_guide_trigger,
fn=teleport_to_video_tab, inputs= [], outputs= [tabs])
# first step: get the image information
extract_frames_button.click(
fn=get_frames_from_image,
inputs=[
image_input, image_state
],
outputs=[image_state, image_info, template_frame,
image_selection_slider, track_pause_number_slider,point_prompt, clear_button_click, add_mask_button, matting_button, template_frame,
foreground_image_output, alpha_image_output, export_image_btn, alpha_output_button, mask_dropdown, step2_title]
)
# second step: select images from slider
image_selection_slider.release(fn=select_image_template,
inputs=[image_selection_slider, image_state, interactive_state],
outputs=[template_frame, image_state, interactive_state], api_name="select_image")
track_pause_number_slider.release(fn=get_end_number,
inputs=[track_pause_number_slider, image_state, interactive_state],
outputs=[template_frame, interactive_state], api_name="end_image")
# click select image to get mask using sam
template_frame.select(
fn=sam_refine,
inputs=[image_state, point_prompt, click_state, interactive_state],
outputs=[template_frame, image_state, interactive_state]
)
# add different mask
add_mask_button.click(
fn=add_multi_mask,
inputs=[image_state, interactive_state, mask_dropdown],
outputs=[interactive_state, mask_dropdown, template_frame, click_state]
)
remove_mask_button.click(
fn=remove_multi_mask,
inputs=[interactive_state, mask_dropdown],
outputs=[interactive_state, mask_dropdown]
)
# image matting
matting_button.click(
fn=image_matting,
inputs=[image_state, interactive_state, mask_dropdown, erode_kernel_size, dilate_kernel_size, image_selection_slider],
outputs=[foreground_image_output, export_image_btn]
)

Binary file not shown.

Binary file not shown.

View File

@ -111,7 +111,7 @@ class WanT2V:
self.adapt_vace_model()
def vace_encode_frames(self, frames, ref_images, masks=None, tile_size = 0, overlapped_latents = 0, overlap_noise = 0):
def vace_encode_frames(self, frames, ref_images, masks=None, tile_size = 0, overlapped_latents = None):
if ref_images is None:
ref_images = [None] * len(frames)
else:
@ -123,10 +123,10 @@ class WanT2V:
inactive = [i * (1 - m) + 0 * m for i, m in zip(frames, masks)]
reactive = [i * m + 0 * (1 - m) for i, m in zip(frames, masks)]
inactive = self.vae.encode(inactive, tile_size = tile_size)
# inactive = [ t * (1.0 - noise_factor) + torch.randn_like(t ) * noise_factor for t in inactive]
# if overlapped_latents > 0:
# for t in inactive:
# t[:, :overlapped_latents ] = t[:, :overlapped_latents ] * (1.0 - noise_factor) + torch.randn_like(t[:, :overlapped_latents ] ) * noise_factor
self.toto = inactive[0].clone()
if overlapped_latents != None :
# inactive[0][:, 0:1] = self.vae.encode([frames[0][:, 0:1]], tile_size = tile_size)[0] # redundant
inactive[0][:, 1:overlapped_latents.shape[1] + 1] = overlapped_latents
reactive = self.vae.encode(reactive, tile_size = tile_size)
latents = [torch.cat((u, c), dim=0) for u, c in zip(inactive, reactive)]
@ -190,13 +190,13 @@ class WanT2V:
num_frames = total_frames - prepend_count
if sub_src_mask is not None and sub_src_video is not None:
src_video[i], src_mask[i], _, _, _ = self.vid_proc.load_video_pair(sub_src_video, sub_src_mask, max_frames= num_frames, trim_video = trim_video - prepend_count, start_frame = start_frame, canvas_height = canvas_height, canvas_width = canvas_width, fit_into_canvas = fit_into_canvas)
# src_video is [-1, 1], 0 = inpainting area (in fact 127 in [0, 255])
# src_mask is [-1, 1], 0 = preserve original video (in fact 127 in [0, 255]) and 1 = Inpainting (in fact 255 in [0, 255])
# src_video is [-1, 1] (at this function output), 0 = inpainting area (in fact 127 in [0, 255])
# src_mask is [-1, 1] (at this function output), 0 = preserve original video (in fact 127 in [0, 255]) and 1 = Inpainting (in fact 255 in [0, 255])
src_video[i] = src_video[i].to(device)
src_mask[i] = src_mask[i].to(device)
if prepend_count > 0:
src_video[i] = torch.cat( [sub_pre_src_video, src_video[i]], dim=1)
src_mask[i] = torch.cat( [torch.zeros_like(sub_pre_src_video), src_mask[i]] ,1)
src_mask[i] = torch.cat( [torch.full_like(sub_pre_src_video, -1.0), src_mask[i]] ,1)
src_video_shape = src_video[i].shape
if src_video_shape[1] != total_frames:
src_video[i] = torch.cat( [src_video[i], src_video[i].new_zeros(src_video_shape[0], total_frames -src_video_shape[1], *src_video_shape[-2:])], dim=1)
@ -300,7 +300,8 @@ class WanT2V:
slg_end = 1.0,
cfg_star_switch = True,
cfg_zero_step = 5,
overlapped_latents = 0,
overlapped_latents = None,
return_latent_slice = None,
overlap_noise = 0,
model_filename = None,
**bbargs
@ -373,8 +374,10 @@ class WanT2V:
input_frames = [u.to(self.device) for u in input_frames]
input_ref_images = [ None if u == None else [v.to(self.device) for v in u] for u in input_ref_images]
input_masks = [u.to(self.device) for u in input_masks]
z0 = self.vace_encode_frames(input_frames, input_ref_images, masks=input_masks, tile_size = VAE_tile_size, overlapped_latents = overlapped_latents, overlap_noise = overlap_noise )
previous_latents = None
# if overlapped_latents != None:
# input_ref_images = [u[-1:] for u in input_ref_images]
z0 = self.vace_encode_frames(input_frames, input_ref_images, masks=input_masks, tile_size = VAE_tile_size, overlapped_latents = overlapped_latents )
m0 = self.vace_encode_masks(input_masks, input_ref_images)
z = self.vace_latent(z0, m0)
@ -442,8 +445,9 @@ class WanT2V:
if vace:
ref_images_count = len(input_ref_images[0]) if input_ref_images != None and input_ref_images[0] != None else 0
kwargs.update({'vace_context' : z, 'vace_context_scale' : context_scale})
if overlapped_latents > 0:
z_reactive = [ zz[0:16, ref_images_count:overlapped_latents + ref_images_count].clone() for zz in z]
if overlapped_latents != None:
overlapped_latents_size = overlapped_latents.shape[1] + 1
z_reactive = [ zz[0:16, 0:overlapped_latents_size + ref_images_count].clone() for zz in z]
if self.model.enable_teacache:
@ -453,13 +457,14 @@ class WanT2V:
if callback != None:
callback(-1, None, True)
for i, t in enumerate(tqdm(timesteps)):
if vace and overlapped_latents > 0 :
# noise_factor = overlap_noise *(i/(len(timesteps)-1)) / 1000
noise_factor = overlap_noise / 1000 # * (999-t) / 999
# noise_factor = overlap_noise / 1000 # * t / 999
for zz, zz_r in zip(z, z_reactive):
zz[0:16, ref_images_count:overlapped_latents + ref_images_count] = zz_r * (1.0 - noise_factor) + torch.randn_like(zz_r ) * noise_factor
if overlapped_latents != None:
# overlap_noise_factor = overlap_noise *(i/(len(timesteps)-1)) / 1000
overlap_noise_factor = overlap_noise / 1000
latent_noise_factor = t / 1000
for zz, zz_r, ll in zip(z, z_reactive, [latents]):
zz[0:16, ref_images_count:overlapped_latents_size + ref_images_count] = zz_r[:, ref_images_count:] * (1.0 - overlap_noise_factor) + torch.randn_like(zz_r[:, ref_images_count:] ) * overlap_noise_factor
ll[:, 0:overlapped_latents_size + ref_images_count] = zz_r * (1.0 - latent_noise_factor) + torch.randn_like(zz_r ) * latent_noise_factor
if target_camera != None:
latent_model_input = torch.cat([latents, source_latents], dim=1)
else:
@ -552,6 +557,13 @@ class WanT2V:
x0 = [latents]
if return_latent_slice != None:
if overlapped_latents != None:
# latents [:, 1:] = self.toto
for zz, zz_r, ll in zip(z, z_reactive, [latents]):
ll[:, 0:overlapped_latents_size + ref_images_count] = zz_r
latent_slice = latents[:, return_latent_slice].clone()
if input_frames == None:
if phantom:
# phantom post processing
@ -560,11 +572,9 @@ class WanT2V:
else:
# vace post processing
videos = self.decode_latent(x0, input_ref_images, VAE_tile_size)
del latents
del sample_scheduler
return videos[0] if self.rank == 0 else None
if return_latent_slice != None:
return { "x" : videos[0], "latent_slice" : latent_slice }
return videos[0]
def adapt_vace_model(self):
model = self.model

View File

@ -91,11 +91,11 @@ def calculate_new_dimensions(canvas_height, canvas_width, height, width, fit_int
return new_height, new_width
def resize_and_remove_background(img_list, budget_width, budget_height, rm_background, fit_into_canvas = False ):
if rm_background:
if rm_background > 0:
session = new_session()
output_list =[]
for img in img_list:
for i, img in enumerate(img_list):
width, height = img.size
if fit_into_canvas:
@ -113,9 +113,10 @@ def resize_and_remove_background(img_list, budget_width, budget_height, rm_backg
new_height = int( round(height * scale / 16) * 16)
new_width = int( round(width * scale / 16) * 16)
resized_image= img.resize((new_width,new_height), resample=Image.Resampling.LANCZOS)
if rm_background:
resized_image = remove(resized_image, session=session, alpha_matting = True, bgcolor=[255, 255, 255, 0]).convert('RGB')
output_list.append(resized_image)
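# rm_background modes (matching the 'Remove Background of Images References' drop-down in the UI):
# 0 = keep every background, 1 = remove it for every image, 2 = keep it for the
# first image (landscape / setting) and remove it for the following ones
# (note that 'and' binds tighter than 'or' in the condition below)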
if rm_background == 1 or rm_background == 2 and i > 0 :
# resized_image = remove(resized_image, session=session, alpha_matting_erode_size = 1,alpha_matting_background_threshold = 70, alpha_foreground_background_threshold = 100, alpha_matting = True, bgcolor=[255, 255, 255, 0]).convert('RGB')
resized_image = remove(resized_image, session=session, alpha_matting_erode_size = 1, alpha_matting = True, bgcolor=[255, 255, 255, 0]).convert('RGB')
output_list.append(resized_image) #alpha_matting_background_threshold = 30, alpha_foreground_background_threshold = 200,
return output_list

wgp.py (206 changed lines)
View File

@ -204,9 +204,6 @@ def process_prompt_and_add_tasks(state, model_choice):
if isinstance(image_refs, list):
image_refs = [ convert_image(tup[0]) for tup in image_refs ]
# os.environ["U2NET_HOME"] = os.path.join(os.getcwd(), "ckpts", "rembg")
# from wan.utils.utils import resize_and_remove_background
# image_refs = resize_and_remove_background(image_refs, width, height, inputs["remove_background_image_ref"] ==1, fit_into_canvas= True)
if len(prompts) > 0:
@ -333,8 +330,10 @@ def process_prompt_and_add_tasks(state, model_choice):
if "O" in video_prompt_type :
keep_frames_video_guide= inputs["keep_frames_video_guide"]
video_length = inputs["video_length"]
if len(keep_frames_video_guide) ==0:
gr.Info(f"Warning : you have asked to reuse all the frames of the control Video in the Alternate Video Ending it. Please make sure the number of frames of the control Video is lower than the total number of frames to generate otherwise it won't make a difference.")
if len(keep_frames_video_guide) > 0:
gr.Info("Keeping Frames with Extending Video is not yet supported")
return
# gr.Info(f"Warning : you have asked to reuse all the frames of the control Video in the Alternate Video Ending it. Please make sure the number of frames of the control Video is lower than the total number of frames to generate otherwise it won't make a difference.")
# elif keep_frames >= video_length:
# gr.Info(f"The number of frames in the control Video to reuse ({keep_frames_video_guide}) in Alternate Video Ending can not be bigger than the total number of frames ({video_length}) to generate.")
# return
@ -349,11 +348,6 @@ def process_prompt_and_add_tasks(state, model_choice):
if isinstance(image_refs, list):
image_refs = [ convert_image(tup[0]) for tup in image_refs ]
# os.environ["U2NET_HOME"] = os.path.join(os.getcwd(), "ckpts", "rembg")
# from wan.utils.utils import resize_and_remove_background
# image_refs = resize_and_remove_background(image_refs, width, height, inputs["remove_background_image_ref"] ==1)
if len(prompts) > 0:
prompts = ["\n".join(prompts)]
@ -1464,7 +1458,6 @@ lock_ui_attention = False
lock_ui_transformer = False
lock_ui_compile = False
preload =int(args.preload)
force_profile_no = int(args.profile)
verbose_level = int(args.verbose)
quantizeTransformer = args.quantize_transformer
@ -1482,17 +1475,21 @@ if os.path.isfile("t2v_settings.json"):
if not os.path.isfile(server_config_filename) and os.path.isfile("gradio_config.json"):
shutil.move("gradio_config.json", server_config_filename)
if not os.path.isdir("ckpts/umt5-xxl/"):
os.makedirs("ckpts/umt5-xxl/")
src_move = [ "ckpts/models_clip_open-clip-xlm-roberta-large-vit-huge-14-bf16.safetensors", "ckpts/models_t5_umt5-xxl-enc-bf16.safetensors", "ckpts/models_t5_umt5-xxl-enc-quanto_int8.safetensors" ]
tgt_move = [ "ckpts/xlm-roberta-large/", "ckpts/umt5-xxl/", "ckpts/umt5-xxl/"]
for src,tgt in zip(src_move,tgt_move):
if os.path.isfile(src):
try:
if os.path.isfile(tgt):
os.remove(src)
else:
shutil.move(src, tgt)
except:
pass
if not Path(server_config_filename).is_file():
server_config = {"attention_mode" : "auto",
"transformer_types": [],
@ -1755,7 +1752,10 @@ def get_default_settings(filename):
"flow_shift": 13,
"resolution": "1280x720"
})
elif get_model_type(filename) in ("vace_14B"):
ui_defaults.update({
"sliding_window_discard_last_frames": 0,
})
with open(defaults_filename, "w", encoding="utf-8") as f:
@ -2136,6 +2136,9 @@ def load_models(model_filename):
global transformer_filename, transformer_loras_filenames
model_family = get_model_family(model_filename)
perc_reserved_mem_max = args.perc_reserved_mem_max
preload =int(args.preload)
if preload == 0:
preload = server_config.get("preload_in_VRAM", 0)
new_transformer_loras_filenames = None
dependent_models = get_dependent_models(model_filename, quantization= transformer_quantization, dtype_policy = transformer_dtype_policy)
new_transformer_loras_filenames = [model_filename] if "_lora" in model_filename else None
@ -2259,7 +2262,8 @@ def apply_changes( state,
preload_model_policy_choice = 1,
UI_theme_choice = "default",
enhancer_enabled_choice = 0,
fit_canvas_choice = 0
fit_canvas_choice = 0,
preload_in_VRAM_choice = 0
):
if args.lock_config:
return
@ -2284,6 +2288,7 @@ def apply_changes( state,
"UI_theme" : UI_theme_choice,
"fit_canvas": fit_canvas_choice,
"enhancer_enabled" : enhancer_enabled_choice,
"preload_in_VRAM" : preload_in_VRAM_choice
}
if Path(server_config_filename).is_file():
@ -2456,26 +2461,20 @@ def refresh_gallery(state): #, msg
prompt = "<BR><DIV style='height:8px'></DIV>".join(prompts)
if enhanced:
prompt = "<U><B>Enhanced:</B></U><BR>" + prompt
list_uri = []
start_img_uri = task.get('start_image_data_base64')
start_img_uri = start_img_uri[0] if start_img_uri !=None else None
if start_img_uri != None:
list_uri += start_img_uri
end_img_uri = task.get('end_image_data_base64')
end_img_uri = end_img_uri[0] if end_img_uri !=None else None
if end_img_uri != None:
list_uri += end_img_uri
thumbnail_size = "100px"
if start_img_uri:
start_img_md = f'<img src="{start_img_uri}" alt="Start" style="max-width:{thumbnail_size}; max-height:{thumbnail_size}; display: block; margin: auto; object-fit: contain;" />'
if end_img_uri:
end_img_md = f'<img src="{end_img_uri}" alt="End" style="max-width:{thumbnail_size}; max-height:{thumbnail_size}; display: block; margin: auto; object-fit: contain;" />'
thumbnails = ""
for img_uri in list_uri:
thumbnails += f'<TD><img src="{img_uri}" alt="Start" style="max-width:{thumbnail_size}; max-height:{thumbnail_size}; display: block; margin: auto; object-fit: contain;" /></TD>'
label = f"Prompt of Video being Generated"
html = "<STYLE> #PINFO, #PINFO th, #PINFO td {border: 1px solid #CCCCCC;background-color:#FFFFFF;}</STYLE><TABLE WIDTH=100% ID=PINFO ><TR><TD width=100%>" + prompt + "</TD>"
if start_img_md != "":
html += "<TD>" + start_img_md + "</TD>"
if end_img_md != "":
html += "<TD>" + end_img_md + "</TD>"
html += "</TR></TABLE>"
html = "<STYLE> #PINFO, #PINFO th, #PINFO td {border: 1px solid #CCCCCC;background-color:#FFFFFF;}</STYLE><TABLE WIDTH=100% ID=PINFO ><TR><TD width=100%>" + prompt + "</TD>" + thumbnails + "</TR></TABLE>"
html_output = gr.HTML(html, visible= True)
return gr.Gallery(selected_index=choice, value = file_list), html_output, gr.Button(visible=False), gr.Button(visible=True), gr.Row(visible=True), update_queue_data(queue), gr.Button(interactive= abort_interactive), gr.Button(visible= onemorewindow_visible)
@ -2680,7 +2679,7 @@ def generate_video(
sliding_window_overlap,
sliding_window_overlap_noise,
sliding_window_discard_last_frames,
remove_background_image_ref,
remove_background_images_ref,
temporal_upsampling,
spatial_upsampling,
RIFLEx_setting,
@ -2816,13 +2815,14 @@ def generate_video(
fps = 30
else:
fps = 16
latent_size = 8 if ltxv else 4
original_image_refs = image_refs
if image_refs != None and len(image_refs) > 0 and (hunyuan_custom or phantom or vace):
send_cmd("progress", [0, get_latest_status(state, "Removing Images References Background")])
os.environ["U2NET_HOME"] = os.path.join(os.getcwd(), "ckpts", "rembg")
from wan.utils.utils import resize_and_remove_background
image_refs = resize_and_remove_background(image_refs, width, height, remove_background_image_ref ==1, fit_into_canvas= not vace)
image_refs = resize_and_remove_background(image_refs, width, height, remove_background_images_ref, fit_into_canvas= not vace)
update_task_thumbnails(task, locals())
send_cmd("output")
@ -2879,7 +2879,6 @@ def generate_video(
repeat_no = 0
extra_generation = 0
initial_total_windows = 0
max_frames_to_generate = video_length
if diffusion_forcing or vace or ltxv:
reuse_frames = min(sliding_window_size - 4, sliding_window_overlap)
else:
@ -2888,8 +2887,9 @@ def generate_video(
video_length += sliding_window_overlap
sliding_window = (vace or diffusion_forcing or ltxv) and video_length > sliding_window_size
if sliding_window:
discard_last_frames = sliding_window_discard_last_frames
default_max_frames_to_generate = video_length
if sliding_window:
left_after_first_window = video_length - sliding_window_size + discard_last_frames
initial_total_windows= 1 + math.ceil(left_after_first_window / (sliding_window_size - discard_last_frames - reuse_frames))
video_length = sliding_window_size
@ -2913,6 +2913,7 @@ def generate_video(
prefix_video_frames_count = 0
frames_already_processed = None
pre_video_guide = None
overlapped_latents = None
window_no = 0
extra_windows = 0
guide_start_frame = 0
@ -2920,6 +2921,8 @@ def generate_video(
gen["extra_windows"] = 0
gen["total_windows"] = 1
gen["window_no"] = 1
num_frames_generated = 0
max_frames_to_generate = default_max_frames_to_generate
start_time = time.time()
if prompt_enhancer_image_caption_model != None and prompt_enhancer !=None and len(prompt_enhancer)>0:
text_encoder_max_tokens = 256
@ -2955,38 +2958,50 @@ def generate_video(
while not abort:
if sliding_window:
prompt = prompts[window_no] if window_no < len(prompts) else prompts[-1]
extra_windows += gen.get("extra_windows",0)
if extra_windows > 0:
video_length = sliding_window_size
new_extra_windows = gen.get("extra_windows",0)
gen["extra_windows"] = 0
extra_windows += new_extra_windows
max_frames_to_generate += new_extra_windows * (sliding_window_size - discard_last_frames - reuse_frames)
sliding_window = sliding_window or extra_windows > 0
if sliding_window and window_no > 0:
num_frames_generated -= reuse_frames
if (max_frames_to_generate - prefix_video_frames_count - num_frames_generated) < latent_size:
break
video_length = min(sliding_window_size, ((max_frames_to_generate - num_frames_generated - prefix_video_frames_count + reuse_frames + discard_last_frames) // latent_size) * latent_size + 1 )
total_windows = initial_total_windows + extra_windows
gen["total_windows"] = total_windows
if window_no >= total_windows:
break
window_no += 1
gen["window_no"] = window_no
return_latent_slice = None
if reuse_frames > 0:
return_latent_slice = slice(-(reuse_frames - 1 + discard_last_frames ) // latent_size, None if discard_last_frames == 0 else -(discard_last_frames // latent_size) )
if hunyuan_custom:
src_ref_images = image_refs
elif phantom:
src_ref_images = image_refs.copy() if image_refs != None else None
elif diffusion_forcing or ltxv:
elif diffusion_forcing or ltxv or vace and "O" in video_prompt_type:
if vace:
video_source = video_guide
video_guide = None
if video_source != None and len(video_source) > 0 and window_no == 1:
keep_frames_video_source= 1000 if len(keep_frames_video_source) ==0 else int(keep_frames_video_source)
keep_frames_video_source = (keep_frames_video_source // latent_size ) * latent_size + 1
prefix_video = preprocess_video(None, width=width, height=height,video_in=video_source, max_frames= keep_frames_video_source , start_frame = 0, fit_canvas= fit_canvas, target_fps = fps, block_size = 32 if ltxv else 16)
prefix_video = prefix_video .permute(3, 0, 1, 2)
prefix_video = prefix_video .float().div_(127.5).sub_(1.) # c, f, h, w
prefix_video_frames_count = prefix_video.shape[1]
pre_video_guide = prefix_video[:, -reuse_frames:]
elif vace:
# video_prompt_type = video_prompt_type +"G"
prefix_video_frames_count = pre_video_guide.shape[1]
if vace:
height, width = pre_video_guide.shape[-2:]
if vace:
image_refs_copy = image_refs.copy() if image_refs != None else None # required since prepare_source do inplace modifications
video_guide_copy = video_guide
video_mask_copy = video_mask
if any(process in video_prompt_type for process in ("P", "D", "G")) :
prompts_max = gen["prompts_max"]
preprocess_type = None
if "P" in video_prompt_type :
progress_args = [0, get_latest_status(state,"Extracting Open Pose Information")]
@ -3005,8 +3020,11 @@ def generate_video(
if len(error) > 0:
raise gr.Error(f"invalid keep frames {keep_frames_video_guide}")
keep_frames_parsed = keep_frames_parsed[guide_start_frame: guide_start_frame + video_length]
if window_no == 1:
image_size = (height, width) # VACE_SIZE_CONFIGS[resolution_reformated] # default frame dimensions until it is set by video_src (if there is any)
image_size = (height, width) # default frame dimensions until it is set by video_src (if there is any)
src_video, src_mask, src_ref_images = wan_model.prepare_source([video_guide_copy],
[video_mask_copy ],
[image_refs_copy],
@ -3017,22 +3035,13 @@ def generate_video(
pre_src_video = [pre_video_guide],
fit_into_canvas = fit_canvas
)
# if window_no == 1 and src_video != None and len(src_video) > 0:
# image_size = src_video[0].shape[-2:]
prompts_max = gen["prompts_max"]
status = get_latest_status(state)
gen["progress_status"] = status
gen["progress_phase"] = ("Encoding Prompt", -1 )
callback = build_callback(state, trans, send_cmd, status, num_inference_steps)
progress_args = [0, merge_status_context(status, "Encoding Prompt")]
send_cmd("progress", progress_args)
# samples = torch.empty( (1,2)) #for testing
# if False:
try:
if trans.enable_teacache:
trans.teacache_counter = 0
trans.num_steps = num_inference_steps
@ -3040,6 +3049,10 @@ def generate_video(
trans.previous_residual = None
trans.previous_modulated_input = None
# samples = torch.empty( (1,2)) #for testing
# if False:
try:
samples = wan_model.generate(
input_prompt = prompt,
image_start = image_start,
@ -3049,7 +3062,7 @@ def generate_video(
input_masks = src_mask,
input_video= pre_video_guide if diffusion_forcing or ltxv else source_video,
target_camera= target_camera,
frame_num=(video_length // 4)* 4 + 1,
frame_num=(video_length // latent_size)* latent_size + 1,
height = height,
width = width,
fit_into_canvas = fit_canvas == 1,
@ -3076,7 +3089,8 @@ def generate_video(
causal_block_size = 5,
causal_attention = True,
fps = fps,
overlapped_latents = 0 if reuse_frames == 0 or window_no == 1 else ((reuse_frames - 1) // 4 + 1),
overlapped_latents = overlapped_latents,
return_latent_slice= return_latent_slice,
overlap_noise = sliding_window_overlap_noise,
model_filename = model_filename,
)
@ -3109,6 +3123,7 @@ def generate_video(
tb = traceback.format_exc().split('\n')[:-1]
print('\n'.join(tb))
send_cmd("error", new_error)
clear_status(state)
return
finally:
trans.previous_residual = None
@ -3118,33 +3133,42 @@ def generate_video(
print(f"Teacache Skipped Steps:{trans.teacache_skipped_steps}/{trans.num_steps}" )
if samples != None:
if isinstance(samples, dict):
overlapped_latents = samples.get("latent_slice", None)
samples= samples["x"]
samples = samples.to("cpu")
offload.last_offload_obj.unload_all()
gc.collect()
torch.cuda.empty_cache()
# time_flag = datetime.fromtimestamp(time.time()).strftime("%Y-%m-%d-%Hh%Mm%Ss")
# save_prompt = "_in_" + original_prompts[0]
# file_name = f"{time_flag}_seed{seed}_{sanitize_file_name(save_prompt[:50]).strip()}.mp4"
# sample = samples.cpu()
# cache_video( tensor=sample[None].clone(), save_file=os.path.join(save_path, file_name), fps=16, nrow=1, normalize=True, value_range=(-1, 1))
if samples == None:
abort = True
state["prompt"] = ""
send_cmd("output")
else:
sample = samples.cpu()
if True: # for testing
torch.save(sample, "output.pt")
else:
sample =torch.load("output.pt")
# if True: # for testing
# torch.save(sample, "output.pt")
# else:
# sample =torch.load("output.pt")
if gen.get("extra_windows",0) > 0:
sliding_window = True
if sliding_window :
guide_start_frame += video_length
if discard_last_frames > 0:
sample = sample[: , :-discard_last_frames]
guide_start_frame -= discard_last_frames
if reuse_frames == 0:
pre_video_guide = sample[:,9999 :]
pre_video_guide = sample[:,9999 :].clone()
else:
# noise_factor = 200/ 1000
# pre_video_guide = sample[:, -reuse_frames:] * (1.0 - noise_factor) + torch.randn_like(sample[:, -reuse_frames:] ) * noise_factor
pre_video_guide = sample[:, -reuse_frames:]
pre_video_guide = sample[:, -reuse_frames:].clone()
num_frames_generated += sample.shape[1]
if prefix_video != None:
@ -3158,7 +3182,6 @@ def generate_video(
sample = sample[: , :]
else:
sample = sample[: , reuse_frames:]
guide_start_frame -= reuse_frames
exp = 0
@ -3252,15 +3275,9 @@ def generate_video(
print(f"New video saved to Path: "+video_path)
file_list.append(video_path)
send_cmd("output")
if sliding_window :
if max_frames_to_generate > 0 and extra_windows == 0:
current_length = sample.shape[1]
if (current_length - prefix_video_frames_count)>= max_frames_to_generate:
break
video_length = min(sliding_window_size, ((max_frames_to_generate - (current_length - prefix_video_frames_count) + reuse_frames + discard_last_frames) // 4) * 4 + 1 )
seed += 1
clear_status(state)
if temp_filename!= None and os.path.isfile(temp_filename):
os.remove(temp_filename)
offload.unload_loras_from_model(trans)
@ -3631,6 +3648,15 @@ def merge_status_context(status="", context=""):
else:
return status + " - " + context
def clear_status(state):
gen = get_gen_info(state)
gen["extra_windows"] = 0
gen["total_windows"] = 1
gen["window_no"] = 1
gen["extra_orders"] = 0
gen["repeat_no"] = 0
gen["total_generation"] = 0
def get_latest_status(state, context=""):
gen = get_gen_info(state)
prompt_no = gen["prompt_no"]
@ -3999,7 +4025,7 @@ def prepare_inputs_dict(target, inputs ):
inputs.pop("model_mode")
if not "Vace" in model_filename or not "phantom" in model_filename or not "hunyuan_video_custom" in model_filename:
unsaved_params = ["keep_frames_video_guide", "video_prompt_type", "remove_background_image_ref"]
unsaved_params = ["keep_frames_video_guide", "video_prompt_type", "remove_background_images_ref"]
for k in unsaved_params:
inputs.pop(k)
@ -4066,7 +4092,7 @@ def save_inputs(
sliding_window_overlap,
sliding_window_overlap_noise,
sliding_window_discard_last_frames,
remove_background_image_ref,
remove_background_images_ref,
temporal_upsampling,
spatial_upsampling,
RIFLEx_setting,
@ -4458,7 +4484,7 @@ def generate_video_tab(update_form = False, state_dict = None, ui_defaults = Non
("Transfer Human Motion from the Control Video", "PV"),
("Transfer Depth from the Control Video", "DV"),
("Recolorize the Control Video", "CV"),
# ("Alternate Video Ending", "OV"),
("Extend Video", "OV"),
("Video contains Open Pose, Depth, Black & White, Inpainting ", "V"),
("Control Video and Mask video for Inpainting ", "MV"),
],
@ -4489,7 +4515,17 @@ def generate_video_tab(update_form = False, state_dict = None, ui_defaults = Non
)
# with gr.Row():
remove_background_image_ref = gr.Checkbox(value=ui_defaults.get("remove_background_image_ref",1), label= "Remove Background of Images References", visible= "I" in video_prompt_type_value, scale =1 )
remove_background_images_ref = gr.Dropdown(
choices=[
("Keep Backgrounds of All Images (landscape)", 0),
("Remove Backgrounds of All Images (objects / faces)", 1),
("Keep it for first Image (landscape) and remove it for other Images (objects / faces)", 2),
],
value=ui_defaults.get("remove_background_images_ref",1),
label="Remove Background of Images References", scale = 3, visible= "I" in video_prompt_type_value
)
# remove_background_images_ref = gr.Checkbox(value=ui_defaults.get("remove_background_images_ref",1), label= "Remove Background of Images References", visible= "I" in video_prompt_type_value, scale =1 )
video_mask = gr.Video(label= "Video Mask (for Inpainting or Outpainting, white pixels = Mask)", visible= "M" in video_prompt_type_value, value= ui_defaults.get("video_mask", None))
@ -4730,7 +4766,7 @@ def generate_video_tab(update_form = False, state_dict = None, ui_defaults = Non
else:
sliding_window_size = gr.Slider(5, 137, value=ui_defaults.get("sliding_window_size", 81), step=4, label="Sliding Window Size")
sliding_window_overlap = gr.Slider(1, 97, value=ui_defaults.get("sliding_window_overlap",5), step=4, label="Windows Frames Overlap (needed to maintain continuity between windows, a higher value will require more windows)")
sliding_window_overlap_noise = gr.Slider(0, 100, value=ui_defaults.get("sliding_window_overlap_noise",20), step=1, label="Noise to be added to overlapped frames to reduce blur effect")
sliding_window_overlap_noise = gr.Slider(0, 150, value=ui_defaults.get("sliding_window_overlap_noise",20), step=1, label="Noise to be added to overlapped frames to reduce blur effect")
sliding_window_discard_last_frames = gr.Slider(0, 20, value=ui_defaults.get("sliding_window_discard_last_frames", 8), step=4, label="Discard Last Frames of a Window (that may have bad quality)", visible = True)
@ -4811,7 +4847,7 @@ def generate_video_tab(update_form = False, state_dict = None, ui_defaults = Non
image_prompt_type.change(fn=refresh_image_prompt_type, inputs=[state, image_prompt_type], outputs=[image_start, image_end, video_source, keep_frames_video_source] )
video_prompt_video_guide_trigger.change(fn=refresh_video_prompt_video_guide_trigger, inputs=[video_prompt_type, video_prompt_video_guide_trigger], outputs=[video_prompt_type, video_prompt_type_video_guide, video_guide, video_mask, keep_frames_video_guide])
video_prompt_type_image_refs.input(fn=refresh_video_prompt_type_image_refs, inputs = [video_prompt_type, video_prompt_type_image_refs], outputs = [video_prompt_type, image_refs, remove_background_image_ref ])
video_prompt_type_image_refs.input(fn=refresh_video_prompt_type_image_refs, inputs = [video_prompt_type, video_prompt_type_image_refs], outputs = [video_prompt_type, image_refs, remove_background_images_ref ])
video_prompt_type_video_guide.input(fn=refresh_video_prompt_type_video_guide, inputs = [video_prompt_type, video_prompt_type_video_guide], outputs = [video_prompt_type, video_guide, keep_frames_video_guide, video_mask])
show_advanced.change(fn=switch_advanced, inputs=[state, show_advanced, lset_name], outputs=[advanced_row, preset_buttons_rows, refresh_lora_btn, refresh2_row ,lset_name ]).then(
@ -5036,7 +5072,7 @@ def generate_video_tab(update_form = False, state_dict = None, ui_defaults = Non
)
return ( state, loras_choices, lset_name, state,
video_guide, video_mask, video_prompt_video_guide_trigger, prompt_enhancer
video_guide, video_mask, image_refs, video_prompt_video_guide_trigger, prompt_enhancer
)
@ -5250,6 +5286,7 @@ def generate_configuration_tab(state, blocks, header, model_choice, prompt_enhan
value= profile,
label="Profile (for power users only, not needed to change it)"
)
preload_in_VRAM_choice = gr.Slider(0, 40000, value=server_config.get("preload_in_VRAM", 0), step=100, label="Number of MB of Models that are Preloaded in VRAM (0 will use Profile default)")
@ -5277,7 +5314,8 @@ def generate_configuration_tab(state, blocks, header, model_choice, prompt_enhan
preload_model_policy_choice,
UI_theme_choice,
enhancer_enabled_choice,
fit_canvas_choice
fit_canvas_choice,
preload_in_VRAM_choice
],
outputs= [msg , header, model_choice, prompt_enhancer_row]
)
@ -5661,7 +5699,7 @@ def create_demo():
theme = gr.themes.Soft(font=["Verdana"], primary_hue="sky", neutral_hue="slate", text_size="md")
with gr.Blocks(css=css, theme=theme, title= "WanGP") as main:
gr.Markdown("<div align=center><H1>Wan<SUP>GP</SUP> v5.2 <FONT SIZE=4>by <I>DeepBeepMeep</I></FONT> <FONT SIZE=3>") # (<A HREF='https://github.com/deepbeepmeep/Wan2GP'>Updates</A>)</FONT SIZE=3></H1></div>")
gr.Markdown("<div align=center><H1>Wan<SUP>GP</SUP> v5.21 <FONT SIZE=4>by <I>DeepBeepMeep</I></FONT> <FONT SIZE=3>") # (<A HREF='https://github.com/deepbeepmeep/Wan2GP'>Updates</A>)</FONT SIZE=3></H1></div>")
global model_list
tab_state = gr.State({ "tab_no":0 })
@ -5680,7 +5718,7 @@ def create_demo():
header = gr.Markdown(generate_header(transformer_filename, compile, attention_mode), visible= True)
with gr.Row():
( state, loras_choices, lset_name, state,
video_guide, video_mask, video_prompt_type_video_trigger, prompt_enhancer_row
video_guide, video_mask, image_refs, video_prompt_type_video_trigger, prompt_enhancer_row
) = generate_video_tab(model_choice=model_choice, header=header, main = main)
with gr.Tab("Informations", id="info"):
generate_info_tab()
@ -5688,7 +5726,7 @@ def create_demo():
from preprocessing.matanyone import app as matanyone_app
vmc_event_handler = matanyone_app.get_vmc_event_handler()
matanyone_app.display(main_tabs, model_choice, video_guide, video_mask, video_prompt_type_video_trigger)
matanyone_app.display(main_tabs, model_choice, video_guide, video_mask, image_refs, video_prompt_type_video_trigger)
if not args.lock_config:
with gr.Tab("Downloads", id="downloads") as downloads_tab:
generate_download_tab(lset_name, loras_choices, state)