Mirror of https://github.com/Wan-Video/Wan2.1.git
Synced 2025-11-04 14:16:57 +00:00

Vace improvements

This commit is contained in:
parent 6706709230
commit 86725a65d4

README.md: 46 changed lines
@@ -21,6 +21,7 @@ WanGP supports the Wan (and derived models), Hunyuan Video and LTX Video models

## 🔥 Latest News!!

+* May 23 2025: 👋 Wan 2.1GP v5.21: improvements for Vace: better transitions between sliding windows, support for image masks in Matanyone, a new Extend Video feature for Vace, and different types of automated background removal
* May 20 2025: 👋 Wan 2.1GP v5.2: added support for Wan CausVid, a distilled Wan model that can generate nice-looking videos in only 4 to 12 steps.
The great thing is that Kijai (kudos to him!) has created a CausVid Lora that can be combined with any existing Wan t2v 14B model, such as Wan Vace 14B.
See instructions below on how to use CausVid.\
@@ -307,17 +308,20 @@ You can define multiple lines of macros. If there is only one macro line, the ap

### VACE ControlNet introduction

-Vace is a ControlNet 1.3B text2video model that allows you to do Video to Video and Reference to Video (inject your own images into the output video). So with Vace you can inject in the scene people or objects of your choice, animate a person, perform inpainting or outpainting, continue a video, ...
+Vace is a ControlNet that allows you to do Video to Video and Reference to Video (inject your own images into the output video). It is probably one of the most powerful Wan models, and you will be able to do amazing things once you master it: inject people or objects of your choice into a scene, animate a person, perform inpainting or outpainting, continue a video, ...

-First you need to select the Vace 1.3B model in the Drop Down box at the top. Please note that Vace works well for the moment only with videos up to 5s (81 frames).
+First you need to select the Vace 1.3B model or the Vace 14B model in the Drop Down box at the top. Please note that for the moment Vace works well only with videos up to 7s, with the Riflex option turned on.

Besides the usual Text Prompt, three new types of visual hints can be provided (and combined!):

-- a Control Video: Based on your choice, you can decide to transfer the motion, the depth in a new Video. You can tell WanGP to use only the first n frames of Control Video and to extrapolate the rest. You can also do inpainting ). If the video contains area of grey color 127, they will be considered as masks and will be filled based on the Text prompt of the reference Images.
+- *a Control Video*\
+Based on your choice, you can decide to transfer the motion or the depth to a new Video. You can tell WanGP to use only the first n frames of the Control Video and to extrapolate the rest. You can also do inpainting: if the video contains areas of grey color 127, they will be treated as masks and filled based on the Text Prompt and the Reference Images (see the sketch after this list).

-- reference Images: Use this to inject people or objects of your choice in the video. You can select multiple reference Images. The integration of the image is more efficient if the background is replaced by the full white color. You can do that with your preferred background remover or use the built in background remover by checking the box *Remove background*
+- *Reference Images*\
+A Reference Image can be either a background / setting that you want to use for the video, or a person or object of your choice that you want to inject into the video. You can select multiple Reference Images. The integration of an object / person image is more efficient if its background is replaced by full white. For complex background removal you can use the Image version of the Matanyone tool that is embedded in WanGP, or you can use the fast on-the-fly background remover by selecting an option in the *Remove background* drop down box. Be careful not to remove the background of a reference image that is a landscape or setting (always the first reference image) that you want to use as a start image / background for the video. It helps greatly to reference and describe explicitly the injected objects / people of the Reference Images in the text prompt.

-- a Video Mask
-This offers a stronger mechanism to tell Vace which parts should be kept (black) or replaced (white). You can do as well inpainting / outpainting, fill the missing part of a video more efficientlty with just the video hint. If a video mask is white, it will be generated so with black frames at the beginning and at the end and the rest white, you could generate the missing frames in between.
+- *a Video Mask*\
+This offers a stronger mechanism to tell Vace which parts should be kept (black) or replaced (white). You can also do inpainting / outpainting, or fill the missing parts of a video, more efficiently than with just the video hint. For instance, if a video mask is white except at the beginning and at the end where it is black, the first and last frames will be kept and everything in between will be generated.
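A minimal NumPy sketch of the two mask conventions described above (the resolution, frame count and masked regions are illustrative, not WanGP defaults):

```python
import numpy as np

T, H, W = 81, 480, 832                       # illustrative frame count and size

# Control Video convention: grey 127 marks areas for Vace to generate.
control = np.zeros((T, H, W, 3), dtype=np.uint8)            # real footage elsewhere
control[:, H // 4 : 3 * H // 4, W // 4 : 3 * W // 4] = 127  # centre box to inpaint

# Video Mask convention: black (0) = keep, white (255) = replace.
mask = np.full((T, H, W, 3), 255, dtype=np.uint8)  # regenerate everything...
mask[0] = 0                                        # ...except the first frame
mask[-1] = 0                                       # ...and the last frame
```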
Examples:
@@ -336,13 +340,29 @@ There is also a guide that describes the various combination of hints (https://g
It seems you will get better results with Vace if you turn on "Skip Layer Guidance" with its default configuration.

Other recommended settings for Vace:
- Use a long prompt description, especially for the people / objects that are in the background and not in the reference images. This will ensure consistency between the windows.
- Set a medium-size overlap window: long enough to give the model a sense of the motion, but short enough that any overlapped blurred frames do not turn the rest of the video into a blurred video.
- Truncate at least the last 4 frames of each generated window, as Vace's last frames tend to be blurry.

+**WanGP integrates the Matanyone tool, which is tuned to work with Vace.**

-### VACE and Sky Reels v2 Diffusion Forcing Slidig Window
-With this mode (that works for the moment only with Vace and Sky Reels v2) you can merge mutiple Videos to form a very long video (up to 1 min).
+This can be very useful to create at the same time a control video and a mask video that go together.\
+For example, if you want to replace the face of a person in a video:
+- load the video in the Matanyone tool
+- click the face on the first frame and create a mask for it (if you have some trouble selecting only the face, look at the tips below)
+- generate both the control video and the mask video by clicking *Generate Video Matting*
+- click *Export to current Video Input and Video Mask*
+- in the *Reference Image* field of the Vace screen, load a picture of the replacement face

+Please note that it may sometimes be useful to create *Background Masks*, for instance if you want to replace everything but a character that is in the video. You can do that by selecting *Background Mask* in the *Matanyone settings*.

+If you have some trouble creating the perfect mask, be aware of these tips:
+- Using the Matanyone Settings you can also define Negative Point Prompts to remove parts of the current selection.
+- Sometimes it is very hard to fit everything you want in a single mask; it may be much easier to combine multiple independent sub-masks before producing the matting: each sub-mask is created by selecting an area of an image and clicking the Add Mask button. Sub-masks can then be enabled / disabled in the Matanyone settings.

+### VACE, Sky Reels v2 Diffusion Forcing Sliding Window and LTX Video

+With this mode (that works for the moment only with Vace, Sky Reels v2 and LTX Video) you can merge multiple Videos to form a very long video (up to 1 min).

When combined with Vace, this feature can use the same control video to generate the full Video that results from concatenating the different windows. For instance the first 0-4s of the control video will be used to generate the first window, then the next 4-8s of the control video will be used to generate the second window, and so on. So if your control video contains a person walking, your generated video could contain up to one minute of this person walking.
@@ -352,12 +372,16 @@ Sliding Windows are turned on by default and are triggered as soon as you try to

Although the window duration is set by the *Sliding Window Size* form field, the actual number of frames generated by each iteration will be smaller, because of the *overlap frames* and *discard last frames*:
- *overlap frames*: the first frames of a new window are filled with the last frames of the previous window in order to ensure continuity between the two windows
-- *discard last frames* : quite often (Vace model Only) the last frames of a window have a worse quality. You can decide here how many ending frames of a new window should be dropped.
+- *discard last frames*: sometimes (Vace 1.3B model only) the last frames of a window have worse quality. You can decide here how many ending frames of a new window should be dropped.
-s

+There is some inevitable quality degradation over time due to accumulated errors in calculation. One trick to reduce or hide it is to add some noise (usually not noticeable) on the overlapped frames using the *add overlapped noise* option.

Number of Generated Frames = [Number of Windows - 1] * ([Window Size] - [Overlap Frames] - [Discard Last Frames]) + [Window Size]
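A quick worked example of this formula (the numbers are illustrative, not defaults):

```python
window_size = 81          # Sliding Window Size
overlap_frames = 8
discard_last_frames = 4
num_windows = 4

new_per_window = window_size - overlap_frames - discard_last_frames  # 69
total = (num_windows - 1) * new_per_window + window_size
print(total)  # 288 frames, i.e. 18s at 16 fps
```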

Experimental: if your prompt is broken into multiple lines (each line separated by a carriage return), then each line of the prompt will be used for a new window. If there are more windows to generate than prompt lines, the last prompt line will be repeated.

### Command line parameters for Gradio Server
--i2v : launch the image to video generator\
--t2v : launch the text to video generator (default defined in the configuration)\
@@ -1502,7 +1502,7 @@ class LTXVideoPipeline(DiffusionPipeline):
                extra_conditioning_mask.append(conditioning_mask)

            # Patchify the updated latents and calculate their pixel coordinates
            init_latents, init_latent_coords = self.patchifier.patchify(
                latents=init_latents
            )
            init_pixel_coords = latent_to_pixel_coords(
@@ -85,7 +85,7 @@ def get_frames_from_image(image_input, image_state):
    model.samcontroler.sam_controler.reset_image()
    model.samcontroler.sam_controler.set_image(image_state["origin_images"][0])
    return image_state, image_info, image_state["origin_images"][0], \
-        gr.update(visible=True, maximum=10, value=10), gr.update(visible=True, maximum=len(frames), value=len(frames)), gr.update(visible=False, maximum=len(frames), value=len(frames)), \
+        gr.update(visible=True, maximum=10, value=10), gr.update(visible=False, maximum=len(frames), value=len(frames)), \
        gr.update(visible=True), gr.update(visible=True), \
        gr.update(visible=True), gr.update(visible=True),\
        gr.update(visible=True), gr.update(visible=True), \
@@ -273,6 +273,57 @@ def save_video(frames, output_path, fps):

    return output_path

+# image matting
+def image_matting(video_state, interactive_state, mask_dropdown, erode_kernel_size, dilate_kernel_size, refine_iter):
+    matanyone_processor = InferenceCore(matanyone_model, cfg=matanyone_model.cfg)
+    if interactive_state["track_end_number"]:
+        following_frames = video_state["origin_images"][video_state["select_frame_number"]:interactive_state["track_end_number"]]
+    else:
+        following_frames = video_state["origin_images"][video_state["select_frame_number"]:]
+
+    if interactive_state["multi_mask"]["masks"]:
+        if len(mask_dropdown) == 0:
+            mask_dropdown = ["mask_001"]
+        mask_dropdown.sort()
+        template_mask = interactive_state["multi_mask"]["masks"][int(mask_dropdown[0].split("_")[1]) - 1] * (int(mask_dropdown[0].split("_")[1]))
+        for i in range(1, len(mask_dropdown)):
+            mask_number = int(mask_dropdown[i].split("_")[1]) - 1
+            template_mask = np.clip(template_mask + interactive_state["multi_mask"]["masks"][mask_number] * (mask_number + 1), 0, mask_number + 1)
+        video_state["masks"][video_state["select_frame_number"]] = template_mask
+    else:
+        template_mask = video_state["masks"][video_state["select_frame_number"]]
+
+    # operation error
+    if len(np.unique(template_mask)) == 1:
+        template_mask[0][0] = 1
+    foreground, alpha = matanyone(matanyone_processor, following_frames, template_mask * 255, r_erode=erode_kernel_size, r_dilate=dilate_kernel_size, n_warmup=refine_iter)
+
+    foreground_mat = False
+
+    output_frames = []
+    for frame_origin, frame_alpha in zip(following_frames, alpha):
+        if foreground_mat:
+            frame_alpha[frame_alpha > 127] = 255
+            frame_alpha[frame_alpha <= 127] = 0
+        else:
+            frame_temp = frame_alpha.copy()
+            frame_alpha[frame_temp > 127] = 0
+            frame_alpha[frame_temp <= 127] = 255
+
+        output_frame = np.bitwise_and(frame_origin, 255 - frame_alpha)
+        frame_grey = frame_alpha.copy()
+        frame_grey[frame_alpha == 255] = 255
+        output_frame += frame_grey
+        output_frames.append(output_frame)
+    foreground = output_frames
+
+    foreground_output = Image.fromarray(foreground[-1])
+    alpha_output = Image.fromarray(alpha[-1][:, :, 0])
+
+    return foreground_output, gr.update(visible=True)

# video matting
def video_matting(video_state, end_slider, matting_type, interactive_state, mask_dropdown, erode_kernel_size, dilate_kernel_size):
    matanyone_processor = InferenceCore(matanyone_model, cfg=matanyone_model.cfg)
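With `foreground_mat` hard-coded to False above, the alpha matte is inverted so the selected object is kept and everything around it is painted white, which is the white-background convention the Vace Reference Images expect. A tiny standalone sketch of that compositing step on one toy frame:

```python
import numpy as np

frame = np.random.randint(0, 256, (8, 8, 3), dtype=np.uint8)  # toy RGB frame
matte = np.zeros((8, 8, 3), dtype=np.uint8)
matte[:2, :] = 255           # 255 marks the area to blank out (the background)

kept = np.bitwise_and(frame, 255 - matte)  # zero the 255-marked pixels
out = kept + matte                         # then paint them pure white
```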
@@ -397,7 +448,7 @@ def restart():
            "inference_times": 0,
            "negative_click_times" : 0,
            "positive_click_times": 0,
-           "mask_save": arg_mask_save,
+           "mask_save": False,
            "multi_mask": {
                "mask_names": [],
                "masks": []
@@ -457,6 +508,15 @@ def export_to_vace_video_input(foreground_video_output):
    gr.Info("Masked Video Input transferred to Vace For Inpainting")
    return "V#" + str(time.time()), foreground_video_output

+def export_image(image_refs, image_output):
+    gr.Info("Masked Image transferred to Current Video")
+    # return "MV#" + str(time.time()), foreground_video_output, alpha_video_output
+    if image_refs == None:
+        image_refs = []
+    image_refs.append(image_output)
+    return image_refs

def export_to_current_video_engine(foreground_video_output, alpha_video_output):
    gr.Info("Masked Video Input and Full Mask transferred to Current Video Engine For Inpainting")
    # return "MV#" + str(time.time()), foreground_video_output, alpha_video_output
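A hedged usage sketch of the contract of the new export_image helper (it is meant to run inside a Gradio callback so that gr.Info has an event context; `matted` stands for a PIL image produced by the Image tab):

```python
refs = export_image(None, matted)   # a missing gallery becomes [matted]
refs = export_image(refs, matted)   # further exports append to the same list
```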
@@ -471,14 +531,17 @@ def teleport_to_vace_1_3B():
def teleport_to_vace_14B():
    return gr.Tabs(selected="video_gen"), gr.Dropdown(value="vace_14B")

-def display(tabs, model_choice, vace_video_input, vace_video_mask, video_prompt_video_guide_trigger):
+def display(tabs, model_choice, vace_video_input, vace_video_mask, vace_image_refs, video_prompt_video_guide_trigger):
    # my_tab.select(fn=load_unload_models, inputs=[], outputs=[])

    media_url = "https://github.com/pq-yang/MatAnyone/releases/download/media/"

    # download assets

-   gr.Markdown("Mast Edition is provided by MatAnyone")
+   gr.Markdown("<B>Mast Edition is provided by MatAnyone</B>")
+   gr.Markdown("If you have some trouble creating the perfect mask, be aware of these tips:")
+   gr.Markdown("- Using the Matanyone Settings you can also define Negative Point Prompts to remove parts of the current selection.")
+   gr.Markdown("- Sometime it is very hard to fit everything you want in a single mask, it may be much easier to combine multiple independent sub Masks before producing the Matting : each sub Mask is created by selecting an area of an image and by clicking the Add Mask button. Sub masks can then be enabled / disabled in the Matanyone settings.")

    with gr.Column( visible=True):
        with gr.Row():
@@ -493,216 +556,368 @@ def display(tabs, model_choice, vace_video_input, vace_video_mask, video_prompt_
            gr.Video(value="preprocessing/matanyone/tutorial_multi_targets.mp4", elem_classes="video")

-    click_state = gr.State([[],[]])
-
-    interactive_state = gr.State({
-        "inference_times": 0,
-        "negative_click_times" : 0,
-        "positive_click_times": 0,
-        "mask_save": arg_mask_save,
-        "multi_mask": {
-            "mask_names": [],
-            "masks": []
-        },
-        "track_end_number": None,
-        }
-    )
-
-    video_state = gr.State(
-        {
-        "user_name": "",
-        "video_name": "",
-        "origin_images": None,
-        "painted_images": None,
-        "masks": None,
-        "inpaint_masks": None,
-        "logits": None,
-        "select_frame_number": 0,
-        "fps": 16,
-        "audio": "",
-        }
-    )
-
-    with gr.Column( visible=True):
-        with gr.Row():
-            with gr.Accordion('MatAnyone Settings (click to expand)', open=False):
-                with gr.Row():
-                    erode_kernel_size = gr.Slider(label='Erode Kernel Size',
-                        minimum=0,
-                        maximum=30,
-                        step=1,
-                        value=10,
-                        info="Erosion on the added mask",
-                        interactive=True)
-                    dilate_kernel_size = gr.Slider(label='Dilate Kernel Size',
-                        minimum=0,
-                        maximum=30,
-                        step=1,
-                        value=10,
-                        info="Dilation on the added mask",
-                        interactive=True)
-                with gr.Row():
-                    image_selection_slider = gr.Slider(minimum=1, maximum=100, step=1, value=1, label="Start Frame", info="Choose the start frame for target assignment and video matting", visible=False)
-                    end_selection_slider = gr.Slider(minimum=1, maximum=300, step=1, value=81, label="Last Frame to Process", info="Last Frame to Process", visible=False)
-
-                    track_pause_number_slider = gr.Slider(minimum=1, maximum=100, step=1, value=1, label="End frame", visible=False)
-                with gr.Row():
-                    point_prompt = gr.Radio(
-                        choices=["Positive", "Negative"],
-                        value="Positive",
-                        label="Point Prompt",
-                        info="Click to add positive or negative point for target mask",
-                        interactive=True,
-                        visible=False,
-                        min_width=100,
-                        scale=1)
-                    matting_type = gr.Radio(
-                        choices=["Foreground", "Background"],
-                        value="Foreground",
-                        label="Matting Type",
-                        info="Type of Video Matting to Generate",
-                        interactive=True,
-                        visible=False,
-                        min_width=100,
-                        scale=1)
-                    mask_dropdown = gr.Dropdown(multiselect=True, value=[], label="Mask Selection", info="Choose 1~all mask(s) added in Step 2", visible=False, scale=2)
-
-    gr.Markdown("---")
-
-    with gr.Column():
-        # input video
-        with gr.Row(equal_height=True):
-            with gr.Column(scale=2):
-                gr.Markdown("## Step1: Upload video")
-            with gr.Column(scale=2):
-                step2_title = gr.Markdown("## Step2: Add masks <small>(Several clicks then **`Add Mask`** <u>one by one</u>)</small>", visible=False)
-        with gr.Row(equal_height=True):
-            with gr.Column(scale=2):
-                video_input = gr.Video(label="Input Video", elem_classes="video")
-                extract_frames_button = gr.Button(value="Load Video", interactive=True, elem_classes="new_button")
-            with gr.Column(scale=2):
-                video_info = gr.Textbox(label="Video Info", visible=False)
-                template_frame = gr.Image(label="Start Frame", type="pil",interactive=True, elem_id="template_frame", visible=False, elem_classes="image")
-                with gr.Row():
-                    clear_button_click = gr.Button(value="Clear Clicks", interactive=True, visible=False, min_width=100)
-                    add_mask_button = gr.Button(value="Set Mask", interactive=True, visible=False, min_width=100)
-                    remove_mask_button = gr.Button(value="Remove Mask", interactive=True, visible=False, min_width=100) # no use
-                    matting_button = gr.Button(value="Generate Video Matting", interactive=True, visible=False, min_width=100)
-                with gr.Row():
-                    gr.Markdown("")
-
-        # output video
-        with gr.Column() as output_row: #equal_height=True
-            with gr.Row():
-                with gr.Column(scale=2):
-                    foreground_video_output = gr.Video(label="Masked Video Output", visible=False, elem_classes="video")
-                    foreground_output_button = gr.Button(value="Black & White Video Output", visible=False, elem_classes="new_button")
-                with gr.Column(scale=2):
-                    alpha_video_output = gr.Video(label="B & W Mask Video Output", visible=False, elem_classes="video")
-                    alpha_output_button = gr.Button(value="Alpha Mask Output", visible=False, elem_classes="new_button")
-            with gr.Row():
-                with gr.Row(visible= False):
-                    export_to_vace_video_14B_btn = gr.Button("Export to current Video Input Video For Inpainting", visible= False)
-                with gr.Row(visible= True):
-                    export_to_current_video_engine_btn = gr.Button("Export to current Video Input and Video Mask", visible= False)
-
-    export_to_vace_video_14B_btn.click( fn=teleport_to_vace_14B, inputs=[], outputs=[tabs, model_choice]).then(
-        fn=export_to_current_video_engine, inputs= [foreground_video_output, alpha_video_output], outputs= [video_prompt_video_guide_trigger, vace_video_input, vace_video_mask])
-
-    export_to_current_video_engine_btn.click( fn=export_to_current_video_engine, inputs= [foreground_video_output, alpha_video_output], outputs= [vace_video_input, vace_video_mask]).then( #video_prompt_video_guide_trigger,
-        fn=teleport_to_video_tab, inputs= [], outputs= [tabs])
-
-    # first step: get the video information
-    extract_frames_button.click(
-        fn=get_frames_from_video,
-        inputs=[
-            video_input, video_state
-        ],
-        outputs=[video_state, video_info, template_frame,
-            image_selection_slider, end_selection_slider, track_pause_number_slider, point_prompt, matting_type, clear_button_click, add_mask_button, matting_button, template_frame,
-            foreground_video_output, alpha_video_output, foreground_output_button, alpha_output_button, mask_dropdown, step2_title]
-    )
-
-    # second step: select images from slider
-    image_selection_slider.release(fn=select_video_template,
-        inputs=[image_selection_slider, video_state, interactive_state],
-        outputs=[template_frame, video_state, interactive_state], api_name="select_image")
-    track_pause_number_slider.release(fn=get_end_number,
-        inputs=[track_pause_number_slider, video_state, interactive_state],
-        outputs=[template_frame, interactive_state], api_name="end_image")
-
-    # click select image to get mask using sam
-    template_frame.select(
-        fn=sam_refine,
-        inputs=[video_state, point_prompt, click_state, interactive_state],
-        outputs=[template_frame, video_state, interactive_state]
-    )
-
-    # add different mask
-    add_mask_button.click(
-        fn=add_multi_mask,
-        inputs=[video_state, interactive_state, mask_dropdown],
-        outputs=[interactive_state, mask_dropdown, template_frame, click_state]
-    )
-
-    remove_mask_button.click(
-        fn=remove_multi_mask,
-        inputs=[interactive_state, mask_dropdown],
-        outputs=[interactive_state, mask_dropdown]
-    )
-
-    # video matting
-    matting_button.click(
-        fn=show_outputs,
-        inputs=[],
-        outputs=[foreground_video_output, alpha_video_output]).then(
-        fn=video_matting,
-        inputs=[video_state, end_selection_slider, matting_type, interactive_state, mask_dropdown, erode_kernel_size, dilate_kernel_size],
-        outputs=[foreground_video_output, alpha_video_output,foreground_video_output, alpha_video_output, export_to_vace_video_14B_btn, export_to_current_video_engine_btn]
-    )
-
-    # click to get mask
-    mask_dropdown.change(
-        fn=show_mask,
-        inputs=[video_state, interactive_state, mask_dropdown],
-        outputs=[template_frame]
-    )
-
-    # clear input
-    video_input.change(
-        fn=restart,
-        inputs=[],
-        outputs=[
-            video_state,
-            interactive_state,
-            click_state,
-            foreground_video_output, alpha_video_output,
-            template_frame,
-            image_selection_slider, end_selection_slider, track_pause_number_slider,point_prompt, export_to_vace_video_14B_btn, export_to_current_video_engine_btn, matting_type, clear_button_click,
-            add_mask_button, matting_button, template_frame, foreground_video_output, alpha_video_output, remove_mask_button, foreground_output_button, alpha_output_button, mask_dropdown, video_info, step2_title
-        ],
-        queue=False,
-        show_progress=False)
-
-    video_input.clear(
-        fn=restart,
-        inputs=[],
-        outputs=[
-            video_state,
-            interactive_state,
-            click_state,
-            foreground_video_output, alpha_video_output,
-            template_frame,
-            image_selection_slider , end_selection_slider, track_pause_number_slider,point_prompt, export_to_vace_video_14B_btn, export_to_current_video_engine_btn, matting_type, clear_button_click,
-            add_mask_button, matting_button, template_frame, foreground_video_output, alpha_video_output, remove_mask_button, foreground_output_button, alpha_output_button, mask_dropdown, video_info, step2_title
-        ],
-        queue=False,
-        show_progress=False)
-
-    # points clear
-    clear_button_click.click(
-        fn = clear_click,
-        inputs = [video_state, click_state,],
-        outputs = [template_frame,click_state],
-    )
+    with gr.Tabs():
+        with gr.TabItem("Video"):
+            click_state = gr.State([[],[]])
+
+            interactive_state = gr.State({
+                "inference_times": 0,
+                "negative_click_times" : 0,
+                "positive_click_times": 0,
+                "mask_save": arg_mask_save,
+                "multi_mask": {
+                    "mask_names": [],
+                    "masks": []
+                },
+                "track_end_number": None,
+                }
+            )
+
+            video_state = gr.State(
+                {
+                "user_name": "",
+                "video_name": "",
+                "origin_images": None,
+                "painted_images": None,
+                "masks": None,
+                "inpaint_masks": None,
+                "logits": None,
+                "select_frame_number": 0,
+                "fps": 16,
+                "audio": "",
+                }
+            )
+
+            with gr.Column( visible=True):
+                with gr.Row():
+                    with gr.Accordion('MatAnyone Settings (click to expand)', open=False):
+                        with gr.Row():
+                            erode_kernel_size = gr.Slider(label='Erode Kernel Size',
+                                minimum=0,
+                                maximum=30,
+                                step=1,
+                                value=10,
+                                info="Erosion on the added mask",
+                                interactive=True)
+                            dilate_kernel_size = gr.Slider(label='Dilate Kernel Size',
+                                minimum=0,
+                                maximum=30,
+                                step=1,
+                                value=10,
+                                info="Dilation on the added mask",
+                                interactive=True)
+
+                        with gr.Row():
+                            image_selection_slider = gr.Slider(minimum=1, maximum=100, step=1, value=1, label="Start Frame", info="Choose the start frame for target assignment and video matting", visible=False)
+                            end_selection_slider = gr.Slider(minimum=1, maximum=300, step=1, value=81, label="Last Frame to Process", info="Last Frame to Process", visible=False)
+
+                            track_pause_number_slider = gr.Slider(minimum=1, maximum=100, step=1, value=1, label="End frame", visible=False)
+                        with gr.Row():
+                            point_prompt = gr.Radio(
+                                choices=["Positive", "Negative"],
+                                value="Positive",
+                                label="Point Prompt",
+                                info="Click to add positive or negative point for target mask",
+                                interactive=True,
+                                visible=False,
+                                min_width=100,
+                                scale=1)
+                            matting_type = gr.Radio(
+                                choices=["Foreground", "Background"],
+                                value="Foreground",
+                                label="Matting Type",
+                                info="Type of Video Matting to Generate",
+                                interactive=True,
+                                visible=False,
+                                min_width=100,
+                                scale=1)
+                            mask_dropdown = gr.Dropdown(multiselect=True, value=[], label="Mask Selection", info="Choose 1~all mask(s) added in Step 2", visible=False, scale=2)
+
+            # input video
+            with gr.Row(equal_height=True):
+                with gr.Column(scale=2):
+                    gr.Markdown("## Step1: Upload video")
+                with gr.Column(scale=2):
+                    step2_title = gr.Markdown("## Step2: Add masks <small>(Several clicks then **`Add Mask`** <u>one by one</u>)</small>", visible=False)
+            with gr.Row(equal_height=True):
+                with gr.Column(scale=2):
+                    video_input = gr.Video(label="Input Video", elem_classes="video")
+                    extract_frames_button = gr.Button(value="Load Video", interactive=True, elem_classes="new_button")
+                with gr.Column(scale=2):
+                    video_info = gr.Textbox(label="Video Info", visible=False)
+                    template_frame = gr.Image(label="Start Frame", type="pil",interactive=True, elem_id="template_frame", visible=False, elem_classes="image")
+                    with gr.Row():
+                        clear_button_click = gr.Button(value="Clear Clicks", interactive=True, visible=False, min_width=100)
+                        add_mask_button = gr.Button(value="Set Mask", interactive=True, visible=False, min_width=100)
+                        remove_mask_button = gr.Button(value="Remove Mask", interactive=True, visible=False, min_width=100) # no use
+                        matting_button = gr.Button(value="Generate Video Matting", interactive=True, visible=False, min_width=100)
+                    with gr.Row():
+                        gr.Markdown("")
+
+            # output video
+            with gr.Column() as output_row: #equal_height=True
+                with gr.Row():
+                    with gr.Column(scale=2):
+                        foreground_video_output = gr.Video(label="Masked Video Output", visible=False, elem_classes="video")
+                        foreground_output_button = gr.Button(value="Black & White Video Output", visible=False, elem_classes="new_button")
+                    with gr.Column(scale=2):
+                        alpha_video_output = gr.Video(label="B & W Mask Video Output", visible=False, elem_classes="video")
+                        alpha_output_button = gr.Button(value="Alpha Mask Output", visible=False, elem_classes="new_button")
+                with gr.Row():
+                    with gr.Row(visible= False):
+                        export_to_vace_video_14B_btn = gr.Button("Export to current Video Input Video For Inpainting", visible= False)
+                    with gr.Row(visible= True):
+                        export_to_current_video_engine_btn = gr.Button("Export to current Video Input and Video Mask", visible= False)
+
+            export_to_vace_video_14B_btn.click( fn=teleport_to_vace_14B, inputs=[], outputs=[tabs, model_choice]).then(
+                fn=export_to_current_video_engine, inputs= [foreground_video_output, alpha_video_output], outputs= [video_prompt_video_guide_trigger, vace_video_input, vace_video_mask])
+
+            export_to_current_video_engine_btn.click( fn=export_to_current_video_engine, inputs= [foreground_video_output, alpha_video_output], outputs= [vace_video_input, vace_video_mask]).then( #video_prompt_video_guide_trigger,
+                fn=teleport_to_video_tab, inputs= [], outputs= [tabs])
+
+            # first step: get the video information
+            extract_frames_button.click(
+                fn=get_frames_from_video,
+                inputs=[
+                    video_input, video_state
+                ],
+                outputs=[video_state, video_info, template_frame,
+                    image_selection_slider, end_selection_slider, track_pause_number_slider, point_prompt, matting_type, clear_button_click, add_mask_button, matting_button, template_frame,
+                    foreground_video_output, alpha_video_output, foreground_output_button, alpha_output_button, mask_dropdown, step2_title]
+            )
+
+            # second step: select images from slider
+            image_selection_slider.release(fn=select_video_template,
+                inputs=[image_selection_slider, video_state, interactive_state],
+                outputs=[template_frame, video_state, interactive_state], api_name="select_image")
+            track_pause_number_slider.release(fn=get_end_number,
+                inputs=[track_pause_number_slider, video_state, interactive_state],
+                outputs=[template_frame, interactive_state], api_name="end_image")
+
+            # click select image to get mask using sam
+            template_frame.select(
+                fn=sam_refine,
+                inputs=[video_state, point_prompt, click_state, interactive_state],
+                outputs=[template_frame, video_state, interactive_state]
+            )
+
+            # add different mask
+            add_mask_button.click(
+                fn=add_multi_mask,
+                inputs=[video_state, interactive_state, mask_dropdown],
+                outputs=[interactive_state, mask_dropdown, template_frame, click_state]
+            )
+
+            remove_mask_button.click(
+                fn=remove_multi_mask,
+                inputs=[interactive_state, mask_dropdown],
+                outputs=[interactive_state, mask_dropdown]
+            )
+
+            # video matting
+            matting_button.click(
+                fn=show_outputs,
+                inputs=[],
+                outputs=[foreground_video_output, alpha_video_output]).then(
+                fn=video_matting,
+                inputs=[video_state, end_selection_slider, matting_type, interactive_state, mask_dropdown, erode_kernel_size, dilate_kernel_size],
+                outputs=[foreground_video_output, alpha_video_output,foreground_video_output, alpha_video_output, export_to_vace_video_14B_btn, export_to_current_video_engine_btn]
+            )
+
+            # click to get mask
+            mask_dropdown.change(
+                fn=show_mask,
+                inputs=[video_state, interactive_state, mask_dropdown],
+                outputs=[template_frame]
+            )
+
+            # clear input
+            video_input.change(
+                fn=restart,
+                inputs=[],
+                outputs=[
+                    video_state,
+                    interactive_state,
+                    click_state,
+                    foreground_video_output, alpha_video_output,
+                    template_frame,
+                    image_selection_slider, end_selection_slider, track_pause_number_slider,point_prompt, export_to_vace_video_14B_btn, export_to_current_video_engine_btn, matting_type, clear_button_click,
+                    add_mask_button, matting_button, template_frame, foreground_video_output, alpha_video_output, remove_mask_button, foreground_output_button, alpha_output_button, mask_dropdown, video_info, step2_title
+                ],
+                queue=False,
+                show_progress=False)
+
+            video_input.clear(
+                fn=restart,
+                inputs=[],
+                outputs=[
+                    video_state,
+                    interactive_state,
+                    click_state,
+                    foreground_video_output, alpha_video_output,
+                    template_frame,
+                    image_selection_slider , end_selection_slider, track_pause_number_slider,point_prompt, export_to_vace_video_14B_btn, export_to_current_video_engine_btn, matting_type, clear_button_click,
+                    add_mask_button, matting_button, template_frame, foreground_video_output, alpha_video_output, remove_mask_button, foreground_output_button, alpha_output_button, mask_dropdown, video_info, step2_title
+                ],
+                queue=False,
+                show_progress=False)
+
+            # points clear
+            clear_button_click.click(
+                fn = clear_click,
+                inputs = [video_state, click_state,],
+                outputs = [template_frame,click_state],
+            )
+
+        with gr.TabItem("Image"):
+            click_state = gr.State([[],[]])
+
+            interactive_state = gr.State({
+                "inference_times": 0,
+                "negative_click_times" : 0,
+                "positive_click_times": 0,
+                "mask_save": False,
+                "multi_mask": {
+                    "mask_names": [],
+                    "masks": []
+                },
+                "track_end_number": None,
+                }
+            )
+
+            image_state = gr.State(
+                {
+                "user_name": "",
+                "image_name": "",
+                "origin_images": None,
+                "painted_images": None,
+                "masks": None,
+                "inpaint_masks": None,
+                "logits": None,
+                "select_frame_number": 0,
+                "fps": 30
+                }
+            )
+
+            with gr.Group(elem_classes="gr-monochrome-group", visible=True):
+                with gr.Row():
+                    with gr.Accordion('MatAnyone Settings (click to expand)', open=False):
+                        with gr.Row():
+                            erode_kernel_size = gr.Slider(label='Erode Kernel Size',
+                                minimum=0,
+                                maximum=30,
+                                step=1,
+                                value=10,
+                                info="Erosion on the added mask",
+                                interactive=True)
+                            dilate_kernel_size = gr.Slider(label='Dilate Kernel Size',
+                                minimum=0,
+                                maximum=30,
+                                step=1,
+                                value=10,
+                                info="Dilation on the added mask",
+                                interactive=True)
+
+                        with gr.Row():
+                            image_selection_slider = gr.Slider(minimum=1, maximum=100, step=1, value=1, label="Num of Refinement Iterations", info="More iterations → More details & More time", visible=False)
+                            track_pause_number_slider = gr.Slider(minimum=1, maximum=100, step=1, value=1, label="Track end frame", visible=False)
+                        with gr.Row():
+                            point_prompt = gr.Radio(
+                                choices=["Positive", "Negative"],
+                                value="Positive",
+                                label="Point Prompt",
+                                info="Click to add positive or negative point for target mask",
+                                interactive=True,
+                                visible=False,
+                                min_width=100,
+                                scale=1)
+                            mask_dropdown = gr.Dropdown(multiselect=True, value=[], label="Mask Selection", info="Choose 1~all mask(s) added in Step 2", visible=False)
+
+            with gr.Column():
+                # input image
+                with gr.Row(equal_height=True):
+                    with gr.Column(scale=2):
+                        gr.Markdown("## Step1: Upload image")
+                    with gr.Column(scale=2):
+                        step2_title = gr.Markdown("## Step2: Add masks <small>(Several clicks then **`Add Mask`** <u>one by one</u>)</small>", visible=False)
+                with gr.Row(equal_height=True):
+                    with gr.Column(scale=2):
+                        image_input = gr.Image(label="Input Image", elem_classes="image")
+                        extract_frames_button = gr.Button(value="Load Image", interactive=True, elem_classes="new_button")
+                    with gr.Column(scale=2):
+                        image_info = gr.Textbox(label="Image Info", visible=False)
+                        template_frame = gr.Image(type="pil", label="Start Frame", interactive=True, elem_id="template_frame", visible=False, elem_classes="image")
+                        with gr.Row(equal_height=True, elem_classes="mask_button_group"):
+                            clear_button_click = gr.Button(value="Clear Clicks", interactive=True, visible=False, elem_classes="new_button", min_width=100)
+                            add_mask_button = gr.Button(value="Add Mask", interactive=True, visible=False, elem_classes="new_button", min_width=100)
+                            remove_mask_button = gr.Button(value="Remove Mask", interactive=True, visible=False, elem_classes="new_button", min_width=100)
+                            matting_button = gr.Button(value="Image Matting", interactive=True, visible=False, elem_classes="green_button", min_width=100)
+
+                # output image
+                with gr.Row(equal_height=True):
+                    foreground_image_output = gr.Image(type="pil", label="Foreground Output", visible=False, elem_classes="image")
+                    with gr.Row():
+                        with gr.Row():
+                            export_image_btn = gr.Button(value="Add to current Reference Images", visible=False, elem_classes="new_button")
+                    with gr.Column(scale=2, visible= False):
+                        alpha_image_output = gr.Image(type="pil", label="Alpha Output", visible=False, elem_classes="image")
+                        alpha_output_button = gr.Button(value="Alpha Mask Output", visible=False, elem_classes="new_button")
+
+            export_image_btn.click( fn=export_image, inputs= [vace_image_refs, foreground_image_output], outputs= [vace_image_refs]).then( #video_prompt_video_guide_trigger,
+                fn=teleport_to_video_tab, inputs= [], outputs= [tabs])
+
+            # first step: get the image information
+            extract_frames_button.click(
+                fn=get_frames_from_image,
+                inputs=[
+                    image_input, image_state
+                ],
+                outputs=[image_state, image_info, template_frame,
+                    image_selection_slider, track_pause_number_slider,point_prompt, clear_button_click, add_mask_button, matting_button, template_frame,
+                    foreground_image_output, alpha_image_output, export_image_btn, alpha_output_button, mask_dropdown, step2_title]
+            )
+
+            # second step: select images from slider
+            image_selection_slider.release(fn=select_image_template,
+                inputs=[image_selection_slider, image_state, interactive_state],
+                outputs=[template_frame, image_state, interactive_state], api_name="select_image")
+            track_pause_number_slider.release(fn=get_end_number,
+                inputs=[track_pause_number_slider, image_state, interactive_state],
+                outputs=[template_frame, interactive_state], api_name="end_image")
+
+            # click select image to get mask using sam
+            template_frame.select(
+                fn=sam_refine,
+                inputs=[image_state, point_prompt, click_state, interactive_state],
+                outputs=[template_frame, image_state, interactive_state]
+            )
+
+            # add different mask
+            add_mask_button.click(
+                fn=add_multi_mask,
+                inputs=[image_state, interactive_state, mask_dropdown],
+                outputs=[interactive_state, mask_dropdown, template_frame, click_state]
+            )
+
+            remove_mask_button.click(
+                fn=remove_multi_mask,
+                inputs=[interactive_state, mask_dropdown],
+                outputs=[interactive_state, mask_dropdown]
+            )
+
+            # image matting
+            matting_button.click(
+                fn=image_matting,
+                inputs=[image_state, interactive_state, mask_dropdown, erode_kernel_size, dilate_kernel_size, image_selection_slider],
+                outputs=[foreground_image_output, export_image_btn]
+            )
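For orientation, here is a minimal self-contained sketch of the two-tab layout pattern this hunk introduces (component names are shortened and all callbacks are omitted; this is not the real WanGP wiring):

```python
import gradio as gr

with gr.Blocks() as demo:
    with gr.Tabs():
        with gr.TabItem("Video"):
            video_in = gr.Video(label="Input Video")
            gr.Button("Generate Video Matting")
        with gr.TabItem("Image"):
            image_in = gr.Image(label="Input Image", type="pil")
            gr.Button("Image Matting")

# demo.launch()  # each TabItem keeps its own gr.State objects, as in the diff
```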
BIN preprocessing/matanyone/tutorial_multi_targets.mp4 (new file; binary file not shown)
BIN preprocessing/matanyone/tutorial_single_target.mp4 (new file; binary file not shown)
@@ -111,7 +111,7 @@ class WanT2V:

        self.adapt_vace_model()

-    def vace_encode_frames(self, frames, ref_images, masks=None, tile_size = 0, overlapped_latents = 0, overlap_noise = 0):
+    def vace_encode_frames(self, frames, ref_images, masks=None, tile_size = 0, overlapped_latents = None):
        if ref_images is None:
            ref_images = [None] * len(frames)
        else:
@@ -123,10 +123,10 @@ class WanT2V:
            inactive = [i * (1 - m) + 0 * m for i, m in zip(frames, masks)]
            reactive = [i * m + 0 * (1 - m) for i, m in zip(frames, masks)]
            inactive = self.vae.encode(inactive, tile_size = tile_size)
-           # inactive = [ t * (1.0 - noise_factor) + torch.randn_like(t ) * noise_factor for t in inactive]
-           # if overlapped_latents > 0:
-           # for t in inactive:
-           # t[:, :overlapped_latents ] = t[:, :overlapped_latents ] * (1.0 - noise_factor) + torch.randn_like(t[:, :overlapped_latents ] ) * noise_factor
+           self.toto = inactive[0].clone()
+           if overlapped_latents != None :
+               # inactive[0][:, 0:1] = self.vae.encode([frames[0][:, 0:1]], tile_size = tile_size)[0] # redundant
+               inactive[0][:, 1:overlapped_latents.shape[1] + 1] = overlapped_latents
            reactive = self.vae.encode(reactive, tile_size = tile_size)
            latents = [torch.cat((u, c), dim=0) for u, c in zip(inactive, reactive)]
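A hedged reading of the new splice: latent index 0 (the freshly encoded first frame of the window) is preserved, and the latents carried over from the previous window are written immediately after it. A standalone sketch of that indexing, with hypothetical tensor shapes:

```python
import torch

inactive0 = torch.randn(16, 21, 30, 52)  # hypothetical [C, T, H, W] VAE latents
overlapped = torch.randn(16, 4, 30, 52)  # latents carried over from the last window

# keep latent 0, splice the overlap into positions 1..4
inactive0[:, 1:overlapped.shape[1] + 1] = overlapped
```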
@@ -190,13 +190,13 @@ class WanT2V:
                num_frames = total_frames - prepend_count
                if sub_src_mask is not None and sub_src_video is not None:
                    src_video[i], src_mask[i], _, _, _ = self.vid_proc.load_video_pair(sub_src_video, sub_src_mask, max_frames= num_frames, trim_video = trim_video - prepend_count, start_frame = start_frame, canvas_height = canvas_height, canvas_width = canvas_width, fit_into_canvas = fit_into_canvas)
-                   # src_video is [-1, 1], 0 = inpainting area (in fact 127 in [0, 255])
-                   # src_mask is [-1, 1], 0 = preserve original video (in fact 127 in [0, 255]) and 1 = Inpainting (in fact 255 in [0, 255])
+                   # src_video is [-1, 1] (at this function output), 0 = inpainting area (in fact 127 in [0, 255])
+                   # src_mask is [-1, 1] (at this function output), 0 = preserve original video (in fact 127 in [0, 255]) and 1 = Inpainting (in fact 255 in [0, 255])
                    src_video[i] = src_video[i].to(device)
                    src_mask[i] = src_mask[i].to(device)
                    if prepend_count > 0:
                        src_video[i] = torch.cat( [sub_pre_src_video, src_video[i]], dim=1)
-                       src_mask[i] = torch.cat( [torch.zeros_like(sub_pre_src_video), src_mask[i]] ,1)
+                       src_mask[i] = torch.cat( [torch.full_like(sub_pre_src_video, -1.0), src_mask[i]] ,1)
                    src_video_shape = src_video[i].shape
                    if src_video_shape[1] != total_frames:
                        src_video[i] = torch.cat( [src_video[i], src_video[i].new_zeros(src_video_shape[0], total_frames -src_video_shape[1], *src_video_shape[-2:])], dim=1)
@@ -300,7 +300,8 @@ class WanT2V:
        slg_end = 1.0,
        cfg_star_switch = True,
        cfg_zero_step = 5,
-       overlapped_latents = 0,
+       overlapped_latents = None,
+       return_latent_slice = None,
        overlap_noise = 0,
        model_filename = None,
        **bbargs
@@ -373,8 +374,10 @@ class WanT2V:
            input_frames = [u.to(self.device) for u in input_frames]
            input_ref_images = [ None if u == None else [v.to(self.device) for v in u] for u in input_ref_images]
            input_masks = [u.to(self.device) for u in input_masks]
+           previous_latents = None
-           z0 = self.vace_encode_frames(input_frames, input_ref_images, masks=input_masks, tile_size = VAE_tile_size, overlapped_latents = overlapped_latents, overlap_noise = overlap_noise )
+           # if overlapped_latents != None:
+           #     input_ref_images = [u[-1:] for u in input_ref_images]
+           z0 = self.vace_encode_frames(input_frames, input_ref_images, masks=input_masks, tile_size = VAE_tile_size, overlapped_latents = overlapped_latents )
            m0 = self.vace_encode_masks(input_masks, input_ref_images)
            z = self.vace_latent(z0, m0)
@@ -442,8 +445,9 @@ class WanT2V:
        if vace:
            ref_images_count = len(input_ref_images[0]) if input_ref_images != None and input_ref_images[0] != None else 0
            kwargs.update({'vace_context' : z, 'vace_context_scale' : context_scale})
-           if overlapped_latents > 0:
-               z_reactive = [ zz[0:16, ref_images_count:overlapped_latents + ref_images_count].clone() for zz in z]
+           if overlapped_latents != None:
+               overlapped_latents_size = overlapped_latents.shape[1] + 1
+               z_reactive = [ zz[0:16, 0:overlapped_latents_size + ref_images_count].clone() for zz in z]

        if self.model.enable_teacache:
@ -453,13 +457,14 @@ class WanT2V:
|
|||||||
if callback != None:
|
if callback != None:
|
||||||
callback(-1, None, True)
|
callback(-1, None, True)
|
||||||
for i, t in enumerate(tqdm(timesteps)):
|
for i, t in enumerate(tqdm(timesteps)):
|
||||||
if vace and overlapped_latents > 0 :
|
if overlapped_latents != None:
|
||||||
# noise_factor = overlap_noise *(i/(len(timesteps)-1)) / 1000
|
# overlap_noise_factor = overlap_noise *(i/(len(timesteps)-1)) / 1000
|
||||||
noise_factor = overlap_noise / 1000 # * (999-t) / 999
|
overlap_noise_factor = overlap_noise / 1000
|
||||||
# noise_factor = overlap_noise / 1000 # * t / 999
|
latent_noise_factor = t / 1000
|
||||||
for zz, zz_r in zip(z, z_reactive):
|
for zz, zz_r, ll in zip(z, z_reactive, [latents]):
|
||||||
zz[0:16, ref_images_count:overlapped_latents + ref_images_count] = zz_r * (1.0 - noise_factor) + torch.randn_like(zz_r ) * noise_factor
|
pass
|
||||||
|
zz[0:16, ref_images_count:overlapped_latents_size + ref_images_count] = zz_r[:, ref_images_count:] * (1.0 - overlap_noise_factor) + torch.randn_like(zz_r[:, ref_images_count:] ) * overlap_noise_factor
|
||||||
|
ll[:, 0:overlapped_latents_size + ref_images_count] = zz_r * (1.0 - latent_noise_factor) + torch.randn_like(zz_r ) * latent_noise_factor
|
||||||
if target_camera != None:
|
if target_camera != None:
|
||||||
latent_model_input = torch.cat([latents, source_latents], dim=1)
|
latent_model_input = torch.cat([latents, source_latents], dim=1)
|
||||||
else:
|
else:
|
||||||
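The re-noising loop above is the core of the smoother window transitions mentioned in the release notes: latents carried over from the previous window are blended with fresh Gaussian noise instead of being copied verbatim. A minimal sketch of the two blends, with hypothetical tensor shapes and slider values (names mirror the diff):

```python
import torch

# Hypothetical shapes: 16 latent channels, 3 overlapped latent frames.
zz_r = torch.randn(16, 3, 30, 52)   # clean latents kept from the previous window
t = 750                              # current denoising timestep (0..999)
overlap_noise = 20                   # the overlap-noise UI slider value

# Fixed, small re-noise applied to the Vace conditioning copy:
overlap_noise_factor = overlap_noise / 1000
conditioning = zz_r * (1.0 - overlap_noise_factor) + torch.randn_like(zz_r) * overlap_noise_factor

# Timestep-proportional noise applied to the working latents, so the overlap
# sits at the noise level the scheduler expects at step t:
latent_noise_factor = t / 1000
latents_overlap = zz_r * (1.0 - latent_noise_factor) + torch.randn_like(zz_r) * latent_noise_factor
```

Both blends are convex combinations, so the overlapped region stays on the same scale as the rest of the latent grid while still giving the sampler freedom to reconcile the seam.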
@@ -552,6 +557,13 @@ class WanT2V:

         x0 = [latents]

+        if return_latent_slice != None:
+            if overlapped_latents != None:
+                # latents [:, 1:] = self.toto
+                for zz, zz_r, ll in zip(z, z_reactive, [latents]):
+                    ll[:, 0:overlapped_latents_size + ref_images_count] = zz_r
+
+            latent_slice = latents[:, return_latent_slice].clone()
         if input_frames == None:
             if phantom:
                 # phantom post processing
@@ -560,11 +572,9 @@ class WanT2V:
             else:
                 # vace post processing
                 videos = self.decode_latent(x0, input_ref_images, VAE_tile_size)
-        del latents
-        del sample_scheduler
-
-        return videos[0] if self.rank == 0 else None
+        if return_latent_slice != None:
+            return { "x" : videos[0], "latent_slice" : latent_slice }
+        return videos[0]

     def adapt_vace_model(self):
         model = self.model
wan/utils/utils.py
@@ -91,11 +91,11 @@ def calculate_new_dimensions(canvas_height, canvas_width, height, width, fit_into_canvas):
     return new_height, new_width

 def resize_and_remove_background(img_list, budget_width, budget_height, rm_background, fit_into_canvas = False ):
-    if rm_background:
+    if rm_background > 0:
         session = new_session()

     output_list =[]
-    for img in img_list:
+    for i, img in enumerate(img_list):
         width, height = img.size

         if fit_into_canvas:
@@ -113,9 +113,10 @@ def resize_and_remove_background(img_list, budget_width, budget_height, rm_background, fit_into_canvas = False ):
             new_height = int( round(height * scale / 16) * 16)
             new_width = int( round(width * scale / 16) * 16)
         resized_image= img.resize((new_width,new_height), resample=Image.Resampling.LANCZOS)
-        if rm_background:
-            resized_image = remove(resized_image, session=session, alpha_matting = True, bgcolor=[255, 255, 255, 0]).convert('RGB')
-        output_list.append(resized_image)
+        if rm_background == 1 or rm_background == 2 and i > 0 :
+            # resized_image = remove(resized_image, session=session, alpha_matting_erode_size = 1,alpha_matting_background_threshold = 70, alpha_foreground_background_threshold = 100, alpha_matting = True, bgcolor=[255, 255, 255, 0]).convert('RGB')
+            resized_image = remove(resized_image, session=session, alpha_matting_erode_size = 1, alpha_matting = True, bgcolor=[255, 255, 255, 0]).convert('RGB')
+        output_list.append(resized_image) #alpha_matting_background_threshold = 30, alpha_foreground_background_threshold = 200,
     return output_list

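The new rm_background parameter is an integer mode rather than a boolean: 0 keeps every background, 1 removes them all, 2 keeps only the first image's background (for a landscape reference). Because Python's `and` binds tighter than `or`, the new condition parses as `(rm_background == 1) or (rm_background == 2 and i > 0)`, which is exactly the mode-2 behaviour. A tiny self-check of that logic:

```python
# Self-contained check of the background-removal mode logic from the hunk above.
def should_remove_background(rm_background: int, i: int) -> bool:
    return rm_background == 1 or (rm_background == 2 and i > 0)

assert should_remove_background(2, 0) is False  # first reference image keeps its background
assert should_remove_background(2, 1) is True   # later images are cut out
assert should_remove_background(1, 0) is True   # mode 1 removes everything
assert should_remove_background(0, 3) is False  # mode 0 never removes
```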
214
wgp.py
@@ -204,9 +204,6 @@ def process_prompt_and_add_tasks(state, model_choice):

         if isinstance(image_refs, list):
             image_refs = [ convert_image(tup[0]) for tup in image_refs ]
-            # os.environ["U2NET_HOME"] = os.path.join(os.getcwd(), "ckpts", "rembg")
-            # from wan.utils.utils import resize_and_remove_background
-            # image_refs = resize_and_remove_background(image_refs, width, height, inputs["remove_background_image_ref"] ==1, fit_into_canvas= True)


         if len(prompts) > 0:
@@ -333,8 +330,10 @@ def process_prompt_and_add_tasks(state, model_choice):
         if "O" in video_prompt_type :
             keep_frames_video_guide= inputs["keep_frames_video_guide"]
             video_length = inputs["video_length"]
-            if len(keep_frames_video_guide) ==0:
-                gr.Info(f"Warning : you have asked to reuse all the frames of the control Video in the Alternate Video Ending it. Please make sure the number of frames of the control Video is lower than the total number of frames to generate otherwise it won't make a difference.")
+            if len(keep_frames_video_guide) > 0:
+                gr.Info("Keeping Frames with Extending Video is not yet supported")
+                return
+                # gr.Info(f"Warning : you have asked to reuse all the frames of the control Video in the Alternate Video Ending it. Please make sure the number of frames of the control Video is lower than the total number of frames to generate otherwise it won't make a difference.")
             # elif keep_frames >= video_length:
             #     gr.Info(f"The number of frames in the control Video to reuse ({keep_frames_video_guide}) in Alternate Video Ending can not be bigger than the total number of frames ({video_length}) to generate.")
             #     return
@@ -349,11 +348,6 @@ def process_prompt_and_add_tasks(state, model_choice):
         if isinstance(image_refs, list):
             image_refs = [ convert_image(tup[0]) for tup in image_refs ]

-            # os.environ["U2NET_HOME"] = os.path.join(os.getcwd(), "ckpts", "rembg")
-            # from wan.utils.utils import resize_and_remove_background
-            # image_refs = resize_and_remove_background(image_refs, width, height, inputs["remove_background_image_ref"] ==1)


         if len(prompts) > 0:
             prompts = ["\n".join(prompts)]

@@ -1464,7 +1458,6 @@ lock_ui_attention = False
 lock_ui_transformer = False
 lock_ui_compile = False

-preload =int(args.preload)
 force_profile_no = int(args.profile)
 verbose_level = int(args.verbose)
 quantizeTransformer = args.quantize_transformer
@@ -1482,17 +1475,21 @@ if os.path.isfile("t2v_settings.json"):
 if not os.path.isfile(server_config_filename) and os.path.isfile("gradio_config.json"):
     shutil.move("gradio_config.json", server_config_filename)

+if not os.path.isdir("ckpts/umt5-xxl/"):
+    os.makedirs("ckpts/umt5-xxl/")
 src_move = [ "ckpts/models_clip_open-clip-xlm-roberta-large-vit-huge-14-bf16.safetensors", "ckpts/models_t5_umt5-xxl-enc-bf16.safetensors", "ckpts/models_t5_umt5-xxl-enc-quanto_int8.safetensors" ]
 tgt_move = [ "ckpts/xlm-roberta-large/", "ckpts/umt5-xxl/", "ckpts/umt5-xxl/"]
 for src,tgt in zip(src_move,tgt_move):
     if os.path.isfile(src):
         try:
-            shutil.move(src, tgt)
+            if os.path.isfile(tgt):
+                shutil.remove(src)
+            else:
+                shutil.move(src, tgt)
         except:
             pass



 if not Path(server_config_filename).is_file():
     server_config = {"attention_mode" : "auto",
         "transformer_types": [],
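One caveat in this hunk: the standard library's shutil module has no remove() function, so `shutil.remove(src)` raises AttributeError. Because the call sits inside a bare `try/except: pass`, the error is silently swallowed and the already-copied source file is left behind. `os.remove` is presumably what was intended; a hedged sketch of that intent:

```python
import os
import shutil

def move_or_cleanup(src: str, tgt: str) -> None:
    # If the target already holds the file, just delete the stale source;
    # otherwise move it. shutil has no remove(), os.remove() deletes a file.
    if os.path.isfile(tgt):
        os.remove(src)   # shutil.remove(src) would raise AttributeError here
    else:
        shutil.move(src, tgt)
```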
@@ -1755,7 +1752,10 @@ def get_default_settings(filename):
                 "flow_shift": 13,
                 "resolution": "1280x720"
             })
+        elif get_model_type(filename) in ("vace_14B"):
+            ui_defaults.update({
+                "sliding_window_discard_last_frames": 0,
+            })


     with open(defaults_filename, "w", encoding="utf-8") as f:
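A small Python pitfall here: `("vace_14B")` is not a one-element tuple, it is just the string in parentheses, so `in` performs substring matching rather than tuple membership. It happens to work because the model type equals the whole string, but a trailing comma would make the intent explicit:

```python
model_type = "vace_14B"

# ("vace_14B") is a plain string, so `in` does substring matching:
assert (model_type in ("vace_14B")) is True   # works, but by accident
assert ("vace" in ("vace_14B")) is True       # any substring matches too

# A one-element tuple gives real membership testing:
assert (model_type in ("vace_14B",)) is True
assert ("vace" in ("vace_14B",)) is False
```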
@@ -2136,6 +2136,9 @@ def load_models(model_filename):
     global transformer_filename, transformer_loras_filenames
     model_family = get_model_family(model_filename)
     perc_reserved_mem_max = args.perc_reserved_mem_max
+    preload =int(args.preload)
+    if preload == 0:
+        preload = server_config.get("preload_in_VRAM", 0)
     new_transformer_loras_filenames = None
     dependent_models = get_dependent_models(model_filename, quantization= transformer_quantization, dtype_policy = transformer_dtype_policy)
     new_transformer_loras_filenames = [model_filename] if "_lora" in model_filename else None
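The preload amount, previously read once at startup, is now resolved per model load: the `--preload` CLI flag wins when non-zero, otherwise the value saved from the new "preload in VRAM" slider in the configuration tab applies. A sketch of that resolution order, with names taken from the diff:

```python
def resolve_preload_mb(cli_preload: int, server_config: dict) -> int:
    # CLI flag takes precedence; 0 falls through to the saved server setting,
    # and a saved 0 keeps the memory profile's default behaviour.
    if cli_preload != 0:
        return cli_preload
    return server_config.get("preload_in_VRAM", 0)

assert resolve_preload_mb(0, {"preload_in_VRAM": 4000}) == 4000
assert resolve_preload_mb(1000, {"preload_in_VRAM": 4000}) == 1000
```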
@@ -2259,7 +2262,8 @@ def apply_changes( state,
                     preload_model_policy_choice = 1,
                     UI_theme_choice = "default",
                     enhancer_enabled_choice = 0,
-                    fit_canvas_choice = 0
+                    fit_canvas_choice = 0,
+                    preload_in_VRAM_choice = 0
 ):
     if args.lock_config:
         return
@@ -2284,6 +2288,7 @@ def apply_changes( state,
         "UI_theme" : UI_theme_choice,
         "fit_canvas": fit_canvas_choice,
         "enhancer_enabled" : enhancer_enabled_choice,
+        "preload_in_VRAM" : preload_in_VRAM_choice
     }

     if Path(server_config_filename).is_file():
@@ -2456,26 +2461,20 @@ def refresh_gallery(state): #, msg
         prompt = "<BR><DIV style='height:8px'></DIV>".join(prompts)
         if enhanced:
             prompt = "<U><B>Enhanced:</B></U><BR>" + prompt
+        list_uri = []
         start_img_uri = task.get('start_image_data_base64')
-        start_img_uri = start_img_uri[0] if start_img_uri !=None else None
+        if start_img_uri != None:
+            list_uri += start_img_uri
         end_img_uri = task.get('end_image_data_base64')
-        end_img_uri = end_img_uri[0] if end_img_uri !=None else None
+        if end_img_uri != None:
+            list_uri += end_img_uri

         thumbnail_size = "100px"
-        if start_img_uri:
-            start_img_md = f'<img src="{start_img_uri}" alt="Start" style="max-width:{thumbnail_size}; max-height:{thumbnail_size}; display: block; margin: auto; object-fit: contain;" />'
-        if end_img_uri:
-            end_img_md = f'<img src="{end_img_uri}" alt="End" style="max-width:{thumbnail_size}; max-height:{thumbnail_size}; display: block; margin: auto; object-fit: contain;" />'
+        thumbnails = ""
+        for img_uri in list_uri:
+            thumbnails += f'<TD><img src="{img_uri}" alt="Start" style="max-width:{thumbnail_size}; max-height:{thumbnail_size}; display: block; margin: auto; object-fit: contain;" /></TD>'

-        label = f"Prompt of Video being Generated"
-        html = "<STYLE> #PINFO, #PINFO th, #PINFO td {border: 1px solid #CCCCCC;background-color:#FFFFFF;}</STYLE><TABLE WIDTH=100% ID=PINFO ><TR><TD width=100%>" + prompt + "</TD>"
-        if start_img_md != "":
-            html += "<TD>" + start_img_md + "</TD>"
-        if end_img_md != "":
-            html += "<TD>" + end_img_md + "</TD>"
-
-        html += "</TR></TABLE>"
+        html = "<STYLE> #PINFO, #PINFO th, #PINFO td {border: 1px solid #CCCCCC;background-color:#FFFFFF;}</STYLE><TABLE WIDTH=100% ID=PINFO ><TR><TD width=100%>" + prompt + "</TD>" + thumbnails + "</TR></TABLE>"
         html_output = gr.HTML(html, visible= True)
         return gr.Gallery(selected_index=choice, value = file_list), html_output, gr.Button(visible=False), gr.Button(visible=True), gr.Row(visible=True), update_queue_data(queue), gr.Button(interactive= abort_interactive), gr.Button(visible= onemorewindow_visible)

@@ -2680,7 +2679,7 @@ def generate_video(
     sliding_window_overlap,
     sliding_window_overlap_noise,
     sliding_window_discard_last_frames,
-    remove_background_image_ref,
+    remove_background_images_ref,
     temporal_upsampling,
     spatial_upsampling,
     RIFLEx_setting,
@@ -2816,13 +2815,14 @@ def generate_video(
         fps = 30
     else:
         fps = 16
+    latent_size = 8 if ltxv else 4

     original_image_refs = image_refs
     if image_refs != None and len(image_refs) > 0 and (hunyuan_custom or phantom or vace):
         send_cmd("progress", [0, get_latest_status(state, "Removing Images References Background")])
         os.environ["U2NET_HOME"] = os.path.join(os.getcwd(), "ckpts", "rembg")
         from wan.utils.utils import resize_and_remove_background
-        image_refs = resize_and_remove_background(image_refs, width, height, remove_background_image_ref ==1, fit_into_canvas= not vace)
+        image_refs = resize_and_remove_background(image_refs, width, height, remove_background_images_ref, fit_into_canvas= not vace)
         update_task_thumbnails(task, locals())
         send_cmd("output")

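latent_size is the temporal stride of the video VAE: one latent frame covers 4 pixel frames for the Wan models and 8 for LTX-Video. Threading it through replaces the hard-coded 4 in the later `frame_num=(video_length // latent_size)* latent_size + 1` change, so frame counts stay aligned to whole latent frames for both families. A quick check of the rounding, assuming those two stride values:

```python
def snap_frame_count(video_length: int, latent_size: int) -> int:
    # Snap to a whole number of latent frames, plus the leading frame.
    return (video_length // latent_size) * latent_size + 1

assert snap_frame_count(81, 4) == 81   # Wan: 20 latent frames * 4 + 1
assert snap_frame_count(83, 4) == 81   # stray frames are dropped, never padded
assert snap_frame_count(97, 8) == 97   # LTX-Video uses a stride of 8
```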
@@ -2879,7 +2879,6 @@ def generate_video(
     repeat_no = 0
     extra_generation = 0
     initial_total_windows = 0
-    max_frames_to_generate = video_length
     if diffusion_forcing or vace or ltxv:
         reuse_frames = min(sliding_window_size - 4, sliding_window_overlap)
     else:
@@ -2888,8 +2887,9 @@ def generate_video(
         video_length += sliding_window_overlap
     sliding_window = (vace or diffusion_forcing or ltxv) and video_length > sliding_window_size

+    discard_last_frames = sliding_window_discard_last_frames
+    default_max_frames_to_generate = video_length
     if sliding_window:
-        discard_last_frames = sliding_window_discard_last_frames
         left_after_first_window = video_length - sliding_window_size + discard_last_frames
         initial_total_windows= 1 + math.ceil(left_after_first_window / (sliding_window_size - discard_last_frames - reuse_frames))
         video_length = sliding_window_size
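The window count follows from the fact that the first window contributes `sliding_window_size - discard_last_frames` fresh frames and each later window contributes `sliding_window_size - discard_last_frames - reuse_frames`. A worked example with illustrative values:

```python
import math

video_length = 161          # frames requested
sliding_window_size = 81
discard_last_frames = 8     # trimmed from the end of every window
reuse_frames = 5            # overlap carried into the next window

left_after_first_window = video_length - sliding_window_size + discard_last_frames   # 88
per_extra_window = sliding_window_size - discard_last_frames - reuse_frames          # 68
initial_total_windows = 1 + math.ceil(left_after_first_window / per_extra_window)    # 1 + 2
assert initial_total_windows == 3
```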
@@ -2913,6 +2913,7 @@ def generate_video(
     prefix_video_frames_count = 0
     frames_already_processed = None
     pre_video_guide = None
+    overlapped_latents = None
     window_no = 0
     extra_windows = 0
     guide_start_frame = 0
@@ -2920,6 +2921,8 @@ def generate_video(
         gen["extra_windows"] = 0
         gen["total_windows"] = 1
         gen["window_no"] = 1
+        num_frames_generated = 0
+        max_frames_to_generate = default_max_frames_to_generate
         start_time = time.time()
         if prompt_enhancer_image_caption_model != None and prompt_enhancer !=None and len(prompt_enhancer)>0:
             text_encoder_max_tokens = 256
@@ -2955,38 +2958,50 @@ def generate_video(
         while not abort:
             if sliding_window:
                 prompt = prompts[window_no] if window_no < len(prompts) else prompts[-1]
-            extra_windows += gen.get("extra_windows",0)
-            if extra_windows > 0:
-                video_length = sliding_window_size
+            new_extra_windows = gen.get("extra_windows",0)
             gen["extra_windows"] = 0
+            extra_windows += new_extra_windows
+            max_frames_to_generate += new_extra_windows * (sliding_window_size - discard_last_frames - reuse_frames)
+            sliding_window = sliding_window or extra_windows > 0
+            if sliding_window and window_no > 0:
+                num_frames_generated -= reuse_frames
+                if (max_frames_to_generate - prefix_video_frames_count - num_frames_generated) < latent_size:
+                    break
+                video_length = min(sliding_window_size, ((max_frames_to_generate - num_frames_generated - prefix_video_frames_count + reuse_frames + discard_last_frames) // latent_size) * latent_size + 1 )

             total_windows = initial_total_windows + extra_windows
             gen["total_windows"] = total_windows
             if window_no >= total_windows:
                 break
             window_no += 1
             gen["window_no"] = window_no
+            return_latent_slice = None
+            if reuse_frames > 0:
+                return_latent_slice = slice(-(reuse_frames - 1 + discard_last_frames ) // latent_size, None if discard_last_frames == 0 else -(discard_last_frames // latent_size) )

             if hunyuan_custom:
                 src_ref_images = image_refs
             elif phantom:
                 src_ref_images = image_refs.copy() if image_refs != None else None
-            elif diffusion_forcing or ltxv:
+            elif diffusion_forcing or ltxv or vace and "O" in video_prompt_type:
+                if vace:
+                    video_source = video_guide
+                    video_guide = None
                 if video_source != None and len(video_source) > 0 and window_no == 1:
                     keep_frames_video_source= 1000 if len(keep_frames_video_source) ==0 else int(keep_frames_video_source)
+                    keep_frames_video_source = (keep_frames_video_source // latent_size ) * latent_size + 1
                     prefix_video = preprocess_video(None, width=width, height=height,video_in=video_source, max_frames= keep_frames_video_source , start_frame = 0, fit_canvas= fit_canvas, target_fps = fps, block_size = 32 if ltxv else 16)
                     prefix_video = prefix_video .permute(3, 0, 1, 2)
                     prefix_video = prefix_video .float().div_(127.5).sub_(1.) # c, f, h, w
-                    prefix_video_frames_count = prefix_video.shape[1]
                     pre_video_guide = prefix_video[:, -reuse_frames:]
+                    prefix_video_frames_count = pre_video_guide.shape[1]
-            elif vace:
-                # video_prompt_type = video_prompt_type +"G"
+                if vace:
+                    height, width = pre_video_guide.shape[-2:]
+            if vace:
                 image_refs_copy = image_refs.copy() if image_refs != None else None # required since prepare_source do inplace modifications
                 video_guide_copy = video_guide
                 video_mask_copy = video_mask
                 if any(process in video_prompt_type for process in ("P", "D", "G")) :
-                    prompts_max = gen["prompts_max"]

                     preprocess_type = None
                     if "P" in video_prompt_type :
                         progress_args = [0, get_latest_status(state,"Extracting Open Pose Information")]
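return_latent_slice maps the frame-level overlap settings onto latent indices, so generate() can hand back exactly the latents that the next window will re-condition on. A worked check of the arithmetic, using the same illustrative values as above:

```python
reuse_frames = 5
discard_last_frames = 8
latent_size = 4

return_latent_slice = slice(
    -(reuse_frames - 1 + discard_last_frames) // latent_size,                      # -12 // 4 = -3
    None if discard_last_frames == 0 else -(discard_last_frames // latent_size),  # -(8 // 4) = -2
)
assert return_latent_slice == slice(-3, -2)

# With nothing discarded, the slice simply keeps the trailing overlap latents:
assert slice(-(5 - 1 + 0) // 4, None) == slice(-1, None)
```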
@@ -3005,8 +3020,11 @@ def generate_video(
                 if len(error) > 0:
                     raise gr.Error(f"invalid keep frames {keep_frames_video_guide}")
                 keep_frames_parsed = keep_frames_parsed[guide_start_frame: guide_start_frame + video_length]

                 if window_no == 1:
-                    image_size = (height, width) # VACE_SIZE_CONFIGS[resolution_reformated] # default frame dimensions until it is set by video_src (if there is any)
+                    image_size = (height, width) # default frame dimensions until it is set by video_src (if there is any)


                 src_video, src_mask, src_ref_images = wan_model.prepare_source([video_guide_copy],
                     [video_mask_copy ],
                     [image_refs_copy],
@@ -3017,29 +3035,24 @@ def generate_video(
                     pre_src_video = [pre_video_guide],
                     fit_into_canvas = fit_canvas
                 )
-            # if window_no == 1 and src_video != None and len(src_video) > 0:
-            #     image_size = src_video[0].shape[-2:]
-            prompts_max = gen["prompts_max"]
             status = get_latest_status(state)


             gen["progress_status"] = status
             gen["progress_phase"] = ("Encoding Prompt", -1 )
             callback = build_callback(state, trans, send_cmd, status, num_inference_steps)
             progress_args = [0, merge_status_context(status, "Encoding Prompt")]
             send_cmd("progress", progress_args)

+            if trans.enable_teacache:
+                trans.teacache_counter = 0
+                trans.num_steps = num_inference_steps
+                trans.teacache_skipped_steps = 0
+                trans.previous_residual = None
+                trans.previous_modulated_input = None

             # samples = torch.empty( (1,2)) #for testing
             # if False:

             try:
-                if trans.enable_teacache:
-                    trans.teacache_counter = 0
-                    trans.num_steps = num_inference_steps
-                    trans.teacache_skipped_steps = 0
-                    trans.previous_residual = None
-                    trans.previous_modulated_input = None

                 samples = wan_model.generate(
                     input_prompt = prompt,
                     image_start = image_start,
@@ -3049,7 +3062,7 @@ def generate_video(
                     input_masks = src_mask,
                     input_video= pre_video_guide if diffusion_forcing or ltxv else source_video,
                     target_camera= target_camera,
-                    frame_num=(video_length // 4)* 4 + 1,
+                    frame_num=(video_length // latent_size)* latent_size + 1,
                     height = height,
                     width = width,
                     fit_into_canvas = fit_canvas == 1,
@@ -3076,7 +3089,8 @@ def generate_video(
                     causal_block_size = 5,
                     causal_attention = True,
                     fps = fps,
-                    overlapped_latents = 0 if reuse_frames == 0 or window_no == 1 else ((reuse_frames - 1) // 4 + 1),
+                    overlapped_latents = overlapped_latents,
+                    return_latent_slice= return_latent_slice,
                     overlap_noise = sliding_window_overlap_noise,
                     model_filename = model_filename,
                 )
@@ -3109,6 +3123,7 @@ def generate_video(
                 tb = traceback.format_exc().split('\n')[:-1]
                 print('\n'.join(tb))
                 send_cmd("error", new_error)
+                clear_status(state)
                 return
             finally:
                 trans.previous_residual = None
@@ -3118,33 +3133,42 @@ def generate_video(
                 print(f"Teacache Skipped Steps:{trans.teacache_skipped_steps}/{trans.num_steps}" )

             if samples != None:
+                if isinstance(samples, dict):
+                    overlapped_latents = samples.get("latent_slice", None)
+                    samples= samples["x"]
                 samples = samples.to("cpu")
             offload.last_offload_obj.unload_all()
             gc.collect()
             torch.cuda.empty_cache()

+            # time_flag = datetime.fromtimestamp(time.time()).strftime("%Y-%m-%d-%Hh%Mm%Ss")
+            # save_prompt = "_in_" + original_prompts[0]
+            # file_name = f"{time_flag}_seed{seed}_{sanitize_file_name(save_prompt[:50]).strip()}.mp4"
+            # sample = samples.cpu()
+            # cache_video( tensor=sample[None].clone(), save_file=os.path.join(save_path, file_name), fps=16, nrow=1, normalize=True, value_range=(-1, 1))

             if samples == None:
                 abort = True
                 state["prompt"] = ""
                 send_cmd("output")
             else:
                 sample = samples.cpu()
-                if True: # for testing
-                    torch.save(sample, "output.pt")
-                else:
-                    sample =torch.load("output.pt")
+                # if True: # for testing
+                #     torch.save(sample, "output.pt")
+                # else:
+                #     sample =torch.load("output.pt")
+                if gen.get("extra_windows",0) > 0:
+                    sliding_window = True
                 if sliding_window :
                     guide_start_frame += video_length
                     if discard_last_frames > 0:
                         sample = sample[: , :-discard_last_frames]
                         guide_start_frame -= discard_last_frames
                     if reuse_frames == 0:
-                        pre_video_guide = sample[:,9999 :]
+                        pre_video_guide = sample[:,9999 :].clone()
                     else:
-                        # noise_factor = 200/ 1000
-                        # pre_video_guide = sample[:, -reuse_frames:] * (1.0 - noise_factor) + torch.randn_like(sample[:, -reuse_frames:] ) * noise_factor
-                        pre_video_guide = sample[:, -reuse_frames:]
+                        pre_video_guide = sample[:, -reuse_frames:].clone()
+                    num_frames_generated += sample.shape[1]


             if prefix_video != None:
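Together with the generate() changes in the first file, this completes the sliding-window feedback loop: when the model returns a dict, the latent_slice entry is stashed in overlapped_latents and fed into the next window's call. A minimal sketch of the contract, with a small stand-in for wan_model.generate():

```python
import torch

def fake_generate(overlapped_latents, return_latent_slice):
    # Stand-in: returns decoded frames plus, when a slice is requested,
    # the latents the next window should re-condition on.
    latents = torch.randn(16, 6, 8, 8)
    video = torch.randn(3, 17, 64, 64)
    if return_latent_slice is not None:
        return {"x": video, "latent_slice": latents[:, return_latent_slice].clone()}
    return video

overlapped_latents = None
for window_no in range(3):
    samples = fake_generate(overlapped_latents, slice(-3, -2))
    if isinstance(samples, dict):
        overlapped_latents = samples.get("latent_slice", None)
        samples = samples["x"]
    # ... trim and append `samples` to the growing video as in the hunk above
```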
@@ -3158,7 +3182,6 @@ def generate_video(
                         sample = sample[: , :]
                     else:
                         sample = sample[: , reuse_frames:]

                 guide_start_frame -= reuse_frames

             exp = 0
@@ -3252,15 +3275,9 @@ def generate_video(
                     print(f"New video saved to Path: "+video_path)
                     file_list.append(video_path)
                     send_cmd("output")
-                    if sliding_window :
-                        if max_frames_to_generate > 0 and extra_windows == 0:
-                            current_length = sample.shape[1]
-                            if (current_length - prefix_video_frames_count)>= max_frames_to_generate:
-                                break
-                            video_length = min(sliding_window_size, ((max_frames_to_generate - (current_length - prefix_video_frames_count) + reuse_frames + discard_last_frames) // 4) * 4 + 1 )

             seed += 1
+        clear_status(state)
     if temp_filename!= None and os.path.isfile(temp_filename):
         os.remove(temp_filename)
     offload.unload_loras_from_model(trans)
@@ -3631,6 +3648,15 @@ def merge_status_context(status="", context=""):
     else:
         return status + " - " + context

+def clear_status(state):
+    gen = get_gen_info(state)
+    gen["extra_windows"] = 0
+    gen["total_windows"] = 1
+    gen["window_no"] = 1
+    gen["extra_orders"] = 0
+    gen["repeat_no"] = 0
+    gen["total_generation"] = 0
+
 def get_latest_status(state, context=""):
     gen = get_gen_info(state)
     prompt_no = gen["prompt_no"]
@@ -3999,7 +4025,7 @@ def prepare_inputs_dict(target, inputs ):
         inputs.pop("model_mode")

     if not "Vace" in model_filename or not "phantom" in model_filename or not "hunyuan_video_custom" in model_filename:
-        unsaved_params = ["keep_frames_video_guide", "video_prompt_type", "remove_background_image_ref"]
+        unsaved_params = ["keep_frames_video_guide", "video_prompt_type", "remove_background_images_ref"]
         for k in unsaved_params:
             inputs.pop(k)

@@ -4066,7 +4092,7 @@ def save_inputs(
     sliding_window_overlap,
     sliding_window_overlap_noise,
     sliding_window_discard_last_frames,
-    remove_background_image_ref,
+    remove_background_images_ref,
     temporal_upsampling,
     spatial_upsampling,
     RIFLEx_setting,
@@ -4458,7 +4484,7 @@ def generate_video_tab(update_form = False, state_dict = None, ui_defaults = None
                         ("Transfer Human Motion from the Control Video", "PV"),
                         ("Transfer Depth from the Control Video", "DV"),
                         ("Recolorize the Control Video", "CV"),
-                        # ("Alternate Video Ending", "OV"),
+                        ("Extend Video", "OV"),
                         ("Video contains Open Pose, Depth, Black & White, Inpainting ", "V"),
                         ("Control Video and Mask video for Inpainting ", "MV"),
                     ],
@@ -4489,7 +4515,17 @@ def generate_video_tab(update_form = False, state_dict = None, ui_defaults = None
                 )

                 # with gr.Row():
-                remove_background_image_ref = gr.Checkbox(value=ui_defaults.get("remove_background_image_ref",1), label= "Remove Background of Images References", visible= "I" in video_prompt_type_value, scale =1 )
+                remove_background_images_ref = gr.Dropdown(
+                    choices=[
+                        ("Keep Backgrounds of All Images (landscape)", 0),
+                        ("Remove Backgrounds of All Images (objects / faces)", 1),
+                        ("Keep it for first Image (landscape) and remove it for other Images (objects / faces)", 2),
+                    ],
+                    value=ui_defaults.get("remove_background_images_ref",1),
+                    label="Remove Background of Images References", scale = 3, visible= "I" in video_prompt_type_value
+                )

+                # remove_background_images_ref = gr.Checkbox(value=ui_defaults.get("remove_background_images_ref",1), label= "Remove Background of Images References", visible= "I" in video_prompt_type_value, scale =1 )


                 video_mask = gr.Video(label= "Video Mask (for Inpainting or Outpaing, white pixels = Mask)", visible= "M" in video_prompt_type_value, value= ui_defaults.get("video_mask", None))
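On the UI side, the old boolean checkbox becomes a tri-state dropdown whose integer value is forwarded unchanged to resize_and_remove_background (the caller previously collapsed it with `== 1`). A minimal, hedged Gradio 4-style sketch of how (label, value) choices yield the integer mode; the describe() helper is purely illustrative:

```python
import gradio as gr

def describe(mode: int) -> str:
    # Hypothetical helper just to surface the selected mode.
    return {0: "keep all backgrounds",
            1: "remove all backgrounds",
            2: "keep only the first image's background"}[mode]

with gr.Blocks() as demo:
    mode = gr.Dropdown(
        choices=[("Keep Backgrounds of All Images (landscape)", 0),
                 ("Remove Backgrounds of All Images (objects / faces)", 1),
                 ("Keep it for first Image (landscape) and remove it for other Images (objects / faces)", 2)],
        value=1, label="Remove Background of Images References")
    out = gr.Textbox(label="Effect")
    mode.input(describe, inputs=mode, outputs=out)

# demo.launch()  # uncomment to try the dropdown locally
```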
@@ -4730,7 +4766,7 @@ def generate_video_tab(update_form = False, state_dict = None, ui_defaults = None
             else:
                 sliding_window_size = gr.Slider(5, 137, value=ui_defaults.get("sliding_window_size", 81), step=4, label="Sliding Window Size")
                 sliding_window_overlap = gr.Slider(1, 97, value=ui_defaults.get("sliding_window_overlap",5), step=4, label="Windows Frames Overlap (needed to maintain continuity between windows, a higher value will require more windows)")
-                sliding_window_overlap_noise = gr.Slider(0, 100, value=ui_defaults.get("sliding_window_overlap_noise",20), step=1, label="Noise to be added to overlapped frames to reduce blur effect")
+                sliding_window_overlap_noise = gr.Slider(0, 150, value=ui_defaults.get("sliding_window_overlap_noise",20), step=1, label="Noise to be added to overlapped frames to reduce blur effect")
                 sliding_window_discard_last_frames = gr.Slider(0, 20, value=ui_defaults.get("sliding_window_discard_last_frames", 8), step=4, label="Discard Last Frames of a Window (that may have bad quality)", visible = True)

@@ -4811,7 +4847,7 @@ def generate_video_tab(update_form = False, state_dict = None, ui_defaults = None

         image_prompt_type.change(fn=refresh_image_prompt_type, inputs=[state, image_prompt_type], outputs=[image_start, image_end, video_source, keep_frames_video_source] )
         video_prompt_video_guide_trigger.change(fn=refresh_video_prompt_video_guide_trigger, inputs=[video_prompt_type, video_prompt_video_guide_trigger], outputs=[video_prompt_type, video_prompt_type_video_guide, video_guide, video_mask, keep_frames_video_guide])
-        video_prompt_type_image_refs.input(fn=refresh_video_prompt_type_image_refs, inputs = [video_prompt_type, video_prompt_type_image_refs], outputs = [video_prompt_type, image_refs, remove_background_image_ref ])
+        video_prompt_type_image_refs.input(fn=refresh_video_prompt_type_image_refs, inputs = [video_prompt_type, video_prompt_type_image_refs], outputs = [video_prompt_type, image_refs, remove_background_images_ref ])
         video_prompt_type_video_guide.input(fn=refresh_video_prompt_type_video_guide, inputs = [video_prompt_type, video_prompt_type_video_guide], outputs = [video_prompt_type, video_guide, keep_frames_video_guide, video_mask])

         show_advanced.change(fn=switch_advanced, inputs=[state, show_advanced, lset_name], outputs=[advanced_row, preset_buttons_rows, refresh_lora_btn, refresh2_row ,lset_name ]).then(
@@ -5036,7 +5072,7 @@ def generate_video_tab(update_form = False, state_dict = None, ui_defaults = None
     )

     return ( state, loras_choices, lset_name, state,
-             video_guide, video_mask, video_prompt_video_guide_trigger, prompt_enhancer
+             video_guide, video_mask, image_refs, video_prompt_video_guide_trigger, prompt_enhancer
     )


@@ -5250,6 +5286,7 @@ def generate_configuration_tab(state, blocks, header, model_choice, prompt_enhancer_row
                 value= profile,
                 label="Profile (for power users only, not needed to change it)"
             )
+            preload_in_VRAM_choice = gr.Slider(0, 40000, value=server_config.get("preload_in_VRAM", 0), step=100, label="Number of MB of Models that are Preloaded in VRAM (0 will use Profile default)")



@@ -5277,7 +5314,8 @@ def generate_configuration_tab(state, blocks, header, model_choice, prompt_enhancer_row
             preload_model_policy_choice,
             UI_theme_choice,
             enhancer_enabled_choice,
-            fit_canvas_choice
+            fit_canvas_choice,
+            preload_in_VRAM_choice
         ],
         outputs= [msg , header, model_choice, prompt_enhancer_row]
     )
@@ -5661,7 +5699,7 @@ def create_demo():
     theme = gr.themes.Soft(font=["Verdana"], primary_hue="sky", neutral_hue="slate", text_size="md")

     with gr.Blocks(css=css, theme=theme, title= "WanGP") as main:
-        gr.Markdown("<div align=center><H1>Wan<SUP>GP</SUP> v5.2 <FONT SIZE=4>by <I>DeepBeepMeep</I></FONT> <FONT SIZE=3>") # (<A HREF='https://github.com/deepbeepmeep/Wan2GP'>Updates</A>)</FONT SIZE=3></H1></div>")
+        gr.Markdown("<div align=center><H1>Wan<SUP>GP</SUP> v5.21 <FONT SIZE=4>by <I>DeepBeepMeep</I></FONT> <FONT SIZE=3>") # (<A HREF='https://github.com/deepbeepmeep/Wan2GP'>Updates</A>)</FONT SIZE=3></H1></div>")
         global model_list

         tab_state = gr.State({ "tab_no":0 })
@@ -5680,7 +5718,7 @@ def create_demo():
             header = gr.Markdown(generate_header(transformer_filename, compile, attention_mode), visible= True)
             with gr.Row():
                 ( state, loras_choices, lset_name, state,
-                  video_guide, video_mask, video_prompt_type_video_trigger, prompt_enhancer_row
+                  video_guide, video_mask, image_refs, video_prompt_type_video_trigger, prompt_enhancer_row
                 ) = generate_video_tab(model_choice=model_choice, header=header, main = main)
             with gr.Tab("Informations", id="info"):
                 generate_info_tab()
@@ -5688,7 +5726,7 @@ def create_demo():
                 from preprocessing.matanyone import app as matanyone_app
                 vmc_event_handler = matanyone_app.get_vmc_event_handler()

-                matanyone_app.display(main_tabs, model_choice, video_guide, video_mask, video_prompt_type_video_trigger)
+                matanyone_app.display(main_tabs, model_choice, video_guide, video_mask, image_refs, video_prompt_type_video_trigger)
             if not args.lock_config:
                 with gr.Tab("Downloads", id="downloads") as downloads_tab:
                     generate_download_tab(lset_name, loras_choices, state)