Mirror of https://github.com/THUDM/CogVideo.git

Merge pull request #297 from THUDM/CogVideoX_dev
release cogvlm-llama3-caption

Commit 6a2efb844b
English README:

@@ -26,6 +26,9 @@ Experience the CogVideoX-5B model online at <a href="https://huggingface.co/spac
 This model allows inputting an image as a background combined with prompts to generate videos, providing greater
 controllability. With this release, the CogVideoX series now supports three tasks: text-to-video, video extension, and
 image-to-video generation. Feel free to try it out [online](https://huggingface.co/spaces/THUDM/CogVideoX-5B-Space).
+- 🔥🔥 **News**: ```2024/9/19```: The caption model used in the CogVideoX training process to convert video data into text
+  descriptions, [cogvlm2-llama3-caption](https://huggingface.co/THUDM/cogvlm2-llama3-caption), is now open-source. Feel
+  free to download and use it.
 - 🔥 **News**: ```2024/9/16```: We have added an automated video generation tool! You can now use local open-source
   models + FLUX + CogVideoX to automatically generate high-quality videos. Feel free
   to [try it out](tools/llm_flux_cogvideox/llm_flux_cogvideox.py).
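For readers who want to fetch the newly open-sourced caption model, here is a minimal sketch using `huggingface_hub`; the `local_dir` is an illustrative path, and the model card on the Hub documents actual inference usage:

```python
from huggingface_hub import snapshot_download

# Download the cogvlm2-llama3-caption weights for local use.
snapshot_download(repo_id="THUDM/cogvlm2-llama3-caption", local_dir="./cogvlm2-llama3-caption")
```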
@@ -319,7 +322,8 @@ Here provide three projects that can be run directly on free Colab T4 instances:
   CogVideoX-5B Quantized Text-to-Video Inference Colab code, which takes about 30 minutes per run.
 + [CogVideoX-5B-I2V-Colab.ipynb](https://colab.research.google.com/drive/17CqYCqSwz39nZAX2YyonDxosVKUZGzcX?usp=sharing):
   CogVideoX-5B Image-to-Video Colab code.
++ [CogVideoX-5B-V2V-Colab.ipynb](https://colab.research.google.com/drive/1comfGAUJnChl5NwPuO8Ox5_6WCy4kbNN?usp=sharing):
+  CogVideoX-5B Video-to-Video Colab code.
 
 ### Inference
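The added V2V notebook covers video-to-video generation. For reference, a minimal local sketch with the `diffusers` `CogVideoXVideoToVideoPipeline`; the prompt, strength, and file names below are placeholders, not the notebook's exact settings:

```python
import torch
from diffusers import CogVideoXVideoToVideoPipeline, CogVideoXDPMScheduler
from diffusers.utils import export_to_video, load_video

# Load the base 5B model as a video-to-video pipeline.
pipe = CogVideoXVideoToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
).to("cuda")
pipe.scheduler = CogVideoXDPMScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")

# CogVideoX works on 49-frame clips; trim the input accordingly.
video = load_video("input.mp4")[:49]
frames = pipe(
    video=video,
    prompt="A placeholder prompt describing the desired restyling.",
    strength=0.8,  # how strongly to deviate from the source video
).frames[0]
export_to_video(frames, "output.mp4", fps=8)
```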
Japanese README (translated):

@@ -24,6 +24,9 @@
 
 - 🔥🔥 **News**: ```2024/9/19```: We have open-sourced **CogVideoX-5B-I2V**, the image-to-video model of the CogVideoX
   series. This model takes an image as a background input and generates a video in combination with a prompt, offering
   greater controllability. With this, the CogVideoX series supports three tasks: text-to-video, video extension, and
   image-to-video generation. Feel free to [try it online](https://huggingface.co/spaces/THUDM/CogVideoX-5B-Space).
+- 🔥🔥 **News**: ```2024/9/19```: The caption model used in the CogVideoX training process to convert video data into
+  text descriptions, [cogvlm2-llama3-caption](https://huggingface.co/THUDM/cogvlm2-llama3-caption), is now open-source.
+  Feel free to download and use it.
 - 🔥 **News**: ```2024/9/16```: We have added an automated video generation tool! You can use local open-source models
   + FLUX + CogVideoX to automatically generate high-quality videos. Feel free
   to [try it out](tools/llm_flux_cogvideox/llm_flux_cogvideox.py).
 - 🔥 **News**: ```2024/9/15```: CogVideoX LoRA fine-tuning weights have been exported and tested with the `diffusers`
@@ -286,6 +289,8 @@ pipe.vae.enable_tiling()
   CogVideoX-5B quantized text-to-video inference Colab code; a run takes about 30 minutes.
 + [CogVideoX-5B-I2V-Colab.ipynb](https://colab.research.google.com/drive/17CqYCqSwz39nZAX2YyonDxosVKUZGzcX?usp=sharing):
   CogVideoX-5B image-to-video Colab code.
++ [CogVideoX-5B-V2V-Colab.ipynb](https://colab.research.google.com/drive/1comfGAUJnChl5NwPuO8Ox5_6WCy4kbNN?usp=sharing):
+  CogVideoX-5B video-to-video Colab code.
 
 ### Inference
Chinese README (translated):

@@ -26,6 +26,9 @@
 - 🔥🔥 **News**: ```2024/9/19```: We have open-sourced **CogVideoX-5B-I2V**, the image-to-video model of the CogVideoX
   series. The model takes an image as a background input and generates a video together with a prompt, offering
   stronger controllability. With this, the CogVideoX series supports three tasks: text-to-video, video extension, and
   image-to-video generation. Feel free to [try it online](https://huggingface.co/spaces/THUDM/CogVideoX-5B-Space).
+- 🔥🔥 **News**: ```2024/9/19```: The caption model used during CogVideoX training to convert video data into text
+  descriptions, [cogvlm2-llama3-caption](https://huggingface.co/THUDM/cogvlm2-llama3-caption), is now open-source.
+  Feel free to download and use it.
 - 🔥 **News**: ```2024/9/16```: We have added an automated video generation tool: you can use local open-source models
   + FLUX + CogVideoX to automatically generate high-quality videos. Feel free
   to [try it out](tools/llm_flux_cogvideox/llm_flux_cogvideox.py).
 - 🔥 **News**: ```2024/9/15```: CogVideoX LoRA fine-tuning weights have been exported and pass the tests in the `diffusers` library; see the [tutorial](sat/README_zh.md).
@@ -276,6 +279,8 @@ pipe.vae.enable_tiling()
   CogVideoX-5B quantized text-to-video inference Colab code; a run takes about 30 minutes.
 + [CogVideoX-5B-I2V-Colab.ipynb](https://colab.research.google.com/drive/17CqYCqSwz39nZAX2YyonDxosVKUZGzcX?usp=sharing):
   CogVideoX-5B image-to-video Colab code.
++ [CogVideoX-5B-V2V-Colab.ipynb](https://colab.research.google.com/drive/1comfGAUJnChl5NwPuO8Ox5_6WCy4kbNN?usp=sharing):
+  CogVideoX-5B video-to-video Colab code.
 
 ### inference
Gradio web demo:

@@ -40,10 +40,10 @@ device = "cuda" if torch.cuda.is_available() else "cpu"
 hf_hub_download(repo_id="ai-forever/Real-ESRGAN", filename="RealESRGAN_x4.pth", local_dir="model_real_esran")
 snapshot_download(repo_id="AlexWortega/RIFE", local_dir="model_rife")
 
-pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16).to(device)
+pipe = CogVideoXPipeline.from_pretrained("/share/official_pretrains/hf_home/CogVideoX-5b", torch_dtype=torch.bfloat16).to(device)
 pipe.scheduler = CogVideoXDPMScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
 pipe_video = CogVideoXVideoToVideoPipeline.from_pretrained(
-    "THUDM/CogVideoX-5b",
+    "/share/official_pretrains/hf_home/CogVideoX-5b",
     transformer=pipe.transformer,
     vae=pipe.vae,
     scheduler=pipe.scheduler,
@@ -53,9 +53,9 @@ pipe_video = CogVideoXVideoToVideoPipeline.from_pretrained(
 ).to(device)
 
 pipe_image = CogVideoXImageToVideoPipeline.from_pretrained(
-    "THUDM/CogVideoX-5b-I2V",
+    "/share/official_pretrains/hf_home/CogVideoX-5b-I2V",
     transformer=CogVideoXTransformer3DModel.from_pretrained(
-        "THUDM/CogVideoX-5b-I2V", subfolder="transformer", torch_dtype=torch.bfloat16
+        "/share/official_pretrains/hf_home/CogVideoX-5b-I2V", subfolder="transformer", torch_dtype=torch.bfloat16
     ),
     vae=pipe.vae,
     scheduler=pipe.scheduler,
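Note that these two hunks swap the public Hub IDs for cluster-local `/share/...` paths that only resolve on the authors' machines. A hedged sketch of keeping the checkpoint location configurable instead; the environment variable name is hypothetical, not part of the repo:

```python
import os
import torch
from diffusers import CogVideoXPipeline

# Fall back to the public Hub ID when no local checkout is configured.
# COGVIDEOX_5B_DIR is a hypothetical variable used only for illustration.
model_path = os.environ.get("COGVIDEOX_5B_DIR", "THUDM/CogVideoX-5b")
pipe = CogVideoXPipeline.from_pretrained(model_path, torch_dtype=torch.bfloat16)
```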
@@ -322,11 +322,11 @@ with gr.Blocks() as demo:
         with gr.Column():
             with gr.Accordion("I2V: Image Input (cannot be used simultaneously with video input)", open=False):
                 image_input = gr.Image(label="Input Image (will be cropped to 720 * 480)")
-                examples_component_images = gr.Examples(examples_images, inputs=[examples_images], cache_examples=False)
+                examples_component_images = gr.Examples(examples_images, inputs=[image_input], cache_examples=False)
             with gr.Accordion("V2V: Video Input (cannot be used simultaneously with image input)", open=False):
                 video_input = gr.Video(label="Input Video (will be cropped to 49 frames, 6 seconds at 8fps)")
                 strength = gr.Slider(0.1, 1.0, value=0.8, step=0.01, label="Strength")
-                examples_component_videos = gr.Examples(examples_videos, inputs=[examples_videos], cache_examples=False)
+                examples_component_videos = gr.Examples(examples_videos, inputs=[video_input], cache_examples=False)
             prompt = gr.Textbox(label="Prompt (Less than 200 Words)", placeholder="Enter your prompt here", lines=5)
 
             with gr.Row():
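This hunk fixes a real bug: `gr.Examples` expects `inputs` to list the components that the example values should populate, not the examples list itself. A minimal self-contained illustration, with placeholder file names:

```python
import gradio as gr

with gr.Blocks() as demo:
    image_input = gr.Image(label="Input Image")
    # `inputs` must reference the target component; clicking an example
    # then loads that value into image_input.
    gr.Examples(
        examples=[["example1.png"], ["example2.png"]],
        inputs=[image_input],
        cache_examples=False,
    )
```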
New file: tools/llm_flux_cogvideox/gradio_page.py (194 lines)

@@ -0,0 +1,194 @@
```python
import os
import gradio as gr
import gc
import random
import torch
import numpy as np
from PIL import Image
import transformers
from diffusers import CogVideoXImageToVideoPipeline, CogVideoXDPMScheduler, DiffusionPipeline
from diffusers.utils import export_to_video
from transformers import AutoTokenizer
from datetime import datetime, timedelta
import threading
import time
import moviepy.editor as mp

torch.set_float32_matmul_precision("high")

# Set default values
caption_generator_model_id = "/share/home/zyx/Models/Meta-Llama-3.1-8B-Instruct"
image_generator_model_id = "/share/home/zyx/Models/FLUX.1-dev"
video_generator_model_id = "/share/official_pretrains/hf_home/CogVideoX-5b-I2V"
seed = 1337

os.makedirs("./output", exist_ok=True)
os.makedirs("./gradio_tmp", exist_ok=True)

tokenizer = AutoTokenizer.from_pretrained(caption_generator_model_id, trust_remote_code=True)
caption_generator = transformers.pipeline(
    "text-generation",
    model=caption_generator_model_id,
    device_map="balanced",
    model_kwargs={
        "local_files_only": True,
        "torch_dtype": torch.bfloat16,
    },
    trust_remote_code=True,
    tokenizer=tokenizer
)

image_generator = DiffusionPipeline.from_pretrained(
    image_generator_model_id,
    torch_dtype=torch.bfloat16,
    device_map="balanced"
)
# image_generator.to("cuda")

video_generator = CogVideoXImageToVideoPipeline.from_pretrained(
    video_generator_model_id,
    torch_dtype=torch.bfloat16,
    device_map="balanced"
)

video_generator.vae.enable_slicing()
video_generator.vae.enable_tiling()

video_generator.scheduler = CogVideoXDPMScheduler.from_config(
    video_generator.scheduler.config, timestep_spacing="trailing"
)

# Define prompts
SYSTEM_PROMPT = """
You are part of a team of people that create videos using generative models. You use a video-generation model that can generate a video about anything you describe.

For example, if you respond with "A beautiful morning in the woods with the sun peaking through the trees", the video generation model will create a video of exactly as described. Your task is to summarize the descriptions of videos provided by users and create detailed prompts to feed into the generative model.

There are a few rules to follow:
- You will only ever output a single video description per request.
- If the user mentions to summarize the prompt in [X] words, make sure not to exceed the limit.

Your responses should just be the video generation prompt. Here are examples:
- "A detailed wooden toy ship with intricately carved masts and sails is seen gliding smoothly over a plush, blue carpet that mimics the waves of the sea. The ship's hull is painted a rich brown, with tiny windows. The carpet, soft and textured, provides a perfect backdrop, resembling an oceanic expanse. Surrounding the ship are various other toys and children's items, hinting at a playful environment. The scene captures the innocence and imagination of childhood, with the toy ship's journey symbolizing endless adventures in a whimsical, indoor setting."
- "A street artist, clad in a worn-out denim jacket and a colorful bandana, stands before a vast concrete wall in the heart of the city, holding a can of spray paint, spray-painting a colorful bird on a mottled wall."
""".strip()

USER_PROMPT = """
Could you generate a prompt for a video generation model? Please limit the prompt to [{0}] words.
""".strip()


def generate_caption(prompt):
    num_words = random.choice([25, 50, 75, 100])
    user_prompt = USER_PROMPT.format(num_words)

    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": prompt + "\n" + user_prompt},
    ]

    response = caption_generator(
        messages,
        max_new_tokens=226,
        return_full_text=False
    )
    caption = response[0]["generated_text"]
    if caption.startswith("\"") and caption.endswith("\""):
        caption = caption[1:-1]
    return caption


def generate_image(caption, progress=gr.Progress(track_tqdm=True)):
    image = image_generator(
        prompt=caption,
        height=480,
        width=720,
        num_inference_steps=30,
        guidance_scale=3.5,
    ).images[0]
    return image, image  # One for output, one for State


def generate_video(
        caption,
        image,
        progress=gr.Progress(track_tqdm=True)
):
    generator = torch.Generator().manual_seed(seed)
    video_frames = video_generator(
        image=image,
        prompt=caption,
        height=480,
        width=720,
        num_frames=49,
        num_inference_steps=50,
        guidance_scale=6,
        use_dynamic_cfg=True,
        generator=generator,
    ).frames[0]
    video_path = save_video(video_frames)
    gif_path = convert_to_gif(video_path)
    return video_path, gif_path


def save_video(tensor):
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    video_path = f"./output/{timestamp}.mp4"
    os.makedirs(os.path.dirname(video_path), exist_ok=True)
    export_to_video(tensor, video_path, fps=8)
    return video_path


def convert_to_gif(video_path):
    clip = mp.VideoFileClip(video_path)
    clip = clip.set_fps(8)
    clip = clip.resize(height=240)
    gif_path = video_path.replace(".mp4", ".gif")
    clip.write_gif(gif_path, fps=8)
    return gif_path


def delete_old_files():
    while True:
        now = datetime.now()
        cutoff = now - timedelta(minutes=10)
        directories = ["./output", "./gradio_tmp"]

        for directory in directories:
            for filename in os.listdir(directory):
                file_path = os.path.join(directory, filename)
                if os.path.isfile(file_path):
                    file_mtime = datetime.fromtimestamp(os.path.getmtime(file_path))
                    if file_mtime < cutoff:
                        os.remove(file_path)
        time.sleep(600)


threading.Thread(target=delete_old_files, daemon=True).start()

with gr.Blocks() as demo:
    gr.Markdown("""
    <div style="text-align: center; font-size: 32px; font-weight: bold; margin-bottom: 20px;">
        LLM + FLUX + CogVideoX-I2V Space 🤗
    </div>
    """)
    with gr.Row():
        with gr.Column():
            prompt = gr.Textbox(label="Prompt", placeholder="Enter your prompt here", lines=5)
            generate_caption_button = gr.Button("Generate Caption")
            caption = gr.Textbox(label="Caption", placeholder="Caption will appear here", lines=5)
            generate_image_button = gr.Button("Generate Image")
            image_output = gr.Image(label="Generated Image")
            state_image = gr.State()
            generate_caption_button.click(fn=generate_caption, inputs=prompt, outputs=caption)
            generate_image_button.click(fn=generate_image, inputs=caption, outputs=[image_output, state_image])
        with gr.Column():
            video_output = gr.Video(label="Generated Video", width=720, height=480)
            download_video_button = gr.File(label="📥 Download Video", visible=False)
            download_gif_button = gr.File(label="📥 Download GIF", visible=False)
            generate_video_button = gr.Button("Generate Video from Image")
            generate_video_button.click(fn=generate_video, inputs=[caption, state_image],
                                        outputs=[video_output, download_gif_button])

if __name__ == "__main__":
    demo.launch()
```
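The default model IDs in this new file are cluster-local paths, so the script will not run as-is outside the authors' environment. A hedged substitution with the presumed public Hub equivalents before launching with `python tools/llm_flux_cogvideox/gradio_page.py` (and dropping `"local_files_only": True`):

```python
# Presumed public equivalents of the hardcoded /share/... paths; note that
# Meta-Llama-3.1-8B-Instruct and FLUX.1-dev are gated on the Hub and require
# accepting their licenses first.
caption_generator_model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
image_generator_model_id = "black-forest-labs/FLUX.1-dev"
video_generator_model_id = "THUDM/CogVideoX-5b-I2V"
```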