diff --git a/README.md b/README.md
index d5790fd..3341042 100644
--- a/README.md
+++ b/README.md
@@ -57,18 +57,18 @@ to [清影](https://chatglm.cn/video).
 
 The table below shows the list of video generation models we currently provide, along with related basic information:
 
-| Model Name | CogVideoX-2B |
-|-------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------|
-| Prompt Language | English |
-| GPU Memory Required for Inference (FP16) | 36GB using diffusers (will be optimized before the PR is merged) and 25G using [SAT](https://github.com/THUDM/SwissArmyTransformer) |
-| GPU Memory Required for Fine-tuning(bs=1) | 42GB |
-| Prompt Max Length | 226 Tokens |
-| Video Length | 6 seconds |
-| Frames Per Second | 8 frames |
-| Resolution | 720 * 480 |
-| Quantized Inference | Not Supported |
-| Multi-card Inference | Not Supported |
-| Download Link | 🤗 [CogVideoX-2B](https://huggingface.co/THUDM/CogVideoX-2B) |
+| Model Name | CogVideoX-2B |
+|-------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|
+| Prompt Language | English |
+| GPU Memory Required for Inference (FP16) | 36GB using diffusers (will be optimized before the PR is merged) and 18GB using [SAT](https://github.com/THUDM/SwissArmyTransformer) |
+| GPU Memory Required for Fine-tuning(bs=1) | 42GB |
+| Prompt Max Length | 226 Tokens |
+| Video Length | 6 seconds |
+| Frames Per Second | 8 frames |
+| Resolution | 720 * 480 |
+| Quantized Inference | Not Supported |
+| Multi-card Inference | Not Supported |
+| Download Link | 🤗 [CogVideoX-2B](https://huggingface.co/THUDM/CogVideoX-2B) |
 
 ## Project Structure
 
diff --git a/README_zh.md b/README_zh.md
index e31ce5c..32cb615 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -54,18 +54,18 @@ CogVideoX是 [清影](https://chatglm.cn/video) 同源的开源版本视频生
 
 下表战展示目前我们提供的视频生成模型列表,以及相关基础信息:
 
-| 模型名字 | CogVideoX-2B |
-|----------------|-------------------------------------------------------------------------------------------------------------------------------------|
-| 提示词语言 | English |
-| 推理显存消耗 (FP-16) | 36GB using diffusers (will be optimized before the PR is merged) and 25G using [SAT](https://github.com/THUDM/SwissArmyTransformer) |
-| 微调显存消耗 (bs=1) | 42GB |
-| 提示词长度上限 | 226 Tokens |
-| 视频长度 | 6 seconds |
-| 帧率(每秒) | 8 frames |
-| 视频分辨率 | 720 * 480 |
-| 量化推理 | 不支持 |
-| 多卡推理 | 不支持 |
-| 权重地址 | 🤗 [CogVideoX-2B](https://huggingface.co/THUDM/CogVideoX-2B) |
+| 模型名字 | CogVideoX-2B |
+|----------------|--------------------------------------------------------------------------------------------------------------------------------------|
+| 提示词语言 | English |
+| 推理显存消耗 (FP-16) | 36GB using diffusers (will be optimized before the PR is merged) and 18GB using [SAT](https://github.com/THUDM/SwissArmyTransformer) |
+| 微调显存消耗 (bs=1) | 42GB |
+| 提示词长度上限 | 226 Tokens |
+| 视频长度 | 6 seconds |
+| 帧率(每秒) | 8 frames |
+| 视频分辨率 | 720 * 480 |
+| 量化推理 | 不支持 |
+| 多卡推理 | 不支持 |
+| 权重地址 | 🤗 [CogVideoX-2B](https://huggingface.co/THUDM/CogVideoX-2B) |
 
 ## 项目结构
 
diff --git a/inference/cli_demo.py b/inference/cli_demo.py
index 0650352..47bbd8e 100644
--- a/inference/cli_demo.py
+++ b/inference/cli_demo.py
@@ -43,7 +43,7 @@ def generate_video(
 
     # Load the pre-trained CogVideoX pipeline with the specified precision (float16) and move it to the specified device
     pipe = CogVideoXPipeline.from_pretrained(model_path, torch_dtype=dtype).to(device)
-
+    pipe.enable_sequential_cpu_offload()  # Enable sequential CPU offload to reduce GPU memory usage (inference becomes slower)
     # Encode the prompt to get the prompt embeddings
     prompt_embeds, _ = pipe.encode_prompt(
         prompt=prompt,  # The textual description for video generation
diff --git a/inference/cli_vae_demo.py b/inference/cli_vae_demo.py
index 0d3ea28..2b1ed15 100644
--- a/inference/cli_vae_demo.py
+++ b/inference/cli_vae_demo.py
@@ -4,7 +4,7 @@ This script demonstrates how to encode video frames using a pre-trained CogVideo
 Note: This script requires the `diffusers>=0.30.0` library to be installed.
 If the video appears “completely green” and cannot be viewed, please switch to a different player to watch it. This is a normal phenomenon.
-Cost 71GB of GPU memory for encoding a 1-minute video at 720p resolution.
+Encoding a 6-second video at 720p resolution requires 71GB of GPU memory.
 
 Run the script:
     $ python cli_demo.py --model_path THUDM/CogVideoX-2b --video_path path/to/video.mp4 --output_path path/to/output
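For a sense of scale behind the GPU memory figures in the model table, the pixels of one generated clip are small: 6 seconds at 8 frames per second is 48 frames of 720 * 480. The sketch below estimates the size of that clip held as a single dense FP16 tensor. `raw_video_bytes` is a hypothetical helper, and its layout (3 RGB channels, 2 bytes per element, no batch dimension or padding) is an assumption rather than how the demo scripts actually store frames. Under those assumptions the clip is under 100 MiB, far below the memory figures in the table, so most of that memory goes to other things such as model weights and intermediate activations.

```python
def raw_video_bytes(seconds: int, fps: int, width: int, height: int,
                    channels: int = 3, bytes_per_element: int = 2) -> int:
    """Size in bytes of a whole clip stored as one dense (frames, channels, height, width) tensor.

    Hypothetical helper: by default assumes RGB frames (3 channels) and
    FP16 storage (2 bytes per element).
    """
    frames = seconds * fps
    return frames * channels * height * width * bytes_per_element


if __name__ == "__main__":
    # 6 seconds at 8 frames per second and 720 * 480, as listed in the model table.
    size = raw_video_bytes(seconds=6, fps=8, width=720, height=480)
    print(f"{size} bytes ({size / 2**20:.1f} MiB)")  # prints "99532800 bytes (94.9 MiB)"
```

Passing `bytes_per_element=4` gives the corresponding FP32 estimate, which simply doubles the result.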