mirror of
https://github.com/THUDM/CogVideo.git
synced 2025-04-06 03:57:56 +08:00
GPU memory optimization
parent ad855f622c
commit f7721c7fd2
24
README.md
@@ -57,18 +57,18 @@ to [清影](https://chatglm.cn/video).
 The table below shows the list of video generation models we currently provide,
 along with related basic information:

 | Model Name | CogVideoX-2B |
-|-------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------|
+|-------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|
 | Prompt Language | English |
-| GPU Memory Required for Inference (FP16) | 36GB using diffusers (will be optimized before the PR is merged) and 25G using [SAT](https://github.com/THUDM/SwissArmyTransformer) |
+| GPU Memory Required for Inference (FP16) | 36GB using diffusers (will be optimized before the PR is merged) and 18GB using [SAT](https://github.com/THUDM/SwissArmyTransformer) |
 | GPU Memory Required for Fine-tuning(bs=1) | 42GB |
 | Prompt Max Length | 226 Tokens |
 | Video Length | 6 seconds |
 | Frames Per Second | 8 frames |
 | Resolution | 720 * 480 |
 | Quantized Inference | Not Supported |
 | Multi-card Inference | Not Supported |
 | Download Link | 🤗 [CogVideoX-2B](https://huggingface.co/THUDM/CogVideoX-2B) |

 ## Project Structure
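As a rough illustration of how the figures in the table relate to one another, the sketch below derives the nominal frame count from the video length and frame rate, and checks whether a given GPU could hold the model for each inference backend. The `SPEC` dictionary, the `nominal_frames` and `fits` helpers, and the 24 GB example card are hypothetical names and values chosen for illustration; the memory figures are the ones listed in the updated table, and the real pipeline may emit a slightly different number of frames.

```python
# Figures copied from the CogVideoX-2B table (values after this commit).
# The structure and key names are illustrative assumptions, not a project API.
SPEC = {
    "video_seconds": 6,
    "fps": 8,
    "width": 720,
    "height": 480,
    "inference_gb": {"diffusers": 36, "sat": 18},
    "finetune_gb": 42,
}


def nominal_frames(spec: dict) -> int:
    """Frame count implied by video length times frame rate."""
    return spec["video_seconds"] * spec["fps"]


def fits(available_gb: float, backend: str, spec: dict = SPEC) -> bool:
    """True if the card has at least the memory the table lists for this backend."""
    return available_gb >= spec["inference_gb"][backend]


if __name__ == "__main__":
    print("nominal frames:", nominal_frames(SPEC))
    for backend in SPEC["inference_gb"]:
        # 24 GB is an arbitrary example card size.
        print(backend, "fits on a 24 GB card:", fits(24, backend))
```

Under these numbers, the SAT figure is the only listed inference path that fits on a 24 GB card without further memory-saving measures.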
24
README_zh.md
@@ -54,18 +54,18 @@ CogVideoX是 [清影](https://chatglm.cn/video) 同源的开源版本视频生

 The table below shows the list of video generation models we currently provide, along with related basic information:

 | Model Name | CogVideoX-2B |
-|----------------|-------------------------------------------------------------------------------------------------------------------------------------|
+|----------------|--------------------------------------------------------------------------------------------------------------------------------------|
 | Prompt Language | English |
-| GPU Memory Used for Inference (FP16) | 36GB using diffusers (will be optimized before the PR is merged) and 25G using [SAT](https://github.com/THUDM/SwissArmyTransformer) |
+| GPU Memory Used for Inference (FP16) | 36GB using diffusers (will be optimized before the PR is merged) and 18GB using [SAT](https://github.com/THUDM/SwissArmyTransformer) |
 | GPU Memory Used for Fine-tuning (bs=1) | 42GB |
 | Prompt Max Length | 226 Tokens |
 | Video Length | 6 seconds |
 | Frame Rate (per second) | 8 frames |
 | Video Resolution | 720 * 480 |
 | Quantized Inference | Not Supported |
 | Multi-card Inference | Not Supported |
 | Weights Link | 🤗 [CogVideoX-2B](https://huggingface.co/THUDM/CogVideoX-2B) |

 ## Project Structure
@@ -43,7 +43,7 @@ def generate_video(

     # Load the pre-trained CogVideoX pipeline with the specified precision (float16) and move it to the specified device
     pipe = CogVideoXPipeline.from_pretrained(model_path, torch_dtype=dtype).to(device)
-
+    pipe.enable_sequential_cpu_offload()  # Enable sequential CPU offload for faster inference
     # Encode the prompt to get the prompt embeddings
     prompt_embeds, _ = pipe.encode_prompt(
         prompt=prompt,  # The textual description for video generation
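The added line describes sequential CPU offload as a speed-up. In diffusers it is primarily a memory-saving measure: submodules are moved to the GPU only while they run, which usually lowers peak GPU memory and makes inference slower. A minimal sketch of choosing between the two modes is shown below. The `needs_offload` and `load_pipeline` helpers and the `free_gpu_gb` parameter are hypothetical; the 36 GB default is the diffusers figure from the README table, and the sketch assumes `diffusers>=0.30.0`, `torch`, and `accelerate` are installed.

```python
def needs_offload(free_gpu_gb: float, required_gb: float = 36.0) -> bool:
    """Hypothetical helper: offload when the card cannot hold the full pipeline."""
    return free_gpu_gb < required_gb


def load_pipeline(model_path: str = "THUDM/CogVideoX-2b", free_gpu_gb: float = 24.0):
    """Load CogVideoX in float16, offloading to CPU only when memory is short."""
    # Imported lazily so the helper above can be used without these packages.
    import torch
    from diffusers import CogVideoXPipeline

    pipe = CogVideoXPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
    if needs_offload(free_gpu_gb):
        # Trades speed for memory: weights stay on the CPU until each submodule runs.
        pipe.enable_sequential_cpu_offload()
    else:
        # Enough memory: keep the whole pipeline resident on the GPU.
        pipe.to("cuda")
    return pipe
```

With offload enabled the pipeline manages device placement itself, so this sketch calls `.to("cuda")` only in the non-offload branch.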
@@ -4,7 +4,7 @@ This script demonstrates how to encode video frames using a pre-trained CogVideo
 Note:
 This script requires the `diffusers>=0.30.0` library to be installed.
 If the video appears “completely green” and cannot be viewed, please switch to a different player to watch it. This is a normal phenomenon.
-Cost 71GB of GPU memory for encoding a 1-minute video at 720p resolution.
+Cost 71GB of GPU memory for encoding a 6s video at 720p resolution.

 Run the script:
 $ python cli_demo.py --model_path THUDM/CogVideoX-2b --video_path path/to/video.mp4 --output_path path/to/output
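To put the 71GB figure in perspective, the sketch below estimates the size of the raw input tensor for a 6-second clip. It assumes 8 frames per second and a 720 * 480 frame size (taken from the README table, not from this script), three colour channels, and 2 bytes per value for float16; the `raw_video_bytes` helper is a hypothetical name. The result is on the order of 100 MB, which suggests that most of the reported memory goes to intermediate activations rather than the input itself, though the source does not break the figure down.

```python
def raw_video_bytes(frames: int, height: int, width: int,
                    channels: int = 3, bytes_per_value: int = 2) -> int:
    """Size of an uncompressed video tensor (float16 by default)."""
    return frames * channels * height * width * bytes_per_value


if __name__ == "__main__":
    frames = 6 * 8  # 6 seconds at 8 fps (assumed from the README table)
    size = raw_video_bytes(frames, height=480, width=720)
    print(f"raw input: {size / 1e6:.1f} MB")
    print(f"share of 71 GB: {size / 71e9:.4%}")
```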