GPU memory optimization

2026-06-27 02:28:18 +08:00 · 2024-08-06 03:04:06 +08:00 · 2024-08-06 03:04:06 +08:00 · f7721c7fd2
commit f7721c7fd2
parent ad855f622c
4 changed files with 26 additions and 26 deletions
--- a/README.md
+++ b/README.md
@@ -57,18 +57,18 @@ to [清影](https://chatglm.cn/video).
 The table below shows the list of video generation models we currently provide,
 along with related basic information:

-| Model Name                                | CogVideoX-2B                                                                                                                         | 
-|-------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------|
-| Prompt Language                           | English                                                                                                                              | 
-| GPU Memory Required for Inference (FP16)  | 36GB using diffusers (will be optimized before the PR is merged)  and 25G using [SAT](https://github.com/THUDM/SwissArmyTransformer) | 
-| GPU Memory Required for Fine-tuning(bs=1) | 42GB                                                                                                                                 |
-| Prompt Max  Length                        | 226 Tokens                                                                                                                           |
-| Video Length                              | 6 seconds                                                                                                                            | 
-| Frames Per Second                         | 8 frames                                                                                                                             | 
-| Resolution                                | 720 * 480                                                                                                                            |
-| Quantized Inference                       | Not Supported                                                                                                                        |          
-| Multi-card Inference                      | Not Supported                                                                                                                        |                             
-| Download Link                             | 🤗 [CogVideoX-2B](https://huggingface.co/THUDM/CogVideoX-2B)                                                                         |
+| Model Name                                | CogVideoX-2B                                                                                                                          | 
+|-------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|
+| Prompt Language                           | English                                                                                                                               | 
+| GPU Memory Required for Inference (FP16)  | 36GB using diffusers (will be optimized before the PR is merged)  and 18GB using [SAT](https://github.com/THUDM/SwissArmyTransformer) | 
+| GPU Memory Required for Fine-tuning(bs=1) | 42GB                                                                                                                                  |
+| Prompt Max  Length                        | 226 Tokens                                                                                                                            |
+| Video Length                              | 6 seconds                                                                                                                             | 
+| Frames Per Second                         | 8 frames                                                                                                                              | 
+| Resolution                                | 720 * 480                                                                                                                             |
+| Quantized Inference                       | Not Supported                                                                                                                         |          
+| Multi-card Inference                      | Not Supported                                                                                                                         |                             
+| Download Link                             | 🤗 [CogVideoX-2B](https://huggingface.co/THUDM/CogVideoX-2B)                                                                          |

 ## Project Structure

--- a/README_zh.md
+++ b/README_zh.md
@@ -54,18 +54,18 @@ CogVideoX是 [清影](https://chatglm.cn/video) 同源的开源版本视频生

 The table below shows the list of video generation models we currently provide, along with related basic information:

-| Model Name                                 | CogVideoX-2B                                                                                                                        |
-|--------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|
-| Prompt Language                            | English                                                                                                                             |
-| GPU Memory Required for Inference (FP-16)  | 36GB using diffusers (will be optimized before the PR is merged) and 25G using [SAT](https://github.com/THUDM/SwissArmyTransformer) |
-| GPU Memory Required for Fine-tuning (bs=1) | 42GB                                                                                                                                |
-| Prompt Max Length                          | 226 Tokens                                                                                                                          |
-| Video Length                               | 6 seconds                                                                                                                           |
-| Frame Rate (per second)                    | 8 frames                                                                                                                            |
-| Video Resolution                           | 720 * 480                                                                                                                           |
-| Quantized Inference                        | Not Supported                                                                                                                       |
-| Multi-card Inference                       | Not Supported                                                                                                                       |
-| Weights Link                               | 🤗 [CogVideoX-2B](https://huggingface.co/THUDM/CogVideoX-2B)                                                                        |
+| Model Name                                 | CogVideoX-2B                                                                                                                         |
+|--------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------|
+| Prompt Language                            | English                                                                                                                              |
+| GPU Memory Required for Inference (FP-16)  | 36GB using diffusers (will be optimized before the PR is merged) and 18GB using [SAT](https://github.com/THUDM/SwissArmyTransformer) |
+| GPU Memory Required for Fine-tuning (bs=1) | 42GB                                                                                                                                 |
+| Prompt Max Length                          | 226 Tokens                                                                                                                           |
+| Video Length                               | 6 seconds                                                                                                                            |
+| Frame Rate (per second)                    | 8 frames                                                                                                                             |
+| Video Resolution                           | 720 * 480                                                                                                                            |
+| Quantized Inference                        | Not Supported                                                                                                                        |
+| Multi-card Inference                       | Not Supported                                                                                                                        |
+| Weights Link                               | 🤗 [CogVideoX-2B](https://huggingface.co/THUDM/CogVideoX-2B)                                                                         |

 ## Project Structure

--- a/inference/cli_demo.py
+++ b/inference/cli_demo.py
@@ -43,7 +43,7 @@ def generate_video(

    # Load the pre-trained CogVideoX pipeline with the specified precision (float16) and move it to the specified device
    pipe = CogVideoXPipeline.from_pretrained(model_path, torch_dtype=dtype).to(device)
-
+    pipe.enable_sequential_cpu_offload()  # Enable sequential CPU offload to lower peak GPU memory (inference runs slower)
    # Encode the prompt to get the prompt embeddings
    prompt_embeds, _ = pipe.encode_prompt(
        prompt=prompt,  # The textual description for video generation
--- a/inference/cli_vae_demo.py
+++ b/inference/cli_vae_demo.py
@@ -4,7 +4,7 @@ This script demonstrates how to encode video frames using a pre-trained CogVideo
 Note:
    This script requires the `diffusers>=0.30.0` library to be installed.
    If the video appears “completely green” and cannot be viewed, please switch to a different player to watch it. This is a normal phenomenon.
-    Cost 71GB of GPU memory for encoding a 1-minute video at 720p resolution.
+    Cost 71GB of GPU memory for encoding a 6s video at 720p resolution.

 Run the script:
    $ python cli_demo.py --model_path THUDM/CogVideoX-2b --video_path path/to/video.mp4 --output_path path/to/output