From 125432d4032b827706ac769012aaf534d3e4e5c4 Mon Sep 17 00:00:00 2001
From: zR <2448370773@qq.com>
Date: Wed, 7 Aug 2024 19:27:53 +0800
Subject: [PATCH] 1
---
README.md | 12 ++++---
README_zh.md | 61 ++++++++++++++++++---------------
inference/cli_demo.py | 44 +++++++++++++++++-------
inference/gradio_web_demo.py | 5 +--
inference/streamlit_web_demo.py | 4 ++-
5 files changed, 78 insertions(+), 48 deletions(-)
diff --git a/README.md b/README.md
index cc96605..fba09da 100644
--- a/README.md
+++ b/README.md
@@ -20,6 +20,8 @@
## Update and News
+- 🔥 **News**: ``2024/8/7``: CogVideoX has been integrated into `diffusers` version 0.30.0. Inference can now be performed
+  on a single 3090 GPU. For more details, please refer to the [code](inference/cli_demo.py).
- 🔥 **News**: ``2024/8/6``: We have also open-sourced the **3D Causal VAE** used in **CogVideoX-2B**, which can reconstruct
  the video almost losslessly.
- 🔥 **News**: ``2024/8/6``: We have open-sourced **CogVideoX-2B**, the first model in the CogVideoX series of video
@@ -106,14 +108,14 @@ along with related basic information:
| Model Name | CogVideoX-2B |
|-------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Prompt Language | English |
-| GPU Memory Required for Inference (FP16) | 18GB if using [SAT](https://github.com/THUDM/SwissArmyTransformer); 36GB if using diffusers (will be optimized before the PR is merged) |
+| Single GPU Inference (FP16) | 18GB using [SAT](https://github.com/THUDM/SwissArmyTransformer)<br>23.9GB using diffusers |
+| Multi-GPU Inference (FP16) | 20GB minimum per GPU using diffusers |
| GPU Memory Required for Fine-tuning (bs=1) | 40GB |
| Prompt Max Length | 226 Tokens |
| Video Length | 6 seconds |
| Frames Per Second | 8 frames |
| Resolution | 720 * 480 |
| Quantized Inference | Not Supported |
-| Multi-card Inference | Not Supported |
| Download Link (HF diffusers Model) | 🤗 [Huggingface](https://huggingface.co/THUDM/CogVideoX-2B) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/CogVideoX-2b) [💫 WiseModel](https://wisemodel.cn/models/ZhipuAI/CogVideoX-2b) |
| Download Link (SAT Model) | [SAT](./sat/README.md) |
@@ -132,14 +134,16 @@ of the **CogVideoX** open-source model.
CogVideoX is trained on long captions, so we need to convert the input text to match the training
distribution using an LLM. By default, the script uses GLM4, but it can be replaced with any other LLM, such as
GPT, Gemini, etc.
-+ [gradio_web_demo](inference/gradio_web_demo.py): A simple gradio web UI demonstrating how to use the CogVideoX-2B model to generate
++ [gradio_web_demo](inference/gradio_web_demo.py): A simple gradio web UI demonstrating how to use the CogVideoX-2B
+ model to generate
videos.