diff --git a/README.md b/README.md
index b397e58..c292386 100644
--- a/README.md
+++ b/README.md
@@ -50,6 +50,12 @@ Jump to a specific section:
## Quick Start
+### Prompt Optimization
+
+Before running the model, please refer to [this guide](inference/convert_demo.py) to see how we use the GLM-4 model to
+optimize prompts. This is crucial: the model is trained on long prompts, so a well-written prompt directly affects the
+quality of the generated video.
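+
+As a minimal sketch of the idea (assuming the `zhipuai` Python SDK and a `ZHIPUAI_API_KEY` environment variable; the
+model name and system prompt here are illustrative, see the linked script for the actual implementation):
+
+```python
+import os
+
+from zhipuai import ZhipuAI
+
+client = ZhipuAI(api_key=os.environ["ZHIPUAI_API_KEY"])
+
+SYSTEM_PROMPT = (
+    "You are a prompt engineer for a text-to-video model. Rewrite the user's short "
+    "description into one detailed, vivid English paragraph covering the scene, "
+    "subjects, motion, camera, and lighting."
+)
+
+
+def enhance_prompt(short_prompt: str) -> str:
+    # Ask GLM-4 to expand the short prompt into the long form the model was trained on.
+    response = client.chat.completions.create(
+        model="glm-4",
+        messages=[
+            {"role": "system", "content": SYSTEM_PROMPT},
+            {"role": "user", "content": short_prompt},
+        ],
+    )
+    return response.choices[0].message.content
+
+
+print(enhance_prompt("a cat playing the piano"))
+```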
+
### SAT
Follow instructions in [sat_demo](sat/README.md): Contains the inference code and fine-tuning code of SAT weights. It is
diff --git a/README_zh.md b/README_zh.md
index a0b3c0b..cae5b40 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -46,6 +46,11 @@
## 快速开始
+### 提示词优化
+
+在开始运行模型之前,请参考[这里](inference/convert_demo.py) 查看我们是如何使用 GLM-4 大模型对提示词进行优化的。这一点很重要:
+由于模型是在长提示词下训练的,一个好的提示词会直接影响视频生成的质量。
+
### SAT
查看sat文件夹下的[sat_demo](sat/README.md):包含了 SAT 权重的推理代码和微调代码,推荐基于此代码进行 CogVideoX 模型结构的改进,研究者使用该代码可以更好的进行快速的迭代和开发。
@@ -59,6 +64,7 @@ pip install -r requirements.txt
查看[diffusers_demo](inference/cli_demo.py):包含对推理代码更详细的解释,包括各种关键的参数。(36GB 推理,显存优化以及微调代码正在开发)
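+
+安装好上述 requirements 中的 diffusers 之后,一个最小的推理示例大致如下(模型名称来自本仓库,参数仅作示意,具体请以 [diffusers_demo](inference/cli_demo.py) 为准):
+
+```python
+import torch
+from diffusers import CogVideoXPipeline
+from diffusers.utils import export_to_video
+
+# 以 FP16 加载 CogVideoX-2B 的 diffusers 权重(推理约需 36GB 显存)
+pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16).to("cuda")
+
+prompt = "A panda playing guitar by a stream in a bamboo forest, cinematic lighting."
+video = pipe(prompt=prompt, num_inference_steps=50, guidance_scale=6).frames[0]
+export_to_video(video, "output.mp4", fps=8)
+```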
+
## CogVideoX-2B 视频作品
diff --git a/requirements.txt b/requirements.txt
index e195b53..55b376e 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,4 +1,4 @@
-git+https://github.com/huggingface/diffusers.git@d1c575ad7ee0390c2735f50cc59a79aae666567a#egg=diffusers
+diffusers>=0.30.0 # CogVideoX support is included since the 0.30.0 release
SwissArmyTransformer==0.4.11 # Inference
torch==2.4.0
torchvision==0.19.0
diff --git a/sat/README.md b/sat/README.md
index a2e69d6..be3fb0f 100644
--- a/sat/README.md
+++ b/sat/README.md
@@ -1,6 +1,7 @@
# SAT CogVideoX-2B
-This folder contains the inference code using [SAT](https://github.com/THUDM/SwissArmyTransformer) weights and the fine-tuning code for SAT weights.
+This folder contains the inference code using [SAT](https://github.com/THUDM/SwissArmyTransformer) weights and the
+fine-tuning code for SAT weights.
This code is the framework used by the team to train the model. It has few comments and requires careful study.
@@ -41,12 +42,14 @@ Then unzip, the model structure should look like this:
Next, clone the T5 model, which is not trained or fine-tuned but is required as the text encoder.
-```shell
-git lfs install
-git clone https://huggingface.co/google/t5-v1_1-xxl.git
+```
+# git-lfs is needed so the clone fetches the actual safetensors weights rather than pointer files
+git lfs install
+git clone https://huggingface.co/THUDM/CogVideoX-2b.git
+mkdir t5-v1_1-xxl
+mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
```
-**We don't need the tf_model.h5** file. This file can be deleted.
+By following the above approach, you will obtain the T5 weights in safetensors format, which ensures they can be
+loaded by Deepspeed during fine-tuning without errors.
3. Modify the file `configs/cogvideox_2b_infer.yaml`.
@@ -101,6 +104,9 @@ bash inference.sh
### Preparing the Environment
+Please note that SAT currently needs to be installed from source for fine-tuning to work correctly. We will address
+this in a future stable release.
+
```
git clone https://github.com/THUDM/SwissArmyTransformer.git
cd SwissArmyTransformer
@@ -130,7 +136,8 @@ For style fine-tuning, please prepare at least 50 videos and labels with similar
### Modifying the Configuration File
-We support both `Lora` and `full-parameter fine-tuning` methods. Please note that both fine-tuning methods only apply to the `transformer` part. The `VAE part` is not modified. `T5` is only used as an Encoder.
+We support both `Lora` and full-parameter fine-tuning. Please note that both methods apply only to the `transformer`
+part; the `VAE` is not modified, and `T5` is used only as an encoder.
Modify `configs/cogvideox_2b_sft.yaml` (for full fine-tuning) as follows.
diff --git a/sat/README_zh.md b/sat/README_zh.md
index e2d9be9..833a221 100644
--- a/sat/README_zh.md
+++ b/sat/README_zh.md
@@ -41,12 +41,12 @@ unzip transformer.zip
接着,克隆 T5 模型,该模型不用做训练和微调,但是必须使用。
-```shell
-git lfs install
-git clone https://huggingface.co/google/t5-v1_1-xxl.git
```
-
-**我们不需要使用tf_model.h5**文件。该文件可以删除。
+# 需要先安装 git-lfs,否则克隆得到的只是权重的指针文件而非 safetensors 权重
+git lfs install
+git clone https://huggingface.co/THUDM/CogVideoX-2b.git
+mkdir t5-v1_1-xxl
+mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
+```
+通过上述方案,你将会得到一个 safetensors 格式的 T5 文件,确保在 Deepspeed 微调过程中加载时不会报错。
3. 修改 `configs/cogvideox_2b_infer.yaml` 文件。
@@ -101,6 +101,8 @@ bash inference.sh
### 准备环境
+请注意,目前 SAT 需要从源码安装才能正常进行微调,我们将在未来的稳定版本中解决这个问题。
+
```
git clone https://github.com/THUDM/SwissArmyTransformer.git
cd SwissArmyTransformer