Prompt update

This commit is contained in:
zR 2024-08-07 16:19:19 +08:00
parent 4c2a1ff22d
commit f0b5f35934
5 changed files with 33 additions and 12 deletions


@ -50,6 +50,12 @@ Jump to a specific section:
## Quick Start
### Prompt Optimization
Before running the model, please refer to [this guide](inference/convert_demo.py) to see how we use the GLM-4 model to
optimize the prompt. This is crucial because the model is trained with long prompts, and a good prompt directly affects
the quality of the generated video.
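The prompt-optimization step above can be sketched as follows. This is an illustrative stub, not the actual code in `inference/convert_demo.py`: the helper name, system prompt, and injected client are assumptions, and the stub "model" stands in for a real GLM-4 call so the sketch runs offline.

```python
# Hedged sketch of prompt upsampling: expand a short idea into the kind of
# long, detailed prompt the model was trained on. The chat client is injected
# as a function, so any LLM (e.g. GLM-4 via its API) could be plugged in.
SYSTEM_PROMPT = (
    "Rewrite the user's short video idea into one detailed paragraph that "
    "describes the subject, motion, scenery, lighting, and camera movement."
)

def upsample_prompt(short_prompt: str, chat_fn) -> str:
    """chat_fn(system, user) -> str; plug in any chat model here."""
    return chat_fn(SYSTEM_PROMPT, short_prompt).strip()

# Offline demo with a stub model so no API key is needed:
stub = lambda system, user: f"A cinematic shot of {user}, soft golden light. "
long_prompt = upsample_prompt("a cat playing piano", stub)
print(long_prompt)  # → A cinematic shot of a cat playing piano, soft golden light.
```

In practice `chat_fn` would wrap a real API call; the point is that the short user prompt is rewritten into a long, descriptive one before being passed to the video model.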
### SAT
Follow the instructions in [sat_demo](sat/README.md): it contains the inference and fine-tuning code for SAT weights. It is


@ -46,6 +46,11 @@
## 快速开始
### 提示词优化
Before running the model, please refer to [this guide](inference/convert_demo.py) to see how we use the GLM-4 model to optimize the prompt. This is important:
since the model is trained on long prompts, a good prompt directly determines the quality of the generated video.
### SAT
See [sat_demo](sat/README.md) in the sat folder: it contains the inference and fine-tuning code for SAT weights. We recommend basing improvements to the CogVideoX model structure on this code; it lets researchers iterate and develop quickly.
@ -59,6 +64,7 @@ pip install -r requirements.txt
See [diffusers_demo](inference/cli_demo.py): it contains a more detailed explanation of the inference code, including the various key parameters. (Inference requires 36GB of VRAM; VRAM optimization and fine-tuning code are under development.)
## CogVideoX-2B Video Gallery
<div align="center">


@ -1,4 +1,4 @@
git+https://github.com/huggingface/diffusers.git@d1c575ad7ee0390c2735f50cc59a79aae666567a#egg=diffusers
diffusers>=0.30.0
SwissArmyTransformer==0.4.11 # Inference
torch==2.4.0
torchvision==0.19.0


@ -1,6 +1,7 @@
# SAT CogVideoX-2B
This folder contains the inference code using [SAT](https://github.com/THUDM/SwissArmyTransformer) weights and the fine-tuning code for SAT weights.
This code is the framework used by the team to train the model. It has few comments and requires careful study.
@ -41,12 +42,14 @@ Then unzip, the model structure should look like this:
Next, obtain the T5 model; it is not used for training or fine-tuning, but it is required.
```shell
git lfs install
git clone https://huggingface.co/THUDM/CogVideoX-2b.git
mkdir t5-v1_1-xxl
mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
```
**We don't need the `tf_model.h5` file**; it can be deleted.
By following the above approach, you will obtain a T5 checkpoint in safetensors format; make sure it loads into DeepSpeed without errors during fine-tuning.
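As a sanity check before fine-tuning, the assembled folder can be inspected programmatically. This snippet is an illustrative sketch, not part of the repository: the file names below are just the typical T5 layout, and the `tf_model.h5` removal mirrors the note above. The demo simulates the directory with a tempdir so it runs anywhere.

```python
import os
import tempfile

def clean_t5_dir(path: str) -> list:
    """Drop the unneeded TF checkpoint and return the remaining files."""
    h5 = os.path.join(path, "tf_model.h5")
    if os.path.exists(h5):
        os.remove(h5)  # not needed when loading safetensors weights
    return sorted(os.listdir(path))

# Simulate the directory produced by the mv commands above:
d = tempfile.mkdtemp()
for name in ["config.json", "model.safetensors", "spiece.model", "tf_model.h5"]:
    open(os.path.join(d, name), "w").close()
print(clean_t5_dir(d))  # → ['config.json', 'model.safetensors', 'spiece.model']
```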
3. Modify the file `configs/cogvideox_2b_infer.yaml`.
@ -101,6 +104,9 @@ bash inference.sh
### Preparing the Environment
Please note that currently, SAT needs to be installed from the source code for proper fine-tuning. We will address this
issue in future stable releases.
```shell
git clone https://github.com/THUDM/SwissArmyTransformer.git
cd SwissArmyTransformer
pip install -e .
```
@ -130,7 +136,8 @@ For style fine-tuning, please prepare at least 50 videos and labels with similar
### Modifying the Configuration File
We support both `Lora` and `full-parameter fine-tuning` methods. Please note that both fine-tuning methods only apply to the `transformer` part. The `VAE part` is not modified. `T5` is only used as an Encoder.
the `configs/cogvideox_2b_sft.yaml` (for full fine-tuning) as follows.
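The idea behind LoRA fine-tuning of only the `transformer` part can be illustrated numerically. This is a generic low-rank-update sketch, not the repository's actual fine-tuning code: the base weight stays frozen, and only two small factors are trained.

```python
import numpy as np

# Generic LoRA sketch: the frozen weight W stays fixed; only the low-rank
# factors A and B are trained, so the effective weight is W + B @ A.
rng = np.random.default_rng(0)
d, r = 8, 2                             # feature dim, LoRA rank
W = rng.standard_normal((d, d))         # frozen (e.g. a transformer projection)
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable, zero-init: W is unchanged at start

def forward(x):
    return x @ (W + B @ A).T

x = rng.standard_normal(d)
# With B zero-initialized, the LoRA branch contributes nothing yet:
assert np.allclose(forward(x), x @ W.T)
# Trainable parameters: 2*r*d factor entries instead of d*d full weights.
print(A.size + B.size, "trainable vs", W.size, "frozen")  # → 32 trainable vs 64 frozen
```

Full-parameter fine-tuning would instead train `W` directly; in both cases the VAE and T5 stay frozen, as noted above.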


@ -41,12 +41,12 @@ unzip transformer.zip
Next, obtain the T5 model; it is not used for training or fine-tuning, but it is required.
```shell
git lfs install
git clone https://huggingface.co/THUDM/CogVideoX-2b.git
mkdir t5-v1_1-xxl
mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
```
**We don't need the `tf_model.h5` file**; it can be deleted.
By following the above approach, you will obtain a T5 checkpoint in safetensors format; make sure it loads into DeepSpeed without errors during fine-tuning.
3. Modify the file `configs/cogvideox_2b_infer.yaml`.
@ -101,6 +101,8 @@ bash inference.sh
### Preparing the Environment
Please note that currently, SAT must be installed from source for fine-tuning to work properly. We will address this issue in a future stable release.
```shell
git clone https://github.com/THUDM/SwissArmyTransformer.git
cd SwissArmyTransformer
pip install -e .
```