diff --git a/README.md b/README.md
index b397e58..c292386 100644
--- a/README.md
+++ b/README.md
@@ -50,6 +50,12 @@ Jump to a specific section:
 
 ## Quick Start
 
+### Prompt Optimization
+
+Before running the model, please refer to [this guide](inference/convert_demo.py) to see how we use the GLM-4 model to
+optimize the prompt. This is crucial because the model is trained with long prompts, and a good prompt directly affects
+the quality of the generated video.
+
 ### SAT
 
 Follow instructions in [sat_demo](sat/README.md): Contains the inference code and fine-tuning code of SAT weights. It is
diff --git a/README_zh.md b/README_zh.md
index a0b3c0b..cae5b40 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -46,6 +46,11 @@
 
 ## Quick Start
 
+### Prompt Optimization
+
+Before running the model, please refer to [this guide](inference/convert_demo.py) to see how we use the GLM-4 model to optimize the prompt.
+This is important because the model is trained with long prompts; a good prompt directly affects the quality of the generated video.
+
 ### SAT
 
 See [sat_demo](sat/README.md) under the sat folder: it contains the inference code and fine-tuning code for the SAT weights. We recommend improving the CogVideoX model structure based on this code; researchers can use it to iterate and develop more quickly.
@@ -59,6 +64,7 @@ pip install -r requirements.txt
 
 See [diffusers_demo](inference/cli_demo.py): it gives a more detailed explanation of the inference code, including the various key parameters. (36GB for inference; memory optimization and fine-tuning code are under development)
 
+
 ## CogVideoX-2B Video Works
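The Quick Start additions above point readers at [inference/convert_demo.py](inference/convert_demo.py) for GLM-4-based prompt rewriting. As a rough sketch of that idea only, not the repository's actual script, a short user prompt could be expanded with the ZhipuAI GLM-4 API roughly like this; the model name, API key placeholder, and system prompt below are all assumptions for illustration:

```python
# Illustrative sketch only -- see inference/convert_demo.py for the real implementation.
# Assumes the zhipuai v2 SDK (`pip install zhipuai`); the system prompt is a placeholder.
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR_API_KEY")  # hypothetical placeholder key

short_prompt = "a cat playing with a ball of yarn"
response = client.chat.completions.create(
    model="glm-4",
    messages=[
        {
            "role": "system",
            "content": (
                "Rewrite the user's short video idea into one detailed, cinematic English prompt "
                "describing the subject, motion, setting, and lighting."
            ),
        },
        {"role": "user", "content": short_prompt},
    ],
)

long_prompt = response.choices[0].message.content
print(long_prompt)  # feed this long prompt to the CogVideoX inference scripts
```

The extra step matters because, as the new README section states, CogVideoX is trained on long prompts, so an expanded, descriptive prompt is much closer to the training distribution than raw user input.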
diff --git a/requirements.txt b/requirements.txt
index e195b53..55b376e 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,4 +1,4 @@
-git+https://github.com/huggingface/diffusers.git@d1c575ad7ee0390c2735f50cc59a79aae666567a#egg=diffusers
+diffusers>=0.3.0
 SwissArmyTransformer==0.4.11 # Inference
 torch==2.4.0
 torchvision==0.19.0
diff --git a/sat/README.md b/sat/README.md
index a2e69d6..be3fb0f 100644
--- a/sat/README.md
+++ b/sat/README.md
@@ -1,6 +1,7 @@
 # SAT CogVideoX-2B
 
-This folder contains the inference code using [SAT](https://github.com/THUDM/SwissArmyTransformer) weights and the fine-tuning code for SAT weights.
+This folder contains the inference code using [SAT](https://github.com/THUDM/SwissArmyTransformer) weights and the
+fine-tuning code for SAT weights.
 
 This code is the framework used by the team to train the model. It has few comments and requires careful study.
 
@@ -41,12 +42,14 @@ Then unzip, the model structure should look like this:
 
 Next, clone the T5 model, which is not used for training and fine-tuning, but must be used.
 
-```shell
-git lfs install
-git clone https://huggingface.co/google/t5-v1_1-xxl.git
+```
+git clone https://huggingface.co/THUDM/CogVideoX-2b.git
+mkdir t5-v1_1-xxl
+mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
 ```
 
-**We don't need the tf_model.h5** file. This file can be deleted.
+By following the above approach, you will obtain a safetensors-format T5 file. Ensure that it loads into DeepSpeed
+without errors during fine-tuning.
 
 3. Modify the file `configs/cogvideox_2b_infer.yaml`.
 
@@ -101,6 +104,9 @@ bash inference.sh
 
 ### Preparing the Environment
 
+Please note that currently, SAT needs to be installed from the source code for proper fine-tuning. We will address this
+issue in future stable releases.
+
 ```
 git clone https://github.com/THUDM/SwissArmyTransformer.git
 cd SwissArmyTransformer
@@ -130,7 +136,8 @@ For style fine-tuning, please prepare at least 50 videos and labels with similar
 
 ### Modifying the Configuration File
 
-We support both `Lora` and `full-parameter fine-tuning` methods. Please note that both fine-tuning methods only apply to the `transformer` part. The `VAE part` is not modified. `T5` is only used as an Encoder.
+We support both `Lora` and `full-parameter fine-tuning` methods. Please note that both fine-tuning methods only apply to
+the `transformer` part. The `VAE part` is not modified. `T5` is only used as an Encoder.
 
 the `configs/cogvideox_2b_sft.yaml` (for full fine-tuning) as follows.
diff --git a/sat/README_zh.md b/sat/README_zh.md
index e2d9be9..833a221 100644
--- a/sat/README_zh.md
+++ b/sat/README_zh.md
@@ -41,12 +41,12 @@ unzip transformer.zip
 
 Next, clone the T5 model, which is not used for training and fine-tuning, but must be used.
 
-```shell
-git lfs install
-git clone https://huggingface.co/google/t5-v1_1-xxl.git
 ```
-
-**We do not need the tf_model.h5** file. It can be deleted.
+git clone https://huggingface.co/THUDM/CogVideoX-2b.git
+mkdir t5-v1_1-xxl
+mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
+```
+By following the above approach, you will obtain a safetensors-format T5 file; ensure that it loads without errors during DeepSpeed fine-tuning.
 
 3. Modify the file `configs/cogvideox_2b_infer.yaml`.
 
@@ -101,6 +101,8 @@ bash inference.sh
 
 ### Preparing the Environment
 
+Please note that currently, SAT needs to be installed from the source code for proper fine-tuning. We will address this issue in future stable releases.
+
 ```
 git clone https://github.com/THUDM/SwissArmyTransformer.git
 cd SwissArmyTransformer
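The sat/README hunks above replace the `google/t5-v1_1-xxl` clone with the `text_encoder` and `tokenizer` folders shipped in the CogVideoX-2b repository. A minimal sanity check that the reassembled folder loads, written as a sketch with Hugging Face `transformers` rather than the SAT code itself (the directory name simply matches the `mv` commands in the diff), could look like this:

```python
# Sanity-check sketch: confirm the reassembled T5 folder loads before starting fine-tuning.
# Assumes `transformers` and `sentencepiece` are installed; the xxl encoder needs tens of GB of RAM.
import torch
from transformers import T5EncoderModel, T5Tokenizer

t5_dir = "t5-v1_1-xxl"  # folder produced by the mv commands in the diff above

tokenizer = T5Tokenizer.from_pretrained(t5_dir)
encoder = T5EncoderModel.from_pretrained(t5_dir, torch_dtype=torch.bfloat16)

# Encode a short caption to verify the safetensors weights and tokenizer files are consistent.
inputs = tokenizer("a short test caption", return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state
print(hidden.shape)  # (1, sequence_length, hidden_size)
```

If this loads and encodes cleanly, the same folder should also be readable by the DeepSpeed-based fine-tuning setup the README describes.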