Prompt update

This commit is contained in:
zR 2024-08-07 16:19:19 +08:00
parent 4c2a1ff22d
commit f0b5f35934
5 changed files with 33 additions and 12 deletions


@@ -50,6 +50,12 @@ Jump to a specific section:
## Quick Start
### Prompt Optimization
Before running the model, please refer to [this guide](inference/convert_demo.py) to see how we use the GLM-4 model to
optimize the prompt. This is crucial because the model is trained with long prompts, and a good prompt directly affects
the quality of the generated video.
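The optimization step can be sketched as follows. This is an illustrative sketch only, not the actual `convert_demo.py` API: it assumes a chat-style LLM (such as GLM-4) that accepts role-tagged messages, and the instruction text and function name are invented for illustration; the LLM call itself is left out.

```python
# Hypothetical sketch of the prompt-optimization idea: wrap a short user
# prompt in a system instruction that asks an LLM to expand it into a long,
# detailed video-generation prompt. Names here are illustrative assumptions.
SYSTEM_INSTRUCTION = (
    "You are a prompt engineer for a text-to-video model. Rewrite the user's "
    "short prompt into one long, richly detailed English description of the "
    "scene, subjects, motion, lighting, and camera work."
)

def build_optimization_messages(short_prompt: str) -> list:
    """Package a short user prompt as chat messages for the optimizing LLM."""
    return [
        {"role": "system", "content": SYSTEM_INSTRUCTION},
        {"role": "user", "content": short_prompt},
    ]
```

The returned messages would be sent to the LLM, and its reply used as the actual long prompt fed to CogVideoX.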
### SAT
Follow instructions in [sat_demo](sat/README.md): Contains the inference code and fine-tuning code of SAT weights. It is


@@ -46,6 +46,11 @@
## Quick Start
### Prompt Optimization
Before running the model, please refer to [this guide](inference/convert_demo.py) to see how we use the GLM-4 model to optimize the prompt. This is important: the model is trained on long prompts, so a good prompt directly affects the quality of the generated video.
### SAT
See [sat_demo](sat/README.md) in the sat folder: it contains the inference and fine-tuning code for SAT weights. We recommend improving the CogVideoX model structure based on this code; researchers can use it to iterate and develop quickly.
@@ -59,6 +64,7 @@ pip install -r requirements.txt
See [diffusers_demo](inference/cli_demo.py): it contains a more detailed explanation of the inference code, including various key parameters. (36GB for inference; memory optimization and fine-tuning code are under development)
## CogVideoX-2B Video Gallery
<div align="center">


@@ -1,4 +1,4 @@
-git+https://github.com/huggingface/diffusers.git@d1c575ad7ee0390c2735f50cc59a79aae666567a#egg=diffusers
+diffusers>=0.3.0
SwissArmyTransformer==0.4.11 # Inference
torch==2.4.0
torchvision==0.19.0


@@ -1,6 +1,7 @@
# SAT CogVideoX-2B
This folder contains the inference code using [SAT](https://github.com/THUDM/SwissArmyTransformer) weights and the fine-tuning code for SAT weights.
This code is the framework used by the team to train the model. It has few comments and requires careful study.
@@ -41,12 +42,14 @@ Then unzip; the model structure should look like this:
Next, clone the T5 model, which is not trained or fine-tuned but is required.
-```shell
-git lfs install
-git clone https://huggingface.co/google/t5-v1_1-xxl.git
-```
-**We don't need the tf_model.h5** file. This file can be deleted.
+```
+git clone https://huggingface.co/THUDM/CogVideoX-2b.git
+mkdir t5-v1_1-xxl
+mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
+```
+By following the above approach, you will obtain a T5 file in safetensors format. Ensure that it loads into DeepSpeed without errors during fine-tuning.
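The `mv` step above assembles a T5 folder from the CogVideoX-2b text encoder and tokenizer. A minimal sanity check before launching fine-tuning might look like the sketch below; it assumes the usual Hugging Face T5 file layout (`*.safetensors` weights, `spiece.model` / `tokenizer_config.json` tokenizer files), which is an assumption and not verified against this repo.

```python
import os

def looks_like_assembled_t5(path: str) -> bool:
    """Rough sanity check on the assembled t5-v1_1-xxl folder:
    safetensors weights present, a tokenizer file present, and no
    leftover TensorFlow checkpoint (tf_model.h5)."""
    files = set(os.listdir(path))
    has_weights = any(f.endswith(".safetensors") for f in files)
    has_tokenizer = "spiece.model" in files or "tokenizer_config.json" in files
    return has_weights and has_tokenizer and "tf_model.h5" not in files
```

If the check fails, confirm the clone finished and re-run the `mv` step.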
3. Modify the file `configs/cogvideox_2b_infer.yaml`.
@@ -101,6 +104,9 @@ bash inference.sh
### Preparing the Environment
Please note that currently, SAT needs to be installed from the source code for proper fine-tuning. We will address this
issue in future stable releases.
```
git clone https://github.com/THUDM/SwissArmyTransformer.git
cd SwissArmyTransformer
@@ -130,7 +136,8 @@ For style fine-tuning, please prepare at least 50 videos and labels with similar
### Modifying the Configuration File
We support both `Lora` and `full-parameter fine-tuning` methods. Please note that both fine-tuning methods only apply to the `transformer` part. The `VAE part` is not modified. `T5` is only used as an Encoder.
the `configs/cogvideox_2b_sft.yaml` (for full fine-tuning) as follows.


@@ -41,12 +41,12 @@ unzip transformer.zip
Next, clone the T5 model, which is not trained or fine-tuned but is required.
-```shell
-git lfs install
-git clone https://huggingface.co/google/t5-v1_1-xxl.git
-```
-**We do not need the tf_model.h5** file; it can be deleted.
+```
+git clone https://huggingface.co/THUDM/CogVideoX-2b.git
+mkdir t5-v1_1-xxl
+mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
+```
+With this approach you will obtain a T5 file in safetensors format; make sure it loads into DeepSpeed without errors during fine-tuning.
3. Modify the file `configs/cogvideox_2b_infer.yaml`.
@@ -101,6 +101,8 @@ bash inference.sh
### Preparing the Environment
Please note that SAT currently needs to be installed from source for fine-tuning to work properly; we will address this in a future stable release.
```
git clone https://github.com/THUDM/SwissArmyTransformer.git
cd SwissArmyTransformer