Prompt update

This commit is contained in:
zR 2024-08-07 16:19:19 +08:00
parent 4c2a1ff22d
commit f0b5f35934
5 changed files with 33 additions and 12 deletions


@ -50,6 +50,12 @@ Jump to a specific section:
## Quick Start
### Prompt Optimization
Before running the model, please refer to [this guide](inference/convert_demo.py) to see how we use the GLM-4 model to
optimize the prompt. This is crucial because the model is trained with long prompts, and a good prompt directly affects
the quality of the generated video.
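The prompt-optimization step above can be sketched as follows. This is an illustrative stub, not the actual code in `inference/convert_demo.py`: the helper name, system prompt, and injected client are assumptions, and the stub "model" stands in for a real GLM-4 call so the sketch runs offline.

```python
# Hedged sketch of prompt upsampling: expand a short idea into the kind of
# long, detailed prompt the model was trained on. The chat client is injected
# as a function, so any LLM (e.g. GLM-4 via its API) could be plugged in.
SYSTEM_PROMPT = (
    "Rewrite the user's short video idea into one detailed paragraph that "
    "describes the subject, motion, scenery, lighting, and camera movement."
)

def upsample_prompt(short_prompt: str, chat_fn) -> str:
    """chat_fn(system, user) -> str; plug in any chat model here."""
    return chat_fn(SYSTEM_PROMPT, short_prompt).strip()

# Offline demo with a stub model so no API key is needed:
stub = lambda system, user: f"A cinematic shot of {user}, soft golden light. "
long_prompt = upsample_prompt("a cat playing piano", stub)
print(long_prompt)  # → A cinematic shot of a cat playing piano, soft golden light.
```

In practice `chat_fn` would wrap a real API call; the point is that the short user prompt is rewritten into a long, descriptive one before being passed to the video model.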
### SAT
Follow the instructions in [sat_demo](sat/README.md): it contains the inference and fine-tuning code for SAT weights. It is


@ -46,6 +46,11 @@
## 快速开始
### 提示词优化
Before running the model, please refer to [this guide](inference/convert_demo.py) to see how we use the GLM-4 model to optimize the prompt. This is important:
since the model is trained on long prompts, a good prompt directly determines the quality of the generated video.
### SAT
See [sat_demo](sat/README.md) in the sat folder: it contains the inference and fine-tuning code for SAT weights. We recommend basing improvements to the CogVideoX model structure on this code; it lets researchers iterate and develop quickly.
@ -59,6 +64,7 @@ pip install -r requirements.txt
See [diffusers_demo](inference/cli_demo.py): it contains a more detailed explanation of the inference code, including the various key parameters. (Inference requires 36GB of VRAM; VRAM optimization and fine-tuning code are under development.)
## CogVideoX-2B Video Gallery
<div align="center">


@ -1,4 +1,4 @@
git+https://github.com/huggingface/diffusers.git@d1c575ad7ee0390c2735f50cc59a79aae666567a#egg=diffusers
diffusers>=0.30.0
SwissArmyTransformer==0.4.11 # Inference
torch==2.4.0
torchvision==0.19.0


@ -1,6 +1,7 @@
# SAT CogVideoX-2B
This folder contains the inference code using [SAT](https://github.com/THUDM/SwissArmyTransformer) weights and the fine-tuning code for SAT weights.
This code is the framework used by the team to train the model. It has few comments and requires careful study.
@ -41,12 +42,14 @@ Then unzip, the model structure should look like this:
Next, obtain the T5 model; it is not used for training or fine-tuning, but it is required.
```shell
git lfs install
git clone https://huggingface.co/THUDM/CogVideoX-2b.git
mkdir t5-v1_1-xxl
mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
```
**We don't need the `tf_model.h5` file**; it can be deleted.
By following the above approach, you will obtain a T5 checkpoint in safetensors format; make sure it loads into DeepSpeed without errors during fine-tuning.
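As a sanity check before fine-tuning, the assembled folder can be inspected programmatically. This snippet is an illustrative sketch, not part of the repository: the file names below are just the typical T5 layout, and the `tf_model.h5` removal mirrors the note above. The demo simulates the directory with a tempdir so it runs anywhere.

```python
import os
import tempfile

def clean_t5_dir(path: str) -> list:
    """Drop the unneeded TF checkpoint and return the remaining files."""
    h5 = os.path.join(path, "tf_model.h5")
    if os.path.exists(h5):
        os.remove(h5)  # not needed when loading safetensors weights
    return sorted(os.listdir(path))

# Simulate the directory produced by the mv commands above:
d = tempfile.mkdtemp()
for name in ["config.json", "model.safetensors", "spiece.model", "tf_model.h5"]:
    open(os.path.join(d, name), "w").close()
print(clean_t5_dir(d))  # → ['config.json', 'model.safetensors', 'spiece.model']
```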
3. Modify the file `configs/cogvideox_2b_infer.yaml`.
@ -101,6 +104,9 @@ bash inference.sh
### Preparing the Environment
Please note that currently, SAT needs to be installed from the source code for proper fine-tuning. We will address this
issue in future stable releases.
```shell
git clone https://github.com/THUDM/SwissArmyTransformer.git
cd SwissArmyTransformer
pip install -e .
```
@ -130,7 +136,8 @@ For style fine-tuning, please prepare at least 50 videos and labels with similar
### Modifying the Configuration File
We support both `Lora` and `full-parameter fine-tuning` methods. Please note that both fine-tuning methods only apply to the `transformer` part. The `VAE part` is not modified. `T5` is only used as an Encoder.
the `configs/cogvideox_2b_sft.yaml` (for full fine-tuning) as follows.
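The idea behind LoRA fine-tuning of only the `transformer` part can be illustrated numerically. This is a generic low-rank-update sketch, not the repository's actual fine-tuning code: the base weight stays frozen, and only two small factors are trained.

```python
import numpy as np

# Generic LoRA sketch: the frozen weight W stays fixed; only the low-rank
# factors A and B are trained, so the effective weight is W + B @ A.
rng = np.random.default_rng(0)
d, r = 8, 2                             # feature dim, LoRA rank
W = rng.standard_normal((d, d))         # frozen (e.g. a transformer projection)
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable, zero-init: W is unchanged at start

def forward(x):
    return x @ (W + B @ A).T

x = rng.standard_normal(d)
# With B zero-initialized, the LoRA branch contributes nothing yet:
assert np.allclose(forward(x), x @ W.T)
# Trainable parameters: 2*r*d factor entries instead of d*d full weights.
print(A.size + B.size, "trainable vs", W.size, "frozen")  # → 32 trainable vs 64 frozen
```

Full-parameter fine-tuning would instead train `W` directly; in both cases the VAE and T5 stay frozen, as noted above.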


@ -41,12 +41,12 @@ unzip transformer.zip
Next, obtain the T5 model; it is not used for training or fine-tuning, but it is required.
```shell
git lfs install
git clone https://huggingface.co/THUDM/CogVideoX-2b.git
mkdir t5-v1_1-xxl
mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
```
**We don't need the `tf_model.h5` file**; it can be deleted.
By following the above approach, you will obtain a T5 checkpoint in safetensors format; make sure it loads into DeepSpeed without errors during fine-tuning.
3. Modify the file `configs/cogvideox_2b_infer.yaml`.
@ -101,6 +101,8 @@ bash inference.sh
### Preparing the Environment
Please note that currently, SAT must be installed from source for fine-tuning to work properly. We will address this issue in a future stable release.
```shell
git clone https://github.com/THUDM/SwissArmyTransformer.git
cd SwissArmyTransformer
pip install -e .
```