Mirror of https://github.com/THUDM/CogVideo.git

Commit f0b5f35934 (parent 4c2a1ff22d): Prompt update
@@ -50,6 +50,12 @@ Jump to a specific section:

## Quick Start

### Prompt Optimization

Before running the model, please refer to [this guide](inference/convert_demo.py) to see how we use the GLM-4 model to
optimize the prompt. This is crucial because the model is trained with long prompts, and a good prompt directly affects
the quality of the generated video.
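
As a rough illustration of this step, here is a minimal prompt-rewriting sketch. It assumes the `zhipuai` Python SDK and a `glm-4` chat model; the system prompt, model name, and parameters used in `inference/convert_demo.py` may differ, so treat this as a sketch rather than the repository's implementation.

```python
from zhipuai import ZhipuAI  # assumption: the zhipuai SDK is installed and an API key is available

client = ZhipuAI(api_key="YOUR_API_KEY")  # placeholder key

SYSTEM_PROMPT = (
    "You rewrite short video ideas into a single, richly detailed English prompt "
    "describing the subject, motion, scene, lighting, and camera."
)

def optimize_prompt(short_prompt: str) -> str:
    """Expand a terse user prompt into the long-form style the model was trained on."""
    response = client.chat.completions.create(
        model="glm-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": short_prompt},
        ],
    )
    return response.choices[0].message.content

print(optimize_prompt("a cat playing piano"))
```
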
### SAT

Follow instructions in [sat_demo](sat/README.md): Contains the inference code and fine-tuning code of SAT weights. It is
@@ -46,6 +46,11 @@

## Quick Start

### Prompt Optimization

Before running the model, please refer to [this guide](inference/convert_demo.py) to see how we use the GLM-4 model to optimize the prompt. This is important: because the model is trained with long prompts, a good prompt directly affects the quality of the generated video.

### SAT

See [sat_demo](sat/README.md) in the sat folder: it contains the inference code and fine-tuning code for the SAT weights. We recommend improving the CogVideoX model structure based on this code, as it lets researchers iterate and develop quickly.
@@ -59,6 +64,7 @@ pip install -r requirements.txt

See [diffusers_demo](inference/cli_demo.py): it provides a more detailed walkthrough of the inference code, including the key parameters. (Inference requires 36GB of GPU memory; memory optimization and fine-tuning code are under development.)
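
To make the entry point above concrete, here is a minimal sketch using the `CogVideoXPipeline` from `diffusers` (it requires a `diffusers` release that ships this pipeline). The step count, guidance scale, and frame count below are illustrative defaults; `inference/cli_demo.py` exposes the actual arguments.

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the 2B checkpoint in half precision and move it to the GPU.
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16).to("cuda")

prompt = "A panda playing a guitar by a quiet mountain stream, detailed, cinematic lighting."
frames = pipe(
    prompt=prompt,
    num_inference_steps=50,   # illustrative values; see cli_demo.py for the real defaults
    guidance_scale=6.0,
    num_frames=49,
).frames[0]

export_to_video(frames, "output.mp4", fps=8)
```
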
## CogVideoX-2B Gallery

<div align="center">
@@ -1,4 +1,4 @@

git+https://github.com/huggingface/diffusers.git@d1c575ad7ee0390c2735f50cc59a79aae666567a#egg=diffusers
diffusers>=0.3.0
SwissArmyTransformer==0.4.11 # Inference
torch==2.4.0
torchvision==0.19.0
@@ -1,6 +1,7 @@

# SAT CogVideoX-2B

This folder contains the inference code using [SAT](https://github.com/THUDM/SwissArmyTransformer) weights and the fine-tuning code for SAT weights.
This folder contains the inference code using [SAT](https://github.com/THUDM/SwissArmyTransformer) weights and the
fine-tuning code for SAT weights.

This code is the framework used by the team to train the model. It has few comments and requires careful study.
@@ -41,12 +42,14 @@ Then unzip, the model structure should look like this:

Next, clone the T5 model, which is not used for training or fine-tuning, but is required.

```shell
git lfs install
git clone https://huggingface.co/google/t5-v1_1-xxl.git
```

```shell
git clone https://huggingface.co/THUDM/CogVideoX-2b.git
mkdir t5-v1_1-xxl
mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
```

**We don't need the tf_model.h5** file. This file can be deleted.
By following the above approach, you will obtain a safetensors-format T5 model. Ensure that it loads without errors in DeepSpeed during fine-tuning.
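
A small sanity-check sketch of the cleanup described above; the `t5-v1_1-xxl/` path is an assumption based on the `mv` commands and should be adjusted to your actual layout.

```python
from pathlib import Path

t5_dir = Path("t5-v1_1-xxl")  # assumed location created by the mv commands above

# The TensorFlow checkpoint is not needed and can be deleted.
(t5_dir / "tf_model.h5").unlink(missing_ok=True)

# These safetensors shards are what DeepSpeed loads during fine-tuning.
print(sorted(p.name for p in t5_dir.glob("*.safetensors")))
```
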
3. Modify the file `configs/cogvideox_2b_infer.yaml`.

@@ -101,6 +104,9 @@ bash inference.sh
### Preparing the Environment

Please note that currently, SAT needs to be installed from the source code for proper fine-tuning. We will address this
issue in future stable releases.

```
git clone https://github.com/THUDM/SwissArmyTransformer.git
cd SwissArmyTransformer
```
@@ -130,7 +136,8 @@ For style fine-tuning, please prepare at least 50 videos and labels with similar

### Modifying the Configuration File

We support both `Lora` and `full-parameter fine-tuning` methods. Please note that both fine-tuning methods only apply to the `transformer` part. The `VAE part` is not modified. `T5` is only used as an Encoder.
We support both `Lora` and `full-parameter fine-tuning` methods. Please note that both fine-tuning methods only apply to
the `transformer` part. The `VAE part` is not modified. `T5` is only used as an Encoder.
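
As a conceptual sketch only (this is not the SAT training code), the statement above amounts to freezing the VAE and the T5 encoder and updating just the transformer; the module names here are placeholders.

```python
import torch.nn as nn

def set_trainable_parts(transformer: nn.Module, vae: nn.Module, text_encoder: nn.Module) -> None:
    """Freeze the VAE and the T5 encoder; train only the transformer (full-parameter case)."""
    for module, trainable in ((vae, False), (text_encoder, False), (transformer, True)):
        for p in module.parameters():
            p.requires_grad_(trainable)
    # With Lora, the transformer's base weights would also stay frozen and only the
    # injected low-rank adapter weights would receive gradients.
```
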
Modify `configs/cogvideox_2b_sft.yaml` (for full fine-tuning) as follows.
@@ -41,12 +41,12 @@ unzip transformer.zip

Next, clone the T5 model. It is not used for training or fine-tuning, but it is required.

```shell
git lfs install
git clone https://huggingface.co/google/t5-v1_1-xxl.git
```

**We do not need the tf_model.h5** file. It can be deleted.

```shell
git clone https://huggingface.co/THUDM/CogVideoX-2b.git
mkdir t5-v1_1-xxl
mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
```

With the approach above, you will obtain a T5 model in safetensors format; make sure it loads without errors during DeepSpeed fine-tuning.

3. Modify the file `configs/cogvideox_2b_infer.yaml`.
@@ -101,6 +101,8 @@ bash inference.sh

### Preparing the Environment

Please note that, at the moment, SAT needs to be installed from source for fine-tuning to work properly; we will address this in a future stable release.

```
git clone https://github.com/THUDM/SwissArmyTransformer.git
cd SwissArmyTransformer
```