add comment as #653
This commit is contained in:
parent 2f275e82b5
commit 7dc8516bcb
@@ -4,9 +4,12 @@

[Read in Japanese](./README_ja.md)

This folder contains inference code using [SAT](https://github.com/THUDM/SwissArmyTransformer) weights, along with fine-tuning code for SAT weights.

This code framework was used by our team during model training. There are few comments, so careful study is required.

If you are interested in the `CogVideoX1.0` version of the model, please check the SAT folder [here](https://github.com/THUDM/CogVideo/releases/tag/v1.0). This branch only supports the `CogVideoX1.5` series models.

## Inference Model

@@ -272,7 +275,8 @@ args:
  force_inference: True
```

+ If using a text file to hold multiple prompts, modify `configs/test.txt` as needed, with one prompt per line. If you are unsure how to write prompts, use [this code](../inference/convert_demo.py) to call an LLM for refinement. (A hypothetical example file is sketched after this hunk.)
+ To use command-line input, modify:

```

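For illustration, a hypothetical `configs/test.txt` with one prompt per line might look like this (the prompts are invented examples, not from the repository):

```
A golden retriever runs across a sunlit meadow, slow motion, shallow depth of field.
A timelapse of storm clouds rolling over a neon-lit city skyline at night.
```
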
@@ -313,13 +317,15 @@ The dataset should be structured as follows:
├── ...
```

Each txt file should have the same name as the corresponding video file and contain the label for that video. The videos and labels should correspond one-to-one. Generally, avoid using one video with multiple labels.

For style fine-tuning, prepare at least 50 videos and labels with a similar style to facilitate fitting.

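As a quick way to verify the one-to-one pairing described above, here is a minimal sketch (not part of the repository; it assumes a flat folder of `.mp4` files, so adjust the path and extension to the layout shown above):

```python
# Check that every video has a same-named .txt label and vice versa.
from pathlib import Path

dataset = Path("dataset")  # hypothetical dataset root
videos = {p.stem for p in dataset.glob("*.mp4")}
labels = {p.stem for p in dataset.glob("*.txt")}

print("videos missing labels:", sorted(videos - labels))
print("labels missing videos:", sorted(labels - videos))
```
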
### Modifying the Configuration File

We support two fine-tuning methods: `Lora` and full-parameter fine-tuning. Note that both methods only fine-tune the `transformer` part; the `VAE` part is not modified, and `T5` is only used as an encoder.

Modify the file `configs/sft.yaml` (full fine-tuning) as follows:

```yaml
@@ -371,13 +377,15 @@ model:

Edit `finetune_single_gpu.sh` or `finetune_multi_gpus.sh` and select the config file. Below are two examples:

1. If you want to use the `CogVideoX-2B` model with `Lora`, modify `finetune_single_gpu.sh` or `finetune_multi_gpus.sh` as follows:

```
run_cmd="torchrun --standalone --nproc_per_node=8 train_video.py --base configs/cogvideox_2b_lora.yaml configs/sft.yaml --seed $RANDOM"
```

2. If you want to use the `CogVideoX-2B` model with full fine-tuning, modify `finetune_single_gpu.sh` or `finetune_multi_gpus.sh` as follows:

```
run_cmd="torchrun --standalone --nproc_per_node=8 train_video.py --base configs/cogvideox_2b.yaml configs/sft.yaml --seed $RANDOM"
```

@@ -417,9 +425,11 @@ python ../tools/convert_weight_sat2hf.py

### Exporting Lora Weights from SAT to Huggingface Diffusers

Support is provided for exporting Lora weights from SAT to Huggingface Diffusers format.
After training with the above steps, you'll find the SAT model with Lora weights in {args.save}/1000/1000/mp_rank_00_model_states.pt.

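As a quick sanity check, here is a sketch (not from the repository) for listing the Lora parameters inside that checkpoint; it assumes the usual DeepSpeed layout, where the model weights sit under the `module` key:

```python
# List Lora parameter names inside the SAT checkpoint saved above.
import torch

# Substitute your actual save directory for the {args.save} template.
ckpt = torch.load("{args.save}/1000/1000/mp_rank_00_model_states.pt", map_location="cpu")
sd = ckpt["module"]  # assumption: DeepSpeed-style checkpoint layout
lora_keys = [k for k in sd if "matrix_A" in k or "matrix_B" in k]  # SAT Lora naming, per the mapping below
print(len(lora_keys), lora_keys[:5])
```
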
The export script `export_sat_lora_weight.py` is located in the CogVideoX repository under `tools/`. After exporting, use `load_cogvideox_lora.py` for inference.

Export command:

@@ -427,7 +437,8 @@ Export command:
python tools/export_sat_lora_weight.py --sat_pt_path {args.save}/{experiment_name}-09-09-21-10/1000/mp_rank_00_model_states.pt --lora_save_directory {args.save}/export_hf_lora_weights_1/
```

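Once exported, a hypothetical sketch of loading the weights with diffusers (the repository's supported path is `load_cogvideox_lora.py`; the base model id, Lora directory, and frame count below are assumptions for illustration):

```python
# Load the exported Lora into a diffusers CogVideoX pipeline and run one prompt.
import torch
from diffusers import CogVideoXPipeline

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)  # assumed base model
pipe.load_lora_weights("export_hf_lora_weights_1/")  # directory produced by the export command above
pipe.to("cuda")

video = pipe("a prompt describing the scene", num_frames=49).frames[0]  # num_frames is an assumption
```
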
The following model structures were modified during training. Here is the mapping between SAT and HF Lora structures. Lora adds a low-rank weight to the attention structure of the model.

```
'attention.query_key_value.matrix_A.0': 'attn1.to_q.lora_A.weight',
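# Comment added for clarity (not in the original mapping): SAT fuses q/k/v into a single
# `attention.query_key_value` projection; `matrix_A.0` is its Lora-A slice for the query,
# mapped to the per-projection `attn1.to_q.lora_A.weight` on the HF side.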
@@ -5,7 +5,8 @@

[Read in Chinese](./README_zh.md)

This folder contains inference code using [SAT](https://github.com/THUDM/SwissArmyTransformer) weights, along with fine-tuning code for SAT weights.

This code is the framework our team used when training the model. It has few comments, so it requires careful study.
If you are interested in the `CogVideoX1.0` version of the model, please see the SAT folder [here](https://github.com/THUDM/CogVideo/releases/tag/v1.0). This branch only supports the `CogVideoX1.5` series models.

## Inference Model

@@ -16,7 +17,8 @@ pip install -r requirements.txt
```

### 2. Download the Model Weights

First, download the model weights from the SAT mirror.

#### CogVideoX1.5 Models

@@ -270,7 +272,9 @@ args:
  force_inference: True
```

+ If using a text file containing multiple prompts, edit `configs/test.txt` as needed, with one prompt per line. If you are unsure how to write prompts, use [this code](../inference/convert_demo.py) to have an LLM refine them.
+ To use command-line input, modify as follows:

```

@@ -346,6 +350,7 @@ bash inference.sh
fp16:
  enabled: True # Set to True for CogVideoX-2B, False for CogVideoX-5B
```

```yaml
args:
  latent_channels: 16
@@ -364,7 +369,8 @@ args:
  force_inference: True
```

+ If using a text file to hold multiple prompts, modify `configs/test.txt` as needed, with one prompt per line. If you are unsure how to write prompts, use [this code](../inference/convert_demo.py) to call an LLM for refinement.
+ To use command-line input, modify:

```

@@ -405,13 +411,15 @@ The dataset should be structured as follows:
├── ...
```

Each txt file should have the same name as the corresponding video file and contain the label for that video. The videos and labels should correspond one-to-one. Generally, avoid using one video with multiple labels.

For style fine-tuning, prepare at least 50 videos and labels with a similar style to facilitate fitting.

### Modifying the Configuration File

We support two fine-tuning methods: `Lora` and full-parameter fine-tuning. Note that both methods only fine-tune the `transformer` part; the `VAE` part is not modified, and `T5` is only used as an encoder.

Modify the file `configs/sft.yaml` (full fine-tuning) as follows:

```yaml
@@ -463,13 +471,15 @@ model:

Edit `finetune_single_gpu.sh` or `finetune_multi_gpus.sh` and select the config file. Below are two examples:

1. If you want to use the `CogVideoX-2B` model with `Lora`, modify `finetune_single_gpu.sh` or `finetune_multi_gpus.sh` as follows:

```
run_cmd="torchrun --standalone --nproc_per_node=8 train_video.py --base configs/cogvideox_2b_lora.yaml configs/sft.yaml --seed $RANDOM"
```

2. If you want to use the `CogVideoX-2B` model with full fine-tuning, modify `finetune_single_gpu.sh` or `finetune_multi_gpus.sh` as follows:

```
run_cmd="torchrun --standalone --nproc_per_node=8 train_video.py --base configs/cogvideox_2b.yaml configs/sft.yaml --seed $RANDOM"
```

@@ -509,9 +519,11 @@ python ../tools/convert_weight_sat2hf.py

### Exporting Lora Weights from SAT to Huggingface Diffusers

Support is provided for exporting Lora weights from SAT to Huggingface Diffusers format.
After training with the above steps, you'll find the SAT model with Lora weights in {args.save}/1000/1000/mp_rank_00_model_states.pt.

The export script `export_sat_lora_weight.py` is located in the CogVideoX repository under `tools/`. After exporting, use `load_cogvideox_lora.py` for inference.

Export command:

@@ -519,7 +531,8 @@ Export command:
python tools/export_sat_lora_weight.py --sat_pt_path {args.save}/{experiment_name}-09-09-21-10/1000/mp_rank_00_model_states.pt --lora_save_directory {args.save}/export_hf_lora_weights_1/
```

The following model structures were modified during training. Here is the mapping between SAT and HF Lora structures. Lora adds a low-rank weight to the attention structure of the model.

```
'attention.query_key_value.matrix_A.0': 'attn1.to_q.lora_A.weight',
@@ -5,8 +5,7 @@

[Read in Japanese](./README_ja.md)

This folder contains inference code using [SAT](https://github.com/THUDM/SwissArmyTransformer) weights, along with fine-tuning code for SAT weights.

This code is the framework the team used when training the model. It has few comments and requires careful study.
If you are interested in the `CogVideoX1.0` version of the model, please see the SAT folder [here](https://github.com/THUDM/CogVideo/releases/tag/v1.0); this branch only supports the `CogVideoX1.5` series models.

## Inference Model