update multi gpus finetune script

2025-11-15 14:32:10 +08:00 · 2024-08-09 13:46:06 +08:00 · 8c0d0eb427
commit 8c0d0eb427
parent 6fc9de04dc
7 changed files with 29 additions and 7 deletions
--- a/README.md
+++ b/README.md
@@ -60,6 +60,8 @@ the quality of the generated video.

 ### SAT

+**Please make sure your Python version is between 3.10 and 3.12, inclusive of both 3.10 and 3.12.**
+
 Follow instructions in [sat_demo](sat/README.md): Contains the inference code and fine-tuning code of SAT weights. It is
 recommended to improve based on the CogVideoX model structure. Innovative researchers use this code to better perform
 rapid stacking and development.
@@ -67,6 +69,8 @@ rapid stacking and development.

 ### Diffusers

+**Please make sure your Python version is between 3.10 and 3.12, inclusive of both 3.10 and 3.12.**
+
 ```
 pip install -r requirements.txt
 ```
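
The version constraint added in the two hunks above can be checked before installing dependencies. The following is a minimal sketch, not part of the commit: the function name `python_version_supported` and the two constants are hypothetical, and the only fact taken from the README is the supported range of 3.10 through 3.12, inclusive.

```python
import sys

# Supported range as stated in the README: 3.10 through 3.12, inclusive.
# These constant names are illustrative and do not appear in the repository.
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def python_version_supported(version_info=None):
    """Return True if the (major, minor) pair lies within the supported range."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only major and minor so that any patch release is accepted.
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if python_version_supported() else "NOT supported"
    print(f"Python {current} is {status} (expected 3.10 to 3.12)")
```

Comparing only the major and minor components keeps the check inclusive at both ends, so 3.12.x passes while 3.13.0 does not.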
--- a/README_zh.md
+++ b/README_zh.md
@@ -93,7 +93,7 @@ pip install -r requirements.txt

 CogVideoX是 [清影](https://chatglm.cn/video?fr=osm_cogvideox) 同源的开源版本视频生成模型。

-下表战展示目前我们提供的视频生成模型列表，以及相关基础信息:
+下表展示目前我们提供的视频生成模型列表，以及相关基础信息:

 | 模型名                 | CogVideoX-2B                                                                                                                  | 
 |---------------------|-------------------------------------------------------------------------------------------------------------------------------|
--- a/sat/README.md
+++ b/sat/README.md
@@ -117,8 +117,12 @@ bash inference.sh

 ### Preparing the Environment

-Please note that currently, SAT needs to be installed from the source code for proper fine-tuning. We will address this
-issue in future stable releases.
+Please note that currently, SAT needs to be installed from the source code for proper fine-tuning.
+
+You need to get the code from the source to support the fine-tuning functionality, as these features have not yet been
+released in the Pip package.
+
+We will address this issue in future stable releases.

 ```
 git clone https://github.com/THUDM/SwissArmyTransformer.git
@@ -197,7 +201,8 @@ model:
 1. Run the inference code to start fine-tuning.

 ```shell
-bash finetune.sh
+bash finetune_single_gpu.sh # Single GPU
+bash finetune_multi_gpus.sh # Multi GPUs
 ```

 ### Converting to Huggingface Diffusers Supported Weights
--- a/sat/README_zh.md
+++ b/sat/README_zh.md
@@ -112,7 +112,9 @@ bash inference.sh

 ### 准备环境

-请注意，目前，SAT需要从源码安装，才能正常微调, 我们将会在未来的稳定版本解决这个问题。
+请注意，目前，SAT需要从源码安装，才能正常微调。
+这是因为你需要使用还没发行到pip包版本的最新代码所支持的功能。
+我们将会在未来的稳定版本解决这个问题。

 ```
 git clone https://github.com/THUDM/SwissArmyTransformer.git
@@ -189,7 +191,8 @@ model:
 1. 运行推理代码,即可开始微调。

 ```shell
-bash finetune.sh
+bash finetune_single_gpu.sh # Single GPU
+bash finetune_multi_gpus.sh # Multi GPUs
 ```

 ### 转换到 Huggingface Diffusers 库支持的权重
--- a/sat/data_video.py
+++ b/sat/data_video.py
@@ -425,7 +425,7 @@ class SFTDataset(Dataset):
                    self.videos_list.append(tensor_frms)

                    # caption
-                    caption_path = os.path.join(root, filename.replace("videos", "labels").replace(".mp4", ".txt"))
+                    caption_path = os.path.join(root, filename.replace(".mp4", ".txt")).replace("videos", "labels")
                    if os.path.exists(caption_path):
                        caption = open(caption_path, "r").read().splitlines()[0]
                    else:
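
The effect of moving `.replace("videos", "labels")` outside `os.path.join` in the hunk above can be seen with a small comparison. This is a sketch under assumptions: the helper names `caption_path_old` and `caption_path_new`, the `data/videos` directory, and the file name `clip_0001.mp4` are all hypothetical and chosen only for illustration; the two expressions themselves are copied from the removed and added lines.

```python
import os


def caption_path_old(root, filename):
    # Removed line: the "videos" -> "labels" substitution touches only the
    # file name, so a "videos" directory inside `root` is left unchanged.
    return os.path.join(root, filename.replace("videos", "labels").replace(".mp4", ".txt"))


def caption_path_new(root, filename):
    # Added line: the substitution runs on the joined path, so a "videos"
    # component in `root` is rewritten to "labels" as well.
    return os.path.join(root, filename.replace(".mp4", ".txt")).replace("videos", "labels")


# Hypothetical layout: clips under data/videos, captions under data/labels.
root = os.path.join("data", "videos")
filename = "clip_0001.mp4"

print("old:", caption_path_old(root, filename))
print("new:", caption_path_new(root, filename))
```

With this layout, the old expression keeps looking for the caption next to the video, while the new one points into the sibling `labels` directory. Note that `str.replace` rewrites every occurrence of `videos` in the joined path, so a parent directory or file name that happens to contain that substring is rewritten too.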
--- a/sat/finetune_multi_gpus.sh
+++ b/sat/finetune_multi_gpus.sh
@@ -0,0 +1,10 @@
+#! /bin/bash
+
+echo "RUN on `hostname`, CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
+
+run_cmd="torchrun --standalone --nproc_per_node=4 train_video.py --base configs/cogvideox_2b_sft.yaml --seed $RANDOM"
+
+echo ${run_cmd}
+eval ${run_cmd}
+
+echo "DONE on `hostname`"
--- a/sat/finetune_single_gpu.sh
+++ b/sat/finetune_single_gpu.sh