From 66369a90aacf360028a19f7f32a0baa0c9c5a472 Mon Sep 17 00:00:00 2001
From: zR <2448370773@qq.com>
Date: Tue, 17 Sep 2024 23:42:35 +0800
Subject: [PATCH] update of readme and hostfile

---
 finetune/README.md         | 211 +++++++++++++++----------------------
 finetune/README_ja.md      | 142 +++++++++++++++++++++++++
 finetune/README_zh.md      |  34 ++++--
 finetune/hostfile.txt      |   4 +-
 inference/cli_demo.py      |  15 ++-
 resources/contribute_ja.md |  47 +++++++++
 6 files changed, 311 insertions(+), 142 deletions(-)
 create mode 100644 finetune/README_ja.md
 create mode 100644 resources/contribute_ja.md
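
Note for reviewers: the updated `finetune/README.md` describes a dataset directory holding `prompts.txt`, `videos.txt`, and a `videos` folder. Below is a minimal sketch for sanity-checking such a directory before launching training. It is not part of this patch, and the helper name `check_dataset` is hypothetical. It assumes that line i of `prompts.txt` is the caption for line i of `videos.txt`, and that the paths in `videos.txt` are relative to the dataset root; neither assumption is confirmed by the README.

```python
from pathlib import Path


def check_dataset(root: str) -> list[str]:
    """Return a list of problems found in a fine-tuning dataset directory.

    Hypothetical helper: assumes one prompt per line in prompts.txt, one
    video path per line in videos.txt, and paths relative to `root`.
    """
    base = Path(root).expanduser()
    problems: list[str] = []
    entries: dict[str, list[str]] = {}

    # Read both index files, ignoring blank lines.
    for name in ("prompts.txt", "videos.txt"):
        path = base / name
        if not path.is_file():
            problems.append(f"missing file: {name}")
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        entries[name] = [line.strip() for line in lines if line.strip()]

    # Without both files there is nothing further to compare.
    if len(entries) < 2:
        return problems

    prompts, videos = entries["prompts.txt"], entries["videos.txt"]
    if len(prompts) != len(videos):
        problems.append(f"{len(prompts)} prompts but {len(videos)} videos")

    # Every listed video should exist on disk (assumed relative to root).
    for rel in videos:
        if not (base / rel).is_file():
            problems.append(f"video not found: {rel}")
    return problems
```

Calling `check_dataset("~/disney/")` returns an empty list when the layout matches these assumptions, and a list of human-readable problems otherwise.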

diff --git a/finetune/README.md b/finetune/README.md
index caad58f..d3f0204 100644
--- a/finetune/README.md
+++ b/finetune/README.md
@@ -1,20 +1,34 @@
 # CogVideoX diffusers Fine-tuning Guide
 
-If you want to see the SAT version fine-tuning, please check [here](../sat/README.md). The dataset format is different
-from this version.
+[中文阅读](./README_zh.md)
 
-This tutorial aims to quickly fine-tune the diffusers version of the CogVideoX model.
+[日本語で読む](./README_ja.md)
 
-### Hardware Requirements
+This feature is still under development. To fine-tune the SAT version instead, please
+see [here](../sat/README.md). Its dataset format differs from the one used here.
 
-+ CogVideoX-2B LORA: 1 * A100
+## Hardware Requirements
+
++ CogVideoX-2B LoRA: 1 * A100
 + CogVideoX-2B SFT:  8 * A100
-+ CogVideoX-5B/5B-I2V not yet supported
++ CogVideoX-5B/5B-I2V: not yet supported
 
-### Prepare the Dataset
+## Install Dependencies
 
-First, you need to prepare the dataset. The format of the dataset is as follows, where `videos.txt` contains paths to
-the videos in the `videos` directory.
+Since the related code has not yet been merged into a diffusers release, you need to install diffusers from its
+`cogvideox-lora-and-training` branch. Please follow the steps below to install the dependencies:
+
+```shell
+git clone https://github.com/huggingface/diffusers.git
+cd diffusers
+git checkout cogvideox-lora-and-training
+pip install -e .
+```
+
+## Prepare the Dataset
+
+First, you need to prepare the dataset. The dataset format should be as follows, with `videos.txt` containing the list
+of videos in the `videos` directory:
 
 ```
 .
@@ -23,152 +37,89 @@ the videos in the `videos` directory.
 └── videos.txt
 ```
 
-You can download [Disney Steamboat Willie](https://huggingface.co/datasets/Wild-Heart/Disney-VideoGeneration-Dataset)
-from here.
+You can download
+the [Disney Steamboat Willie](https://huggingface.co/datasets/Wild-Heart/Disney-VideoGeneration-Dataset) dataset
+from Hugging Face.
 
-The video fine-tuning dataset is used as a test for fine-tuning.
+We use this video dataset as a test case for fine-tuning.
 
-### Configuration Files and Execution
+## Configuration Files and Execution
 
-`accelerate` configuration files are as follows:
+The `accelerate` configuration files are as follows:
 
-+ accelerate_config_machine_multi.yaml for multi-GPU use
-+ accelerate_config_machine_single.yaml for single-GPU use
++ `accelerate_config_machine_multi.yaml`: Suitable for multi-GPU use
++ `accelerate_config_machine_single.yaml`: Suitable for single-GPU use
 
-The `finetune` script configuration is as follows:
+The configuration for the `finetune` script is as follows:
 
 ```shell
-export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
+export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
+# This command sets the PyTorch CUDA memory allocation strategy to expandable segments to prevent OOM (Out of Memory) errors.
 
-# This command sets PyTorch's CUDA memory allocation strategy to segment-based memory management to prevent OOM (Out of Memory) errors.
+accelerate launch --config_file accelerate_config_machine_single.yaml --multi_gpu # Launch training using Accelerate with the specified config file for multi-GPU.
 
-accelerate launch --config_file accelerate_config_machine_single.yaml --multi_gpu \
+  train_cogvideox_lora.py   # This is the training script for LoRA fine-tuning of the CogVideoX model.
 
-# Use Accelerate to start training, specifying the `accelerate_config_machine_single.yaml` configuration file, and using multiple GPUs.
+  --pretrained_model_name_or_path THUDM/CogVideoX-2b   # Path to the pretrained model you want to fine-tune, pointing to the CogVideoX-2b model.
 
-train_cogvideox_lora.py \
+  --cache_dir ~/.cache   # Directory for caching models downloaded from Hugging Face.
 
-# This is the training script you will execute for LoRA fine-tuning of the CogVideoX model.
+  --enable_tiling   # Enable VAE tiling to reduce memory usage by processing images in smaller chunks.
 
---pretrained_model_name_or_path THUDM/CogVideoX-2b \
+  --enable_slicing   # Enable VAE slicing to split the input batch into slices that are processed one at a time, saving memory.
 
-# The path to the pretrained model, pointing to the CogVideoX-5b model you want to fine-tune.
+  --instance_data_root ~/disney/   # Root directory for instance data, i.e., the dataset used for training.
 
---cache_dir ~/.cache \
+  --caption_column prompts.txt   # Specify the column or file containing instance prompts (text descriptions), in this case, the `prompts.txt` file.
 
-# The directory where downloaded models and datasets will be stored.
+  --video_column videos.txt   # Specify the column or file containing video paths, in this case, the `videos.txt` file.
 
---enable_tiling \
+  --validation_prompt "Mickey with the captain and friends:::Mickey and the bear"   # Validation prompts; multiple prompts are separated by the specified delimiter (e.g., `:::`).
 
-# Enable VAE tiling functionality, which reduces memory usage by processing smaller blocks of the image.
+  --validation_prompt_separator :::   # The separator for validation prompts, set to `:::` here.
 
---enable_slicing \
+  --num_validation_videos 1   # Number of videos to generate during validation, set to 1.
 
-# Enable VAE slicing functionality, which slices the image across channels to save memory.
+  --validation_epochs 2   # Number of epochs after which validation will be run, set to every 2 epochs.
 
---instance_data_root ~/disney/ \
+  --seed 3407   # Set a random seed to ensure reproducibility, set to 3407.
 
-# The root directory of the instance data, the folder of the dataset used during training.
+  --rank 128   # Dimension of the LoRA update matrix, controls the size of the LoRA layers, set to 128.
 
---caption_column prompts.txt \
+  --mixed_precision bf16   # Use mixed precision training, set to `bf16` (bfloat16) to reduce memory usage and speed up training.
 
-# Specifies the column or file containing instance prompts (text descriptions), in this case, the `prompts.txt` file.
+  --output_dir cogvideox-lora-single-gpu   # Output directory for storing model predictions and checkpoints.
 
---video_column videos.txt \
+  --height 480   # Height of the input videos, all videos will be resized to 480 pixels.
 
-# Specifies the column or file containing paths to videos, in this case, the `videos.txt` file.
+  --width 720   # Width of the input videos, all videos will be resized to 720 pixels.
 
---validation_prompt "Mickey with the captain and friends:::Mickey and the bear" \
+  --fps 8   # Frame rate of the input videos, all videos will be processed at 8 frames per second.
 
-# The prompt(s) used for validation, multiple prompts should be separated by the specified delimiter (`:::`).
+  --max_num_frames 49   # Maximum number of frames per input video, videos will be truncated to 49 frames.
 
---validation_prompt_separator ::: \
+  --skip_frames_start 0   # Number of frames to skip from the start of each video, set to 0 to not skip any frames.
 
-# The delimiter for validation prompts, set here as `:::`.
+  --skip_frames_end 0   # Number of frames to skip from the end of each video, set to 0 to not skip any frames.
 
---num_validation_videos 1 \
+  --train_batch_size 1   # Training batch size per device, set to 1.
 
-# The number of videos to be generated during validation, set to 1.
+  --num_train_epochs 10   # Total number of training epochs, set to 10.
 
---validation_epochs 2 \
+  --checkpointing_steps 500   # Save checkpoints every 500 steps.
 
-# How many epochs to run validation, set to validate every 2 epochs.
+  --gradient_accumulation_steps 1   # Gradient accumulation steps, perform an update every 1 step.
 
---seed 3407 \
+  --learning_rate 1e-4   # Initial learning rate, set to 1e-4.
 
-# Sets the random seed for reproducible training, set to 3407.
+  --optimizer AdamW   # Optimizer type, using AdamW optimizer.
 
---rank 128 \
-
-# The dimension of the LoRA update matrices, controlling the size of the LoRA layer parameters, set to 128.
-
---mixed_precision bf16 \
-
-# Use mixed precision training, set to `bf16` (bfloat16), which can reduce memory usage and speed up training.
-
---output_dir cogvideox-lora-single-gpu \
-
-# Output directory, where model predictions and checkpoints will be stored.
-
---height 480 \
-
-# The height of input videos, all videos will be resized to 480 pixels.
-
---width 720 \
-
-# The width of input videos, all videos will be resized to 720 pixels.
-
---fps 8 \
-
-# The frame rate of input videos, all videos will be processed at 8 frames per second.
-
---max_num_frames 49 \
-
-# The maximum number of frames for input videos, videos will be truncated to a maximum of 49 frames.
-
---skip_frames_start 0 \
-
-# The number of frames to skip at the beginning of each video, set to 0, indicating no frames are skipped.
-
---skip_frames_end 0 \
-
-# The number of frames to skip at the end of each video, set to 0, indicating no frames are skipped.
-
---train_batch_size 1 \
-
-# The batch size for training, set to 1 per device.
-
---num_train_epochs 10 \
-
-# The total number of epochs for training, set to 10.
-
---checkpointing_steps 500 \
-
-# Save a checkpoint every 500 steps.
-
---gradient_accumulation_steps 1 \
-
-# The number of gradient accumulation steps, indicating that a gradient update is performed every 1 step.
-
---learning_rate 1e-4 \
-
-# The initial learning rate, set to 1e-4.
-
---optimizer AdamW \
-
-# The type of optimizer, choosing AdamW.
-
---adam_beta1 0.9 \
-
-# The beta1 parameter for the Adam optimizer, set to 0.9.
-
---adam_beta2 0.95 \
-
-# The beta2 parameter for the Adam optimizer, set to 0.95.
+  --adam_beta1 0.9   # Beta1 parameter for the Adam optimizer, set to 0.9.
 
+  --adam_beta2 0.95   # Beta2 parameter for the Adam optimizer, set to 0.95.
 ```
 
-### Run the script to start fine-tuning
+## Running the Script to Start Fine-tuning
 
 Single GPU fine-tuning:
 
@@ -179,19 +130,23 @@ bash finetune_single_gpu.sh
 Multi-GPU fine-tuning:
 
 ```shell
-bash finetune_multi_gpus_1.sh # needs to be run on each node
+bash finetune_multi_gpus_1.sh # Needs to be run on each node
 ```
 
-### Best Practices
+## Loading the Fine-tuned Model
 
-+ Include 70 videos with a resolution of `200 x 480 x 720` (frames x height x width). Through data preprocessing's frame
-  skipping, we created two smaller datasets of 49 and 16 frames to speed up experiments, as the CogVideoX team suggests
-  a maximum frame count of 49. We divided the 70 videos into three groups of 10, 25, and 50 videos. These videos are
-  conceptually similar.
-+ 25 or more videos work best when training new concepts and styles.
-+ Now using an identifier token specified through `--id_token` enhances training results. This is similar to Dreambooth
-  training, but regular fine-tuning without this token also works.
-+ The original repository uses `lora_alpha` set to 1. We found this value to be ineffective in multiple runs, likely due
-  to differences in model backend and training setups. Our recommendation is to set lora_alpha to the same as rank or
-  rank // 2.
-+ Using settings with a rank of 64 or above is recommended.
\ No newline at end of file
++ Please refer to [cli_demo.py](../inference/cli_demo.py) for how to load the fine-tuned model.
+
+## Best Practices
+
++ The test dataset includes 70 training videos, each `200 x 480 x 720` (frames x height x width). By skipping frames
+  during data preprocessing, we created two smaller datasets with 49 and 16 frames to speed up experimentation, as
+  the maximum frame count recommended by the CogVideoX team is 49. We split the 70 videos into three groups of 10,
+  25, and 50 videos; the videos are conceptually similar.
++ Using 25 or more videos works best when training new concepts and styles.
++ Training with an identifier token specified via `--id_token` works better. This is similar to Dreambooth training,
+  but regular fine-tuning without such a token also works.
++ The original repository used `lora_alpha` set to 1. We found this value ineffective across multiple runs, likely due
+  to differences in the backend and training setup. We recommend setting `lora_alpha` to either `rank` or
+  `rank // 2`.
++ We recommend using a rank of 64 or higher.
diff --git a/finetune/README_ja.md b/finetune/README_ja.md
new file mode 100644
index 0000000..1c0a021
--- /dev/null
+++ b/finetune/README_ja.md
@@ -0,0 +1,142 @@
+# CogVideoX diffusers 微調整方法
+
+[Read this in English.](./README.md)
+
+[中文阅读](./README_zh.md)
+
+
+この機能はまだ完全に完成していません。SATバージョンの微調整を確認したい場合は、[こちら](../sat/README_ja.md)を参照してください。本バージョンとは異なるデータセット形式を使用しています。
+
+## ハードウェア要件
+
++ CogVideoX-2B LoRA: 1 * A100
++ CogVideoX-2B SFT:  8 * A100
++ CogVideoX-5B/5B-I2V まだサポートしていません
+
+## 依存関係のインストール
+
+関連コードはまだdiffusersのリリース版に統合されていないため、diffusersブランチを使用して微調整を行う必要があります。以下の手順に従って依存関係をインストールしてください：
+
+```shell
+git clone https://github.com/huggingface/diffusers.git
+cd diffusers
+git checkout cogvideox-lora-and-training
+pip install -e .
+```
+
+## データセットの準備
+
+まず、データセットを準備する必要があります。データセットの形式は以下のようになります。
+
+```
+.
+├── prompts.txt
+├── videos
+└── videos.txt
+```
+
+[ディズニースチームボートウィリー](https://huggingface.co/datasets/Wild-Heart/Disney-VideoGeneration-Dataset)をここからダウンロードできます。
+
+ビデオ微調整データセットはテスト用として使用されます。
+
+## 設定ファイルと実行
+
+`accelerate` 設定ファイルは以下の通りです:
+
++ accelerate_config_machine_multi.yaml 複数GPU向け
++ accelerate_config_machine_single.yaml 単一GPU向け
+
+`finetune` スクリプト設定ファイルの例：
+
+```shell
+export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
+# このコマンドは、OOM（メモリ不足）エラーを防ぐために、CUDAメモリ割り当てを拡張セグメントに設定します。
+
+accelerate launch --config_file accelerate_config_machine_single.yaml --multi_gpu # 複数のGPUで `accelerate` を使用してトレーニングを開始します。指定された設定ファイルを使用します。
+
+  train_cogvideox_lora.py   # LoRA微調整用に CogVideoX モデルをトレーニングするスクリプトです。
+
+  --pretrained_model_name_or_path THUDM/CogVideoX-2b   # 事前学習済みモデルのパスです。
+
+  --cache_dir ~/.cache   # Hugging Faceからダウンロードされたモデルとデータセットのキャッシュディレクトリです。
+
+  --enable_tiling   # VAEタイル化機能を有効にし、メモリ使用量を削減します。
+
+  --enable_slicing   # VAEスライス機能を有効にして、チャネルでのスライス処理を行い、メモリを節約します。
+
+  --instance_data_root ~/disney/   # インスタンスデータのルートディレクトリです。
+
+  --caption_column prompts.txt   # テキストプロンプトが含まれているファイルや列を指定します。
+
+  --video_column videos.txt   # ビデオパスが含まれているファイルや列を指定します。
+
+  --validation_prompt "Mickey with the captain and friends:::Mickey and the bear"   # 検証用のプロンプトを指定します。複数のプロンプトを指定するには `:::` 区切り文字を使用します。
+
+  --validation_prompt_separator :::   # 検証プロンプトの区切り文字を `:::` に設定します。
+
+  --num_validation_videos 1   # 検証中に生成するビデオの数を1に設定します。
+
+  --validation_epochs 2   # 何エポックごとに検証を行うかを2に設定します。
+
+  --seed 3407   # ランダムシードを3407に設定し、トレーニングの再現性を確保します。
+
+  --rank 128   # LoRAの更新マトリックスの次元を128に設定します。
+
+  --mixed_precision bf16   # 混合精度トレーニングを `bf16` (bfloat16) に設定します。
+
+  --output_dir cogvideox-lora-single-gpu   # 出力ディレクトリを指定します。
+
+  --height 480   # 入力ビデオの高さを480ピクセルに設定します。
+
+  --width 720   # 入力ビデオの幅を720ピクセルに設定します。
+
+  --fps 8   # 入力ビデオのフレームレートを8 fpsに設定します。
+
+  --max_num_frames 49   # 入力ビデオの最大フレーム数を49に設定します。
+
+  --skip_frames_start 0   # 各ビデオの最初のフレームをスキップしません。
+
+  --skip_frames_end 0   # 各ビデオの最後のフレームをスキップしません。
+
+  --train_batch_size 1   # トレーニングバッチサイズを1に設定します。
+
+  --num_train_epochs 10   # トレーニングのエポック数を10に設定します。
+
+  --checkpointing_steps 500   # 500ステップごとにチェックポイントを保存します。
+
+  --gradient_accumulation_steps 1   # 1ステップごとに勾配を蓄積して更新します。
+
+  --learning_rate 1e-4   # 初期学習率を1e-4に設定します。
+
+  --optimizer AdamW   # AdamWオプティマイザーを使用します。
+
+  --adam_beta1 0.9   # Adamのbeta1パラメータを0.9に設定します。
+
+  --adam_beta2 0.95   # Adamのbeta2パラメータを0.95に設定します。
+```
+
+## 微調整を開始
+
+単一GPU微調整：
+
+```shell
+bash finetune_single_gpu.sh
+```
+
+複数GPU微調整：
+
+```shell
+bash finetune_multi_gpus_1.sh # 各ノードで実行する必要があります。
+```
+
+## 微調整済みモデルのロード
+
++ 微調整済みのモデルをロードする方法については、[cli_demo.py](../inference/cli_demo.py) を参照してください。
+
+## ベストプラクティス
+
++ 解像度が `200 x 480 x 720`（フレーム数 x 高さ x 幅）のトレーニングビデオが70本含まれています。データ前処理でフレームをスキップすることで、49フレームと16フレームの小さなデータセットを作成しました。これは実験を加速するためのもので、CogVideoXチームが推奨する最大フレーム数制限は49フレームです。
++ 25本以上のビデオが新しい概念やスタイルのトレーニングに最適です。
++ 現在、`--id_token` を指定して識別トークンを使用してトレーニングする方が効果的です。これはDreamboothトレーニングに似ていますが、通常の微調整でも機能します。
++ 元のリポジトリでは `lora_alpha` を1に設定していましたが、複数の実行でこの値が効果的でないことがわかりました。モデルのバックエンドやトレーニング設定によるかもしれません。私たちの提案は、lora_alphaをrankと同じか、rank // 2に設定することです。
++ Rank 64以上の設定を推奨します。
diff --git a/finetune/README_zh.md b/finetune/README_zh.md
index 195bc30..3385e9e 100644
--- a/finetune/README_zh.md
+++ b/finetune/README_zh.md
@@ -1,16 +1,29 @@
 # CogVideoX diffusers 微调方案
 
-如果您想查看SAT版本微调，请查看[这里](../sat/README_zh.md)。其数据集格式与本版本不同。
+[Read this in English](./README.md)
 
-本教程旨在快速微调 diffusers 版本 CogVideoX 模型。
+[日本語で読む](./README_ja.md)
 
-### 硬件要求
+本功能尚未完全完善，如果您想查看SAT版本微调，请查看[这里](../sat/README_zh.md)。其数据集格式与本版本不同。
+
+## 硬件要求
 
 + CogVideoX-2B LORA: 1 * A100
 + CogVideoX-2B SFT:  8 * A100
 + CogVideoX-5B/5B-I2V 暂未支持
 
-### 准备数据集
+## 安装依赖
+
+由于相关代码还没有被合并到diffusers发行版，你需要基于diffusers分支进行微调。请按照以下步骤安装依赖：
+
+```shell
+git clone https://github.com/huggingface/diffusers.git
+cd diffusers
+git checkout cogvideox-lora-and-training
+pip install -e .
+```
+
+## 准备数据集
 
 首先，你需要准备数据集，数据集格式如下，其中，videos.txt 存放 videos 中的视频。
 
@@ -25,7 +38,7 @@
 
 视频微调数据集作为测试微调。
 
-### 配置文件和运行
+## 配置文件和运行
 
 `accelerate` 配置文件如下:
 
@@ -132,7 +145,7 @@ accelerate launch --config_file accelerate_config_machine_single.yaml --multi_gp
   # Adam 优化器的 beta2 参数，设置为 0.95。
 ```
 
-### 运行脚本，开始微调
+## 运行脚本，开始微调
 
 单卡微调：
 
@@ -146,7 +159,11 @@ bash finetune_single_gpu.sh
 bash finetune_multi_gpus_1.sh #需要在每个节点运行
 ```
 
-### 最佳实践
+## 载入微调的模型
+
++ 请关注[cli_demo.py](../inference/cli_demo.py) 以了解如何加载微调的模型。
+
+## 最佳实践
 
 + 包含70个分辨率为 `200 x 480 x 720`（帧数 x 高 x
   宽）的训练视频。通过数据预处理中的帧跳过，我们创建了两个较小的49帧和16帧数据集，以加快实验速度，因为CogVideoX团队建议的最大帧数限制是49帧。我们将70个视频分成三组，分别为10、25和50个视频。这些视频的概念性质相似。
@@ -156,6 +173,3 @@ bash finetune_multi_gpus_1.sh #需要在每个节点运行
   lora_alpha 设置为与 rank 相同或 rank // 2。
 + 建议使用 rank 为 64 及以上的设置。
 
-
-
-
diff --git a/finetune/hostfile.txt b/finetune/hostfile.txt
index 0dc339a..d0b8045 100644
--- a/finetune/hostfile.txt
+++ b/finetune/hostfile.txt
@@ -1,4 +1,2 @@
 node1 slots=8
-node2 slots=8
-node3 slots=8
-node4 slots=8
\ No newline at end of file
+node2 slots=8
\ No newline at end of file
diff --git a/inference/cli_demo.py b/inference/cli_demo.py
index 4b1e1d1..9f5263e 100644
--- a/inference/cli_demo.py
+++ b/inference/cli_demo.py
@@ -35,6 +35,8 @@ from diffusers.utils import export_to_video, load_image, load_video
 def generate_video(
     prompt: str,
     model_path: str,
+    lora_path: str = None,
+    lora_rank: int = 128,
     output_path: str = "./output.mp4",
     image_or_video_path: str = "",
     num_inference_steps: int = 50,
@@ -50,6 +52,8 @@ def generate_video(
     Parameters:
     - prompt (str): The description of the video to be generated.
     - model_path (str): The path of the pre-trained model to be used.
+    - lora_path (str): The path of the LoRA weights to be used.
+    - lora_rank (int): The rank of the LoRA weights.
     - output_path (str): The path where the generated video will be saved.
     - num_inference_steps (int): Number of steps for the inference process. More steps can result in better quality.
     - guidance_scale (float): The scale for classifier-free guidance. Higher values can lead to better alignment with the prompt.
@@ -75,6 +79,11 @@ def generate_video(
         pipe = CogVideoXVideoToVideoPipeline.from_pretrained(model_path, torch_dtype=dtype)
         video = load_video(image_or_video_path)
 
+    # If a LoRA path is given, load the LoRA weights and fuse them into the pipeline.
+    if lora_path:
+        pipe.load_lora_weights(lora_path, weight_name="pytorch_lora_weights.safetensors", adapter_name="test_1")
+        pipe.fuse_lora(lora_scale=1 / lora_rank)
+
     # 2. Set Scheduler.
     # Can be changed to `CogVideoXDPMScheduler` or `CogVideoXDDIMScheduler`.
     # We recommend using `CogVideoXDDIMScheduler` for CogVideoX-2B.
@@ -145,6 +154,8 @@ if __name__ == "__main__":
     parser.add_argument(
         "--model_path", type=str, default="THUDM/CogVideoX-5b", help="The path of the pre-trained model to be used"
     )
+    parser.add_argument("--lora_path", type=str, default=None, help="The path of the LoRA weights to be used")
+    parser.add_argument("--lora_rank", type=int, default=128, help="The rank of the LoRA weights")
     parser.add_argument(
         "--output_path", type=str, default="./output.mp4", help="The path where the generated video will be saved"
     )
@@ -166,8 +177,10 @@ if __name__ == "__main__":
     generate_video(
         prompt=args.prompt,
         model_path=args.model_path,
-        image_or_video_path=args.image_or_video_path,
+        lora_path=args.lora_path,
+        lora_rank=args.lora_rank,
         output_path=args.output_path,
+        image_or_video_path=args.image_or_video_path,
         num_inference_steps=args.num_inference_steps,
         guidance_scale=args.guidance_scale,
         num_videos_per_prompt=args.num_videos_per_prompt,
diff --git a/resources/contribute_ja.md b/resources/contribute_ja.md
new file mode 100644
index 0000000..80ddc27
--- /dev/null
+++ b/resources/contribute_ja.md
@@ -0,0 +1,47 @@
+# コントリビューションガイド
+
+本プロジェクトにはまだ多くの未完成の部分があります。
+
+以下の分野でリポジトリへの貢献をお待ちしています。上記の作業を完了し、PRを提出してコミュニティと共有する意志がある場合、レビュー後、プロジェクトのホームページで貢献を認識します。
+
+## モデルアルゴリズム
+
+- モデル量子化推論のサポート (Int4量子化プロジェクト)
+- モデルのファインチューニングデータロードの最適化（既存のdecordツールの置き換え）
+
+## モデルエンジニアリング
+
+- モデルのファインチューニング例 / 最適なプロンプトの実践
+- 異なるデバイスでの推論適応（例: MLXフレームワーク）
+- モデルに関連するツール
+- CogVideoXオープンソースモデルを使用した、完全にオープンソースの最小プロジェクト
+
+## コード標準
+
+良いコードスタイルは一種の芸術です。本プロジェクトにはコードスタイルを標準化するための `pyproject.toml`
+設定ファイルを用意しています。以下の仕様に従ってコードを整理してください。
+
+1. `ruff` ツールをインストールする
+
+```shell
+pip install ruff
+```
+
+次に、`ruff` ツールを実行します
+
+```shell
+ruff check tools sat inference
+```
+
+コードスタイルを確認します。問題がある場合は、`ruff format` コマンドを使用して自動修正できます。
+
+```shell
+ruff format tools sat inference
+```
+
+コードが標準に準拠したら、エラーはなくなるはずです。
+
+## 命名規則
+
+1. 英語名を使用してください。ピンインや他の言語の名前を使用しないでください。すべてのコメントは英語で記載してください。
+2. PEP8仕様に厳密に従い、単語をアンダースコアで区切ってください。a、b、cのような名前は使用しないでください。