update readme

杨卓毅 2024-08-07 01:57:06 +08:00
parent 3f2568d36f
commit 031a8b18cb
5 changed files with 72 additions and 7 deletions

README.md

@@ -28,6 +28,22 @@
**More powerful models with larger parameter sizes are on the way~ Stay tuned!**
## Quick Start
### SAT
Follow the instructions in [sat_demo](sat/README.md): it contains the inference and fine-tuning code for SAT weights. We recommend building on the CogVideoX model structure; researchers can use this code to iterate and prototype quickly.
(18 GB for inference, 40 GB for LoRA fine-tuning)
### Diffusers
```
pip install -r requirements.txt
```
Then follow [diffusers_demo](inference/cli_demo.py): a more detailed walkthrough of the inference code, explaining the common parameters.
(36 GB for inference; lower-memory inference and fine-tuning code are under development)
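For orientation before diving into the script, here is a minimal text-to-video sketch against the pinned diffusers build; inference/cli_demo.py is the authoritative reference, and the prompt, dtype, and sampling settings below are illustrative assumptions rather than project defaults.
```
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the 2B checkpoint from the Hugging Face Hub (fp16 to reduce memory).
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)
pipe.to("cuda")

prompt = "A panda playing an acoustic guitar in a sunlit bamboo forest."
video = pipe(prompt=prompt, num_inference_steps=50, guidance_scale=6.0).frames[0]

# Write the generated frames out as an mp4 clip.
export_to_video(video, "output.mp4", fps=8)
```
On cards with less memory, replacing `pipe.to("cuda")` with `pipe.enable_model_cpu_offload()` may help, at some speed cost.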
## CogVideoX-2B Gallery
<div align="center">
@@ -79,8 +95,8 @@ of the **CogVideoX** open-source model.
### Inference
+ [cli_demo](inference/cli_demo.py): A more detailed walkthrough of the inference code, explaining the common parameters.
+ [cli_vae_demo](inference/cli_vae_demo.py): Running the VAE inference code on its own currently requires 71 GB of memory; this will be optimized in the future.
+ [diffusers_demo](inference/cli_demo.py): A more detailed walkthrough of the inference code, explaining the common parameters.
+ [diffusers_vae_demo](inference/cli_vae_demo.py): Running the VAE inference code on its own currently requires 71 GB of memory; this will be optimized in the future.
+ [convert_demo](inference/convert_demo.py): How to convert user input into a format suitable for CogVideoX. Because CogVideoX is trained on long captions, input text needs to be rewritten with an LLM to match the training distribution; a minimal sketch of this idea follows the list. By default the script uses GLM-4, but any other LLM, such as GPT or Gemini, can be substituted.
+ [gradio_demo](gradio_demo.py): A simple Gradio web UI demonstrating how to use the CogVideoX-2B model to generate videos.
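The following is a hypothetical sketch of the prompt-upsampling idea behind convert_demo, using the zhipuai SDK for GLM-4; the system prompt, function name, and model settings here are illustrative assumptions, not what convert_demo.py actually ships.
```
# Hypothetical sketch: rewrite a short user prompt into the long-caption style
# CogVideoX was trained on. The system prompt below is illustrative only.
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="your-api-key")  # assumption: GLM-4 via the zhipuai SDK

SYSTEM_PROMPT = (
    "You are a video caption writer. Expand the user's short prompt into one "
    "detailed paragraph describing the subject, motion, scene, and camera."
)

def convert_prompt(short_prompt):
    # Ask the LLM to expand the prompt toward the training distribution.
    response = client.chat.completions.create(
        model="glm-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": short_prompt},
        ],
    )
    return response.choices[0].message.content

print(convert_prompt("a cat chasing a butterfly in a garden"))
```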
@@ -96,9 +112,7 @@ of the **CogVideoX** open-source model.
### sat
+ [sat_demo](sat/README.md): Contains the inference and fine-tuning code for SAT weights. We recommend building on the CogVideoX model structure; researchers can use this code to iterate and prototype quickly.
### Tools

README_zh.md

@@ -26,6 +26,37 @@
- 🌱 **Source**: ```2022/5/19```: We open-sourced the CogVideo video generation model (now available in the `CogVideo` branch), the first open-source Transformer-based large text-to-video generation model. See the [ICLR'23 paper](https://arxiv.org/abs/2205.15868) for technical details.
**More powerful models with larger parameter counts are on the way~ Stay tuned!**
## Quick Start
### SAT
Follow the instructions in [sat_demo](sat/README.md): it contains the inference and fine-tuning code for SAT weights. We recommend building on the CogVideoX model structure; researchers can use this code to iterate and prototype quickly.
(18 GB for inference, 40 GB for LoRA fine-tuning)
### Diffusers
```
pip install -r requirements.txt
```
Then follow [diffusers_demo](inference/cli_demo.py): a more detailed walkthrough of the inference code, explaining the common parameters.
(36 GB for inference; lower-memory inference and fine-tuning code are under development)
## Quick Start
### SAT
See [sat_demo](sat/README.md) in the sat folder: it contains the inference and fine-tuning code for SAT weights. We recommend building on the CogVideoX model structure; researchers can use this code to iterate and prototype quickly.
(18 GB for inference, 40 GB for LoRA fine-tuning)
### Diffusers
```
pip install -r requirements.txt
```
See [diffusers_demo](inference/cli_demo.py): a more detailed walkthrough of the inference code, explaining the key parameters. (36 GB for inference; memory optimization and fine-tuning code are under development)
## CogVideoX-2B Gallery
<div align="center">

requirements.txt

@@ -1,4 +1,5 @@
git+https://github.com/huggingface/diffusers.git@d1c575ad7ee0390c2735f50cc59a79aae666567a#egg=diffusers
SwissArmyTransformer
torch==2.4.0
torchvision==0.19.0
streamlit==1.37.0

sat/README.md

@@ -1,7 +1,6 @@
# SAT CogVideoX-2B
This folder contains the inference code using [SAT](https://github.com/THUDM/SwissArmyTransformer) weights and the fine-tuning code for SAT weights.
This code is the framework the team used to train the model. It is lightly commented and requires careful study.
@@ -100,6 +99,14 @@ bash inference.sh
## Fine-Tuning the Model
### Preparing the Environment
```
git clone https://github.com/THUDM/SwissArmyTransformer.git
cd SwissArmyTransformer
pip install -e .
```
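As a quick sanity check (a trivial snippet; SwissArmyTransformer installs under the package name `sat`), you can confirm the editable install resolves to your local checkout:
```
# SwissArmyTransformer imports as `sat`; the path should point into your clone.
import sat
print(sat.__file__)
```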
### Preparing the Dataset
The dataset format should be as follows:
@@ -145,6 +152,8 @@ the `configs/cogvideox_2b_sft.yaml` (for full fine-tuning) as follows.
valid_data: [ "your val data path" ] # Training and validation sets can be the same
split: 1,0,0 # Ratio of training, validation, and test sets
num_workers: 8 # Number of worker threads for data loading
force_train: True # Allow missing keys when loading the checkpoint (T5 and VAE are loaded separately)
only_log_video_latents: True # Skip the VAE decoder during evaluation to save memory
```
If you wish to use LoRA fine-tuning, you also need to modify:

sat/README_zh.md

@@ -99,6 +99,14 @@ bash inference.sh
## Fine-Tuning the Model
### Preparing the Environment
```
git clone https://github.com/THUDM/SwissArmyTransformer.git
cd SwissArmyTransformer
pip install -e .
```
### Preparing the Dataset
The dataset format should be as follows:
@@ -143,6 +151,8 @@ used by the Encoder.
valid_data: [ "your val data path" ] # Training and validation sets can be the same
split: 1,0,0 # Ratio of training, validation, and test sets
num_workers: 8 # Number of worker threads for data loading
force_train: True # Allow missing keys when loading the checkpoint (T5 and VAE are loaded separately)
only_log_video_latents: True # Skip the VAE decoder during evaluation to save memory
```
If you wish to use LoRA fine-tuning, you also need to modify: