mirror of https://github.com/THUDM/CogVideo.git (synced 2025-04-05 19:41:59 +08:00)

update readme

parent 3f2568d36f
commit 031a8b18cb

README.md (24 changes)
@@ -28,6 +28,22 @@

**More powerful models with larger parameter sizes are on the way~ Stay tuned!**

## Quick Start

### SAT

Follow the instructions in [sat_demo](sat/README.md), which contains the inference and fine-tuning code for SAT weights. It is recommended as a base for improving on the CogVideoX model structure; researchers can use this code for rapid prototyping and development.
(18 GB for inference, 40 GB for LoRA fine-tuning)

### Diffusers

```
pip install -r requirements.txt
```

Then follow [diffusers_demo](inference/cli_demo.py), which explains the inference code in more detail, including the significance of common parameters.
(36 GB for inference; memory optimizations and fine-tuning code are under development)
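
The memory figures quoted above can be turned into a quick pre-flight check. What follows is a minimal sketch and not part of the repository: the workflow labels and the `workflows_that_fit` helper are hypothetical names, and the thresholds are simply the numbers stated in this README (18 GB and 40 GB for SAT, 36 GB for Diffusers).

```python
# Memory figures quoted in the README, in GB. The keys are illustrative labels,
# not identifiers used anywhere in the CogVideo code base.
REQUIRED_GB = {
    "sat_inference": 18,
    "sat_lora_finetune": 40,
    "diffusers_inference": 36,
}


def workflows_that_fit(available_gb: float) -> list[str]:
    """Return the workflows whose quoted requirement fits in `available_gb`."""
    return sorted(
        name for name, needed in REQUIRED_GB.items() if needed <= available_gb
    )


if __name__ == "__main__":
    for gpu_gb in (24, 40, 80):
        print(gpu_gb, workflows_that_fit(gpu_gb))
```

By these figures, a 24 GB card only has room for SAT inference, while a 40 GB card covers all three workflows.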

## CogVideoX-2B Gallery

<div align="center">
@@ -79,8 +95,8 @@ of the **CogVideoX** open-source model.

### Inference

+ [cli_demo](inference/cli_demo.py): A more detailed explanation of the inference code, including the significance of common parameters.
+ [cli_vae_demo](inference/cli_vae_demo.py): Running the VAE inference code on its own currently requires 71 GB of memory; this will be optimized in the future.
+ [diffusers_demo](inference/cli_demo.py): A more detailed explanation of the inference code, including the significance of common parameters.
+ [diffusers_vae_demo](inference/cli_vae_demo.py): Running the VAE inference code on its own currently requires 71 GB of memory; this will be optimized in the future.
+ [convert_demo](inference/convert_demo.py): How to convert user input into a format suitable for CogVideoX. Because CogVideoX is trained on long captions, the input text must be rewritten with an LLM to match the training distribution. By default, the script uses GLM4, but it can be replaced with any other LLM such as GPT or Gemini.
+ [gradio_demo](gradio_demo.py): A simple Gradio web UI demonstrating how to use the CogVideoX-2B model to generate videos.
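
The prompt conversion described for convert_demo can be pictured as a thin wrapper around any chat-style LLM. The sketch below is an illustration under stated assumptions, not the code in `inference/convert_demo.py`: `SYSTEM_PROMPT`, `expand_prompt`, and the `llm` callable are hypothetical, and `stub_llm` is an offline stand-in for GLM4 or another model.

```python
from typing import Callable

# Hypothetical instruction text; the real script ships its own prompt.
SYSTEM_PROMPT = (
    "Rewrite the user's short video idea as one long, detailed caption "
    "describing subjects, motion, setting, and lighting."
)


def expand_prompt(user_input: str, llm: Callable[[str, str], str]) -> str:
    """Turn a short user prompt into a long caption via a pluggable LLM.

    `llm` takes (system_prompt, user_text) and returns the model's reply.
    If the reply is empty, the original input is returned unchanged.
    """
    text = user_input.strip()
    if not text:
        raise ValueError("prompt must not be empty")
    reply = llm(SYSTEM_PROMPT, text).strip()
    return reply or text


def stub_llm(system_prompt: str, user_text: str) -> str:
    """Offline stand-in for a real LLM call, used only for demonstration."""
    return f"A detailed, cinematic shot of {user_text}, with smooth camera motion."


if __name__ == "__main__":
    print(expand_prompt("a panda playing guitar", stub_llm))
```

Keeping the model behind a plain callable is one way to reflect the note that GLM4 can be swapped for another LLM: only the callable changes, not the calling code.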

@@ -96,9 +112,7 @@ of the **CogVideoX** open-source model.

### sat

+ [sat_demo](sat/README.md): Contains the inference and fine-tuning code for SAT weights. It is
recommended as a base for improving on the CogVideoX model structure; researchers can use this code for
rapid prototyping and development.
+ [sat_demo](sat/README.md): Contains the inference and fine-tuning code for SAT weights. It is recommended as a base for improving on the CogVideoX model structure; researchers can use this code for rapid prototyping and development.

### Tools

README_zh.md (31 changes)

@@ -26,6 +26,37 @@
- 🌱 **Source**: ```2022/5/19```: We open-sourced the CogVideo video generation model (now available on the `CogVideo` branch), the first open-source large Transformer-based text-to-video generation model. See the [ICLR'23 paper](https://arxiv.org/abs/2205.15868) for technical details.
**More powerful models with larger parameter sizes are on the way~ Stay tuned!**

## Quick Start

### SAT

Follow the instructions in [sat_demo](sat/README.md), which contains the inference and fine-tuning code for SAT weights. It is recommended as a base for improving on the CogVideoX model structure; researchers can use this code for rapid prototyping and development.
(18 GB for inference, 40 GB for LoRA fine-tuning)

### Diffusers

```
pip install -r requirements.txt
```

Then follow [diffusers_demo](inference/cli_demo.py), which explains the inference code in more detail, including the significance of common parameters.
(36 GB for inference; memory optimizations and fine-tuning code are under development)

## Quick Start

### SAT

See [sat_demo](sat/README.md) in the sat folder: it contains the inference and fine-tuning code for SAT weights. It is recommended as a base for improving on the CogVideoX model structure; researchers can use this code for rapid prototyping and development.
(18 GB for inference, 40 GB for LoRA fine-tuning)

### Diffusers

```
pip install -r requirements.txt
```

See [diffusers_demo](inference/cli_demo.py): it explains the inference code in more detail, including the key parameters. (36 GB for inference; memory optimizations and fine-tuning code are under development)

## CogVideoX-2B Gallery

<div align="center">

@@ -1,4 +1,5 @@
git+https://github.com/huggingface/diffusers.git@d1c575ad7ee0390c2735f50cc59a79aae666567a#egg=diffusers
SwissArmyTransformer
torch==2.4.0
torchvision==0.19.0
streamlit==1.37.0
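
Because several of these requirements are pinned to exact versions, it can help to confirm that an existing environment matches them before running the demos. The following is a minimal sketch using only the standard library; `PINS`, `installed_version`, and `check_pins` are hypothetical helpers, and ignoring local version tags such as `+cu121` is an assumption about what counts as a match.

```python
from importlib import metadata
from typing import Callable, Optional

# Exact pins copied from the requirements shown above.
PINS = {"torch": "2.4.0", "torchvision": "0.19.0", "streamlit": "1.37.0"}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(
    pins: dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> dict[str, str]:
    """Map each pinned package to 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found is None:
            report[name] = "missing"
        elif found.split("+")[0] == wanted:  # ignore local tags such as +cu121
            report[name] = "ok"
        else:
            report[name] = f"mismatch (found {found})"
    return report


if __name__ == "__main__":
    for package, status in check_pins(PINS).items():
        print(f"{package}: {status}")
```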

@@ -1,7 +1,6 @@
# SAT CogVideoX-2B

This folder contains the inference code using [SAT](https://github.com/THUDM/SwissArmyTransformer) weights and the
fine-tuning code for SAT weights.
This folder contains the inference code using [SAT](https://github.com/THUDM/SwissArmyTransformer) weights and the fine-tuning code for SAT weights.

This code is the framework used by the team to train the model. It has few comments and requires careful study.

@@ -100,6 +99,14 @@ bash inference.sh

## Fine-Tuning the Model

### Preparing the Environment

```
git clone https://github.com/THUDM/SwissArmyTransformer.git
cd SwissArmyTransformer
pip install -e .
```
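
After the editable install, a quick import check can confirm that the package is visible to the interpreter that will run fine-tuning. This is a small sketch, assuming the SwissArmyTransformer package exposes a top-level module named `sat`; `is_importable` is a hypothetical helper.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: SwissArmyTransformer installs a top-level module named `sat`.
    print("sat importable:", is_importable("sat"))
```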

### Preparing the Dataset

The dataset format should be as follows:

@@ -145,6 +152,8 @@ the `configs/cogvideox_2b_sft.yaml` (for full fine-tuning) as follows.
valid_data: [ "your val data path" ] # Training and validation sets can be the same
split: 1,0,0 # Ratio of training, validation, and test sets
num_workers: 8 # Number of worker threads for data loading
force_train: True # Allow missing keys when loading the checkpoint (T5 and VAE are loaded independently)
only_log_video_latents: True # Skip the VAE decoder during evaluation to save memory
```
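
Two of the options above lend themselves to a small illustration. The sketch below is not SAT's implementation: `parse_split` shows one plausible reading of a ratio string such as `1,0,0`, and `load_weights` shows the general idea behind tolerating missing keys, as the `force_train` comment describes. Both function names and the plain-dict checkpoint are hypothetical.

```python
def parse_split(spec: str) -> list[float]:
    """Normalize a 'train,valid,test' ratio string into fractions summing to 1."""
    parts = [float(p) for p in spec.split(",")]
    total = sum(parts)
    if total <= 0 or any(p < 0 for p in parts):
        raise ValueError(f"invalid split ratio: {spec!r}")
    return [p / total for p in parts]


def load_weights(model_keys, checkpoint, allow_missing=False):
    """Pick the checkpoint entries a model expects.

    Returns (loaded, missing). With allow_missing=False, any expected key that
    is absent from the checkpoint raises KeyError; with allow_missing=True the
    absent keys are only reported, leaving them to be filled in elsewhere.
    """
    missing = [key for key in model_keys if key not in checkpoint]
    if missing and not allow_missing:
        raise KeyError(f"missing keys in checkpoint: {missing}")
    loaded = {key: checkpoint[key] for key in model_keys if key in checkpoint}
    return loaded, missing


if __name__ == "__main__":
    print(parse_split("1,0,0"))
    # Hypothetical key names, chosen only to mimic separately loaded components.
    expected = ["transformer.w", "t5.w", "vae.w"]
    print(load_weights(expected, {"transformer.w": 0.5}, allow_missing=True))
```

With `split: 1,0,0`, this reading assigns all data to training, which is consistent with the comment that the training and validation sets can be the same.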

If you wish to use LoRA fine-tuning, you also need to modify:

@@ -99,6 +99,14 @@ bash inference.sh

## Fine-Tuning the Model

### Preparing the Environment

```
git clone https://github.com/THUDM/SwissArmyTransformer.git
cd SwissArmyTransformer
pip install -e .
```

### Preparing the Dataset

The dataset format should be as follows:

@@ -143,6 +151,8 @@ used by the Encoder.
valid_data: [ "your val data path" ] # Training and validation sets can be the same
split: 1,0,0 # Ratio of training, validation, and test sets
num_workers: 8 # Number of worker threads for the data loader
force_train: True # Allow missing keys when loading the checkpoint (T5 and VAE are loaded separately)
only_log_video_latents: True # Avoid the memory overhead of VAE decoding
```

If you want to use LoRA fine-tuning, you also need to modify: