Mirror of https://github.com/THUDM/CogVideo.git, synced 2025-04-06 03:57:56 +08:00

Commit 6b2287b454 (parent 66702c6240). Changed files: README.md (103 lines), README_zh.md (56 lines)

- 🔥 **News**: ``2024/8/6``: We have open-sourced the **3D Causal VAE** used in **CogVideoX-2B**, which can reconstruct the video almost losslessly.
- 🔥 **News**: ``2024/8/6``: We have open-sourced **CogVideoX-2B**, the first model in the CogVideoX series of video generation models.
- 🌱 **Source**: ``2022/5/19``: We have open-sourced **CogVideo** (now available in the `CogVideo` branch), the **first** open-source pretrained text-to-video model; see the [ICLR'23 CogVideo Paper](https://arxiv.org/abs/2205.15868) for technical details.

**More powerful models with larger parameter sizes are on the way~ Stay tuned!**

## Table of Contents

Jump to a specific section:

- [Quick Start](#Quick-Start)
    - [SAT](#sat)
    - [Diffusers](#Diffusers)
- [CogVideoX-2B Video Works](#cogvideox-2b-gallery)
- [Introduction to the CogVideoX Model](#Model-Introduction)
- [Full Project Structure](#project-structure)
    - [Inference](#inference)
    - [SAT](#sat)
    - [Tools](#tools)
- [Introduction to CogVideo(ICLR'23) Model](#cogvideoiclr23)
- [Citations](#Citation)
- [Open Source Project Plan](#Open-Source-Project-Plan)
- [Model License](#Model-License)

## Quick Start

### SAT

Follow the instructions in [sat_demo](sat/README.md): it contains the inference and fine-tuning code for the SAT weights. Building on the CogVideoX model structure is recommended; researchers can use this code for rapid prototyping and development. (18 GB of GPU memory for inference, 40 GB for LoRA fine-tuning.)

### Diffusers

Install the dependencies:

```
pip install -r requirements.txt
```

Then follow [diffusers_demo](inference/cli_demo.py): a more detailed explanation of the inference code, covering the significance of common parameters. (36 GB of GPU memory for inference; lower-memory inference and fine-tuning code are under development.)
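
For orientation, a minimal text-to-video sketch with the `diffusers` pipeline might look like the following. The Hub model id, step count, and guidance scale are illustrative assumptions here; [diffusers_demo](inference/cli_demo.py) remains the maintained reference.

```python
# Minimal sketch, assuming a diffusers release that ships CogVideoXPipeline
# and that "THUDM/CogVideoX-2b" is the published Hub id (both assumptions).
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)
pipe.to("cuda")

prompt = "A panda playing a guitar in a bamboo forest, cinematic lighting."
# 50 steps and guidance_scale=6.0 are illustrative defaults, not tuned values.
frames = pipe(prompt=prompt, num_inference_steps=50, guidance_scale=6.0).frames[0]
export_to_video(frames, "output.mp4", fps=8)
```

Loading in float16 is what keeps the footprint in the range quoted above; see the demo script for what each parameter controls.
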
## CogVideoX-2B Gallery

## Full Project Structure

This open-source repository will guide developers to quickly get started with the basic usage and fine-tuning examples of the **CogVideoX** open-source model.

### Inference

+ [diffusers_demo](inference/cli_demo.py): A more detailed explanation of the inference code, covering the significance of common parameters.
+ [diffusers_vae_demo](inference/cli_vae_demo.py): Running the VAE inference code on its own currently requires 71 GB of memory; it will be optimized in the future.
+ [convert_demo](inference/convert_demo.py): How to convert user input into a format suitable for CogVideoX. Because CogVideoX is trained on long captions, the input text must be rewritten with an LLM to match the training distribution. The script uses GLM-4 by default, but any other LLM such as GPT or Gemini can be substituted; a minimal sketch follows this list.
+ [gradio_demo](gradio_demo.py): A simple Gradio web UI demonstrating how to use the CogVideoX-2B model to generate videos.

<div style="text-align: center;">
<img src="resources/gradio_demo.png" style="width: 100%; height: auto;" />
</div>

+ [web_demo](inference/web_demo.py): A simple Streamlit web application demonstrating how to use the CogVideoX-2B model to generate videos.

<div style="text-align: center;">
<img src="resources/web_demo.png" style="width: 100%; height: auto;" />
</div>

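As referenced in the convert_demo item above, the conversion step is essentially a single LLM call that expands a terse user prompt into a long caption. Below is a minimal sketch, assuming an OpenAI-compatible client and a hypothetical system prompt; the actual prompt and GLM-4 wiring live in [convert_demo](inference/convert_demo.py).

```python
# Illustrative sketch only; inference/convert_demo.py is the maintained script.
from openai import OpenAI  # assumes an OpenAI-compatible endpoint (GLM-4, GPT, ...)

client = OpenAI()  # reads api_key / base_url from the environment

# Hypothetical system prompt; the real one ships with convert_demo.
SYSTEM_PROMPT = (
    "Expand the user's short video description into a single long, detailed "
    "caption of the kind CogVideoX was trained on."
)

def convert_prompt(short_prompt: str) -> str:
    """Rewrite a terse prompt so it matches the training caption distribution."""
    response = client.chat.completions.create(
        model="glm-4",  # assumed model name; swap in any capable LLM
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": short_prompt},
        ],
    )
    return response.choices[0].message.content

print(convert_prompt("a cat chasing a laser pointer"))
```
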
### sat

+ [sat_demo](sat/README.md): Contains the inference and fine-tuning code for the SAT weights. Building on the CogVideoX model structure is recommended; researchers can use this code for rapid prototyping and development.

### Tools

This folder contains some tools for model conversion / caption generation, etc.

+ [convert_weight_sat2hf](tools/convert_weight_sat2hf.py): Converts SAT model weights to Huggingface model weights. A hedged sketch of the general pattern follows below.
+ [caption_demo](tools/caption): A caption tool: a model that understands videos and describes them in text.
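
For intuition, conversions like convert_weight_sat2hf usually reduce to loading the SAT checkpoint and renaming parameter keys into the Huggingface layout. A sketch under that assumption follows; the key names below are invented, and the real mapping is in [convert_weight_sat2hf](tools/convert_weight_sat2hf.py).

```python
# Illustrative pattern only; key names are hypothetical and the real mapping
# lives in tools/convert_weight_sat2hf.py.
import torch

KEY_MAP = {
    "mixins.patch_embed.proj.weight": "patch_embed.proj.weight",  # invented example pair
}

def convert_sat_to_hf(sat_ckpt_path: str, out_path: str) -> None:
    checkpoint = torch.load(sat_ckpt_path, map_location="cpu")
    # SAT checkpoints commonly nest weights under a "module" key (assumption).
    sat_state = checkpoint.get("module", checkpoint)
    hf_state = {KEY_MAP.get(k, k): v for k, v in sat_state.items()}
    torch.save(hf_state, out_path)

convert_sat_to_hf("cogvideox_sat.pt", "cogvideox_hf.bin")
```
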
## CogVideo(ICLR'23)

The official repo for the paper [CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers](https://arxiv.org/abs/2205.15868) is on the [CogVideo branch](https://github.com/THUDM/CogVideo/tree/CogVideo).

**CogVideo is able to generate relatively high-frame-rate videos.** A 4-second clip of 32 frames is shown below.

The demo for CogVideo is at [https://models.aminer.cn/cogvideo](https://models.aminer.cn/cogvideo/), where you can get hands-on experience with text-to-video generation. *The original input is in Chinese.*

## Citation

```
@article{hong2022cogvideo,
  title={CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers},
  author={Hong, Wenyi and Ding, Ming and Zheng, Wendi and Liu, Xinghan and Tang, Jie},
  journal={arXiv preprint arXiv:2205.15868},
  year={2022}
}
```

## Open Source Project Plan

- [x] Open source CogVideoX model
- [x] Open source 3D Causal VAE used in CogVideoX
- [x] CogVideoX model inference example (CLI / Web Demo)
- [x] CogVideoX online experience demo (Huggingface Space)
- [x] CogVideoX open source model API interface example (Huggingface)
- [x] CogVideoX model fine-tuning example (SAT)
- [ ] CogVideoX model fine-tuning example (Huggingface / SAT)
- [ ] Open source CogVideoX-Pro (adapted for the CogVideoX-2B suite)
- [x] Release CogVideoX technical report

We welcome your contributions. You can click [here](resources/contribute.md) for more information.

## Model License

The code in this repository is released under the [Apache 2.0 License](LICENSE).

The model weights and implementation code are released under the [CogVideoX LICENSE](MODEL_LICENSE).

---

README_zh.md (56 lines changed)


- 🌱 **Source**: ``2022/5/19``: We open-sourced the CogVideo video generation model (now available in the `CogVideo` branch), the first open-source large Transformer-based text-to-video model; see the [ICLR'23 paper](https://arxiv.org/abs/2205.15868) for technical details.

**More powerful models with larger parameter counts are on the way~ Stay tuned!**

## Table of Contents

Jump to a specific section:

- [Quick Start](#快速开始)
    - [SAT](#sat)
    - [Diffusers](#Diffusers)
- [CogVideoX-2B Video Works](#cogvideox-2b-视频作品)
- [Introduction to the CogVideoX Model](#模型介绍)
- [Full Project Code Structure](#完整项目代码结构)
    - [Inference](#inference)
    - [SAT](#sat)
    - [Tools](#tools)
- [Open Source Project Plan](#开源项目规划)
- [Model License](#模型协议)
- [Introduction to the CogVideo(ICLR'23) Model](#cogvideoiclr23)
- [Citation](#引用)

## Quick Start

CogVideoX is the open-source version of the video generation model that shares its origins with [QingYing (清影)](https://chatglm.cn/video?fr=osm_cogvideox).

| Download (Diffusers model) | 🤗 [Huggingface](https://huggingface.co/THUDM/CogVideoX-2B) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/CogVideoX-2b) |
| Download (SAT model) | [SAT](./sat/README_zh.md) |

## Full Project Code Structure

This open-source repository will guide developers to quickly get started with the basic usage and fine-tuning examples of the **CogVideoX** open-source model.

+ [convert_weight_sat2hf](tools/convert_weight_sat2hf.py): Converts SAT model weights to Huggingface model weights.
+ [caption_demo](tools/caption/README_zh.md): A caption tool: a model that understands videos and describes them in text.

## CogVideo(ICLR'23)

The official repo for the paper [CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers](https://arxiv.org/abs/2205.15868) is on the [CogVideo branch](https://github.com/THUDM/CogVideo/tree/CogVideo).

The demo for CogVideo is at [https://models.aminer.cn/cogvideo](https://models.aminer.cn/cogvideo), where you can get hands-on experience with text-to-video generation.

## Citation

```
@article{hong2022cogvideo,
  title={CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers},
  author={Hong, Wenyi and Ding, Ming and Zheng, Wendi and Liu, Xinghan and Tang, Jie},
  journal={arXiv preprint arXiv:2205.15868},
  year={2022}
}
```

## Open Source Project Plan

- [x] Open source CogVideoX model
- [x] CogVideoX model inference example (CLI / Web Demo)
- [x] CogVideoX online experience demo (Huggingface Space)
- [x] CogVideoX open source model API interface example (Huggingface)
- [x] CogVideoX model fine-tuning example (SAT)
- [ ] CogVideoX model fine-tuning example (Huggingface / SAT)
- [ ] Open source CogVideoX-Pro (adapted for the CogVideoX-2B suite)
- [x] Release CogVideoX technical report

We welcome your contributions. You can click [here](resources/contribute_zh.md) for more information.

## Model License

The code in this repository is released under the [Apache 2.0 License](LICENSE).

The model weights and implementation code are released under the [CogVideoX LICENSE](MODEL_LICENSE).