ADD PAPER

zR 2024-08-06 02:41:08 +08:00
parent 2da594c831
commit ad855f622c
5 changed files with 46 additions and 41 deletions

@ -4,11 +4,13 @@
<div align="center">
<img src=resources/logo.svg width="50%"/>
</div>
<p align="center">
🤗 Try it on the <a href="https://huggingface.co/spaces/THUDM/CogVideoX" target="_blank">CogVideoX Huggingface Space</a>
</p>
</div>
<p align="center">
📚 Read the <a href="resources/CogVideoX.pdf" target="_blank">paper</a>
</p>
<p align="center">
👋 Join our <a href="resources/WECHAT.md" target="_blank">WeChat</a> and <a href="https://discord.gg/Ewaabk6s" target="_blank">Discord</a>
</p>
@ -55,18 +57,18 @@ to [清影](https://chatglm.cn/video).
The table below shows the list of video generation models we currently provide,
along with related basic information:
| Model Name | CogVideoX-2B |
|-------------------------------------------|--------------------------------------------------------------|
| Prompt Language | English |
| GPU Memory Required for Inference (FP16) | 36GB (will be optimized before the PR is merged) |
| GPU Memory Required for Fine-tuning (bs=1) | 46.2GB |
| Prompt Max Length | 226 Tokens |
| Video Length | 6 seconds |
| Frames Per Second | 8 frames |
| Resolution | 720 * 480 |
| Quantized Inference | Not Supported |
| Multi-card Inference | Not Supported |
| Download Link | 🤗 [CogVideoX-2B](https://huggingface.co/THUDM/CogVideoX-2B) |
| Model Name | CogVideoX-2B |
|-------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------|
| Prompt Language | English |
| GPU Memory Required for Inference (FP16) | 36GB using diffusers (will be optimized before the PR is merged) and 25GB using [SAT](https://github.com/THUDM/SwissArmyTransformer) |
| GPU Memory Required for Fine-tuning (bs=1) | 42GB |
| Prompt Max Length | 226 Tokens |
| Video Length | 6 seconds |
| Frames Per Second | 8 frames |
| Resolution | 720 * 480 |
| Quantized Inference | Not Supported |
| Multi-card Inference | Not Supported |
| Download Link | 🤗 [CogVideoX-2B](https://huggingface.co/THUDM/CogVideoX-2B) |
## Project Structure
@ -89,7 +91,7 @@ of the **CogVideoX** open-source model.
### sat
+ [sat_demo](sat/configs/README_zh.md): Contains the inference and fine-tuning code for SAT weights. It is
+ [sat_demo](sat/README.md): Contains the inference and fine-tuning code for SAT weights. It is
recommended as a starting point for improvements to the CogVideoX model structure. Researchers can use this code for
rapid prototyping and development.

@ -5,11 +5,13 @@
<div align="center">
<img src=resources/logo.svg width="50%"/>
</div>
<p align="center">
🤗 Try the video generation model on the <a href="https://huggingface.co/spaces/THUDM/CogVideoX" target="_blank">CogVideoX Huggingface Space</a>
</p>
</div>
<p align="center">
📚 Read the <a href="resources/CogVideoX.pdf" target="_blank">paper</a>
</p>
<p align="center">
👋 Join our <a href="resources/WECHAT.md" target="_blank">WeChat</a> and <a href="https://discord.gg/Ewaabk6s" target="_blank">Discord</a>
</p>
@ -52,18 +54,18 @@ CogVideoX是 [清影](https://chatglm.cn/video) 同源的开源版本视频生
The table below lists the video generation models we currently provide, along with related basic information:
| Model Name | CogVideoX-2B |
|----------------|--------------------------------------------------------------|
| Prompt Language | English |
| GPU Memory Required for Inference (FP16) | 36GB |
| GPU Memory Required for Fine-tuning (bs=1) | 46.2GB |
| Prompt Max Length | 226 Tokens |
| Video Length | 6 seconds |
| Frames Per Second | 8 frames |
| Resolution | 720 * 480 |
| Quantized Inference | Not Supported |
| Multi-card Inference | Not Supported |
| Download Link | 🤗 [CogVideoX-2B](https://huggingface.co/THUDM/CogVideoX-2B) |
| Model Name | CogVideoX-2B |
|----------------|-------------------------------------------------------------------------------------------------------------------------------------|
| Prompt Language | English |
| GPU Memory Required for Inference (FP16) | 36GB using diffusers (will be optimized before the PR is merged) and 25GB using [SAT](https://github.com/THUDM/SwissArmyTransformer) |
| GPU Memory Required for Fine-tuning (bs=1) | 42GB |
| Prompt Max Length | 226 Tokens |
| Video Length | 6 seconds |
| Frames Per Second | 8 frames |
| Resolution | 720 * 480 |
| Quantized Inference | Not Supported |
| Multi-card Inference | Not Supported |
| Download Link | 🤗 [CogVideoX-2B](https://huggingface.co/THUDM/CogVideoX-2B) |
## Project Structure
@ -77,12 +79,12 @@ CogVideoX是 [清影](https://chatglm.cn/video) 同源的开源版本视频生
+ [web_demo](inference/web_demo.py): A simple Streamlit web application that demonstrates how to use the CogVideoX-2B model to generate videos.
<div style="text-align: center;">
<img src="resources/web_demo.png" style="width: 100%%; height: auto;" />
<img src="resources/web_demo.png" style="width: 100%; height: auto;" />
</div>
### sat
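+ [sat_demo](sat/configs/README_zh.md): Contains the inference and fine-tuning code for SAT weights. It is recommended as a starting point for
+ [sat_demo](sat/README_zh.md): Contains the inference and fine-tuning code for SAT weights. It is recommended as a starting point for
improvements to the CogVideoX model structure. Researchers can use this code for rapid prototyping and development.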
### tools

BIN
resources/CogVideoX.pdf Normal file

Binary file not shown.

@ -5,7 +5,7 @@ args:
batch_size: 1
input_type: txt
input_file: test.txt
sampling_num_frames: 13
sampling_num_frames: 13 # Must be 11, 13, or 19
sampling_fps: 8
fp16: True
output_dir: outputs/
@ -82,13 +82,13 @@ model:
target: sgm.modules.GeneralConditioner
params:
emb_models:
- is_trainable: false
input_key: txt
ucg_rate: 0.1
target: sgm.modules.encoders.modules.FrozenT5Embedder
params:
model_dir: "google/t5-v1_1-xxl"
max_length: 226
- is_trainable: false
input_key: txt
ucg_rate: 0.1
target: sgm.modules.encoders.modules.FrozenT5Embedder
params:
model_dir: "google/t5-v1_1-xxl"
max_length: 226
first_stage_config:
target: vae_modules.autoencoder.VideoAutoencoderInferenceWrapper

@ -177,12 +177,13 @@ def sampling_main(args, model_cls):
latent = 1.0 / model.scale_factor * samples_z
recons = []
for i in range(6):
loop_num = (T - 1) // 2
for i in range(loop_num):
if i == 0:
start_frame, end_frame = 0, 3
else:
start_frame, end_frame = i * 2 + 1, i * 2 + 3
if i == 5:
if i == loop_num - 1:
clear_fake_cp_cache = True
else:
clear_fake_cp_cache = False
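
The hunk above replaces the hard-coded six iterations with `loop_num = (T - 1) // 2`. The following is a minimal sketch of the resulting window arithmetic only, assuming `T` is the number of latent frames being decoded. The helper name `chunk_windows` is hypothetical and not part of the repository, and the actual decoding call is omitted.

```python
def chunk_windows(T):
    """Return (start_frame, end_frame, clear_fake_cp_cache) for each decode step.

    Hypothetical helper: mirrors only the index arithmetic of the loop in the
    diff, without calling the model.
    """
    loop_num = (T - 1) // 2
    windows = []
    for i in range(loop_num):
        if i == 0:
            # The first window covers three frames: 0, 1, 2.
            start_frame, end_frame = 0, 3
        else:
            # Every later window covers two frames.
            start_frame, end_frame = i * 2 + 1, i * 2 + 3
        # The cache is cleared only on the final iteration.
        clear_fake_cp_cache = i == loop_num - 1
        windows.append((start_frame, end_frame, clear_fake_cp_cache))
    return windows


# The three frame counts named in the config comment.
for T in (11, 13, 19):
    windows = chunk_windows(T)
    covered = [f for start, end, _ in windows for f in range(start, end)]
    # Prints T, the number of windows, and whether frames 0..T-1 are each covered once.
    print(T, len(windows), covered == list(range(T)))
```

With `T = 13` this yields six windows, matching the previous `range(6)`. Under the same assumption, the windows cover `2 * loop_num + 1` frames in total, so an even `T` would leave the last frame outside every window.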