mirror of
https://github.com/THUDM/CogVideo.git
synced 2025-04-05 11:28:37 +08:00
ADD PAPER

parent 2da594c831
commit ad855f622c

README.md (32 changes)
@@ -4,11 +4,13 @@
 <div align="center">
 <img src=resources/logo.svg width="50%"/>
 </div>
 <p align="center">

 🤗 Experience on <a href="https://huggingface.co/spaces/THUDM/CogVideoX" target="_blank">CogVideoX Huggingface Space</a>
 </p>
 </div>
+<p align="center">
+📚 Check here to view <a href="resources/CogVideoX.pdf" target="_blank">Paper</a>
+</p>
 <p align="center">
 👋 Join our <a href="resources/WECHAT.md" target="_blank">WeChat</a> and <a href="https://discord.gg/Ewaabk6s" target="_blank">Discord</a>
 </p>
@@ -55,18 +57,18 @@ to [清影](https://chatglm.cn/video).
 The table below shows the list of video generation models we currently provide,
 along with related basic information:

 | Model Name                                | CogVideoX-2B                                                 |
 |-------------------------------------------|--------------------------------------------------------------|
 | Prompt Language                           | English                                                      |
-| GPU Memory Required for Inference (FP16)  | 36GB (will be optimized before the PR is merged)             |
-| GPU Memory Required for Fine-tuning(bs=1) | 46.2GB                                                       |
+| GPU Memory Required for Inference (FP16)  | 36GB using diffusers (will be optimized before the PR is merged) and 25G using [SAT](https://github.com/THUDM/SwissArmyTransformer) |
+| GPU Memory Required for Fine-tuning(bs=1) | 42GB                                                         |
 | Prompt Max Length                         | 226 Tokens                                                   |
 | Video Length                              | 6 seconds                                                    |
 | Frames Per Second                         | 8 frames                                                     |
 | Resolution                                | 720 * 480                                                    |
 | Quantized Inference                       | Not Supported                                                |
 | Multi-card Inference                      | Not Supported                                                |
 | Download Link                             | 🤗 [CogVideoX-2B](https://huggingface.co/THUDM/CogVideoX-2B) |
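The context rows of the table above pin down the clip size the model produces: 6 seconds at 8 frames per second, 720 * 480 resolution. A quick arithmetic sketch (plain Python; no CogVideoX code is assumed):

```python
video_length_s = 6        # "Video Length" row
fps = 8                   # "Frames Per Second" row
width, height = 720, 480  # "Resolution" row

# Total frames in one generated clip and pixels per frame.
total_frames = video_length_s * fps
pixels_per_frame = width * height

print(total_frames)      # → 48
print(pixels_per_frame)  # → 345600
```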

 ## Project Structure

@@ -89,7 +91,7 @@ of the **CogVideoX** open-source model.
 ### sat

-+ [sat_demo](sat/configs/README_zh.md): Contains the inference code and fine-tuning code of SAT weights. It is
++ [sat_demo](sat/README.md): Contains the inference code and fine-tuning code of SAT weights. It is
   recommended to improve based on the CogVideoX model structure. Innovative researchers use this code to better perform
   rapid stacking and development.
README_zh.md (34 changes)
@@ -5,11 +5,13 @@
 <div align="center">
 <img src=resources/logo.svg width="50%"/>
 </div>
 <p align="center">

 🤗 Experience the video generation model on the <a href="https://huggingface.co/spaces/THUDM/CogVideoX" target="_blank">CogVideoX Huggingface Space</a>
 </p>
 </div>
+<p align="center">
+📚 View the <a href="resources/CogVideoX.pdf" target="_blank">Paper</a>
+</p>
 <p align="center">
 👋 Join our <a href="resources/WECHAT.md" target="_blank">WeChat</a> and <a href="https://discord.gg/Ewaabk6s" target="_blank">Discord</a>
 </p>
@@ -52,18 +54,18 @@ CogVideoX is the open-source video generation model of the same lineage as [清影](https://chatglm.cn/video).
 The table below shows the list of video generation models we currently provide, along with related basic information:

 | Model Name                    | CogVideoX-2B                                                 |
 |-------------------------------|--------------------------------------------------------------|
 | Prompt Language               | English                                                      |
-| Inference VRAM Usage (FP16)   | 36GB                                                         |
-| Fine-tuning VRAM Usage (bs=1) | 46.2GB                                                       |
+| Inference VRAM Usage (FP16)   | 36GB using diffusers (will be optimized before the PR is merged) and 25G using [SAT](https://github.com/THUDM/SwissArmyTransformer) |
+| Fine-tuning VRAM Usage (bs=1) | 42GB                                                         |
 | Max Prompt Length             | 226 Tokens                                                   |
 | Video Length                  | 6 seconds                                                    |
 | Frame Rate (per second)       | 8 frames                                                     |
 | Video Resolution              | 720 * 480                                                    |
 | Quantized Inference           | Not Supported                                                |
 | Multi-GPU Inference           | Not Supported                                                |
 | Download Link                 | 🤗 [CogVideoX-2B](https://huggingface.co/THUDM/CogVideoX-2B) |

 ## Project Structure

@@ -77,12 +79,12 @@ CogVideoX is the open-source video generation model of the same lineage as [清影](https://chatglm.cn/video).
 + [web_demo](inference/web_demo.py): A simple streamlit web application showing how to generate videos with the CogVideoX-2B model.

 <div style="text-align: center;">
-  <img src="resources/web_demo.png" style="width: 100%%; height: auto;" />
+  <img src="resources/web_demo.png" style="width: 100%; height: auto;" />
 </div>

 ### sat

-+ [sat_demo](sat/configs/README_zh.md): Contains the inference code and fine-tuning code for SAT weights. It is recommended to
++ [sat_demo](sat/README_zh.md): Contains the inference code and fine-tuning code for SAT weights. It is recommended to
   improve based on the CogVideoX model structure; innovative researchers can use this code to better perform rapid stacking and development.

 ### tools

BIN resources/CogVideoX.pdf (new file)
Binary file not shown.
@@ -5,7 +5,7 @@ args:
   batch_size: 1
   input_type: txt
   input_file: test.txt
-  sampling_num_frames: 13
+  sampling_num_frames: 13 # Must be 11,13 or 19
   sampling_fps: 8
   fp16: True
   output_dir: outputs/
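The comment added in this hunk restricts `sampling_num_frames` to 11, 13, or 19. A minimal sketch of enforcing that restriction before launching inference; `validate_sampling_config` is a hypothetical helper for illustration, not a function in the repository:

```python
# Allowed values per the config comment: "Must be 11,13 or 19".
ALLOWED_NUM_FRAMES = {11, 13, 19}

def validate_sampling_config(cfg: dict) -> dict:
    """Reject a sampling-config fragment whose frame count is unsupported."""
    n = cfg["sampling_num_frames"]
    if n not in ALLOWED_NUM_FRAMES:
        raise ValueError(
            f"sampling_num_frames must be one of {sorted(ALLOWED_NUM_FRAMES)}, got {n}"
        )
    return cfg

# Example: the values from the diff above pass validation.
cfg = validate_sampling_config({"sampling_num_frames": 13, "sampling_fps": 8, "fp16": True})
```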
@@ -82,13 +82,13 @@ model:
     target: sgm.modules.GeneralConditioner
     params:
       emb_models:
         - is_trainable: false
           input_key: txt
           ucg_rate: 0.1
           target: sgm.modules.encoders.modules.FrozenT5Embedder
           params:
             model_dir: "google/t5-v1_1-xxl"
             max_length: 226

     first_stage_config:
       target: vae_modules.autoencoder.VideoAutoencoderInferenceWrapper
@@ -177,12 +177,13 @@ def sampling_main(args, model_cls):
         latent = 1.0 / model.scale_factor * samples_z

         recons = []
-        for i in range(6):
+        loop_num = (T - 1) // 2
+        for i in range(loop_num):
             if i == 0:
                 start_frame, end_frame = 0, 3
             else:
                 start_frame, end_frame = i * 2 + 1, i * 2 + 3
-            if i == 5:
+            if i == loop_num - 1:
                 clear_fake_cp_cache = True
             else:
                 clear_fake_cp_cache = False
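The hunk above replaces the hardcoded `range(6)` with `loop_num = (T - 1) // 2`, so the decode windows adapt to the number of latent frames instead of assuming exactly 13. A standalone sketch reproducing the diff's window arithmetic; `decode_windows` is a hypothetical helper written for illustration, not a function in the repository:

```python
def decode_windows(T: int) -> list[tuple[int, int]]:
    """Return the (start_frame, end_frame) pairs produced by the patched loop."""
    windows = []
    loop_num = (T - 1) // 2
    for i in range(loop_num):
        if i == 0:
            start_frame, end_frame = 0, 3
        else:
            start_frame, end_frame = i * 2 + 1, i * 2 + 3
        windows.append((start_frame, end_frame))
    return windows

# For T = 13 latent frames (sampling_num_frames: 13), loop_num is 6,
# matching the old hardcoded range(6):
print(decode_windows(13))  # → [(0, 3), (3, 5), (5, 7), (7, 9), (9, 11), (11, 13)]
```

Note that the first window covers 3 frames while every later window covers the next 2, and `i == loop_num - 1` now marks the final iteration for the cache flush regardless of `T`.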