From 00054de87f0510ef040a538ee512fd5d2cd92ab3 Mon Sep 17 00:00:00 2001
From: zR <2448370773@qq.com>
Date: Fri, 9 Aug 2024 19:48:47 +0800
Subject: [PATCH 1/3] update community framework
---
README.md | 21 ++++++++++++++++-----
README_zh.md | 37 ++++++++++++++++++++++---------------
2 files changed, 38 insertions(+), 20 deletions(-)
diff --git a/README.md b/README.md
index 0801638..49c8256 100644
--- a/README.md
+++ b/README.md
@@ -20,9 +20,11 @@
## Update and News
-- 🔥 **News**: ```2024/8/7```: CogVideoX has been integrated into `diffusers` version 0.30.0. Inference can now be performed
+- 🔥 **News**: ```2024/8/7```: CogVideoX has been integrated into `diffusers` version 0.30.0. Inference can now be
+ performed
on a single 3090 GPU. For more details, please refer to the [code](inference/cli_demo.py).
-- 🔥 **News**: ```2024/8/6```: We have also open-sourced **3D Causal VAE** used in **CogVideoX-2B**, which can reconstruct
+- 🔥 **News**: ```2024/8/6```: We have also open-sourced **3D Causal VAE** used in **CogVideoX-2B**, which can
+ reconstruct
the video almost losslessly.
- 🔥 **News**: ```2024/8/6```: We have open-sourced **CogVideoX-2B**，the first model in the CogVideoX series of video
generation models.
@@ -54,9 +56,9 @@ Jump to a specific section:
### Prompt Optimization
-Before running the model, please refer to [this guide](inference/convert_demo.py) to see how we use the GLM-4 model to
-optimize the prompt. This is crucial because the model is trained with long prompts, and a good prompt directly affects
-the quality of the generated video.
+Before running the model, please refer to [this guide](inference/convert_demo.py) to see how we use large models like
+GLM-4 (or other comparable products, such as GPT-4) to optimize the prompt. This is crucial because the model is trained
+with long prompts, and a good prompt directly affects the quality of the generated video.
### SAT
@@ -123,6 +125,15 @@ along with related basic information:
| Download Link (HF diffusers Model) | 🤗 [Huggingface](https://huggingface.co/THUDM/CogVideoX-2B) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/CogVideoX-2b) [💫 WiseModel](https://wisemodel.cn/models/ZhipuAI/CogVideoX-2b) |
| Download Link (SAT Model) | [SAT](./sat/README.md) |
+## Friendly Links
+
+We warmly welcome contributions from the community and actively contribute to the open-source community ourselves. The
+following projects have already been adapted for CogVideoX, and we invite everyone to try them:
+
++ [Xorbits Inference](https://github.com/xorbitsai/inference): A powerful and comprehensive distributed inference
+ framework, allowing you to easily deploy your own models or the latest cutting-edge open-source models with just one
+ click.
+
## Project Structure
This open-source repository will guide developers to quickly get started with the basic usage and fine-tuning examples
diff --git a/README_zh.md b/README_zh.md
index 5bd0e7e..4525f51 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -21,7 +21,8 @@
## 项目更新
-- 🔥 **News**: ```2024/8/7```: CogVideoX 已经合并入 `diffusers` 0.30.0版本，单张3090可以推理，详情请见[代码](inference/cli_demo.py)。
+- 🔥 **News**: ```2024/8/7```: CogVideoX 已经合并入 `diffusers`
+ 0.30.0版本，单张3090可以推理，详情请见[代码](inference/cli_demo.py)。
- 🔥 **News**: ```2024/8/6```: 我们开源 **3D Causal VAE**，用于 **CogVideoX-2B**，可以几乎无损地重构视频。
- 🔥 **News**: ```2024/8/6```: 我们开源 CogVideoX 系列视频生成模型的第一个模型, **CogVideoX-2B**。
- 🌱 **Source**: ```2022/5/19```: 我们开源了 CogVideo 视频生成模型（现在你可以在 `CogVideo` 分支中看到），这是首个开源的基于
@@ -50,8 +51,8 @@
### 提示词优化
-在开始运行模型之前，请参考[这里](inference/convert_demo.py) 来看我们是怎么使用GLM-4大模型对模型进行优化的，这很重要，
-由于模型是在长提示词下训练的，一额好的直接影响了视频生成的质量。
+在开始运行模型之前，请参考[这里](inference/convert_demo.py) 来看我们是怎么使用GLM-4(或者同级别的其他产品，例如GPT-4)大模型对提示词进行优化的，这很重要，
+由于模型是在长提示词下训练的，一个好的提示词直接影响了视频生成的质量。
### SAT
@@ -95,19 +96,25 @@ CogVideoX是 [清影](https://chatglm.cn/video?fr=osm_cogvideox) 同源的开源
下表展示目前我们提供的视频生成模型列表，以及相关基础信息:
-| 模型名 | CogVideoX-2B |
-|---------------------|-------------------------------------------------------------------------------------------------------------------------------|
-| 提示词语言 | English |
-| 单GPU推理 (FP-16) 显存消耗 | 18GB using [SAT](https://github.com/THUDM/SwissArmyTransformer) <br> 23.9GB using diffusers |
-| 多GPU推理 (FP-16) 显存消耗 | 20GB minimum per GPU using diffusers |
-| 微调显存消耗 (bs=1) | 42GB |
-| 提示词长度上限 | 226 Tokens |
-| 视频长度 | 6 seconds |
-| 帧率（每秒） | 8 frames |
-| 视频分辨率 | 720 * 480 |
-| 量化推理 | 不支持 |
+| 模型名 | CogVideoX-2B |
+|---------------------|---------------------------------------------------------------------------------------------------------------------------------|
+| 提示词语言 | English |
+| 单GPU推理 (FP-16) 显存消耗 | 18GB using [SAT](https://github.com/THUDM/SwissArmyTransformer) <br> 23.9GB using diffusers |
+| 多GPU推理 (FP-16) 显存消耗 | 20GB minimum per GPU using diffusers |
+| 微调显存消耗 (bs=1) | 42GB |
+| 提示词长度上限 | 226 Tokens |
+| 视频长度 | 6 seconds |
+| 帧率（每秒） | 8 frames |
+| 视频分辨率 | 720 * 480 |
+| 量化推理 | 不支持 |
| 下载地址 (Diffusers 模型) | 🤗 [Huggingface](https://huggingface.co/THUDM/CogVideoX-2B) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/CogVideoX-2b) |
-| 下载地址 (SAT 模型) | [SAT](./sat/README_zh.md) |
+| 下载地址 (SAT 模型) | [SAT](./sat/README_zh.md) |
+
+## 友情链接
+
+我们非常欢迎来自社区的贡献，并积极的贡献开源社区。以下作品已经对CogVideoX进行了适配，欢迎大家使用:
+
++ [Xorbits Inference](https://github.com/xorbitsai/inference): 性能强大且功能全面的分布式推理框架，轻松一键部署你自己的模型或内置的前沿开源模型。
## 完整项目代码结构
From 032180bb734828d620630385a7f3e2e8689ce84a Mon Sep 17 00:00:00 2001
From: zR <2448370773@qq.com>
Date: Fri, 9 Aug 2024 20:36:17 +0800
Subject: [PATCH 2/3] update vae demo
---
README_ja.md | 66 ++++++++++++++++++-----------
inference/cli_demo.py | 18 ++++----
inference/encoded.pt | Bin 0 -> 2247580 bytes
inference/gradio_web_demo.py | 79 +++++++++++++++++++----------------
sat/README_ja.md | 17 +++++---
sat/README_zh.md | 1 -
6 files changed, 105 insertions(+), 76 deletions(-)
create mode 100644 inference/encoded.pt
diff --git a/README_ja.md b/README_ja.md
index f697e48..de7dabf 100644
--- a/README_ja.md
+++ b/README_ja.md
@@ -21,10 +21,13 @@
## 更新とニュース
-- 🔥 **ニュース**: ```2024/8/7```: CogVideoX は `diffusers` バージョン 0.30.0 に統合されました。単一の 3090 GPU で推論を実行できます。詳細については [コード](inference/cli_demo.py) を参照してください。
+- 🔥 **ニュース**: ```2024/8/7```: CogVideoX は `diffusers` バージョン 0.30.0 に統合されました。単一の 3090 GPU
+ で推論を実行できます。詳細については [コード](inference/cli_demo.py) を参照してください。
- 🔥 **ニュース**: ```2024/8/6```: **CogVideoX-2B** で使用される **3D Causal VAE** もオープンソース化しました。これにより、ビデオをほぼ無損失で再構築できます。
- 🔥 **ニュース**: ```2024/8/6```: **CogVideoX-2B**、CogVideoXシリーズのビデオ生成モデルの最初のモデルをオープンソース化しました。
-- 🌱 **ソース**: ```2022/5/19```: **CogVideo** (現在 `CogVideo` ブランチで確認できます) をオープンソース化しました。これは、最初のオープンソースの事前学習済みテキストからビデオ生成モデルであり、技術的な詳細については [ICLR'23 CogVideo 論文](https://arxiv.org/abs/2205.15868) をご覧ください。
+- 🌱 **ソース**: ```2022/5/19```: **CogVideo** (現在 `CogVideo` ブランチで確認できます)
+ をオープンソース化しました。これは、最初のオープンソースの事前学習済みテキストからビデオ生成モデルであり、技術的な詳細については [ICLR'23 CogVideo 論文](https://arxiv.org/abs/2205.15868)
+ をご覧ください。
**より強力なモデルが、より大きなパラメータサイズで登場予定です。お楽しみに！**
@@ -50,11 +53,13 @@
### プロンプトの最適化
-モデルを実行する前に、[このガイド](inference/convert_demo.py) を参照して、GLM-4 モデルを使用してプロンプトを最適化する方法を確認してください。これは重要です。モデルは長いプロンプトでトレーニングされているため、良いプロンプトは生成されるビデオの品質に直接影響します。
+モデルを実行する前に、[こちら](inference/convert_demo.py)
+を参考にして、GLM-4（または同等の製品、例えばGPT-4）の大規模モデルを使用してどのようにプロンプトを最適化するかをご確認ください。これは非常に重要です。モデルは長いプロンプトでトレーニングされているため、良いプロンプトがビデオ生成の品質に直接影響を与えます。
### SAT
-[sat_demo](sat/README.md) の指示に従ってください: SATウェイトの推論コードと微調整コードが含まれています。CogVideoXモデル構造に基づいて改善することをお勧めします。革新的な研究者は、このコードを使用して迅速なスタッキングと開発を行うことができます。
+[sat_demo](sat/README.md) の指示に従ってください:
+SATウェイトの推論コードと微調整コードが含まれています。CogVideoXモデル構造に基づいて改善することをお勧めします。革新的な研究者は、このコードを使用して迅速なスタッキングと開発を行うことができます。
(推論には18GB、lora微調整には40GBが必要です)
### Diffusers
@@ -94,19 +99,26 @@ CogVideoXは、[清影](https://chatglm.cn/video?fr=osm_cogvideox) と同源の
以下の表は、現在提供しているビデオ生成モデルのリストと関連する基本情報を示しています:
-| モデル名 | CogVideoX-2B |
-|-------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| プロンプト言語 | 英語 |
-| 単一GPU推論 (FP16) | 18GB using [SAT](https://github.com/THUDM/SwissArmyTransformer) <br> 23.9GB using diffusers |
+| モデル名 | CogVideoX-2B |
+|------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| プロンプト言語 | 英語 |
+| 単一GPU推論 (FP16) | 18GB using [SAT](https://github.com/THUDM/SwissArmyTransformer) <br> 23.9GB using diffusers |
| 複数GPU推論 (FP16) | 20GB minimum per GPU using diffusers |
-| 微調整に必要なGPUメモリ(bs=1) | 40GB |
-| プロンプトの最大長 | 226 トークン |
-| ビデオの長さ | 6秒 |
-| フレームレート | 8フレーム |
-| 解像度 | 720 * 480 |
-| 量子化推論 | サポートされていません |
-| ダウンロードリンク (HF diffusers モデル) | 🤗 [Huggingface](https://huggingface.co/THUDM/CogVideoX-2B) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/CogVideoX-2b) [💫 WiseModel](https://wisemodel.cn/models/ZhipuAI/CogVideoX-2b) |
-| ダウンロードリンク (SAT モデル) | [SAT](./sat/README.md) |
+| 微調整に必要なGPUメモリ(bs=1) | 40GB |
+| プロンプトの最大長 | 226 トークン |
+| ビデオの長さ | 6秒 |
+| フレームレート | 8フレーム |
+| 解像度 | 720 * 480 |
+| 量子化推論 | サポートされていません |
+| ダウンロードリンク (HF diffusers モデル) | 🤗 [Huggingface](https://huggingface.co/THUDM/CogVideoX-2B) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/CogVideoX-2b) [💫 WiseModel](https://wisemodel.cn/models/ZhipuAI/CogVideoX-2b) |
+| ダウンロードリンク (SAT モデル) | [SAT](./sat/README.md) |
+
+## 友好的リンク
+
+コミュニティからの貢献を大歓迎し、私たちもオープンソースコミュニティに積極的に貢献しています。以下の作品はすでにCogVideoXに対応しており、ぜひご利用ください：
+
++ [Xorbits Inference](https://github.com/xorbitsai/inference):
+ 強力で包括的な分散推論フレームワークであり、ワンクリックで独自のモデルや最新のオープンソースモデルを簡単にデプロイできます。
## プロジェクト構造
@@ -116,14 +128,17 @@ CogVideoXは、[清影](https://chatglm.cn/video?fr=osm_cogvideox) と同源の
+ [diffusers_demo](inference/cli_demo.py): 推論コードの詳細な説明が含まれており、一般的なパラメータの意味についても言及しています。
+ [diffusers_vae_demo](inference/cli_vae_demo.py): VAE推論コードの実行には現在71GBのメモリが必要ですが、将来的には最適化される予定です。
-+ [convert_demo](inference/convert_demo.py): ユーザー入力をCogVideoXに適した形式に変換する方法。CogVideoXは長いキャプションでトレーニングされているため、入力テキストをLLMを使用してトレーニング分布と一致させる必要があります。デフォルトではGLM4を使用しますが、GPT、Geminiなどの他のLLMに置き換えることもできます。
-+ [gradio_web_demo](inference/gradio_web_demo.py): CogVideoX-2Bモデルを使用してビデオを生成する方法を示すシンプルなgradio Web UI。
++ [convert_demo](inference/convert_demo.py):
+ ユーザー入力をCogVideoXに適した形式に変換する方法。CogVideoXは長いキャプションでトレーニングされているため、入力テキストをLLMを使用してトレーニング分布と一致させる必要があります。デフォルトではGLM4を使用しますが、GPT、Geminiなどの他のLLMに置き換えることもできます。
++ [gradio_web_demo](inference/gradio_web_demo.py): CogVideoX-2Bモデルを使用してビデオを生成する方法を示すシンプルなgradio
+ Web UI。