# CogVideo & CogVideoX

[Read this in English](./README_zh.md)

[中文阅读](./README_zh.md)
🤗 Try the CogVideoX-5B model online on Huggingface Space or 🤖 ModelScope Space.

📚 View the paper and the user guide.

👋 Join us on WeChat and Discord.

📍 Visit 清影 (QingYing) and the API platform to try larger-scale commercial video generation models.
## Updates and News

- 🔥🔥 **News**: ```2024/10/13```: [cogvideox-factory](https://github.com/a-r-r-o-w/cogvideox-factory), a lower-cost framework that can fine-tune `CogVideoX-5B` on a single 4090 GPU, has been released. It supports fine-tuning at multiple resolutions. Give it a try!
- 🔥 **News**: ```2024/10/10```: We have updated the technical report with more detailed training information and a demo. Click [here](https://arxiv.org/pdf/2408.06072) to read the report and [here](https://yzy-thu.github.io/CogVideoX-demo/) to view the demo.
- 🔥 **News**: ```2024/10/09```: The CogVideoX fine-tuning guide is now public in our [technical documentation](https://zhipu-ai.feishu.cn/wiki/DHCjw1TrJiTyeukfc9RceoSRnCh) on Feishu. To further increase flexibility in distribution, all examples in the public documentation are fully reproducible.
- 🔥 **News**: ```2024/9/19```: We have open-sourced **CogVideoX-5B-I2V**, the image-to-video model of the CogVideoX series. The model takes an image as a background input and combines it with a prompt to generate a video, which gives greater controllability. With this release, the CogVideoX series supports three tasks: text-to-video generation, video continuation, and image-to-video generation. You can [try it online](https://huggingface.co/spaces/THUDM/CogVideoX-5B-Space).
- 🔥🔥 **News**: ```2024/9/19```: We have open-sourced [CogVLM2-Caption](https://huggingface.co/THUDM/cogvlm2-llama3-caption), the caption model used during CogVideoX training to convert video data into text descriptions. Feel free to download and use it.
- 🔥 ```2024/8/27```: We have open-sourced **CogVideoX-5B**, a larger model in the CogVideoX series. Inference performance has been significantly optimized, which greatly lowers the barrier to running inference. You can run **CogVideoX-2B** on older GPUs such as the `GTX 1080TI`, and **CogVideoX-5B** on desktop GPUs such as the `RTX 3060`. Please follow the [requirements](requirements.txt) strictly when updating and installing dependencies, and see [cli_demo](inference/cli_demo.py) for the inference code. In addition, the open-source license of the **CogVideoX-2B** model has been changed to the **Apache 2.0 License**.
- 🔥 ```2024/8/6```: We have open-sourced the **3D Causal VAE** used by **CogVideoX-2B**, which can reconstruct videos almost losslessly.
- 🔥 ```2024/8/6```: We have open-sourced **CogVideoX-2B**, the first model in the CogVideoX series of video generation models.
- 🌱 **Source**: ```2022/5/19```: We have open-sourced the CogVideo video generation model (now available on the `CogVideo` branch). It is the first open-source large-scale Transformer-based text-to-video generation model. See the [ICLR'23 paper](https://arxiv.org/abs/2205.15868) for technical details.

**More powerful models with larger parameter sizes are on the way. Stay tuned!**

## Table of Contents

Jump to a specific section:

- [Quick Start](#quick-start)
    - [SAT](#sat)
    - [Diffusers](#Diffusers)
- [CogVideoX-2B Gallery](#CogVideoX-2B-Gallery)
- [Model Introduction](#model-introduction)
- [Project Structure](#project-structure)
    - [Inference](#inference)
    - [sat](#sat)
    - [Tools](#tools)
- [Project Plan](#project-plan)
- [Model License](#model-license)
- [CogVideo(ICLR'23) Model Introduction](#CogVideoICLR23)
- [Citation](#citation)

## Quick Start

### Prompt Optimization

Before running the model, please refer to [this script](inference/convert_demo.py) to see how a large language model such as GLM-4 (or a comparable product such as GPT-4) is used to optimize the prompt. This step is very important: the model was trained on long prompts, so the quality of the prompt directly affects the quality of the generated video.

### SAT

Follow the instructions in [sat_demo](sat/README.md), which contains the inference code and fine-tuning code for the SAT weights. We recommend building improvements on top of the CogVideoX model structure. Researchers can use this code for rapid prototyping and development.

### Diffusers

```
pip install -r requirements.txt
```

Next, see [diffusers_demo](inference/cli_demo.py), which explains the inference code in detail and describes the meaning of the common parameters.

For details on quantized inference, see [diffusers-torchao](https://github.com/sayakpaul/diffusers-torchao/). Combining Diffusers with TorchAO enables quantized inference, which makes inference more memory-efficient and, in some cases, faster when the model is compiled. A full list of memory and time benchmarks for various settings on A100 and H100 is published at [diffusers-torchao](https://github.com/sayakpaul/diffusers-torchao).

## Gallery

### CogVideoX-5B

| Model Name | CogVideoX-2B | CogVideoX-5B | CogVideoX-5B-I2V |
|---|---|---|---|
| Inference precision | FP16* (recommended), BF16, FP32, FP8*, INT8; INT4 is not supported | BF16 (recommended), FP16, FP32, FP8*, INT8; INT4 is not supported | BF16 (recommended), FP16, FP32, FP8*, INT8; INT4 is not supported |
| Single-GPU memory usage | SAT FP16: 18GB<br>diffusers FP16: from 4GB*<br>diffusers INT8 (torchao): from 3.6GB* | SAT BF16: 26GB<br>diffusers BF16: from 5GB*<br>diffusers INT8 (torchao): from 4.4GB* | SAT BF16: 26GB<br>diffusers BF16: from 5GB*<br>diffusers INT8 (torchao): from 4.4GB* |
| Multi-GPU memory usage | FP16: 10GB* using diffusers | BF16: 15GB* using diffusers | BF16: 15GB* using diffusers |
| Inference speed (steps = 50, FP/BF16) | Single A100: ~90 s<br>Single H100: ~45 s | Single A100: ~180 s<br>Single H100: ~90 s | Single A100: ~180 s<br>Single H100: ~90 s |
| Fine-tuning precision | FP16 | BF16 | BF16 |
| Fine-tuning memory usage | 47 GB (bs=1, LORA)<br>61 GB (bs=2, LORA)<br>62 GB (bs=1, SFT) | 63 GB (bs=1, LORA)<br>80 GB (bs=2, LORA)<br>75 GB (bs=1, SFT) | 78 GB (bs=1, LORA)<br>75 GB (bs=1, SFT, 16 GPUs) |
| Prompt language | English* | English* | English* |
| Maximum prompt length | 226 tokens | 226 tokens | 226 tokens |
| Video length | 6 seconds | 6 seconds | 6 seconds |
| Frame rate | 8 frames/second | 8 frames/second | 8 frames/second |
| Video resolution | 720 × 480; other resolutions are not supported (including for fine-tuning) | 720 × 480; other resolutions are not supported (including for fine-tuning) | 720 × 480; other resolutions are not supported (including for fine-tuning) |
| Positional encoding | 3d_sincos_pos_embed | 3d_rope_pos_embed | 3d_rope_pos_embed + learnable_pos_embed |
| Download link (Diffusers) | 🤗 HuggingFace<br>🤖 ModelScope<br>🟣 WiseModel | 🤗 HuggingFace<br>🤖 ModelScope<br>🟣 WiseModel | 🤗 HuggingFace<br>🤖 ModelScope<br>🟣 WiseModel |
| Download link (SAT) | SAT | SAT | SAT |
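
The recommended precision, step count, video length, and frame rate in the table can be collected into a small settings lookup before calling Diffusers. The following is a minimal sketch, not the project's official script (see `inference/cli_demo.py` for that). It assumes that `torch` and `diffusers` are installed, that the weights are published under the Hugging Face repository IDs `THUDM/CogVideoX-2b` and `THUDM/CogVideoX-5b`, and that 6 seconds at 8 frames/second corresponds to 49 frames (48 plus the initial frame), which matches the default `num_frames` of the diffusers CogVideoX pipeline. The helper names `settings_for` and `generate` are hypothetical.

```python
"""Sketch: derive CogVideoX inference settings from the table, then call diffusers."""

# Recommended precision per model, as listed in the table.
# The repository IDs are assumptions; check the download links before use.
MODELS = {
    "CogVideoX-2B": {"repo": "THUDM/CogVideoX-2b", "dtype": "float16"},
    "CogVideoX-5B": {"repo": "THUDM/CogVideoX-5b", "dtype": "bfloat16"},
}
FPS = 8            # frame rate from the table
VIDEO_SECONDS = 6  # video length from the table
NUM_STEPS = 50     # step count used for the inference speed figures


def settings_for(model_name: str) -> dict:
    """Return the inference settings for one of the models in the table."""
    if model_name not in MODELS:
        raise ValueError(f"unknown model: {model_name!r}")
    cfg = dict(MODELS[model_name])
    # Assumption: one initial frame plus FPS * seconds generated frames.
    cfg["num_frames"] = FPS * VIDEO_SECONDS + 1
    cfg["fps"] = FPS
    cfg["num_inference_steps"] = NUM_STEPS
    return cfg


def generate(model_name: str, prompt: str, out_path: str = "output.mp4") -> str:
    """Generate a video with diffusers; downloads the weights on first use."""
    # Heavy imports are kept local so the lookup above works without a GPU stack.
    import torch
    from diffusers import CogVideoXPipeline
    from diffusers.utils import export_to_video

    cfg = settings_for(model_name)
    pipe = CogVideoXPipeline.from_pretrained(
        cfg["repo"], torch_dtype=getattr(torch, cfg["dtype"])
    )
    # Memory-saving options: slower, but they lower peak GPU memory.
    pipe.enable_sequential_cpu_offload()
    pipe.vae.enable_tiling()

    frames = pipe(
        prompt=prompt,
        num_inference_steps=cfg["num_inference_steps"],
        num_frames=cfg["num_frames"],
    ).frames[0]
    return export_to_video(frames, out_path, fps=cfg["fps"])


if __name__ == "__main__":
    print(settings_for("CogVideoX-5B"))
```

Calling `generate("CogVideoX-5B", "<your long, detailed prompt>")` would write `output.mp4`. CPU offload and VAE tiling trade speed for memory, so timings with these options enabled will presumably be slower than the A100/H100 figures in the table.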