26 KiB
CogVideo & CogVideoX
ð€ Huggingface Space ãŸã㯠ð€ ModelScope Space 㧠CogVideoX-5B ã¢ãã«ããªã³ã©ã€ã³ã§äœéšããŠãã ãã
ð è«æãšäœ¿çšããã¥ã¡ã³ãã衚瀺ããŸãã
ð WeChat ãš Discord ã«åå
ð æž åœ± ãš APIãã©ãããã©ãŒã ã蚪åããŠããã倧èŠæš¡ãªåçšãããªçæã¢ãã«ãäœéš.
æŽæ°ãšãã¥ãŒã¹
- ð¥ð¥ ãã¥ãŒã¹:
2024/10/13
: ã³ã¹ãåæžã®ãããåäžã®4090 GPUã§CogVideoX-5B
ã埮調æŽã§ãããã¬ãŒã ã¯ãŒã¯ cogvideox-factory ããªãªãŒã¹ãããŸãããè€æ°ã®è§£å床ã§ã®åŸ®èª¿æŽã«å¯Ÿå¿ããŠããŸãããã²ãå©çšãã ããïŒ- ð¥ãã¥ãŒã¹:2024/10/10
: æè¡å ±åæžãæŽæ°ãããã詳现ãªãã¬ãŒãã³ã°æ å ±ãšãã¢ãè¿œå ããŸããã - ð¥ãã¥ãŒã¹:
2024/10/09
: é£æžã®æè¡ããã¥ã¡ã³ãã§CogVideoXã®åŸ®èª¿æŽã¬ã€ããå ¬éããŠããŸããåé ã®èªç±åºŠãããã«é«ãããããå ¬éãããŠããããã¥ã¡ã³ãå ã®ãã¹ãŠã®äŸãå®å šã«åçŸå¯èœã§ãã - ð¥ãã¥ãŒã¹:
2024/9/19
: CogVideoXã·ãªãŒãºã®ç»åçæãããªã¢ãã« CogVideoX-5B-I2V ããªãŒãã³ãœãŒã¹åããŸããããã®ã¢ãã«ã¯ãç»åãèæ¯å ¥åãšããŠäœ¿çšããããã³ããã¯ãŒããšçµã¿åãããŠãããªãçæããããšãã§ããããé«ãå¶åŸ¡æ§ãæäŸããŸããããã«ãããCogVideoXã·ãªãŒãºã®ã¢ãã«ã¯ãããã¹ããããããªçæããããªã®ç¶ç¶ãç»åãããããªçæã®3ã€ã®ã¿ã¹ã¯ããµããŒãããããã«ãªããŸããããªã³ã©ã€ã³ã§ã®äœéš ãã楜ãã¿ãã ããã - ð¥ð¥ ãã¥ãŒã¹:
2024/9/19
: CogVideoXã®ãã¬ãŒãã³ã°ããã»ã¹ã§ãããªããŒã¿ãããã¹ãèšè¿°ã«å€æããããã«äœ¿çšããããã£ãã·ã§ã³ã¢ãã« CogVLM2-Caption ããªãŒãã³ãœãŒã¹åããŸãããããŠã³ããŒãããŠãå©çšãã ããã - ð¥
2024/8/27
: CogVideoXã·ãªãŒãºã®ãã倧ããªã¢ãã« CogVideoX-5B ããªãŒãã³ãœãŒã¹åããŸãããã¢ãã«ã®æšè«æ§èœãå€§å¹ ã«æé©åããæšè«ã®ããŒãã«ãå€§å¹ ã«äžããŸãããGTX 1080TI
ãªã©ã®æ§åGPU㧠CogVideoX-2B ããRTX 3060
ãªã©ã®ãã¹ã¯ãããGPU㧠CogVideoX-5B ã¢ãã«ãå®è¡ã§ããŸããäŸåé¢ä¿ãæŽæ°ã»ã€ã³ã¹ããŒã«ããããã«ãèŠä»¶ ãå³å®ããæšè«ã³ãŒã㯠cli_demo ãåç §ããŠãã ãããããã«ãCogVideoX-2B ã¢ãã«ã®ãªãŒãã³ãœãŒã¹ã©ã€ã»ã³ã¹ã Apache 2.0 ã©ã€ã»ã³ã¹ ã«å€æŽãããŸããã - ð¥
2024/8/6
: CogVideoX-2B çšã® 3D Causal VAE ããªãŒãã³ãœãŒã¹åããŸãããããã«ããããããªãã»ãŒç¡æ倱ã§åæ§ç¯ããããšãã§ããŸãã - ð¥
2024/8/6
: CogVideoXã·ãªãŒãºã®ãããªçæã¢ãã«ã®æåã®ã¢ãã«ãCogVideoX-2B ããªãŒãã³ãœãŒã¹åããŸããã - ð± ãœãŒã¹:
2022/5/19
: CogVideoãããªçæã¢ãã«ããªãŒãã³ãœãŒã¹åããŸããïŒçŸåšãCogVideo
ãã©ã³ãã§ç¢ºèªã§ããŸãïŒãããã¯ããã©ã³ã¹ãã©ãŒããŒã«åºã¥ãåã®ãªãŒãã³ãœãŒã¹å€§èŠæš¡ããã¹ãçæãããªã¢ãã«ã§ããæè¡çãªè©³çŽ°ã«ã€ããŠã¯ãICLR'23è«æ ãã芧ãã ããã
ãã匷åãªã¢ãã«ãããã倧ããªãã©ã¡ãŒã¿ãµã€ãºã§ç»å Žäºå®ã§ããã楜ãã¿ã«ïŒ
ç®æ¬¡
ç¹å®ã®ã»ã¯ã·ã§ã³ã«ãžã£ã³ãïŒ
- ã¯ã€ãã¯ã¹ã¿ãŒã
- CogVideoX-2B ã®ã£ã©ãªãŒ
- ã¢ãã«çŽ¹ä»
- ãããžã§ã¯ãæ§é
- ãããžã§ã¯ãèšç»
- ã¢ãã«ã©ã€ã»ã³ã¹
- CogVideo(ICLR'23)ã¢ãã«çŽ¹ä»
- åŒçš
ã¯ã€ãã¯ã¹ã¿ãŒã
ããã³ããã®æé©å
ã¢ãã«ãå®è¡ããåã«ããã¡ã ãåèã«ããŠãGLM-4ïŒãŸãã¯åçã®è£œåãäŸãã°GPT-4ïŒã®å€§èŠæš¡ã¢ãã«ã䜿çšããŠã©ã®ããã«ã¢ãã«ãæé©åããããã確èªãã ãããããã¯éåžžã«éèŠã§ããã¢ãã«ã¯é·ãããã³ããã§ãã¬ãŒãã³ã°ãããŠãããããè¯ãããã³ããããããªçæã®å質ã«çŽæ¥åœ±é¿ãäžããŸãã
SAT
sat_demo ã®æ瀺ã«åŸã£ãŠãã ãã: SATãŠã§ã€ãã®æšè«ã³ãŒããšåŸ®èª¿æŽã³ãŒããå«ãŸããŠããŸããCogVideoXã¢ãã«æ§é ã«åºã¥ããŠæ¹åããããšããå§ãããŸããé©æ°çãªç 究è ã¯ããã®ã³ãŒãã䜿çšããŠè¿ éãªã¹ã¿ããã³ã°ãšéçºãè¡ãããšãã§ããŸãã
Diffusers
pip install -r requirements.txt
次㫠diffusers_demo ãåç §ããŠãã ãã: æšè«ã³ãŒãã®è©³çŽ°ãªèª¬æãå«ãŸããŠãããäžè¬çãªãã©ã¡ãŒã¿ã®æå³ã«ã€ããŠãèšåããŠããŸãã
éååæšè«ã®è©³çŽ°ã«ã€ããŠã¯ãdiffusers-torchao ãåç §ããŠãã ãããDiffusers ãš TorchAO ã䜿çšããããšã§ãéååæšè«ãå¯èœãšãªããã¡ã¢ãªå¹çã®è¯ãæšè«ããã³ã³ãã€ã«æã«å Žåã«ãã£ãŠã¯é床ã®åäžãæåŸ ã§ããŸããA100 ããã³ H100 äžã§ã®ããŸããŸãªèšå®ã«ãããã¡ã¢ãªããã³æéã®ãã³ãããŒã¯ã®å®å šãªãªã¹ãã¯ãdiffusers-torchao ã«å ¬éãããŠããŸãã
Gallery
CogVideoX-5B
CogVideoX-2B
ã®ã£ã©ãªãŒã®å¯Ÿå¿ããããã³ããã¯ãŒãã衚瀺ããã«ã¯ããã¡ããã¯ãªãã¯ããŠãã ãã
ã¢ãã«çŽ¹ä»
CogVideoXã¯ãæž åœ± ãšåæºã®ãªãŒãã³ãœãŒã¹çãããªçæã¢ãã«ã§ãã 以äžã®è¡šã«ãæäŸããŠãããããªçæã¢ãã«ã®åºæ¬æ å ±ã瀺ããŸã:
ã¢ãã«å | CogVideoX-2B | CogVideoX-5B | CogVideoX-5B-I2V |
---|---|---|---|
æšè«ç²ŸåºŠ | FP16*(æšå¥š), BF16, FP32, FP8*, INT8, INT4ã¯éå¯Ÿå¿ | BF16(æšå¥š), FP16, FP32, FP8*, INT8, INT4ã¯éå¯Ÿå¿ | |
åäžGPUã®ã¡ã¢ãªæ¶è²» |
SAT FP16: 18GB diffusers FP16: 4GBãã* diffusers INT8(torchao): 3.6GBãã* |
SAT BF16: 26GB diffusers BF16 : 5GBãã* diffusers INT8(torchao): 4.4GBãã* |
|
ãã«ãGPUã®ã¡ã¢ãªæ¶è²» | FP16: 10GB* using diffusers |
BF16: 15GB* using diffusers |
|
æšè«é床 (ã¹ããã = 50, FP/BF16) |
åäžA100: çŽ90ç§ åäžH100: çŽ45ç§ |
åäžA100: çŽ180ç§ åäžH100: çŽ90ç§ |
|
ãã¡ã€ã³ãã¥ãŒãã³ã°ç²ŸåºŠ | FP16 | BF16 | |
ãã¡ã€ã³ãã¥ãŒãã³ã°æã®ã¡ã¢ãªæ¶è²» | 47 GB (bs=1, LORA) 61 GB (bs=2, LORA) 62GB (bs=1, SFT) |
63 GB (bs=1, LORA) 80 GB (bs=2, LORA) 75GB (bs=1, SFT) |
78 GB (bs=1, LORA) 75GB (bs=1, SFT, 16GPU) |
ããã³ããèšèª | è±èª* | ||
ããã³ããã®æ倧ããŒã¯ã³æ° | 226ããŒã¯ã³ | ||
ãããªã®é·ã | 6ç§ | ||
ãã¬ãŒã ã¬ãŒã | 8ãã¬ãŒã /ç§ | ||
ãããªè§£å床 | 720 * 480ãä»ã®è§£å床ã¯é察å¿(ãã¡ã€ã³ãã¥ãŒãã³ã°å«ã) | ||
äœçœ®ãšã³ã³ãŒãã£ã³ã° | 3d_sincos_pos_embed | 3d_sincos_pos_embed | 3d_rope_pos_embed + learnable_pos_embed |
ããŠã³ããŒããªã³ã¯ (Diffusers) | ð€ HuggingFace ð€ ModelScope ð£ WiseModel |
ð€ HuggingFace ð€ ModelScope ð£ WiseModel |
ð€ HuggingFace ð€ ModelScope ð£ WiseModel |
ããŠã³ããŒããªã³ã¯ (SAT) | SAT |
ããŒã¿è§£èª¬
- diffusersã©ã€ãã©ãªã䜿çšããŠãã¹ãããéã«ã¯ã
diffusers
ã©ã€ãã©ãªãæäŸããå šãŠã®æé©åãæå¹ã«ãªã£ãŠããŸãããã®æ¹æ³ã¯ NVIDIA A100 / H100以å€ã®ããã€ã¹ã§ã®ã¡ã¢ãª/ã¡ã¢ãªæ¶è²»ã®ãã¹ãã¯è¡ã£ãŠããŸãããéåžžããã®æ¹æ³ã¯NVIDIA Ampereã¢ãŒããã¯ã㣠以äžã®å šãŠã®ããã€ã¹ã«é©å¿ã§ããŸããæé©åãç¡å¹ã«ãããšãã¡ã¢ãªæ¶è²»ã¯åå¢ããããŒã¯ã¡ã¢ãªäœ¿çšéã¯è¡šã®3åã«ãªããŸãããé床ã¯çŽ3ã4ååäžããŸãã以äžã®æé©åãéšåçã«ç¡å¹ã«ããããšãå¯èœã§ã:
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
- ãã«ãGPUã§æšè«ããå Žåã
enable_sequential_cpu_offload()
æé©åãç¡å¹ã«ããå¿ èŠããããŸãã - INT8ã¢ãã«ã䜿çšãããšæšè«é床ãäœäžããŸãããããã¯ã¡ã¢ãªã®å°ãªãGPUã§æ£åžžã«æšè«ãè¡ãããããªå質ã®æ倱ãæå°éã«æããããã®æªçœ®ã§ããæšè«é床ã¯å€§å¹ ã«äœäžããŸãã
- CogVideoX-2Bã¢ãã«ã¯
FP16
粟床ã§ãã¬ãŒãã³ã°ãããŠãããCogVideoX-5Bã¢ãã«ã¯BF16
粟床ã§ãã¬ãŒãã³ã°ãããŠããŸããæšè«æã«ã¯ã¢ãã«ããã¬ãŒãã³ã°ããã粟床ã䜿çšããããšããå§ãããŸãã - PytorchAOããã³Optimum-quanto
ã¯ãCogVideoXã®ã¡ã¢ãªèŠä»¶ãåæžããããã«ããã¹ããšã³ã³ãŒãããã©ã³ã¹ãã©ãŒããããã³VAEã¢ãžã¥ãŒã«ãéååããããã«äœ¿çšã§ããŸããããã«ãããç¡æã®T4
Colabãããå°ãªãã¡ã¢ãªã®GPUã§ã¢ãã«ãå®è¡ããããšãå¯èœã«ãªããŸããåæ§ã«éèŠãªã®ã¯ãTorchAOã®éååã¯
torch.compile
ãšå®å šã«äºææ§ããããæšè«é床ãå€§å¹ ã«åäžãããããšãã§ããç¹ã§ããNVIDIA H100
ããã³ãã以äžã®ããã€ã¹ã§ã¯FP8
粟床ã䜿çšããå¿ èŠããããŸããããã«ã¯ãtorch
ãtorchao
ãdiffusers
ãaccelerate
Pythonããã±ãŒãžã®ãœãŒã¹ã³ãŒãããã®ã€ã³ã¹ããŒã«ãå¿ èŠã§ããCUDA 12.4
ã®äœ¿çšããå§ãããŸãã - æšè«é床ãã¹ããåæ§ã«ãäžèšã®ã¡ã¢ãªæé©åæ¹æ³ã䜿çšããŠããŸããã¡ã¢ãªæé©åã䜿çšããªãå Žåãæšè«é床ã¯çŽ10ïŒ
åäžããŸãã
diffusers
ããŒãžã§ã³ã®ã¢ãã«ã®ã¿ãéååããµããŒãããŠããŸãã - ã¢ãã«ã¯è±èªå ¥åã®ã¿ããµããŒãããŠãããä»ã®èšèªã¯å€§èŠæš¡ã¢ãã«ã®æ¹åãéããŠè±èªã«ç¿»èš³ã§ããŸãã
- ã¢ãã«ã®ãã¡ã€ã³ãã¥ãŒãã³ã°ã«äœ¿çšãããã¡ã¢ãªã¯
8 * H100
ç°å¢ã§ãã¹ããããŠããŸããããã°ã©ã ã¯èªåçã«Zero 2
æé©åã䜿çšããŠããŸããè¡šã«å ·äœçãªGPUæ°ãèšèŒãããŠããå Žåããã¡ã€ã³ãã¥ãŒãã³ã°ã«ã¯ãã®æ°ä»¥äžã®GPUãå¿ èŠã§ãã
å奜çãªã³ã¯
ã³ãã¥ããã£ããã®è²¢ç®ã倧æè¿ããç§ãã¡ããªãŒãã³ãœãŒã¹ã³ãã¥ããã£ã«ç©æ¥µçã«è²¢ç®ããŠããŸãã以äžã®äœåã¯ãã§ã«CogVideoXã«å¯Ÿå¿ããŠããããã²ãå©çšãã ããïŒ
- CogVideoX-Fun: CogVideoX-Funã¯ãCogVideoXã¢ãŒããã¯ãã£ãåºã«ããæ¹è¯ãã€ãã©ã€ã³ã§ãèªç±ãªè§£å床ãšè€æ°ã®èµ·åæ¹æ³ããµããŒãããŠããŸãã
- CogStudio: CogVideo ã® Gradio Web UI ã®å¥ã®ãªããžããªãããé«æ©èœãª Web UI ããµããŒãããŸãã
- Xorbits Inference: 匷åã§å æ¬çãªåæ£æšè«ãã¬ãŒã ã¯ãŒã¯ã§ãããã¯ã³ã¯ãªãã¯ã§ç¬èªã®ã¢ãã«ãææ°ã®ãªãŒãã³ãœãŒã¹ã¢ãã«ãç°¡åã«ãããã€ã§ããŸãã
- ComfyUI-CogVideoXWrapper ComfyUIãã¬ãŒã ã¯ãŒã¯ã䜿çšããŠãCogVideoXãã¯ãŒã¯ãããŒã«çµ±åããŸãã
- VideoSys: VideoSysã¯ã䜿ããããé«æ§èœãªãããªçæã€ã³ãã©ãæäŸããææ°ã®ã¢ãã«ãæè¡ãç¶ç¶çã«çµ±åããŠããŸãã
- AutoDLã€ã¡ãŒãž: ã³ãã¥ããã£ã¡ã³ããŒãæäŸããHuggingface Spaceã€ã¡ãŒãžã®ã¯ã³ã¯ãªãã¯ãããã€ã¡ã³ãã
- ã€ã³ããªã¢ãã¶ã€ã³åŸ®èª¿æŽã¢ãã«: ã¯ãCogVideoXãåºç€ã«ãã埮調æŽã¢ãã«ã§ãã€ã³ããªã¢ãã¶ã€ã³å°çšã«èšèšãããŠããŸãã
- xDiT: xDiTã¯ãè€æ°ã®GPUã¯ã©ã¹ã¿ãŒäžã§DiTsã䞊åæšè«ããããã®ãšã³ãžã³ã§ããxDiTã¯ãªã¢ã«ã¿ã€ã ã®ç»åããã³ãããªçæãµãŒãã¹ããµããŒãããŠããŸãã
ãããžã§ã¯ãæ§é
ãã®ãªãŒãã³ãœãŒã¹ãªããžããªã¯ãCogVideoX ãªãŒãã³ãœãŒã¹ã¢ãã«ã®åºæ¬çãªäœ¿çšæ¹æ³ãšåŸ®èª¿æŽã®äŸãè¿ éã«éå§ããããã®ã¬ã€ãã§ãã
Colabã§ã®ã¯ã€ãã¯ã¹ã¿ãŒã
ç¡æã®Colab T4äžã§çŽæ¥å®è¡ã§ãã3ã€ã®ãããžã§ã¯ããæäŸããŠããŸãã
- CogVideoX-5B-T2V-Colab.ipynb: CogVideoX-5B ããã¹ããããããªãžã®çæçšColabã³ãŒãã
- CogVideoX-5B-T2V-Int8-Colab.ipynb: CogVideoX-5B ããã¹ããããããªãžã®éååæšè«çšColabã³ãŒãã1åã®å®è¡ã«çŽ30åããããŸãã
- CogVideoX-5B-I2V-Colab.ipynb: CogVideoX-5B ç»åãããããªãžã®çæçšColabã³ãŒãã
- CogVideoX-5B-V2V-Colab.ipynb: CogVideoX-5B ãããªãããããªãžã®çæçšColabã³ãŒãã
Inference
- cli_demo: æšè«ã³ãŒãã®è©³çŽ°ãªèª¬æãå«ãŸããŠãããäžè¬çãªãã©ã¡ãŒã¿ã®æå³ã«ã€ããŠãèšåããŠããŸãã
- cli_demo_quantization: éååã¢ãã«æšè«ã³ãŒãã§ãäœã¡ã¢ãªã®ããã€ã¹ã§ãå®è¡å¯èœã§ãããŸãããã®ã³ãŒããå€æŽããŠãFP8 粟床㮠CogVideoX ã¢ãã«ã®å®è¡ããµããŒãããããšãã§ããŸãã
- diffusers_vae_demo: VAEæšè«ã³ãŒãã®å®è¡ã«ã¯çŸåš71GBã®ã¡ã¢ãªãå¿ èŠã§ãããå°æ¥çã«ã¯æé©åãããäºå®ã§ãã
- space demo: Huggingface SpaceãšåãGUIã³ãŒãã§ããã¬ãŒã è£éãè¶ è§£åããŒã«ãçµã¿èŸŒãŸããŠããŸãã
- convert_demo: ãŠãŒã¶ãŒå ¥åãCogVideoXã«é©ãã圢åŒã«å€æããæ¹æ³ãCogVideoXã¯é·ããã£ãã·ã§ã³ã§ãã¬ãŒãã³ã°ãããŠãããããå ¥åããã¹ããLLMã䜿çšããŠãã¬ãŒãã³ã°ååžãšäžèŽãããå¿ èŠããããŸããããã©ã«ãã§ã¯GLM-4ã䜿çšããŸãããGPTãGeminiãªã©ã®ä»ã®LLMã«çœ®ãæããããšãã§ããŸãã
- gradio_web_demo: CogVideoX-2B / 5B ã¢ãã«ã䜿çšããŠåç»ãçæããæ¹æ³ã瀺ããã·ã³ãã«ãª Gradio Web UI ãã¢ã§ããç§ãã¡ã® Huggingface Space ãšåæ§ã«ããã®ã¹ã¯ãªããã䜿çšã㊠Web ãã¢ãèµ·åããããšãã§ããŸãã
finetune
- train_cogvideox_lora: CogVideoX diffusers 埮調æŽæ¹æ³ã®è©³çŽ°ãªèª¬æãå«ãŸããŠããŸãããã®ã³ãŒãã䜿çšããŠãèªåã®ããŒã¿ã»ãã㧠CogVideoX ã埮調æŽããããšãã§ããŸãã
sat
- sat_demo: SATãŠã§ã€ãã®æšè«ã³ãŒããšåŸ®èª¿æŽã³ãŒããå«ãŸããŠããŸããCogVideoXã¢ãã«æ§é ã«åºã¥ããŠæ¹åããããšããå§ãããŸããé©æ°çãªç 究è ã¯ããã®ã³ãŒãã䜿çšããŠè¿ éãªã¹ã¿ããã³ã°ãšéçºãè¡ãããšãã§ããŸãã
ããŒã«
ãã®ãã©ã«ãã«ã¯ãã¢ãã«å€æ/ãã£ãã·ã§ã³çæãªã©ã®ããŒã«ãå«ãŸããŠããŸãã
- convert_weight_sat2hf: SAT ã¢ãã«ã®éã¿ã Huggingface ã¢ãã«ã®éã¿ã«å€æããŸãã
- caption_demo: Caption ããŒã«ããããªãç解ããŠããã¹ãã§åºåããã¢ãã«ã
- export_sat_lora_weight: SAT ãã¡ã€ã³ãã¥ãŒãã³ã°ã¢ãã«ã®ãšã¯ã¹ããŒãããŒã«ãSAT Lora Adapter ã diffusers 圢åŒã§ãšã¯ã¹ããŒãããŸãã
- load_cogvideox_lora: diffusers çã®ãã¡ã€ã³ãã¥ãŒãã³ã°ããã Lora Adapter ãããŒãããããã®ããŒã«ã³ãŒãã
- llm_flux_cogvideox: ãªãŒãã³ãœãŒã¹ã®ããŒã«ã«å€§èŠæš¡èšèªã¢ãã« + Flux + CogVideoX ã䜿çšããŠèªåçã«åç»ãçæããŸãã
- parallel_inference_xditïŒ xDiT ã«ãã£ãŠãµããŒãããããããªçæããã»ã¹ãè€æ°ã® GPU ã§äžŠååããŸãã
- cogvideox-factory: CogVideoXã®äœã³ã¹ã埮調æŽãã¬ãŒã ã¯ãŒã¯ã§ã
diffusers
ããŒãžã§ã³ã®ã¢ãã«ã«é©å¿ããŠããŸããããå€ãã®è§£å床ã«å¯Ÿå¿ããåäžã®4090 GPUã§CogVideoX-5Bã®åŸ®èª¿æŽãå¯èœã§ãã
CogVideo(ICLR'23)
è«æã®å ¬åŒãªããžããª: CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers 㯠CogVideo branch ã«ãããŸãã
CogVideoã¯æ¯èŒçé«ãã¬ãŒã ã¬ãŒãã®ãããªãçæããããšãã§ããŸãã 32ãã¬ãŒã ã®4ç§éã®ã¯ãªããã以äžã«ç€ºãããŠããŸãã
CogVideoã®ãã¢ã¯ https://models.aminer.cn/cogvideo ã§äœéšã§ããŸãã å ã®å ¥åã¯äžåœèªã§ãã
åŒçš
ð ç§ãã¡ã®ä»äºã圹ç«ã€ãšæãããå Žåããã²ã¹ã¿ãŒãä»ããŠããã ããè«æãåŒçšããŠãã ããã
@article{yang2024cogvideox,
title={CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer},
author={Yang, Zhuoyi and Teng, Jiayan and Zheng, Wendi and Ding, Ming and Huang, Shiyu and Xu, Jiazheng and Yang, Yuanming and Hong, Wenyi and Zhang, Xiaohan and Feng, Guanyu and others},
journal={arXiv preprint arXiv:2408.06072},
year={2024}
}
@article{hong2022cogvideo,
title={CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers},
author={Hong, Wenyi and Ding, Ming and Zheng, Wendi and Liu, Xinghan and Tang, Jie},
journal={arXiv preprint arXiv:2205.15868},
year={2022}
}
ããªãã®è²¢ç®ããåŸ ã¡ããŠããŸãïŒè©³çŽ°ã¯ãã¡ããã¯ãªãã¯ããŠãã ããã
ã©ã€ã»ã³ã¹å¥çŽ
ãã®ãªããžããªã®ã³ãŒã㯠Apache 2.0 License ã®äžã§å ¬éãããŠããŸãã
CogVideoX-2B ã¢ãã« (察å¿ããTransformersã¢ãžã¥ãŒã«ãVAEã¢ãžã¥ãŒã«ãå«ã) 㯠Apache 2.0 License ã®äžã§å ¬éãããŠããŸãã
CogVideoX-5B ã¢ãã«ïŒTransformers ã¢ãžã¥ãŒã«ãç»åçæãããªãšããã¹ãçæãããªã®ããŒãžã§ã³ãå«ãïŒ ã¯ CogVideoX LICENSE ã®äžã§å ¬éãããŠããŸãã