From 8e8275d2e894e38214e94690c74a0a10411d2adb Mon Sep 17 00:00:00 2001
From: zR <2448370773@qq.com>
Date: Tue, 6 Aug 2024 17:15:44 +0800
Subject: [PATCH] update cogvideo and paper citation

---
 README.md    | 54 ++++++++++++++++++++++++++++++++++++++--------------
 README_zh.md | 42 +++++++++++++++++++++++++++++++++-------
 2 files changed, 75 insertions(+), 21 deletions(-)

diff --git a/README.md b/README.md
index 27b0187..e84a1a6 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# CogVideoX
+# CogVideo && CogVideoX
 
 [中文阅读](./README_zh.md)
 
@@ -15,7 +15,7 @@
 👋 Join our WeChat and Discord
 <br>
-📍 Visit 清影 and API Platform to experience larger-scale commercial video generation models.
+📍 Visit 清影 and API Platform to experience larger-scale commercial video generation models.
 <br>
 ## Update and News
 
@@ -24,7 +24,7 @@ the video almost losslessly.
 - 🔥 **News**: ``2024/8/6``: We have open-sourced **CogVideoX-2B**, the first model in the CogVideoX series of video generation models.
-- 🌱 **Source**: ```2022/5/19```: We have open-sourced CogVideo (now you can see in `CogVideo` branch),a Transformer based text-to-video model, and you can check [ICLR'23 CogVideo Paper](https://arxiv.org/abs/2205.15868) for technical details.
+- 🌱 **Source**: ```2022/5/19```: We have open-sourced CogVideo (now you can find it in the `CogVideo` branch), the **first** open-source pretrained text-to-video model, and you can check the [ICLR'23 CogVideo Paper](https://arxiv.org/abs/2205.15868) for technical details.
 
 **More powerful models with larger parameter sizes are on the way~ Stay tuned!**
 
@@ -53,7 +53,7 @@
 ## Model Introduction
 
 CogVideoX is an open-source version of the video generation model, which is homologous
-to [清影](https://chatglm.cn/video).
+to [清影](https://chatglm.cn/video?fr=osm_cogvideox).
 
 The table below shows the list of video generation models we currently provide, along with related basic information:
 
@@ -79,13 +79,10 @@ of the **CogVideoX** open-source model.
 
 ### Inference
 
-+ [cli_demo](inference/cli_demo.py): A more detailed explanation of the inference code, mentioning the significance of
-  common parameters.
-+ [cli_vae_demo](inference/cli_vae_demo.py): Executing the VAE inference code alone currently requires 71GB of memory,
-  but it will be optimized in the future.
-+ [convert_demo](inference/converter_demo.py): How to convert user input into a format suitable for CogVideoX.
-+ [web_demo](inference/web_demo.py): A simple streamlit web application demonstrating how to use the CogVideoX-2B model
-  to generate videos.
++ [cli_demo](inference/cli_demo.py): A more detailed explanation of the inference code, covering the significance of common parameters (an end-to-end generation sketch appears at the end of this patch).
++ [cli_vae_demo](inference/cli_vae_demo.py): Executing the VAE inference code alone currently requires 71GB of memory, but it will be optimized in the future.
++ [convert_demo](inference/convert_demo.py): How to convert user input into a format suitable for CogVideoX. Because CogVideoX is trained on long captions, the input text must be converted to match the training distribution using an LLM. The script uses GLM4 by default, but it can be replaced with any other LLM such as GPT or Gemini (see the sketch right after this diff).
++ [web_demo](inference/web_demo.py): A simple streamlit web application demonstrating how to use the CogVideoX-2B model to generate videos.
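A note on the `convert_demo` change above before the Chinese half of the patch: the patch does not include the conversion code itself, so the snippet below is only a minimal sketch of the idea, namely expanding a terse user prompt into a long, training-style caption. It assumes an OpenAI-compatible chat client standing in for GLM4; the model name and system prompt are illustrative placeholders, not values taken from `inference/convert_demo.py`.

```python
# Minimal sketch of the convert_demo idea: rewrite a short user prompt into a
# long, detailed caption matching CogVideoX's training distribution.
# Assumptions: any OpenAI-compatible chat endpoint stands in for GLM4, and the
# system prompt below is illustrative, not the one shipped with the repository.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "Rewrite the user's short video idea into one long, detailed English "
    "caption that describes the subjects, their motion, the camera work, "
    "and the surrounding scene."
)

def convert_prompt(user_prompt: str) -> str:
    """Return a long caption suitable as CogVideoX input."""
    response = client.chat.completions.create(
        model="gpt-4o",  # swap in GLM4, Gemini, etc. behind a compatible API
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(convert_prompt("a dog chasing a ball on the beach"))
```

Any chat-capable LLM works in this role; the only requirement is that it produces the long, descriptive captions the model was trained on.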
diff --git a/README_zh.md b/README_zh.md
--- a/README_zh.md
+++ b/README_zh.md
-📍 前往 清影 和 API平台 体验更大规模的商业版视频生成模型。
+📍 前往 清影 和 API平台 体验更大规模的商业版视频生成模型。
 <br>
 
 ## 项目更新
 
 - 🔥 **News**: ``2024/8/6``: 我们开源 **3D Causal VAE**，用于 **CogVideoX-2B**，可以几乎无损地重构视频。
 - 🔥 **News**: ``2024/8/6``: 我们开源 CogVideoX 系列视频生成模型的第一个模型, **CogVideoX-2B**。
-- 🌱 **Source**: ```2022/5/19```: 我们开源了 CogVideo 视频生成模型（现在你可以在 `CogVideo` 分支中看到），这是一个基于 Transformer 的文本生成视频模型，您可以访问 [ICLR'23 论文](https://arxiv.org/abs/2205.15868) 查看技术细节。
+- 🌱 **Source**: ```2022/5/19```: 我们开源了 CogVideo 视频生成模型（现在你可以在 `CogVideo` 分支中看到），这是首个开源的基于 Transformer 的大型文本生成视频模型，您可以访问 [ICLR'23 论文](https://arxiv.org/abs/2205.15868) 查看技术细节。
 
 **性能更强，参数量更大的模型正在到来的路上~，欢迎关注**
 
 ## CogVideoX-2B 视频作品
 
@@ -50,7 +50,7 @@
 ## 模型介绍
 
-CogVideoX是 [清影](https://chatglm.cn/video) 同源的开源版本视频生成模型。
+CogVideoX是 [清影](https://chatglm.cn/video?fr=osm_cogvideox) 同源的开源版本视频生成模型。
 
 下表展示目前我们提供的视频生成模型列表，以及相关基础信息:
 
@@ -76,7 +76,7 @@ CogVideoX是 [清影](https://chatglm.cn/video) 同源的开源版本视频生
 + [cli_demo](inference/cli_demo.py): 更详细的推理代码讲解，常见参数的意义，在这里都会提及。
 + [cli_vae_demo](inference/cli_vae_demo.py): 单独执行VAE的推理代码，目前需要71GB显存，将来会优化。
-+ [convert_demo](inference/converter_demo.py): 如何将用户的输入转换成适合 CogVideoX的长输入。
++ [convert_demo](inference/convert_demo.py): 如何将用户的输入转换成适合 CogVideoX 的长输入。因为 CogVideoX 是在长文本上训练的，所以我们需要通过 LLM 将输入文本改写成与训练分布一致的长文本。脚本中默认使用 GLM4，也可以替换为 GPT、Gemini 等任意大语言模型。
 + [web_demo](inference/web_demo.py): 一个简单的 streamlit 网页应用，展示如何使用 CogVideoX-2B 模型生成视频。
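Finally, for readers who want to see the `cli_demo` flow end to end, here is a hedged sketch of text-to-video generation with CogVideoX-2B. It assumes the `diffusers` integration (`CogVideoXPipeline`, available in recent `diffusers` releases) rather than the repository's own scripts; the prompt, step count, and guidance scale are illustrative values, not defaults taken from `inference/cli_demo.py`.

```python
# Sketch of CogVideoX-2B text-to-video generation via the diffusers integration.
# Assumes a recent diffusers release that ships CogVideoXPipeline; all values
# below are illustrative rather than the repository's defaults.
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # trades speed for a much smaller VRAM footprint

# A long, training-style caption, e.g. produced by a convert_demo-style rewrite.
prompt = (
    "A golden retriever sprints along a sunlit beach, kicking up sand as it "
    "chases a red ball, while gentle waves roll in under a clear blue sky."
)

frames = pipe(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]

export_to_video(frames, "output.mp4", fps=8)
```

The streamlit `web_demo` mentioned in both READMEs wraps the same kind of generation call behind a browser front end.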