mirror of https://github.com/THUDM/CogVideo.git (synced 2025-06-18 22:49:16 +08:00)

upload venhancer

parent a490c3c895, commit 17e6ed8685
README.md
@@ -22,7 +22,10 @@
 
 ## Update and News
 
-- 🔥🔥 **News**: ```2024/8/15```: The `SwissArmyTransformer` dependency in CogVideoX has been upgraded to `0.4.12`.
+- 🔥🔥 **News**: ```2024/8/20```: [VEnhancer](https://github.com/Vchitect/VEnhancer) now supports enhancing videos generated by
+CogVideoX, achieving higher resolution and higher quality video rendering. We welcome you to try it out by following
+the [tutorial](tools/venhancer/README.md).
+- 🔥 **News**: ```2024/8/15```: The `SwissArmyTransformer` dependency in CogVideoX has been upgraded to `0.4.12`.
 Fine-tuning
 no longer requires installing `SwissArmyTransformer` from source. Additionally, the `Tied VAE` technique has been
 applied in the implementation within the `diffusers` library. Please install `diffusers` and `accelerate` libraries
@@ -34,8 +37,7 @@
 performed
 on a single 3090 GPU. For more details, please refer to the [code](inference/cli_demo.py).
 - 🔥 **News**: ```2024/8/6```: We have also open-sourced **3D Causal VAE** used in **CogVideoX-2B**, which can
-reconstruct
-the video almost losslessly.
+reconstruct the video almost losslessly.
 - 🔥 **News**: ```2024/8/6```: We have open-sourced **CogVideoX-2B**, the first model in the CogVideoX series of video
 generation models.
 - 🌱 **Source**: ```2022/5/19```: We have open-sourced **CogVideo** (now available in the `CogVideo` branch), the **first**
README_ja.md
@@ -22,7 +22,10 @@
 
 ## Update and News
 
-- 🔥🔥 **News**: 2024/8/15: The `SwissArmyTransformer` dependency of CogVideoX has been upgraded
+- 🔥🔥 **News**: ```2024/8/20```: [VEnhancer](https://github.com/Vchitect/VEnhancer) now supports enhancing videos
+generated by CogVideoX, achieving higher resolution and higher quality video rendering. Please try it out by
+following the [tutorial](tools/venhancer/README_ja.md).
+- 🔥 **News**: 2024/8/15: The `SwissArmyTransformer` dependency of CogVideoX has been upgraded
 to `0.4.12`. As a result, fine-tuning no longer requires installing `SwissArmyTransformer`
 from source. At the same time, the `Tied VAE` technique has been applied to the implementation in the `diffusers`
 library. Please install the `diffusers` and `accelerate` libraries from source. CogVideoX
README_zh.md
@@ -23,9 +23,12 @@
 
 ## Project Updates
 
-- 🔥🔥 **News**: ```2024/8/15```: The `SwissArmyTransformer` dependency in CogVideoX has been upgraded to `0.4.12`,
+- 🔥🔥 **News**: ```2024/8/20```: [VEnhancer](https://github.com/Vchitect/VEnhancer) now supports enhancing videos generated by CogVideoX,
+achieving higher resolution and higher quality video rendering. You are welcome to try it by following the [tutorial](tools/venhancer/README_zh.md).
+- 🔥 **News**: ```2024/8/15```: The `SwissArmyTransformer` dependency in CogVideoX has been upgraded to `0.4.12`,
 so fine-tuning no longer requires installing `SwissArmyTransformer` from source. At the same time, the `Tied VAE` technique has been applied to the implementation in the `diffusers`
-library. Please install the `diffusers` and `accelerate` libraries from source; CogVideoX inference then requires only 12GB of VRAM. The inference code needs to be modified; see [cli_demo](inference/cli_demo.py)
+library. Please install the `diffusers` and `accelerate` libraries from source; CogVideoX inference then requires only
+12GB of VRAM. The inference code needs to be modified; see [cli_demo](inference/cli_demo.py)
 - 🔥 **News**: ```2024/8/12```: The CogVideoX paper is now on arXiv; see the [paper](https://arxiv.org/abs/2408.06072).
 - 🔥 **News**: ```2024/8/7```: CogVideoX has been merged into `diffusers`
 version 0.30.0; a single 3090 can run inference. For details, see the [code](inference/cli_demo.py).

tools/venhancer/README.md (new file, 98 lines)

@@ -0,0 +1,98 @@
# Enhance CogVideoX Generated Videos with VEnhancer

This tutorial will guide you through using the VEnhancer tool to enhance videos generated by CogVideoX, including achieving higher frame rates and higher resolutions.

## Model Introduction

VEnhancer implements spatial super-resolution, temporal super-resolution (frame interpolation), and video refinement in a unified framework. It can flexibly adapt to different upsampling factors (e.g., 1x~8x) for spatial or temporal super-resolution. Additionally, it provides flexible control to modify the refinement strength, enabling it to handle diverse video artifacts.

VEnhancer follows the design of ControlNet, copying the architecture and weights of the multi-frame encoder and middle block from a pre-trained video diffusion model to build a trainable conditional network. This video ControlNet accepts low-resolution keyframes and noisy full-frame latents as inputs. In addition to the time step t and prompt, our proposed video-aware conditioning also includes noise augmentation level σ and downscaling factor s as additional network conditioning inputs.
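
As a rough illustration of this scheme, here is a minimal PyTorch sketch of a video ControlNet branch with video-aware conditioning; every module name, channel count, and layer choice below is an illustrative assumption rather than VEnhancer's actual implementation:

```python
import torch
import torch.nn as nn

class VideoAwareConditioning(nn.Module):
    """Embeds the scalar conditions named above: time step t, noise
    augmentation level sigma, and downscaling factor s."""
    def __init__(self, dim: int = 320):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, t, sigma, s):  # each of shape (batch,)
        return self.mlp(torch.stack([t, sigma, s], dim=-1).float())

class VideoControlNetSketch(nn.Module):
    """Trainable conditional branch in the spirit of ControlNet; the Conv3d
    layers stand in for the copied multi-frame encoder and middle block."""
    def __init__(self, latent_ch: int = 4, dim: int = 320):
        super().__init__()
        self.encoder = nn.Conv3d(2 * latent_ch, dim, kernel_size=3, padding=1)
        self.middle = nn.Conv3d(dim, dim, kernel_size=3, padding=1)
        self.cond = VideoAwareConditioning(dim)

    def forward(self, noisy_latents, lr_keyframes, t, sigma, s):
        # Both inputs are (batch, ch, frames, h, w); the low-resolution
        # keyframes are assumed to be upsampled to the latent resolution first.
        h = self.encoder(torch.cat([noisy_latents, lr_keyframes], dim=1))
        h = h + self.cond(t, sigma, s)[:, :, None, None, None]
        return self.middle(h)  # features injected back into the frozen backbone

# Smoke test with toy shapes and the conditioning values seen in the log below.
x = torch.randn(1, 4, 8, 60, 90)
net = VideoControlNetSketch()
out = net(x, torch.randn_like(x), torch.tensor([500.0]),
          torch.tensor([250.0]), torch.tensor([8.0]))
print(out.shape)  # torch.Size([1, 320, 8, 60, 90])
```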
## Hardware Requirements

+ Operating System: Linux (requires the xformers dependency)
+ Hardware: NVIDIA GPU with at least 60GB of VRAM per card. Machines such as the H100 or A100 are recommended.

## Quick Start

1. Clone the repository and install dependencies as per the official instructions:

```shell
git clone https://github.com/Vchitect/VEnhancer.git
cd VEnhancer
## Torch and other dependencies can use those from CogVideoX. If you need to create a new environment, use the following commands:
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2

## Install required dependencies
pip install -r requirements.txt
```
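
2. Run the code:

```shell
python enhance_a_video.py \
  --up_scale 4 --target_fps 24 --noise_aug 250 \
  --solver_mode 'fast' --steps 15 \
  --input_path inputs/000000.mp4 \
  --prompt 'Wide-angle aerial shot at dawn, soft morning light casting long shadows, an elderly man walking his dog through a quiet, foggy park, trees and benches in the background, peaceful and serene atmosphere' \
  --save_dir 'results/'
```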
Where:

- `input_path` is the path to the input video
- `prompt` is the description of the video content. The prompt used by this tool should be shorter, not exceeding 77 words. You may need to simplify the prompt used for generating the CogVideoX video.
- `up_scale` is the upsampling factor, which can be set to 2, 4, or 8
- `target_fps` is the target frame rate for the video. Typically, 16 fps is already smooth, with 24 fps as the default value (a worked example follows this list).
- `noise_aug` controls the strength of noise augmentation, typically set to 250
- `steps` indicates the number of optimization steps, usually set to 15. If you want faster generation, you can reduce this number, but the quality will decrease significantly.
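
As a worked check of `target_fps` against the sample log further down: the input clip has 49 frames at 8 fps, so reaching 24 fps is a 3x temporal super-resolution. A minimal sketch, assuming the interpolator keeps the endpoints and inserts frames between each original pair:

```python
# Frame count math for the sample log below (the (n - 1) * factor + 1 formula is
# an assumption about endpoint handling, not VEnhancer's documented behavior).
input_frames, input_fps, target_fps = 49, 8.0, 24.0   # values from the log
factor = target_fps / input_fps                       # 3x temporal super-resolution
output_frames = int((input_frames - 1) * factor + 1)  # (49 - 1) * 3 + 1 = 145
print(output_frames)  # 145, matching video_data shape [145, 3, 1320, 1982] in the log
```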
The code will automatically download the required models from Hugging Face during execution.
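
The checkpoint lands at `./ckpts/venhancer_paper.pt` (see the log below). If you prefer to fetch it ahead of time, for example on a cluster whose compute nodes have no network access, a sketch along these lines should work; the `repo_id` here is a placeholder assumption, so check the VEnhancer repository for the actual location:

```python
from huggingface_hub import hf_hub_download

# repo_id is an assumed placeholder; only the filename and target directory
# are taken from the run log below (./ckpts/venhancer_paper.pt).
ckpt_path = hf_hub_download(
    repo_id="Vchitect/VEnhancer",  # placeholder, verify against the VEnhancer repo
    filename="venhancer_paper.pt",
    local_dir="./ckpts",
)
print(ckpt_path)
```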

Typical runtime logs are as follows:

```shell
/share/home/zyx/.conda/envs/cogvideox/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_fwd")
/share/home/zyx/.conda/envs/cogvideox/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:344: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_bwd")
2024-08-20 13:25:17,553 - video_to_video - INFO - checkpoint_path: ./ckpts/venhancer_paper.pt
/share/home/zyx/.conda/envs/cogvideox/lib/python3.10/site-packages/open_clip/factory.py:88: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
checkpoint = torch.load(checkpoint_path, map_location=map_location)
2024-08-20 13:25:37,486 - video_to_video - INFO - Build encoder with FrozenOpenCLIPEmbedder
/share/home/zyx/Code/VEnhancer/video_to_video/video_to_video_model.py:35: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
load_dict = torch.load(cfg.model_path, map_location='cpu')
2024-08-20 13:25:55,391 - video_to_video - INFO - Load model path ./ckpts/venhancer_paper.pt, with local status <All keys matched successfully>
2024-08-20 13:25:55,392 - video_to_video - INFO - Build diffusion with GaussianDiffusion
2024-08-20 13:26:16,092 - video_to_video - INFO - input video path: inputs/000000.mp4
2024-08-20 13:26:16,093 - video_to_video - INFO - text: Wide-angle aerial shot at dawn,soft morning light casting long shadows,an elderly man walking his dog through a quiet,foggy park,trees and benches in the background,peaceful and serene atmosphere
2024-08-20 13:26:16,156 - video_to_video - INFO - input frames length: 49
2024-08-20 13:26:16,156 - video_to_video - INFO - input fps: 8.0
2024-08-20 13:26:16,156 - video_to_video - INFO - target_fps: 24.0
2024-08-20 13:26:16,311 - video_to_video - INFO - input resolution: (480, 720)
2024-08-20 13:26:16,312 - video_to_video - INFO - target resolution: (1320, 1982)
2024-08-20 13:26:16,312 - video_to_video - INFO - noise augmentation: 250
2024-08-20 13:26:16,312 - video_to_video - INFO - scale s is set to: 8
2024-08-20 13:26:16,399 - video_to_video - INFO - video_data shape: torch.Size([145, 3, 1320, 1982])
/share/home/zyx/Code/VEnhancer/video_to_video/video_to_video_model.py:108: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with amp.autocast(enabled=True):
2024-08-20 13:27:19,605 - video_to_video - INFO - step: 0
2024-08-20 13:30:12,020 - video_to_video - INFO - step: 1
2024-08-20 13:33:04,956 - video_to_video - INFO - step: 2
2024-08-20 13:35:58,691 - video_to_video - INFO - step: 3
2024-08-20 13:38:51,254 - video_to_video - INFO - step: 4
2024-08-20 13:41:44,150 - video_to_video - INFO - step: 5
2024-08-20 13:44:37,017 - video_to_video - INFO - step: 6
2024-08-20 13:47:30,037 - video_to_video - INFO - step: 7
2024-08-20 13:50:22,838 - video_to_video - INFO - step: 8
2024-08-20 13:53:15,844 - video_to_video - INFO - step: 9
2024-08-20 13:56:08,657 - video_to_video - INFO - step: 10
2024-08-20 13:59:01,648 - video_to_video - INFO - step: 11
2024-08-20 14:01:54,541 - video_to_video - INFO - step: 12
2024-08-20 14:04:47,488 - video_to_video - INFO - step: 13
2024-08-20 14:10:13,637 - video_to_video - INFO - sampling, finished.
```

Running on a single A100 GPU, enhancing each 6-second CogVideoX-generated video with the default settings will consume 60GB of VRAM and take 40 to 50 minutes.

tools/venhancer/README_ja.md (new file, 91 lines)

@@ -0,0 +1,91 @@
# Enhance Videos Generated by CogVideoX with VEnhancer

This tutorial explains how to use the VEnhancer tool to enhance videos generated by CogVideoX, achieving higher frame rates and higher resolutions.

## Model Introduction

VEnhancer implements spatial super-resolution, temporal super-resolution (frame interpolation), and video refinement in a unified framework. It can flexibly adapt to different upsampling factors (e.g., 1x to 8x) for spatial or temporal super-resolution, and it provides flexible control over the refinement strength to handle diverse video artifacts.

VEnhancer follows the design of ControlNet, copying the architecture and weights of the multi-frame encoder and middle block from a pre-trained video diffusion model to build a trainable conditional network. This video ControlNet takes low-resolution keyframes and noisy full frames as inputs. In addition to the time step t and the prompt, the proposed video-aware conditioning uses the noise augmentation level σ and the downscaling factor s as additional network conditions.

## Hardware Requirements

+ Operating System: Linux (the xformers dependency is required)
+ Hardware: NVIDIA GPU with at least 60GB of VRAM per card. Machines such as the H100 or A100 are recommended.

## Quick Start

1. Clone the repository and install the dependencies following the official instructions:

```shell
git clone https://github.com/Vchitect/VEnhancer.git
cd VEnhancer
## Torch and the other dependencies can reuse those from CogVideoX. If you need to create a new environment, use the following commands:
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2

## Install the required dependencies
pip install -r requirements.txt
```

2. Run the code:

```shell
python enhance_a_video.py --up_scale 4 --target_fps 24 --noise_aug 250 --solver_mode 'fast' --steps 15 --input_path inputs/000000.mp4 --prompt 'Wide-angle aerial shot at dawn, soft morning light casting long shadows, an elderly man walking his dog through a quiet, foggy park, trees and benches in the background, peaceful and serene atmosphere' --save_dir 'results/'
```

Where:

- `input_path` is the path to the input video.
- `prompt` is the description of the video content. The prompt used by this tool should be short, no more than 77 words; it is recommended to simplify the prompt that was used to generate the CogVideoX video (see the sketch after this list).
- `up_scale` is the upsampling factor and can be set to 2, 4, or 8.
- `target_fps` is the target frame rate of the video. Usually 16 fps is already smooth; the default value is 24 fps.
- `noise_aug` controls the strength of noise augmentation and is usually set to 250.
- `steps` is the number of optimization steps, usually set to 15. You can lower it for faster generation, but quality will drop significantly.
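
Below is a minimal sketch of the 77-word guard mentioned in the `prompt` item, using a plain whitespace word count; the encoder shown in the run log (FrozenOpenCLIPEmbedder) actually truncates by tokens, with 77 being the CLIP context length, so staying under 77 words simply keeps a safe margin:

```python
def shorten_prompt(prompt: str, max_words: int = 77) -> str:
    """Trim a prompt to at most max_words whitespace-separated words."""
    words = prompt.split()
    return " ".join(words[:max_words])

# Example: trim an over-long CogVideoX prompt before passing it to --prompt.
print(shorten_prompt("Wide-angle aerial shot at dawn, soft morning light casting long shadows"))
```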
During execution, the required models are downloaded automatically from Hugging Face.

```shell
/share/home/zyx/.conda/envs/cogvideox/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_fwd")
/share/home/zyx/.conda/envs/cogvideox/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:344: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_bwd")
2024-08-20 13:25:17,553 - video_to_video - INFO - checkpoint_path: ./ckpts/venhancer_paper.pt
/share/home/zyx/.conda/envs/cogvideox/lib/python3.10/site-packages/open_clip/factory.py:88: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
checkpoint = torch.load(checkpoint_path, map_location=map_location)
2024-08-20 13:25:37,486 - video_to_video - INFO - Build encoder with FrozenOpenCLIPEmbedder
/share/home/zyx/Code/VEnhancer/video_to_video/video_to_video_model.py:35: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
load_dict = torch.load(cfg.model_path, map_location='cpu')
2024-08-20 13:25:55,391 - video_to_video - INFO - Load model path ./ckpts/venhancer_paper.pt, with local status <All keys matched successfully>
2024-08-20 13:25:55,392 - video_to_video - INFO - Build diffusion with GaussianDiffusion
2024-08-20 13:26:16,092 - video_to_video - INFO - input video path: inputs/000000.mp4
2024-08-20 13:26:16,093 - video_to_video - INFO - text: Wide-angle aerial shot at dawn,soft morning light casting long shadows,an elderly man walking his dog through a quiet,foggy park,trees and benches in the background,peaceful and serene atmosphere
2024-08-20 13:26:16,156 - video_to_video - INFO - input frames length: 49
2024-08-20 13:26:16,156 - video_to_video - INFO - input fps: 8.0
2024-08-20 13:26:16,156 - video_to_video - INFO - target_fps: 24.0
2024-08-20 13:26:16,311 - video_to_video - INFO - input resolution: (480, 720)
2024-08-20 13:26:16,312 - video_to_video - INFO - target resolution: (1320, 1982)
2024-08-20 13:26:16,312 - video_to_video - INFO - noise augmentation: 250
2024-08-20 13:26:16,312 - video_to_video - INFO - scale s is set to: 8
2024-08-20 13:26:16,399 - video_to_video - INFO - video_data shape: torch.Size([145, 3, 1320, 1982])
/share/home/zyx/Code/VEnhancer/video_to_video/video_to_video_model.py:108: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with amp.autocast(enabled=True):
2024-08-20 13:27:19,605 - video_to_video - INFO - step: 0
2024-08-20 13:30:12,020 - video_to_video - INFO - step: 1
2024-08-20 13:33:04,956 - video_to_video - INFO - step: 2
2024-08-20 13:35:58,691 - video_to_video - INFO - step: 3
2024-08-20 13:38:51,254 - video_to_video - INFO - step: 4
2024-08-20 13:41:44,150 - video_to_video - INFO - step: 5
2024-08-20 13:44:37,017 - video_to_video - INFO - step: 6
2024-08-20 13:47:30,037 - video_to_video - INFO - step: 7
2024-08-20 13:50:22,838 - video_to_video - INFO - step: 8
2024-08-20 13:53:15,844 - video_to_video - INFO - step: 9
2024-08-20 13:56:08,657 - video_to_video - INFO - step: 10
2024-08-20 13:59:01,648 - video_to_video - INFO - step: 11
2024-08-20 14:01:54,541 - video_to_video - INFO - step: 12
2024-08-20 14:04:47,488 - video_to_video - INFO - step: 13
2024-08-20 14:10:13,637 - video_to_video - INFO - sampling, finished.
```
On a single A100 GPU, enhancing a 6-second video generated by CogVideoX with the default settings consumes 60GB of VRAM and takes 40 to 50 minutes.

tools/venhancer/README_zh.md (new file, 100 lines)

@@ -0,0 +1,100 @@
# Enhance Videos Generated by CogVideoX with VEnhancer

This tutorial uses the VEnhancer tool to enhance videos generated by CogVideoX, including achieving higher frame rates and higher resolutions.

## Model Introduction

VEnhancer implements spatial super-resolution, temporal super-resolution (frame interpolation), and video refinement in a unified framework. It can flexibly adapt to different upsampling factors (e.g., 1x~8x) for spatial or temporal super-resolution, and it provides flexible control over the refinement strength to handle diverse video artifacts.

VEnhancer follows the design of ControlNet, copying the architecture and weights of the multi-frame encoder and middle block from a pre-trained video diffusion model to build a trainable conditional network. This video ControlNet accepts low-resolution keyframes and noisy full frames as inputs. In addition to the time step t and the prompt, the proposed video-aware conditioning also takes the noise augmentation level σ and the downscaling factor s as additional network conditioning inputs.

## Hardware Requirements

+ Operating System: Linux (the xformers dependency is required)
+ Hardware: NVIDIA GPU with more than 60GB of VRAM per card. Machines such as the H100 or A100 are recommended.

## Quick Start

1. Clone the repository and install the dependencies following the official instructions:

```shell
git clone https://github.com/Vchitect/VEnhancer.git
cd VEnhancer
## Torch and the other dependencies can reuse those from CogVideoX. If you need to create a new environment, use the following commands:
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2

## Install the required dependencies
pip install -r requirements.txt
```

2. Run the code:

```shell
python enhance_a_video.py \
  --up_scale 4 --target_fps 24 --noise_aug 250 \
  --solver_mode 'fast' --steps 15 \
  --input_path inputs/000000.mp4 \
  --prompt 'Wide-angle aerial shot at dawn, soft morning light casting long shadows, an elderly man walking his dog through a quiet, foggy park, trees and benches in the background, peaceful and serene atmosphere' \
  --save_dir 'results/'
```

Where:

- `input_path` is the path to the input video.
- `prompt` is the description of the video content. The prompt used by this tool must be shorter, no more than 77 words; you may need to simplify the prompt used to generate the CogVideoX video.
- `up_scale` is the upsampling factor and can be set to 2, 4, or 8.
- `target_fps` is the target frame rate of the video. Usually 16 fps is already smooth; 24 fps is the default.
- `noise_aug` is the strength of noise augmentation, usually set to 250.
- `steps` is the number of optimization steps, usually set to 15. Lowering it speeds up generation, but quality will drop significantly.

While running, the code automatically pulls the required models from Hugging Face.

Typical runtime logs are as follows:

```shell
/share/home/zyx/.conda/envs/cogvideox/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_fwd")
/share/home/zyx/.conda/envs/cogvideox/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:344: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_bwd")
2024-08-20 13:25:17,553 - video_to_video - INFO - checkpoint_path: ./ckpts/venhancer_paper.pt
/share/home/zyx/.conda/envs/cogvideox/lib/python3.10/site-packages/open_clip/factory.py:88: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
checkpoint = torch.load(checkpoint_path, map_location=map_location)
2024-08-20 13:25:37,486 - video_to_video - INFO - Build encoder with FrozenOpenCLIPEmbedder
/share/home/zyx/Code/VEnhancer/video_to_video/video_to_video_model.py:35: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
load_dict = torch.load(cfg.model_path, map_location='cpu')
2024-08-20 13:25:55,391 - video_to_video - INFO - Load model path ./ckpts/venhancer_paper.pt, with local status <All keys matched successfully>
2024-08-20 13:25:55,392 - video_to_video - INFO - Build diffusion with GaussianDiffusion
2024-08-20 13:26:16,092 - video_to_video - INFO - input video path: inputs/000000.mp4
2024-08-20 13:26:16,093 - video_to_video - INFO - text: Wide-angle aerial shot at dawn,soft morning light casting long shadows,an elderly man walking his dog through a quiet,foggy park,trees and benches in the background,peaceful and serene atmosphere
2024-08-20 13:26:16,156 - video_to_video - INFO - input frames length: 49
2024-08-20 13:26:16,156 - video_to_video - INFO - input fps: 8.0
2024-08-20 13:26:16,156 - video_to_video - INFO - target_fps: 24.0
2024-08-20 13:26:16,311 - video_to_video - INFO - input resolution: (480, 720)
2024-08-20 13:26:16,312 - video_to_video - INFO - target resolution: (1320, 1982)
2024-08-20 13:26:16,312 - video_to_video - INFO - noise augmentation: 250
2024-08-20 13:26:16,312 - video_to_video - INFO - scale s is set to: 8
2024-08-20 13:26:16,399 - video_to_video - INFO - video_data shape: torch.Size([145, 3, 1320, 1982])
/share/home/zyx/Code/VEnhancer/video_to_video/video_to_video_model.py:108: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with amp.autocast(enabled=True):
2024-08-20 13:27:19,605 - video_to_video - INFO - step: 0
2024-08-20 13:30:12,020 - video_to_video - INFO - step: 1
2024-08-20 13:33:04,956 - video_to_video - INFO - step: 2
2024-08-20 13:35:58,691 - video_to_video - INFO - step: 3
2024-08-20 13:38:51,254 - video_to_video - INFO - step: 4
2024-08-20 13:41:44,150 - video_to_video - INFO - step: 5
2024-08-20 13:44:37,017 - video_to_video - INFO - step: 6
2024-08-20 13:47:30,037 - video_to_video - INFO - step: 7
2024-08-20 13:50:22,838 - video_to_video - INFO - step: 8
2024-08-20 13:53:15,844 - video_to_video - INFO - step: 9
2024-08-20 13:56:08,657 - video_to_video - INFO - step: 10
2024-08-20 13:59:01,648 - video_to_video - INFO - step: 11
2024-08-20 14:01:54,541 - video_to_video - INFO - step: 12
2024-08-20 14:04:47,488 - video_to_video - INFO - step: 13
2024-08-20 14:10:13,637 - video_to_video - INFO - sampling, finished.
```

Running on a single A100, enhancing each 6-second video produced by CogVideoX with the default configuration consumes 60GB of VRAM and takes 40 to 50 minutes.