415 Commits

Author SHA1 Message Date
Yuxuan Zhang
aa12ed37f5
Merge branch 'main' into moviepy-v2 2025-01-20 21:46:07 +08:00
Yuxuan Zhang
c1ca70ba67
Merge pull request #654 from THUDM/CogVideoX_dev
Support SFT using ZeRO
2025-01-20 11:15:50 +08:00
OleehyO
bf73742c05 docs: enhance CLI demo documentation 2025-01-16 09:34:52 +00:00
OleehyO
bf9c351a10 deps: upgrade diffusers to >=0.32.1 2025-01-16 09:08:44 +00:00
OleehyO
0e78f20629 Merge remote-tracking branch 'upstream/CogVideoX_dev' into dev 2025-01-14 04:00:11 +00:00
Yuxuan Zhang
4615479b51 move to tools 2025-01-14 11:33:02 +08:00
Yuxuan Zhang
7993670957 zero_to_bf16 2025-01-14 11:31:25 +08:00
OleehyO
4878edd0cf fix: correct do_validation argument parsing 2025-01-13 12:48:21 +00:00
Yuxuan Zhang
78275b0480 add comment of bash scripts 2025-01-13 20:02:06 +08:00
OleehyO
455b44a7b5 chore: code cleanup and parameter optimization
- Remove redundant comments and debug information
- Adjust default parameters in training scripts
- Clean up code in lora_trainer and trainer implementations
2025-01-13 11:56:28 +00:00
OleehyO
954ba28d3c Merge remote-tracking branch 'upstream/CogVideoX_dev' into dev 2025-01-13 11:48:24 +00:00
OleehyO
4f1cc66815 fix: correct LoRA loading and resolution dimensions
- Fix LoRA loading by specifying 'transformer' component
- Swap width/height order in RESOLUTION_MAP to match actual usage
2025-01-13 10:49:46 +00:00
zR
1534bf33eb add pipeline 2025-01-12 19:27:21 +08:00
OleehyO
86a0226f80 Merge remote-tracking branch 'upstream/CogVideoX_dev' into dev 2025-01-12 08:52:07 +00:00
OleehyO
70c899f444 chore: update default training configurations 2025-01-12 08:50:15 +00:00
OleehyO
b362663679 fix: normalize image tensors in I2VDataset 2025-01-12 06:01:48 +00:00
OleehyO
30ba1085ff Merge remote-tracking branch 'upstream/main' into dev 2025-01-12 05:58:07 +00:00
OleehyO
3252614569 Add pydantic dependency 2025-01-12 05:56:24 +00:00
OleehyO
f66f1647e2
Merge pull request #657 from ZGCTroy/main
fix bug of i2v finetune
2025-01-12 13:55:12 +08:00
OleehyO
f5169385bd docs: add SFT support documentation in multilingual README 2025-01-12 05:53:13 +00:00
OleehyO
795dd144a4 Rename lora training scripts as ddp 2025-01-12 05:36:32 +00:00
OleehyO
fdb9820949 feat: support DeepSpeed ZeRO-3 and optimize peak memory usage
- Add DeepSpeed ZeRO-3 configuration support
- Optimize memory usage during training
- Rename training scripts to reflect ZeRO usage
- Update related configuration files and trainers
2025-01-12 05:33:56 +00:00
Zheng Guang Cong
09a49d3546
fix bug of i2v; video is already 0-255
video is already 0-255 and should not be multiplied by 255 again
2025-01-11 17:29:27 +08:00
Zheng Guang Cong
cd861bbe1e
Update i2v_dataset.py
image should also be transformed to [-1, 1]
2025-01-11 17:24:35 +08:00
Zheng Guang Cong
35383e2db3
fix potential bug of i2v
Image value is in [0, 255] and should be transformed into [-1, 1], similar to video.
2025-01-11 17:08:25 +08:00
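
The three i2v fixes above all concern value ranges: pixels arrive in [0, 255] and are expected in [-1, 1]. A minimal sketch of that mapping follows, written in plain Python on a flat list of values. The function name `to_signed_unit_range` is hypothetical and is not taken from `i2v_dataset.py`, which presumably works on tensors.

```python
def to_signed_unit_range(pixels):
    """Map pixel values from [0, 255] to [-1.0, 1.0].

    Illustrative helper only; the name and the list-based input are
    assumptions, not the repository's actual implementation.
    """
    # 127.5 is half of 255, so 0 -> -1.0, 127.5 -> 0.0, 255 -> 1.0.
    return [p / 127.5 - 1.0 for p in pixels]


if __name__ == "__main__":
    print(to_signed_unit_range([0, 127.5, 255]))
```

If the input were already scaled to [0, 1], the same target range would instead be reached with `p * 2.0 - 1.0`. Applying the wrong variant is the kind of double-scaling error the "already 0-255" commit describes.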
zR
7dc8516bcb add comment as #653 2025-01-11 12:53:32 +08:00
OleehyO
2f275e82b5 Merge remote-tracking branch 'upstream/CogVideoX_dev' into dev 2025-01-11 02:16:09 +00:00
OleehyO
caa24bdc36 feat: add SFT support with ZeRO optimization strategies
- Add SFT (Supervised Fine-Tuning) trainers for all model variants:
  - CogVideoX I2V and T2V
  - CogVideoX-1.5 I2V and T2V
- Add DeepSpeed ZeRO configuration files:
  - ZeRO-2 with and without CPU offload
  - ZeRO-3 with and without CPU offload
- Add base accelerate config for distributed training
- Update trainer.py to support SFT training mode

This enables full-parameter fine-tuning with memory-efficient distributed training using DeepSpeed ZeRO optimization.
2025-01-11 02:13:32 +00:00
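
As a rough illustration of the four ZeRO variants listed above, the sketch below builds a DeepSpeed-style config dictionary for a given stage and offload setting. It uses only the standard DeepSpeed keys `zero_optimization`, `stage`, `offload_optimizer`, `offload_param` and `bf16`. The helper name `zero_config` and the choice to enable bf16 are assumptions; the configuration files actually added by this commit may differ.

```python
import json


def zero_config(stage: int, cpu_offload: bool) -> dict:
    """Build a minimal DeepSpeed ZeRO config dict (illustrative sketch).

    `zero_config` is a hypothetical helper, not part of the repository.
    """
    if stage not in (2, 3):
        raise ValueError("this sketch only covers ZeRO stage 2 and 3")

    zero = {"stage": stage}
    if cpu_offload:
        # Optimizer state can be offloaded to CPU in both stages.
        zero["offload_optimizer"] = {"device": "cpu"}
        if stage == 3:
            # Parameter offload is only available with ZeRO-3.
            zero["offload_param"] = {"device": "cpu"}

    # bf16 is an assumption made for this example.
    return {"bf16": {"enabled": True}, "zero_optimization": zero}


if __name__ == "__main__":
    print(json.dumps(zero_config(stage=3, cpu_offload=True), indent=2))
```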
OleehyO
e213b6c083 fix: pad latent frames to match patch_size_t requirements 2025-01-11 02:08:07 +00:00
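
The commit message does not spell out the padding rule. Assuming the requirement is that the number of latent frames be a multiple of `patch_size_t`, the number of frames to add can be computed as sketched below. The function name `frames_to_pad` is hypothetical, and whether padding goes at the start or the end of the sequence is not stated in the log.

```python
def frames_to_pad(num_latent_frames: int, patch_size_t: int) -> int:
    """Return how many latent frames must be added so that the total
    is divisible by patch_size_t (assumed requirement, see lead-in)."""
    if patch_size_t <= 0:
        raise ValueError("patch_size_t must be positive")
    # Python's modulo of a negative number is non-negative, so this
    # yields 0 when the count is already a multiple of patch_size_t.
    return (-num_latent_frames) % patch_size_t


if __name__ == "__main__":
    for n in (12, 13):
        print(n, "->", n + frames_to_pad(n, patch_size_t=2))
```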
Erfan Asgari
70ca65300c
upgrade to moviepy v2 2025-01-11 00:18:24 +03:30
OleehyO
f6d722cec7 fix: remove copying first video frame as conditioning image 2025-01-09 15:52:51 +00:00
OleehyO
07766001f6 feat(dataset): pad short videos by repeating last frame
When loading videos with fewer frames than max_num_frames, repeat the last
frame to reach the required length instead of failing. This ensures consistent
tensor dimensions across the dataset while preserving as much original video
content as possible.
2025-01-08 02:14:56 +00:00
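
The padding behaviour described above can be sketched as follows, using a plain Python list to stand in for the frame sequence. The name `pad_with_last_frame` is hypothetical; the dataset code itself presumably operates on tensors.

```python
def pad_with_last_frame(frames, max_num_frames):
    """Repeat the last frame until the clip has max_num_frames frames.

    Clips that are already long enough are returned unchanged;
    truncation is out of scope for this sketch.
    """
    if not frames:
        raise ValueError("cannot pad an empty video")
    missing = max_num_frames - len(frames)
    if missing <= 0:
        return list(frames)
    # All original frames are kept; only the tail is duplicated.
    return list(frames) + [frames[-1]] * missing


if __name__ == "__main__":
    print(pad_with_last_frame(["f0", "f1"], max_num_frames=4))
```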
Yuxuan Zhang
8f1829f1cd
Merge pull request #642 from THUDM/CogVideoX_dev
New Lora 20250108
2025-01-08 09:51:39 +08:00
zR
045e1b308b readme 2025-01-08 09:50:08 +08:00
OleehyO
249fadfb76 docs: add hardware requirements for model training
Add a table in README files showing hardware requirements for training
different CogVideoX models, including:
- Memory requirements for each model variant
- Supported training types (LoRA)
- Training resolutions
- Mixed precision settings

Updated in all language versions (EN/ZH/JA).
2025-01-08 01:39:37 +00:00
OleehyO
10de04fc08 perf: cast VAE and text encoder to target dtype before precomputing cache
Before precomputing the latent cache and text embeddings, cast the VAE and
text encoder to the target training dtype (fp16/bf16) instead of keeping them
in fp32. This reduces memory usage during the precomputation phase.

The change occurs in prepare_dataset() where the models are moved to device
and cast to weight_dtype before being used to generate the cache.
2025-01-08 01:38:13 +00:00
OleehyO
0e21d41b12 Merge remote-tracking branch 'upstream/CogVideoX_dev' into dev 2025-01-07 09:51:48 +00:00
OleehyO
392e37021a Add video path to error message for better debugging 2025-01-07 09:50:21 +00:00
zR
11935892ae remove --image_column 2025-01-07 16:37:11 +08:00
OleehyO
ee1f666206 docs: update READMEs with auto first-frame extraction feature 2025-01-07 06:45:10 +00:00
OleehyO
e084a4a270 feat: auto-extract first frames as conditioning images for i2v model
When training i2v models without specifying image_column, automatically extract
and use first frames from training videos as conditioning images. This includes:

- Add load_images_from_videos() utility function to extract and cache first frames
- Update BaseI2VDataset to support auto-extraction when image_column is None
- Add validation and warning message in Args schema for i2v without image_column

The first frames are extracted once and cached to avoid repeated video loading.
2025-01-07 06:43:26 +00:00
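
The extract-once-then-cache idea in this commit can be sketched as below. This is not the repository's `load_images_from_videos()`, whose signature is not shown in the log. The helper `cache_first_frames`, its parameters, and the injected `extract_first_frame` callable are all hypothetical, and the actual video decoding is left to the caller.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def cache_first_frames(
    video_paths: Iterable[Path],
    cache_dir: Path,
    extract_first_frame: Callable[[Path], bytes],
) -> List[Path]:
    """Return one cached image path per video, decoding each video at
    most once. `extract_first_frame` is a caller-supplied function that
    returns the encoded first frame as bytes (an assumption of this sketch).
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    image_paths = []
    for video_path in video_paths:
        # Cache key is the file stem; videos sharing a stem would collide.
        image_path = cache_dir / (Path(video_path).stem + ".png")
        if not image_path.exists():
            image_path.write_bytes(extract_first_frame(Path(video_path)))
        image_paths.append(image_path)
    return image_paths
```

On a second call with the same `cache_dir`, the extractor is not invoked for videos whose image already exists, which is the repeated-loading cost the commit message says it avoids.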
OleehyO
96e511b413 feat: add warning for fp16 mixed precision training 2025-01-07 06:00:38 +00:00
OleehyO
36427274d6 style: format import statements across finetune module 2025-01-07 05:54:52 +00:00
zR
1789f07256 format and check fp16 for cogvideox2b 2025-01-07 13:16:18 +08:00
OleehyO
1b886326b2 Merge remote-tracking branch 'upstream/CogVideoX_dev' into dev 2025-01-06 10:47:56 +00:00
OleehyO
9157e0cbc8 Adapt dataset for text embeddings and add noise padding
- Add text embedding support in dataset collation
- Pad 2 random noise frames at the beginning of latent space during training
2025-01-06 10:44:58 +00:00
OleehyO
49dc370de6 fix: remove pipeline hooks after validation
- Add pipe.remove_all_hooks() after validation to prevent memory leaks
- Clean up validation pipeline properly to avoid potential issues in subsequent training steps
2025-01-04 06:21:17 +00:00
OleehyO
93b906b3fb docs: clarify train_frames includes padding frame
Add docstring to train_frames field in State schema to explicitly indicate
that it includes one image padding frame
2025-01-04 06:20:25 +00:00
OleehyO
7e1ac76847 feat(cogvideox): add prompt embedding caching support
This change enables caching of prompt embeddings in the CogVideoX text-to-video
LoRA trainer, which can improve training efficiency by avoiding redundant text
encoding operations.
2025-01-04 06:17:56 +00:00
OleehyO
66e4ba2592 fix(cogvideox): add prompt embedding caching and fix frame padding
- Add support for cached prompt embeddings in dataset
- Fix bug where first frame wasn't properly padded in latent space
2025-01-04 06:16:42 +00:00