338 Commits

Author SHA1 Message Date
OleehyO
04a60e7435 Change logger name to trainer 2025-01-01 15:10:55 +00:00
OleehyO
a001842834 feat: implement CogVideoX trainers for I2V and T2V tasks
Add and refactor trainers for CogVideoX model variants:
- Implement CogVideoXT2VLoraTrainer for text-to-video generation
- Refactor CogVideoXI2VLoraTrainer for image-to-video generation

Both trainers support LoRA fine-tuning with proper handling of:
- Model components loading and initialization
- Video encoding and batch collation
- Loss computation with noise prediction
- Validation step for generation
2025-01-01 15:10:54 +00:00
OleehyO
91d79fd9a4 feat: add schemas module for configuration and state management
Add Pydantic models to handle:
- CLI arguments and configuration (Args)
- Model components and pipeline (Components)
- Training state and parameters (State)
2025-01-01 15:10:54 +00:00
OleehyO
45d40450a1 refactor: simplify dataset implementation and add latent precomputation
- Replace bucket-based dataset with simpler resize-based implementation
- Add video latent precomputation during dataset initialization
- Improve code readability and user experience
- Remove complexity of bucket sampling for better maintainability

This change makes the codebase more straightforward and easier to use while
maintaining functionality through resize-based video processing.
2025-01-01 15:10:54 +00:00
OleehyO
6eae5c201e feat: add latent caching for video encodings
- Add caching mechanism to store VAE-encoded video latents to disk
- Cache latents in a "latent" subdirectory alongside video files
- Skip re-encoding when cached latent file exists
- Add logging for successful cache saves
- Minor code cleanup and formatting improvements

This change improves training efficiency by avoiding redundant video encoding operations.
2025-01-01 15:10:42 +00:00
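The caching behaviour described in commit 6eae5c201e can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function name `load_or_encode_latent`, the `encode_fn` callable standing in for the VAE encoder, and the pickle-based `.pkl` file format are all assumptions. Only the "latent" subdirectory, the skip-on-existing-cache rule, and the log message on save come from the commit message.

```python
import logging
import pickle
from pathlib import Path
from typing import Any, Callable

logger = logging.getLogger("trainer")


def load_or_encode_latent(video_path: Path, encode_fn: Callable[[Path], Any]) -> Any:
    """Return the latent for `video_path`, encoding only on a cache miss.

    `encode_fn` is a hypothetical stand-in for the VAE encoding step.
    """
    video_path = Path(video_path)
    # Cached latents live in a "latent" subdirectory alongside the video file.
    cache_dir = video_path.parent / "latent"
    # The ".pkl" extension and pickle format are assumptions for this sketch.
    cache_file = cache_dir / (video_path.stem + ".pkl")

    # Skip re-encoding when a cached latent file already exists.
    if cache_file.exists():
        with cache_file.open("rb") as f:
            return pickle.load(f)

    latent = encode_fn(video_path)
    cache_dir.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as f:
        pickle.dump(latent, f)
    logger.info("Saved latent to %s", cache_file)
    return latent
```

Calling the function twice on the same video should invoke `encode_fn` only once; the second call reads the file written by the first.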
OleehyO
2a6cca0656 Add type conversion and validation checks 2025-01-01 15:10:42 +00:00
OleehyO
fa4659fb2c feat(trainer): add validation functionality to Trainer class
Add validation capabilities to the Trainer class including:
- Support for validating images and videos during training
- Periodic validation based on validation_steps parameter
- Artifact logging to wandb for validation results
- Memory tracking during validation process
2025-01-01 15:10:41 +00:00
OleehyO
6971364591 Export file_utils.py 2025-01-01 15:10:41 +00:00
OleehyO
60f6a3d7ee feat: add base trainer implementation and training script
- Add Trainer base class with core training loop functionality
- Implement distributed training setup with Accelerate
- Add training script with model/trainer initialization
- Support LoRA fine-tuning with checkpointing and validation
2025-01-01 15:10:41 +00:00
OleehyO
a505f2e312 Add constants.py 2025-01-01 15:10:40 +00:00
OleehyO
78f655a9a4 Add utils 2025-01-01 15:10:40 +00:00
OleehyO
85e00a1082 feat(models): add scaffolding 2025-01-01 15:10:40 +00:00
OleehyO
918ebb5a54 feat(datasets): implement video dataset modules
- Add dataset implementations for text-to-video and image-to-video
- Include bucket sampler for efficient batch processing
- Add utility functions for data processing
- Create dataset package structure with proper initialization
2025-01-01 15:10:40 +00:00
OleehyO
e3f6def234 feat: add video frame extraction tool
Add utility script to extract first frames from videos, helping users convert T2V datasets to I2V format
2025-01-01 15:10:39 +00:00
OleehyO
7b282246dd chore: remove unused configuration files after refactoring
Delete accelerate configs, deepspeed config and host file that are no longer needed
2025-01-01 15:10:39 +00:00
OleehyO
5cb9303286 chore: update .gitignore
- Add new ignore patterns for dataset and model directories
- Update rules for development files
2025-01-01 15:10:32 +00:00
OleehyO
ba85627577 [docs] improve help messages in argument parser
Fix and clarify help documentation in parser.add_argument() to better describe command-line arguments.
2025-01-01 15:10:31 +00:00
OleehyO
2508c8353b [bugfix] fix specific resolution setting
Different models use different resolutions. For example, the optimal generation resolution is 1360x768 for the CogVideoX1.5 series models, but 720x480 for CogVideoX.
2025-01-01 15:10:31 +00:00
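One way to picture the per-model resolution rule from commit 2508c8353b is a small lookup. This is only a sketch: the helper name `recommended_resolution` and the substring matching on the model name are assumptions, and the actual fix may be structured differently. The two resolutions are the ones stated in the commit message, read here as width x height.

```python
from typing import Tuple


def recommended_resolution(model_name: str) -> Tuple[int, int]:
    """Return (width, height) for a model name; hypothetical helper."""
    name = model_name.lower()
    # Check the more specific "1.5" series first, since its name
    # also contains the plain "cogvideox" substring.
    if "cogvideox1.5" in name:
        return (1360, 768)
    if "cogvideox" in name:
        return (720, 480)
    raise ValueError(f"Unknown model: {model_name!r}")
```

Matching the 1.5 series before the generic name avoids returning 720x480 for a CogVideoX1.5 model.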
Gforky
48ac9c1066 [fix]fix typo in train_cogvideox_image_to_video_lora.py 2025-01-01 15:10:30 +00:00
Zheng Guang Cong
21693ca770 fix bugs in image-to-video without image condition 2025-01-01 15:10:30 +00:00
OleehyO
d3a7d2dc91 Add resolution warning 2024-12-16 11:34:51 +00:00
OleehyO
7b4c9db6d9 Fix for CogVideoX-{2B,5B}
When loading CogVideoX-{2B,5B}, `patch_size_t` is None,
which results in an error in the `prepare_rotary_position_embeddings` function.
2024-12-13 04:02:27 +00:00
OleehyO
36f1333788 Fix for deepspeed training 2024-12-13 04:02:26 +00:00
OleehyO
4d1b9fd166 Fix for Disney video dataset 2024-12-13 04:02:21 +00:00
OleehyO
3ff9d3049d docs: change "read this in English" to "中文阅读"
Update README.md to use Chinese text for language switch link
2024-12-11 05:10:28 +00:00
Yuxuan.Zhang
87ccd38cea
Merge pull request #567 from THUDM/main
New Finetune
2024-12-02 11:30:20 +08:00
Yuxuan.Zhang
5aa6d3a9ee
Merge pull request #515 from Gforky/fix_finetune_demo
[fix]fix deepspeed initialization issue in finetune examples
2024-12-02 11:29:42 +08:00
Yuxuan.Zhang
a094b34425
Merge pull request #565 from THUDM/CogVideoX_dev
Cog video x dev
2024-11-30 12:45:25 +08:00
zR
0fe46df21f new jobs of friendly link 2024-11-30 12:40:07 +08:00
Yuxuan.Zhang
f1a2b48974
Merge pull request #556 from THUDM/main
new announced
2024-11-27 12:11:12 +08:00
Yuxuan.Zhang
d82922cc79
Merge pull request #538 from spacegoing/fix_rope_finetune_shape
[Fix] fix rope temporal patch size
2024-11-23 21:24:39 +08:00
spacegoing
2fb763d25f [Fix] fix rope temporal patch size 2024-11-21 16:26:45 +00:00
luwen.miao
ac2f2c78f7 [fix]fix deepspeed initialization issue in finetune examples 2024-11-18 09:49:31 +00:00
Yuxuan.Zhang
2fdc59c3ce
Merge pull request #507 from THUDM/CogVideoX_dev
diffusers version
2024-11-17 21:54:47 +08:00
zR
17996f11f8 update 2024-11-16 10:06:22 +08:00
Yuxuan.Zhang
5e3e3aabe0
Merge pull request #500 from THUDM/main
Merge
2024-11-13 21:15:49 +08:00
zR
e7a35ea33b update friendly link 2024-11-13 17:06:16 +08:00
zR
cd5ceca22b fix resolution docs 2024-11-12 00:41:23 +08:00
zR
bb2cb130a0 add width and height 2024-11-12 00:17:19 +08:00
zR
2151a3bdfb update with diffusers 2024-11-11 22:41:28 +08:00
zR
68d93ce8fc fix 2024-11-09 22:51:39 +08:00
zR
155456befa update 2024-11-09 22:49:03 +08:00
zR
2475902027 friendly link 2024-11-09 22:43:02 +08:00
zR
fb806eecce update table 2024-11-09 22:29:36 +08:00
zR
c8c7b62aa1 update diffusers code 2024-11-09 22:07:32 +08:00
Yuxuan.Zhang
e2987ff565
Merge pull request #474 from THUDM/CogVideoX_dev
Fix #472 #473
2024-11-09 00:18:01 +08:00
zR
a8205b575d Update cp_enc_dec.py 2024-11-08 23:27:44 +08:00
zR
e7bcecf947 remove wrong fake_cp 2024-11-08 22:54:17 +08:00
zR
d8ee013842 add 10 second comment 2024-11-08 22:31:39 +08:00
zR
e43a7645fd Update autoencoder.py 2024-11-08 21:49:02 +08:00