358 Commits

Author SHA1 Message Date
OleehyO
f731c35f70 Add unload_model function 2025-01-03 08:21:27 +00:00
OleehyO
a88c1ede69 feat(args): add validation for training resolution
- Add validation check to ensure number of frames is multiple of 8
- Add format validation for train_resolution string (frames x height x width)
2025-01-02 03:12:09 +00:00
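A minimal sketch of the two checks this commit message describes, not the repository's actual code. The function name `parse_train_resolution` and the error wording are hypothetical, and the rule is taken literally from the message: the string has the form frames x height x width, and the frame count is a multiple of 8.

```python
import re


def parse_train_resolution(value: str) -> tuple[int, int, int]:
    """Parse a 'frames x height x width' string such as '48x480x720'.

    Hypothetical helper: the name and messages are illustrative only.
    """
    # Format check: exactly three positive integers separated by 'x'.
    if not re.fullmatch(r"\d+x\d+x\d+", value):
        raise ValueError(
            f"train_resolution must look like 'frames x height x width', got {value!r}"
        )
    frames, height, width = (int(part) for part in value.split("x"))
    # Frame-count check, as stated in the commit message.
    if frames % 8 != 0:
        raise ValueError(f"number of frames must be a multiple of 8, got {frames}")
    return frames, height, width
```

Calling `parse_train_resolution("48x480x720")` returns the three integers, while a malformed string or a frame count such as 49 raises `ValueError`.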
OleehyO
362b7bf273 docs: update README in multiple languages 2025-01-02 03:07:34 +00:00
OleehyO
cf2fff7e55 Merge remote-tracking branch 'upstream/main' into dev 2025-01-01 16:03:51 +00:00
OleehyO
7fa1bb48be refactor: remove deprecated training scripts 2025-01-01 15:56:14 +00:00
OleehyO
48ad178818 Reorganize training script arguments 2025-01-01 15:52:39 +00:00
三洋三洋
6ef15dd2a5 docs: update TOC and add friendly link in README files
- Update table of contents in README.md, README_ja.md and README_zh.md
- Add friendly link section to all README files
2025-01-01 15:10:55 +00:00
OleehyO
6e79472417 feat: add training launch scripts for I2V and T2V models
Add two shell scripts to simplify model training:
- accelerate_train_i2v.sh: Launch script for Image-to-Video training
- accelerate_train_t2v.sh: Launch script for Text-to-Video training

Both scripts provide comprehensive configurations for:
- Model settings
- Data pipeline
- Training parameters
- System resources
- Checkpointing
- Validation
2025-01-01 15:10:55 +00:00
OleehyO
26b87cd4ff feat(args): add validation and arg interface for training parameters
- Add field validators for model type and validation settings
- Implement command line argument parsing with argparse
- Add type hints and documentation for training parameters
- Support configuration of model, training, and validation parameters
2025-01-01 15:10:55 +00:00
OleehyO
04a60e7435 Change logger name to trainer 2025-01-01 15:10:55 +00:00
OleehyO
a001842834 feat: implement CogVideoX trainers for I2V and T2V tasks
Add and refactor trainers for CogVideoX model variants:
- Implement CogVideoXT2VLoraTrainer for text-to-video generation
- Refactor CogVideoXI2VLoraTrainer for image-to-video generation

Both trainers support LoRA fine-tuning with proper handling of:
- Model components loading and initialization
- Video encoding and batch collation
- Loss computation with noise prediction
- Validation step for generation
2025-01-01 15:10:54 +00:00
OleehyO
91d79fd9a4 feat: add schemas module for configuration and state management
Add Pydantic models to handle:
- CLI arguments and configuration (Args)
- Model components and pipeline (Components)
- Training state and parameters (State)
2025-01-01 15:10:54 +00:00
OleehyO
45d40450a1 refactor: simplify dataset implementation and add latent precomputation
- Replace bucket-based dataset with simpler resize-based implementation
- Add video latent precomputation during dataset initialization
- Improve code readability and user experience
- Remove complexity of bucket sampling for better maintainability

This change makes the codebase more straightforward and easier to use while
maintaining functionality through resize-based video processing.
2025-01-01 15:10:54 +00:00
OleehyO
6eae5c201e feat: add latent caching for video encodings
- Add caching mechanism to store VAE-encoded video latents to disk
- Cache latents in a "latent" subdirectory alongside video files
- Skip re-encoding when cached latent file exists
- Add logging for successful cache saves
- Minor code cleanup and formatting improvements

This change improves training efficiency by avoiding redundant video encoding operations.
2025-01-01 15:10:42 +00:00
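The caching behaviour listed in this commit can be sketched as follows, under stated assumptions: the helper name `load_or_encode`, the `.bin` file extension, and the `encode` callable (standing in for the VAE encoder and returning raw bytes) are all hypothetical. Only the "latent" subdirectory, the skip-if-cached rule, and the log message on save come from the commit message.

```python
import logging
from pathlib import Path
from typing import Callable

logger = logging.getLogger("trainer")


def load_or_encode(video_path: Path, encode: Callable[[Path], bytes]) -> bytes:
    """Return the latent for a video, encoding it only if no cache file exists.

    Hypothetical helper: `encode` is a placeholder for the real VAE encoding step.
    """
    # Cache lives in a "latent" subdirectory alongside the video file.
    cache_dir = video_path.parent / "latent"
    cache_file = cache_dir / (video_path.stem + ".bin")

    # Skip re-encoding when a cached latent already exists.
    if cache_file.exists():
        return cache_file.read_bytes()

    latent = encode(video_path)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(latent)
    logger.info("Saved latent cache to %s", cache_file)
    return latent
```

With this shape, the second call for the same video reads from disk and never invokes `encode`, which is where the training-efficiency gain mentioned in the message would come from.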
OleehyO
2a6cca0656 Add type conversion and validation checks 2025-01-01 15:10:42 +00:00
OleehyO
fa4659fb2c feat(trainer): add validation functionality to Trainer class
Add validation capabilities to the Trainer class including:
- Support for validating images and videos during training
- Periodic validation based on validation_steps parameter
- Artifact logging to wandb for validation results
- Memory tracking during validation process
2025-01-01 15:10:41 +00:00
OleehyO
6971364591 Export file_utils.py 2025-01-01 15:10:41 +00:00
OleehyO
60f6a3d7ee feat: add base trainer implementation and training script
- Add Trainer base class with core training loop functionality
- Implement distributed training setup with Accelerate
- Add training script with model/trainer initialization
- Support LoRA fine-tuning with checkpointing and validation
2025-01-01 15:10:41 +00:00
OleehyO
a505f2e312 Add constants.py 2025-01-01 15:10:40 +00:00
OleehyO
78f655a9a4 Add utils 2025-01-01 15:10:40 +00:00
OleehyO
85e00a1082 feat(models): add scaffolding 2025-01-01 15:10:40 +00:00
OleehyO
918ebb5a54 feat(datasets): implement video dataset modules
- Add dataset implementations for text-to-video and image-to-video
- Include bucket sampler for efficient batch processing
- Add utility functions for data processing
- Create dataset package structure with proper initialization
2025-01-01 15:10:40 +00:00
OleehyO
e3f6def234 feat: add video frame extraction tool
Add utility script to extract first frames from videos, helping users convert T2V datasets to I2V format
2025-01-01 15:10:39 +00:00
OleehyO
7b282246dd chore: remove unused configuration files after refactoring
Delete accelerate configs, deepspeed config and host file that are no longer needed
2025-01-01 15:10:39 +00:00
OleehyO
5cb9303286 chore: update .gitignore
- Add new ignore patterns for dataset and model directories
- Update rules for development files
2025-01-01 15:10:32 +00:00
OleehyO
ba85627577 [docs] improve help messages in argument parser
Fix and clarify help documentation in parser.add_argument() to better describe command-line arguments.
2025-01-01 15:10:31 +00:00
OleehyO
2508c8353b [bugfix] fix specific resolution setting
Different models use different resolutions. For example, the optimal generation resolution for the CogVideoX1.5 series models is 1360x768, whereas for CogVideoX the best resolution is 720x480.
2025-01-01 15:10:31 +00:00
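One way to picture a per-model resolution check is the sketch below. It is an illustration only: the dictionary keys, the function name `check_resolution`, and the warning text are hypothetical, and only the two resolutions (1360x768 for the CogVideoX1.5 series, 720x480 for CogVideoX) come from the commit message.

```python
import warnings

# Resolutions (width, height) are taken from the commit message.
# The keys are hypothetical labels, not identifiers from the repository.
RECOMMENDED_RESOLUTION = {
    "cogvideox1.5": (1360, 768),
    "cogvideox": (720, 480),
}


def check_resolution(model_family: str, width: int, height: int) -> bool:
    """Return True if (width, height) matches the recommended resolution.

    Emits a warning and returns False otherwise.
    """
    expected = RECOMMENDED_RESOLUTION[model_family]
    if (width, height) != expected:
        warnings.warn(
            f"{model_family} works best at {expected[0]}x{expected[1]}, "
            f"got {width}x{height}"
        )
        return False
    return True
```

Warning instead of raising keeps non-default resolutions usable while still flagging that output quality may differ from the recommended setting.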
Gforky
48ac9c1066 [fix]fix typo in train_cogvideox_image_to_video_lora.py 2025-01-01 15:10:30 +00:00
Zheng Guang Cong
21693ca770 fix bugs of image-to-video without image-condition 2025-01-01 15:10:30 +00:00
三洋三洋
a6e611e354 docs: update TOC and add friendly link in README files
- Update table of contents in README.md, README_ja.md and README_zh.md
- Add friendly link section to all README files
2024-12-27 19:37:08 +08:00
Yuxuan.Zhang
7935bd58a1
Merge pull request #615 from THUDM/CogVideoX_dev
Cog video x dev
2024-12-19 12:57:56 +08:00
OleehyO
1811c50e73 [docs] improve help messages in argument parser
Fix and clarify help documentation in parser.add_argument() to better describe command-line arguments.
2024-12-18 12:30:13 +00:00
OleehyO
92a589240f [bugfix] fix specific resolution setting
Different models use different resolutions. For example, the optimal generation resolution for the CogVideoX1.5 series models is 1360x768, whereas for CogVideoX the best resolution is 720x480.
2024-12-18 12:25:43 +00:00
OleehyO
7add8f8437
Merge pull request #607 from THUDM/CogVideoX_dev
Add resolution warning
2024-12-17 09:58:10 +08:00
OleehyO
cfaca91cde Merge remote-tracking branch 'upstream/main' into dev 2024-12-16 11:38:26 +00:00
OleehyO
d3a7d2dc91 Add resolution warning 2024-12-16 11:34:51 +00:00
Yuxuan.Zhang
46098f446b
Merge pull request #603 from Gforky/fix-demo-issue
[fix]fix typo in train_cogvideox_image_to_video_lora.py
2024-12-15 22:00:41 +08:00
Gforky
5a03e6fa79 [fix]fix typo in train_cogvideox_image_to_video_lora.py 2024-12-14 16:12:57 +08:00
Yuxuan.Zhang
1605e95033
Merge pull request #599 from THUDM/CogVideoX_dev
Cog video x dev
2024-12-13 15:03:48 +08:00
OleehyO
7b4c9db6d9 Fix for CogVideoX-{2B,5B}
When loading CogVideoX-{2B,5B}, `patch_size_t` is None,
which causes a failure in the `prepare_rotary_position_embeddings` function.
2024-12-13 04:02:27 +00:00
OleehyO
36f1333788 Fix for deepspeed training 2024-12-13 04:02:26 +00:00
OleehyO
4d1b9fd166 Fix for Disney video dataset 2024-12-13 04:02:21 +00:00
OleehyO
3ff9d3049d docs: change "read this in English" to "中文阅读"
Update README.md to use Chinese text for language switch link
2024-12-11 05:10:28 +00:00
Yuxuan.Zhang
496e220463
Merge pull request #585 from ZGCTroy/patch-1
fix bugs of image-to-video without image-condition
2024-12-08 19:31:59 +08:00
Zheng Guang Cong
a46d762cd9
fix bugs of image-to-video without image-condition 2024-12-06 20:14:43 +08:00
Yuxuan.Zhang
87ccd38cea
Merge pull request #567 from THUDM/main
New Finetune
2024-12-02 11:30:20 +08:00
Yuxuan.Zhang
5aa6d3a9ee
Merge pull request #515 from Gforky/fix_finetune_demo
[fix]fix deepspeed initialization issue in finetune examples
2024-12-02 11:29:42 +08:00
Yuxuan.Zhang
a094b34425
Merge pull request #565 from THUDM/CogVideoX_dev
Cog video x dev
2024-11-30 12:45:25 +08:00
zR
0fe46df21f new jobs of friendly link 2024-11-30 12:40:07 +08:00
Yuxuan.Zhang
f1a2b48974
Merge pull request #556 from THUDM/main
new announced
2024-11-27 12:11:12 +08:00