60 Commits

Author SHA1 Message Date
OleehyO
0e21d41b12 Merge remote-tracking branch 'upstream/CogVideoX_dev' into dev 2025-01-07 09:51:48 +00:00
OleehyO
392e37021a Add video path to error message for better debugging 2025-01-07 09:50:21 +00:00
zR
11935892ae remove --image_column 2025-01-07 16:37:11 +08:00
OleehyO
ee1f666206 docs: update READMEs with auto first-frame extraction feature 2025-01-07 06:45:10 +00:00
OleehyO
e084a4a270 feat: auto-extract first frames as conditioning images for i2v model
When training i2v models without specifying image_column, automatically extract
and use first frames from training videos as conditioning images. This includes:

- Add load_images_from_videos() utility function to extract and cache first frames
- Update BaseI2VDataset to support auto-extraction when image_column is None
- Add validation and warning message in Args schema for i2v without image_column

The first frames are extracted once and cached to avoid repeated video loading.
2025-01-07 06:43:26 +00:00
OleehyO
96e511b413 feat: add warning for fp16 mixed precision training 2025-01-07 06:00:38 +00:00
OleehyO
36427274d6 style: format import statements across finetune module 2025-01-07 05:54:52 +00:00
zR
1789f07256 format and check fp16 for cogvideox2b 2025-01-07 13:16:18 +08:00
OleehyO
9157e0cbc8 Adapt dataset for text embeddings and add noise padding
- Add text embedding support in dataset collation
- Pad 2 random noise frames at the beginning of latent space during training
2025-01-06 10:44:58 +00:00
OleehyO
49dc370de6 fix: remove pipeline hooks after validation
- Add pipe.remove_all_hooks() after validation to prevent memory leaks
- Clean up validation pipeline properly to avoid potential issues in subsequent training steps
2025-01-04 06:21:17 +00:00
OleehyO
93b906b3fb docs: clarify train_frames includes padding frame
Add docstring to train_frames field in State schema to explicitly indicate
that it includes one image padding frame
2025-01-04 06:20:25 +00:00
OleehyO
7e1ac76847 feat(cogvideox): add prompt embedding caching support
This change enables caching of prompt embeddings in the CogVideoX text-to-video
LoRA trainer, which can improve training efficiency by avoiding redundant text
encoding operations.
2025-01-04 06:17:56 +00:00
OleehyO
66e4ba2592 fix(cogvideox): add prompt embedding caching and fix frame padding
- Add support for cached prompt embeddings in dataset
- Fix bug where first frame wasn't properly padded in latent space
2025-01-04 06:16:42 +00:00
OleehyO
de5bef6611 feat(args): add train_resolution validation for video frames and resolution
- Add validation to ensure (frames - 1) is multiple of 8
- Add specific resolution check (480x720) for cogvideox-5b models
- Add error handling for invalid resolution format
2025-01-04 06:16:42 +00:00
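The checks listed in this commit could look roughly like the sketch below. The function name `parse_train_resolution`, the `require_480x720` flag, and the error messages are hypothetical. Only the rules themselves come from the commit message: three `x`-separated integers, `(frames - 1)` divisible by 8, and 480x720 for cogvideox-5b models.

```python
def parse_train_resolution(value: str, require_480x720: bool = False) -> tuple[int, int, int]:
    """Parse a 'frames x height x width' string such as '49x480x720' (hypothetical helper)."""
    parts = value.lower().split("x")
    if len(parts) != 3:
        raise ValueError(
            f"train_resolution must look like 'frames x height x width', got {value!r}"
        )
    try:
        frames, height, width = (int(p.strip()) for p in parts)
    except ValueError as exc:
        raise ValueError(f"train_resolution fields must be integers, got {value!r}") from exc

    # Rule from the commit message: (frames - 1) must be a multiple of 8.
    if (frames - 1) % 8 != 0:
        raise ValueError(f"(frames - 1) must be a multiple of 8, got frames={frames}")

    # Rule from the commit message: cogvideox-5b models expect 480x720.
    # How the real code detects a 5b model is not shown in the log, so a flag stands in here.
    if require_480x720 and (height, width) != (480, 720):
        raise ValueError(f"this model expects 480x720, got {height}x{width}")

    return frames, height, width
```

For example, `parse_train_resolution("49x480x720")` passes both checks, while `"48x480x720"` is rejected because 47 is not a multiple of 8.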
OleehyO
ffb6ee36b4 docs: update finetune documentation in all languages 2025-01-04 06:16:42 +00:00
OleehyO
c817e7f062 chore: update default training parameters for t2v and i2v scripts 2025-01-04 06:16:42 +00:00
OleehyO
e5b8f9a2ee feat: add caching for prompt embeddings
- Add caching for prompt embeddings
- Store cached files using safetensors format
- Add cache directory structure under data_root/cache
- Optimize memory usage by moving tensors to CPU after caching
- Add debug logging for cache hits
- Add info logging for cache writes

The caching system helps reduce redundant computation and memory usage during training by:
1. Caching prompt embeddings based on prompt text hash
2. Caching encoded video latents based on video filename
3. Moving tensors to CPU after caching to free GPU memory
2025-01-04 06:16:31 +00:00
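A minimal sketch of the hash-keyed caching idea described in this commit, using only the standard library. The names `prompt_cache_path` and `load_or_compute`, the `prompt_embeddings` subdirectory, and the choice of SHA-256 are assumptions; the commit message only says that embeddings are keyed by a hash of the prompt text, stored in safetensors format, and placed under `data_root/cache`. Serialization is left to caller-supplied `save` and `load` functions so the sketch does not depend on any tensor library.

```python
import hashlib
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def prompt_cache_path(data_root: Path, prompt: str) -> Path:
    """Map a prompt to a stable cache file path (hash function and layout are assumptions)."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return data_root / "cache" / "prompt_embeddings" / f"{digest}.safetensors"


def load_or_compute(
    path: Path,
    compute: Callable[[], T],
    save: Callable[[T, Path], object],
    load: Callable[[Path], T],
) -> T:
    """Return the cached value if the file exists, otherwise compute it and write it."""
    if path.exists():
        # Cache hit: skip the expensive computation entirely.
        return load(path)
    value = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    save(value, path)
    return value
```

In a real trainer, `compute` would run the text encoder, and `save`/`load` would write and read the tensor file. The same pattern would apply to video latents keyed by video filename.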
OleehyO
f731c35f70 Add unload_model function 2025-01-03 08:21:27 +00:00
OleehyO
a88c1ede69 feat(args): add validation for training resolution
- Add validation check to ensure number of frames is multiple of 8
- Add format validation for train_resolution string (frames x height x width)
2025-01-02 03:12:09 +00:00
OleehyO
362b7bf273 docs: update README in multiple languages 2025-01-02 03:07:34 +00:00
OleehyO
7fa1bb48be refactor: remove deprecated training scripts 2025-01-01 15:56:14 +00:00
OleehyO
48ad178818 Reorganize training script arguments 2025-01-01 15:52:39 +00:00
OleehyO
6e79472417 feat: add training launch scripts for I2V and T2V models
Add two shell scripts to simplify model training:
- accelerate_train_i2v.sh: Launch script for Image-to-Video training
- accelerate_train_t2v.sh: Launch script for Text-to-Video training

Both scripts provide comprehensive configurations for:
- Model settings
- Data pipeline
- Training parameters
- System resources
- Checkpointing
- Validation
2025-01-01 15:10:55 +00:00
OleehyO
26b87cd4ff feat(args): add validation and arg interface for training parameters
- Add field validators for model type and validation settings
- Implement command line argument parsing with argparse
- Add type hints and documentation for training parameters
- Support configuration of model, training, and validation parameters
2025-01-01 15:10:55 +00:00
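As an illustration only, an argparse interface with a model-type check and a validation-settings check might be sketched as follows. The flag names `--model_type`, `--do_validation`, and `--validation_steps`, the default values, and the specific cross-field rule are assumptions for this sketch; the log does not show the actual argument definitions.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical fine-tuning CLI; flag names are illustrative."""
    parser = argparse.ArgumentParser(description="Hypothetical fine-tuning CLI sketch")
    # Restrict the model type to the two task variants named in the log.
    parser.add_argument("--model_type", choices=["i2v", "t2v"], required=True)
    parser.add_argument("--train_resolution", type=str, default="49x480x720")
    parser.add_argument("--do_validation", action="store_true")
    parser.add_argument("--validation_steps", type=int, default=None)
    return parser


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse arguments and apply one cross-field check on the validation settings."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed rule: enabling validation requires saying how often to run it.
    if args.do_validation and args.validation_steps is None:
        parser.error("--validation_steps is required when --do_validation is set")
    return args
```

Calling `parse_args(["--model_type", "i2v"])` returns a namespace with validation disabled, while passing `--do_validation` without `--validation_steps` exits with a usage error.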
OleehyO
04a60e7435 Change logger name to trainer 2025-01-01 15:10:55 +00:00
OleehyO
a001842834 feat: implement CogVideoX trainers for I2V and T2V tasks
Add and refactor trainers for CogVideoX model variants:
- Implement CogVideoXT2VLoraTrainer for text-to-video generation
- Refactor CogVideoXI2VLoraTrainer for image-to-video generation

Both trainers support LoRA fine-tuning with proper handling of:
- Model components loading and initialization
- Video encoding and batch collation
- Loss computation with noise prediction
- Validation step for generation
2025-01-01 15:10:54 +00:00
OleehyO
91d79fd9a4 feat: add schemas module for configuration and state management
Add Pydantic models to handle:
- CLI arguments and configuration (Args)
- Model components and pipeline (Components)
- Training state and parameters (State)
2025-01-01 15:10:54 +00:00
OleehyO
45d40450a1 refactor: simplify dataset implementation and add latent precomputation
- Replace bucket-based dataset with simpler resize-based implementation
- Add video latent precomputation during dataset initialization
- Improve code readability and user experience
- Remove complexity of bucket sampling for better maintainability

This change makes the codebase more straightforward and easier to use while
maintaining functionality through resize-based video processing.
2025-01-01 15:10:54 +00:00
OleehyO
6eae5c201e feat: add latent caching for video encodings
- Add caching mechanism to store VAE-encoded video latents to disk
- Cache latents in a "latent" subdirectory alongside video files
- Skip re-encoding when cached latent file exists
- Add logging for successful cache saves
- Minor code cleanup and formatting improvements

This change improves training efficiency by avoiding redundant video encoding operations.
2025-01-01 15:10:42 +00:00
OleehyO
2a6cca0656 Add type conversion and validation checks 2025-01-01 15:10:42 +00:00
OleehyO
fa4659fb2c feat(trainer): add validation functionality to Trainer class
Add validation capabilities to the Trainer class including:
- Support for validating images and videos during training
- Periodic validation based on validation_steps parameter
- Artifact logging to wandb for validation results
- Memory tracking during validation process
2025-01-01 15:10:41 +00:00
OleehyO
6971364591 Export file_utils.py 2025-01-01 15:10:41 +00:00
OleehyO
60f6a3d7ee feat: add base trainer implementation and training script
- Add Trainer base class with core training loop functionality
- Implement distributed training setup with Accelerate
- Add training script with model/trainer initialization
- Support LoRA fine-tuning with checkpointing and validation
2025-01-01 15:10:41 +00:00
OleehyO
a505f2e312 Add constants.py 2025-01-01 15:10:40 +00:00
OleehyO
78f655a9a4 Add utils 2025-01-01 15:10:40 +00:00
OleehyO
85e00a1082 feat(models): add scaffolding 2025-01-01 15:10:40 +00:00
OleehyO
918ebb5a54 feat(datasets): implement video dataset modules
- Add dataset implementations for text-to-video and image-to-video
- Include bucket sampler for efficient batch processing
- Add utility functions for data processing
- Create dataset package structure with proper initialization
2025-01-01 15:10:40 +00:00
OleehyO
e3f6def234 feat: add video frame extraction tool
Add utility script to extract first frames from videos, helping users convert T2V datasets to I2V format
2025-01-01 15:10:39 +00:00
OleehyO
7b282246dd chore: remove unused configuration files after refactoring
Delete accelerate configs, deepspeed config and host file that are no longer needed
2025-01-01 15:10:39 +00:00
Gforky
48ac9c1066 [fix]fix typo in train_cogvideox_image_to_video_lora.py 2025-01-01 15:10:30 +00:00
Zheng Guang Cong
21693ca770 fix bugs of image-to-video without image-condition 2025-01-01 15:10:30 +00:00
OleehyO
7b4c9db6d9 Fix for CogVideoX-{2B,5B}
When loading CogVideoX-{2B,5B}, `patch_size_t` is None,
which results in the `prepare_rotary_position_embeddings` function.
2024-12-13 04:02:27 +00:00
OleehyO
36f1333788 Fix for deepspeed training 2024-12-13 04:02:26 +00:00
OleehyO
4d1b9fd166 Fix for Disney video dataset 2024-12-13 04:02:21 +00:00
Yuxuan.Zhang
5aa6d3a9ee Merge pull request #515 from Gforky/fix_finetune_demo
[fix]fix deepspeed initialization issue in finetune examples
2024-12-02 11:29:42 +08:00
spacegoing
2fb763d25f [Fix] fix rope temporal patch size 2024-11-21 16:26:45 +00:00
luwen.miao
ac2f2c78f7 [fix]fix deepspeed initialization issue in finetune examples 2024-11-18 09:49:31 +00:00
zR
e6ee283d0e Merge branch 'CogVideoX_dev' of github.com:THUDM/CogVideo into CogVideoX_dev 2024-10-14 11:34:40 +08:00
zR
e169e7b045 Update train_cogvideox_image_to_video_lora.py 2024-10-06 22:50:56 +08:00
Yuxuan.Zhang
532f246d7c Merge pull request #389 from THUDM/CogVideoX_dev
I2V Finetune of CogVideoX-5B-I2V
2024-10-05 22:14:52 +08:00