428 Commits

Author SHA1 Message Date
OleehyO
392e37021a Add video path to error message for better debugging 2025-01-07 09:50:21 +00:00
zR
11935892ae remove --image_column 2025-01-07 16:37:11 +08:00
OleehyO
ee1f666206 docs: update READMEs with auto first-frame extraction feature 2025-01-07 06:45:10 +00:00
OleehyO
e084a4a270 feat: auto-extract first frames as conditioning images for i2v model
When training i2v models without specifying image_column, automatically extract
and use first frames from training videos as conditioning images. This includes:

- Add load_images_from_videos() utility function to extract and cache first frames
- Update BaseI2VDataset to support auto-extraction when image_column is None
- Add validation and warning message in Args schema for i2v without image_column

The first frames are extracted once and cached to avoid repeated video loading.
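This is a minimal sketch of the "extract once, then cache" behaviour described in this commit. It is not the repository's `load_images_from_videos()`. The frame decoder is passed in as a hypothetical `extract_first_frame` callable so that the caching logic stays independent of any particular video library. The cache layout (one `<video stem>.png` per video) is also an assumption.

```python
from pathlib import Path
from typing import Callable, List


def load_first_frames(
    video_paths: List[Path],
    cache_dir: Path,
    extract_first_frame: Callable[[Path, Path], None],
) -> List[Path]:
    """Return one conditioning-image path per video, decoding each video at most once.

    Hypothetical helper: `extract_first_frame(video_path, image_path)` is assumed to
    decode the first frame of `video_path` and write it to `image_path`.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    image_paths: List[Path] = []
    for video_path in video_paths:
        # Assumed naming scheme: the cached image shares the video's file stem.
        image_path = cache_dir / f"{video_path.stem}.png"
        if not image_path.exists():
            # Decode only on a cache miss; later runs reuse the image already on disk.
            extract_first_frame(video_path, image_path)
        image_paths.append(image_path)
    return image_paths
```

On a second call with the same videos and cache directory, the helper returns the same paths and never calls the decoder again.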
2025-01-07 06:43:26 +00:00
OleehyO
96e511b413 feat: add warning for fp16 mixed precision training 2025-01-07 06:00:38 +00:00
OleehyO
36427274d6 style: format import statements across finetune module 2025-01-07 05:54:52 +00:00
zR
1789f07256 format and check fp16 for cogvideox2b 2025-01-07 13:16:18 +08:00
OleehyO
1b886326b2 Merge remote-tracking branch 'upstream/CogVideoX_dev' into dev 2025-01-06 10:47:56 +00:00
OleehyO
9157e0cbc8 Adapt dataset for text embeddings and add noise padding
- Add text embedding support in dataset collation
- Pad 2 random noise frames at the beginning of latent space during training
2025-01-06 10:44:58 +00:00
OleehyO
49dc370de6 fix: remove pipeline hooks after validation
- Add pipe.remove_all_hooks() after validation to prevent memory leaks
- Clean up validation pipeline properly to avoid potential issues in subsequent training steps
2025-01-04 06:21:17 +00:00
OleehyO
93b906b3fb docs: clarify train_frames includes padding frame
Add docstring to train_frames field in State schema to explicitly indicate
that it includes one image padding frame
2025-01-04 06:20:25 +00:00
OleehyO
7e1ac76847 feat(cogvideox): add prompt embedding caching support
This change enables caching of prompt embeddings in the CogVideoX text-to-video
LoRA trainer, which can improve training efficiency by avoiding redundant text
encoding operations.
2025-01-04 06:17:56 +00:00
OleehyO
66e4ba2592 fix(cogvideox): add prompt embedding caching and fix frame padding
- Add support for cached prompt embeddings in dataset
- Fix bug where first frame wasn't properly padded in latent space
2025-01-04 06:16:42 +00:00
OleehyO
de5bef6611 feat(args): add train_resolution validation for video frames and resolution
- Add validation to ensure (frames - 1) is a multiple of 8
- Add specific resolution check (480x720) for cogvideox-5b models
- Add error handling for invalid resolution format
2025-01-04 06:16:42 +00:00
OleehyO
ffb6ee36b4 docs: update finetune documentation in all languages 2025-01-04 06:16:42 +00:00
OleehyO
c817e7f062 chore: update default training parameters for t2v and i2v scripts 2025-01-04 06:16:42 +00:00
OleehyO
e5b8f9a2ee feat: add caching for prompt embeddings
- Add caching for prompt embeddings
- Store cached files using safetensors format
- Add cache directory structure under data_root/cache
- Optimize memory usage by moving tensors to CPU after caching
- Add debug logging for cache hits
- Add info logging for cache writes

The caching system helps reduce redundant computation and memory usage during training by:
1. Caching prompt embeddings based on prompt text hash
2. Caching encoded video latents based on video filename
3. Moving tensors to CPU after caching to free GPU memory
2025-01-04 06:16:31 +00:00
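The two cache keys described in this commit can be sketched as path builders. Prompt embeddings are keyed by a hash of the prompt text, and video latents are keyed by the video filename, both under `data_root/cache`. The choice of SHA-256 and the subdirectory names are assumptions. The tensors themselves would be written and read with the safetensors library (for example, `safetensors.torch.save_file` and `load_file`), which is omitted here.

```python
import hashlib
from pathlib import Path


def prompt_cache_path(data_root: Path, prompt: str) -> Path:
    """Cache file for a prompt embedding, keyed by a hash of the prompt text.

    Assumptions: SHA-256 as the hash and 'prompt_embeddings' as the subdirectory name.
    """
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return data_root / "cache" / "prompt_embeddings" / f"{digest}.safetensors"


def video_latent_cache_path(data_root: Path, video_path: Path) -> Path:
    """Cache file for an encoded video latent, keyed by the video's filename.

    Assumption: 'video_latent' as the subdirectory name; the extension is dropped.
    """
    return data_root / "cache" / "video_latent" / f"{video_path.stem}.safetensors"
```

Hashing the prompt keeps the filename short and safe for arbitrary text. It also means that identical prompts share a single cache entry.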
OleehyO
f731c35f70 Add unload_model function 2025-01-03 08:21:27 +00:00
zR
ce2c299c1f Update diffusion_video.py 2025-01-03 08:45:42 +08:00
zR
b080c6a010 put LoRA back (SAT); not yet runnable 2025-01-02 11:48:18 +08:00
OleehyO
a88c1ede69 feat(args): add validation for training resolution
- Add validation check to ensure the number of frames is a multiple of 8
- Add format validation for train_resolution string (frames x height x width)
2025-01-02 03:12:09 +00:00
OleehyO
362b7bf273 docs: update README in multiple languages 2025-01-02 03:07:34 +00:00
Yuxuan Zhang
aa240dc675 Merge pull request #632 from THUDM/CogVideoX_dev
Refactored the training code of finetune
2025-01-02 08:31:25 +08:00
OleehyO
cf2fff7e55 Merge remote-tracking branch 'upstream/main' into dev 2025-01-01 16:03:51 +00:00
OleehyO
7fa1bb48be refactor: remove deprecated training scripts 2025-01-01 15:56:14 +00:00
OleehyO
48ad178818 Reorganize training script arguments 2025-01-01 15:52:39 +00:00
三洋三洋
6ef15dd2a5 docs: update TOC and add friendly link in README files
- Update table of contents in README.md, README_ja.md and README_zh.md
- Add friendly link section to all README files
2025-01-01 15:10:55 +00:00
OleehyO
6e79472417 feat: add training launch scripts for I2V and T2V models
Add two shell scripts to simplify model training:
- accelerate_train_i2v.sh: Launch script for Image-to-Video training
- accelerate_train_t2v.sh: Launch script for Text-to-Video training

Both scripts provide comprehensive configurations for:
- Model settings
- Data pipeline
- Training parameters
- System resources
- Checkpointing
- Validation
2025-01-01 15:10:55 +00:00
OleehyO
26b87cd4ff feat(args): add validation and arg interface for training parameters
- Add field validators for model type and validation settings
- Implement command line argument parsing with argparse
- Add type hints and documentation for training parameters
- Support configuration of model, training, and validation parameters
2025-01-01 15:10:55 +00:00
OleehyO
04a60e7435 Change logger name to trainer 2025-01-01 15:10:55 +00:00
OleehyO
a001842834 feat: implement CogVideoX trainers for I2V and T2V tasks
Add and refactor trainers for CogVideoX model variants:
- Implement CogVideoXT2VLoraTrainer for text-to-video generation
- Refactor CogVideoXI2VLoraTrainer for image-to-video generation

Both trainers support LoRA fine-tuning with proper handling of:
- Model components loading and initialization
- Video encoding and batch collation
- Loss computation with noise prediction
- Validation step for generation
2025-01-01 15:10:54 +00:00
OleehyO
91d79fd9a4 feat: add schemas module for configuration and state management
Add Pydantic models to handle:
- CLI arguments and configuration (Args)
- Model components and pipeline (Components)
- Training state and parameters (State)
2025-01-01 15:10:54 +00:00
OleehyO
45d40450a1 refactor: simplify dataset implementation and add latent precomputation
- Replace bucket-based dataset with simpler resize-based implementation
- Add video latent precomputation during dataset initialization
- Improve code readability and user experience
- Remove complexity of bucket sampling for better maintainability

This change makes the codebase more straightforward and easier to use while
maintaining functionality through resize-based video processing.
2025-01-01 15:10:54 +00:00
OleehyO
6eae5c201e feat: add latent caching for video encodings
- Add caching mechanism to store VAE-encoded video latents to disk
- Cache latents in a "latent" subdirectory alongside video files
- Skip re-encoding when cached latent file exists
- Add logging for successful cache saves
- Minor code cleanup and formatting improvements

This change improves training efficiency by avoiding redundant video encoding operations.
2025-01-01 15:10:42 +00:00
OleehyO
2a6cca0656 Add type conversion and validation checks 2025-01-01 15:10:42 +00:00
OleehyO
fa4659fb2c feat(trainer): add validation functionality to Trainer class
Add validation capabilities to the Trainer class including:
- Support for validating images and videos during training
- Periodic validation based on validation_steps parameter
- Artifact logging to wandb for validation results
- Memory tracking during validation process
2025-01-01 15:10:41 +00:00
OleehyO
6971364591 Export file_utils.py 2025-01-01 15:10:41 +00:00
OleehyO
60f6a3d7ee feat: add base trainer implementation and training script
- Add Trainer base class with core training loop functionality
- Implement distributed training setup with Accelerate
- Add training script with model/trainer initialization
- Support LoRA fine-tuning with checkpointing and validation
2025-01-01 15:10:41 +00:00
OleehyO
a505f2e312 Add constants.py 2025-01-01 15:10:40 +00:00
OleehyO
78f655a9a4 Add utils 2025-01-01 15:10:40 +00:00
OleehyO
85e00a1082 feat(models): add scaffolding 2025-01-01 15:10:40 +00:00
OleehyO
918ebb5a54 feat(datasets): implement video dataset modules
- Add dataset implementations for text-to-video and image-to-video
- Include bucket sampler for efficient batch processing
- Add utility functions for data processing
- Create dataset package structure with proper initialization
2025-01-01 15:10:40 +00:00
OleehyO
e3f6def234 feat: add video frame extraction tool
Add utility script to extract first frames from videos, helping users convert T2V datasets to I2V format
2025-01-01 15:10:39 +00:00
OleehyO
7b282246dd chore: remove unused configuration files after refactoring
Delete accelerate configs, deepspeed config and host file that are no longer needed
2025-01-01 15:10:39 +00:00
OleehyO
5cb9303286 chore: update .gitignore
- Add new ignore patterns for dataset and model directories
- Update rules for development files
2025-01-01 15:10:32 +00:00
OleehyO
ba85627577 [docs] improve help messages in argument parser
Fix and clarify help documentation in parser.add_argument() to better describe command-line arguments.
2025-01-01 15:10:31 +00:00
OleehyO
2508c8353b [bugfix] fix specific resolution setting
Different models use different resolutions. For example, the optimal generation resolution for the CogVideoX1.5 series is 1360x768, while for CogVideoX it is 720x480.
2025-01-01 15:10:31 +00:00
Gforky
48ac9c1066 [fix] fix typo in train_cogvideox_image_to_video_lora.py 2025-01-01 15:10:30 +00:00
Zheng Guang Cong
21693ca770 fix bugs in image-to-video without an image condition 2025-01-01 15:10:30 +00:00
三洋三洋
a6e611e354 docs: update TOC and add friendly link in README files
- Update table of contents in README.md, README_ja.md and README_zh.md
- Add friendly link section to all README files
2024-12-27 19:37:08 +08:00