Add two shell scripts to simplify model training:
- accelerate_train_i2v.sh: Launch script for Image-to-Video training
- accelerate_train_t2v.sh: Launch script for Text-to-Video training
Both scripts provide comprehensive configurations for:
- Model settings
- Data pipeline
- Training parameters
- System resources
- Checkpointing
- Validation
- Add field validators for model type and validation settings
- Implement command line argument parsing with argparse
- Add type hints and documentation for training parameters
- Support configuration of model, training, and validation parameters
Add and refactor trainers for CogVideoX model variants:
- Implement CogVideoXT2VLoraTrainer for text-to-video generation
- Refactor CogVideoXI2VLoraTrainer for image-to-video generation
Both trainers support LoRA fine-tuning with proper handling of:
- Model components loading and initialization
- Video encoding and batch collation
- Loss computation with noise prediction
- Validation step for generation
Add Pydantic models to handle:
- CLI arguments and configuration (Args)
- Model components and pipeline (Components)
- Training state and parameters (State)
- Replace bucket-based dataset with simpler resize-based implementation
- Add video latent precomputation during dataset initialization
- Improve code readability and user experience
- Remove complexity of bucket sampling for better maintainability
This change makes the codebase more straightforward and easier to use while
maintaining functionality through resize-based video processing.
- Add caching mechanism to store VAE-encoded video latents to disk
- Cache latents in a "latent" subdirectory alongside video files
- Skip re-encoding when cached latent file exists
- Add logging for successful cache saves
- Minor code cleanup and formatting improvements
This change improves training efficiency by avoiding redundant video encoding operations.
Add validation capabilities to the Trainer class including:
- Support for validating images and videos during training
- Periodic validation based on validation_steps parameter
- Artifact logging to wandb for validation results
- Memory tracking during validation process
- Add Trainer base class with core training loop functionality
- Implement distributed training setup with Accelerate
- Add training script with model/trainer initialization
- Support LoRA fine-tuning with checkpointing and validation
- Add dataset implementations for text-to-video and image-to-video
- Include bucket sampler for efficient batch processing
- Add utility functions for data processing
- Create dataset package structure with proper initialization
Different models use different resolutions, for example, for the CogVideoX1.5 series models, the optimal generation resolution is 1360x768, But for CogVideoX, the best resolution is 720x480.
Different models use different resolutions, for example, for the CogVideoX1.5 series models, the optimal generation resolution is 1360x768, But for CogVideoX, the best resolution is 720x480.