14 Commits

Author SHA1 Message Date
OleehyO
b362663679 fix: normalize image tensors in I2VDataset 2025-01-12 06:01:48 +00:00
OleehyO
30ba1085ff Merge remote-tracking branch 'upstream/main' into dev 2025-01-12 05:58:07 +00:00
Zheng Guang Cong
cd861bbe1e Update i2v_dataset.py
Image should also be transformed to [-1, 1].
2025-01-11 17:24:35 +08:00
Zheng Guang Cong
35383e2db3 fix potential bug in i2v
Image values are in [0, 255] and should be transformed into [-1, 1], just as the video frames are.
2025-01-11 17:08:25 +08:00
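The [0, 255] to [-1, 1] mapping in these fixes is a linear rescale, x / 127.5 - 1, which is equivalent to (x / 255) * 2 - 1. In tensor code it would typically be applied to a float tensor. The sketch below uses plain Python lists so that it runs without torch, and both function names are hypothetical.

```python
from typing import List, Sequence


def to_unit_range(pixels: Sequence[int]) -> List[float]:
    """Linearly map 8-bit pixel values from [0, 255] onto [-1.0, 1.0]."""
    # x / 127.5 - 1 equals (x / 255) * 2 - 1: 0 -> -1.0 and 255 -> 1.0.
    return [p / 127.5 - 1.0 for p in pixels]


def from_unit_range(values: Sequence[float]) -> List[int]:
    """Invert to_unit_range, rounding back to the nearest 8-bit value."""
    return [round((v + 1.0) * 127.5) for v in values]
```

Applying the same mapping to both the conditioning image and the video frames keeps the two inputs on one scale, which is the point of these two commits.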
OleehyO
f6d722cec7 fix: remove copying first video frame as conditioning image 2025-01-09 15:52:51 +00:00
OleehyO
07766001f6 feat(dataset): pad short videos by repeating last frame
When loading videos with fewer frames than max_num_frames, repeat the last
frame to reach the required length instead of failing. This ensures consistent
tensor dimensions across the dataset while preserving as much original video
content as possible.
2025-01-08 02:14:56 +00:00
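The padding rule in 07766001f6 can be sketched as follows. This sketch assumes the frames arrive as a Python sequence, whereas the real dataset works on a frame tensor, and `pad_with_last_frame` is a hypothetical name. Videos that already have at least `max_num_frames` frames are returned unchanged, because the commit message does not say how longer videos are sampled.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def pad_with_last_frame(frames: Sequence[T], max_num_frames: int) -> List[T]:
    """Repeat the final frame until the clip has max_num_frames frames."""
    if not frames:
        # There is no last frame to repeat, so fail loudly instead.
        raise ValueError("cannot pad a video with no frames")
    padded = list(frames)
    missing = max_num_frames - len(padded)
    if missing > 0:
        padded.extend([padded[-1]] * missing)
    return padded
```

For example, `pad_with_last_frame(["f0", "f1"], 4)` yields `["f0", "f1", "f1", "f1"]`. Every sample therefore has the same frame count, and all original frames are kept.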
OleehyO
392e37021a Add video path to error message for better debugging 2025-01-07 09:50:21 +00:00
OleehyO
e084a4a270 feat: auto-extract first frames as conditioning images for i2v model
When training i2v models without specifying image_column, automatically extract
and use first frames from training videos as conditioning images. This includes:

- Add load_images_from_videos() utility function to extract and cache first frames
- Update BaseI2VDataset to support auto-extraction when image_column is None
- Add validation and warning message in Args schema for i2v without image_column

The first frames are extracted once and cached to avoid repeated video loading.
2025-01-07 06:43:26 +00:00
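The fallback described in e084a4a270 can be sketched roughly as follows. This is not the `load_images_from_videos()` implementation from the commit. The helper name `conditioning_images`, the `<stem>.png` cache naming, and the injected `extract_first_frame` callback, which stands in for real video decoding, are all assumptions.

```python
import warnings
from pathlib import Path
from typing import Callable, List, Optional


def conditioning_images(
    video_paths: List[Path],
    image_paths: Optional[List[Path]],
    extract_first_frame: Callable[[Path, Path], None],
    cache_dir: Path,
) -> List[Path]:
    """Return one conditioning-image path per video (hypothetical helper)."""
    if image_paths is not None:
        # An explicit image column was given, so use those images unchanged.
        return list(image_paths)
    warnings.warn("image_column is not set; using the first frame of each video")
    cache_dir.mkdir(parents=True, exist_ok=True)
    frames = []
    for video in video_paths:
        frame_path = cache_dir / f"{video.stem}.png"
        if not frame_path.exists():
            # Decode the video only when no cached first frame exists yet.
            extract_first_frame(video, frame_path)
        frames.append(frame_path)
    return frames
```

Because the cache is keyed on the file stem, two videos with the same name in different directories would share one cached frame. A real implementation may need a richer key.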
OleehyO
36427274d6 style: format import statements across finetune module 2025-01-07 05:54:52 +00:00
zR
1789f07256 format and check fp16 for cogvideox2b 2025-01-07 13:16:18 +08:00
OleehyO
e5b8f9a2ee feat: add caching for prompt embeddings
- Add caching for prompt embeddings
- Store cached files using safetensors format
- Add cache directory structure under data_root/cache
- Optimize memory usage by moving tensors to CPU after caching
- Add debug logging for cache hits
- Add info logging for cache writes

The caching system helps reduce redundant computation and memory usage during training by:
1. Caching prompt embeddings based on prompt text hash
2. Caching encoded video latents based on video filename
3. Moving tensors to CPU after caching to free GPU memory
2025-01-04 06:16:31 +00:00
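The following dependency-free sketch shows caching keyed on a hash of the prompt text. It assumes SHA-256 as the hash, which the commit message does not name. It also stores JSON lists rather than safetensors files so that it needs only the standard library. The names `prompt_cache_path`, `get_prompt_embedding`, and the `prompt_embeddings` subdirectory are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List


def prompt_cache_path(data_root: Path, prompt: str) -> Path:
    """Name the cache file after a hash of the prompt text."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return data_root / "cache" / "prompt_embeddings" / f"{digest}.json"


def get_prompt_embedding(
    data_root: Path, prompt: str, encode: Callable[[str], List[float]]
) -> List[float]:
    """Return a cached embedding, running the text encoder only on a miss."""
    path = prompt_cache_path(data_root, prompt)
    if path.exists():
        return json.loads(path.read_text())
    embedding = encode(prompt)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(embedding))
    return embedding
```

Hashing the prompt, instead of using the prompt as a filename, produces fixed-length, filesystem-safe names regardless of prompt length or punctuation.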
OleehyO
6eae5c201e feat: add latent caching for video encodings
- Add caching mechanism to store VAE-encoded video latents to disk
- Cache latents in a "latent" subdirectory alongside video files
- Skip re-encoding when cached latent file exists
- Add logging for successful cache saves
- Minor code cleanup and formatting improvements

This change improves training efficiency by avoiding redundant video encoding operations.
2025-01-01 15:10:42 +00:00
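The skip-if-cached check in 6eae5c201e can be sketched as follows. The sketch assumes that a latent can be serialized as a JSON list, whereas the real code stores tensors, and that the cache file is named after the video's stem. The names `latent_cache_path` and `get_video_latent` are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, List


def latent_cache_path(video_path: Path) -> Path:
    """Map videos/clip.mp4 to videos/latent/clip.json (hypothetical layout)."""
    return video_path.parent / "latent" / f"{video_path.stem}.json"


def get_video_latent(
    video_path: Path, encode: Callable[[Path], List[float]]
) -> List[float]:
    """Return the cached latent, calling the (VAE) encoder only on a miss."""
    cache = latent_cache_path(video_path)
    if cache.exists():
        # Cache hit: skip the expensive encode entirely.
        return json.loads(cache.read_text())
    latent = encode(video_path)
    cache.parent.mkdir(parents=True, exist_ok=True)
    cache.write_text(json.dumps(latent))
    return latent
```

This sketch follows the "latent" subdirectory layout described in this commit. The later commit e5b8f9a2ee describes a cache directory under data_root/cache instead.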
OleehyO
2a6cca0656 Add type conversion and validation checks 2025-01-01 15:10:42 +00:00
OleehyO
918ebb5a54 feat(datasets): implement video dataset modules
- Add dataset implementations for text-to-video and image-to-video
- Include bucket sampler for efficient batch processing
- Add utility functions for data processing
- Create dataset package structure with proper initialization
2025-01-01 15:10:40 +00:00