CogVideo

mirror of https://github.com/THUDM/CogVideo.git synced 2026-01-10 23:27:05 +08:00

Author	SHA1	Message	Date
OleehyO	0e21d41b12	Merge remote-tracking branch 'upstream/CogVideoX_dev' into dev	2025-01-07 09:51:48 +00:00
OleehyO	392e37021a	Add video path to error message for better debugging	2025-01-07 09:50:21 +00:00
zR	11935892ae	remove --image_column	2025-01-07 16:37:11 +08:00
OleehyO	ee1f666206	docs: update READMEs with auto first-frame extraction feature	2025-01-07 06:45:10 +00:00
OleehyO	e084a4a270	feat: auto-extract first frames as conditioning images for i2v model When training i2v models without specifying image_column, automatically extract and use first frames from training videos as conditioning images. This includes: - Add load_images_from_videos() utility function to extract and cache first frames - Update BaseI2VDataset to support auto-extraction when image_column is None - Add validation and warning message in Args schema for i2v without image_column The first frames are extracted once and cached to avoid repeated video loading.	2025-01-07 06:43:26 +00:00
OleehyO	96e511b413	feat: add warning for fp16 mixed precision training	2025-01-07 06:00:38 +00:00
OleehyO	36427274d6	style: format import statements across finetune module	2025-01-07 05:54:52 +00:00
zR	1789f07256	format and check fp16 for cogvideox2b	2025-01-07 13:16:18 +08:00
OleehyO	9157e0cbc8	Adapt dataset for text embeddings and add noise padding - Add text embedding support in dataset collation - Pad 2 random noise frames at the beginning of latent space during training	2025-01-06 10:44:58 +00:00
OleehyO	49dc370de6	fix: remove pipeline hooks after validation - Add pipe.remove_all_hooks() after validation to prevent memory leaks - Clean up validation pipeline properly to avoid potential issues in subsequent training steps	2025-01-04 06:21:17 +00:00
OleehyO	93b906b3fb	docs: clarify train_frames includes padding frame Add docstring to train_frames field in State schema to explicitly indicate that it includes one image padding frame	2025-01-04 06:20:25 +00:00
OleehyO	7e1ac76847	feat(cogvideox): add prompt embedding caching support This change enables caching of prompt embeddings in the CogVideoX text-to-video LoRA trainer, which can improve training efficiency by avoiding redundant text encoding operations.	2025-01-04 06:17:56 +00:00
OleehyO	66e4ba2592	fix(cogvideox): add prompt embedding caching and fix frame padding - Add support for cached prompt embeddings in dataset - Fix bug where first frame wasn't properly padded in latent space	2025-01-04 06:16:42 +00:00
OleehyO	de5bef6611	feat(args): add train_resolution validation for video frames and resolution - Add validation to ensure (frames - 1) is multiple of 8 - Add specific resolution check (480x720) for cogvideox-5b models - Add error handling for invalid resolution format	2025-01-04 06:16:42 +00:00
OleehyO	ffb6ee36b4	docs: update finetune documentation in all languages	2025-01-04 06:16:42 +00:00
OleehyO	c817e7f062	chore: update default training parameters for t2v and i2v scripts	2025-01-04 06:16:42 +00:00
OleehyO	e5b8f9a2ee	feat: add caching for prompt embeddings - Add caching for prompt embeddings - Store cached files using safetensors format - Add cache directory structure under data_root/cache - Optimize memory usage by moving tensors to CPU after caching - Add debug logging for cache hits - Add info logging for cache writes The caching system helps reduce redundant computation and memory usage during training by: 1. Caching prompt embeddings based on prompt text hash 2. Caching encoded video latents based on video filename 3. Moving tensors to CPU after caching to free GPU memory	2025-01-04 06:16:31 +00:00
OleehyO	f731c35f70	Add unload_model function	2025-01-03 08:21:27 +00:00
OleehyO	a88c1ede69	feat(args): add validation for training resolution - Add validation check to ensure number of frames is multiple of 8 - Add format validation for train_resolution string (frames x height x width)	2025-01-02 03:12:09 +00:00
OleehyO	362b7bf273	docs: update README in multiple languages	2025-01-02 03:07:34 +00:00
OleehyO	7fa1bb48be	refactor: remove deprecated training scripts	2025-01-01 15:56:14 +00:00
OleehyO	48ad178818	Reorganize training script arguments	2025-01-01 15:52:39 +00:00
OleehyO	6e79472417	feat: add training launch scripts for I2V and T2V models Add two shell scripts to simplify model training: - accelerate_train_i2v.sh: Launch script for Image-to-Video training - accelerate_train_t2v.sh: Launch script for Text-to-Video training Both scripts provide comprehensive configurations for: - Model settings - Data pipeline - Training parameters - System resources - Checkpointing - Validation	2025-01-01 15:10:55 +00:00
OleehyO	26b87cd4ff	feat(args): add validation and arg interface for training parameters - Add field validators for model type and validation settings - Implement command line argument parsing with argparse - Add type hints and documentation for training parameters - Support configuration of model, training, and validation parameters	2025-01-01 15:10:55 +00:00
OleehyO	04a60e7435	Change logger name to trainer	2025-01-01 15:10:55 +00:00
OleehyO	a001842834	feat: implement CogVideoX trainers for I2V and T2V tasks Add and refactor trainers for CogVideoX model variants: - Implement CogVideoXT2VLoraTrainer for text-to-video generation - Refactor CogVideoXI2VLoraTrainer for image-to-video generation Both trainers support LoRA fine-tuning with proper handling of: - Model components loading and initialization - Video encoding and batch collation - Loss computation with noise prediction - Validation step for generation	2025-01-01 15:10:54 +00:00
OleehyO	91d79fd9a4	feat: add schemas module for configuration and state management Add Pydantic models to handle: - CLI arguments and configuration (Args) - Model components and pipeline (Components) - Training state and parameters (State)	2025-01-01 15:10:54 +00:00
OleehyO	45d40450a1	refactor: simplify dataset implementation and add latent precomputation - Replace bucket-based dataset with simpler resize-based implementation - Add video latent precomputation during dataset initialization - Improve code readability and user experience - Remove complexity of bucket sampling for better maintainability This change makes the codebase more straightforward and easier to use while maintaining functionality through resize-based video processing.	2025-01-01 15:10:54 +00:00
OleehyO	6eae5c201e	feat: add latent caching for video encodings - Add caching mechanism to store VAE-encoded video latents to disk - Cache latents in a "latent" subdirectory alongside video files - Skip re-encoding when cached latent file exists - Add logging for successful cache saves - Minor code cleanup and formatting improvements This change improves training efficiency by avoiding redundant video encoding operations.	2025-01-01 15:10:42 +00:00
OleehyO	2a6cca0656	Add type conversion and validation checks	2025-01-01 15:10:42 +00:00
OleehyO	fa4659fb2c	feat(trainer): add validation functionality to Trainer class Add validation capabilities to the Trainer class including: - Support for validating images and videos during training - Periodic validation based on validation_steps parameter - Artifact logging to wandb for validation results - Memory tracking during validation process	2025-01-01 15:10:41 +00:00
OleehyO	6971364591	Export file_utils.py	2025-01-01 15:10:41 +00:00
OleehyO	60f6a3d7ee	feat: add base trainer implementation and training script - Add Trainer base class with core training loop functionality - Implement distributed training setup with Accelerate - Add training script with model/trainer initialization - Support LoRA fine-tuning with checkpointing and validation	2025-01-01 15:10:41 +00:00
OleehyO	a505f2e312	Add constants.py	2025-01-01 15:10:40 +00:00
OleehyO	78f655a9a4	Add utils	2025-01-01 15:10:40 +00:00
OleehyO	85e00a1082	feat(models): add scaffolding	2025-01-01 15:10:40 +00:00
OleehyO	918ebb5a54	feat(datasets): implement video dataset modules - Add dataset implementations for text-to-video and image-to-video - Include bucket sampler for efficient batch processing - Add utility functions for data processing - Create dataset package structure with proper initialization	2025-01-01 15:10:40 +00:00
OleehyO	e3f6def234	feat: add video frame extraction tool Add utility script to extract first frames from videos, helping users convert T2V datasets to I2V format	2025-01-01 15:10:39 +00:00
OleehyO	7b282246dd	chore: remove unused configuration files after refactoring Delete accelerate configs, deepspeed config and host file that are no longer needed	2025-01-01 15:10:39 +00:00
Gforky	48ac9c1066	[fix]fix typo in train_cogvideox_image_to_video_lora.py	2025-01-01 15:10:30 +00:00
Zheng Guang Cong	21693ca770	fix bugs of image-to-video without image-condition	2025-01-01 15:10:30 +00:00
OleehyO	7b4c9db6d9	Fix for CogVideoX-{2B,5B} When loading CogVideoX-{2B,5B}, `patch_size_t` is None, which results in the `prepare_rotary_position_embeddings` function.	2024-12-13 04:02:27 +00:00
OleehyO	36f1333788	Fix for deepspeed training	2024-12-13 04:02:26 +00:00
OleehyO	4d1b9fd166	Fix for Disney video dataset	2024-12-13 04:02:21 +00:00
Yuxuan.Zhang	5aa6d3a9ee	Merge pull request #515 from Gforky/fix_finetune_demo [fix]fix deepspeed initialization issue in finetune examples	2024-12-02 11:29:42 +08:00
spacegoing	2fb763d25f	[Fix] fix rope temporal patch size	2024-11-21 16:26:45 +00:00
luwen.miao	ac2f2c78f7	[fix]fix deepspeed initialization issue in finetune examples	2024-11-18 09:49:31 +00:00
zR	e6ee283d0e	Merge branch 'CogVideoX_dev' of github.com:THUDM/CogVideo into CogVideoX_dev	2024-10-14 11:34:40 +08:00
zR	e169e7b045	Update train_cogvideox_image_to_video_lora.py	2024-10-06 22:50:56 +08:00
Yuxuan.Zhang	532f246d7c	Merge pull request #389 from THUDM/CogVideoX_dev I2V Finetune of CogVideoX-5B-I2V	2024-10-05 22:14:52 +08:00

60 Commits
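
Commit de5bef6611 in the log above describes the train_resolution checks: the value is a "frames x height x width" string, (frames - 1) must be a multiple of 8, and cogvideox-5b models require 480x720. The sketch below illustrates those rules only. It is not the repository's code: the function name parse_train_resolution, the model_name parameter, and the substring match on "cogvideox-5b" are assumptions made for illustration, and the real checks may well live in Pydantic validators on the Args schema instead.

```python
from __future__ import annotations


def parse_train_resolution(value: str, model_name: str = "") -> tuple[int, int, int]:
    """Parse a 'frames x height x width' string such as '49x480x720'.

    Hypothetical helper: mirrors the rules listed in the commit message,
    not the actual implementation in the finetune module.
    """
    parts = value.lower().split("x")
    if len(parts) != 3:
        raise ValueError(
            f"train_resolution must look like 'frames x height x width', got {value!r}"
        )

    try:
        frames, height, width = (int(p) for p in parts)
    except ValueError as exc:
        raise ValueError(f"train_resolution has a non-integer field: {value!r}") from exc

    if frames <= 0 or height <= 0 or width <= 0:
        raise ValueError(f"train_resolution fields must be positive: {value!r}")

    # Rule from the commit message: (frames - 1) must be a multiple of 8.
    if (frames - 1) % 8 != 0:
        raise ValueError(f"(frames - 1) must be a multiple of 8, got frames={frames}")

    # Rule from the commit message: cogvideox-5b models only accept 480x720.
    # Matching on the model name substring is an assumption of this sketch.
    if "cogvideox-5b" in model_name.lower() and (height, width) != (480, 720):
        raise ValueError(
            f"cogvideox-5b models require 480x720, got {height}x{width}"
        )

    return frames, height, width
```

For example, parse_train_resolution("49x480x720", "cogvideox-5b") returns the tuple (49, 480, 720), while "48x480x720" is rejected because 47 is not a multiple of 8.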