CogVideo

mirror of https://github.com/THUDM/CogVideo.git synced 2025-12-05 20:22:09 +08:00

Author	SHA1	Message	Date
google-labs-jules[bot]	ebc9d39c02	feat: Add knowledge distillation for logo generation with VGG teacher This commit introduces a knowledge distillation module to enhance logo generation in the CogVideoX-2B text-to-video model. The key changes include: - A new `KDTrainer` class that inherits from `CogVideoXT2VLoraTrainer`. This trainer loads a teacher model and computes a knowledge distillation loss to guide the student model. - The teacher model loading logic has been updated to support a VGG16-based Faster R-CNN model, to be compatible with user-provided weights. This includes a custom construction of the Faster R-CNN model with a VGG16 backbone and appropriate RoI heads. - The `kd` training type is now supported, allowing users to select it from the command line. - New command-line arguments (`teacher_model_path`, `teacher_model_num_classes`, `kd_loss_weight`) have been added to configure the knowledge distillation process. - A new configuration file (`cogvideox_2b_kd.yaml`) is provided as an example for running a `kd` training session.	2025-08-21 10:13:06 +00:00
google-labs-jules[bot]	193b1f4dcb	feat: Add knowledge distillation for logo generation This commit introduces a knowledge distillation module to enhance logo generation in the CogVideoX-2B text-to-video model. The key changes include: - A new `KDTrainer` class that inherits from `CogVideoXT2VLoraTrainer`. This trainer loads a teacher model (OpenLogo Faster R-CNN) and computes a knowledge distillation loss to guide the student model. - The `kd` training type is now supported, allowing users to select it from the command line. - New command-line arguments (`teacher_model_path`, `teacher_model_num_classes`, `kd_loss_weight`) have been added to configure the knowledge distillation process. - A new configuration file (`cogvideox_2b_kd.yaml`) is provided as an example for running a `kd` training session.	2025-08-21 09:14:51 +00:00
OleehyO	e519eced78	Update README_zh.md	2025-05-14 11:07:02 +08:00
OleehyO	657eee4379	Update README_zh.md	2025-05-14 11:05:17 +08:00
OleehyO	503a9faa93	Update README.md	2025-05-14 11:03:41 +08:00
Yuxuan Zhang	39c6562dc8	format	2025-03-22 15:14:06 +08:00
OleehyO	0e78f20629	Merge remote-tracking branch 'upstream/CogVideoX_dev' into dev	2025-01-14 04:00:11 +00:00
Yuxuan Zhang	4615479b51	move to tools	2025-01-14 11:33:02 +08:00
Yuxuan Zhang	7993670957	zero_to_bf16	2025-01-14 11:31:25 +08:00
OleehyO	4878edd0cf	fix: correct do_validation argument parsing	2025-01-13 12:48:21 +00:00
Yuxuan Zhang	78275b0480	add comment of bash scripts	2025-01-13 20:02:06 +08:00
OleehyO	455b44a7b5	chore: code cleanup and parameter optimization - Remove redundant comments and debug information - Adjust default parameters in training scripts - Clean up code in lora_trainer and trainer implementations	2025-01-13 11:56:28 +00:00
zR	1534bf33eb	add pipeline	2025-01-12 19:27:21 +08:00
OleehyO	70c899f444	chore: update default training configurations	2025-01-12 08:50:15 +00:00
OleehyO	b362663679	fix: normalize image tensors in I2VDataset	2025-01-12 06:01:48 +00:00
OleehyO	30ba1085ff	Merge remote-tracking branch 'upstream/main' into dev	2025-01-12 05:58:07 +00:00
OleehyO	f5169385bd	docs: add SFT support documentation in multilingual README	2025-01-12 05:53:13 +00:00
OleehyO	795dd144a4	Rename lora training scripts as ddp	2025-01-12 05:36:32 +00:00
OleehyO	fdb9820949	feat: support DeepSpeed ZeRO-3 and optimize peak memory usage - Add DeepSpeed ZeRO-3 configuration support - Optimize memory usage during training - Rename training scripts to reflect ZeRO usage - Update related configuration files and trainers	2025-01-12 05:33:56 +00:00
Zheng Guang Cong	09a49d3546	fix bug of i2v: video is already 0-255 and should not be multiplied by 255 any more	2025-01-11 17:29:27 +08:00
Zheng Guang Cong	cd861bbe1e	Update i2v_dataset.py: image should also be transformed to [-1, 1]	2025-01-11 17:24:35 +08:00
Zheng Guang Cong	35383e2db3	fix potential bug of i2v: image value is in [0, 255] and should be transformed into [-1, 1], similar to video.	2025-01-11 17:08:25 +08:00
OleehyO	caa24bdc36	feat: add SFT support with ZeRO optimization strategies - Add SFT (Supervised Fine-Tuning) trainers for all model variants: - CogVideoX I2V and T2V - CogVideoX-1.5 I2V and T2V - Add DeepSpeed ZeRO configuration files: - ZeRO-2 with and without CPU offload - ZeRO-3 with and without CPU offload - Add base accelerate config for distributed training - Update trainer.py to support SFT training mode This enables full-parameter fine-tuning with memory-efficient distributed training using DeepSpeed ZeRO optimization.	2025-01-11 02:13:32 +00:00
OleehyO	e213b6c083	fix: pad latent frames to match patch_size_t requirements	2025-01-11 02:08:07 +00:00
OleehyO	f6d722cec7	fix: remove copying first video frame as conditioning image	2025-01-09 15:52:51 +00:00
OleehyO	07766001f6	feat(dataset): pad short videos by repeating last frame When loading videos with fewer frames than max_num_frames, repeat the last frame to reach the required length instead of failing. This ensures consistent tensor dimensions across the dataset while preserving as much original video content as possible.	2025-01-08 02:14:56 +00:00
OleehyO	249fadfb76	docs: add hardware requirements for model training Add a table in README files showing hardware requirements for training different CogVideoX models, including: - Memory requirements for each model variant - Supported training types (LoRA) - Training resolutions - Mixed precision settings Updated in all language versions (EN/ZH/JA).	2025-01-08 01:39:37 +00:00
OleehyO	10de04fc08	perf: cast VAE and text encoder to target dtype before precomputing cache Before precomputing the latent cache and text embeddings, cast the VAE and text encoder to the target training dtype (fp16/bf16) instead of keeping them in fp32. This reduces memory usage during the precomputation phase. The change occurs in prepare_dataset() where the models are moved to device and cast to weight_dtype before being used to generate the cache.	2025-01-08 01:38:13 +00:00
OleehyO	0e21d41b12	Merge remote-tracking branch 'upstream/CogVideoX_dev' into dev	2025-01-07 09:51:48 +00:00
OleehyO	392e37021a	Add video path to error message for better debugging	2025-01-07 09:50:21 +00:00
zR	11935892ae	remove --image_column	2025-01-07 16:37:11 +08:00
OleehyO	ee1f666206	docs: update READMEs with auto first-frame extraction feature	2025-01-07 06:45:10 +00:00
OleehyO	e084a4a270	feat: auto-extract first frames as conditioning images for i2v model When training i2v models without specifying image_column, automatically extract and use first frames from training videos as conditioning images. This includes: - Add load_images_from_videos() utility function to extract and cache first frames - Update BaseI2VDataset to support auto-extraction when image_column is None - Add validation and warning message in Args schema for i2v without image_column The first frames are extracted once and cached to avoid repeated video loading.	2025-01-07 06:43:26 +00:00
OleehyO	96e511b413	feat: add warning for fp16 mixed precision training	2025-01-07 06:00:38 +00:00
OleehyO	36427274d6	style: format import statements across finetune module	2025-01-07 05:54:52 +00:00
zR	1789f07256	format and check fp16 for cogvideox2b	2025-01-07 13:16:18 +08:00
OleehyO	9157e0cbc8	Adapt dataset for text embeddings and add noise padding - Add text embedding support in dataset collation - Pad 2 random noise frames at the beginning of latent space during training	2025-01-06 10:44:58 +00:00
OleehyO	49dc370de6	fix: remove pipeline hooks after validation - Add pipe.remove_all_hooks() after validation to prevent memory leaks - Clean up validation pipeline properly to avoid potential issues in subsequent training steps	2025-01-04 06:21:17 +00:00
OleehyO	93b906b3fb	docs: clarify train_frames includes padding frame Add docstring to train_frames field in State schema to explicitly indicate that it includes one image padding frame	2025-01-04 06:20:25 +00:00
OleehyO	7e1ac76847	feat(cogvideox): add prompt embedding caching support This change enables caching of prompt embeddings in the CogVideoX text-to-video LoRA trainer, which can improve training efficiency by avoiding redundant text encoding operations.	2025-01-04 06:17:56 +00:00
OleehyO	66e4ba2592	fix(cogvideox): add prompt embedding caching and fix frame padding - Add support for cached prompt embeddings in dataset - Fix bug where first frame wasn't properly padded in latent space	2025-01-04 06:16:42 +00:00
OleehyO	de5bef6611	feat(args): add train_resolution validation for video frames and resolution - Add validation to ensure (frames - 1) is multiple of 8 - Add specific resolution check (480x720) for cogvideox-5b models - Add error handling for invalid resolution format	2025-01-04 06:16:42 +00:00
OleehyO	ffb6ee36b4	docs: update finetune documentation in all languages	2025-01-04 06:16:42 +00:00
OleehyO	c817e7f062	chore: update default training parameters for t2v and i2v scripts	2025-01-04 06:16:42 +00:00
OleehyO	e5b8f9a2ee	feat: add caching for prompt embeddings - Add caching for prompt embeddings - Store cached files using safetensors format - Add cache directory structure under data_root/cache - Optimize memory usage by moving tensors to CPU after caching - Add debug logging for cache hits - Add info logging for cache writes The caching system helps reduce redundant computation and memory usage during training by: 1. Caching prompt embeddings based on prompt text hash 2. Caching encoded video latents based on video filename 3. Moving tensors to CPU after caching to free GPU memory	2025-01-04 06:16:31 +00:00
OleehyO	f731c35f70	Add unload_model function	2025-01-03 08:21:27 +00:00
OleehyO	a88c1ede69	feat(args): add validation for training resolution - Add validation check to ensure number of frames is multiple of 8 - Add format validation for train_resolution string (frames x height x width)	2025-01-02 03:12:09 +00:00
OleehyO	362b7bf273	docs: update README in multiple languages	2025-01-02 03:07:34 +00:00
OleehyO	7fa1bb48be	refactor: remove deprecated training scripts	2025-01-01 15:56:14 +00:00
OleehyO	48ad178818	Reorganize training script arguments	2025-01-01 15:52:39 +00:00
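
Commits de5bef6611 and a88c1ede69 above describe validating the train_resolution argument: the string must follow the "frames x height x width" format, and (frames - 1) must be a multiple of 8. The following is a minimal sketch of such a check, not the repository's code. The function name parse_train_resolution and the concrete "49x480x720" spelling of the format are assumptions made for illustration.

```python
def parse_train_resolution(value: str) -> tuple[int, int, int]:
    """Parse a 'frames x height x width' string such as '49x480x720'.

    Hypothetical helper: the name and the exact separator handling are
    assumptions, not taken from the CogVideo repository.
    """
    parts = value.lower().split("x")
    if len(parts) != 3:
        raise ValueError(
            f"expected 'frames x height x width', got {value!r}"
        )
    try:
        frames, height, width = (int(p) for p in parts)
    except ValueError as exc:
        raise ValueError(f"non-integer component in {value!r}") from exc

    if frames < 1 or height < 1 or width < 1:
        raise ValueError(f"all components must be positive, got {value!r}")

    # Rule stated in commit de5bef6611: (frames - 1) must be a multiple of 8.
    if (frames - 1) % 8 != 0:
        raise ValueError(
            f"(frames - 1) must be a multiple of 8, got frames={frames}"
        )
    return frames, height, width


if __name__ == "__main__":
    print(parse_train_resolution("49x480x720"))
```

Raising ValueError for every failure mode keeps the sketch easy to plug into an argument schema; the model-specific 480x720 check mentioned in de5bef6611 is deliberately left out because its exact conditions are not given in the log.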

88 Commits
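
The i2v fixes in the log (35383e2db3, cd861bbe1e, b362663679, 09a49d3546) all concern value ranges: pixel values arrive in [0, 255] and should be mapped to [-1, 1] exactly once, without an extra multiplication by 255. A minimal sketch of that mapping on plain Python lists follows. The function name to_unit_range is hypothetical, and the repository itself operates on tensors rather than lists.

```python
def to_unit_range(pixels: list[float]) -> list[float]:
    """Map pixel values from [0, 255] to [-1, 1].

    Hypothetical helper for illustration only. It assumes the input is
    already in [0, 255], so no rescaling by 255 is applied beforehand.
    """
    for value in pixels:
        if not 0 <= value <= 255:
            raise ValueError(f"pixel value {value} is outside [0, 255]")
    # 0 -> -1, 127.5 -> 0, 255 -> 1
    return [value / 127.5 - 1.0 for value in pixels]


if __name__ == "__main__":
    print(to_unit_range([0, 127.5, 255]))
```

The range check makes the double-scaling mistake described in 09a49d3546 fail loudly instead of silently producing out-of-range inputs.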