diff --git a/README.md b/README.md
index 441f0b7..885ff4d 100644
--- a/README.md
+++ b/README.md
@@ -2,18 +2,21 @@
 This is the official repo for the paper: CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers.
-
+
+https://user-images.githubusercontent.com/48993524/170857367-2033c514-3c9f-4297-876f-2468592a254b.mp4
+
 ## Generated Samples
 **Video samples generated by CogVideo**. The actual text inputs are in Chinese. Each sample is a 4-second clip of 32 frames, and here we sample 9 frames uniformly for display purposes.
-![Overview](assets/intro-image.pdf)
+![Intro images](assets/intro-image.pdf)
 ![More samples](assets/appendix-moresamples.pdf)
-**CogVideo is able to generate relatively high-frame-rate videos. ** A 4-second clip of 32 frames is shown below.
+**CogVideo is able to generate relatively high-frame-rate videos.**
+A 4-second clip of 32 frames is shown below.
-![Overview](assets/appendix-sample-highframerate.pdf)
\ No newline at end of file
+![high-frame-rate sample](assets/appendix-sample-highframerate.pdf)