celebv-text.github.io - CelebV-Text: A Large-Scale Facial Text-Video Dataset




Text-driven generation models are currently flourishing in video editing thanks to their compelling results. For face-centric text-to-video generation, however, severe challenges remain, because no suitable dataset with high-quality videos and highly relevant texts exists. In this work, we present CelebV-Text, a large-scale, high-quality, and diverse facial text-video dataset, to facilitate research on facial text-to-video generation tasks.

CelebV-Text contains 70,000 in-the-wild face video clips covering diverse visual content. Each video clip is paired with 20 texts generated by the proposed semi-automatic text generation strategy, which can describe both static and dynamic attributes precisely. We conduct a comprehensive statistical analysis of the videos, texts, and text-video relevance of CelebV-Text, verifying its superiority over other datasets. We also conduct extensive self-evaluations to show the effectiveness and potential of CelebV-T

For more details of the dataset, please refer to the paper "CelebV-Text: A Large-Scale Facial Text-Video Dataset".
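
To make the pairing described above concrete (70,000 clips, each paired with 20 texts), here is a minimal Python sketch of how such clip-text records might be represented and counted. The `ClipRecord` class, its field names, and the helper functions are hypothetical illustrations, not the dataset's actual file format or loading API; only the two constants come from the description.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple

NUM_CLIPS = 70_000     # number of video clips stated in the description
TEXTS_PER_CLIP = 20    # number of texts paired with each clip, as stated


@dataclass
class ClipRecord:
    """Hypothetical record: one face video clip and its paired texts."""
    clip_id: str
    texts: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # A record is complete when it carries the stated number of texts.
        return len(self.texts) == TEXTS_PER_CLIP


def iter_pairs(records: Iterable[ClipRecord]) -> Iterator[Tuple[str, str]]:
    """Yield one (clip_id, text) pair per paired description."""
    for record in records:
        for text in record.texts:
            yield record.clip_id, text


def total_pairs(num_clips: int = NUM_CLIPS,
                texts_per_clip: int = TEXTS_PER_CLIP) -> int:
    """Number of text-video pairs implied by the stated sizes."""
    return num_clips * texts_per_clip
```

Under these assumptions, the stated sizes imply 70,000 × 20 = 1,400,000 text-video pairs in total, which is the value `total_pairs()` returns.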
