Lip-movement Video Corpus

The corpus contains lip-movement (visual speech) video captured with high-definition cameras from approximately 208 speakers. Recording takes place in a quiet indoor environment under several simulated lighting conditions: normal light, strong light, backlight, and weak light. The shooting distance is either 0.5 m or 1 m, with 0.5 m accounting for about 90% of the recordings. The shooting angle is frontal, and the frame mainly covers the upper body. In addition to solo recordings, the corpus simulates queue scenarios: about 30% of each speaker's video is recorded in multi-person scenes, most of which contain two people. The speakers primarily speak Mandarin. Priority was given to speakers with northern pronunciation, together with some southern speakers whose Mandarin pronunciation is good; some speakers may have a slight local accent. Each speaker talks at a normal pace and records 10 sentences, averaging 10 to 15 characters per sentence. Speakers' ages range from 7 to over 60, mainly children, young adults, and middle-aged adults, with a balanced gender ratio. Two audio files accompany the video: one is recorded in sync with the speaker by a front-facing interface microphone during filming, and the other comes from the recorded video itself.
Specifications:
ID: King-VD-028
Size: 46.2 GB
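
The figures in the description imply a rough corpus size. The sketch below works through that arithmetic in Python. All inputs are the approximate values quoted above; the function name and dictionary keys are illustrative, and applying the 90% and 30% shares per sentence is an assumption, since the description states them per recording and per speaker's video data.

```python
# Approximate figures quoted in the corpus description.
SPEAKERS = 208                  # "approximately 208" speakers
SENTENCES_PER_SPEAKER = 10
CHARS_PER_SENTENCE = (10, 15)   # average range per sentence
SHARE_AT_HALF_METRE = 0.90      # about 90% recorded at 0.5 m
SHARE_MULTI_PERSON = 0.30       # about 30% in multi-person scenes


def estimate_corpus_size():
    """Return rough totals implied by the description (hypothetical helper)."""
    sentences = SPEAKERS * SENTENCES_PER_SPEAKER
    return {
        "sentences": sentences,
        "chars_min": sentences * CHARS_PER_SENTENCE[0],
        "chars_max": sentences * CHARS_PER_SENTENCE[1],
        # Assumption: the shares below apply uniformly per sentence.
        "sentences_at_0_5m": round(sentences * SHARE_AT_HALF_METRE),
        "sentences_multi_person": round(sentences * SHARE_MULTI_PERSON),
    }


if __name__ == "__main__":
    for key, value in estimate_corpus_size().items():
        print(f"{key}: {value}")
```

These are order-of-magnitude estimates only; the actual counts depend on the delivered data.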

