The corpus uses six cameras and two microphone arrays to simultaneously capture video of speakers' lip movements during speech. The recording setup simulates a cockpit interior, with diverse camera angles and lighting conditions. Data is collected from 250 speakers, all of them adults and mostly young or middle-aged. The target effective recording time is about 0.5 hours per speaker, with an average of about 600 short sentences each. For each speaker ID, the audio track of one of the six video streams is also extracted and saved as a separate audio file. The six camera streams are aligned to within 30 milliseconds of one another, and the two microphone arrays are synchronized with the cameras as well.
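To illustrate the 30 ms alignment requirement, the sketch below checks whether a set of stream start times falls within that tolerance. It is a minimal sketch, assuming each camera and microphone stream exposes a start timestamp in milliseconds on a shared clock; the function names and stream labels are hypothetical and not part of any tooling shipped with the corpus.

```python
from typing import Mapping

# Tolerance taken from the corpus description: streams aligned to within 30 ms.
MAX_OFFSET_MS = 30.0


def max_pairwise_offset_ms(start_times_ms: Mapping[str, float]) -> float:
    """Return the largest start-time gap between any two streams (hypothetical helper)."""
    if len(start_times_ms) < 2:
        return 0.0
    values = list(start_times_ms.values())
    # The largest gap between any pair of timestamps is simply max - min.
    return max(values) - min(values)


def is_synchronized(start_times_ms: Mapping[str, float],
                    tolerance_ms: float = MAX_OFFSET_MS) -> bool:
    """True if every stream starts strictly within the tolerance of every other."""
    return max_pairwise_offset_ms(start_times_ms) < tolerance_ms


if __name__ == "__main__":
    # Hypothetical start times for six cameras and two microphone arrays.
    session = {
        "cam1": 0.0, "cam2": 4.2, "cam3": 11.8, "cam4": 7.5,
        "cam5": 2.1, "cam6": 9.9, "mic1": 3.3, "mic2": 6.0,
    }
    print(is_synchronized(session))
```

Because the largest pairwise offset equals the spread between the earliest and latest timestamps, the check needs no explicit loop over camera pairs.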
Product Features: This corpus covers 23 categories, including cuisine, landscapes, architecture, cities, countryside, health, sports, medical, automobiles, backgrounds, finance, education, oil paintings, illustrations, watercolors, travel, fashion, romance, animals, plants, space, and technology.
Product Features: Images captured in a variety of scenes, across multiple time periods and from different shooting angles, covering architecture, displays, urban streetscapes, home environments, competition scenes, shopping malls, schools, exhibitions, and natural environments. Each image comes with a corresponding text description.
Product Features: This dataset contains 100,000 dance videos at 4K resolution, each about 30 seconds long on average. The performers are adults and teenagers with dance training, with a balanced gender ratio. The videos include both solo and group dances and cover a wide range of viewing angles, such as front, side, back, and turning views. Dance styles include folk dance, jazz, street dance, and more.
Application Fields: This dataset can be applied to virtual humans, VR, dance education, video production, and other fields, supporting the development and application of multimodal technology in these areas.
【Product Features】
High-quality images of architecture and plants, with no blurring anywhere in the full-size image, so that textures in both the foreground and the background remain clear when enlarged. To ensure diversity of content, no more than 5 images of the same subject (taken from different angles) are included.
【Image Specifications】
Resolution: above 4K (shot in the camera's highest-quality mode). Focal length: 185 mm to 235 mm.
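The image requirements above (resolution, focal length, and at most 5 images per subject) can be expressed as a simple metadata check. This is a minimal sketch under stated assumptions: "4K" is read as UHD 3840 × 2160 and compared on the long and short sides so portrait shots are treated like landscape ones, the focal-length range is treated as inclusive, and the `ImageRecord` fields are hypothetical metadata, not a published schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List

# Assumption: "4K" is interpreted as UHD 3840 x 2160.
MIN_LONG_SIDE = 3840
MIN_SHORT_SIDE = 2160
# Focal-length range from the specification, treated as inclusive.
FOCAL_RANGE_MM = (185.0, 235.0)
# Diversity rule: no more than 5 images of the same subject.
MAX_IMAGES_PER_SUBJECT = 5


@dataclass
class ImageRecord:
    # Hypothetical metadata fields for illustration only.
    path: str
    subject_id: str
    width: int
    height: int
    focal_length_mm: float


def spec_problems(record: ImageRecord) -> List[str]:
    """List the per-image requirements that this record fails."""
    problems = []
    long_side = max(record.width, record.height)
    short_side = min(record.width, record.height)
    if long_side < MIN_LONG_SIDE or short_side < MIN_SHORT_SIDE:
        problems.append("resolution below 4K")
    low, high = FOCAL_RANGE_MM
    if not (low <= record.focal_length_mm <= high):
        problems.append("focal length outside 185-235 mm")
    return problems


def oversampled_subjects(records: Iterable[ImageRecord]) -> List[str]:
    """Return subject IDs that have more than the allowed number of images."""
    counts = Counter(r.subject_id for r in records)
    return sorted(s for s, n in counts.items() if n > MAX_IMAGES_PER_SUBJECT)
```

Per-image checks and the per-subject count are kept separate because the first depends only on a single file's metadata, while the second needs the whole collection.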