Name: Lip speech video was collected for 250 people - DataoceanAI
SKU: King-VD-018
Availability: InStock

Lip speech video was collected for 250 people

This dataset contains lip-speech video of speakers, recorded simultaneously with six cameras and two microphones. Recording takes place in a simulated cockpit environment, with diverse shooting angles and lighting conditions.

Specifications:

ID: King-VD-018
Size: 250 participants
Accuracy Rate: 0.95

People also searched for

Supermarket self-scanning goods payment codes

This dataset was collected with high-definition cameras in six supermarket self-checkout scenarios, with approximately 10 people filmed in each scenario. Two types of checkout equipment were used: vertical self-checkout machines and handheld scanning checkout devices. Each person was filmed performing checkout actions on both types of equipment. The actions fall into two main types: normal checkout and abnormal checkout (with items that were not checked out). Footage was captured from three angles: directly above the person, upper left (20 or 45 degrees), and upper right (20 or 45 degrees). Three types of annotation were made on the collected videos: detection bounding boxes for target item recognition with the handheld device, detection bounding boxes for target item recognition with the self-checkout machine, and start and end frames of video segments.

3D modeling data of market scenes

This dataset was collected in Chinese supermarkets with a 3D space scanning camera. Complete 3D scene data was captured for 10 small and medium-sized supermarkets, and a 3D point cloud model was generated for each supermarket.

Portrait segmentation video dataset

This dataset consists of 7 types of portrait video: selfies, live broadcasts, movies, TV shows, cartoons, anime, and variety shows. Frames are extracted from the videos and annotated with semantic segmentation of human faces. The annotations cover both real and non-real humans. The product library contains labeled images covering normal light, dim light, and backlight conditions. The figures in the images appear in close-up, half-body, and full-body shots, in diverse poses including frontal, side, and back views. Selfies and live broadcasts mostly show a single person, while the other categories may show multiple people (2-3). Overall, the data is mainly single-person. The facial semantic segmentation annotations use approximately 23 categories.

Finger-pointing reading dataset

This dataset consists of video recordings of finger-pointing actions. Each video captures the complete movement of the index finger, from entering the camera's field of view to pointing at printed English words in a book. The finger must remain on the book for at least 2 seconds. The light sources used during collection include LED lights and incandescent lamps, with strong, medium, and weak lighting. Both left and right hands are captured. The captured range covers four types: the entire hand, the entire index finger, half of the index finger, and the entire hand plus the arm. The books include both doodled and clean copies, with or without creases, and with or without curled pages. The collection equipment is placed in 5 positions (front, upper right, upper left, right side, and left side of the screen) and shoots in landscape mode. The main subjects are children and middle-aged people.