Product Features: Recorded in a conference setting. Participants walk slowly around the room with a neutral facial expression throughout, looking straight ahead (no side glances, no looking up or down) and keeping their faces unobstructed. Each participant records 1–2 sets of videos (standing/sitting), captured simultaneously by four cameras (Logitech Rally, Aver CAM 550, Yealink UVC86, Poly E60). Two sets of photos are also collected per participant: one taken on the day of recording using a computer, and one personal photo taken with a phone within the past two years.
Ethnicities Collected: Black, White, Asian (non-Chinese), Brown.
Age Range: All age groups, with a balanced gender ratio.
Video/Image Specifications: Video resolution is 1080p or higher; photo resolution is 720p. Each video is approximately 1 minute long.
Data collection covers both indoor and outdoor environments, including offices, meeting rooms, parking lots, gardens, and other common work and daily-life scenarios.
Lighting conditions include normal lighting, low light, and backlighting, as commonly encountered in real-world settings.
Each participant provides twenty video clips. Each clip presents a single object and is accompanied by three close-up images of that object taken from different angles.
The subject appears in an upper-body or full-body view, standing or seated, holding one object in one or both hands. The object is moved according to predefined actions.
Each video clip includes a brief verbal description of the object, spoken by the subject.
The subject is clearly visible, and the face is not occluded for extended periods during recording.
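The per-participant counts stated above (twenty clips, each with three close-up images) lend themselves to a simple completeness check on delivery. The following is a minimal sketch only: the `Clip` record, its field names, and the `check_participant` helper are hypothetical and are not part of the dataset's actual delivery format, which is not described here.

```python
from dataclasses import dataclass, field
from typing import List

# Counts taken from the description above.
CLIPS_PER_PARTICIPANT = 20
CLOSEUPS_PER_CLIP = 3


@dataclass
class Clip:
    """Hypothetical record for one object-presentation clip."""
    video: str                                          # path to the video file
    closeups: List[str] = field(default_factory=list)   # paths to close-up images


def check_participant(clips: List[Clip]) -> List[str]:
    """Return a list of human-readable problems; an empty list means complete."""
    problems = []
    if len(clips) != CLIPS_PER_PARTICIPANT:
        problems.append(
            f"expected {CLIPS_PER_PARTICIPANT} clips, found {len(clips)}"
        )
    for index, clip in enumerate(clips, start=1):
        if len(clip.closeups) != CLOSEUPS_PER_CLIP:
            problems.append(
                f"clip {index} ({clip.video}): expected {CLOSEUPS_PER_CLIP} "
                f"close-ups, found {len(clip.closeups)}"
            )
    return problems
```

Returning a list of problems rather than raising on the first mismatch lets a reviewer see every gap in one participant's submission at once.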