Video data of human head poses and facial expressions, collected indoors across diverse daily-life and work environments (e.g., office, meeting room, home, dormitory, corridor). Each participant records one video, with the portrait framed at approximately head-and-shoulder size. The recorded content includes head movements (up, down, left, right) and mouth actions (open, close), combined into various pose–action sequences. Lighting conditions cover common scenarios such as normal light, low light, and backlight, with facial details required to remain clearly visible in every case.
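As a rough illustration of how the pose–action combinations could be enumerated for annotation, the sketch below pairs each head movement with each mouth action. The label strings and the `pose_action_labels` helper are hypothetical and are not part of the dataset specification; the actual recordings may chain these actions into longer sequences.

```python
from itertools import product

# Hypothetical label vocabularies, taken from the movements and actions listed above.
HEAD_MOVEMENTS = ("up", "down", "left", "right")
MOUTH_ACTIONS = ("open", "close")


def pose_action_labels():
    """Return every (head movement, mouth action) pair as a 'head_mouth' label."""
    return [f"{head}_{mouth}" for head, mouth in product(HEAD_MOVEMENTS, MOUTH_ACTIONS)]


if __name__ == "__main__":
    for label in pose_action_labels():
        print(label)
```

Four head movements combined with two mouth actions give eight basic labels under this assumed naming scheme.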
Data collection covers both indoor and outdoor environments, including offices, meeting rooms, parking lots, gardens, and other common work and daily-life scenarios.
Lighting conditions include normal, low-light, and backlit scenarios commonly encountered in real-world settings.
Each participant provides twenty video clips; each clip presents a single object and is accompanied by three close-up images of that object taken from different angles.
The subject appears in upper-body or full-body views, holding one object with one or both hands, recorded in standing or seated postures. The object is moved according to predefined actions.
Each video clip includes a brief verbal description of the object, spoken by the subject.
The subject is clearly visible, and the face is not occluded for extended periods during recording.
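The per-participant requirements above (twenty clips, three close-up images per clip, a verbal description in each clip) lend themselves to a simple consistency check. The following is a minimal sketch under assumed names: the `Clip` record, its fields, and `validate_participant` are hypothetical and do not reflect any manifest format actually shipped with the data.

```python
from dataclasses import dataclass, field

CLIPS_PER_PARTICIPANT = 20  # from the description: twenty clips per participant
CLOSEUPS_PER_CLIP = 3       # from the description: three close-up images per clip


@dataclass
class Clip:
    """Hypothetical record for one object-presentation clip."""
    video_path: str
    closeup_paths: list = field(default_factory=list)
    has_verbal_description: bool = False


def validate_participant(clips):
    """Return a list of problems; an empty list means the record matches the description."""
    problems = []
    if len(clips) != CLIPS_PER_PARTICIPANT:
        problems.append(f"expected {CLIPS_PER_PARTICIPANT} clips, found {len(clips)}")
    for i, clip in enumerate(clips):
        if len(clip.closeup_paths) != CLOSEUPS_PER_CLIP:
            problems.append(
                f"clip {i}: expected {CLOSEUPS_PER_CLIP} close-ups, "
                f"found {len(clip.closeup_paths)}"
            )
        if not clip.has_verbal_description:
            problems.append(f"clip {i}: missing verbal description")
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer see every deviation for a participant in a single pass.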