The Foundation Powering Advanced AI

Competition-level Mathematics, Physics Reasoning Corpus

This dataset trains AI models to extract critical information from problem statements and methodically derive solutions. It is particularly valuable for developing automated question-answering systems and other AI applications that require sophisticated reasoning.

University-level Business, Law, Medicine Reasoning Corpus

Chain of Thought Reasoning Coding

University-level Mathematics, Physics, Chemistry, Computer Science Reasoning Corpus

K12 (Primary/Junior/Senior High) Testing Questions Across all Subjects

High-Quality Coding Q&A Corpus

This dataset supports AI training in code comprehension, debugging, and complex logic reasoning, enabling applications such as automated code generation, technical documentation assistants, and intelligent programming tutors.

Agentic AI Coding CoT

DMS with Multi-skin color Drivers Corpus

This dataset serves as a comprehensive resource for developing and testing driver monitoring systems, specifically designed to enhance in-vehicle safety through behavioral analysis.

TTS Human Sound Human Voice

TTS

Hundreds of Sound Effects Corpus

The database covers 107 categories and contains 69,154 sets, each pairing an audio file with its annotation. The total audio duration is about 254.81 hours, including the leading and trailing silence (zeroed or original) that must be retained. The annotated duration is about 248.59 hours, which covers only the sound events themselves. The sound effects are organized into 4 first-level categories (Human Sounds, Animal Sounds, Environmental Sounds, Mechanical Sounds), 26 second-level categories, and 107 third-level categories (see Appendix), covering most sound elements found in common real-life scenarios. Each sound event in the audio is annotated with timestamps marking its occurrence and duration, together with a detailed natural-language description in Chinese.
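To make the timestamp annotation concrete, here is a minimal Python sketch of how one annotated sound event might be represented and how the annotated share of a file could be computed. The record layout, the field names, and the second- and third-level category names are assumptions for illustration only; the corpus's actual annotation format is not described here. Only the corpus-level hour totals are taken from the description above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SoundEvent:
    # Hypothetical schema: the real annotation format may differ.
    category_path: Tuple[str, str, str]  # (first-level, second-level, third-level)
    start_s: float                       # event onset, seconds from file start
    end_s: float                         # event offset, seconds from file start
    description: str                     # free-text description (Chinese in the corpus)

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def labeled_fraction(events: List[SoundEvent], audio_duration_s: float) -> float:
    """Share of a file covered by annotated events (assumes events do not overlap)."""
    return sum(e.duration_s for e in events) / audio_duration_s


# Illustrative file: 10 s of audio with two annotated events.
# "Pets", "Dog bark", "Vehicles", and "Car horn" are placeholder names.
events = [
    SoundEvent(("Animal Sounds", "Pets", "Dog bark"), 0.5, 4.5, "placeholder"),
    SoundEvent(("Mechanical Sounds", "Vehicles", "Car horn"), 5.0, 9.0, "placeholder"),
]
file_ratio = labeled_fraction(events, 10.0)

# Corpus-level figures quoted in the description above.
TOTAL_AUDIO_HOURS = 254.81
LABELED_HOURS = 248.59
corpus_ratio = LABELED_HOURS / TOTAL_AUDIO_HOURS

print(f"file: {file_ratio:.0%} annotated, corpus: {corpus_ratio:.1%} annotated")
```

On the quoted totals, roughly 97.6% of the audio is covered by annotated sound events; the remainder is the retained leading and trailing silence.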

Dance Video Virtual human Dance Education

High-Definition Dance Video Corpus

Product Features: This dataset contains 100,000 dance videos at 4K resolution, each averaging 30 seconds in length. The dancers are adults and teenagers with dance training, with a balanced gender ratio. The videos include both solo and group dances and cover a wide range of angles, including front, side, back, and turning views. Dance types include folk dance, jazz, street dance, and more.

Application Fields: This dataset can be applied to virtual humans, VR, dance education, video production, and other fields, supporting the development of multimodal technology in these areas.

Education and learning Smart search Chinese

ASR

Mandarin Speech Recognition Corpus (Desktop)

This database was recorded with desktop microphones in quiet environments (office, home, studio) from 13,895 speakers: 6,811 male and 7,084 female. The total net recording time is about 5,867.9 hours, including reasonable leading and trailing silence.
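As a quick sanity check on these figures, the short Python sketch below confirms that the male and female counts sum to the stated total and derives the average recording time per speaker. The average is a derived estimate only; the actual per-speaker distribution is not stated in the description.

```python
# Figures quoted in the corpus description.
MALE_SPEAKERS = 6811
FEMALE_SPEAKERS = 7084
TOTAL_SPEAKERS = 13895
TOTAL_HOURS = 5867.9


def minutes_per_speaker(total_hours: float, speakers: int) -> float:
    """Average recording time per speaker, in minutes."""
    return total_hours * 60 / speakers


# The gender breakdown should account for every speaker.
counts_consistent = MALE_SPEAKERS + FEMALE_SPEAKERS == TOTAL_SPEAKERS

female_share = FEMALE_SPEAKERS / TOTAL_SPEAKERS
avg_minutes = minutes_per_speaker(TOTAL_HOURS, TOTAL_SPEAKERS)

print(f"counts consistent: {counts_consistent}")
print(f"female share: {female_share:.1%}, average per speaker: {avg_minutes:.1f} min")
```

The counts are consistent, the gender split is close to even, and the average works out to roughly 25 minutes of recording per speaker.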

Virtual human VR Video Collection

Lip-movement Video Corpus

The corpus uses high-definition cameras to capture lip-movement video of speech from approximately 208 individuals. Recording takes place in a quiet indoor environment under several simulated lighting conditions: normal light, strong light, backlight, and weak light. The shooting distance is either 0.5 m or 1 m, with 0.5 m accounting for about 90% of the recordings. The shooting angle is frontal, and the frame focuses mainly on the upper body. In addition to solo recordings, the collection simulates queue scenarios: about 30% of each person's video data is recorded in multi-person scenes, most of which contain two people.

The speakers primarily speak Mandarin. Speakers with northern pronunciation are prioritized, along with some southern speakers whose Mandarin pronunciation is good; some speakers may have a slight local accent. Each person speaks at a normal pace and records 10 sentences, averaging 10 to 15 characters per sentence. Ages range from 7 to over 60, mainly children, young adults, and middle-aged adults, with a balanced gender ratio.

While the video is being recorded, a front-facing interface microphone records the speaker in sync; a second audio file is taken from the recorded video itself.
