All Datasets

Search our off-the-shelf datasets.

High-Quality Coding Q&A Corpus
This dataset supports AI training in code comprehension, debugging, and complex logic reasoning, enabling applications such as automated code generation, technical documentation assistants, and intelligent programming tutors.
K12, University, and Graduate-Level Professional Subject Q&A Corpus
This dataset consists of millions of high-quality Chain-of-Thought (CoT) questions and answers derived from authoritative sources. The data undergoes rigorous processing steps such as question screening, entry, duplicate checking, solving, review, and proofreading, followed by strict quality control to form standardized question banks.
Multiclass Sound Effects Corpus
Human Sound, Human Voice, Respiratory Sound, Digestive Sound, Human Locomotion, Heartbeat
English Average Voice Synthesis Corpus – Conversation
Participants in pairs are recorded in the same studio, with each individual's voice captured in a separate audio file. No text transcriptions are currently available.
Sichuan Dialect Speech Recognition Corpus
Sichuan Dialect Speech Recognition Corpus – Conversation
Haitian Creole Speech Recognition Corpus – Conversation
Singapore English Speech Recognition Corpus
Hokkien Speech Recognition Corpus – Conversation
