All Datasets

Search our off-the-shelf datasets.

Competition-level Mathematics, Physics Reasoning Corpus
This dataset trains AI models to extract critical information from problem statements and derive solutions methodically. It is particularly valuable for developing automated question-answering systems and AI applications that require sophisticated reasoning.
University-level Business, Law, Medicine Reasoning Corpus
This dataset trains AI models to extract critical information from problem statements and derive solutions methodically. It is particularly valuable for developing automated question-answering systems and AI applications that require sophisticated reasoning.
University-level Mathematics, Physics, Chemistry, Computer Science Reasoning Corpus
This dataset trains AI models to extract critical information from problem statements and derive solutions methodically. It is particularly valuable for developing automated question-answering systems and AI applications that require sophisticated reasoning.
K12 (Primary/Junior/Senior High) Testing Questions Across all Subjects
This dataset trains AI models to extract critical information from problem statements and derive solutions methodically. It is particularly valuable for developing automated question-answering systems and AI applications that require sophisticated reasoning.
High-Quality Coding Q&A Corpus
This dataset supports AI training in code comprehension, debugging, and complex logic reasoning, enabling applications such as automated code generation, technical documentation assistants, and intelligent programming tutors.
K12, University, and Graduate-Level Professional Subject Q&A Corpus
This dataset consists of millions of high-quality Chain-of-Thought (CoT) questions derived from authoritative sources, each paired with an answer. The data undergoes rigorous processing, including question screening, entry, duplicate checking, solving, review, and proofreading, followed by strict quality control, to form standardized question banks.
Multiclass Sound Effects Corpus
Human Sound, Human Voice, Respiratory Sound, Digestive Sound, Human Locomotion, Heartbeat
English Average Voice Synthesis Corpus – Conversation
Participants are recorded in pairs in the same studio, with each speaker's voice captured in a separate audio file. No text transcriptions are currently available.
Sichuan Dialect Speech Recognition Corpus

