All Datasets

Search our off-the-shelf datasets.

People from Multi-Country Speak Spanish Corpus
This corpus contains 5,763 speakers with a balanced gender ratio. The speakers are from Spain, Mexico, America, Argentina, and Colombia. The age range is from 16 to 80 years old.
India Multilingual Speech Corpus
This corpus covers 12 languages of India with 13,150 speakers. The languages include Assamese, English, Gujarati, Hindi, Kashmiri, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu, and Urdu.
People from Multi-Country Speak English Corpus
This corpus comprises recordings from 35,628 speakers, with each speaker contributing between 10 and 60 minutes of speech. The gender distribution is approximately equal. The speakers range in age from 7 to 80 years old. It includes a diverse array of accents, representing 64 countries including China, the United States, the United Kingdom, Canada, Australia, Japan, South Korea, and many others.
Chinese American English Synthesis Corpus
This dataset contains 80 speakers with a balanced gender ratio and approximately 1.5 hours of data per speaker. Existing labeling stages: pronunciation and prosody. Ongoing labeling: phoneme boundaries. Overview: focuses on common, fundamental language and includes everyday dialogue in a natural style.
Morocco Arabic Speech Recognition Corpus (Phone)
This dataset covers free dialogue content. Topics include news, text messages, car control, music, general, maps, daily oral language, family, health, travel, work, socializing, celebrities, weather, and other common topics in life.
People from Multi-Country Speak English Corpus
Topics include news, text messages, car control, music, general, maps, daily oral language, family, health, travel, work, socializing, celebrities, weather, and other common topics in life. The corpus includes US, UK, Canadian, and Australian accents.
Chinese Casual Chat Corpus
Casual chat data comprising 8 million daily questions or single chat sentences, collected for large model training and subsequent question-answering generation.
Hong Kong POI Dataset with Cantonese Pinyin Labeling
A Hong Kong Cantonese corpus that includes place names and other information, with POI tagging and pinyin labeling.
English-Arabic Parallel Corpus
A parallel corpus of daily data in English and Arabic.
