Chinese Mandarin Speech Recognition Corpus for the Elderly and Children

This dataset is tailored to capture the speech of the elderly and children, two demographic groups with distinct vocal characteristics. It was recorded on desktop equipment in a quiet environment to keep audio quality high and background noise low. The corpus consists of read speech, which provides clear, deliberate pronunciations for training speech recognition models. The speakers are gender-balanced, so models are exposed to both male and female voices, and they are drawn from the seven major Chinese dialect regions, giving a diverse and balanced distribution of accents and speech patterns.

The children's recordings cover interactive car control, children's audiobooks, children's video content, and music, including children's songs and popular tunes from platforms like TikTok. The elderly recordings cover similar domains, with a focus on applications and content suited to older users: car control, map navigation, audiobooks with programs selected for an older audience, and music favored by the elderly. This coverage is intended to help speech recognition systems adapt to the speech traits of these age groups across contexts and dialects.

Specifications:
ID: King-ASR-953
Size: 400 hours
Language: Chinese Mandarin
Devices: Desktop
