ASR

Search our off-the-shelf datasets.

Chinese Mandarin Speech Recognition Corpus (Telephone)
This dataset was recorded in a quiet office environment, with 98 speakers participating, including 53 males and 45 females. All speakers involved in the recording were professionally selected to ensure standard pronunciation and clear enunciation. The recorded text covers information such as place names.
Chinese Mandarin Speech Recognition Corpus (Telephone)
This dataset was recorded in a quiet office environment, with 100 speakers participating, including 53 males and 47 females. All speakers involved in the recording were professionally selected to ensure standard pronunciation and clear enunciation. The recorded text covers information such as numerical strings.
Chinese Mandarin Speech Recognition Corpus for the Elderly and Children
This dataset captures speech from the elderly and children, two demographic groups with distinct vocal characteristics. It was recorded on desktop equipment in a quiet environment to minimize background noise. The corpus consists of read speech, which provides clear and deliberate pronunciations for training speech recognition models. Speakers are gender-balanced and drawn from the seven major Chinese dialect regions, giving a balanced distribution of accents and speech patterns. The children's recordings cover interactive car control, children's audiobooks, children's video content, and music, including children's songs and popular tunes from platforms like TikTok. The elderly recordings cover similar domains, with content chosen for an older audience: car control, map navigation, audiobooks, and music favored by the elderly. This coverage is intended to help speech recognition systems adapt to the speech traits of both age groups across contexts and dialects.
Chinese Mandarin Speech Recognition Corpus_Microphone Array (Desktop)
This dataset was recorded in both quiet and noisy environments, with a total of 300 speakers participating, including 149 males and 151 females. All speakers involved in the recording were professionally selected to ensure standard pronunciation and clear articulation. The recorded text covers everyday expressions, wake-up words, and other information.
Chinese Mandarin Speech Recognition Corpus_Studio (Desktop)
This dataset was recorded in a recording studio environment, with 200 speakers participating, including 100 males and 100 females. All speakers involved in the recording were professionally selected to ensure standard pronunciation and clear enunciation. The recorded text covers information such as news.
Chinese Mandarin Speech Recognition Corpus (Mobile)
This dataset was recorded in both quiet and noisy environments, with a total of 300 speakers participating, including 147 males and 153 females. All speakers involved in the recording process were professionally selected to ensure standardized pronunciation and clear enunciation. The recorded texts encompass information from news, everyday conversations, online literature, and other relevant topics.
Chinese Mandarin VPR Corpus (Mobile)
This dataset was recorded in a quiet office/home environment, with a total of 300 speakers participating, including 133 males and 167 females. All speakers who took part in the recording were professionally selected to ensure standardized pronunciation and clear articulation. The recorded texts span information from news, everyday conversations, tweets, and other similar content.
Chinese Mandarin Wake-up Words Speech Recognition Corpus (Mobile)
This dataset was recorded in both quiet office and home environments, with the participation of 460 speakers, including 222 males and 238 females. All speakers involved in the recording were professionally selected to ensure standardized pronunciation and clear enunciation. The recorded texts cover wake-up words and other information.
Chinese Mandarin Wake-up Words Speech Recognition Corpus (TicHome Mini)
This dataset was recorded in both quiet and noisy environments, with a total of 203 speakers participating, including 96 males and 107 females. All speakers involved in the recordings were professionally selected to ensure standard pronunciation and clear articulation. The recorded texts cover wake-up words and other information.
