This dataset was recorded in a professional recording studio by 97 speakers (49 male and 48 female) with authentic pronunciation and diverse vocal qualities. The recorded texts cover all phonemes, and the annotators have professional linguistic backgrounds, so the data meets the research and development needs of voice synthesis.
News, education, film and television, and other fields.
Labeling Process
Text, audio, proofreading
Accuracy Rate
The accuracy rate of phonetic labeling is 99.5%.
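The page does not say how the accuracy rate is measured. As a minimal sketch, assuming it is the share of phonetic labels that match a proofread reference, a figure like this could be checked as follows. The function name and the sample data are hypothetical and are not taken from the dataset.

```python
def labeling_accuracy(reference, labeled):
    """Return the fraction of labels that match the reference, position by position.

    Assumption: accuracy is counted per label. The dataset page does not
    state the unit (phoneme, syllable, or sentence) that it uses.
    """
    if len(reference) != len(labeled):
        raise ValueError("reference and labeled must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    correct = sum(r == h for r, h in zip(reference, labeled))
    return correct / len(reference)


# Hypothetical data: 1000 phonetic labels, 5 of which disagree with the reference.
reference = ["a"] * 1000
labeled = ["a"] * 995 + ["o"] * 5
print(f"{labeling_accuracy(reference, labeled):.1%}")
```

Under this assumption, a 99.5% rate corresponds to about 5 mismatched labels per 1,000.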
People also searched for
Chinese American English Synthesis Corpus
This dataset contains 80 speakers with a balanced gender ratio and approximately 1.5 hours of data per speaker.
Existing labeling stages: Pronunciation, Prosody
Ongoing labeling: Phoneme boundaries
Overview: Focuses on common/fundamental language, includes everyday dialogue in a natural style