This dataset was recorded in a professional recording studio by five speakers (one male, four female) with authentic pronunciation and diverse vocal qualities. The recorded texts cover the full range of phonemes, and the annotators have professional linguistic backgrounds, so the data meets the research and development needs of voice synthesis.
Text, audio, phone labeling, quality inspection, XML labeling, run labeling, phonetic labeling
Accuracy Rate
The accuracy rate of phonetic labeling is 99.5%.
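The page does not say how this rate is measured. One common reading is per phone: the share of labeled phones that match a verified reference, so 99.5% means about 5 errors per 1,000 phones. The sketch below assumes that definition and assumes the two phone sequences are already aligned one-to-one; the function name `phone_label_accuracy` and the sample phone symbols are hypothetical.

```python
def phone_label_accuracy(reference, labeled):
    """Return the fraction of positions where the labeled phone matches the reference.

    Hypothetical definition: assumes both sequences are already aligned
    one-to-one (same length). This is not the vendor's documented method.
    """
    if len(reference) != len(labeled):
        raise ValueError("sequences must be aligned to the same length")
    if not reference:
        raise ValueError("cannot score an empty sequence")
    # Count positions where the labeled phone equals the reference phone.
    correct = sum(ref == hyp for ref, hyp in zip(reference, labeled))
    return correct / len(reference)


# 1,000 reference phones, 5 of them mislabeled in the annotation.
ref = ["AH"] * 1000
hyp = ["AH"] * 995 + ["AA"] * 5
print(phone_label_accuracy(ref, hyp))  # 0.995
```

Real annotations usually differ in length (insertions and deletions), in which case an edit-distance alignment would be needed before counting matches.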
Samples
Audio
But to my great surprise, ever since I looked in your eyes
I aimed for the sky, a nine-year-old can see so far