This dataset was recorded by a 41-year-old male speaker with authentic pronunciation and a calm, rugged vocal quality in a professional recording studio. The recorded texts span the full range of phonemes, and the annotators have a professional linguistic background, ensuring the data meets the research and development needs for voice synthesis.
This datasets contains 80 speakers, with a balanced gender ratio, approximately 1.5 hours of data per speaker.
Existing labeling stages: Pronunciation, Prosody
Ongoing labeling: Phoneme boundaries
Overview: Focuses on common/fundamental language, includes everyday dialogue in a natural style