This dataset was recorded by 3 speakers with authentic pronunciation and distinct vocal qualities (2 males and 1 female) in a professional recording studio. The recorded texts span the full range of phonemes, and the annotators have a professional linguistic background, ensuring the data meets the research and development needs for voice synthesis.
This datasets contains 80 speakers, with a balanced gender ratio, approximately 1.5 hours of data per speaker.
Existing labeling stages: Pronunciation, Prosody
Ongoing labeling: Phoneme boundaries
Overview: Focuses on common/fundamental language, includes everyday dialogue in a natural style