This dataset was recorded in a professional recording studio by 10 speakers (4 male and 6 female) with authentic pronunciation and diverse vocal qualities. The recorded texts cover all phonemes, and the annotators have professional linguistic backgrounds, so the data meets research and development needs for voice synthesis.
Chinese Multi-speaker – Amateur Multi-Emotion, Multi-Style
This dataset consists of 11 hours of recordings with a balanced gender ratio. It has been meticulously labeled for pronunciation, prosody, and voice quality. The voice samples, recorded by non-professional speakers, offer a higher degree of naturalness and are categorized and labeled by voice gender, perceived age, voice description, vocal cord condition, and pronunciation location. The recordings span multiple emotions, including calm, happy, angry, and sad.