This dataset contains recordings from 620 speakers (334 male and 286 female) with authentic pronunciation and diverse vocal qualities, made in a quiet indoor environment. The recorded texts cover all phonemes, and the annotators have professional linguistic backgrounds, so the data meets the research and development needs of voice synthesis.
English Average Voice Synthesis Corpus – Conversation
Participants are recorded in pairs in the same studio, with each speaker's voice captured in a separate audio file. No text transcriptions are currently available.