This dataset contains 80 speakers with a balanced gender ratio and approximately 1.5 hours of data per speaker (roughly 120 hours in total).
Completed annotation: Pronunciation, prosody
Annotation in progress: Phoneme boundaries
Overview: Focuses on common, everyday language and includes everyday dialogue spoken in a natural style.
English Average Voice Synthesis Corpus – Conversation
Participants are recorded in pairs in the same studio, with each speaker's voice captured in a separate audio file. No text transcriptions are currently available.