TTS

Search our off-the-shelf datasets.

American English Female Speech Synthesis Corpus
The database recorded 8,810 sentences (77,291 words) from a female voice talent. The total audio duration is about 10.56 hours, including the original silence at the beginning and end (about 350 ms each). The recorded content is organized into 19 texts covering multiple fields, such as news, digit, money, date, etc. We used the en-us_cmu phone set for labeling.
American English Female Speech Synthesis Corpus
The database recorded 13,709 sentences (154,586 words) from a female voice talent. The total audio duration is about 19.86 hours, including the original silence at the beginning and end (about 350 ms each). The recorded content is organized into 23 texts covering multiple fields, such as news, dialog, digit, etc. We used the en-us_cmu phone set for labeling.
American English Female Speech Synthesis Corpus
The database recorded 6,534 sentences (81,914 words) from a female voice talent. The total audio duration is about 10.75 hours, including the original silence at the beginning and end (about 500 ms each). The recorded content is organized into 13 texts covering multiple fields, such as news, letter, etc. We used the en-us_cmu phone set for labeling.
American English Female Speech Synthesis Corpus
The database recorded 10,439 sentences (122,386 words) from a female voice talent. The total audio duration is about 13.39 hours, including the clear silence at the beginning and end (about 500 ms each). The recorded content is organized into 39 texts covering multiple fields, such as news, dialog, digit, time, etc. We used the en-us_cmu phone set for labeling.
American English Female Speech Synthesis Corpus (Advertising Marketing)
The database recorded 744 sentences (13,002 words) from a female voice talent. The total audio duration is about 1.21 hours, including the zero silence at the beginning and end (about 350 ms each). The recorded content is organized into 1 text, focused on the advertising and marketing domain. We used the en-us_cmu phone set for labeling.
American English Female Speech Synthesis Corpus (Customer Service Style)
The database recorded 8,884 sentences (89,686 words) from a female voice talent. The total audio duration is about 9.15 hours, including the zero silence at the beginning and end (about 300 ms each). The recorded content is organized into 5 texts covering multiple fields, such as greetings, service, dialog, etc. We used the en-us_cmu phone set for labeling. The voice talent was born and raised in America and speaks standard English. She studied broadcasting and performance and has a solid foundation in line delivery. The recording has a gentle timbre and an even speech rate.
American English Female Speech Synthesis Corpus (Livestreaming Style)
The database recorded 709 sentences (11,812 words) from a female voice talent. The total audio duration is about 1.4 hours, including the zero silence at the beginning and end (about 350 ms each). The recorded content is organized into 1 text, focused on the live broadcast domain. We used the en-us_cmu phone set for labeling.
American English Female Speech Synthesis Corpus (Mature and Steady)
The database recorded 2,624 sentences (44,192 words) from a female voice talent. The total audio duration is about 5.02 hours, including the original silence at the beginning and end (about 300 ms each). The recorded content is organized into 14 texts covering multiple emotions, such as neutral, happy, angry, sad, shocked, hate, afraid, shout, crying, laughing, weakness, curious, confusion, comfort, etc. The voice talent was born in America in 1960 and raised there. She speaks standard American English and has a solid foundation in line delivery.
American English Female Speech Synthesis Corpus (Natural Conversational Style)
The database recorded 3,810 sentences (30,612 words) from a female voice talent. The total audio duration is about 2.74 hours, including the clear silence at the beginning and end (about 300 ms each). The recorded content is organized into 2 texts, mainly dialog. We used the en-us_cmu phone set for labeling. The voice talent was born and raised in Canada and speaks standard American English. She works in the broadcasting industry and has a solid foundation in line delivery. The recording has a gentle timbre and an even speech rate.
