TTS

Search our off-the-shelf datasets.

American English Male and Female Speech Synthesis Corpus (Customer and Audiobook)
This database contains 2,000 sentences from one female speaker and one male speaker, with a total audio duration of approximately 2 hours. The texts cover the customer and audiobook fields.
American English Male Speech Synthesis Corpus
This database contains 9,150 sentences (89,731 words) recorded by a male voice talent, with waveform and electroglottography (EGG) signals collected simultaneously. The total audio duration is about 12.18 hours, including the original silence at the beginning and end of each recording (about 350 ms each). The recorded content is organized into 38 texts covering multiple fields, such as news, dialog, narration, digits, and time. The en-us_cmu phone set was used for labeling.
American English Male Speech Synthesis Corpus
This database contains 22,999 sentences (198,729 words) recorded by a male voice talent. The total audio duration is about 22.70 hours, including the clear silence at the beginning and end of each recording (about 400 ms each). The recorded content is organized into 9 texts covering multiple fields, such as news, dialog, food, education, and health. The en-us_cmu phone set was used for labeling.
American English Male Speech Synthesis Corpus (Customer Service Style)
This database contains 693 sentences (11,138 words) recorded by a male voice talent. The total audio duration is about 1.16 hours, including the zero silence at the beginning and end of each recording (about 350 ms each). The recorded content is organized into 1 text, focused on the customer service domain. The en-us_cmu phone set was used for labeling.
American English Male Speech Synthesis Corpus (Gentle and Considerate)
This database contains 3,546 sentences (29,005 words) recorded by a male voice talent. The total audio duration is about 3.04 hours, including the original silence at the beginning and end of each recording (about 300 ms each). The recorded content is organized into 11 texts covering multiple emotions, such as happy, angry, sad, surprised, and neutral. The en-us_cmu phone set was used for labeling and proofreading. The voice talent was born and raised in America and was 28 years old when recording the database. He has standard American English pronunciation and is a professional broadcaster. The recording has a gentle timbre and an even speech rate.
American English Male Speech Synthesis Corpus (Gentle and Mature)
This database contains 2,027 sentences (30,122 words) recorded by a male voice talent. The total audio duration is about 3.02 hours, including the original silence at the beginning and end of each recording (about 300 ms each). The recorded content is organized into 11 texts covering multiple emotions, such as happy, angry, sad, and surprised. The en-us_cmu phone set was used for labeling and proofreading. The voice talent was born in 1970 and raised in America. He speaks standard American English and has a solid foundation in line delivery.
American English Female Speech Synthesis Corpus (Gentle and Mature)
This database contains 2,756 sentences (28,923 words) recorded by a female voice talent. The total audio duration is about 3.08 hours, including the original silence at the beginning and end of each recording (about 300 ms each). The recorded content is organized into 11 texts covering emotions such as calm, happy, sad, and angry. The en-us_cmu phone set was used for labeling and proofreading. The voice talent was born in 1976 and raised in California, USA, and speaks American English. She was 49 years old when recording the database. The recording has an even speech rate, and she has a strong ability to express emotions.
American English Male Speech Synthesis Corpus (Livestreaming Style)
This database contains 748 sentences (14,248 words) recorded by a male voice talent. The total audio duration is about 1.38 hours, including the zero silence at the beginning and end of each recording (about 350 ms each). The recorded content is organized into 1 text, focused on the live broadcast domain. The en-us_cmu phone set was used for labeling.
American English Male Speech Synthesis Corpus (Multi-style)
This database contains 12,130 sentences (179,867 words) recorded by a male voice talent. The total audio duration is about 19.39 hours, including the silence at the beginning and end of each recording (about 300 ms each). The recorded content is organized into 25 texts covering multiple fields, such as news, encyclopedia, and dialog. The en-us_cmu phone set was used for labeling. The voice talent was born and raised in America and is a professional voice talent. The recording has a deep timbre and an even speech rate.

