Mandarin-English Multi-speaker Speech Synthesis Corpus

The database contains 109,419 recorded sentences (1,635,335 words) from non-professional speakers: 64,517 on channel 0 and 44,902 on channel 1. The total audio duration is about 140.09 hours (82.41 hours for channel 0 and 57.68 hours for channel 1), including the cleared silence at the beginning and end of each recording (about 350 ms each). The recorded content is organized into 8 texts covering multiple fields, such as news and dialog. The zh-cn_pinyin and en-us_cmu phone sets were used for labeling. The English texts of channel 0 (micro) and channel 1 (mobile) have both proofreading and phonetic labeling; the Chinese texts and the mixed Chinese-English texts have proofreading only.
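As a quick sanity check, the per-channel figures quoted above can be summed and compared with the stated totals. The sketch below uses only the numbers from the description; the variable and function names are illustrative, and the derived averages are rough estimates rather than published specifications.

```python
# Figures quoted in the corpus description (names here are illustrative).
CHANNELS = {
    "channel 0": {"sentences": 64517, "hours": 82.41},
    "channel 1": {"sentences": 44902, "hours": 57.68},
}
TOTAL_SENTENCES = 109419
TOTAL_WORDS = 1635335
TOTAL_HOURS = 140.09


def summarize(channels):
    """Return (total sentences, total hours) summed over all channels."""
    sentences = sum(c["sentences"] for c in channels.values())
    hours = sum(c["hours"] for c in channels.values())
    return sentences, hours


sentences, hours = summarize(CHANNELS)

# The per-channel figures should add up to the stated totals.
totals_match = sentences == TOTAL_SENTENCES and abs(hours - TOTAL_HOURS) < 1e-6

# Derived averages (estimates only; durations include leading/trailing silence).
avg_seconds_per_sentence = TOTAL_HOURS * 3600 / TOTAL_SENTENCES
avg_words_per_sentence = TOTAL_WORDS / TOTAL_SENTENCES

print(f"totals match: {totals_match}")
print(f"average sentence duration: {avg_seconds_per_sentence:.2f} s")
print(f"average words per sentence: {avg_words_per_sentence:.2f}")
```

The averages include the silence at the start and end of each recording, so the spoken portion of a typical sentence is somewhat shorter than the computed duration.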
Specifications:
ID: King-TTS-111
Size: 140.09 hours
Language: English, Chinese
Sample rate & bit depth: 48 kHz, 16-bit
Recording environment: Professional recording studio
Speaker: 25 males and 25 females
Devices: Studio
Accuracy rate:
Proofreading: 99% (measured per word)
Phonetic labeling: 99.5% (measured per phone)
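The sample rate, bit depth, and total duration give a rough idea of the raw audio volume. The sketch below assumes uncompressed PCM with one audio channel per file; neither the container format nor the per-file channel count is stated in the listing, so the result is an order-of-magnitude estimate, not a delivered package size.

```python
SAMPLE_RATE_HZ = 48_000        # from the specifications
BIT_DEPTH = 16                 # from the specifications
TOTAL_HOURS = 140.09           # from the specifications
ASSUMED_CHANNELS_PER_FILE = 1  # assumption: mono files, not stated in the listing


def pcm_bytes(hours, sample_rate_hz, bit_depth, channels):
    """Estimate the size in bytes of uncompressed PCM audio (headers ignored)."""
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    return round(hours * 3600 * bytes_per_second)


size_bytes = pcm_bytes(TOTAL_HOURS, SAMPLE_RATE_HZ, BIT_DEPTH, ASSUMED_CHANNELS_PER_FILE)
print(f"estimated raw audio size: {size_bytes / 1e9:.1f} GB")
```

If the files are stereo or stored in a compressed format, the actual size will differ accordingly.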
Samples
Audio
King-TTS-111-030000182
King-TTS-111-030000361
King-TTS-111-030301013
King-TTS-111-030301055

People also searched for

American English Male and Female Speech Synthesis Corpus (Customer and Audiobook)
This database contains 2,000 sentences from one female speaker and one male speaker, with a total audio duration of approximately 2 hours. The texts cover the customer and audiobook fields.
Brazilian Portuguese Male and Female Speech Synthesis Corpus
The database contains 2,924 recorded sentences (49,025 words) from 3 voice talents (2 females and 1 male). The total audio duration is about 6 hours, including the original silence at the beginning and end of each recording (about 300 ms each). The recorded content is organized into 11 texts covering multiple fields, such as news, letters, and digits. The pt-BR_xsampa phone set was used for labeling. The voice talents were born and raised in Brazil, in 1969/1973/1997, speak standard Brazilian Portuguese, and were 56/51/48 years old when recording the database. They have solid line-delivery skills, and the recordings have an even speech rate.
Spain Spanish Male and Female Speech Synthesis Corpus
The database contains 3,068 recorded sentences (52,494 words) from 3 voice talents (2 females and 1 male). The total audio duration is about 6 hours, including the original silence at the beginning and end of each recording (about 300 ms each). The recorded content is organized into 11 texts covering multiple fields, such as news, letters, and digits. The es-es_sampa phone set was used for labeling. The voice talents were born and raised in Spain, in 1962/1984/1987, speak standard Spanish, and were 63/41/37 years old when recording the database. They have solid line-delivery skills, and the recordings have an even speech rate.
New Zealand English Female Speech Synthesis Corpus
The database contains 1,600 recorded sentences (17,808 words) from a female voice talent. The total audio duration is about 2.04 hours, including the original silence at the beginning and end of each recording (about 350 ms each). The recorded content is organized into 1 text (news). The voice talent was born and raised in New Zealand in 1989 and speaks standard New Zealand English. She is a professional voice talent with many years of experience in dubbing and broadcasting and solid line-delivery skills.
