The database contains 109,419 sentences (1,635,335 words) recorded from non-professional speakers: 64,517 sentences on channel 0 and 44,902 on channel 1. The total audio duration is about 140.09 hours (82.41 hours for channel 0 and 57.68 hours for channel 1), including the cleared silence at the beginning and end of each recording (about 350 ms each).
The recorded content is organized into 8 texts covering multiple fields, such as news and dialog. We used the zh-cn_pinyin and en-us_cmu phone sets for labeling. The English texts of channel 0_micro and channel 1_mobile have both proofreading and phonetic labeling; the Chinese texts and the Chinese texts mixed with English have proofreading only.
Proofreading -- measured per individual word, the accuracy is 99%
Phonetic labeling -- measured per individual phone, the accuracy is 99.5%
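The description does not say how these accuracy figures are computed. As an illustration only, the sketch below assumes accuracy is one minus the edit distance between a reference sequence and a labeled sequence, divided by the reference length. The same calculation applies to words (proofreading) and to phones (phonetic labeling). The function names `edit_distance` and `unit_accuracy` and the sample sequences are hypothetical and are not part of the corpus or its tooling.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of units (words or phones)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def unit_accuracy(ref, hyp):
    """Assumed definition: 1 - edit_distance / number of reference units."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


# Word-level example: one substituted word out of six.
ref_words = "the cat sat on the mat".split()
hyp_words = "the cat sat on a mat".split()
word_acc = unit_accuracy(ref_words, hyp_words)

# Phone-level example: one substituted phone out of four.
ref_phones = ["HH", "AH", "L", "OW"]
hyp_phones = ["HH", "EH", "L", "OW"]
phone_acc = unit_accuracy(ref_phones, hyp_phones)

print(f"word accuracy:  {word_acc:.4f}")
print(f"phone accuracy: {phone_acc:.4f}")
```

Under this assumed definition, a word-level accuracy of 99% would correspond to about one word error per 100 reference words, and a phone-level accuracy of 99.5% to about one phone error per 200 reference phones.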
Samples
Audio
King-TTS-111-030000182
King-TTS-111-030000361
King-TTS-111-030301013
King-TTS-111-030301055
People also searched for
American English Male and Female Speech Synthesis Corpus (Customer and Audiobook)
This database contains 2,000 sentences from one female speaker and one male speaker, with a total audio duration of approximately 2 hours. The texts cover the customer and audiobook fields.
Brazilian Portuguese Male and Female Speech Synthesis Corpus
The database contains 2,924 sentences (49,025 words) recorded from 3 voice talents (2 females and 1 male). The total audio duration is about 6 hours, including the original silence at the beginning and end of each recording (about 300 ms each).
The recorded content is organized into 11 texts covering multiple fields, such as news, letters, and digits. We used the pt-BR_xsampa phone set for labeling.
The voice talents were born and raised in Brazil, in 1969/1973/1997, and speak standard Brazilian Portuguese. They were 56/51/48 years old when recording the database and have a solid foundation in line delivery. The recordings have an even speech rate.
Spain Spanish Male and Female Speech Synthesis Corpus
The database contains 3,068 sentences (52,494 words) recorded from 3 voice talents (2 females and 1 male). The total audio duration is about 6 hours, including the original silence at the beginning and end of each recording (about 300 ms each).
The recorded content is organized into 11 texts covering multiple fields, such as news, letters, and digits. We used the es-es_sampa phone set for labeling.
The voice talents were born and raised in Spain, in 1962/1984/1987, and speak standard Spanish. They were 63/41/37 years old when recording the database and have a solid foundation in line delivery. The recordings have an even speech rate.
New Zealand English Female Speech Synthesis Corpus
The database contains 1,600 sentences (17,808 words) recorded from a female voice talent. The total audio duration is about 2.04 hours, including the original silence at the beginning and end of each recording (about 350 ms each).
The recorded content is organized into 1 text, news.
The voice talent was born and raised in New Zealand in 1989 and speaks standard New Zealand English. She is a professional voice talent with many years of experience in dubbing and broadcasting and a solid foundation in line delivery.