Dataocean AI New Datasets – May

May 13, 2024

In artificial intelligence, large-model technology continues to drive innovation across industries. Dataocean AI has introduced new multilingual, multi-emotion, and multi-scenario speech data, as well as image data in Chinese-element styles, to help companies develop more diverse, higher-quality models and products that meet the broad needs of global users.
 

Singapore English Speech Recognition Corpus

Real-Life Scenario Collection: Over 200 hours in total.
Sentence Segmentation: All conversation content is accurately segmented by sentence meaning.
Dual-Channel Recording: Recorded over both mobile and internet phones.
Broad Topic Coverage: Covers over 10 topics, including telemarketing customer service, finance, daily life, social, travel, shopping, sports, education, entertainment, healthcare, technology, and gaming.
Content: 30% of the dialogue pertains to telemarketing customer service and financial transactions.
King-ASR-189-1-Chinese-English Mixed Speech Recognition Corpus (Desktop)
King-ASR-646-Singapore English Speech Recognition Corpus (Mobile)
 

 

Chinese-English Mixed Speech Recognition Corpus

Product Features:
Total Duration: Over 3200 hours
High-Quality Audio: Recorded in a quiet desktop environment
Gender Balance: Equal proportion of male and female speakers
Extensive Dialect Range: Includes speakers from the seven major dialect regions of China, enhancing the model’s ability to recognize different dialects and accents
Diverse Content: Includes various scenarios such as music, maps, casual conversations, code-mixing interactions, life queries, encyclopedias, tools, application control, radio, audiobooks, and videos.
King-ASR-951 Chinese-English Mixed Speech Recognition Corpus
King-ASR-954 Chinese-English Mixed Speech Recognition Corpus
King-ASR-873-Chinese-English Mixed Speech Recognition Corpus (Desktop)
 

Cantonese-English Mixed Speech Recognition Corpus

Product Features:
Total Duration: 205 hours of read speech in Hong Kong Cantonese mixed with English
Speakers: 201 native Cantonese speakers from Hong Kong
Collection Scenarios: Daily life conversation scenarios
Content: Includes English word abbreviations commonly used in Cantonese, as well as names, software, trademarks, and shop names.
King-ASR-957-Hong Kong Cantonese-English Mixed Corpus
 

Mandarin Speech Recognition Corpus for Elderly and Children

Product Features:
Special Age Groups: The dataset covers two distinct age groups, children and the elderly, with a balanced gender ratio.
Broad Regional Coverage: Speakers come from the seven major dialect regions of China, giving wide regional representation.
Children’s Recordings:
Car control commands
Audiobooks (especially children’s programs)
Children’s video programs
Children’s songs, including popular ones from Douyin (TikTok)
Elderly Recordings:
Car control commands
Navigation
Audiobooks (especially programs favored by the elderly)
Music types preferred by the elderly
King-ASR-953-Chinese Mandarin Speech Recognition Corpus for the Elderly and Children
 

British English Themed Scene Speech Synthesis Series Corpus

Product Features:
This series includes recordings from 5 male and 3 female speakers.
Female Voice Features: The speech synthesis library focuses on content for live streaming, advertising, and training courses, with clear and precise pronunciation.
Male Voice Features: In addition to live streaming, advertising, and training courses, the male voice library includes customer service and natural dialogue content, offering a more human-like interactive experience.
Precise Labeling: We have precisely labeled pronunciation, prosody, and phoneme boundaries, enabling better simulation of real human speech habits and rhythm.
King-TTS-293 British English Memale Voice Synthesis Corpus for Advertising and Marketing
 
 

Chinese Themed Scene Speech Synthesis Series Corpus

Product Features:
Voice Talent: This series includes recordings from 4 male and 4 female speakers.
Specialized Scenarios: The featured scenarios include self-media vlogs, educational content, live commerce, and advertising and marketing.
Focus on Clarity and Expressiveness: Designed to better capture users’ attention, this series supports applications in education, marketing, audiobooks, and podcasts, as well as film and animation dubbing.
King-TTS-302 Advertising and Marketing-Female voice
King-TTS-303 Live-streaming sales-Male voice
King-TTS-304 Education-Male voice
King-TTS-305 Self-media vlogs-Male voice  
 

Chinese Novel Speech Synthesis Corpus

Key Features:
Audio Content: The dataset contains over 200 hours of audio.
Diverse Character Voices: It covers both supporting and main character voices, providing an array of original content that can vividly depict novel character settings.
Variety of Age Groups: Includes characters from various age groups such as youth, middle-aged, and elderly.
Diverse Character Archetypes: Encompasses a range of character types including the spoiled young lady, kind-hearted elder, malicious woman, and idle rascal.
Emotional Range: Represents nine basic emotions (neutral, joy, anger, sadness, fear, disdain, concern, seriousness, and inner monologue), as well as paralinguistic sounds such as crying, stuttering, laughter, cold snorts, and sighs.
Tonal Variations: Captures tones of confusion and surprise.
King-TTS-114 Chinese Male Speech Synthesis Corpus (Novels)
King-TTS-115 Chinese Male Speech Synthesis Corpus (Novels Style & Multi voices)
King-TTS-116 Chinese Male Speech Synthesis Corpus (Novels) 
King-TTS-119 Chinese And English male and female Speech Synthesis Corpus (Novels &10 speakers)
 
 