Dataocean AI's new datasets for December are here! This release covers speech recognition, speech synthesis, multimodal learning, and more, and is designed to support the training of multimodal large models. It helps developers overcome data bottlenecks and improve model performance efficiently.
- Indonesian Speech Recognition Dataset
- Thai Speech Recognition Dataset
- Chinese Female Speech Synthesis Dataset – Multi Emotions
- American English Emotional Speech Synthesis Dataset
- Professional Scenario Text-Image Pair Dataset
- General Knowledge Text-Image Pair Dataset
Indonesian Speech Recognition Dataset – Dialogue
Indonesian is the official language of Indonesia and a standardized variety of Malay, closely related to the Malay that is official in Malaysia and Brunei. It is also spoken by a significant number of people in Singapore and East Timor, with around 190 million speakers globally. High-quality Indonesian data helps improve speech recognition models, supporting businesses that are expanding into the Southeast Asian market.
🔥 Product Features: The dataset includes 100 speakers with a total duration of 109 hours and over 95% word accuracy. The speakers are gender-balanced, with ages ranging from 18 to 65, covering various age groups and language characteristics.
🚀 Topics: Daily casual conversation topics such as family, health, music, shopping, sports, travel, work, food, education, movies, social networks, friends, entertainment, news, pets, computers, TV, celebrities, life, marriage, weather, and more.
🔗 King-ASR-868 Indonesian Speech Recognition Dataset – Dialogue
Thai Speech Recognition Dataset – Dialogue
Thai, also known as Siamese, is primarily spoken in Thailand and parts of Laos, with around 68 million speakers globally. Dialogue data helps the model understand real-world conversation patterns and linguistic habits, improving its accuracy in Thai language comprehension.
🔥 Product Features: This dataset includes 402 speakers with a total duration of over 203 hours and more than 95% word accuracy. It contains 61.69% male and 38.31% female speakers, ranging in age from 18 to 65, covering a wide range of language characteristics.
🚀 Topics: Includes Thai monologues and conversations across 11 industries, such as finance, education, healthcare, technology, environment, travel, and more.
🔗 King-ASR-301 Thai Speech Recognition Dataset – Dialogue
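As a quick sanity check on the Thai dataset's figures, the short sketch below converts the quoted gender percentages into approximate speaker counts and estimates the average recording time per speaker. It uses only the numbers stated above. The constant and function names are illustrative, and because the duration is quoted as "over 203 hours", the per-speaker average is a lower bound rather than an exact value.

```python
# Figures quoted in the Thai dataset description.
TOTAL_SPEAKERS = 402
TOTAL_HOURS = 203        # stated as "over 203 hours", so this is a lower bound
MALE_SHARE = 0.6169      # 61.69% male
FEMALE_SHARE = 0.3831    # 38.31% female


def speaker_counts(total, male_share, female_share):
    """Convert percentage shares into whole-speaker counts."""
    male = round(total * male_share)
    female = round(total * female_share)
    return male, female


male, female = speaker_counts(TOTAL_SPEAKERS, MALE_SHARE, FEMALE_SHARE)

# Average audio per speaker, in minutes (lower bound).
avg_minutes = TOTAL_HOURS * 60 / TOTAL_SPEAKERS

print(f"male speakers:   {male}")
print(f"female speakers: {female}")
print(f"average per speaker: at least {avg_minutes:.1f} minutes")
```

The two counts add up to the full 402 speakers, which suggests the published percentages were derived from whole-speaker counts.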
Chinese Female Speech Synthesis Dataset – Multi Emotions
This high-quality dataset is popular with clients and well suited to digital humans and virtual broadcasters, helping models generate more natural and appealing voices.
🔥 Product Features: Total duration of 4.43 hours with more than 99% word accuracy. The voice tone is warm and gentle with a steady pace. The dataset also includes detailed annotations for pronunciation, rhythm, and other aspects.
🚀 Topics: Includes conversations between couples, e-commerce live streaming, declarative speech, casual conversations, and more. The dataset covers 14 emotional tones: joy, dissatisfaction, fear, gentleness, sentimentality, sadness, sternness, friendliness, whispering, apology, excitement, affection, anger, and calm.
🔗 King-TTS-264 Chinese Female Speech Synthesis Dataset