Dataocean AI New Datasets – December

December 11, 2024
Dataocean AI's new datasets for December are here! This release covers speech recognition, speech synthesis, multimodal learning, and more, and is designed to support the training of multimodal large models. The goal is to help developers overcome data bottlenecks and improve model performance efficiently.
  • Indonesian Speech Recognition Dataset – Dialogue
  • Thai Speech Recognition Dataset – Dialogue
  • Chinese Female Speech Synthesis Dataset – Multi Emotions
  • U.S. English Emotional Speech Synthesis Dataset
  • Professional Scenario Text-Image Pair Dataset
  • General Knowledge Text-Image Pair Dataset
 
Indonesian Speech Recognition Dataset – Dialogue
Indonesian is the official language of Indonesia. It is a standardized variety of Malay, closely related to the Malay that is official in Malaysia and Brunei, and it is also understood by a significant number of people in Singapore and East Timor, with around 190 million speakers globally. High-quality Indonesian data helps improve a model's speech recognition capabilities and supports businesses expanding into the Southeast Asian market.
🔥 Product Features: The dataset includes 100 speakers with a total duration of 109 hours and over 95% word accuracy. The speakers are gender-balanced, with ages ranging from 18 to 65, covering various age groups and language characteristics.
🚀 Topics: Daily casual conversation topics such as family, health, music, shopping, sports, travel, work, food, education, movies, social networks, friends, entertainment, news, pets, computers, TV, celebrities, life, marriage, weather, and more.
🔗 King-ASR-868 Indonesian Speech Recognition Dataset – Dialogue
 
Thai Speech Recognition Dataset – Dialogue
Thai, also known as Siamese, is primarily spoken in Thailand and parts of Laos, with around 68 million speakers globally. Dialogue data helps the model understand real-world conversation patterns and linguistic habits, improving its accuracy in Thai language comprehension.
🔥 Product Features: This dataset includes 402 speakers with a total duration of over 203 hours and more than 95% word accuracy. It contains 61.69% male and 38.31% female speakers, ranging in age from 18 to 65, covering a wide range of language characteristics.
🚀 Topics: Includes Thai monologues and conversations across 11 industries, such as finance, education, healthcare, technology, environment, travel, and more.
🔗 King-ASR-301 Thai Speech Recognition Dataset – Dialogue
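
As a quick sanity check on the figures above, the published gender percentages can be converted back into approximate head counts. This is a minimal sketch: the function name `split_speakers` is hypothetical, and rounding to whole speakers is an assumption, since the announcement only gives percentages.

```python
def split_speakers(total: int, male_pct: float) -> tuple[int, int]:
    """Estimate (male, female) head counts from a total and a male percentage.

    Assumption: every speaker is counted as either male or female,
    so the female count is simply the remainder.
    """
    male = round(total * male_pct / 100)
    return male, total - male


# Figures quoted for the Thai dialogue dataset: 402 speakers, 61.69% male.
male, female = split_speakers(402, 61.69)
print(male, female)  # 248 154
```

Under that assumption the split works out to 248 male and 154 female speakers, which reproduces the quoted 61.69% and 38.31% when rounded to two decimals.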
 
Chinese Female Speech Synthesis Dataset – Multi Emotions
This high-quality dataset is popular with clients and is well suited to digital humans and virtual broadcasters, helping models generate more natural and appealing voices.
🔥 Product Features: Total duration of 4.43 hours with more than 99% word accuracy. The voice tone is warm and gentle with a steady pace. The dataset also includes detailed annotations for pronunciation, rhythm, and other aspects.
🚀 Topics: Includes conversations between couples, e-commerce live streaming, declarative speech, casual conversations, and more. The dataset covers 14 emotional tones: joy, dissatisfaction, fear, gentleness, sentimentality, sadness, sternness, friendliness, whispering, apology, excitement, affection, anger, and calm.
🔗 King-TTS-264 Chinese Female Speech Synthesis Dataset

 

U.S. English Emotional Speech Synthesis Dataset
In fields like gaming, audiobooks, and virtual humans, emotionally rich speech synthesis data can significantly enhance model performance and user experience, helping domestic companies enter the European and American markets.
🔥 Product Features: Includes three datasets of 3 hours each, recorded by two male speakers and one female speaker (one speaker per dataset) and covering three different tonal ranges. Each tonal range includes 11 emotions: neutral, happy, angry, sad, shocked, hateful, fearful, shouting, crying, laughing, and weak.
🔗 King-TTS-285 U.S. English Male Speech Synthesis Dataset – Gentle Warm Man
🔗 King-TTS-286 U.S. English Male Speech Synthesis Dataset – Gentle Mature
🔗 King-TTS-287 U.S. English Female Speech Synthesis Dataset – Mature and Steady
 
Professional Scenario Text-Image Pair Dataset
🔥 Product Features: Includes images taken from various scenarios, periods, and shooting angles, covering topics like architecture, displays, city streets, home environments, sports events, shopping malls, schools, exhibitions, and natural settings. Each image is accompanied by corresponding text descriptions.
🚀 Product Scale: 20,000 pairs
🖼️ Image Specifications: 720P or higher
📝 Text Specifications: Includes labels plus Chinese and English descriptions. Each Chinese description contains at least 30 Chinese characters (excluding symbols).
🔗 King-IM-105 Professional Scenario Text-Image Pair Dataset
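
For readers who want to check descriptions against the 30-character minimum in the text specification, here is a minimal sketch of one way to count Chinese characters while ignoring symbols. The announcement does not say how characters are counted, so this assumes that only code points in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) count. The names `count_chinese_chars` and `meets_minimum` are hypothetical.

```python
import re

# Assumption: a "Chinese character" is a code point in the basic
# CJK Unified Ideographs block. Punctuation, digits, Latin letters,
# and whitespace fall outside this range and are not counted.
HAN = re.compile(r"[\u4e00-\u9fff]")


def count_chinese_chars(text: str) -> int:
    """Count Chinese characters in text, excluding symbols and other scripts."""
    return len(HAN.findall(text))


def meets_minimum(text: str, minimum: int = 30) -> bool:
    """Return True if text has at least `minimum` Chinese characters."""
    return count_chinese_chars(text) >= minimum


print(count_chinese_chars("你好，世界！abc"))  # 4
```

Rare characters in the CJK extension blocks would need a wider range than the one assumed here.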
 
General Knowledge Text-Image Pair Dataset
🔥 Product Features: Contains data in 23 categories: people, food, landscapes, architecture, cities, rural areas, health, sports, medical, automobiles, backgrounds, finance, education, oil paintings, illustrations, watercolor, travel, fashion, romance, animals, plants, space, and technology.
🚀 Product Scale: 2,000,000 pairs
🖼️ Image Specifications: 2K or higher
📝 Text Specifications: Includes labels, with descriptions in Chinese or English
🔗 King-IM-104 General Knowledge Text-Image Pair Dataset