Dataocean AI New Datasets – December

December 11, 2024

Dataocean AI's new datasets for December are here! This release covers speech recognition, speech synthesis, multimodal learning, and more, and is designed to support the training of multimodal large models. The goal is to help developers ease data bottlenecks and improve model performance.

Indonesian Speech Recognition Dataset
Thai Speech Recognition Dataset
Chinese Female Speech Synthesis Dataset – Multi Emotions
U.S. English Emotional Speech Synthesis Dataset
Professional Scenario Text-Image Pair Dataset
General Knowledge Text-Image Pair Dataset

Indonesian Speech Recognition Dataset – Dialogue

Indonesian is the official language of Indonesia and a standardized variety of Malay, which makes it closely related to the Malay spoken in Malaysia, Brunei, and Singapore. It is also used in East Timor, with around 190 million speakers globally. High-quality Indonesian data helps improve a model's speech recognition capabilities, supporting businesses that are expanding into the Southeast Asian market.

🔥 Product Features: The dataset includes 100 speakers with a total duration of 109 hours and over 95% word accuracy. The speakers are gender-balanced, with ages ranging from 18 to 65, covering various age groups and language characteristics.

🚀 Topics: Daily casual conversation topics such as family, health, music, shopping, sports, travel, work, food, education, movies, social networks, friends, entertainment, news, pets, computers, TV, celebrities, life, marriage, weather, and more.

🔗 King-ASR-868 Indonesian Speech Recognition Dataset – Dialogue

Thai Speech Recognition Dataset – Dialogue

Thai, also known as Siamese, is primarily spoken in Thailand and parts of Laos, with around 68 million speakers globally. Dialogue data helps the model understand real-world conversation patterns and linguistic habits, improving its accuracy in Thai language comprehension.

🔥 Product Features: This dataset includes 402 speakers with a total duration of over 203 hours and more than 95% word accuracy. It contains 61.69% male and 38.31% female speakers, ranging in age from 18 to 65, covering a wide range of language characteristics.

🚀 Topics: Includes Thai monologues and conversations across 11 industries, such as finance, education, healthcare, technology, environment, travel, and more.

🔗 King-ASR-301 Thai Speech Recognition Dataset – Dialogue

Chinese Female Speech Synthesis Dataset – Multi Emotions

This high-quality dataset is highly favored by clients and is an ideal choice for digital humans and virtual broadcasters, helping models generate more natural and appealing voices.

🔥 Product Features: Total duration of 4.43 hours with more than 99% word accuracy. The voice tone is warm and gentle with a steady pace. The dataset also includes detailed annotations for pronunciation, rhythm, and other aspects.

🚀 Topics: Includes conversations between couples, e-commerce live streaming, declarative speech, casual conversations, and more. The dataset covers 14 different emotional tones, such as joy, dissatisfaction, fear, gentleness, sentimentality, sadness, sternness, friendliness, whispering, apology, excitement, affection, anger, and calm.

🔗 King-TTS-264 Chinese Female Speech Synthesis Dataset

U.S. English Emotional Speech Synthesis Dataset

In fields like gaming, audiobooks, and virtual humans, emotionally rich speech synthesis data can significantly enhance model performance and user experience, helping domestic companies enter the European and American markets.

🔥 Product Features: Includes three datasets of 3 hours each, recorded by two male speakers and one female speaker (one speaker per dataset) and covering three different tonal ranges. Each tonal range includes 11 emotions: neutral, happy, angry, sad, shocked, hateful, fearful, shouting, crying, laughing, and weak.

🔗 King-TTS-285 U.S. English Male Speech Synthesis Dataset – Gentle Warm Man

🔗 King-TTS-286 U.S. English Male Speech Synthesis Dataset – Gentle Mature

🔗 King-TTS-287 U.S. English Female Speech Synthesis Dataset – Mature and Steady

Professional Scenario Text-Image Pair Dataset

🔥 Product Features: Includes images taken from various scenarios, periods, and shooting angles, covering topics like architecture, displays, city streets, home environments, sports events, shopping malls, schools, exhibitions, and natural settings. Each image is accompanied by corresponding text descriptions.

🚀 Product Scale: 20,000 pairs

🖼️ Image Specifications: 720P or higher

📝 Text Specifications: Includes labels, Chinese and English descriptions, with Chinese descriptions containing at least 30 Chinese characters (excluding symbols).

🔗 King-IM-105 Professional Scenario Text-Image Pair Dataset

General Knowledge Text-Image Pair Dataset

🔥 Product Features: Contains data in 23 categories, including people, food, landscapes, architecture, cities, rural areas, health, sports, medical, automobiles, backgrounds, finance, education, oil paintings, illustrations, watercolor, travel, fashion, romance, animals, plants, space, and technology.

🚀 Product Scale: 2,000,000 pairs

🖼️ Image Specifications: 2K or higher

📝 Text Specifications: Includes labels, with descriptions in Chinese or English

🔗 King-IM-104 General Knowledge Text-Image Pair Dataset
