Dataocean AI New Datasets – July


10 July 2024

Dataocean AI has launched several new high-quality datasets, including minor-language speech datasets, a telephoto landscape image dataset, and a multi-skin-tone cabin video dataset. These resources are designed to help enterprises build broader, higher-quality large models and AI applications that meet the diverse needs of users worldwide.

Arabic Speech Recognition Dataset

Product Features: Arabic is one of the world's major languages and an important bridge for global communication. Dataocean AI's Arabic Speech Recognition Dataset includes 1,937 speakers and more than 1,600 hours of audio. Speakers are gender-balanced and range in age from 18 to 65, so the data covers the speech characteristics of a wide range of age groups.

The dataset covers 10 Arabic accents: Standard Arabic, Emirati Arabic, Saudi Arabic, Egyptian Arabic, Gulf Arabic, Kuwaiti Arabic, Levantine Arabic, Jordanian Arabic, Moroccan Arabic, and Libyan Arabic.

Application Scenarios: The dataset spans more than 20 domains, including daily life, education, finance, healthcare, insurance, call centers, marketing, and tourism. It supports accurate speech recognition in scenarios ranging from complex financial transactions to professional medical consultations and travel services.

King-ASR-925 Arabic Speech Recognition Corpus–Dialogue

King-ASR-318 Arabic (Saudi Arabia) Speech Recognition Corpus

King-ASR-293 UAE Arabic Speech Recognition Corpus (Mobile)

King-ASR-109 UAE Arabic Speech Recognition Corpus (Desktop)

Arabic Speech Synthesis Dataset

Product Features: In addition to its Arabic speech recognition data, Dataocean AI also offers an Arabic speech synthesis dataset. It totals more than 50 hours and includes Modern Standard Arabic, Egyptian Arabic, Egyptian Dialect, Gulf Dialect, and mixed Arabic-English speech. The speakers have professional broadcasting backgrounds, with friendly, natural voices and consistent speaking rates. All data is annotated with prosody labels.

Application Scenarios: The dataset covers a wide range of fields, including daily conversations, news, and finance, featuring both reading and dialogue data. The high-quality data will support companies expanding into markets along the Belt and Road Initiative.

King-TTS-174 Standard Arabic Female Speech Synthesis Corpus (Virtual Talk)

King-TTS-005 Egyptian Arabic Male Speech Synthesis Corpus

King-TTS-004 Egyptian Arabic Male Speech Synthesis Corpus

Minor Language Spontaneous Dialogue Speech Recognition Dataset

Product Features: The dialogues cover more than 20 everyday topics, such as family, health, travel, education, work, cuisine, marriage, movies, music, socializing, celebrities, weather, and sports, providing rich, natural conversational context. Speakers are gender-balanced and are mostly between 16 and 45 years old.

Minor Language Spontaneous Dialogue – Odia (India)

This dataset includes 52 hours of spontaneous dialogue recordings from 50 speakers, primarily from Odisha.

King-ASR-946 Odia Free Dialogue Speech Corpus

Minor Language Spontaneous Dialogue – Albanian

This dataset includes 22 hours of spontaneous dialogue recordings from 20 speakers, primarily from Tirana.

King-ASR-942 Albanian Free Dialogue Speech Corpus

Minor Language Spontaneous Dialogue – Amharic (Ethiopia)

This dataset includes 24 hours of spontaneous dialogue recordings from 20 speakers, primarily from central Ethiopia.

King-ASR-939 Ethiopian Amharic Free Dialogue Speech Corpus

Minor Language Spontaneous Dialogue – Serbian

This dataset includes 60 hours of spontaneous dialogue recordings from 50 speakers, primarily from central Serbia.

King-ASR-938 Serbian Free Dialogue Speech Corpus

Telephoto Landscape Image Dataset

Product Features: The new telephoto landscape image dataset contains more than 25,000 images, focusing on architecture and plants. The images are full-size and free of blur, so both foreground and background remain sharp when zoomed in. To ensure diversity, each subject appears in no more than five images, each taken from a different angle.

The images were captured at the camera's highest quality settings, at resolutions above 4K, to preserve fine detail and rich color. Focal lengths between 185mm and 235mm capture detail while retaining a sense of depth and three-dimensionality.

Application Scenarios: This dataset can be used to develop large multimodal models and as seed images that creators feed into large models. Its quality and diversity meet the growing demand for high-quality visual content.

King-IM-101 Telephoto Landscape Image Corpus

AD-DMS Multi-Skin Tone Cabin Video Dataset

Product Features: The multi-skin tone cabin video dataset includes data from over 700 participants, covering a range of skin tones, including black, brown, olive, fair, natural, and very fair.

Participants come from nearly 40 countries, including Switzerland, Colombia, Peru, and Brazil, and range in age from 18 to 60, with most being young to middle-aged adults. This provides a rich sample of facial expressions and actions across age groups. Each video is at least 25 seconds long, with a resolution of at least 720p, ensuring clear, detailed images.

Collected Information	Details
Daytime	Front light, backlight, side light, dappled sunlight, overcast, rainy, snowy
Nighttime	Car interior lighting, street lighting, oncoming headlights (both high and low beams)
Facial expressions and actions	Eyes open, mouth opening and closing, exaggerated mouth movements, exaggerated expressions, smirking, winking, etc.
Other actions	Smoking, drinking, using a phone, hand covering the face, etc.
Accessories	All participants wear accessories, including glasses, hats, etc.

Application Scenarios: This dataset provides high-quality, diverse data to support research and applications in facial recognition, emotion analysis, driver monitoring, and other intelligent cabin scenarios. It enables models to be more accurate and reliable across a wide range of skin tones, nationalities, and age groups.

King-ADV-007 AD-DMS Multi-Skin Tone Cabin Video Corpus

High-Definition Dance Video Dataset

Product Features: This dataset contains 100,000 dance videos, averaging 30 seconds each, all in 4K resolution. Performers are adults and teenagers with basic dance skills, with a balanced gender ratio. The videos include solo and group performances captured from multiple angles, including front, side, back, and turning views. Dance styles include folk, jazz, street dance, and more.

Application Scenarios: This dataset can be used in virtual human development, VR, dance teaching, and video creation, promoting the application and development of multimodal technology in these fields.

King-VD-049 High-Definition Dance Video Corpus

Lip Movement Video Dataset

Product Features: This dataset contains lip-sync video of 208 people recorded with high-definition cameras. Recordings were made in a quiet indoor environment under several simulated lighting conditions: normal light, strong light, backlight, and dim light. Shooting distances are 0.5 meters and 1 meter, with 0.5 meters as the primary distance (about 90% of the data). The dataset includes both solo and group recordings.

The speakers mainly speak Mandarin and range in age from 7 to over 60, mostly children and young to middle-aged adults, with a balanced gender ratio. Audio was recorded simultaneously with the video.

Application Scenarios: This dataset can be applied to lip-reading recognition, virtual human development, and VR. It can help people with hearing impairments communicate, or improve the accuracy of speech recognition systems in noisy environments. It can also be used to animate virtual characters in video games, film production, and virtual reality.

King-VD-028 Lip Movement Video Corpus

King-VD-018 Lip-reading Speech Video Corpus
