Dataocean AI New Datasets – July

July 10, 2024

Dataocean AI has launched new high-quality datasets, including minor-language smart voice datasets, a telephoto landscape image dataset, and a multi-skin tone cabin video dataset. These resources aim to help enterprises develop broader and higher-quality large models and AI applications that meet the diverse needs of global users.

 

Arabic Speech Recognition Dataset

Product Features: Arabic, with its unique charm and significance, serves as a crucial bridge for global communication. Dataocean AI’s Arabic Speech Recognition Dataset includes 1,937 speakers with a total duration exceeding 1,600 hours. The speakers are gender-balanced, and their ages range from 18 to 65, covering the linguistic characteristics of different age groups comprehensively.

 

The dataset includes 10 types of Arabic accents: Standard Arabic, Emirati Arabic, Saudi Arabic, Egyptian Arabic, Gulf Arabic, Kuwaiti Arabic, Levantine Arabic, Jordanian Arabic, Moroccan Arabic, and Libyan Arabic.

 

Application Scenarios: The dataset covers over 20 domains, including daily life, education, finance, healthcare, insurance, call centers, marketing, and tourism. Whether handling complex financial transactions, providing professional medical consultations, or assisting in travel services, this dataset supports models in delivering accurate speech recognition.

 

King-ASR-925 Arabic Speech Recognition Corpus–Dialogue
King-ASR-318 Arabic (Saudi Arabia) Speech Recognition Corpus
King-ASR-293 UAE Arabic Speech Recognition Corpus (Mobile)
King-ASR-109 UAE Arabic Speech Recognition Corpus (Desktop)
 
 

Arabic Speech Synthesis Dataset

Product Features: In addition to the Arabic speech recognition dataset, Dataocean AI also offers an Arabic speech synthesis dataset. This dataset exceeds 50 hours in total duration and includes accents such as Modern Standard Arabic, Egyptian Arabic, Egyptian Dialect, Gulf Dialect, and mixed Arabic-English. The speakers have professional broadcasting backgrounds, with friendly and natural voice tones and consistent speech rates. All data has been labeled for prosody.

 

Application Scenarios: The dataset covers a wide range of fields, including daily conversations, news, and finance, featuring both reading and dialogue data. The high-quality data will support companies expanding into markets along the Belt and Road Initiative.

 

King-TTS-174 Standard Arabic Female Speech Synthesis Corpus (Virtual Talk)
King-TTS-005 Egyptian Arabic Male Speech Synthesis Corpus
King-TTS-004 Egyptian Arabic Male Speech Synthesis Corpus
 
 

Minor Language Spontaneous Dialogue Speech Recognition Dataset

Product Features: The dialogues cover more than 20 common life topics such as family, health, travel, education, work, cuisine, marriage, movies, music, socializing, celebrities, weather, and sports, providing a comprehensive and rich natural context. The gender ratio of speakers is balanced, with ages mainly ranging from 16 to 45 years.

 

Minor Language Spontaneous Dialogue – Odia (India)
This dataset includes 52 hours of spontaneous dialogue recordings from 50 speakers, primarily from Odisha.
King-ASR-946 Odia Free Dialogue Speech Corpus
 
Minor Language Spontaneous Dialogue – Albanian
This dataset includes 22 hours of spontaneous dialogue recordings from 20 speakers, primarily from Tirana.
King-ASR-942 Albanian Free Dialogue Speech Corpus

 

Minor Language Spontaneous Dialogue – Amharic (Ethiopia)
This dataset includes 24 hours of spontaneous dialogue recordings from 20 speakers, primarily from central Ethiopia.
King-ASR-939 Ethiopian Amharic Free Dialogue Speech Corpus
 
Minor Language Spontaneous Dialogue – Serbian
This dataset includes 60 hours of spontaneous dialogue recordings from 50 speakers, primarily from central Serbia.
King-ASR-938 Serbian Free Dialogue Speech Corpus
 
 

Telephoto Landscape Image Dataset

Product Features: The brand-new telephoto landscape image dataset includes over 25,000 images, focusing on architecture and plants. The images are full-size and blur-free, maintaining clarity in both the foreground and background when zoomed in. To ensure diversity and uniqueness, no more than five images of the same subject from different angles are included.

 

The images were captured using the highest-quality camera settings at resolutions above 4K, ensuring detailed, richly colored results. Focal lengths between 185 mm and 235 mm capture fine detail while preserving depth and a three-dimensional feel.

 

Application Scenarios: This dataset can be used for developing large multimodal models and as seed images that creators feed into large models. Its quality and diversity meet the demand for high-quality visual content.

King-IM-101 Telephoto Landscape Image Corpus
 
 

AD-DMS Multi-Skin Tone Cabin Video Dataset

Product Features: The multi-skin tone cabin video dataset includes data from over 700 participants, covering a range of skin tones, including black, brown, olive, fair, natural, and very fair.

 

Participants come from nearly 40 countries, including Switzerland, Colombia, Peru, and Brazil, and range in age from 18 to 60, with most being young to middle-aged. This provides a rich sample of facial expressions and actions across different age groups. Each video is at least 25 seconds long, with a resolution of at least 720p, ensuring image clarity and detail.

 

Collected Information
Daytime: Front light, backlight, side light, dappled sunlight, overcast, rainy, snowy
Nighttime: Car interior lighting, street lighting, oncoming headlights (both high and low beams)
Facial expressions and actions: Eyes open, mouth opening and closing, exaggerated mouth movements, exaggerated expressions, smirking, winking, etc.
Other actions: Smoking, drinking, using a phone, hand covering the face, etc.
Accessories: All participants wear accessories, including glasses, hats, etc.

 
Application Scenarios: This dataset provides high-quality, diverse data to support research and applications in facial recognition, emotion analysis, driver monitoring, and other intelligent cabin scenarios. It enables models to be more accurate and reliable across a wide range of skin tones, nationalities, and age groups.

King-ADV-007 AD-DMS Multi-Skin Tone Cabin Video Corpus
 
 

High-Definition Dance Video Dataset

Product Features: This dataset contains 100,000 dance videos, each averaging 30 seconds in length, all in 4K resolution. It includes adults and teenagers with basic dance skills, with a balanced gender ratio. The videos feature solo and group performances, captured from various angles such as front, side, back, and turning views. The dance styles include folk, jazz, street dance, and more.

 

Application Scenarios: This dataset can be used in virtual human development, VR, dance teaching, and video creation, promoting the application and development of multimodal technology in these fields.

King-VD-049 High-Definition Dance Video Corpus
 

Lip Movement Video Dataset

Product Features: This dataset features lip-sync video data of 208 individuals captured using high-definition cameras. The recordings were done in a quiet indoor environment, simulating various lighting conditions, including normal light, strong light, backlight, and dim light. The shooting distances are 0.5 meters and 1 meter, with 0.5 meters being the primary distance (approximately 90% of the data). The dataset includes both solo and group recordings.

 

The speakers primarily use Mandarin, with ages ranging from 7 to over 60 years, focusing on children and young to middle-aged adults, with a balanced gender ratio. Audio was recorded simultaneously with the video.

 

Application Scenarios: This dataset can be applied in lip-reading recognition, virtual human development, and VR. It can assist people with hearing impairments in communication or improve the accuracy of speech recognition systems in noisy environments. Additionally, it can be used to animate virtual characters in video games, film production, and virtual reality, driving application and development in these fields.

King-VD-028 Lip Movement Video Corpus
King-VD-018 Lip-reading Speech Video Corpus
