Open Datasets: GigaSpeech 2 – 30,000 Hours of Southeast Asian Multilingual Speech Recognition Open Source Dataset Released

The term “Giga” originates from “gigantic,” reflecting the vast audio resources available on the internet. However, the quality of these audio resources varies significantly, and high-quality audio-text pairs are particularly scarce and expensive to annotate, especially for low-resource languages. GigaSpeech, a highly successful open-source English dataset, addresses this issue by providing thousands of hours of […]

Unlocking the Emotional Data Behind GPT-4o

GPT-4o can already be considered an emotionally rich and human-like intelligent voice assistant, or more accurately, a “new species” that is increasingly approaching human interaction. This powerful model also has the ability to understand and synthesize text, images, videos, and voice, and can even be seen as an unfinished version of GPT-5. Click here […]

Key Data for Humanlike Text-to-Speech Systems

As numerous tech companies race to enhance the multimodal capabilities of large models and strive to integrate functions like text summarization and image editing into mobile devices, OpenAI has launched a new product! CEO Samuel Harris Altman summed up his reaction in three letters: “her” (a nod to the movie “Her”). In the early morning of May […]

Dataocean AI New Datasets – May

In the field of artificial intelligence, the technology of large models is continuously driving innovation and development across various industries. Dataocean AI has introduced new multilingual, multi-emotional, and multi-scenario intelligent voice data, as well as image data with Chinese element styles, to help companies develop more diverse and high-quality models and products to meet the […]

Chinese Continuous Visual Speech Recognition Challenge 2024

Initiated by the NCMMSC 2024 Organizing Committee and jointly hosted by Tsinghua University, Beijing University of Posts and Telecommunications, Speech Home, and Dataocean AI, the second Chinese Continuous Visual Speech Recognition Challenge (CNVSRC 2024) kicks off today. We sincerely invite your participation and registration. Event Introduction: Visual speech recognition, also known as lip reading, is […]

Essential Data for Training Large Speech Foundation Models

Recently, OpenAI has delivered another breakthrough in the field of speech technology. By using a text input along with a 15-second audio sample, they can generate speech that sounds both natural and remarkably similar to the original voice. What’s particularly impressive is that even with a small model, a 15-second sample is enough to create […]