Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels

Audio-visual speech recognition has attracted considerable attention because of its robustness against acoustic noise. Recently, the performance of automatic, visual, and audio-visual speech recognition (ASR, VSR, and AV-ASR, respectively) has improved substantially, mainly due to the use of larger models and training sets. The authors of this paper propose to use publicly […]

Chinese Continuous Visual Speech Recognition Challenge 2023

Visual speech recognition, also known as lip reading, is a technology that infers spoken content from lip movements. It has important applications in public safety, assistance for the elderly and the disabled, and fake video detection. Research on lip reading is still in its early stages and cannot yet support real-life applications. Significant progress has been […]

SeamlessM4T: A Multi-Modal Model Beyond the Constraints of LLM

On August 23, Meta released a new large speech model, SeamlessM4T. The SeamlessM4T (Massively Multilingual and Multimodal Machine Translation) model supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition, covering up to 100 languages. The paper describes methods for building this model, including learning self-supervised speech representations using […]

Revitalizing the Smart Devices Market: The Potential of Large Language Models

Over the past year, large language models have been in full swing. Smartphones and other smart devices are also catching up with the trend, aiming to claim a share of the market. However, the smart devices market has been shrinking. One reason for the decline is that demand increased significantly in 2021, and the […]

Revolutionizing Speech Synthesis: Unleashing the Power of Big Data and Generative Models

In the ever-evolving landscape of artificial intelligence, large language models (LLMs) have captured the limelight in natural language processing (NLP). The rise of ChatGPT and similar models has reshaped how we interact with text, revolutionizing everything from content creation to customer service. Yet, amidst this wave of innovation, the realm of speech synthesis has yet […]

AI Alignment: Navigating Complex Challenges

Since the release of ChatGPT, artificial intelligence (AI) has officially entered the era of large-scale models. Leading internet companies worldwide are gearing up to build general artificial intelligence models. OpenAI has released GPT-4, Meta has unveiled Llama 2, and Baidu has introduced Wenxin Yiyan (ERNIE Bot)… These models possess a variety of abilities that surpass […]