Initiated by the NCMMSC 2024 Organizing Committee and jointly hosted by Tsinghua University, Beijing University of Posts and Telecommunications, Speech Home and Dataocean AI, the second Chinese Continuous Visual Speech Recognition Challenge (CNVSRC 2024) kicks off today. We sincerely invite your participation and registration.
Event Introduction
Visual speech recognition, also known as lip reading, is a technology that infers spoken content from lip movements. It has significant applications in public safety, assistance for the elderly and disabled, and video authentication. Research in lip reading is currently flourishing. Although substantial progress has been made in recognizing isolated words and phrases, large vocabulary continuous recognition still poses significant challenges. For Chinese in particular, progress has been limited by the lack of corresponding data resources. To address this, Tsinghua University released the CN-CVS dataset in 2023 [1], the first Mandarin audio-visual dataset for large vocabulary continuous visual to speech synthesis, making it possible to further advance large vocabulary continuous visual speech recognition (LVCVSR).
For more information about the CN-CVS dataset, please visit the official database website: http://cnceleb.org
To promote the development of this research direction, Tsinghua University, in collaboration with Beijing University of Posts and Telecommunications, Dataocean AI, and Speech Home, will host the second Chinese Continuous Visual Speech Recognition Challenge (CNVSRC 2024) at NCMMSC 2024. This competition will be based on the CN-CVS Chinese visual speech recognition dataset and will evaluate the performance of LVCVSR systems in two scenarios: studio reading (Reading) and online speech (Speech). The results of the competition will be announced and awarded at the NCMMSC 2024 conference.
Compared with the first challenge, CNVSRC 2023, this year’s CNVSRC 2024 provides a more powerful baseline system for the fixed track and an additional dataset, CN-CVS2-P1, for the open track.
Task Setup
CNVSRC 2024 comprises two tasks:
– T1: Single-speaker Visual Speech Recognition (VSR)
– T2: Multi-speaker Visual Speech Recognition (VSR)
The former focuses on optimizing performance for a specific speaker using a large amount of data, while the latter emphasizes the system’s basic performance on non-specific speakers.
Each task is further divided into two tracks based on the training data used: Fixed Track and Open Track.
Fixed Track: only the CN-CVS dataset and the development set released for each task may be used for training. This track aims to validate the advancement of algorithms.
Open Track: any data may be used for training (e.g. CN-CVS2-P1). This track aims to assess the performance limits achievable with current technology.
Tsinghua University provides baseline system code for the Fixed Track as a reference for participants.
How to Join Us
CNVSRC 2024 is open to all individuals and institutions. The competition’s official website is now live and accepting registrations. For competition rules, timelines, and other details, please visit http://cnceleb.org/competition
[1] C. Chen, D. Wang, T.F. Zheng, CN-CVS: A Mandarin Audio-Visual Dataset for Large Vocabulary Continuous Visual to Speech Synthesis, ICASSP, 2023.