Chinese Continuous Visual Speech Recognition Challenge 2024

News

10 May 2024

Initiated by the NCMMSC 2024 Organizing Committee and jointly hosted by Tsinghua University, Beijing University of Posts and Telecommunications, Speech Home and Dataocean AI, the second Chinese Continuous Visual Speech Recognition Challenge (CNVSRC 2024) kicks off today. We sincerely invite your participation and registration.

Event Introduction

Visual speech recognition, also known as lip reading, is a technology that infers spoken content from lip movements. It has significant applications in public safety, assistance for elderly and disabled people, and video authentication. Research on lip reading is currently flourishing. Although substantial progress has been made in recognizing isolated words and phrases, large vocabulary continuous recognition remains a significant challenge. For Chinese in particular, progress has been limited by the lack of suitable data resources. To address this, Tsinghua University released the CN-CVS dataset in 2023 [1], the first Mandarin audio-visual dataset for large vocabulary continuous visual-to-speech synthesis, which makes it possible to further advance large vocabulary continuous visual speech recognition (LVCVSR).

For more information about the CN-CVS dataset, please visit the official database website: http://cnceleb.org

To promote the development of this research direction, Tsinghua University, in collaboration with Beijing University of Posts and Telecommunications, Dataocean AI, and Speech Home, will host the second Chinese Continuous Visual Speech Recognition Challenge (CNVSRC 2024) at NCMMSC 2024. The competition is based on the CN-CVS Chinese visual speech recognition dataset and evaluates LVCVSR systems in two scenarios: studio reading (Reading) and online speech (Speech). The results will be announced, and the awards presented, at the NCMMSC 2024 conference.

Compared with CNVSRC 2023, this year's challenge provides a stronger Fixed Track baseline system and an additional dataset, CN-CVS2-P1, for the Open Track.

Task Setup

CNVSRC 2024 comprises two tasks:

– T1: Single-speaker Visual Speech Recognition (VSR)

– T2: Multi-speaker Visual Speech Recognition (VSR)

T1 focuses on optimizing performance for one specific speaker using a large amount of that speaker's data, while T2 emphasizes the system's basic performance on non-specific speakers.

Each task is further divided into two tracks based on the training data used: Fixed Track and Open Track.

Fixed Track: only the CN-CVS dataset and the development set released for each task may be used as training data. This track aims to compare the algorithms themselves, since every participant trains on the same data.

Open Track: any data may be used for training (e.g. CN-CVS2-P1). This track assesses the performance limits achievable with current technology.
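The task and track rules above can be summarized as a small lookup. The following is only an illustrative sketch, not part of the official challenge tooling: the function name `allowed_training_data` and the data labels are hypothetical, and it assumes the Fixed Track rule means "CN-CVS plus the development set released for that task".

```python
from itertools import product

# Task and track names as given in the announcement.
TASKS = {
    "T1": "Single-speaker Visual Speech Recognition",
    "T2": "Multi-speaker Visual Speech Recognition",
}
TRACKS = ("Fixed", "Open")


def allowed_training_data(task: str, track: str) -> list[str]:
    """Return descriptive labels for the training data permitted in a combination.

    The labels are hypothetical placeholders, not official dataset identifiers.
    """
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    if track not in TRACKS:
        raise ValueError(f"unknown track: {track!r}")
    if track == "Fixed":
        # Assumption: CN-CVS plus the development set released for this task.
        return ["CN-CVS", f"{task} development set"]
    # Open Track: any data is permitted; CN-CVS2-P1 is the example named above.
    return ["any data (e.g. CN-CVS2-P1)"]


if __name__ == "__main__":
    # Two tasks times two tracks gives four leaderboard combinations.
    for task, track in product(TASKS, TRACKS):
        print(task, track, allowed_training_data(task, track))
```

Each of the four task/track combinations is evaluated separately, so a submission should state which combination it targets.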

Tsinghua University provides baseline system code for the Fixed Track as a reference for participants.

How to Join Us

CNVSRC 2024 is open to all individuals and institutions. The competition's official website is now live and accepting registrations. For competition rules, timelines, and other details, visit http://cnceleb.org/competition

[1] C. Chen, D. Wang, T.F. Zheng, CN-CVS: A Mandarin Audio-Visual Dataset for Large Vocabulary Continuous Visual to Speech Synthesis, ICASSP, 2023.
