Chinese Continuous Visual Speech Recognition Challenge Workshop 2024 Has Concluded Successfully


August 30, 2024

On the morning of August 16th, the Chinese Continuous Visual Speech Recognition Challenge Workshop 2024 (CNVSRC Workshop 2024) was held at the 19th National Conference on Man-Machine Speech Communication (NCMMSC 2024) in Urumqi, China. The workshop included an introduction to CNVSRC 2024, addresses, the rank announcement, the technical report, and system description sharing by participating teams.

The workshop is a forum for exchanging ideas on Chinese large-vocabulary visual speech recognition techniques. It was held as a special event at NCMMSC 2024 and co-hosted by Tsinghua University, Beijing University of Posts and Telecommunications, Dataocean AI, and Speech Home.

The competition attracted 45 teams from China and abroad. After nearly three months of intense competition, teams from Northwestern Polytechnical University, Inner Mongolia University, Wuhan University, and others performed exceptionally well and ranked at the top. Detailed results and the report video will be published on the official competition website: http://cnceleb.org/competition

Task 1 Single-speaker VSR Fixed Track

Rank	TeamID	CER on CNVSRC.Single.Eval	Report
1	T237	30.4679%	Report-T237.pdf
2	T244	39.3110%	Report-T244.pdf

Task 1 Single-speaker VSR Open Track

Rank	TeamID	CER on CNVSRC.Single.Eval	Report
1	T170	30.0680%	Anonymous Submission
2	T237	30.4679%	Report-T237.pdf

Task 2 Multi-speaker VSR Fixed Track

Rank	TeamID	CER on CNVSRC.Multi.Eval	Report
1	T237	34.2955%	Report-T237.pdf
2	T170	45.3244%	Anonymous Submission
3	T244	47.9259%	Report-T244.pdf

Task 2 Multi-speaker VSR Open Track

Rank	TeamID	CER on CNVSRC.Multi.Eval	Report
1	T237	34.2955%	Report-T237.pdf
2	T170	38.3454%	Anonymous Submission
3	T405	57.7762%	Report-T405.pdf

The workshop was hosted by Professor Wang Dong of Tsinghua University. Helen Wang, CMO of Dataocean AI, and Mr. Bu Hui, founder and CEO of Speech Home, announced the awards for the winning teams. Liu Zehua, a student at Beijing University of Posts and Telecommunications, presented the technical report. Representatives from three outstanding participating teams were also invited to share their technical solutions and competition experiences.

CNVSRC 2024 Introduction. – Dong Wang, THU

CNVSRC 2024 Address. – Helen Wang, Dataocean AI

CNVSRC 2024 Address. – Hui Bu, Speech Home

CNVSRC 2024 Technical Report

CNVSRC 2024 Rank Announcement

The representative of the Northwestern Polytechnical University team shared technical insights.

The representative of the Inner Mongolia University team shared technical insights online.

The representative of the Wuhan University team shared technical insights via an online presentation.

CNVSRC 2024 Photo

CNVSRC 2024 Organization Committee Member

Visual Speech Recognition

Visual speech recognition (VSR), also known as lipreading, is a technology that infers the content of speech from lip movements. It has important applications in public safety, assistance for the elderly and people with disabilities, and video authentication, among other fields. Research on lipreading is still developing: significant progress has been made in recognizing isolated words and phrases, but large-vocabulary continuous recognition remains highly challenging. For Chinese in particular, progress has been limited by the lack of suitable data resources. To address this, Tsinghua University released the CN-CVS dataset [1] in 2023, the first large-scale continuous visual-speech dataset in Mandarin Chinese, which makes it possible to further advance large vocabulary continuous visual speech recognition (LVCVSR). In the same year it held the CNVSRC 2023 competition [2], promoting progress in Chinese lipreading.

To further promote this research direction, Tsinghua University, together with Beijing University of Posts and Telecommunications, Dataocean AI, and Speech Home, held CNVSRC 2024 at NCMMSC 2024. In this competition, many participating teams achieved significant improvements in system performance on the lipreading task, with the best results showing an improvement of over 30% compared to the baseline system. In addition, scores on all tracks improved noticeably compared to CNVSRC 2023. The participating teams proposed various innovative solutions, providing new ideas and methods for research on Chinese large-vocabulary continuous visual speech recognition.

[1] C. Chen, D. Wang, T.F. Zheng, CN-CVS: A Mandarin Audio-Visual Dataset for Large Vocabulary Continuous Visual to Speech Synthesis, ICASSP, 2023.

[2] C. Chen, Z. Liu, X. Li, L. Li, D. Wang, CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge, INTERSPEECH, 2024.
