On the morning of August 16th, the Chinese Continuous Visual Speech Recognition Challenge Workshop 2024 (CNVSRC Workshop 2024) was held at the 19th National Conference on Man-Machine Speech Communication (NCMMSC 2024) in Urumqi, China. The workshop included an introduction to CNVSRC 2024, addresses, the rank announcement, the technical report, and system description sharing.
The workshop is a forum for exchanging ideas on Chinese large-vocabulary visual speech recognition techniques. It was held as a special event at NCMMSC 2024 and co-hosted by Tsinghua University, Beijing University of Posts and Telecommunications, Dataocean AI, and Speech Home.
The competition attracted 45 teams from China and abroad. After nearly three months of intense competition, teams from Northwestern Polytechnical University, Inner Mongolia University, Wuhan University, and others performed exceptionally well and ranked at the top. Detailed results and the report video will be published on the official competition website; please stay tuned: http://cnceleb.org/competition
Task 1 Single-speaker VSR Fixed Track
Rank | TeamID | CER on CNVSRC.Single.Eval | Report |
---|---|---|---|
1 | T237 | 30.4679% | Report-T237.pdf |
2 | T244 | 39.3110% | Report-T244.pdf |
Task 1 Single-speaker VSR Open Track
Rank | TeamID | CER on CNVSRC.Single.Eval | Report |
---|---|---|---|
1 | T170 | 30.0680% | Anonymous Submission |
2 | T237 | 30.4679% | Report-T237.pdf |
Task 2 Multi-speaker VSR Fixed Track
Rank | TeamID | CER on CNVSRC.Multi.Eval | Report |
---|---|---|---|
1 | T237 | 34.2955% | Report-T237.pdf |
2 | T170 | 45.3244% | Anonymous Submission |
3 | T244 | 47.9259% | Report-T244.pdf |
Task 2 Multi-speaker VSR Open Track
Rank | TeamID | CER on CNVSRC.Multi.Eval | Report |
---|---|---|---|
1 | T237 | 34.2955% | Report-T237.pdf |
2 | T170 | 38.3454% | Anonymous Submission |
3 | T405 | 57.7762% | Report-T405.pdf |
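The tables above rank systems by character error rate (CER). As a minimal sketch, assuming the standard definition (character-level edit distance between hypothesis and reference, divided by the number of reference characters), CER could be computed as follows. This is not the challenge's official scoring script; the function names `edit_distance` and `cer` and the example sentences are hypothetical and chosen only for illustration.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference character missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis character)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(refs: list[str], hyps: list[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference transcripts are empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


# Illustrative (made-up) transcripts: one deletion and one substitution.
refs = ["今天天气很好", "我喜欢语音识别"]
hyps = ["今天天气好", "我喜欢语言识别"]
print(f"CER = {cer(refs, hyps) * 100:.4f}%")  # 2 edits / 13 characters
```

Under this definition a lower CER is better, and a value above 100% is possible when the hypothesis contains many insertions.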
The workshop was hosted by Professor Wang Dong from Tsinghua University. Helen Wang, CMO of Dataocean AI, and Mr. Bu Hui, founder and CEO of Speech Home, presented awards to the winning teams. Liu Zehua, a student from Beijing University of Posts and Telecommunications, shared the technical report. Representatives from three outstanding participating teams were also invited to share their technical solutions and competition experiences.
CNVSRC 2024 Introduction. – Dong Wang, THU
CNVSRC 2024 Address. – Helen Wang, Dataocean AI
CNVSRC 2024 Address. – Hui Bu, Speech Home
CNVSRC 2024 Technical Report
CNVSRC 2024 Rank Announcement
The representative of the Northwestern Polytechnical University team shared technical insights.
The representative of the Inner Mongolia University team shared technical insights online.
The representative of the Wuhan University team shared technical insights via an online presentation.
CNVSRC 2024 Photo
CNVSRC 2024 Organizing Committee Members
Visual Speech Recognition
Visual speech recognition (VSR), also known as lipreading, is a technology that infers the content of speech from lip movements. It has important applications in public safety, assistance for the elderly and disabled, and video authentication, among other fields. Research on lipreading is still developing: significant progress has been made in recognizing isolated words and phrases, but large-vocabulary continuous recognition remains a major challenge. For Chinese in particular, progress has been limited by the lack of suitable data resources. To address this, Tsinghua University released the CN-CVS dataset [1] in 2023, the first large-scale continuous visual-speech dataset in Mandarin Chinese, which makes it possible to further advance large-vocabulary continuous visual speech recognition (LVCVSR). In the same year, the university held the CNVSRC 2023 competition [2], promoting progress in Chinese lipreading.
To further promote this research direction, Tsinghua University, together with Beijing University of Posts and Telecommunications, Dataocean AI, and Speech Home, held CNVSRC 2024 at NCMMSC 2024. In this competition, many participating teams achieved significant improvements in system performance on the lipreading task, with the best results showing an improvement of over 30% compared to the baseline system. In addition, compared to CNVSRC 2023, scores on all tracks improved noticeably in 2024. The participating teams proposed a variety of innovative solutions, providing new ideas and methods for research on large-vocabulary continuous visual speech recognition in Chinese.