Multimodal Visual Acuity Testing with Speech and Touch Panel

Rong TONG    
Assistant Professor


In this project, the team developed a novel automated visual acuity testing system that enables users to interact using natural speech. The system employs a multimodal approach, integrating both speech and image recognition to enhance efficiency and user experience.

Problem Statement:

Visual acuity (VA) testing often serves as the initial step in eye clinic workflows. Since this process is typically repeated at each patient visit, the traditional one-on-one approach can become a bottleneck in clinical operations.

Solution and Notable Contributions:

The team addressed the following three key aspects:

  • Fine-tuning the ASR model to accurately recognise Singaporean-accented English.
  • Implementing noise cancellation and target speaker identification to handle noisy clinical environments.
  • Developing a multimodal solution to effectively address the issue of cross-talk.
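The three components above fit together as a staged pipeline: separate the noisy mixture, pick out the patient's voice, then transcribe it with the accent-adapted model. The sketch below illustrates that control flow only; the function names, interfaces, and placeholder outputs are assumptions for illustration, not the team's actual implementation.

```python
# Illustrative sketch of the noise-robust recognition pipeline.
# All module names and return values here are hypothetical placeholders.

def separate_speech(mixed_audio):
    """Placeholder: split a noisy mixture into candidate speaker streams."""
    return [mixed_audio]  # a real system would return one stream per speaker

def identify_target_talker(streams, enrolment_embedding):
    """Placeholder: pick the stream that best matches the enrolled patient voice."""
    return streams[0]

def transcribe(stream):
    """Placeholder: fine-tuned ASR adapted to Singaporean-accented English."""
    return "C D H K O"  # dummy transcript of a chart line being read aloud

def run_pipeline(mixed_audio, enrolment_embedding):
    streams = separate_speech(mixed_audio)
    target = identify_target_talker(streams, enrolment_embedding)
    return transcribe(target)

print(run_pipeline(b"...", None))
```

Keeping each stage behind its own function boundary means the separation, talker-identification, and ASR models can each be swapped or retrained independently.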


Publications:

Boon Peng Yap, Michael Kok Liang Tan, Zhenghao Li, Rong Tong, Speech Enabled Visual Acuity Test, Interspeech 2024

Akshita Abrol, Ridwan Arefeen, Kelvin Zhenghao Li, Zhengkui Wang, Rong Tong, Real-Time Speech Recognition for Noisy Multi-Speaker Clinical Environments, to appear in IALP 2025



Figure: Flowchart of the automated speech assessment pipeline. Noisy audio input is processed by speech separation and target talker identification modules, followed by a fine-tuned ASR module and an automated scoring module that produces the final score.
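The scoring module at the end of the pipeline compares the letters the patient read (the ASR transcript) against the letters shown on the chart line. A minimal sketch of one way this comparison could work is below; the pass rule of at least 3 of 5 optotypes correct is a common clinical convention assumed here, not taken from the project's papers.

```python
# Hypothetical per-line scoring: count matching optotypes between the
# transcript and the displayed chart line. The 3-of-5 pass threshold is
# an assumed convention, not the project's documented rule.

def score_line(spoken: str, shown: str, threshold: int = 3) -> bool:
    """Return True if enough optotypes on the line were read correctly."""
    spoken_letters = spoken.upper().split()
    shown_letters = shown.upper().split()
    correct = sum(s == t for s, t in zip(spoken_letters, shown_letters))
    return correct >= threshold

print(score_line("C D H K O", "C D H K V"))  # True: 4 of 5 correct
print(score_line("C O H K O", "N D H K V"))  # False: only 2 of 5 correct
```

A full test would repeat this per line and report the acuity of the smallest line passed (e.g. as a logMAR value).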