Contour mapping for speaker-independent lip reading system

Souheil Fenghour; Daqing Chen; Perry Xiao

doi:10.1117/12.2522936

15 March 2019

Proceedings Volume 11041, Eleventh International Conference on Machine Vision (ICMV 2018); 1104114 (2019) https://doi.org/10.1117/12.2522936
Event: Eleventh International Conference on Machine Vision (ICMV 2018), 2018, Munich, Germany

Abstract

In this paper, we demonstrate how an existing deep learning architecture for automatic lip reading can be adapted to make it speaker independent, and how doing so improves accuracy across a variety of different speakers. The architecture itself is multi-layered and includes a convolutional neural network; by applying an initial edge detection-based stage that pre-processes the input images so that only the contours are retained, the architecture can be made less biased towards particular speakers. The neural network architecture achieves good accuracy when trained and tested on some of the same speakers in the "overlapped speakers" phase of simulations, with word error rates of just 1.3% and 0.4% and character error rates of 0.6% and 0.3% for two individual speakers respectively. The "unseen speakers" phase does not achieve comparable accuracy, with higher word error rates of 20.6% and 17.0% and character error rates of 11.5% and 8.3% for the two speakers. Because the output of a convolution layer depends on the pixel intensities of the red, green and blue channels of an input image, variations in the size and colour of different people's lips produce different outputs at that layer, so a convolutional neural network will naturally favour the individuals it was trained on. This paper proposes an initial "contour mapping stage" that makes all inputs uniform so that the system can be speaker independent.
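The abstract does not say which edge detector the contour mapping stage uses. As a minimal sketch, assuming a Sobel gradient operator followed by a threshold set relative to the strongest edge in each frame (both are our choices, not necessarily the authors'), the Python below turns an RGB mouth image into a binary contour map. The helpers `to_gray`, `contour_map` and `synthetic_mouth` are hypothetical, and the two synthetic "speakers" differ only in lip and skin colour.

```python
import math

# Sobel kernels for horizontal (x) and vertical (y) intensity gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples to integer luma (BT.601 weights x1000)."""
    return [[299 * r + 587 * g + 114 * b for (r, g, b) in row] for row in rgb_image]


def contour_map(rgb_image, rel_threshold=0.5):
    """Return a binary edge map (1 = contour pixel) the same size as the input.

    A pixel is a contour pixel if its Sobel gradient magnitude is at least
    rel_threshold times the largest magnitude in the image. Thresholding
    relative to the peak (an assumption of this sketch) means that scaling
    the lip/skin contrast by a constant factor leaves the map unchanged.
    Border pixels, where the 3x3 window does not fit, are left as 0.
    """
    gray = to_gray(rgb_image)
    h, w = len(gray), len(gray[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    p = gray[y + i - 1][x + j - 1]
                    gx += SOBEL_X[i][j] * p
                    gy += SOBEL_Y[i][j] * p
            mag[y][x] = math.hypot(gx, gy)
    peak = max(max(row) for row in mag)
    if peak == 0:
        return [[0] * w for _ in range(h)]  # flat image: no contours at all
    return [[1 if m >= rel_threshold * peak else 0 for m in row] for row in mag]


def synthetic_mouth(lip_rgb, skin_rgb, h=8, w=12):
    """Hypothetical test frame: a rectangular 'mouth' of lip colour on skin."""
    return [[lip_rgb if 2 <= y <= 5 and 3 <= x <= 8 else skin_rgb
             for x in range(w)] for y in range(h)]


speaker_a = synthetic_mouth((200, 60, 80), (235, 190, 170))
speaker_b = synthetic_mouth((120, 40, 50), (150, 110, 90))
print(to_gray(speaker_a) == to_gray(speaker_b))          # False: raw intensities differ
print(contour_map(speaker_a) == contour_map(speaker_b))  # True: identical contours
```

Because the Sobel kernels sum to zero and the threshold is relative to the peak, adding a constant to the luminance or scaling the lip/skin contrast leaves this map unchanged. Real frames with lighting and texture variation will only approximate that ideal.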
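The abstract reports results as word and character error rates. These are conventionally defined as the Levenshtein edit distance (substitutions + deletions + insertions) between the reference and the recognised output, divided by the length of the reference in words or characters. The abstract does not describe the paper's exact scoring procedure, so the sketch below assumes that standard definition; the function names and the example sentences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: fewest substitutions, deletions and insertions
    needed to turn the sequence ref into the sequence hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # delete all i ref items to reach an empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute r -> h (free if equal)
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


ref = "set the red box now"
hyp = "set a red box now"
print(f"WER = {100 * wer(ref, hyp):.1f}%")  # WER = 20.0%
print(f"CER = {100 * cer(ref, hyp):.1f}%")  # CER = 15.8%
```

Here one substituted word out of five gives a WER of 20%, while the same mistake costs three character edits out of 19, which is why CER is typically lower than WER for the same output. Because insertions count as errors, either rate can exceed 100%.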

Citation

Souheil Fenghour, Daqing Chen, and Perry Xiao "Contour mapping for speaker-independent lip reading system", Proc. SPIE 11041, Eleventh International Conference on Machine Vision (ICMV 2018), 1104114 (15 March 2019); https://doi.org/10.1117/12.2522936