Paper
Learning one-to-many mapping functions for audio-visual integrated perception
12 April 2010
Jung-Hui Lim, Do-Kwan Oh, Soo-Young Lee
Abstract
In noisy environments, human speech perception utilizes visual lip-reading as well as audio phonetic classification. This audio-visual integration may be performed by combining the two sensory features at an early stage, and also by top-down attention integrating the two modalities. For the sensory feature fusion we introduce mapping functions between the audio and visual manifolds; in particular, we present an algorithm that provides a one-to-many mapping function for the video-to-audio mapping. The top-down attention is also presented to integrate both the sensory features and the classification results of both modalities, which is able to explain the McGurk effect. Each classifier is implemented separately with a Hidden Markov Model (HMM), but the two classifiers are combined at the top level and interact through the top-down attention.
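The abstract gives no implementation details, so the following is only a minimal NumPy sketch of the two ideas it describes: a one-to-many video-to-audio mapping, approximated here by a hypothetical paired codebook lookup standing in for the learned manifold mapping, and a top-level fusion of per-class HMM log-likelihoods weighted by an attention parameter. All names, feature dimensions, and data below are illustrative assumptions, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired codebook of visual (lip) and audio (phonetic) features.
# In the paper these would come from the learned audio/visual manifolds;
# here they are random stand-ins for illustration only.
visual_codebook = rng.normal(size=(200, 8))   # 200 codewords, 8-dim visual features
audio_codebook = rng.normal(size=(200, 12))   # paired 12-dim audio features

def video_to_audio_candidates(v, k=5):
    """One-to-many mapping sketch: return the k audio codewords whose paired
    visual codewords lie closest to the query visual feature v."""
    d = np.linalg.norm(visual_codebook - v, axis=1)
    idx = np.argsort(d)[:k]
    return audio_codebook[idx]            # several plausible audio features, not one

def fuse_scores(audio_loglik, visual_loglik, attention=0.5):
    """Top-level fusion sketch: combine per-class log-likelihoods from the two
    HMM classifiers; `attention` shifts weight toward audio (1.0) or video (0.0)."""
    combined = attention * audio_loglik + (1.0 - attention) * visual_loglik
    return int(np.argmax(combined))

# Toy usage: 10 phoneme classes; with noisy audio, lean on the visual stream.
audio_ll = rng.normal(size=10)
visual_ll = rng.normal(size=10)
print(video_to_audio_candidates(rng.normal(size=8)).shape)   # (5, 12)
print(fuse_scores(audio_ll, visual_ll, attention=0.3))
```

In this sketch the attention weight plays the role of the top-down signal that decides how much each modality contributes at the top level; the paper's actual mechanism interacts with the HMM classifiers rather than using a fixed scalar.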
© (2010) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jung-Hui Lim, Do-Kwan Oh, and Soo-Young Lee "Learning one-to-many mapping functions for audio-visual integrated perception", Proc. SPIE 7703, Independent Component Analyses, Wavelets, Neural Networks, Biosystems, and Nanoengineering VIII, 77030E (12 April 2010); https://doi.org/10.1117/12.855241
KEYWORDS: Associative arrays, Visualization, Video, Sensors, Acoustics, Detection and tracking algorithms, Electronic filtering