Paper
Noninvasive extraction of audiovisual cues for multimodal applications
8 July 1998
Harouna Kabre
Abstract
We describe HOPS, a system that extracts audiovisual cues for modeling a computer end-user's environment. The objective of the study is to provide reliable audiovisual cues that 'augment' the set of computer input devices available to multimodal applications. The system accepts an audiovisual scene as input and produces different kinds of events that could help increase the awareness and robustness of interactive systems. The described framework for cue extraction is ecological and homogeneous. On the audio path, a cross-power spectrum method is applied to extract different kinds of acoustic patterns, defined as acoustic segments. The acoustic signal from a microphone and the acoustic segments are first FFT-transformed and averaged, and then correlated in the spectral domain. The maxima of the inverse Fourier transform of this cross-power spectrum serve as the criterion for detecting acoustic events. On the video path, we define initial color models of desired cues, such as the mouth and eyes, and then track them in the audiovisual scene recorded by a camera.
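The audio path described in the abstract can be illustrated with a minimal sketch. This is not the HOPS implementation: the function name `cross_power_peak`, the zero-padding strategy, and the energy normalization of the score are assumptions made for illustration, and the averaging step mentioned in the abstract is omitted. The sketch transforms a microphone signal and a stored acoustic segment with the FFT, multiplies one spectrum by the complex conjugate of the other to form the cross-power spectrum, and takes the maximum of its inverse transform as the detection score.

```python
import numpy as np


def cross_power_peak(signal, segment):
    """Locate `segment` inside `signal` via the cross-power spectrum.

    Returns (lag, score): the sample offset of the best match and a
    normalized correlation score in [-1, 1]. The normalization is an
    assumption of this sketch, not something stated in the abstract.
    """
    signal = np.asarray(signal, dtype=float)
    segment = np.asarray(segment, dtype=float)
    if len(segment) > len(signal):
        raise ValueError("segment must not be longer than signal")

    # Zero-pad to a power of two so circular correlation does not wrap around.
    n = len(signal) + len(segment) - 1
    nfft = 1 << (n - 1).bit_length()

    # FFT of both inputs, then the cross-power spectrum S(f) * conj(T(f)).
    spec_signal = np.fft.rfft(signal, nfft)
    spec_segment = np.fft.rfft(segment, nfft)
    cross_power = spec_signal * np.conj(spec_segment)

    # The inverse transform is the cross-correlation as a function of lag.
    corr = np.fft.irfft(cross_power, nfft)

    # Keep only lags where the segment lies fully inside the signal.
    valid = corr[: len(signal) - len(segment) + 1]
    lag = int(np.argmax(valid))

    # Normalize the peak by the energies of the two aligned windows.
    window = signal[lag : lag + len(segment)]
    norm = np.linalg.norm(window) * np.linalg.norm(segment)
    score = float(valid[lag] / norm) if norm > 0 else 0.0
    return lag, score
```

An acoustic event would then be reported when the score exceeds a threshold chosen for the application; the threshold value is a tuning parameter that the abstract does not specify.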
© (1998) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Harouna Kabre "Noninvasive extraction of audiovisual cues for multimodal applications", Proc. SPIE 3389, Hybrid Image and Signal Processing VI, (8 July 1998); https://doi.org/10.1117/12.316534
KEYWORDS
Acoustics
Video
Visualization
Image filtering
Speech recognition
Cameras
Computing systems