17 January 2005 Multimodal approach for speaker identification in news programs
Abstract
Identifying speakers in a news program is difficult using text information alone. We propose a system that first performs text and video processing separately to identify where each speaker starts speaking. These start-of-speech locations are then aligned and used to identify speaker changes in the program. An analysis is performed to identify the contribution of the text and video information. It will be shown that the speaker-change locations identified by our alignment algorithm are more accurate than those from either mode individually.
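The alignment step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the alignment algorithm, so everything here is an assumption made for illustration: the function name `align_speaker_changes`, the `tolerance` parameter, the representation of start-of-speech locations as timestamps in seconds, and the rule that a speaker change is accepted only when a text-derived and a video-derived location fall within the tolerance of each other. It is not the authors' method.

```python
from typing import List


def align_speaker_changes(
    text_starts: List[float],
    video_starts: List[float],
    tolerance: float = 2.0,
) -> List[float]:
    """Pair text and video start-of-speech times that agree within `tolerance`.

    Hypothetical illustration only: the matching rule and the averaging of
    paired times are assumptions, not the algorithm from the paper.
    """
    text_sorted = sorted(text_starts)
    video_sorted = sorted(video_starts)
    changes: List[float] = []
    i = j = 0
    # Two-pointer sweep over both sorted lists of timestamps (in seconds).
    while i < len(text_sorted) and j < len(video_sorted):
        t, v = text_sorted[i], video_sorted[j]
        if abs(t - v) <= tolerance:
            # Both modes agree: report the midpoint as the speaker change.
            changes.append((t + v) / 2.0)
            i += 1
            j += 1
        elif t < v:
            # Text location has no nearby video location; skip it.
            i += 1
        else:
            # Video location has no nearby text location; skip it.
            j += 1
    return changes


if __name__ == "__main__":
    text = [10.0, 45.5, 90.0]   # made-up text-derived start times
    video = [11.0, 60.0, 89.0]  # made-up video-derived start times
    print(align_speaker_changes(text, video))
```

In this sketch, locations found by only one mode are discarded, which trades recall for precision. A real system might instead keep unmatched locations with a lower confidence score.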
© (2005) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Anthony F. Martone, Cuneyt M. Taskiran, and Edward J. Delp "Multimodal approach for speaker identification in news programs", Proc. SPIE 5682, Storage and Retrieval Methods and Applications for Multimedia 2005, (17 January 2005); https://doi.org/10.1117/12.587870
CITATIONS
Cited by 3 scholarly publications.
KEYWORDS
Video

Transparent conductors

Video processing

Visualization

Carbon monoxide

Digital video discs

Feature extraction
