Discovery and fusion of salient multimodal features toward news story segmentation

Winston Hsu; Shih-Fu Chang; Chih-Wei Huang; Lyndon Kennedy; Ching-Yung Lin; Giridharan Iyengar

doi:10.1117/12.533037

18 December 2003

Proceedings Volume 5307, Storage and Retrieval Methods and Applications for Multimedia 2004; (2003) https://doi.org/10.1117/12.533037
Event: Electronic Imaging 2004, San Jose, California, United States

Abstract

In this paper, we present new results on news video story segmentation and classification in the context of the TRECVID 2003 video retrieval benchmarking event. We applied and extended the Maximum Entropy statistical model to effectively fuse diverse features from multiple levels and modalities, including visual, audio, and text. The features include motion, face, music/speech types, prosody, and high-level text segmentation information. The statistical fusion model is used to automatically discover the relevant features that contribute to the detection of story boundaries. One novel aspect of our method is the use of a feature wrapper to handle different types of features: asynchronous, discrete, continuous, and delta features. We also developed several novel features related to prosody. Using the large news video set from the TRECVID 2003 benchmark, we demonstrate satisfactory performance (F1 measures up to 0.76 on ABC news and 0.73 on CNN news), show how these multi-level, multimodal features construct the probabilistic framework, and, more importantly, observe an interesting opportunity for further improvement.
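As a rough illustration of the ideas named in the abstract, the sketch below binarizes a continuous feature and a delta feature, fuses them with a two-class exponential (maximum-entropy) model, and scores the result with the F1 measure. It is not the authors' implementation: the function names, the thresholds, the toy pause and face values, and the use of plain gradient ascent for training are all assumptions made for this example.

```python
import math

def wrap_continuous(values, threshold):
    """Binarize a continuous feature: 1 where the value exceeds the threshold."""
    return [1 if v > threshold else 0 for v in values]

def wrap_delta(values, threshold):
    """Binarize the change between consecutive candidate points."""
    flags = [0]  # the first point has no predecessor
    for prev, cur in zip(values, values[1:]):
        flags.append(1 if abs(cur - prev) > threshold else 0)
    return flags

def predict(weights, bias, feats):
    """P(boundary | binary features) under a two-class exponential model."""
    z = bias + sum(w * f for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, labels, epochs=500, lr=0.5):
    """Fit weights by gradient ascent on the conditional log-likelihood
    (an assumed, simple training procedure)."""
    n = len(rows[0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(rows, labels):
            err = y - predict(weights, bias, x)
            for j in range(n):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias += lr * grad_b / len(rows)
    return weights, bias

def f1_score(pred, truth):
    """Harmonic mean of precision and recall for binary labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy data: pause duration (seconds) and an anchor-face flag
# at eight candidate points; labels mark true story boundaries.
pause = [0.1, 0.9, 0.2, 1.2, 0.1, 0.8, 0.15, 0.05]
face = [0, 1, 0, 1, 0, 0, 1, 0]
labels = [0, 1, 0, 1, 0, 1, 0, 0]

rows = list(zip(wrap_continuous(pause, 0.5), face))
weights, bias = train(rows, labels)
predicted = [1 if predict(weights, bias, x) > 0.5 else 0 for x in rows]
print("weights:", weights, "F1:", f1_score(predicted, labels))
```

The learned weights give a crude view of which binary features the model finds relevant, which is the sense in which a fusion model of this kind can "discover" salient features.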

Citation

Winston Hsu, Shih-Fu Chang, Chih-Wei Huang, Lyndon Kennedy, Ching-Yung Lin, and Giridharan Iyengar "Discovery and fusion of salient multimodal features toward news story segmentation", Proc. SPIE 5307, Storage and Retrieval Methods and Applications for Multimedia 2004, (18 December 2003); https://doi.org/10.1117/12.533037
