This paper presents a speaker diarization system developed at the Institute for Infocomm Research (I2R) for the NIST Rich
Transcription 2007 (RT-07) evaluation task. We describe in detail our primary approach to speaker diarization
under the Multiple Distant Microphones (MDM) condition in the conference room scenario. The proposed system consists of
six modules: 1) a normalized least-mean-square (NLMS) adaptive filter for speaker direction estimation via Time Difference of
Arrival (TDOA); 2) initial speaker clustering via a two-stage TDOA histogram distribution quantization approach; 3)
multiple-microphone speaker data alignment via GCC-PHAT Time Delay Estimation (TDE) among all the distant
microphone channel signals; 4) a speaker clustering algorithm based on GMM modeling; 5) non-speech
removal via a speech/non-speech verification mechanism; and 6) silence removal via a "Double-Layer Windowing" (DLW)
method. We achieve an error rate of 31.02% on the Spring 2006 (RT-06s) MDM evaluation task and a competitive overall
error rate of 15.32% on the NIST Rich Transcription 2007 (RT-07) MDM evaluation task.
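
For readers unfamiliar with the time delay estimation named in module 3, the following is a minimal sketch of textbook GCC-PHAT (generalized cross-correlation with phase transform) between two microphone channels. It is not the authors' implementation: the function name gcc_phat_delay, its parameters, and the small regularization constant are assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def gcc_phat_delay(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref` using GCC-PHAT.

    Hypothetical helper for illustration; a positive result means `sig` lags `ref`.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)

    # Cross-power spectrum, whitened by its magnitude (the PHAT weighting).
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12  # small constant avoids division by zero

    # Back to the lag domain.
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to plausible lags, if a bound is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

In a multi-channel setting, one channel would typically serve as the reference and the estimated delay of each remaining channel would be used to shift it into alignment. The choice of reference channel and analysis window length is left open here because the abstract does not specify them.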