Conference/Proceedings | IS&T/SPIE's Electronic Imaging 2004 |
Start date | 18.01.2004 |
End date | 22.01.2004 |
Address | San Jose, CA, USA |
Author(s) | Hyoung-Gook Kim, Thomas Sikora |
Title | Automatic segmentation of speakers in broadcast audio material |
Abstract | In this paper, dimension-reduced, decorrelated spectral features for general sound recognition are applied to segment conversational speech in both broadcast news audio and panel discussion television programs. Without a priori information about the number of speakers, the audio stream is segmented by a hybrid metric-based and model-based segmentation algorithm. To measure performance, we compare the segmentation results of the hybrid method with those of metric-based segmentation, using both the MPEG-7 standardized features and Mel-scale Frequency Cepstrum Coefficients (MFCC). Results show that the MFCC features yield better performance than the MPEG-7 features. The hybrid approach significantly outperforms direct metric-based segmentation. |
Key words | MPEG-7, Metric-Based and Model-Based Segmentation, Mel-scale Frequency Cepstrum Coefficients (MFCC) |
File | 0794Kim2004.ps |