|Title||Model-Based Audio Classification/Segmentation Using Perceptual Feature Extraction Methods|
|Tutor||Dr.-Ing. Hyoung-Gook Kim|
|Abstract||The rapid increase of audiovisual data in the last ten years necessitates methods for its automatic analysis. Since analyzing the audio signal is less complex than analyzing the visual signal, and the semantic meaning of the content is carried in both the audio and the visual signal, audio classification and segmentation can also provide useful results for the analysis of the visual signal. The features used for classification should be selected such that high recognition rates are attainable.|
This study examines five perceptual feature extraction methods: mel-frequency cepstral coefficients (MFCCs), mel-filter bank with principal component analysis (Mel-PCA), mel-filter bank with linear discriminant analysis (Mel-LDA), perceptual linear predictive cepstral coefficients (PLPCCs), and the combination of MFCCs with PLPCCs. These are tested with Gaussian mixture models (GMMs) and hidden Markov models (HMMs) for speaker recognition, sound classification, and musical instrument identification. The recognition rates of the methods are compared with respect to the model, the number of feature dimensions, the number of mel filters, and the processed frequency range.
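To make the MFCC pipeline named above concrete, the following is a minimal NumPy-only sketch of the standard processing chain (framing, windowing, power spectrum, triangular mel filterbank, log compression, DCT). All function names and default parameters here are illustrative assumptions, not the implementation evaluated in this thesis.

```python
import numpy as np

def hz_to_mel(f):
    # Convert frequency in Hz to the mel scale.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr, fmin=0.0, fmax=None):
    # Triangular filters spaced evenly on the mel scale,
    # spanning fmin..fmax (limiting fmax restricts the processed frequency range).
    fmax = fmax if fmax is not None else sr / 2.0
    mel_pts = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_ii(x, n_out):
    # Type-II DCT along the last axis (avoids a SciPy dependency).
    N = x.shape[-1]
    n = np.arange(N)
    basis = np.cos(np.pi * (n + 0.5)[None, :] * np.arange(n_out)[:, None] / N)
    return x @ basis.T

def mfcc(signal, sr, n_fft=512, hop=256, n_filters=26, n_ceps=13):
    # Frame and window the signal, then apply the mel-cepstral chain.
    frames = np.array([signal[s:s + n_fft] * np.hamming(n_fft)
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    fb = mel_filterbank(n_filters, n_fft, sr)
    log_energies = np.log(power @ fb.T + 1e-10)
    return dct_ii(log_energies, n_ceps)
```

The `n_filters` and `fmin`/`fmax` parameters correspond directly to the number of mel filters and the processed frequency range whose influence on recognition rates is compared in this study.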
The results demonstrate that, for the performed experiments, GMMs outperform HMMs in recognition rate and classification speed, although their training incurs a higher computational cost. Combining the mel-filter bank with PCA or LDA can improve recognition rates in certain cases; however, MFCCs and PLPCCs often achieve approximately equal or even better recognition rates. Since the number of mel filters and the limitation of the frequency range also affect recognition rates, the method, the model, and the parameters should be selected depending on the classification task.
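The Mel-PCA variant discussed above replaces the DCT of standard MFCCs with a data-driven decorrelating projection. A minimal sketch of such a PCA reduction of log mel-filterbank energies, using a plain eigendecomposition of the covariance matrix, might look as follows; the function name and interface are assumptions for illustration, not the thesis's implementation.

```python
import numpy as np

def pca_reduce(features, n_components):
    # features: (n_frames, n_filters) log mel-filterbank energies.
    # Center the data, then project onto the directions of largest variance.
    mean = features.mean(axis=0)
    centered = features - mean
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # largest variance first
    components = eigvecs[:, order[:n_components]]
    return centered @ components                # (n_frames, n_components)
```

In an LDA variant the projection would instead be trained with class labels to maximize between-class separation, which is why, as noted above, the benefit of either projection depends on the classification task.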
|Key words||Audio Classification, Audio Segmentation, Feature Extraction, MFCC, PLP, GMM, HMM, PCA, ICA, LDA|