conference paper

Conference/Proceedings: WIAMIS 2009
Start date: 06.05.2009
End date: 08.05.2009
Author(s): Shan Jin, Hemant Misra, Thomas Sikora, Joemon Jose
Title: Automatic Topic Detection Strategy for information retrieval in Spoken Document
Abstract: This paper proposes an alternative approach to the task of spoken
document retrieval (SDR). The proposed system runs retrieval on
multi-level transcriptions (word and phone) produced by word and
phone recognizers, respectively, and combines the outputs. We
propose using a latent Dirichlet allocation (LDA) model to capture
the semantic information in the word transcriptions. The LDA model
is employed to estimate the topic distributions of queries and
word-transcribed spoken documents, and matching is performed at the
topic level. Acoustic matching between query words and phonetically
transcribed spoken documents is performed using a phone-based
matching algorithm. The results of the acoustic and topic-level matching
methods are compared and shown to be complementary.
Key words: semantic analysis, phone-based SDR
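
Illustrative sketch: the two-level matching described in the abstract could be combined roughly as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the topic similarity measure, the phone-based matching algorithm, or the combination rule, so cosine similarity, a sliding-window edit distance, and a linear interpolation with a hypothetical `weight` parameter are used here as stand-ins. The topic distributions are assumed to come from an already trained LDA model, and all function names are hypothetical.

```python
import math

def cosine(p, q):
    """Cosine similarity between two topic distributions of equal length
    (assumed similarity measure; the abstract does not specify one)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def edit_distance(a, b):
    """Levenshtein distance between two phone sequences (lists of symbols)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]

def phone_score(query_phones, doc_phones):
    """Best match of the query against any same-length window of the
    document's phone transcription, scaled to [0, 1] (assumed stand-in
    for the paper's phone-based matching algorithm)."""
    n = len(query_phones)
    if n == 0 or len(doc_phones) < n:
        return 0.0
    best = min(edit_distance(query_phones, doc_phones[i:i + n])
               for i in range(len(doc_phones) - n + 1))
    return 1.0 - best / n

def combined_score(topic_q, topic_d, phones_q, phones_d, weight=0.5):
    """Linear interpolation of topic-level and acoustic scores
    (assumed combination rule; `weight` is a hypothetical parameter)."""
    return (weight * cosine(topic_q, topic_d)
            + (1.0 - weight) * phone_score(phones_q, phones_d))
```

Documents would then be ranked by `combined_score` for a given query. A weighted sum is only one way to exploit the complementarity reported in the abstract, and the weight would need tuning on held-out queries.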