Publications

conference paper

Conference/Proceedings	ACM Multimedia 2011 (Workshop on Social and Behavioral Networked Media Access - SBNMA)
Start date	28.11.2011
End date	01.12.2011
Organisation	ACM
Author(s)	Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora
Title	A Hierarchical, Multi-modal Approach for Placing Videos on the Map using Millions of Flickr Photographs
Invited?	Yes
Abstract	We present a hierarchical, multi-modal approach for placing Flickr videos on the map. Our approach makes use of external resources to identify toponyms in the metadata and of visual and textual features to identify similar content. First, the geographical boundaries extraction method identifies the country and its dimension. We use a database of more than 3.6 million Flickr images to group them together into geographical regions and to build a hierarchical model. A fusion of visual and textual methods is used to classify the videos’ location into possible regions. Next, the visually nearest neighbour method uses a nearest neighbour approach to find correspondences with the training images within the pre-classified regions. The video sequences are represented using low-level feature vectors from multiple key frames. The Flickr videos are tagged with the geo-information of the visually most similar training item within the regions that were previously filtered by the pre-classification step for each test video. The results show that we are able to tag one third of our videos correctly within an error of 1 km.
Key words	multimedia analysis, geo-localization, gazetteers, Bernoulli classification, MPEG-7 visual features
File	1325Kelm2011.pdf
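
Illustration	The last two steps of the abstract (nearest-neighbour search restricted to the pre-classified regions, then scoring against a 1 km error radius) can be sketched roughly as follows. This is a minimal sketch and not the authors' implementation: the function names, the dictionary layout of the training items, the Euclidean feature distance and the haversine great-circle distance are all assumptions made for illustration. The paper itself works with MPEG-7 visual features from multiple key frames.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius (assumed spherical Earth)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def place_video(query_features, candidate_regions, training_items):
    """Return (lat, lon) of the visually nearest training item.

    Only items whose region is in candidate_regions are considered, which
    mimics the pre-classification filter. Each training item is a
    hypothetical dict with keys "features", "lat", "lon" and "region".
    Returns None if no training item lies in the candidate regions.
    """
    best_item, best_dist = None, math.inf
    for item in training_items:
        if item["region"] not in candidate_regions:
            continue  # region was ruled out by the pre-classification step
        dist = math.dist(query_features, item["features"])  # Euclidean
        if dist < best_dist:
            best_item, best_dist = item, dist
    if best_item is None:
        return None
    return (best_item["lat"], best_item["lon"])


def within_km(predicted, truth, radius_km=1.0):
    """True if the predicted location is within radius_km of the truth."""
    return haversine_km(*predicted, *truth) <= radius_km
```

Restricting the search to the candidate regions means that a visually closer training image from an excluded region cannot win, which is the point of the hierarchical pre-classification in this sketch.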
