A Novel Framework for Multimodal Retrieval and Visualization of Multimedia Data

Ilias Kalamaras, Athanasios Mademlis, Sotiris Malassiotis, and Dimitrios Tzovaras


Keywords: multimodal search, multimodal visualization, intelligent user interfaces, human-computer interaction


The paper proposes techniques for the retrieval and visualization of multimodal data, i.e., documents that contain multiple modalities, such as images and sound. A novel cross-modal retrieval framework is proposed, which fuses the results of unimodal retrieval methods into a single multimodal retrieval list by introducing an estimated cross-modal distance. For the visualization task, a framework is proposed for extending existing similarity-based visualization methods to multimodal data. The similarity between two multimodal objects is calculated as a weighted sum of single-modality similarities, with the weights determined through a semi-supervised user feedback mechanism. Experimental tests of the cross-modal retrieval method show improved performance compared to both unimodal approaches and other multimodal ones. Additionally, the results of testing the visualization framework on two existing visualization methods indicate an improvement in the resulting visual organization of the data when user feedback is allowed.
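The weighted-sum similarity described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the function names and the feedback-driven update rule are assumptions, standing in for the semi-supervised mechanism the paper describes.

```python
# Illustrative sketch (names and update rule are assumptions, not the
# paper's method): multimodal similarity as a weighted sum of
# per-modality similarities, with weights nudged by user feedback.

def multimodal_similarity(unimodal_sims, weights):
    """Weighted sum of single-modality similarities (weights sum to 1)."""
    return sum(w * s for w, s in zip(weights, unimodal_sims))

def update_weights(weights, unimodal_sims, user_says_similar, lr=0.1):
    """Hypothetical feedback step: increase the weight of modalities whose
    similarity score agrees with the user's judgment, then renormalize."""
    target = 1.0 if user_says_similar else 0.0
    boosted = [w + lr * w * (1.0 - abs(s - target))
               for w, s in zip(weights, unimodal_sims)]
    total = sum(boosted)
    return [w / total for w in boosted]  # keep weights summing to 1

# Example: image similarity agrees with the user, audio does not,
# so the image modality's weight grows after feedback.
sims = [0.9, 0.2]            # [image, audio] similarities for one pair
weights = [0.5, 0.5]
combined = multimodal_similarity(sims, weights)
weights = update_weights(weights, sims, user_says_similar=True)
```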
