AN ADVERSARIAL AND DEEP HASHING-BASED HIERARCHICAL SUPERVISED CROSS-MODAL IMAGE AND TEXT RETRIEVAL ALGORITHM

Ruidong Chen, Baohua Qiang, Mingliang Zhou, Shihao Zhang, Hong Zheng, and Chenghua Tang

Keywords

Cross-modal image and text retrieval, deep hash algorithm, hierarchical supervision, adversarial network

Abstract

With the rapid development of robotics and sensor technology, vast amounts of valuable multimodal data are being collected. Finding relevant multimodal information quickly and efficiently in such large collections is critical for a variety of robots performing automated tasks. In this paper, we propose an adversarial and deep hashing-based hierarchical supervised cross-modal image and text retrieval algorithm that performs semantic analysis and association modelling on images and text by making full use of the rich semantic information in the label hierarchy. First, the modal adversarial block and the modal differentiation network are trained adversarially so that data from different modalities with the same semantics are brought as close as possible in a common subspace. Second, an intra-label-layer similarity loss and an inter-label-layer correlation loss are used to exploit the intrinsic similarity within each label layer and the correlation between label layers. Finally, the objective function is redesigned so that data with different semantics are pushed apart in the common subspace, preventing semantically unrelated data from interfering with retrieval. Experimental results on two cross-modal retrieval datasets with hierarchically supervised information show that the proposed method substantially enhances retrieval performance and consistently outperforms other state-of-the-art methods.
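The abstract does not give the concrete formulation of the adversarial step, but a common realization of this kind of modality alignment uses a gradient reversal layer: a discriminator (here a stand-in for the modal differentiation network) learns to tell which modality a common-subspace feature came from, while reversed gradients push the encoders to make the modalities indistinguishable. The following PyTorch sketch is illustrative only; the encoder and discriminator architectures, feature dimensions, and the gradient-reversal mechanism are assumptions, not the paper's actual design.

```python
# Minimal sketch of adversarial modality alignment (assumed formulation).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates gradients in the backward pass,
    so the encoders learn to fool the modality discriminator."""
    @staticmethod
    def forward(ctx, x):
        return x

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

class ModalityDiscriminator(nn.Module):
    """Hypothetical stand-in for the 'modal differentiation network':
    predicts whether a common-subspace feature is from image or text."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, z):
        return self.net(GradReverse.apply(z))

# Hypothetical encoders mapping each modality into the common subspace.
img_enc = nn.Linear(4096, 128)   # e.g. CNN image features -> common space
txt_enc = nn.Linear(1386, 128)   # e.g. bag-of-words text -> common space
disc = ModalityDiscriminator(128)

img_feat = torch.randn(8, 4096)
txt_feat = torch.randn(8, 1386)
z = torch.cat([img_enc(img_feat), txt_enc(txt_feat)], dim=0)
labels = torch.cat([torch.zeros(8), torch.ones(8)]).long()  # 0=image, 1=text

# Minimizing this loss trains the discriminator to separate modalities, while
# the reversed gradients drive the encoders to make them indistinguishable.
adv_loss = nn.functional.cross_entropy(disc(z), labels)
adv_loss.backward()
```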
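The hierarchical losses can likewise be sketched, again under stated assumptions: the abstract only names an intra-label-layer similarity loss, an inter-label-layer correlation loss, and a term pushing semantically different data apart. The contrastive form, the margin, the per-layer weights, and the way inter-layer correlation is scored below are all illustrative choices, not the paper's actual objective.

```python
# Minimal sketch of hierarchical supervision losses (assumed formulation):
# relaxed (continuous) hash codes plus multi-hot labels per hierarchy layer.
import torch
import torch.nn.functional as F

def hierarchical_losses(codes, layer_labels, layer_weights, margin=0.5):
    """codes: (N, bits) relaxed hash codes.
    layer_labels: list of (N, C_l) multi-hot label matrices, coarse -> fine.
    layer_weights: one weight per label layer."""
    z = F.normalize(codes, dim=1)
    cos = z @ z.t()                          # pairwise code similarity
    intra, inter = 0.0, 0.0
    prev_sim = None
    for w, labels in zip(layer_weights, layer_labels):
        # Intra-layer similarity: 1 if a pair shares any label at this layer.
        sim = (labels.float() @ labels.float().t() > 0).float()
        # Pull same-semantic pairs together; push different-semantic pairs
        # apart in the common subspace so they cannot disturb retrieval.
        intra = intra + w * (sim * (1 - cos).pow(2)
                             + (1 - sim) * F.relu(cos - margin).pow(2)).mean()
        # Inter-layer correlation: pairs similar at a finer layer should
        # also be similar at the coarser layer above it.
        if prev_sim is not None:
            inter = inter + F.relu(sim - prev_sim).mean()
        prev_sim = sim
    return intra, inter

codes = torch.randn(8, 32, requires_grad=True)
coarse = torch.randint(0, 2, (8, 5))     # coarse label layer
fine = torch.randint(0, 2, (8, 20))      # fine label layer
intra, inter = hierarchical_losses(codes, [coarse, fine], [0.5, 1.0])
(intra + inter).backward()
```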
