Paper • Open access

A Neighbourhood Encoding Framework for Deep Mining Heterogeneous Texts in Recipe-image Retrieval


Published under licence by IOP Publishing Ltd
Citation: Changsheng Zhu et al 2021 J. Phys.: Conf. Ser. 1813 012029. DOI: 10.1088/1742-6596/1813/1/012029


Abstract

Cross-modal retrieval typically bridges the semantic gap between modalities by mapping them into a shared subspace. However, existing methods rarely consider that data within a single modality may itself be heterogeneous when multimodal data are mapped into the shared subspace. In addition, most existing methods focus on semantic associations between modalities, while few consider the semantic associations within a single modality. To address these two deficiencies, we propose a Neighbourhood Encoding (NE) framework that mines the semantic associations among data of the same modality and alleviates data heterogeneity by improving the semantic representation of each modality. To verify the effectiveness of the proposed framework, we instantiate it with two types of recurrent neural networks. Experiments show that the instantiated approaches outperform existing state-of-the-art methods in both text-to-image and image-to-text retrieval.
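To make the idea concrete, the sketch below illustrates one plausible reading of the abstract: each item's embedding is enriched with its intra-modal neighbours through a recurrent network before text and image are projected into a shared space trained with a standard bidirectional ranking loss. This is a minimal illustration under assumptions; the paper's actual feature extractors, neighbourhood construction, fusion scheme, and loss are not specified in the abstract, and all module names and hyperparameters here are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NeighbourhoodEncoder(nn.Module):
    """Hypothetical sketch: enrich an item's embedding with its k intra-modal
    neighbours by running them through a GRU and fusing the final hidden state
    with the item's own embedding (one way to "encode the neighbourhood")."""
    def __init__(self, dim: int):
        super().__init__()
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, item: torch.Tensor, neighbours: torch.Tensor) -> torch.Tensor:
        # item:       (batch, dim)    embedding of the query item
        # neighbours: (batch, k, dim) embeddings of its k same-modality neighbours
        _, h = self.gru(neighbours)                     # h: (1, batch, dim)
        fused = torch.cat([item, h.squeeze(0)], dim=-1)
        return F.normalize(self.fuse(fused), dim=-1)    # unit-norm shared-space vector

def triplet_ranking_loss(txt: torch.Tensor, img: torch.Tensor, margin: float = 0.2) -> torch.Tensor:
    """Standard bidirectional max-margin loss over a batch of paired embeddings
    (a common choice for cross-modal retrieval; not necessarily the paper's loss)."""
    sim = txt @ img.t()                                 # (batch, batch) similarities
    pos = sim.diag().unsqueeze(1)                       # matched-pair similarities
    cost_t2i = (margin + sim - pos).clamp(min=0)        # text-to-image violations
    cost_i2t = (margin + sim - pos.t()).clamp(min=0)    # image-to-text violations
    mask = torch.eye(sim.size(0), dtype=torch.bool)
    return cost_t2i.masked_fill(mask, 0).mean() + cost_i2t.masked_fill(mask, 0).mean()

if __name__ == "__main__":
    dim, k, batch = 128, 5, 8
    text_ne, image_ne = NeighbourhoodEncoder(dim), NeighbourhoodEncoder(dim)
    txt, txt_nb = torch.randn(batch, dim), torch.randn(batch, k, dim)   # dummy text features
    img, img_nb = torch.randn(batch, dim), torch.randn(batch, k, dim)   # dummy image features
    loss = triplet_ranking_loss(text_ne(txt, txt_nb), image_ne(img, img_nb))
    print(loss.item())

In practice the neighbours would be retrieved within each modality (e.g. by nearest-neighbour search over precomputed features) before being encoded, so that the shared-space projection of each item also reflects its intra-modal semantic context.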


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
