Abstract
In modern science there is a rapid development of artificial intelligence, image processing has gradually fascinated and inspired the attention of many researchers in the field of artificial intelligence and has become an interesting and demanding task. The main idea of Image caption is to automatically generate natural language descriptions according to the information observed in an image, this is an important portion of scene understanding, which combines all the knowledge and information available of computer vision and natural language processing. The use of image caption is broad and noteworthy, for example, the understanding of human-computer collaboration. This paper reviews the related methods and focuses on the attention mechanism, which plays a vital role in computer vision and is broadly used in image caption generation tasks. Furthermore, the advantages and the shortcomings of these methods are discussed, providing the commonly used datasets and evaluation criteria in this field. Finally, this paper proposes some open challenges in the image caption task.
Export citation and abstract BibTeX RIS
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.