Abstract: Image captioning is a challenging task at the intersection of Computer Vision (CV) and Natural Language Processing (NLP) that involves generating descriptive text to depict the content of ...