Text this: Retrieval-Guided and Semantically Grounded Image Captioning for Open-Domain Scenes.