Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining?