Text this: Cross-Modal Adaptive Fusion and Multi-Scale Aggregation Network for RGB-T Crowd Density Estimation and Counting.