A multi-scale network with density-guided structural similarity loss and adaptive local maxima detection for crowd counting.