Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding.