Multimodal Emotion Detection via Attention-Based Fusion of Extracted Facial and Speech Features