Combined CNN LSTM with attention for speech emotion recognition based on feature-level fusion.