Interpretable multimodal emotion recognition using optimized transformer model with SHAP-based transparency.