Text this: A Feature Fusion Network Architecture for Acoustic Scene Classification Using Convolutional Neural Network and Swin Transformer.