A time-frequency fusion model for multi-channel speech enhancement.