Text this: A two-stage deep learning framework for lead instrument recognition in polyphonic music featuring Chinese instruments.