Evaluating large language models for abstract evaluation tasks