Equating in Small-Scale Language Testing Programs

Bibliographic Details
Title: Equating in Small-Scale Language Testing Programs
Language: English
Authors: LaFlair, Geoffrey T.; Isbell, Daniel; May, L. D. Nicolas; Gutierrez Arvizu, Maria Nelly; Jamieson, Joan
Source: Language Testing. Jan 2017 34(1):127-144.
Availability: SAGE Publications. 2455 Teller Road, Thousand Oaks, CA 91320. Tel: 800-818-7243; Tel: 805-499-9774; Fax: 800-583-2665; e-mail: journals@sagepub.com; Web site: http://sagepub.com
Peer Reviewed: Y
Page Count: 18
Publication Date: 2017
Document Type: Journal Articles; Reports - Research
Education Level: Higher Education; Postsecondary Education
Descriptors: Language Tests, Equated Scores, Testing Programs, Comparative Analysis, Listening Comprehension Tests, Reading Tests, English for Academic Purposes, College Second Language Programs, Error of Measurement, Student Placement, Foreign Students, College Students
DOI: 10.1177/0265532215620825
ISSN: 0265-5322
Abstract: Language programs need multiple test forms for secure administrations and effective placement decisions, but can they have confidence that scores on alternate test forms have the same meaning? In large-scale testing programs, various equating methods are available to ensure the comparability of forms. The choice of equating method is informed by estimates of quality: the preferred method is the one with the least error, defined in terms of random error, systematic error, and total error. This study compared seven equating methods (mean, linear Levine, linear Tucker, chained equipercentile, circle-arc, nominal weights mean, and synthetic) with no equating. A non-equivalent groups anchor test (NEAT) design was used to compare two listening and reading test forms based on small samples (one with 173 test takers, the other with 88) at a university's English for Academic Purposes (EAP) program. The equating methods were evaluated based on the amount of error they introduced and their practical effects on placement decisions. It was found that two types of error (systematic and total) could not be reliably computed owing to the lack of an adequate criterion; consequently, only random error was compared. Among the seven methods, the circle-arc method introduced the least random error as estimated by the standard error of equating (SEE). Classification decisions made using the seven methods differed from those made with no equating; all methods indicated that fewer students were ready for university placement. Although interpretations regarding the best equating method could not be made, circle-arc equating reduced the amount of random error in scores, had reportedly low bias in other studies, accounted for form and person differences, and was relatively easy to compute. It was chosen as the method to pilot in an operational setting.
Abstractor: As Provided
Number of References: 30
Entry Date: 2016
Accession Number: EJ1123993
Database: ERIC
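
Illustrative note: The abstract names mean equating and the standard error of equating (SEE) without defining them. The following is a minimal sketch of both ideas, not the study's procedure. It assumes a simpler equivalent-groups setting rather than the NEAT design, so it ignores the anchor test that methods such as Tucker, Levine, chained equipercentile, and circle-arc rely on. The function names (`mean_equate`, `linear_equate`, `bootstrap_see`) and the toy score lists are hypothetical and are not taken from the article.

```python
import random
import statistics


def mean_equate(x, new_form_scores, old_form_scores):
    """Mean equating: shift a new-form score by the difference in form means."""
    return x - statistics.mean(new_form_scores) + statistics.mean(old_form_scores)


def linear_equate(x, new_form_scores, old_form_scores):
    """Linear equating: match both the mean and the standard deviation."""
    mu_new = statistics.mean(new_form_scores)
    mu_old = statistics.mean(old_form_scores)
    slope = statistics.stdev(old_form_scores) / statistics.stdev(new_form_scores)
    return mu_old + slope * (x - mu_new)


def bootstrap_see(x, new_form_scores, old_form_scores, equate, reps=500, seed=0):
    """Estimate random equating error at score x as the standard deviation
    of the equated score across bootstrap resamples of both groups."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        new_sample = rng.choices(new_form_scores, k=len(new_form_scores))
        old_sample = rng.choices(old_form_scores, k=len(old_form_scores))
        estimates.append(equate(x, new_sample, old_sample))
    return statistics.stdev(estimates)


# Toy data (invented for illustration only, not from the study).
new_form = [10, 12, 14, 16, 18]
old_form = [11, 14, 17, 20, 23]

equated_mean = mean_equate(16, new_form, old_form)
equated_linear = linear_equate(16, new_form, old_form)
see_mean = bootstrap_see(16, new_form, old_form, mean_equate)
```

Under these assumptions, a smaller bootstrap value at a given score indicates less random error for that method, which is the sense in which the abstract compares methods by SEE. Systematic error (bias) is not captured by this resampling, consistent with the abstract's point that it requires a criterion.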