What Makes for an Effective Gifted and Talented Screener?

Bibliographic Details
Title: What Makes for an Effective Gifted and Talented Screener?
Language: English
Authors: Scott J. Peters (ORCID 0000-0003-2459-3384), Matthew C. Makel (ORCID 0000-0002-3837-0088), Lindsay Ellis Lee (ORCID 0000-0003-4519-7209), Tamra Stambaugh (ORCID 0000-0001-5587-1506), Matthew T. McBee, D. Betsy McCoach, Kiana R. Johnson
Source: Gifted Child Today. 2024 47(2):98-107.
Availability: SAGE Publications. 2455 Teller Road, Thousand Oaks, CA 91320. Tel: 800-818-7243; Tel: 805-499-9774; Fax: 800-583-2665; e-mail: journals@sagepub.com; Web site: https://sagepub.com
Peer Reviewed: Y
Page Count: 10
Publication Date: 2024
Sponsoring Agency: Department of Education (ED)
Contract Number: S206A20000721
Document Type: Journal Articles; Reports - Descriptive
Education Level: Elementary Secondary Education
Descriptors: Academically Gifted, Talent Identification, Screening Tests, Test Validity, Test Reliability, Student Evaluation, Costs, Elementary Secondary Education
DOI: 10.1177/10762175231222301
ISSN: 1076-2175; 2162-951X
Abstract: Universal screening is one of the most-common topics and well-accepted best practices within the field of gifted and talented education. There appears to be little disagreement that universally screening all students as part of a gifted and talented identification process results in fewer missed students. But surprisingly, there is little guidance on what makes for a quality universal screener--the tool that decides who needs further consideration. In this paper, we provide guidance that can help schools select the universal screener that helps them correctly identify as many students as possible at the lowest possible cost.
Abstractor: As Provided
Entry Date: 2024
Accession Number: EJ1417622
Database: ERIC
FullText Links:
  – Type: pdflink
    Url: https://content.ebscohost.com/cds/retrieve?content=AQICAHj0k_4E0hTGH8RJwT4gCJyBsGNe_WN95AvKlDbXJGqwxwGVwe11FnbZInrX_wG5yva_AAAA4jCB3wYJKoZIhvcNAQcGoIHRMIHOAgEAMIHIBgkqhkiG9w0BBwEwHgYJYIZIAWUDBAEuMBEEDHWf9tnEDAaStvkQsAIBEICBmkoccLzP22aDkGYKlI4zAaKmAHaXAYCrsLVo9s3HlWhUULCsNDErnWvXewWwe5HjX15rEhDMMyZ6mMiOL7RxZ6vRW1oI2_BZifhDRrNtIprowHJelyr7Kz6PAgTbRhTBN553cmpS6TE9ifSavn-NQBI2py8rtgUF6924Hnsshk_PZNwTG-Dl6uTF1R4FjM5t1zr-4O3TH1wwlhM=
Text:
  Availability: 1
  Value: <anid>AN0176065279;gct01apr.24;2025Aug11.09:20;v2.2.500</anid> <title id="AN0176065279-1">What Makes for an Effective Gifted and Talented Screener? </title> <p>Universal screening is one of the most-common topics and well-accepted best practices within the field of gifted and talented education. There appears to be little disagreement that universally screening all students as part of a gifted and talented identification process results in fewer missed students. But surprisingly, there is little guidance on what makes for a quality universal screener—the tool that decides who needs further consideration. In this paper, we provide guidance that can help schools select the universal screener that helps them correctly identify as many students as possible at the lowest possible cost.</p> <p>Keywords: gifted identification; universal screening; cost; equity; alignment</p> <hd id="AN0176065279-2">Introduction</hd> <p>The field of gifted education abounds with recommendations regarding what datapoints should be used to make identification decisions ([<reflink idref="bib17" id="ref1">17</reflink>]), how those data points should be combined ([<reflink idref="bib6" id="ref2">6</reflink>]; [<reflink idref="bib10" id="ref3">10</reflink>]; [<reflink idref="bib12" id="ref4">12</reflink>]), what norms should be used ([<reflink idref="bib8" id="ref5">8</reflink>]; [<reflink idref="bib20" id="ref6">20</reflink>]), and what cut scores are most appropriate ([<reflink idref="bib2" id="ref7">2</reflink>]; [<reflink idref="bib7" id="ref8">7</reflink>]). However, all this attention has focused on the part of the identification process that determines who is identified for a particular service—what we refer to as "phase 2" in a two-phase identification process. This phase of identification asks the question of how high must students score—and on what criteria—to be placed in a given service. This is certainly an important question. 
But this attention is focused on an approach to identification that is relatively uncommon: one that administers <emph>all</emph> the assessments used to make identification decisions to <emph>all</emph> students in a grade. For example, a school might make identification decisions based on achievement test scores, ability test scores, and teacher ratings of gifted behaviors. If these datapoints are collected for every student (in a grade) and then <emph>all</emph> students are considered for gifted services, we refer to this as universal consideration, as there is no screening phase.</p> <p>In our experience, a more common approach to gifted identification is a two-phase model where only those students who pass through an initial screening phase (phase 1) undergo additional testing at the identification phase (phase 2). See Figure 1 for a visualization of this type of identification system.</p> <p>Graph: Figure 1. Example two-phase identification system.</p> <p>An older version of the [<reflink idref="bib16" id="ref9">16</reflink>] report noted that the most common time for identification to take place is following a teacher or parent referral. For example, identification decisions might be made based on three assessments or scores (phase 2), but those data points are only collected from students who are referred by their teacher (phase 1). Similarly, schools might administer a universal screener to all students in a grade (phase 1) but only administer the phase 2 assessments to those who perform at a certain level on that screener. But what level? And based on what screener? Two-phase systems are common, but they can be detrimental to the overall performance of an identification system when implemented poorly ([<reflink idref="bib11" id="ref10">11</reflink>]). Yet they have received far less attention than single-phase, universal consideration systems. 
The goal of this paper is to provide guidance on how to select and calibrate a universal screener to (<reflink idref="bib1" id="ref11">1</reflink>) minimize overall system cost, (<reflink idref="bib2" id="ref12">2</reflink>) maximize equity, and (<reflink idref="bib3" id="ref13">3</reflink>) maximize sensitivity. We begin with a description of what characteristics make for an effective universal screener and conclude with suggestions for what optimal universal screeners tend to look like in K-12 schools.</p> <hd id="AN0176065279-3">What Makes for a "Good" Screener?</hd> <p>[<reflink idref="bib21" id="ref14">21</reflink>] argued for four factors being important when evaluating the overall goodness or quality of an identification system: cost, alignment, sensitivity, and access (i.e., the CASA criteria). The authors operationalized cost as any resource devoted to the identification process, primarily the costs associated with any assessments and time (teacher and student) required to administer and take them. Alignment was the degree to which the identification criteria were measuring the same skills and dispositions that would be fostered in the resulting services. Sensitivity is a more complicated concept. It refers to the degree to which the identification system correctly identified all students who it is intended to identify. On a scale of zero to one, sensitivity of 1.0 translates to 100% of the intended students being identified and zero translates to none being correctly identified. Ideally, the students who a system is intended to identify are those who will benefit from it, but in practice this is not always the case. For obvious reasons, the goal of any identification system should be to have high sensitivity. Otherwise, talent is left undeveloped. 
Finally, access refers to removing any intentional or unintentional systematic barriers to being identified (e.g., not requiring a parent nomination).</p> <p>The CASA criteria require a different balance at phase 1 than at the actual identification phase. If implemented poorly, having two phases can decrease overall system sensitivity, but these phases can also decrease cost in terms of time and money devoted to phase 2. If cost were no object, single-phase, universal consideration systems would be preferable. As noted by [<reflink idref="bib11" id="ref15">11</reflink>], when compared to single-phase systems, two-phase systems can only harm access and sensitivity, since they can only cause more students to be missed. But if implemented appropriately, they can minimize harm to access and sensitivity while dramatically decreasing cost (both in dollars and time). How to balance these priorities is a difficult question that we address later in the paper.</p> <p>What makes a good screener? In the next section we argue for the following as characteristics of a "good" or effective screener (phase 1) within the context of the CASA criteria: (<reflink idref="bib1" id="ref16">1</reflink>) strong nomination validity, (<reflink idref="bib2" id="ref17">2</reflink>) strong reliability, and (<reflink idref="bib3" id="ref18">3</reflink>) being fast, cheap, and easy.</p> <hd id="AN0176065279-4">Nomination Validity</hd> <p>When considering the actual identification phase (phase 2), the CASA criterion of alignment is concerned with how well the identification or placement criteria measure the skills, abilities, and dispositions necessary to do well in the resulting service. The greater the agreement between the service and the identification criteria, the better the alignment. If the identification criteria measure skills and abilities in the arts, but the services are focused on developing math talent, alignment is poor. 
At phase 2, it's essential that the assessment criteria be well-aligned with the program or services to be provided—something with which the field has struggled (see [<reflink idref="bib4" id="ref19">4</reflink>]). But at phase 1, alignment should be thought of differently. <emph>The most important characteristic of a screening phase is that performance on it be strongly predictive of performance on phase 2.</emph> This is the concept of nomination validity about which we have four take-home tips.</p> <p>First, <emph>a screener with the strongest nomination validity is often one of the data points from phase 2</emph>. This is because, by nature of also being one component of phase 2, it is strongly correlated with phase 2 (more on how to find or determine this later). However, it's also possible to use something as a universal screener that is not part of phase 2.</p> <p>Second, <emph>a screener can also be an assessment already administered for other purposes.</emph> Imagine a district wants to make identification decisions based on a quantitative reasoning test and teacher ratings of gifted behaviors at phase 2. But neither of these are administered to all students. The district could administer the ability test to all students as a universal screener at high costs, or, if it is like most schools in the country, it could use its universally administered, state-mandated accountability test in math as the screener. However, state tests vary in terms of quality and content, tend to have lower ceilings, and generally are less useful for advanced learners. Because of this, they are not often included in phase-two identification criteria. Fortunately, even assessments that are only moderately correlated with the phase 2 identification criteria can still be successfully used as screeners. 
The phase-one cut score just needs to be lowered sufficiently so as not to cause undue harm to sensitivity (so that the screener includes all of the students who may perform well on the phase 2 assessment and the system finds the students who are best matched for the services provided).</p> <p>Third, although it may seem counter-intuitive, <emph>assessments that are not strongly correlated with phase 2 can still work as screeners</emph>. For example, a state math accountability test could be used at phase 1 despite its low ceiling, focus on grade-level content, and the desire to measure broader quantitative reasoning or math ability at phase 2. Just as with the example above, these two phases measure different things conceptually and the correlation between the two may be relatively weak. But as long as it is not zero, the math test can usefully function as a screening test so long as the screening test cutoff is adjusted properly to account for the weak relationship.</p> <p>This brings us to our fourth tip: <emph>When the relationship between the screening test and phase 2 is weak or the screener is less reliable (i.e., more error prone, which will cause the correlation to be weaker), the screening test cutoff must be adjusted to a very low value to compensate</emph>. In the example here, imagine the two phases are weakly correlated either because they measure different constructs or because the screening assessment is less reliable in a psychometric sense. To compensate for this weaker relationship or lower reliability, the math test screening cutoff would need to be extremely low, perhaps the 40th or 50th percentile. This implies that 50–60% of students would have to be passed through phase 1 and go on to take the full suite of phase-two assessments for identification. Higher phase-one cut scores would cause many students who would have done well at phase 2 to be missed. 
Although possible, this option may not be as useful in the long term, particularly if cost is a priority.</p> <p>Despite the importance of the nomination validity coefficient, it can be difficult to calculate in practice due to incomplete data. In an ideal world, a district would have data from all students on the potential screener and all the phase-two data points. This would allow the nomination validity to be calculated as the correlation between the two phases. One could easily perform this calculation in programs like Microsoft Excel or Google Sheets. The problem is that if the data are only available on some students, the correlation calculated from those students will be much smaller than the true correlation, due to a phenomenon called range restriction.</p> <p>In cases like this, an alternative method to estimate nomination validity is to refer to assessment technical manuals. For example, the Cognitive Abilities Test (CogAT) technical manual reports a correlation of .58 between the CogAT-Quantitative Reasoning subscale and the Iowa Test of Basic Skills reading subscale for grade three ([<reflink idref="bib9" id="ref20">9</reflink>]). Technical manuals do not report these correlations for every possible test combination, but they can serve as useful estimates for other states' accountability tests. Though it might seem counterintuitive to use, for example, a reading test as a screener at phase 1 when phase 2 is focused on quantitative reasoning, the reading test has an estimated nomination validity of .58. The reading test could still work as an effective screener if the screening cut score is set to an appropriately permissive value. In that case, the screener can reduce the necessity of testing all students on quantitative reasoning without significantly reducing the sensitivity of the identification process.</p> <hd id="AN0176065279-5">Reliability</hd> <p>Reliability is one of the most important characteristics of an assessment. 
Assessment scores can be envisioned as being a mixture of two components: true score and measurement error ([<reflink idref="bib3" id="ref21">3</reflink>]). The true score represents the actual underlying quality that we are trying to measure (such as reading achievement or cognitive ability). Measurement error is everything else that affects the observed test scores and is often conceptualized as random noise in a student's score. This random noise can be exhibited in different ways. Examples include:</p> <p></p> <ulist> <item> - Testing occasions. A student may not receive identical scores on the same test given at two different testing sessions.</item> <p></p> <item> - Alternate forms. A student may achieve different scores on Form A and Form B of the same test.</item> <p></p> <item> - Raters. Two teachers completing a rating scale on the same student may produce different scores.</item> </ulist> <p>Every assessment produces scores that are a mixture of true score and measurement error. An assessment that is 100% reliable (e.g., a reliability coefficient of 1.0) would have zero measurement error—it would be a perfect reflection of whatever the assessment is intended to measure. It would therefore be perfectly consistent across occasions, forms, and (if applicable) raters. Unfortunately, 100% reliability is impossible. In contrast, a test with a reliability of 0 would be totally useless. Its scores would resemble the outputs of a random number generator, and those scores would tell you nothing at all about a student's ability or achievement. 
High reliability should be a requirement for any test used in education, and it is even more important for tests used to make important decisions about a child's educational experience.</p> <p>A practitioner can either (<reflink idref="bib1" id="ref22">1</reflink>) find the reliability estimates of a given assessment in its technical manual or (<reflink idref="bib2" id="ref23">2</reflink>) make an educated guess about the reliability of an assessment or process based on the reliability of similar assessments. It is important to understand that there are different reliability coefficients that measure different aspects of reliability. For example, the common <emph>Cronbach's alpha</emph> coefficient measures how consistent scores are across items on the test. It does not measure reliability over time or across raters. And between-rater reliability is extremely important (and often quite low) for gifted behaviors checklists that are completed by teachers or parents ([<reflink idref="bib14" id="ref24">14</reflink>]). Reliability is simply consistency across factors that should be irrelevant to a student's score, such as who does the grading or scoring (i.e., rater), when the test was given (i.e., test-retest reliability), or which form of a test was used (i.e., parallel forms). Whether a student took a test during third or sixth period should have no bearing on a student's score. Similarly, which teacher completed a rating scale of gifted behaviors and characteristics should not matter to the student's score (though, of course, it does). To the extent that it does, an assessment is said to yield less-reliable data.</p> <p>Reliability is important because at phase 1 it sets an upper limit on nomination validity ([<reflink idref="bib11" id="ref25">11</reflink>]). As a result, as reliability goes down, so too must a phase 1 cut score to compensate. If a test has low reliability, then the scores it generates are mostly noise. 
Therefore, these scores will not be very useful. They will not be highly correlated with anything else, such as phase 2 in an identification system. For example, imagine a teacher who has little understanding of which students will be successful in a gifted program. Referrals from this teacher will not be reliable; they won't consistently make the same decision about similar students or even the same decision about the same student at different times. Teacher referrals <emph>can</emph> make for a good screener (high nomination validity) when the teacher has received substantial training and provides students plenty of opportunities to demonstrate the behaviors needed to be referred. In this scenario, referral decisions will be more consistent. But not all teachers will have received the training necessary to provide effective referrals (same with parents). As a result, the entire process is less reliable: some teachers provide referrals that are strongly predictive of performance on phase 2, while others provide inconsistent referrals. This is a major barrier to teacher ratings or referrals as a universal screener and something we will return to in the example section. Although nomination validity is what matters most to a "good" screening phase, strong reliability is a prerequisite for strong nomination validity. <emph>Identification systems must first yield consistent (reliable) information before they can yield valid or actionable information.</emph></p> <p>Different assessments produce scores with differing levels of reliability. For example, the internal consistency reliability of the MAP Growth Math Test ([<reflink idref="bib18" id="ref26">18</reflink>]) is in the mid to high .90s. Somewhat lower, the Wisconsin Forward Exam 2022 technical manual reported an alpha reliability of .88 for third-grade English/language arts ([<reflink idref="bib23" id="ref27">23</reflink>]). 
This might seem like a small difference, but even .88 reliability can have a meaningful effect on sensitivity. Such kinds of standardized tests often produce scores with high reliability, something not true of other types of commonly used assessments.</p> <p>What kinds of assessments yield less-reliable scores? Unfortunately, many. A good general rule is that the more a student's score depends on another human, the less reliable it will be (e.g., a behavior checklist completed by an untrained rater). Similarly, the less-structured the task or the scoring criteria, the less reliable it will be. Vaguely structured essay prompts or performance tasks that must be scored by a teacher, especially if the scoring criteria are not clear and concrete, will yield less-consistent scores ([<reflink idref="bib5" id="ref28">5</reflink>]). Similarly, any assessment where the student is not clear on their task, or where understanding of the task depends on outside, irrelevant factors, will result in less-reliable data.</p> <p>A screening test with low reliability will also have low nomination validity. Using it will result in more students proceeding to phase 2 but not doing well, and many students who would have done well at phase 2 never passing phase 1. The only way to prevent such a situation from harming the sensitivity of the process would be to lower the screening test cutoff, allowing many or most students to pass to phase 2. But this means that the screening process is not effectively limiting phase 2 testing to students with a high chance of passing it. As a result, any "good" screener needs to produce highly reliable scores. As mentioned with nomination validity, the lower the reliability (and the resultant phase 1 cut score), the higher the number of students who must be tested in phase 2, which is costly. 
Leveraging an assessment that provides more reliable data in phase 1 would more accurately identify students who are likely to qualify in phase 2 while reducing the number of phase 2 students assessed.</p> <hd id="AN0176065279-6">Fast, Cheap, and Easy</hd> <p>As noted above, the only reason for a screening phase to exist is to save time or money. Remember, a two-phase identification system can only cause students to be missed when compared to a single-phase system. It cannot improve sensitivity. But it can save time and money that can be better used if directed toward actual services. There is nothing inherently wrong with complex or time-consuming identification systems (e.g., evaluating a student for a disability), but schools should avoid putting students through them unnecessarily.</p> <p>If the screening assessment costs as much as the phase 2 assessments, then it is not a good screener. What is harder to measure is the cost of student and teacher time. A teacher rating of gifted behaviors and characteristics on a district-made form might not cost any money, but it does consume a sizeable amount of teachers' time (and likely lacks nomination validity). Although it is often treated as such, teacher time is not unlimited. Even in the absence of direct costs, there are opportunity costs to consider. What valuable actions or instructional tasks could teachers have undertaken if they had not been burdened with filling out rating forms on all their students? Similarly, a district-made math assessment might not require a purchase from a test publisher, but it might require an entire class period for students to take plus the time for one or more teachers to grade. It might also suffer more from the reliability challenges described earlier when compared to a standardized math assessment. Although it might seem compelling to use district-created assessments or rating scales, they do not come without disadvantages. 
Schools should not spend any more time on testing or assessment than is necessary. Although it is harder to quantify the opportunity cost of teacher or student time, they are both factors that should be considered when selecting or designing a screening process.</p> <p>Of course, assessments are typically used for other purposes than just gifted identification. Right now, a school might be giving the CogAT to all students as part of a single-phase gifted identification process. At first, this might seem like a place where a district could move to a two-phase system to decrease costs. But perhaps in this district, all teachers use the resulting CogAT data to differentiate their instruction to all students. As a result, although moving to a two-phase system where not all students take the CogAT would save money, it would also come at a loss to those teachers and to student instruction. Costs would decrease, but so would benefits. Similarly, perhaps the teacher rating scales of gifted behaviors used to make identification decisions are also incorporated into professional development for teachers on unconscious bias. As a result, the "cost" of teachers completing these scales might be worth the benefits they provide in terms of professional learning for teachers. The cost-benefit analysis becomes more complicated when gifted identification assessments or data points are used for other purposes besides gifted identification.</p> <hd id="AN0176065279-7">Illustrative Example: The Charles Hamilton Houston School District</hd> <p>Although each of the above three criteria might make sense when it comes to what makes for a good screener, it might be hard to visualize what applying these criteria looks like in practice. 
In this section, we provide a conceptual example of a hypothetical district that currently uses a single-phase universal consideration system for identification but wants to move toward a two-phase system with a universal screener to try and devote less student time and district funds to testing.</p> <p>The Charles Hamilton Houston School District (CHH) has long been concerned about the costs associated with its gifted identification system. In CHH, students in third grade are universally considered by their performance on a group-administered cognitive ability test (i.e., verbal, quantitative, and nonverbal subscales) used specifically for gifted identification, an academic achievement test (math and reading) that is district-mandated for everyone, and a teacher rating scale of gifted behaviors. These data are collected annually from every third-grade student, thereby representing a single-phase, universal consideration identification system. Students are identified for gifted services in math if they score at (or above) the 95th percentile on the average of these three data points: math achievement, quantitative ability, and teacher ratings. Identification in reading is similar but involves reading achievement, verbal ability, and teacher ratings. For each content area, data points are converted to a common scale and then averaged to make identification decisions.</p> <p>Using these identification practices, CHH identifies 5% of its students for gifted services in math. This system leaves the district confident that they are effectively identifying gifted students (high sensitivity). However, concerns voiced by administrators are forcing them to re-evaluate the existing system. Post-pandemic, administrators and parents are concerned about the amount of time spent testing. Similarly, the district is concerned about the time teachers devote to the rating scales. Some teachers are notorious for not turning them in, thereby causing some students to go unidentified. 
The administration is also concerned about the cost of the ability test, which is only used to make gifted identification decisions. As a result, they are interested in cutting back on the number of tests they administer and possibly moving to a two-phase identification system. To do that, they need to decide what would make for the best screener.</p> <hd id="AN0176065279-8">Universal Screener Candidates for CHH</hd> <p>The baseline system sensitivity at CHH for reading or math identification is very high. But the system is also very costly and inefficient because all students take all three assessments. The question the district faces is how they could decrease the costs associated with the identification process without unduly harming sensitivity. In this section, we consider and evaluate several possible screeners that CHH could implement in place of its single-phase system. This step requires us to return to the criteria for a good screener from above: strong nomination validity, high reliability, and fast, cheap, and easy.</p> <hd id="AN0176065279-9">Nomination Validity</hd> <p>From a nomination validity perspective, the best screener will be one of the assessments that are already part of the phase 2 composite—the achievement test, ability test, or teacher rating. Although it is possible to use a screener that is not part of phase 2, for the reasons discussed above, one of the phase 2 data points will always make for a better screener than an assessment that is not included at phase 2. In the case of CHH, this means evaluating the ability test, achievement test, and teacher rating scale as possible screeners. 
Purely from a nomination validity perspective, any of the three phase 2 data points could work as a universal screener in CHH.</p> <hd id="AN0176065279-10">Reliability</hd> <p>If we limit the pool of potential screeners to one of the multiple measures currently being used to make identification decisions (i.e., ability test, achievement test, and teacher ratings), then the next step is to consider which yields the most reliable scores. This is an easy decision since, as was mentioned earlier in the paper, any data point that relies on several graders or raters yields less-reliable data than one that does not. As a result, the ability or achievement test should yield more reliable data than teacher ratings of gifted behaviors. Looking at technical manuals for various achievement and ability tests, reliability estimates tend to be high. For example, the PARCC Math Test technical report estimates reliability in the low to mid .90s ([<reflink idref="bib19" id="ref29">19</reflink>]). Similarly, [<reflink idref="bib9" id="ref30">9</reflink>] reported CogAT-Quantitative Battery reliability estimates of .93 for grade three. As mentioned above, while such standardized tests tend to focus on a narrow range of academic-related skills, their strength is that they tend to yield highly reliable data.</p> <p>Reliability of teacher rating scales of gifted behaviors or characteristics is a bit more complicated. The technical manuals for the Gifted Evaluation Scale ([<reflink idref="bib13" id="ref31">13</reflink>]) and the Gifted Rating Scales ([<reflink idref="bib22" id="ref32">22</reflink>]) report reliability estimates of .96 and .98, respectively, for their "intellectual" subscales. But this can be misleading because some of a student's score on a teacher rating scale is due to the specific teacher doing the rating and not to any characteristic of the student themselves. 
In fact, [<reflink idref="bib14" id="ref33">14</reflink>] found that between 10% and 25% of the score a student receives on a teacher rating scale of gifted behaviors can be attributed to the person doing the rating and not the person being rated. That represents a large source of unreliability or assessment error.</p> <p>As a result, practitioners should always assume that an internal consistency reliability value reported for any rating scale score (including teacher rating scales) is a poor estimate of the instrument's actual reliability because internal consistency reliability does not consider differences between raters. The achieved reliability of a rating scale score will be much lower than reported unless raters are specifically trained to produce consistent scores ([<reflink idref="bib1" id="ref34">1</reflink>]), which is rarely the case for teacher rating scales. This is true for any data point where a student's score depends on an external rater (e.g., scores for students showing their work or grades on essays).</p> <p>Because teacher rating scales suffer from the problem of between-teacher variability, CHH should avoid using them as screeners. This leaves either the ability or achievement test for them to choose from after having applied the nomination validity and reliability metrics. But one criterion remains.</p> <hd id="AN0176065279-11">Fast, Cheap, and Easy</hd> <p>Finally, a good screener should be fast to administer, as inexpensive as possible, and simple to administer. Each of the three existing CHH data points has some attractive characteristics under this criterion. For example, teacher rating scales require no student time. Students need not even know they are being rated. Only the classroom teacher's time is required. 
Similarly, achievement and ability tests can be administered in a group setting, which makes it much faster to collect data on an entire class or grade of students; achievement tests are often already used for other purposes such as instructional decision-making or state accountability requirements.</p> <p>But each of the three assessments also has weaknesses under this criterion. For example, ability tests are relatively expensive and not often administered for other purposes. This means 100% of the cost of administering ability assessments like the CogAT or Naglieri Nonverbal Ability Test (NNAT: [<reflink idref="bib15" id="ref35">15</reflink>]) to students would fall to the gifted education budget or department. In some cases, classroom or special education teachers might also make use of these data, but for the most part their administration is solely for gifted identification purposes. Similarly, although teacher rating scales are less expensive on a per-pupil basis than achievement or ability tests, they take more teacher time and require more teacher expertise. Some achievement tests are also of limited value if they measure only grade-level skills and are less accurate at higher levels of whatever is being measured. So even if they are free for use as part of the identification process, it's still important to ensure they yield reliable data even for very high scores.</p> <p>Purely in terms of cost, teacher rating scales will always be the least expensive. Most are less than $2 per student, whereas ability or achievement tests cost closer to $15 per student. But as already noted, teacher rating scales have other downsides that make the cost savings less clear-cut. When other practical considerations are weighed, such as how fast and easy an instrument is to administer, teacher rating scales may not be as quick to complete. 
Similarly, cognitive and achievement assessments take time, money, and some understanding of test administration but could save time and money in the long term, particularly if the district can identify the same students while testing fewer of them in phase 2.</p> <hd id="AN0176065279-12">Where Does This Leave CHH?</hd> <p>As in any situation, different districts may have different priorities when making screening decisions. Next, we focus on which assessment CHH would select if its goal was to maximize sensitivity at the lowest possible cost. However, as with the CASA criteria ([<reflink idref="bib21" id="ref36">21</reflink>]), schools must weigh each of the above-described factors based on what matters most to them, and one school might put more weight on one factor than another. Some schools might see the additional cost of using an ability test as a universal screener as insignificant. Others might see the challenges with teacher rating scales as an opportunity to provide professional development and help educators reflect on their own implicit biases. Still others might be unable to devote any additional cost to a universal screener and can only use something they already have on hand for other purposes. It's possible to select or design an ideal universal screener under any of these conditions, but here we presume CHH wants the highest sensitivity it can achieve (i.e., close to a universal consideration system), but at the smallest possible cost. Under these assumptions and criteria, the existing academic achievement test wins the competition for the best universal screener at CHH.</p> <p>As already presented above, the achievement test comes at zero additional cost since it is already universally administered for other purposes. As a result, students need not sit any additional test, teachers need to do no additional grading, and no additional assessments need to be purchased. 
But the achievement test would also be the most reliable: certainly more reliable than the rating scale and at least as reliable as the ability test. That leaves only nomination validity to consider. Although it's certainly possible that the ability test as a universal screener would be more strongly correlated with the phase 2 composite, the difference is likely to be minimal. As a result, when considering all three contenders, CHH should use its existing, universally administered achievement test as a universal screener.</p> <hd id="AN0176065279-13">What Would CHH's New System Look Like?</hd> <p>Under this new, two-phase identification system, the district would replace its existing system, in which all three data points are collected from all students (Figure 2), with one in which the ability and teacher rating data are only collected from students who score at a certain level on the universally administered achievement test (Figure 3).</p> <p>Graph: Figure 2. CHH baseline, single-phase identification system.</p> <p>Graph: Figure 3. CHH revised, two-phase identification system.</p> <p>Exactly how high students would need to score on the universal screener before progressing to phase 2 depends on the nomination validity and reliability of the data. In general, higher reliability at both phases allows for higher cut scores at phase 1 without causing substantial harm to sensitivity. But based on our experience, it is likely CHH would only need to administer the ability test and ask teachers to complete rating scales on about a third of students or less. Put another way, because of the strong nomination validity, it is exceedingly unlikely that a student who scores lower than the top third on the achievement test would score in the 95th percentile in phase 2. 
As a result, CHH could collect ability and teacher rating data from only those students who score in the top third on the achievement test and still be confident it is not missing many students who would have been identified had a single-phase system been used. This would save the district substantial money, spare teachers the time of completing as many rating scales, and spare students the time of taking an assessment that does not contribute to their educational experience, all while not harming the accuracy of the identification process.</p> <hd id="AN0176065279-14">Summary</hd> <p>Universal screening is consistently recommended to improve the gifted identification process. However, few concrete suggestions have previously been given for how to effectively implement universal screening into identification processes. Using previously proposed CASA criteria for assessing overall identification processes, we propose the adoption of three criteria for selecting a universal screener: nomination validity, reliability, and its being fast, cheap, and easy. There are several takeaways that can guide districts as they consider selecting a universal screener as part of a two-phase gifted and talented identification process:</p> <p></p> <ulist> <item> 1. Universal consideration systems, where the data used to make gifted identification decisions are collected from all students in a grade, result in high sensitivity, but come at the highest possible cost.</item> <p></p> <item> 2. Past research has shown that two-phase systems can result in similar levels of sensitivity as universal consideration systems, but at far less cost ([<reflink idref="bib11" id="ref37">11</reflink>]). However, this is dependent on the nomination validity and reliability of the screener.</item> <p></p> <item> 3. One of the phase 2 assessments used to make actual gifted identification decisions should provide the strongest nomination validity and match to services.</item> <p></p> <item> 4. 
An assessment that is already being administered for other purposes is likely to be the lowest-cost screener in terms of time, money, and resources.</item> <p></p> <item> 5. Taken together, schools that already administer a high-quality academic achievement test can also use those data points as a universal screener. Students who score at a certain level then move on to phase 2 where the actual identification or program eligibility decisions are made.</item> <p></p> <item> 6. A "perfect" screener, one that fits all of the criteria listed in this paper, is useless if the services do not match the phase 2 identification criteria and what the students need in support of developing their identified talents. In the end, identification is a means to an end.</item> </ulist> <p>If schools are seeking to correctly identify as many students as possible at the lowest possible cost, universal administration of a screener that has high nomination validity and high reliability and is fast, cheap, and easy to administer can provide an essential tool when targeted toward a specific and aligned service.</p> <hd id="AN0176065279-15">ORCID iDs</hd> <p>Scott J. Peters https://orcid.org/0000-0003-2459-3384</p> <p>Matthew C. Makel https://orcid.org/0000-0002-3837-0088</p> <p>Lindsay Ellis Lee https://orcid.org/0000-0003-4519-7209</p> <p>Tamra Stambaugh https://orcid.org/0000-0001-5587-1506</p> <ref id="AN0176065279-16"> <title> References </title> <blist> <bibl id="bib1" idref="ref11" type="bt">1</bibl> <bibtext> Bandalos D. L. (2017). Measurement theory and application for the social sciences. Guilford Press.</bibtext> </blist> <blist> <bibl id="bib2" idref="ref7" type="bt">2</bibl> <bibtext> Borland J. H. (2009). Myth 2: The gifted constitute 3% to 5% of the population. Moreover, giftedness equals high IQ, which is a stable measure of aptitude: Spinal tap psychometrics in gifted education. Gifted Child Quarterly, 53(4), 236–238. 
https://doi.org/10.1177/0016986209346825</bibtext> </blist> <blist> <bibl id="bib3" idref="ref13" type="bt">3</bibl> <bibtext> Crocker L. M., Algina J. (1986). Introduction to classical and modern test theory. Holt, Rinehart, and Winston.</bibtext> </blist> <blist> <bibl id="bib4" idref="ref19" type="bt">4</bibl> <bibtext> Gubbins E. J., Siegle D., Ottone-Cross K., McCoach D. B., Langley S. D., Callahan C. M., Brodersen A. V., Caughey M. (2021). Identifying and serving gifted and talented students: Are identification and services connected? Gifted Child Quarterly, 65(2), 115–131. https://doi.org/10.1177/0016986220988308</bibtext> </blist> <blist> <bibl id="bib5" idref="ref28" type="bt">5</bibl> <bibtext> Harlen W. (2005). Trusting teachers' judgement: Research evidence of the reliability and validity of teachers' assessment used for summative purposes. Research Papers in Education, 20(3), 245–270. https://doi.org/10.1080/02671520500193744</bibtext> </blist> <blist> <bibl id="bib6" idref="ref2" type="bt">6</bibl> <bibtext> Lakin J. M. (2018). Making the cut in gifted selection: Score combination rules and their impact on program diversity. Gifted Child Quarterly, 62(2), 210–219. https://doi.org/10.1177/0016986217752099</bibtext> </blist> <blist> <bibl id="bib7" idref="ref8" type="bt">7</bibl> <bibtext> Lee L. E., Ottwein J. K., Peters S. J. (2020). Eight universal truths of identifying students for advanced academic interventions. In Robins J. H., Jolly J. L., Karnes F. A., Bean S. M. (Eds), Methods and materials for teaching the gifted (5th ed., pp. 61–80). Prufrock Press.</bibtext> </blist> <blist> <bibl id="bib8" idref="ref5" type="bt">8</bibl> <bibtext> Lohman D. F. (2009). Identifying academically talented students: Some general principles, two specific procedures. In Shavinina L. V. (Ed), International handbook on giftedness (pp. 971–997). 
Springer Science + Business Media.</bibtext> </blist> <blist> <bibl id="bib9" idref="ref20" type="bt">9</bibl> <bibtext> Lohman D. F. (2012). Cognitive Abilities Test Form 7: Research and development guide. Riverside.</bibtext> </blist> <blist> <bibtext> Makel M. C., Peters S. J., Lee L. E., Stambaugh T., McBee M., McCoach D. B., Johnson K. (2023). Effective identification through multiple criteria. https://doi.org/10.35542/osf.io/48nv7</bibtext> </blist> <blist> <bibtext> McBee M. T., Peters S. J., Miller E. M. (2016). The impact of the nomination stage on gifted program identification: A comprehensive psychometric analysis. Gifted Child Quarterly, 60(4), 258–278. https://doi.org/10.1177/0016986216656256</bibtext> </blist> <blist> <bibtext> McBee M. T., Peters S. J., Waterman C. (2014). Combining scores in multiple-criteria assessment systems: The impact of the combination rule. Gifted Child Quarterly, 58(1), 69–89. https://doi.org/10.1177/0016986213513794</bibtext> </blist> <blist> <bibtext> McCarney S. B., Arthaud T. J. (2009). Gifted Evaluation Scale (3rd ed.). Hawthorne Educational Services.</bibtext> </blist> <blist> <bibtext> McCoach D. B., Gambino A. J., Peters S. J., Long D., Siegle D. (2023). How much teacher is in teacher rating scales? EdWorkingPaper: 23-828. Annenberg Institute at Brown University. https://doi.org/10.26300/6vpz-en33</bibtext> </blist> <blist> <bibtext> Naglieri J. A. (2008). Naglieri Nonverbal Ability Test – (2nd ed.). Pearson.</bibtext> </blist> <blist> <bibtext> National Association for Gifted Children. (2015). State of the states in gifted education: Policy and practice data. https://nagc.org/page/state-of-the-states-report</bibtext> </blist> <blist> <bibtext> National Association for Gifted Children. (2011). Identifying and serving culturally and linguistically diverse gifted students. 
https://cdn.ymaws.com/nagc.org/resource/resmgr/knowledge-center/position-statements/Identifying_and_Serving_Cult.pdf</bibtext> </blist> <blist> <bibtext> NWEA. (2019). MAP® Growth™ technical report. Author. https://<ulink href="http://www.nwea.org/uploads/2021/11/MAP-Growth-Technical-Report-2019%5fNWEA.pdf">www.nwea.org/uploads/2021/11/MAP-Growth-Technical-Report-2019%5fNWEA.pdf</ulink></bibtext> </blist> <blist> <bibtext> Partnership for Assessment of Readiness for College and Careers. (2018). PARCC final technical report for 2018. Pearson. https://files.eric.ed.gov/fulltext/ED599198.pdf</bibtext> </blist> <blist> <bibtext> Peters S., Rambo-Hernandez K.E., Makel M., Matthews M., Plucker J. (2019). The effect of local norms on racial and ethnic representation in gifted education. AERA Open, 5(2), 1–18. https://doi.org/10.1177/2332858419848446</bibtext> </blist> <blist> <bibtext> Peters S. J., Stambaugh T., Makel M. C., Lee L. E., McBee M. T., McCoach D. B., Johnson K. R. (2023). The CASA criteria for evaluating gifted and talented identification systems: Cost, alignment, sensitivity, and access. Gifted Child Quarterly, 67(2), 137–150. https://doi.org/10.1177/00169862221124887</bibtext> </blist> <blist> <bibtext> Pfeiffer S. I., Jarosewich T. (2003). Gifted Rating Scales manual. Pearson.</bibtext> </blist> <blist> <bibtext> Wisconsin Department of Public Instruction. (2022). Wisconsin Forward Exam: Spring 2022 technical report. 
https://dpi.wi.gov/sites/default/files/imce/assessment/pdf/2022_Technical_Report_Final.pdf</bibtext> </blist> </ref> <ref id="AN0176065279-17"> <title> Footnotes </title> <blist> <bibtext> The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: The first author is employed by Houghton-Mifflin-Harcourt - the company that publishes MAP Growth.</bibtext> </blist> <blist> <bibtext> The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the U.S. Department of Education grant number S206A200007–21.</bibtext> </blist> </ref> <aug> <p>By Scott J. Peters; Matthew C. Makel; Lindsay Ellis Lee; Tamra Stambaugh; Matthew T. McBee; D. Betsy McCoach and Kiana R. Johnson</p> <p>Scott J. Peters, PhD, is a Senior Research Scientist at NWEA. Prior to joining NWEA he served as a Professor of Assessment and Research Methodology at the University of Wisconsin–Whitewater. His research work focuses on educational assessment and data use, gifted and talented student identification, equity within advanced educational opportunities, and educational policy.</p> <p>Matthew C. Makel, PhD, is a Professor and Research Chair in High Abilities Studies in the Werklund School of Education at the University of Calgary. His research focuses on academic talent development and open science research methods.</p> <p>Lindsay Ellis Lee, PhD, is an Assistant Research Professor in the Department of Pediatrics and a Faculty Research Affiliate with the Center for Excellence in Early Childhood Learning & Development at East Tennessee State University. Dr. 
Lee's research interests include equitably identifying advanced students, evaluating psychological and educational measurements, talent development across domains, and developing learning environments that encourage growth.</p> <p>Tamra Stambaugh, PhD, is an Associate Professor and the Margo Long Endowed Chair in Gifted Education at Whitworth University. Her research interests include curriculum and instruction, the development of expertise and talent, rural gifted education, poverty, and professional development.</p> <p>Matthew McBee, PhD, is a recovering academic and is now Vice President of Data Science at Service Management Group.</p> <p>D. Betsy McCoach, PhD, is a Professor of Research Methods, Measurement, and Evaluation in the Educational Psychology department at the University of Connecticut. Dr. McCoach's research interests include latent variable modeling, multilevel modeling, longitudinal modeling, instrument design, and gifted education.</p> <p>Kiana R. Johnson, PhD, is an Associate Professor in the Department of Pediatrics, Quillen College of Medicine, East Tennessee State University.</p> </aug> <nolink nlid="nl1" bibid="bib17" firstref="ref1"></nolink> <nolink nlid="nl2" bibid="bib10" firstref="ref3"></nolink> <nolink nlid="nl3" bibid="bib12" firstref="ref4"></nolink> <nolink nlid="nl4" bibid="bib20" firstref="ref6"></nolink> <nolink nlid="nl5" bibid="bib16" firstref="ref9"></nolink> <nolink nlid="nl6" bibid="bib11" firstref="ref10"></nolink> <nolink nlid="nl7" bibid="bib21" firstref="ref14"></nolink> <nolink nlid="nl8" bibid="bib14" firstref="ref24"></nolink> <nolink nlid="nl9" bibid="bib18" firstref="ref26"></nolink> <nolink nlid="nl10" bibid="bib23" firstref="ref27"></nolink> <nolink nlid="nl11" bibid="bib19" firstref="ref29"></nolink> <nolink nlid="nl12" bibid="bib13" firstref="ref31"></nolink> <nolink nlid="nl13" bibid="bib22" firstref="ref32"></nolink> <nolink nlid="nl14" bibid="bib15" firstref="ref35"></nolink>
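The cost argument in the CHH case study can be illustrated with simple arithmetic. The sketch below is not from the article: it assumes a hypothetical enrollment of 1,000 students and uses the article's approximate per-student figures (about $15 for an ability test and about $2 for a teacher rating scale), treating the achievement test as free because it is already administered for other purposes. The function name and parameters are illustrative only.

```python
def phase2_cost(n_students, fraction_tested, ability_cost=15.0, rating_cost=2.0):
    """Cost of collecting ability-test and teacher-rating data
    for the given share of students (achievement test assumed free)."""
    n_tested = round(n_students * fraction_tested)
    return n_tested * (ability_cost + rating_cost)


N_STUDENTS = 1000  # hypothetical enrollment, an assumption for illustration

# Single-phase: every student receives the ability test and a teacher rating.
single_phase = phase2_cost(N_STUDENTS, 1.0)

# Two-phase: only students in the top third on the achievement test move on.
two_phase = phase2_cost(N_STUDENTS, 1 / 3)

print(f"single-phase: ${single_phase:,.0f}")
print(f"two-phase:    ${two_phase:,.0f}")
print(f"savings:      ${single_phase - two_phase:,.0f}")
```

Under these assumed inputs the sketch gives about $17,000 for the single-phase system and about $5,661 for the two-phase system. It captures cost only; whether sensitivity is preserved depends on the nomination validity and reliability of the screener, as the article discusses.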