
'Rater Training' Re-Imagined for Work-Based Assessment in Medical Education


Bibliographic Details
Title:	'Rater Training' Re-Imagined for Work-Based Assessment in Medical Education
Language:	English
Authors:	Tavares, Walter; Kinnear, Benjamin; Schumacher, Daniel J.; Forte, Milena
Source:	Advances in Health Sciences Education. 2023 28(5):1697-1709.
Availability:	Springer. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://link.springer.com/
Peer Reviewed:	Y
Page Count:	13
Publication Date:	2023
Document Type:	Journal Articles; Reports - Evaluative
Descriptors:	Medical Education, Interrater Reliability, Evaluation Methods, Training, Psychometrics, Validity
DOI:	10.1007/s10459-023-10237-8
ISSN:	1382-4996 1573-1677
Abstract:	In this perspective, the authors critically examine "rater training" as it has been conceptualized and used in medical education. By "rater training," they mean the educational events intended to "improve" rater performance and contributions during assessment events. Historically, rater training programs have focused on modifying faculty behaviours to achieve psychometric ideals (e.g., reliability, inter-rater reliability, accuracy). The authors argue these ideals may now be poorly aligned with contemporary research informing work-based assessment, introducing a compatibility threat, with no clear direction on how to proceed. To address this issue, the authors provide a brief historical review of "rater training" and provide an analysis of the literature examining the effectiveness of rater training programs. They focus mainly on what has served to define effectiveness or improvements. They then draw on philosophical and conceptual shifts in assessment to demonstrate why the function, effectiveness aims, and structure of rater training requires reimagining. These include shifting competencies for assessors, viewing assessment as a complex cognitive task enacted in a social context, evolving views on biases, and reprioritizing which validity evidence should be most sought in medical education. The authors aim to advance the discussion on rater training by challenging implicit incompatibility issues and stimulating ways to overcome them. They propose that "rater training" (a moniker they suggest be reserved for strong psychometric aims) be augmented with "assessor readiness" programs that link to contemporary assessment science and enact the principle of compatibility between that science and ways of engaging with advances in real-world faculty-learner contexts.
Abstractor:	As Provided
Entry Date:	2023
Accession Number:	EJ1403089
Database:	ERIC

<title id="AN0174029252-1">"Rater training" re-imagined for work-based assessment in medical education </title> <p>In this perspective, the authors critically examine "rater training" as it has been conceptualized and used in medical education. By "rater training," they mean the educational events intended to improve rater performance and contributions during assessment events. Historically, rater training programs have focused on modifying faculty behaviours to achieve psychometric ideals (e.g., reliability, inter-rater reliability, accuracy). The authors argue these ideals may now be poorly aligned with contemporary research informing work-based assessment, introducing a compatibility threat, with no clear direction on how to proceed. To address this issue, the authors provide a brief historical review of "rater training" and provide an analysis of the literature examining the effectiveness of rater training programs. They focus mainly on what has served to define effectiveness or improvements. They then draw on philosophical and conceptual shifts in assessment to demonstrate why the function, effectiveness aims, and structure of rater training requires reimagining. These include shifting competencies for assessors, viewing assessment as a complex cognitive task enacted in a social context, evolving views on biases, and reprioritizing which validity evidence should be most sought in medical education. 
The authors aim to advance the discussion on rater training by challenging implicit incompatibility issues and stimulating ways to overcome them. They propose that "rater training" (a moniker they suggest be reserved for strong psychometric aims) be augmented with "assessor readiness" programs that link to contemporary assessment science and enact the principle of compatibility between that science and ways of engaging with advances in real-world faculty-learner contexts.</p> <p>Keywords: Assessment; Rater training; Work-based assessment; Validity</p> <p>Copyright comment Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</p> <hd id="AN0174029252-2">Rethinking rater training</hd> <p>In this perspective, we critically examine "rater training" as it has been conceptualized and used in medical education. Rater training garnered attention in part because of the increasing use of observational work-based assessments (WBA) in medical education and recognition that raters, not forms or scales, are what warrant attention and support (Holmboe, [<reflink idref="bib22" id="ref1">22</reflink>]). By "raters" we mean faculty (also referred to as observers or assessors) who are tasked with observing and processing the clinical performances of trainees with the aim of formulating a qualitative or quantitative interpretation, judgment, or report for formative or summative purposes. By "rater training" we mean the educational events intended to <emph>improve</emph> rater performance and contributions. 
Rater training was studied as early as the 1940s as a way of improving assessment outcomes, with researchers arguing that it is the lack of training that is the most usual source of weakness in rating [assessment] programs (Bittner, [<reflink idref="bib1" id="ref2">1</reflink>]). In medical education, while conceding that rater training may not be able to overcome all limitations, researchers have argued that it remains warranted because, "assessment ability is acquired, not innate; it requires deliberate practice and refinement over time (Lockyer et al., [<reflink idref="bib33" id="ref3">33</reflink>], p. 612)." When assessments produce what researchers identify as poor or unexpected results, rater training continues to be proposed as a solution (Gomes et al., [<reflink idref="bib14" id="ref4">14</reflink>]). Published tips and guidelines now exist in medical education on how best to structure these rater training programs (Feldman et al., [<reflink idref="bib10" id="ref5">10</reflink>]; Preusche et al., [<reflink idref="bib42" id="ref6">42</reflink>]).</p> <p>Traditional rater training programs have been designed to modify faculty behaviours with outcomes informed mainly by psychometric ideals (e.g., reliability, inter-rater reliability, accuracy). However, these ideals may now be poorly aligned with contemporary views informing WBA. For instance, assessments in medical education have been described as social exercises shaped by several potentially invisible but meaningful influences that are difficult to identify in advance (Kuper et al., [<reflink idref="bib31" id="ref7">31</reflink>]). Research exploring faculty cognition and behaviour during assessment activities has led to a broadening of our understanding of what can or should be modified in assessment contexts (Eva, [<reflink idref="bib9" id="ref8">9</reflink>]). 
Real world challenges, such as having to serve formative and summative goals with the same assessment event, navigating and balancing education and safety, and concerns about the impact of assessment on learners, are changing what it means to prepare faculty for assessment activities (Watling &amp; Ginsburg, [<reflink idref="bib70" id="ref9">70</reflink>]; Klassen et al., [<reflink idref="bib26" id="ref10">26</reflink>]; Ott et al., [<reflink idref="bib41" id="ref11">41</reflink>]). In the context of these broadening ways of understanding and enacting assessment, Tavares et al. ([<reflink idref="bib59" id="ref12">59</reflink>], [<reflink idref="bib60" id="ref13">60</reflink>]) advocated for a "compatibility principle," confirming alignment between elements of an assessment program to ensure underlying assumptions promote coherence, activities are logically connected, and claims of effectiveness or quality are defensible. When viewed in this way, compatibility threats may exist between contemporary assessment science and the rater training programs that have been dominant in medical education. That is, logics and underlying assumptions between some assessment practices as advocated (e.g., assessment as interpretivism) and rater training as it currently exists (e.g., enacting positivist/post-positivist ideals) are not compatible.</p> <p>We explore this issue by beginning with a brief historical review of the rater training literature to examine a context focused on error mitigation, attempts to promote accuracy and agreement, and other psychometric ideals. We then provide an analysis of the literature examining the effectiveness of rater training programs, focusing mainly on what has served to define effectiveness or improvements. Next, we draw on philosophical and conceptual shifts in assessment in medical education that suggest why rater training requires reimagining. We then propose considerations for a reimagined rater training agenda. 
Specifically recognizing the interpretivist and perspectival nature of assessment, we reframe rater training in workplace contexts as "assessor readiness" programs. We end with proposed research and practice recommendations for the medical education community to consider as it continues to navigate and negotiate the role of faculty in WBA. Our aim is to advance the discussion on "rater training" in medical education by challenging implicit incompatibility issues and stimulating ways to overcome them.</p> <hd id="AN0174029252-3">Rater training in medical education</hd> <p>The reliance on fallible humans to complete assessment tasks led researchers to conclude that considerable gains in outcomes could be made through rater training (Landy &amp; Farr, [<reflink idref="bib32" id="ref14">32</reflink>]). A proliferation of training strategies followed, including: (a) rater error training – teaching about avoiding common rating errors, such as leniency, halo, and central tendency; (b) performance dimension training – familiarizing raters with and calibrating definitions and descriptions of relevant assessment dimensions; (c) performance standards training, more commonly referred to as frame-of-reference training – providing raters with assessment standards and expectations along assessment dimensions with practice and feedback; (d) behavioral observation training – practice and feedback on detecting, recalling and classifying behaviors (Feldman et al., [<reflink idref="bib10" id="ref15">10</reflink>]; Smith, [<reflink idref="bib51" id="ref16">51</reflink>]; Woehr &amp; Huffcutt, [<reflink idref="bib72" id="ref17">72</reflink>]). Early reviews of rater training provided evidence in support of some practices. Woehr and Huffcutt ([<reflink idref="bib72" id="ref18">72</reflink>]) found that some gains could be realized in each of these rater training strategies, but concluded that frame-of-reference training led to the greatest improvements in rating accuracy. 
Despite the limitations of using experts to determine accuracy and using accuracy as an outcome, these studies demonstrated that raters could be trained on a specific theory of performance. Roch et al. ([<reflink idref="bib44" id="ref19">44</reflink>]) later confirmed the proliferation and effectiveness of frame-of-reference training with "accuracy" again as the criterion of choice. Based in part on these earlier reviews, frame-of-reference training was embedded into medical education contexts (Kogan et al., [<reflink idref="bib28" id="ref20">28</reflink>], [<reflink idref="bib29" id="ref21">29</reflink>], [<reflink idref="bib30" id="ref22">30</reflink>]; Newman et al., [<reflink idref="bib39" id="ref23">39</reflink>]).</p> <p>Drawing mainly from non-clinical contexts with similar assessment ideals, medical education researchers advocated for similar rater training strategies (Feldman et al., [<reflink idref="bib10" id="ref24">10</reflink>]). Early on, Spool ([<reflink idref="bib52" id="ref25">52</reflink>]) argued that "...accuracy in observation can be improved by training observers to minimize rating errors" (pp. 866–867). However, almost immediately afterwards, evidence of limitations in effectiveness was emerging. Shortly after Spool's work, Newble et al. ([<reflink idref="bib38" id="ref26">38</reflink>]) introduced rater training in medical and surgical contexts but found no difference in reliability. When researchers used performance dimension and frame-of-reference training to improve faculty rating behaviors in an internal medicine context, they reported improved rater satisfaction and comfort, but no differences in ratings, as well as new unintended and problematic response sets (i.e., reduced range and greater stringency) (Holmboe et al., [<reflink idref="bib23" id="ref27">23</reflink>]). 
When additional strategies were used, including rater error and behavioral observation training, researchers in medical education found no gains in inter-rater reliability, mean ratings, accuracy, halo, or discrimination between candidates or dimensions (Cook et al., [<reflink idref="bib3" id="ref28">3</reflink>]; Eppich et al., [<reflink idref="bib8" id="ref29">8</reflink>]; Feldman et al., [<reflink idref="bib10" id="ref30">10</reflink>]; Halliday, [<reflink idref="bib21" id="ref31">21</reflink>]; Robertson et al., [<reflink idref="bib43" id="ref32">43</reflink>]; Vergis et al., [<reflink idref="bib69" id="ref33">69</reflink>]; Weitz et al., [<reflink idref="bib71" id="ref34">71</reflink>]). In examining rater training and identifying limited impact, Cook et al. ([<reflink idref="bib3" id="ref35">3</reflink>]) indicated that there may be faculty features limiting intended outcomes, such as imperviousness to rater training and difficulty resisting their own schemas.</p> <p>Despite this less than favorable evidence, some have reported positive outcomes associated with rater training, such as shared understanding of and agreement on constructs, as well as improved accuracy (Eppich et al., [<reflink idref="bib8" id="ref36">8</reflink>]; Kogan et al., [<reflink idref="bib29" id="ref37">29</reflink>]). This has led researchers to argue that, "Training assessors calibrates, enhances, improves, and ensures the validity and reliability of their judgments (Tekian &amp; Norcini, [<reflink idref="bib64" id="ref38">64</reflink>], p. 366)." 
As such, rater training continues to be advocated as a solution to assessment challenges and, when conceptualized as a psychometric solution, may be useful in some contexts (e.g., to support research outcomes, for linear procedural tasks, or for some high stakes summative assessments) (Feldman et al., [<reflink idref="bib10" id="ref39">10</reflink>]; Kogan et al., [<reflink idref="bib30" id="ref40">30</reflink>]; Robertson et al., [<reflink idref="bib43" id="ref41">43</reflink>]; Vergis et al., [<reflink idref="bib69" id="ref42">69</reflink>]). However, given how WBA science has evolved, traditional rater training strategies are now incomplete and unsatisfying.</p> <hd id="AN0174029252-4">Evolving assessment concepts are shifting what it means to prepare faculty for assessment act...</hd> <p>Underlying assumptions in rater training models are that faculty can be trained to engage in assessment activities in intended and predictable ways, that there is a latent and accessible set of rater behaviors to work toward, and that these are useful concepts. These assumptions represent a philosophical position about assessment, rater training, and faculty that may now present compatibility tensions. Today, assessment science is in a state of flux, with different views on what counts as good assessment and what functions faculty are responsible for (Tavares et al., [<reflink idref="bib60" id="ref43">60</reflink>]). Newer views recognize the complex, dynamic, personal, and social nature of WBA (Gingerich et al., [<reflink idref="bib12" id="ref44">12</reflink>]; Massie &amp; Ali, [<reflink idref="bib35" id="ref45">35</reflink>]; Tavares, Gofton, et al., [<reflink idref="bib61" id="ref46">61</reflink>]). 
They elaborate the need to leverage assessment for learning, position assessment using socio-cultural theories, and accept that faculty have natural tendencies that are difficult to modify (Schuwirth &amp; van der Vleuten, [<reflink idref="bib48" id="ref47">48</reflink>]; Govaerts &amp; van der Vleuten, [<reflink idref="bib17" id="ref48">17</reflink>]; Govaerts et al., [<reflink idref="bib19" id="ref49">19</reflink>]). Variation between faculty is viewed as not only unavoidable but also meaningful (ten Cate &amp; Regehr, [<reflink idref="bib65" id="ref50">65</reflink>]). Others have argued that we should broaden our views on assessment, and consider validity arguments in a manner that accounts for these different views and assumptions (Cook et al., [<reflink idref="bib6" id="ref51">6</reflink>]; Govaerts et al., [<reflink idref="bib18" id="ref52">18</reflink>]; Govaerts &amp; van der Vleuten, [<reflink idref="bib17" id="ref53">17</reflink>]; Shankar et al., [<reflink idref="bib50" id="ref54">50</reflink>]; St-Onge et al., [<reflink idref="bib53" id="ref55">53</reflink>]; Tavares, Pearce, et al., [<reflink idref="bib62" id="ref56">62</reflink>]).</p> <p>Preparing faculty for the complexity associated with WBA may mean reconsidering the logics that shape these readiness strategies. We consider four literatures that encourage us to think about rater training differently. First, we highlight examples of how assessment designs are changing the role of assessors and the competencies expected of them. Second, we explore the notion of bias in assessment to further highlight what is or is not modifiable. Third, we explore how assessment is a complex cognitive task that includes potentially unmodifiable features and that these tasks occur in a nuanced social context. 
Finally, we end with evolving views on the reprioritizing of validity evidence and resultant implications for contemporary "rater training" programs.</p> <hd id="AN0174029252-5">Shifting competencies for assessors</hd> <p>In the context of WBA, two relatively recent developments create the need for new assessor competencies. First is the renewed emphasis on assessment as a formative function (Watling &amp; Ginsburg, [<reflink idref="bib70" id="ref57">70</reflink>]). Formative assessments position the assessor's role and contribution as serving a kind of needs analysis of the learners. The goal is to identify opportunities for improvement and to create an environment that fosters a growth orientation. Observation and processing of behaviors still occur, but observers' intents are to translate and anchor those observations to provide information to learners and subsequently navigate conversations that are educational rather than achievement oriented. This approach requires specific competencies and actions of assessors, such as the structuring of data-informed feedback conversations and overcoming summative perceptions to foster formative conditions.</p> <p>A second related compatibility tension arises from recent calls to differentiate "retrospective" and "prospective" assessment decision-making (Schumacher et al., [<reflink idref="bib47" id="ref58">47</reflink>]). Retrospective assessments reflect observed performance and interactions with learners, while prospective assessments obligate faculty to "project ahead, consider the unknown, gauge the level of risk and determine what level of supervision this trainee is ready for in upcoming cases (ten Cate et al., [<reflink idref="bib66" id="ref59">66</reflink>], p. 1666)." 
Others have elaborated this concept by arguing that retrospective assessments, especially those that include responses such as, "I had to be there," are inherently personal for faculty (Tavares et al., [<reflink idref="bib61" id="ref60">61</reflink>], [<reflink idref="bib62" id="ref61">62</reflink>]). In these instances, using "I" involves more than simply asking faculty to report behaviors, rate them on a scale, summarize them as narratives, or even match behaviors to predefined performance expectations.</p> <p>Collectively, these evolving views and refinements in assessment are shifting desired assessor competencies, and therefore how we best prepare them to complete assessment tasks. For instance, issues such as "accuracy" or the degree of halo observed in reports become less relevant than being able to extract coherent formative narratives or to reflect on the interaction faculty have with learners.</p> <hd id="AN0174029252-6">Shifting views on "biases"</hd> <p>There are at least two broad ways in which bias exists in assessment. First, bias in assessment has historically been viewed as a problematic source of error to be mitigated (e.g., as unwanted variability or subjectivity). However, contemporary views treat rater bias differently: not as harmful, but as meaningful richness shaped by previous and unavoidable experiences, knowledge, values, and interests (Klein et al., [<reflink idref="bib27" id="ref62">27</reflink>]; ten Cate &amp; Regehr, [<reflink idref="bib65" id="ref63">65</reflink>]). These biases lead assessors to see and interpret things uniquely. For example, an assessor might possess knowledge of the clinical reasoning literature that allows them to note specific abilities (e.g., use of semantic qualifiers and illness scripts) and comment on those. With this, they can identify next steps for development based on their observation and interpretation of a trainee's current stage of clinical reasoning. 
In this way, notions of bias are replaced with meaningful and unavoidable subjectivity to be used and leveraged in assessment, not mitigated.</p> <p>While the leveraging and valuing of subjective interpretations are not seen as problematic biases at all, the same is not true for a second form of bias: that which leads to social injustices. Here, biases refer to unjust ways of seeing and thinking in assessment that disadvantage individuals based on characteristics such as race, ethnicity, gender identity, training background, physical ability, appearance, age, religion, sexual orientation, and several other factors. These biases are pervasive, deeply rooted, unconscious, difficult to overcome despite best intentions, and a source of inequity in assessment contexts (Teherani et al., [<reflink idref="bib63" id="ref64">63</reflink>]; Tannenbaum et al., [<reflink idref="bib56" id="ref65">56</reflink>]; Bullock et al., [<reflink idref="bib2" id="ref66">2</reflink>]; Lucey et al., [<reflink idref="bib34" id="ref67">34</reflink>]; McDade et al., [<reflink idref="bib36" id="ref68">36</reflink>]). It may not be clear to faculty when these forms of biases are influencing assessment tasks or how they can manage them to avoid negative influences. In medical education, attention to these issues has been increasing (Sukhera &amp; Watling, [<reflink idref="bib54" id="ref69">54</reflink>]; Sukhera et al., [<reflink idref="bib55" id="ref70">55</reflink>]; Gonzalez et al., [<reflink idref="bib15" id="ref71">15</reflink>]). However, inclusion in assessor readiness programs still lags. Implications of and solutions for these kinds of biases reach far beyond assessment contexts, but this does not diminish the need to attend to their relevance when preparing faculty for assessment activities.</p> <p>One implication is that assessor readiness programs may need to free faculty to leverage their subjective histories, values and experiences for increased meaningfulness and context for learners. 
However, readiness also means revealing and attending to what may be unconscious and harmful, and preparing faculty for how to navigate those issues in an assessment context.</p> <hd id="AN0174029252-7">Assessment as a complex cognitive process, enacted in a social context</hd> <p>Research exploring assessors in their assessment tasks suggests that faculty are active in the assessment process, unavoidably idiosyncratic, and socially and contextually influenced in many ways that may not be amenable to or aligned with traditional rater training programs (Gingerich et al., [<reflink idref="bib13" id="ref72">13</reflink>]; Melvin et al., [<reflink idref="bib37" id="ref73">37</reflink>]; Forte et al., [<reflink idref="bib11" id="ref74">11</reflink>]). Assessment requires more than simply categorizing behaviors; rather, faculty leverage heuristics (and biases), are influenced by impressions, and often are asked to consider constructs and contexts that exceed human cognitive capabilities, leading to behaviors that may be unintended by assessment designers (Tavares et al., [<reflink idref="bib57" id="ref75">57</reflink>], [<reflink idref="bib58" id="ref76">58</reflink>]). For example, when assessment demands exceed cognitive capacity, assessors engage in selection (i.e., omission of some dimensions of performance or behaviors) and simplifying strategies (e.g., considering only positive or negative examples) (Tavares et al., [<reflink idref="bib57" id="ref77">57</reflink>]). Internal performance schemas are recognized now as powerful influences on what assessors attend to and what is determined to be meaningful or appropriate (Govaerts et al., [<reflink idref="bib19" id="ref78">19</reflink>]). In summarizing the evidence on rater cognition research, Eva ([<reflink idref="bib9" id="ref79">9</reflink>]) argued that assessments should be matched to cognition and rater tendencies, rather than cognition to assessment. 
A focus on just cognitive tasks and tendencies in assessment is also identified as insufficient and oversimplifying the issue (Govaerts, [<reflink idref="bib16" id="ref80">16</reflink>]). The role of social judgments (e.g., impression formation) is recognized as inherent, idiosyncratic and unavoidable (Gingerich et al., [<reflink idref="bib12" id="ref81">12</reflink>]). Any cognitive assessment processes therefore exist within a social context, the bounds and influence of which are difficult to control, identify, or even fully appreciate in advance.</p> <p>Collectively, this suggests different views may be needed on what is or is not modifiable, or even what should be modified, when considering how best to prepare faculty for WBA tasks. For example, assessor readiness programs may educate faculty on how to be involved in assessment design, how to recognize and work within inherent limitations, and how to respond to or leverage unique social conditions.</p> <hd id="AN0174029252-8">The reframing of what counts as validity</hd> <p>Our evolving conceptualizations of assessment have also changed the way we approach validity and validation. Validity is a foundational concept in assessment, representing a framework intended to promote and demonstrate the defensibility, trustworthiness, or suitability of an assessment program when making claims, inferences, or decisions based on assessment data. Validity evidence is collected to support decisions (interpretations and uses) in a way that demonstrates plausibility of any inferences inherent therein, and provides defensibility of the decisions. Contemporary views treat validity as an argument, bringing attention to what those arguments can be and how they can be supported (Cook et al., [<reflink idref="bib4" id="ref82">4</reflink>], [<reflink idref="bib5" id="ref83">5</reflink>], [<reflink idref="bib6" id="ref84">6</reflink>]; Kinnear et al., [<reflink idref="bib24" id="ref85">24</reflink>]). 
Quality, rigor and robustness remain important, but in this view, interpretations of what counts as defensible or trustworthy are inherently contextual and interpretative. "Arguments" have broadened, leaving room for different individuals to reach different conclusions about what validity claims mean, what is suitable, and what evidence is necessary to claim it.</p> <p>This again raises compatibility concerns between the aims of rater training, what purposes or outcomes it is structured to promote, and whether those align with broadening claims of how validity is being reprioritized. Traditional rater training models have been structured to promote validity by improving the rater's role in the process. Higher degrees of validity evidence were made possible and supported when raters were participating in intended ways (e.g., applying similar frames of reference, avoiding common "errors" such as restriction of range or poor differentiation). If we accept, for example, that assessments can be socially influenced, perspectival, interpretative and contextual, then psychometric performance becomes poorly aligned with broadening validity arguments. Instead, validity evidence related to how an assessor interacts with an assessment task becomes more important (Shankar et al., [<reflink idref="bib50" id="ref86">50</reflink>]). Rather than focusing on interrater reliability or desired rater tendencies, evidence that an assessor is reflexive and transparent about the contextual factors that inform ratings becomes increasingly aligned with and supportive of contemporary assessment science, therefore changing or reshuffling what validity arguments are most appropriate.</p> <p>With advances in how validity can be demonstrated, evaluation outcomes for assessor readiness programs will shift as well. 
This may include, for example, whether biases are sufficiently accounted for or acknowledged in how assessments are conducted, whether assessors are meaningfully accounting for social contexts, or whether the formative value of assessor performances is considered.</p> <hd id="AN0174029252-9">Discussion and future considerations</hd> <p>We have described how evolving concepts in assessment are shifting systems and the related efforts to prepare faculty for their tasks. New and broadening ways of thinking about assessment challenge long-held assumptions about core rater training ideals such as accuracy and error mitigation, objectivity, and the assumption that faculty cognitive behaviors can be shaped in intended ways. While being careful to avoid the complete dismissal of existing rater training strategies (i.e., rater training is consistent with aims to promote measurement invariance (Engelhard and Wind, [<reflink idref="bib7" id="ref87">7</reflink>])), there are opportunities to reimagine rater training as "assessor readiness" programs, including the underlying assumptions and conceptual frameworks that shape them.</p> <p>Recognizing advances in assessment and the perspectival and interpretative nature of assessment, assessor readiness programs might need to adopt different approaches. This can include reframing the role of subjectivity: implementing a focus on reflection and the need to attend to implicit and unconscious biases that create social injustices. Critical reflexivity and critical reflection, for example, may be novel curricular intents. Critical reflexivity is, "a process of recognizing one's own position in the world in order to better understand the limitations of one's own knowing and to better appreciate the social realities of others (Ng et al., [<reflink idref="bib40" id="ref88">40</reflink>], p. 1123)." 
Critical reflection involves attending to one's own assumptions and material manifestations of social assumptions, and may be equally beneficial (Ng et al., [<reflink idref="bib40" id="ref89">40</reflink>]). Other content areas might link directly to contemporary conceptual frameworks or research influencing assessment tasks. These can include: (<reflink idref="bib1" id="ref90">1</reflink>) fostering "<emph>learning conversations</emph>" that ensure intentional and structured conversational choices are selected based on a host of contextual issues including learner and faculty knowledge and experience; (<reflink idref="bib2" id="ref91">2</reflink>) supporting positive learner climates by focusing the act of assessment on personal, social and organizational determinants and by attending specifically to the potentially oppressive nature of clinical education work (i.e., navigating contexts that may fracture relationships that are crucial for feedback); (<reflink idref="bib3" id="ref92">3</reflink>) attending to several social influences such as whether assessors see themselves as agents of educational institutions or maintaining social accountabilities; (<reflink idref="bib4" id="ref93">4</reflink>) permitting failure safely in clinical contexts; (<reflink idref="bib5" id="ref94">5</reflink>) navigating retrospective and prospective assessment goals; and (<reflink idref="bib6" id="ref95">6</reflink>) understanding learner perceptions, motivations, and behaviors (e.g., "gaming" the system, front and backstage performances, avoidant behaviors, impression management, etc.) (Gruppen et al., [<reflink idref="bib20" id="ref96">20</reflink>]; Klasen &amp; Lingard, [<reflink idref="bib25" id="ref97">25</reflink>]; Tavares et al., [<reflink idref="bib59" id="ref98">59</reflink>], [<reflink idref="bib60" id="ref99">60</reflink>], [<reflink idref="bib61" id="ref100">61</reflink>], [<reflink idref="bib62" id="ref101">62</reflink>]). 
Performance dimension and frame-of-reference content take on new underlying philosophical assumptions: not ones shaped by consistency or accuracy, but rather by being reflective on what those influences mean and how they can and should be considered in practice to serve different assessment goals. Finally, assessor readiness programs might include strategies on how best to support assessors in feeding insights back into assessment programs (beyond learner reports) given their unique context, position and experiences.</p> <p>Our suggestion to consider "assessor readiness" as a guiding conceptual model for WBAs introduces opportunities to promote fairness. However, it also raises new validity obligations and questions about how to best evaluate the effectiveness of these programs. From the perspective of rater-invariant measurement, threats to fairness include systematic variance in assessments within subgroups that result in inappropriate interpretations (Engelhard and Wind, [<reflink idref="bib7" id="ref102">7</reflink>]). With assessor readiness programs, fairness will need to take on a different meaning, ontology, epistemology and axiology, where objectivity, stability or consistency are de-emphasized in favor of expert human judgment, shared subjectivity and adaptation to individual characteristics and contexts (Schuwirth &amp; van der Vleuten, [<reflink idref="bib49" id="ref103">49</reflink>]; Valentine et al., [<reflink idref="bib67" id="ref104">67</reflink>], [<reflink idref="bib68" id="ref105">68</reflink>]). Because fairness is a validity issue, new validity arguments will need to be formed. Similarly, claims of effectiveness of new assessor readiness programs will need to consider different outcomes. Promoting more stability in scores and aiming for measurement invariance are incompatible with the aims of these programs. 
Instead, researchers may need to explore issues of fairness outlined by Valentine et al. ([<reflink idref="bib67" id="ref106">67</reflink>], [<reflink idref="bib68" id="ref107">68</reflink>]) or changes in learner behaviors, educational gains, indicators of quality for individuals and systems, or defensibility achieved through diversity and triangulation.</p> <p>Viewed from the logic of a "compatibility principle", we argue that a re-imagining of faculty development for assessment activities is needed. We suggest that "rater training" remain a moniker reserved for activities with strong psychometric aims, and that we now consider an expansion to include broadened "assessor readiness" programs that reflect and flexibly respond to advances in assessment science and practice, as well as to the needs and personal experiences of faculty (Sachdeva, [<reflink idref="bib45" id="ref108">45</reflink>]; Sargeant et al., [<reflink idref="bib46" id="ref109">46</reflink>]). By maintaining compatibility between the science and the curricular offerings, we may better prepare teachers to participate in assessment activities that reflect contemporary values and goals.</p> <hd id="AN0174029252-10">Conclusion</hd> <p>There is a growing and diverse body of assessment science that suggests the need to think differently about how to prepare assessors for assessment tasks in medical education. 
In the context of WBA, "rater training" may need to be reserved for psychometric ideals (e.g., promoting inter-rater reliability) but augmented with "assessor readiness" programs that can integrate different advances in and goals of assessment science and practice, newer underlying philosophical positions or ideals, and the ever-changing and broadening assessor roles and quality expectations.</p> <hd id="AN0174029252-11">Acknowledgements</hd> <p>None.</p> <hd id="AN0174029252-12">Author Contribution</hd> <p>All authors are responsible for the content of this manuscript, and we have not excluded any qualified authors in the production of the manuscript. WT and MF provided the initial conceptualization. WT prepared the initial draft of the manuscript. All authors contributed to subsequent versions and the final manuscript.</p> <hd id="AN0174029252-13">Funding</hd> <p>None.</p> <hd id="AN0174029252-14">Declarations</hd> <p></p> <hd id="AN0174029252-15">Competing interests</hd> <p>The authors report no competing interests.</p> <hd id="AN0174029252-16">Publisher's Note</hd> <p>Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.</p> <ref id="AN0174029252-17"> <title> References </title> <blist> <bibl id="bib1" idref="ref2" type="bt">1</bibl> <bibtext> Bittner RH. Developing an industrial merit rating procedure. Personnel Psychology. 1948; 1; 4: 403-432. 10.1111/j.1744-6570.1948.tb01319.x</bibtext> </blist> <blist> <bibl id="bib2" idref="ref66" type="bt">2</bibl> <bibtext> Bullock JL, Lai CJ, Lockspeiser T, O'Sullivan PS, Aronowitz P, Dellmore D, Fung CC, Knight C, Hauer KE. In pursuit of honors: A multi-institutional study of students' perceptions of clerkship evaluation and grading. Academic Medicine. 2019; 94; 11S: S48-S56. 10.1097/acm.0000000000002905</bibtext> </blist> <blist> <bibl id="bib3" idref="ref28" type="bt">3</bibl> <bibtext> Cook DA, Dupras DM, Beckman TJ, Thomas KG, Pankratz VS. 
Effect of rater training on reliability and accuracy of mini-CEX scores: A randomized, controlled trial. Journal of General Internal Medicine. 2009; 24; 1: 74-79. 10.1007/s11606-008-0842-3</bibtext> </blist> <blist> <bibl id="bib4" idref="ref82" type="bt">4</bibl> <bibtext> Cook DA, Zendejas B, Hamstra SJ, Hatala R, Brydges R. What counts as validity evidence? Examples and prevalence in a systematic review of simulation-based assessment. Advances in Health Sciences Education. 2014; 19; 2: 233-250. 10.1007/s10459-013-9458-4</bibtext> </blist> <blist> <bibl id="bib5" idref="ref83" type="bt">5</bibl> <bibtext> Cook DA, Brydges R, Ginsburg S, Hatala R. A contemporary approach to validity arguments: A practical guide to Kane's framework. Medical Education. 2015; 49; 6: 560-575. 10.1111/medu.12678</bibtext> </blist> <blist> <bibl id="bib6" idref="ref51" type="bt">6</bibl> <bibtext> Cook DA, Kuper A, Hatala R, Ginsburg S. When assessment data are words: Validity evidence for qualitative educational assessments. Academic Medicine. 2016; 91; 10: 1359-1369. 10.1097/acm.0000000000001175</bibtext> </blist> <blist> <bibl id="bib7" idref="ref87" type="bt">7</bibl> <bibtext> Engelhard, G, &amp; Wind, S. A. (2019). Invariant measurement with raters and rating scales. Rasch models for rater-mediated assessments.</bibtext> </blist> <blist> <bibl id="bib8" idref="ref29" type="bt">8</bibl> <bibtext> Eppich W, Nannicelli AP, Seivert NP, Sohn MW, Rozenfeld R, Woods DM, Holl JL. A rater training protocol to assess team performance. Journal of Continuing Education in the Health Professions. 2015; 35; 2: 83-90. 10.1002/chp.21270</bibtext> </blist> <blist> <bibl id="bib9" idref="ref8" type="bt">9</bibl> <bibtext> Eva KW. Cognitive influences on complex performance assessment: Lessons from the interplay between medicine and psychology. Journal of Applied Research in Memory and Cognition. 2018; 7; 2: 177-188. 
10.1016/j.jarmac.2018.03.008</bibtext> </blist> <blist> <bibtext> Feldman M, Lazzara EH, Vanderbilt AA, DiazGranados D. Rater training to support high-stakes simulation‐based assessments. Journal of Continuing Education in the Health Professions. 2012; 32; 4: 279-286. 10.1002/chp.21156</bibtext> </blist> <blist> <bibtext> Forte M, Morson N, Mirchandani N, Grundland B, Fernando O, Rubenstein W. How teachers adapt their cognitive strategies when using entrustment scales. Academic Medicine. 2021; 96; 11S: S87-S92. 10.1097/acm.0000000000004287</bibtext> </blist> <blist> <bibtext> Gingerich A, Regehr G, Eva KW. Rater-based assessments as social judgments: Rethinking the etiology of rater errors. Academic Medicine. 2011; 86; 10: S1-S7. 10.1097/acm.0b013e31822a6cf8</bibtext> </blist> <blist> <bibtext> Gingerich A, Kogan J, Yeates P, Govaerts M, Holmboe E. Seeing the 'black box' differently: Assessor cognition from three research perspectives. Medical Education. 2014; 48; 11: 1055-1068. 10.1111/medu.12546</bibtext> </blist> <blist> <bibtext> Gomes MM, Driman D, Park YS, Wood TJ, Yudkowsky R, Dudek NL. Teaching and assessing intra-operative consultations in competency-based medical education: Development of a workplace-based assessment instrument. Virchows Archiv. 2021; 479; 4: 803-813. 10.1007/s00428-021-03113-6</bibtext> </blist> <blist> <bibtext> Gonzalez CM, Lypson ML, Sukhera J. Twelve tips for teaching implicit bias recognition and management. Medical Teacher. 2021; 43; 12: 1368-1373. 10.1080/0142159x.2021.1879378</bibtext> </blist> <blist> <bibtext> Govaerts MJB. Competence in assessment: Beyond cognition. Medical Education. 2016; 50; 5: 502-504. 10.1111/medu.13000</bibtext> </blist> <blist> <bibtext> Govaerts MJB, van der Vleuten CPM. Validity in work-based assessment: Expanding our horizons. Medical Education. 2013; 47; 12: 1164-1174. 10.1111/medu.12289</bibtext> </blist> <blist> <bibtext> Govaerts MJB, van der Vleuten CPM, Schuwirth LWT, Muijtjens AMM. 
Broadening perspectives on clinical performance assessment: Rethinking the nature of in-training assessment. Advances in Health Sciences Education. 2007; 12; 2: 239-260. 10.1007/s10459-006-9043-1</bibtext> </blist> <blist> <bibtext> Govaerts MJB, Van de Wiel MWJ, Schuwirth LWT, Van der Vleuten CPM, Muijtjens AMM. Workplace-based assessment: Raters' performance theories and constructs. Advances in Health Sciences Education. 2013; 18; 3: 375-396. 10.1007/s10459-012-9376-x</bibtext> </blist> <blist> <bibtext> Gruppen LD, Irby DM, Durning SJ, Maggio LA. Conceptualizing learning environments in the health professions. Academic Medicine. 2019; 94; 7: 969-974. 10.1097/acm.0000000000002702</bibtext> </blist> <blist> <bibtext> Halliday, D. A. (2022). Examining the effects of a rater training program on interrater reliability with the Lasater Clinical Judgement Rubric. (Publication No. 29321479). [Doctoral Dissertation, Widener University]. ProQuest Dissertations Publishing.</bibtext> </blist> <blist> <bibtext> Holmboe ES. Faculty and the observation of trainees' clinical skills: Problems and opportunities. Academic Medicine. 2004; 79; 1: 16-22. 10.1097/00001888-200401000-00006</bibtext> </blist> <blist> <bibtext> Holmboe ES, Hawkins R, Huot SJ. Effects of training in direct observation of medical residents' clinical competence. Annals of Internal Medicine. 2004; 140; 11: 874-881. 10.7326/0003-4819-140-11-200406010-00008</bibtext> </blist> <blist> <bibtext> Kinnear B, Schumacher DJ, Driessen EW, Varpio L. How argumentation theory can inform assessment validity: A critical review. Medical Education. 2022; 56; 11: 1064-1075. 10.1111/medu.14882</bibtext> </blist> <blist> <bibtext> Klasen JM, Lingard LA. Allowing failure for educational purposes in postgraduate clinical training: A narrative review. Medical Teacher. 2019; 41; 11: 1263-1269. 10.1080/0142159x.2019.1630728</bibtext> </blist> <blist> <bibtext> Klasen JM, Driessen E, Teunissen PW, Lingard LA. 
'Whatever you cut, I can fix it': Clinical supervisors' interview accounts of allowing trainee failure while guarding patient safety. BMJ Quality &amp; Safety. 2020; 29; 9: 727-734. 10.1136/bmjqs-2019-009808</bibtext> </blist> <blist> <bibtext> Klein R, Ufere NN, Rao SR, Koch J, Volerman A, Snyder ED, Schaeffer S, Thompson V, Warner AS, Julian KA, Kalamara A. Association of gender with learner assessment in graduate medical education. JAMA Network Open. 2020; 3; 7: e2010888. 10.1001/jamanetworkopen.2020.10888</bibtext> </blist> <blist> <bibtext> Kogan JR, Conforti LN, Bernabeo E, Iobst W, Holmboe E. How faculty members experience workplace-based assessment rater training: A qualitative study. Medical Education. 2015; 49; 7: 692-708. 10.1111/medu.12733</bibtext> </blist> <blist> <bibtext> Kogan JR, Conforti LN, Yamazaki K, Iobst W, Holmboe ES. Commitment to change and challenges to implementing changes after workplace-based assessment rater training. Academic Medicine. 2017; 92; 3: 394-402. 10.1097/acm.0000000000001319</bibtext> </blist> <blist> <bibtext> Kogan, J. R, Dine, C. J, Conforti, L. N, &amp; Holmboe, E. S. (2022). Can rater training improve the quality and accuracy of workplace-based assessment narrative comments and entrustment ratings? A randomized controlled trial. Academic Medicine, 101097. https://doi.org/10.1097/acm.0000000000004819</bibtext> </blist> <blist> <bibtext> Kuper A, Reeves S, Albert M, Hodges BD. Assessment: Do we need to broaden our methodological horizons? Medical Education. 2007; 41; 12: 1121-1123. 10.1111/j.1365-2923.2007.02945.x</bibtext> </blist> <blist> <bibtext> Landy FJ, Farr JL. Performance rating. Psychological Bulletin. 1980; 87; 1: 72-107. 10.1037/0033-2909.87.1.72</bibtext> </blist> <blist> <bibtext> Lockyer J, Carraccio C, Chan MK, Hart D, Smee S, Touchie C, Holmboe ES, Frank JR. Core principles of assessment in competency-based medical education. Medical Teacher. 2017; 39; 6: 609-616. 
10.1080/0142159x.2017.1315082</bibtext> </blist> <blist> <bibtext> Lucey CR, Hauer KE, Boatright D, Fernandez A. Medical education's wicked problem: Achieving equity in assessment for medical learners. Academic Medicine. 2020; 95; 12S: S98-S108. 10.1097/acm.0000000000003717</bibtext> </blist> <blist> <bibtext> Massie J, Ali JM. Workplace-based assessment: A review of user perceptions and strategies to address the identified shortcomings. Advances in Health Sciences Education. 2016; 21; 2: 455-473. 10.1007/s10459-015-9614-0</bibtext> </blist> <blist> <bibtext> McDade W, Vela MB, Sánchez JP. Anticipating the impact of the USMLE Step 1 pass/fail scoring decision on underrepresented-in-medicine students. Academic Medicine. 2020; 95; 9: 1318-1321. 10.1097/acm.0000000000003490</bibtext> </blist> <blist> <bibtext> Melvin L, Rassos J, Stroud L, Ginsburg S. Tensions in assessment: The realities of entrustment in internal medicine. Academic Medicine. 2019; 95; 4: 609-615. 10.1097/acm.0000000000002991</bibtext> </blist> <blist> <bibtext> Newble DI, Hoare J, Sheldrake PF. The selection and training of examiners for clinical examinations. Medical Education. 1980; 14; 5: 345-349. 10.1111/j.1365-2923.1980.tb02379.x</bibtext> </blist> <blist> <bibtext> Newman LR, Brodsky D, Jones RN, Schwartzstein RM, Atkins KM, Roberts DH. Frame-of-reference training: Establishing reliable assessment of teaching effectiveness. Journal of Continuing Education in the Health Professions. 2016; 36; 3: 206-210. 10.1097/ceh.0000000000000086</bibtext> </blist> <blist> <bibtext> Ng SL, Wright SR, Kuper A. The divergence and convergence of critical reflection and critical reflexivity: Implications for health professions education. Academic Medicine. 2019; 94; 8: 1122-1128. 10.1097/acm.0000000000002724</bibtext> </blist> <blist> <bibtext> Ott MC, Pack R, Cristancho S, Chin M, Van Koughnett JA, Ott M. "The most crushing thing": Understanding resident assessment burden in a competency-based curriculum. 
Journal of Graduate Medical Education. 2022; 14; 5: 583-592. 10.4300/jgme-d-22-00050.1</bibtext> </blist> <blist> <bibtext> Preusche I, Schmidts M, Wagner-menghin M. Twelve tips for designing and implementing a structured rater training in OSCEs. Medical Teacher. 2012; 34; 5: 368-372. 10.3109/0142159x.2012.652705</bibtext> </blist> <blist> <bibtext> Robertson RL, Park J, Gillman L, Vergis A. The impact of rater training on the psychometric properties of standardized surgical skill assessment tools. The American Journal of Surgery. 2020; 220; 3: 610-615. 10.1016/j.amjsurg.2020.01.019</bibtext> </blist> <blist> <bibtext> Roch SG, Woehr DJ, Mishra V, Kieszczynska U. Rater training revisited: An updated meta-analytic review of frame‐of‐reference training. Journal of Occupational and Organizational Psychology. 2012; 85; 2: 370-395. 10.1111/j.2044-8325.2011.02045.x</bibtext> </blist> <blist> <bibtext> Sachdeva AK. Continuing professional development in the twenty-first century. Journal of Continuing Education in the Health Professions. 2016; 36: S8-S13. 10.1097/ceh.0000000000000107</bibtext> </blist> <blist> <bibtext> Sargeant J, Wong BM, Campbell CM. CPD of the future: A partnership between quality improvement and competency-based education. Medical Education. 2018; 52; 1: 125-135. 10.1111/medu.13407</bibtext> </blist> <blist> <bibtext> Schumacher DJ, Cate O, Damodaran A, Richardson D, Hamstra SJ, Ross S, Hodgson J, Touchie C, Molgaard L, Gofton W, Carraccio C. Clarifying essential terminology in entrustment. Medical Teacher. 2021; 43; 7: 737-744. 10.1080/0142159x.2021.1924365</bibtext> </blist> <blist> <bibtext> Schuwirth LWT, van der Vleuten CPM. Programmatic assessment: From assessment of learning to assessment for learning. Medical Teacher. 2011; 33; 6: 478-485. 10.3109/0142159x.2011.565828</bibtext> </blist> <blist> <bibtext> Schuwirth LW, van der Vleuten CP. A history of assessment in medical education. Advances in Health Sciences Education. 
2020; 25; 5: 1045-1056. 10.1007/s10459-020-10003-0</bibtext> </blist> <blist> <bibtext> Shankar S, St-Onge C, Young ME. When I say... response process validity evidence. Medical Education. 2022; 56; 9: 878-880. 10.1111/medu.14853</bibtext> </blist> <blist> <bibtext> Smith DE. Training programs for performance appraisal: A review. Academy of Management. 1986; 11; 1: 22-40. 10.2307/258329</bibtext> </blist> <blist> <bibtext> Spool MD. Training programs for observers of behavior: A review. Personnel Psychology. 1978; 31; 4: 853-888. 10.1111/j.1744-6570.1978.tb02128.x</bibtext> </blist> <blist> <bibtext> St-Onge C, Young M, Eva KW, Hodges B. Validity: One word with a plurality of meanings. Advances in Health Sciences Education. 2017; 22; 4: 853-867. 10.1007/s10459-016-9716-3</bibtext> </blist> <blist> <bibtext> Sukhera J, Watling C. A framework for integrating implicit bias recognition into health professions education. Academic Medicine. 2018; 93; 1: 35-40. 10.1097/acm.0000000000001819</bibtext> </blist> <blist> <bibtext> Sukhera J, Watling CJ, Gonzalez CM. Implicit bias in health professions: From recognition to transformation. Academic Medicine. 2020; 95; 5: 717-723. 10.1097/acm.0000000000003173</bibtext> </blist> <blist> <bibtext> Tannenbaum ER, Tavares W, Kuper A. Performance is in the eye of the beholder. Medical Education. 2019; 53; 8: 759-762. 10.1111/medu.13873</bibtext> </blist> <blist> <bibtext> Tavares W, Ginsburg S, Eva KW. Selecting and simplifying: Rater behavior when considering multiple competencies. Teaching and Learning in Medicine. 2016; 28; 1: 41-51. 10.1080/10401334.2015.1107489</bibtext> </blist> <blist> <bibtext> Tavares W, Sadowski A, Eva KW. Asking for less and getting more: The impact of broadening a rater's focus in formative assessment. Academic Medicine. 2018; 93; 10: 1584-1590. 10.1097/acm.0000000000002294</bibtext> </blist> <blist> <bibtext> Tavares W, Eppich W, Cheng A, Miller S, Teunissen PW, Watling CJ, Sargeant J. 
Learning conversations: An analysis of the theoretical roots and their manifestations of feedback and debriefing in medical education. Academic Medicine. 2020; 95; 7: 1020-1025. 10.1097/acm.0000000000002932</bibtext> </blist> <blist> <bibtext> Tavares W, Kuper A, Kulasegaram K, Whitehead C. The compatibility principle: On philosophies in the assessment of clinical competence. Advances in Health Sciences Education. 2020; 25; 4: 1003-1018. 10.1007/s10459-019-09939-9</bibtext> </blist> <blist> <bibtext> Tavares W, Gofton W, Bhanji F, Dudek N. Reframing the O-SCORE as a retrospective supervision scale using validity theory. Journal of Graduate Medical Education. 2022; 14; 1: 22-24. 10.4300/jgme-d-21-00592.1</bibtext> </blist> <blist> <bibtext> Tavares W, Pearce J, Eva KW; Brown MEL, Veen M, Finn GM (Eds.). Tracing philosophical shifts in health professions assessment. Applied Philosophy for Health Professions Education. 2022: Singapore; Springer: 67-84. 10.1007/978-981-19-1512-3_6</bibtext> </blist> <blist> <bibtext> Teherani A, Hauer KE, Fernandez A, King TE, Lucey C. How small differences in assessed clinical performance amplify to large differences in grades and awards: A cascade with serious consequences for students underrepresented in medicine. Academic Medicine. 2018; 93; 9: 1286-1292. 10.1097/acm.0000000000002323</bibtext> </blist> <blist> <bibtext> Tekian A, Norcini JJ; Wimmers P, Mentkowski M (Eds.). Faculty development in assessment: What the faculty need to know and do. Assessing Competence in Professional Performance across Disciplines and Professions. 2016: Cham; Springer: 355-374. 10.1007/978-3-319-30064-1_16</bibtext> </blist> <blist> <bibtext> ten Cate O, Regehr G. The power of subjectivity in the assessment of medical trainees. Academic Medicine. 2019; 94; 3: 333-337. 10.1097/acm.0000000000002495</bibtext> </blist> <blist> <bibtext> ten Cate O, Schwartz A, Chen HC. 
Assessing trainees and making entrustment decisions: On the nature and use of entrustment-supervision scales. Academic Medicine. 2020; 95; 11: 1662-1669. 10.1097/acm.0000000000003427</bibtext> </blist> <blist> <bibtext> Valentine N, Durning S, Shanahan EM, Schuwirth L. Fairness in human judgement in assessment: A hermeneutic literature review and conceptual framework. Advances in Health Sciences Education. 2021; 26: 713-738. 10.1007/s10459-020-10002-1</bibtext> </blist> <blist> <bibtext> Valentine N, Durning SJ, Shanahan EM, van der Vleuten C, Schuwirth L. The pursuit of fairness in assessment: Looking beyond the objective. Medical Teacher. 2022; 44; 4: 353-359. 10.1080/0142159X.2022.2031943</bibtext> </blist> <blist> <bibtext> Vergis A, Leung C, Robertson R. Rater training in medical education: A scoping review. Cureus. 2020; 12; 11: e11613. 10.7759/cureus.11363</bibtext> </blist> <blist> <bibtext> Watling CJ, Ginsburg S. Assessment, feedback and the alchemy of learning. Medical Education. 2019; 53; 1: 76-85. 10.1111/medu.13645</bibtext> </blist> <blist> <bibtext> Weitz, G, Vinzentius, C, Twesten, C, Lehnert, H, Bonnemeier, H, &amp; König, I. R. (2014). Effects of a rater training on rating accuracy in a physical examination skills assessment. GMS Zeitschrift für Medizinische Ausbildung, 31(4), https://doi.org/10.3205/zma000933</bibtext> </blist> <blist> <bibtext> Woehr DJ, Huffcutt AI. Rater training for performance appraisal: A quantitative review. Journal of Occupational and Organizational Psychology. 1994; 67; 3: 189-205. 10.1111/j.2044-8325.1994.tb00562.x</bibtext> </blist> </ref> <aug> <p>By Walter Tavares; Benjamin Kinnear; Daniel J. 
Schumacher and Milena Forte</p> </aug>