Metacognitive Errors in the Classroom: The Role of Variability of Past Performance on Exam Prediction Accuracy

Bibliographic Details
Title:	Metacognitive Errors in the Classroom: The Role of Variability of Past Performance on Exam Prediction Accuracy
Language:	English
Authors:	Geraci, Lisa (ORCID 0000-0001-9302-2871), Kurpad, Nayantara, Tirso, Robert, Gray, Kathryn N., Wang, Yan
Source:	Metacognition and Learning. Apr 2023 18(1):219-236.
Availability:	Springer. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://link.springer.com/
Peer Reviewed:	Y
Page Count:	18
Publication Date:	2023
Document Type:	Journal Articles; Reports - Research
Descriptors:	Prediction, Tests, Scores, Low Achievement, Accuracy, Hypothesis Testing, Correlation, Metacognition, Learning Processes
DOI:	10.1007/s11409-022-09326-7
ISSN:	1556-1623; 1556-1631
Abstract:	Students often make incorrect predictions about their exam performance, with the lowest-performing students showing the greatest inaccuracies in their predictions. The reasons why low-performing students make inaccurate predictions are not fully understood. In two studies, we tested the hypothesis that low-performing students erroneously predict their exam performance in part because their past performance varies considerably, yielding unreliable data from which to make their predictions. In contrast, high-performing students tend to have consistently high past performance that they can rely on to make relatively accurate predictions of future test performance. Results showed that across different exams (Study 1) and different courses (Study 2), low-performing students had more variable past performance than high-performing students. Further, results from Study 2 showed that variability in past course performance (but not past exam performance) was associated with poor calibration. Results suggest that variability in past performance may be one factor that contributes to low-performing students' erroneous performance predictions.
Abstractor:	As Provided
Entry Date:	2023
Accession Number:	EJ1371112
Database:	ERIC

<title id="AN0162469109-1">Metacognitive errors in the classroom: The role of variability of past performance on exam prediction accuracy</title> <p>Keywords: Variability; Metacognition; Low performers; High performers; Predictions</p> <p>Copyright comment: Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</p> <p>Research shows that students often make inaccurate evaluations of their knowledge (Foster et al., [<reflink idref="bib10" id="ref1">10</reflink>]; Hacker et al., [<reflink idref="bib11" id="ref2">11</reflink>]). When this happens, they are said to show poor metacognitive monitoring, where metacognitive monitoring is the ability to accurately assess one's own knowledge. In general, the lowest performers on a task are overconfident in their performance, whereas the highest performers are sometimes underconfident (Al-Harthy et al., [<reflink idref="bib1" id="ref3">1</reflink>]; de Bruin et al., [<reflink idref="bib3" id="ref4">3</reflink>]; Dunning et al., [<reflink idref="bib9" id="ref5">9</reflink>]; Foster et al., [<reflink idref="bib10" id="ref6">10</reflink>]; Hacker et al., [<reflink idref="bib12" id="ref7">12</reflink>], [<reflink idref="bib11" id="ref8">11</reflink>]; Händel &amp; Fritzsche, [<reflink idref="bib14" id="ref9">14</reflink>]; Miller &amp; Geraci, [<reflink idref="bib22" id="ref10">22</reflink>]; Saenz et al., [<reflink idref="bib26" id="ref11">26</reflink>]; Serra &amp; DeMarree, [<reflink idref="bib28" id="ref12">28</reflink>]; Tirso &amp; Geraci, [<reflink idref="bib31" id="ref13">31</reflink>]; Tirso et al., [<reflink idref="bib32" id="ref14">32</reflink>]). 
In terms of the magnitude of these errors, the lowest performers are more poorly calibrated than the highest performers: the gap between predicted and actual performance is larger for low performers than for high performers.</p> <p>Calibration errors can be problematic, especially for individuals who overestimate their knowledge and abilities. Overestimating their ability may lead people to take on challenges they are not prepared to tackle, or to attempt tasks, even dangerous ones, that they cannot accomplish successfully. The potential problems of overconfidence are perhaps best highlighted by studies showing that people overestimate their gun safety knowledge (Stark &amp; Sachau, [<reflink idref="bib29" id="ref15">29</reflink>]). But overconfidence can also lead to other serious (though less dangerous) problems, especially for students. In educational settings, accurately monitoring one's current knowledge supports efficient learning: students can use the information from their monitoring to guide their study methods and to select poorly understood topics for further study (Dunlosky et al., [<reflink idref="bib6" id="ref16">6</reflink>]; Dunlosky &amp; Ariel, [<reflink idref="bib5" id="ref17">5</reflink>]; Dunlosky &amp; Rawson, [<reflink idref="bib7" id="ref18">7</reflink>]; Thiede, [<reflink idref="bib30" id="ref19">30</reflink>]). This process breaks down for students who are poorly calibrated. Low-performing students err on the side of predicting that they know more than they do. 
The problem with erring on the side of overconfidence (rather than underconfidence) is that students may decide that additional preparation is unnecessary, leading them to perform poorly on exams and other assignments.</p> <p>Because of these potential problems, a great deal of research has aimed to understand the sources of metacognitive errors. For example, one theory suggests that metacognitive errors arise from poor learning, especially among low performers, such that a lack of knowledge of the material also produces a lack of awareness of that deficit (e.g., Kruger &amp; Dunning, [<reflink idref="bib19" id="ref20">19</reflink>]). There are many reasons for this lack of knowledge and accompanying lack of awareness, including the possibility that students do not always use effective study strategies, such as self-testing, that would build accurate knowledge (Karpicke &amp; Roediger, [<reflink idref="bib16" id="ref21">16</reflink>]). There is also evidence that low-performing students are not particularly confident in their predictions (Miller &amp; Geraci, [<reflink idref="bib22" id="ref22">22</reflink>]) and that they rely on other factors when making predictions, including the desire to perform well, which leads them to predict high scores (Saenz et al., [<reflink idref="bib26" id="ref23">26</reflink>], [<reflink idref="bib27" id="ref24">27</reflink>]; Serra &amp; DeMarree, [<reflink idref="bib28" id="ref25">28</reflink>]). According to the information-motivation theory (Tirso &amp; Geraci, [<reflink idref="bib31" id="ref26">31</reflink>]), low performers err in the direction of overconfidence when they lack useful information, because they then rely on motivational information to inform their predictions of future performance. 
This theory is consistent with prior research showing that people's predictions about the future are often based on their aspirations rather than their past behavior (e.g., Helzer &amp; Dunning, [<reflink idref="bib13" id="ref27">13</reflink>]).</p> <p>The current studies were designed to examine another possible contributor to poor calibration: variability in past performance. We hypothesized that low-performing students erroneously predict their exam performance in part because they rely on a history of variable past performance when making predictions. In other words, the lowest-performing students may have obtained some high grades and some low grades on past exams. Relying on such noisy data may present low-performing students with a difficult signal-extraction problem: they may be unsure which of their prior grades best indicates their true level of ability and may opt to focus on their higher grades. In contrast, high-performing students may have more consistently high past performance, which would give them more consistent and reliable information on which to base predictions of future test performance.</p> <p>Some existing work is consistent with this hypothesis. In general, the association between any two constructs can be attenuated by increased measurement error in either or both constructs (e.g., Crocker &amp; Algina, [<reflink idref="bib4" id="ref28">4</reflink>]; Worthen et al., [<reflink idref="bib35" id="ref29">35</reflink>]). That is, when observed variables (e.g., attention and academic performance) contain random measurement error, their observed relationship is weaker than the relationship between their true scores. 
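The attenuation principle can be illustrated with a small simulation. This is a sketch under assumed values (a true correlation of 0.6 and a chosen error SD), not data from these studies. Independent random error is added to two correlated true scores that each have unit variance. The observed correlation then falls toward the classical test theory prediction r_obs = r_true × sqrt(rel_x × rel_y), where each reliability is var(true) / (var(true) + var(error)).

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_attenuation(r_true=0.6, error_sd=1.0, n=20000, seed=1):
    """Correlate true scores, then the same scores after adding random error.

    True scores have unit variance; error_sd is the SD of the independent
    measurement error added to each observed variable (hypothetical values).
    """
    rng = random.Random(seed)
    true_x, true_y, obs_x, obs_y = [], [], [], []
    for _ in range(n):
        tx = rng.gauss(0, 1)
        # ty has unit variance and correlation r_true with tx.
        ty = r_true * tx + math.sqrt(1 - r_true ** 2) * rng.gauss(0, 1)
        true_x.append(tx)
        true_y.append(ty)
        obs_x.append(tx + rng.gauss(0, error_sd))
        obs_y.append(ty + rng.gauss(0, error_sd))
    reliability = 1 / (1 + error_sd ** 2)  # var(true) / var(observed)
    expected = r_true * math.sqrt(reliability * reliability)
    return pearson(true_x, true_y), pearson(obs_x, obs_y), expected


r_t, r_o, r_expected = simulate_attenuation()
print(f"true r = {r_t:.2f}, observed r = {r_o:.2f}, predicted = {r_expected:.2f}")
```

With error variance equal to true-score variance (reliability 0.5 for both measures), the observed correlation is roughly half the true correlation. This mirrors the argument that noisier grade histories weaken the link between ability and judgments.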
Thus, students with less consistent academic records can be thought of as having more measurement error than students with consistent records, and this reduced consistency in prior performance would attenuate any association between their true ability and their metacognitive judgments. Similarly, if we conceptualize prior performance as a cue that students use to varying degrees when making metacognitive judgments (e.g., Koriat, [<reflink idref="bib18" id="ref30">18</reflink>]), then that cue will be less reliable (and thus less useful) for students with noisier histories.</p> <p>We tested the hypotheses that low-performing students have more variable past performance than high-performing students and that this increased variability is associated with worse calibration. We focused on global predictions, using a measure of absolute accuracy (how close a student's overall exam score prediction is to the actual exam score), rather than on the relative accuracy of judgments for individual test items, which is often measured by using gamma to correlate item-level judgments with performance (see Dunlosky &amp; Thiede, [<reflink idref="bib8" id="ref31">8</reflink>] for a review). We chose to focus on global predictions of exam performance because we think these are assessments that students commonly make on their own, and their accuracy can have important consequences for students' study behaviors and for their evaluations of the course or their major. In Study 1, we examined performance data from several undergraduate psychology courses to determine whether low-performing students' exam grades were more variable than high-performing students' exam grades. 
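The two kinds of accuracy contrasted above can be made concrete with a minimal sketch using hypothetical numbers.

- **Absolute accuracy of a global prediction** is taken here as the signed difference between predicted and actual exam score (positive means overconfident), together with its absolute value.
- **Relative accuracy** is illustrated with the Goodman–Kruskal gamma over item pairs.

The function names and values are illustrative; this is not the authors' scoring code.

```python
from itertools import combinations


def calibration(predicted, actual):
    """Return (bias, absolute error) for one global exam prediction (0-100 scale).

    bias > 0 means the student overestimated the score (overconfidence).
    """
    bias = predicted - actual
    return bias, abs(bias)


def goodman_kruskal_gamma(judgments, outcomes):
    """Gamma = (C - D) / (C + D) over all item pairs; tied pairs are skipped."""
    concordant = discordant = 0
    for (j1, o1), (j2, o2) in combinations(zip(judgments, outcomes), 2):
        product = (j1 - j2) * (o1 - o2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        return float("nan")  # undefined when every pair is tied
    return (concordant - discordant) / (concordant + discordant)


# Global (absolute) accuracy: a student predicts 85 and scores 72.
print(calibration(85, 72))  # (13, 13)

# Item-level (relative) accuracy: confidence ratings vs. correct (1) / incorrect (0).
confidence = [90, 80, 60, 40, 30]
correct = [1, 1, 0, 1, 0]
print(round(goodman_kruskal_gamma(confidence, correct), 2))  # 0.67
```

The two measures answer different questions. A student whose item-level confidence ranks items well (high gamma) can still predict a global score far from the actual score, and vice versa.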
In Study 2, we examined whether variability in academic performance across students' classroom tests and across their undergraduate careers predicted calibration on tests within a single course.</p> <hd id="AN0162469109-2">Study 1</hd> <p>To examine whether low-performing students showed greater variability in test performance than high-performing students, we analyzed exam data from several introductory and intermediate-level psychology courses, each of which included multiple exams.</p> <hd id="AN0162469109-3">Method</hd> <p></p> <hd id="AN0162469109-4">Participants</hd> <p>De-identified class exam data from 649 undergraduate students at a large southwestern university were analyzed. The data were originally collected across multiple semesters from Introduction to Psychology and Cognitive Psychology courses, all taught by the same instructor. Because students in these classes were allowed to drop their lowest exam score, many opted not to take one of the four exams. After we excluded these students' data, the final dataset consisted of 368 students with scores on all four exams. We restricted the analysis to students with all four scores (rather than just three) to obtain a more reliable measure of variability.</p> <hd id="AN0162469109-5">Materials and procedure</hd> <p>Data used in the current analyses were gathered from Introduction to Psychology and Cognitive Psychology courses offered across eight semesters (6 classes from fall semesters and 2 classes from spring semesters). In each course, students took four exams as part of the normal course requirements. The exams were multiple-choice, and each consisted of 33 questions. The final exam in all courses was cumulative.</p> <p>Students' exam scores were compiled and de-identified. 
To examine whether variability in performance was greater for low-performing than for high-performing students, we sorted students into performance groups based on their cumulative exam score (the total of all exam scores). This score was used to divide students into quartiles, such that high performers' average grades were above the 75<sups>th</sups> percentile and low performers' average grades were at or below the 25<sups>th</sups> percentile. This process yielded 93 low-performing and 93 high-performing students.</p> <hd id="AN0162469109-6">Results</hd> <p>Variability was measured as the standard deviation of each student's exam scores (see Table 1 for average exam scores by performance quartile). Consistent with our hypothesis, low performers showed more variability in their exam performance than high performers, <emph>t</emph>(138.28) = 13.20, <emph>d</emph> = 1.935, <emph>p</emph> &lt; 0.001, 95% CI [6.8, 9.19] (because Levene's test indicated that the assumption of homogeneity of variance was violated, corrected degrees of freedom[<reflink idref="bib1" id="ref32">1</reflink>] are reported). One might worry that sorting students into high and low performers guarantees lower variability among high performers because of a ceiling effect. It is therefore useful to examine variability among upper-middle performers: they would also be expected to be less variable than low performers, but their reduced variability could not be attributed to a ceiling effect. We therefore also compared mean variability for high performers (above the 75<sups>th</sups> percentile), upper middle performers (between the 50<sups>th</sups> and 75<sups>th</sups> percentiles), and low performers (below the 25<sups>th</sups> percentile); see Table 2. 
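As a check on the low- versus high-performer comparison, the Welch t-test can be recomputed from the rounded summary values in Table 2:

- **Low performers:** M = 13.87, SD = 5.20.
- **High performers:** M = 5.88, SD = 2.70.
- **Group size:** n = 93 per group.

This is a reconstruction from summary statistics, not the authors' analysis script. It assumes Welch–Satterthwaite degrees of freedom and a pooled-SD Cohen's d, which give values close to the reported t(138.28) = 13.20 and d = 1.935. The per-student variability function assumes the sample (n − 1) standard deviation, and its example scores are hypothetical.

```python
import math
import statistics


def student_variability(exam_scores):
    """One student's variability: sample SD of their exam scores (n - 1 denominator assumed)."""
    return statistics.stdev(exam_scores)


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite df from summary statistics."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard errors of each mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d using the pooled standard deviation."""
    pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled


# Hypothetical student with four exam scores (percent correct).
print(round(student_variability([70, 85, 55, 90]), 2))  # 15.81

# Table 2 summary values: low vs. high performers, n = 93 each.
t, df = welch_t(13.87, 5.20, 93, 5.88, 2.70, 93)
d = cohens_d(13.87, 5.20, 93, 5.88, 2.70, 93)
print(f"t({df:.2f}) = {t:.2f}, d = {d:.2f}")
```

Small discrepancies from the published values are expected, because Table 2 reports means and SDs rounded to two decimals.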
Results showed that the high and upper middle performers had significantly lower mean variability than low performers (all <emph>t</emph>'s &gt; 5.16, <emph>p</emph>'s &lt; 0.05), which supports our hypothesis that performance and variability are negatively related.</p> <p>Table 1 Mean exam performance (out of 100%) across four exams by performance group in Study 1</p> <p> <ephtml> &lt;table frame="hsides" rules="groups"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th align="left" /&gt;&lt;th align="left"&gt;&lt;p&gt;Exam 1&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Exam 2&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Exam 3&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Exam 4&lt;/p&gt;&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Low performers&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;72.14 (10.31)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;51.78 (6.32)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;43.23 (6.88)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;46.99 (5.88)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Lower middle performers&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;77.78 (13.22)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;61.08 (6.68)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;54.73 (7.61)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;57.37 (7.38)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Upper middle performers&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;81.76 (12.98)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;75.50 (12.65)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;71.96 (13.56)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;75.11 (13.16)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;High performers&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;90.58 (6.44)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;92.83 
(5.36)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;88.71 (6.97)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;93.09 (5.52)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p>Standard deviations are in parentheses</p> <p>Table 2 Mean variability in test scores by performance group in Study 1</p> <p> <ephtml> &lt;table frame="hsides" rules="groups"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th align="left" /&gt;&lt;th align="left"&gt;&lt;p&gt;Variability&lt;/p&gt;&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Low performers&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;13.87 (5.20)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Lower middle performers&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;13.72 (4.81)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Upper middle performers&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;9.93 (5.16)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;High performers&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;5.88 (2.70)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p>Standard deviations are in parentheses</p> <p>Although our goal was to use the courses offering the best opportunity to observe performance variability (those with four exams), the results were consistent in the larger sample, which also included courses with only three exams: low performers had more variability in their exam performance than high performers (average variability of low performers = 19.06; average variability of high performers = 10.91; <emph>t</emph>(322) = 77.991, <emph>d</emph> = 8.413, <emph>p</emph> &lt; 0.001, 95% CI [7.727, 9.097]).</p> <hd id="AN0162469109-7">Study 2</hd> <p>Results from Study 1 supported our hypothesis that, relative to high-performing 
students, low-performing students exhibit greater variability in their exam grades across the semester. The goal of Study 2 was to determine whether variability in past performance is associated with prediction accuracy. We examined two types of variability: variability in past exam performance within a single course, as in Study 1, and variability in past academic performance as measured by course grades across students' entire university academic records. Students were recruited from six different classes over the course of a single semester. Just before each of their regular exams, students were asked to predict how well they would perform, and these predictions were compared with their actual exam scores. Variability of past performance was determined both from students' performance across the exams in the class and from their past course grades across semesters, as recorded in their college transcripts. We predicted that greater variability in students' academic records would be associated with poorer metacognition.</p> <hd id="AN0162469109-8">Method</hd> <p></p> <hd id="AN0162469109-9">Participants</hd> <p>Data were collected from a new set of six psychology classes at a large public southwestern university: Psychology of Women (Class 1), Introduction to Statistics (Class 2 and Class 3), Research Methods (Class 4), Psychology of Women of Color (Class 5), and Cognitive Neuroscience (Class 6). All six courses used a traditional in-person, lecture-based format (all data collection occurred before the COVID-19 pandemic), and students in all six classes were offered extra credit in their course for participating in the study. Of a total of 482 students, 440 consented to participate in the study. 
Of these 440 students, 436 (99.1%) consented to release their exam grades to the study authors for inclusion in the analyses, and 395 (89.8%) consented to provide an electronic copy of their unofficial university transcript so that variability of past performance could be determined. Of these 395 students, only 76 (19.2%) actually provided a copy of their unofficial transcript. Thus, Study 2 analyses measuring variability in performance across students' academic careers were based on data from these 76 students. In contrast, all analyses measuring variability in performance across exams within a single course were based on data from 404 students (the number of students who had both exam and prediction data). Demographic data were not collected, so that the questionnaire administered before each exam could be kept as short as possible.</p> <hd id="AN0162469109-10">Materials and procedure</hd> <p>All materials and procedures were compliant with the Family Educational Rights and Privacy Act (FERPA) and approved by the IRB. The researchers visited the six classes to recruit participants. Students were told that, should they choose to participate, they would be asked to complete a brief questionnaire immediately before each of their course exams in exchange for extra credit in their course, and that there would be no penalty if they ceased participation at any time. Students were also told that the data they provided would be kept by the research team rather than by their instructor, so their instructor would not know whether they had participated. Next, consent forms were distributed, and students had the opportunity to opt in to release their exam grades and/or transcripts to the authors for inclusion in analyses; students who opted to release their transcripts were asked to provide an email address so that they could be contacted later to share a copy of their unofficial transcript. 
For all courses, this recruitment process took place in students' classrooms before one of their regularly scheduled lectures.</p> <p>After the initial recruitment session, research assistants visited each class on exam days to remind students of the study and administer the test prediction questionnaire.[<reflink idref="bib2" id="ref34">2</reflink>] This questionnaire asked students to predict the numerical grade they would receive on the exam they were about to take. It also asked students to predict the average, highest, and lowest grades on the exam, but these data were not used. All predictions were made on a 0–100% scale.</p> <p>Classes varied in the number of exams (three to five) and in the dates on which exams took place. Of the six classes in Study 2, two had five exams (Classes 1 and 5), two had four exams (Classes 2 and 3), one had four exams of which the fourth was optional (Class 6), and one had three exams (Class 4). None of the exams were cumulative. 
See Table 3 for descriptive data.</p> <p>Table 3 Mean exam performance and grade predictions for Study 2</p> <p> <ephtml> &lt;table frame="hsides" rules="groups"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th align="left" rowspan="2"&gt;&lt;p&gt;Class&lt;/p&gt;&lt;/th&gt;&lt;th align="left" rowspan="2"&gt;&lt;p&gt;&lt;italic&gt;N&lt;/italic&gt;&lt;/p&gt;&lt;/th&gt;&lt;th align="left" colspan="2"&gt;&lt;p&gt;Exam 1&lt;/p&gt;&lt;/th&gt;&lt;th align="left" colspan="2"&gt;&lt;p&gt;Exam 2&lt;/p&gt;&lt;/th&gt;&lt;th align="left" colspan="2"&gt;&lt;p&gt;Exam 3&lt;/p&gt;&lt;/th&gt;&lt;th align="left" colspan="2"&gt;&lt;p&gt;Exam 4&lt;/p&gt;&lt;/th&gt;&lt;th align="left" colspan="2"&gt;&lt;p&gt;Exam 5&lt;/p&gt;&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th align="left"&gt;&lt;p&gt;Predicted&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Actual&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Predicted&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Actual&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Predicted&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Actual&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Predicted&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Actual&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Predicted&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Actual&lt;/p&gt;&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Psychology of Women (1)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;104&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;74.13 (9.89)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;80.80 (7.81)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;71.47 (8.17)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;79.65 (7.76)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;78.06 (8.10)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;79.65 (7.88)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;81.46 
(7.09)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;82.43 (7.83)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;77.69 (8.02)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Statistics (2)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;66&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;80.52 (7.51)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;74.83 (12.59)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;75.22 (9.98)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;72.41 (12.69)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;76.19 (78.41)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;74.39 (10.28)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Statistics (3)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;74&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;80.25 (7.55)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;75.99 (12.19)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;76.46 (11.14)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;72.41 (11.74)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;78.41 (9.62)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;76.59 (11.89)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Research Methods (4)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;65&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;84.97 (6.68)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;79.80 (7.10)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;80.54 
(16.43)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;81.66 (11.27)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;82.15 (13.29)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;86.14 (5.88)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Psychology of Women of Color (5)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;32&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;81.81 (8.98)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;83.61 (8.28)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;71.08 (12.94)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;78.22 (15.23)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;79.91 (8.00)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;80.88 (7.52)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;76.03 (10.31)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;79.16 (10.47)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;81.41 (8.89)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Cognitive Neuroscience (6)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;63&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;77.86 (12.54)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;81.27 (20.69)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;74.48 (16.77)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;74.98 (18.76)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;74.86 (14.36)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;74.16 (16.50)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;74.52 (15.61)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;77.02 (17.94)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-&lt;/p&gt;&lt;/td&gt;&lt;td 
align="left"&gt;&lt;p&gt;-&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p>Standard deviations are in parentheses</p> <p>Once the semester was over and final grades were posted, students who had consented to be contacted for a copy of their transcript were sent an email that reminded them of the study and asked them to complete a 2-minute Qualtrics survey through which they could securely upload a copy of their transcript. The university at which Study 2 took place allows students to download free electronic copies of their unofficial transcript. Before uploading, students were informed that all identifying information on their transcripts would be redacted and that each transcript would be assigned a code known only to the third author. Students were then given step-by-step instructions for downloading a copy of their transcript and uploading the file to the survey, after which the survey ended. Six reminder emails were sent to participants over three months to boost response rates. Of the 395 students who opted to provide a copy of their transcript, 365 provided usable email addresses. Ultimately, 91 surveys were started, and 76 transcripts were successfully uploaded.</p> <hd id="AN0162469109-11">Analytical plan</hd> <p></p> <hd id="AN0162469109-12">Variability in exam performance as measured within a single course</hd> <p>The analyses were based on data from 404 students (the participating students who had both exam and prediction data available); differences in degrees of freedom reflect missing data. We first tested whether low- and high-performing students differed in the variability of their prior exam grades, and then tested whether this variability moderated the relationship between grade predictions and actual grades. Low- and high-performing students were defined as those whose average grade fell in the lowest and highest tertiles of the performance distribution, respectively. 
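The grouping step just described can be sketched as follows. The sketch rests on three assumptions, and all function names and grades are hypothetical:

- **Data layout:** each student's exam grades are stored in chronological order, with the final exam last.
- **Grouping variable:** a student's "average grade" is the mean of all of their exams.
- **Variability measure:** prior-exam variability is the sample SD of the grades before the final exam.

Tertile cut points come from Python's `statistics.quantiles`. Its default "exclusive" interpolation may differ slightly from whatever software the authors used.

```python
import statistics


def prior_exam_sd(grades):
    """Sample SD of the exam grades before the final exam (final exam listed last).

    Requires at least two prior exams.
    """
    return statistics.stdev(grades[:-1])


def performance_tertile(average, cuts):
    """Return 'low', 'middle', or 'high' given the two tertile cut points."""
    low_cut, high_cut = cuts
    if average <= low_cut:
        return "low"
    if average > high_cut:
        return "high"
    return "middle"


# Hypothetical students: exam grades in chronological order, final exam last.
students = {
    "s1": [62, 48, 75, 58],
    "s2": [81, 77, 85, 80],
    "s3": [93, 95, 91, 94],
    "s4": [70, 88, 64, 72],
    "s5": [55, 71, 49, 60],
    "s6": [90, 86, 92, 89],
}
averages = {sid: statistics.mean(g) for sid, g in students.items()}
cuts = statistics.quantiles(list(averages.values()), n=3)  # two cut points
for sid, grades in students.items():
    group = performance_tertile(averages[sid], cuts)
    print(sid, group, round(prior_exam_sd(grades), 2))
```

The low- and high-tertile groups produced this way would then be compared on `prior_exam_sd`, in the same way as the Study 1 comparison.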
In Study 2, we used a tertile split rather than the quartile split used in Study 1 because fewer participants were available for the transcript analyses (only 76 students provided transcripts). As in Study 1, variability of past performance was calculated as the standard deviation of each student's exam grades before the final exam; thus, variability was based on performance on 2–3 prior exams. We first conducted <emph>t</emph>-tests to compare variability in past exam performance between low-, middle-, and high-performing students.</p> <p>Next, we used multilevel modeling[<reflink idref="bib3" id="ref35">3</reflink>] to investigate how variability in performance interacted with numerical grade predictions to predict actual exam grades. Multilevel modeling was appropriate because the data had a 2-level structure in which students were nested within classes. The outcome variable was students' actual final exam grade, on a 0–100 scale. Final exam prediction, prior exam variability, and prior exam performance were level 1 predictors, and course average exam grade (grand mean centered) was a level 2 predictor. Final exam predictions, also on a 0–100 scale, were centered around their respective class means so that zero represents the average prediction for a given final exam. Thus, positive values represented predictions higher than the class average prediction, and negative values represented predictions lower than it. Prior exam variability was converted into three groups using a tertile split (0 = lowest variability, 1 = moderate variability, 2 = highest variability). Prior exam average was centered around the class means so that zero represents the average prior exam performance within a given class. 
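</p> <p>The variable construction described above can be sketched as follows. This is a minimal, hypothetical Python illustration rather than the authors' R code, and it rests on stated assumptions: variability is taken as the sample standard deviation (n - 1 denominator) of each student's prior exam grades, tertile groups are formed from sample tertile cut points computed with statistics.quantiles (default method), and centering subtracts each class's mean. The student records, field names, and helper functions are invented for illustration.</p>

```python
import statistics
from collections import defaultdict


def prior_exam_variability(prior_grades):
    """Sample SD (n - 1 denominator) of a student's exam grades before the final.

    Requires at least two prior exams.
    """
    return statistics.stdev(prior_grades)


def tertile_codes(values):
    """Code each value 0, 1, or 2 by sample tertile (0 = lowest third, 2 = highest third)."""
    low_cut, high_cut = statistics.quantiles(values, n=3)
    return [0 if v <= low_cut else 1 if v <= high_cut else 2 for v in values]


def center_within_class(records, key):
    """Subtract each class's mean of records[key], so 0 means 'average for that class'."""
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec["class_id"]].append(rec[key])
    class_means = {cid: statistics.mean(vals) for cid, vals in by_class.items()}
    return [rec[key] - class_means[rec["class_id"]] for rec in records]


# Hypothetical students in two classes, each with 2-3 prior exams and a final-exam prediction.
students = [
    {"class_id": "A", "prior": [90, 92, 91], "prediction": 90},
    {"class_id": "A", "prior": [60, 85, 72], "prediction": 85},
    {"class_id": "A", "prior": [78, 80], "prediction": 80},
    {"class_id": "B", "prior": [70, 95, 55], "prediction": 88},
    {"class_id": "B", "prior": [82, 84, 83], "prediction": 82},
    {"class_id": "B", "prior": [65, 75], "prediction": 70},
]

variability = [prior_exam_variability(s["prior"]) for s in students]
variability_group = tertile_codes(variability)  # 0 = low, 1 = moderate, 2 = high
centered_prediction = center_within_class(students, "prediction")
print(variability_group, centered_prediction)
```

<p>The same helpers extend to the other variables: applying tertile_codes to students' average grades gives the low-, middle-, and high-performer groups, and applying center_within_class to each student's prior exam average gives the class-centered prior performance predictor. The text does not say how values falling exactly on a cut point were assigned, so the rule used here (such values go to the lower group) is an assumption.</p> <p>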
Course average exam grade was conceptualized as a measure of course difficulty (higher values indicate easier courses) and calculated as the average of all exam grades across all students for each course.[<reflink idref="bib4" id="ref36">4</reflink>] Including course difficulty in the model allowed us to test, and account for, whether differences in exam grades across courses influenced the relationship between variability and prediction accuracy, because differences in the relative difficulty of the classes could plausibly have affected students' grade prediction accuracy.</p> <p>The model-building process was iterative. We began with the empty (null) model (Model 1.0), which served as a baseline and was used to calculate the intraclass correlation coefficient (ICC). The ICC indicates the proportion of variance in final exam grades attributable to the clustering variable itself (here, class). We then added predictors in each subsequent model and compared each model's overall fit with that of the preceding model using a likelihood-ratio test. A random intercept was included in all multilevel models to allow final exam grades to vary across classes. All multilevel models were run in R version 3.6.3 (R Core Team, [<reflink idref="bib25" id="ref37">25</reflink>]) using RStudio version 1.2.5042 (RStudio Team, [<reflink idref="bib25" id="ref38">25</reflink>]) and the packages lme4 (Bates et al., [<reflink idref="bib2" id="ref39">2</reflink>]), lmerTest (Kuznetsova et al., [<reflink idref="bib20" id="ref40">20</reflink>]), merTools (Knowles &amp; Frederick, [<reflink idref="bib17" id="ref41">17</reflink>]), jtools (Long, [<reflink idref="bib21" id="ref42">21</reflink>]), reghelper (Hughes, [<reflink idref="bib15" id="ref43">15</reflink>]), and ggplot2 (Wickham, [<reflink idref="bib33" id="ref44">33</reflink>]). All models were estimated using restricted maximum likelihood, and <emph>p</emph> &lt; 0.05 was used as the threshold for statistical significance. 
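</p> <p>For a two-level random-intercept model, the ICC is the between-class intercept variance divided by the total variance, and a likelihood-ratio test compares twice the difference in log-likelihoods between nested models with a chi-square distribution whose degrees of freedom equal the number of added parameters. The sketch below illustrates both computations in plain Python. The variance components and log-likelihoods are hypothetical numbers, not values from this study, and the chi-square tail probability is computed with a standard incomplete-gamma series rather than a statistics library.</p>

```python
import math


def icc_two_level(tau00: float, sigma2: float) -> float:
    """Share of total variance due to between-class intercept differences."""
    return tau00 / (tau00 + sigma2)


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function
    P(df/2, x/2); adequate for the moderate statistics typical of nested-model tests.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    for n in range(1, 2000):
        term *= z / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(loglik_simple: float, loglik_complex: float, added_params: int):
    """Return (statistic, p value) comparing a model with its extension by added_params terms."""
    stat = 2.0 * (loglik_complex - loglik_simple)
    return stat, chi2_sf(stat, added_params)


# Hypothetical numbers for illustration only (not from this study).
icc = icc_two_level(tau00=12.0, sigma2=88.0)
stat, p = likelihood_ratio_test(loglik_simple=-1500.0, loglik_complex=-1495.0, added_params=2)
print(f"ICC = {icc:.2f}; LR chi-square(2) = {stat:.1f}, p = {p:.4f}")
```

<p>One design note: likelihood-ratio comparisons of models that differ in their fixed effects are usually based on maximum likelihood rather than REML log-likelihoods; in lme4, calling anova() on two fitted models refits REML fits with maximum likelihood by default (refit = TRUE).</p> <p>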
All <emph>p</emph> values were calculated using Satterthwaite's approximation for degrees of freedom.</p> <hd id="AN0162469109-13">Variability in performance as measured across one's college career</hd> <p>We also tested whether variability in performance across one's academic career was associated with poor prediction accuracy. We first tested whether low-, middle-, and high-performing students differed in the variability of their GPAs and then tested whether this variability moderated the relationship between grade predictions and actual grades across all exams. We again used multilevel modeling, but this time the measure of variability in performance was the variability in students' grades across their college careers (GPA variability) rather than the variability in their exam scores. The models also accounted for course difficulty and for the relationship between GPA and GPA variability (GPA variability decreases as GPA approaches ceiling). These analyses were based on data from the 76 students for whom transcript data were available.</p> <p>Because we were interested in predictions across all exams (i.e., a longitudinal design), the model structure differed from the 2-level design used in Model 1. Instead, all of the following analyses used a 3-level design in which exams were nested within students, and students were nested within classes. One noteworthy advantage of multilevel modeling with a longitudinal design is that it makes use of all available observations. As a result, the following models were based on 246 observations (exam-prediction data points) from a sample of 76 students.</p> <p>The modeling process began with the empty model (Model 2.0), and additional predictors were added in each subsequent model. 
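</p> <p>The parenthetical note above, that GPA variability decreases as GPA approaches ceiling, follows from a general bound for scores on a bounded scale (the Bhatia-Davis inequality): if every grade lies between a minimum <emph>m</emph> and a maximum <emph>M</emph> and the mean grade is μ, the population variance cannot exceed (<emph>M</emph> - μ)(μ - <emph>m</emph>). The sketch below evaluates the implied largest possible standard deviation on an assumed 0 to 4.0 grade-point scale; the scale and the example GPAs are illustrative assumptions, not values from the transcripts.</p>

```python
import statistics


def max_sd_given_mean(mean: float, lower: float = 0.0, upper: float = 4.0) -> float:
    """Largest possible population SD for scores in [lower, upper] with the given mean.

    Bhatia-Davis inequality: variance <= (upper - mean) * (mean - lower).
    The 0-4.0 default range is an assumed grade-point scale.
    """
    return max(0.0, (upper - mean) * (mean - lower)) ** 0.5


for gpa in (2.0, 3.0, 3.5, 3.9, 4.0):
    print(f"mean GPA {gpa:.1f}: SD at most {max_sd_given_mean(gpa):.3f}")

# A hypothetical near-ceiling record: its population SD respects the bound.
grades = [4.0, 4.0, 3.0, 4.0]
assert statistics.pstdev(grades) <= max_sd_given_mean(statistics.mean(grades))
```

<p>On this assumed scale, the maximum possible SD shrinks from 2.0 at a mean of 2.0 to 0 at a mean of 4.0. For a sample standard deviation with an n - 1 denominator, the bound is larger by a factor of the square root of n/(n - 1), but it still reaches zero at the ceiling, which is why overall GPA is included alongside GPA variability in the models.</p> <p>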
The grades students received on each exam served as the dependent variable. Grade predictions were group mean centered on the average prediction of all participants within each class for each exam, so that a prediction of zero represented the average prediction on that exam. GPA variability was calculated as the standard deviation of each student's GPA based on their transcript and converted into three groups using a tertile split (0 = low variability, 1 = moderate variability, 2 = high variability). Course difficulty was the average exam grade for a given course, as in Model 1, but overall GPA replaced prior exam performance to account for the fact that variability in GPA decreases as GPA approaches ceiling. Grade predictions were entered at level 1, overall GPA and GPA variability at level 2, and course difficulty at level 3.</p> <hd id="AN0162469109-14">Results</hd> <hd id="AN0162469109-15">Variability in exam performance as measured within a single course</hd> <p>A pair of Welch's <emph>t</emph>-tests indicated that low-performing students (<emph>N</emph> = 137, <emph>M</emph> = 7.97, <emph>SD</emph> = 5.30) had numerically, but not significantly, more variable past performance than middle-performing students (<emph>N</emph> = 132, <emph>M</emph> = 6.99, <emph>SD</emph> = 5.32; <emph>t</emph>(266.58) = 1.51, <emph>p</emph> = 0.133, <emph>d</emph><subs><emph>s</emph></subs> = 0.18) and high-performing students (<emph>N</emph> = 135, <emph>M</emph> = 6.88, <emph>SD</emph> = 5.66; <emph>t</emph>(268.31) = 1.63, <emph>p</emph> = 0.103, <emph>d</emph><subs><emph>s</emph></subs> = 0.20; see Tables 3 and 4 for descriptive statistics). Full results of the multilevel models can be found in Table 5. 
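</p> <p>The Welch statistics reported above can be recomputed from the group summaries alone. The sketch below is a plain-Python check, not the authors' code: it uses the Welch <emph>t</emph> statistic and the Welch-Satterthwaite degrees of freedom, and it assumes <emph>d</emph><subs><emph>s</emph></subs> is the mean difference divided by the pooled standard deviation. Because the inputs are rounded to two decimals, the results agree with the reported values only to within rounding.</p>

```python
import math


def welch_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Welch's t, Welch-Satterthwaite df, and Cohen's d_s from group summary statistics."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors of each group mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    return t, df, (m1 - m2) / pooled_sd


# Prior exam variability in Study 2: low vs. middle and low vs. high performers.
low_vs_middle = welch_from_summary(7.97, 5.30, 137, 6.99, 5.32, 132)
low_vs_high = welch_from_summary(7.97, 5.30, 137, 6.88, 5.66, 135)
for label, (t, df, d) in (("low vs. middle", low_vs_middle), ("low vs. high", low_vs_high)):
    print(f"{label}: t({df:.2f}) = {t:.2f}, d_s = {d:.2f}")
```

<p>Small differences from the reported statistics (for example, in the second decimal of <emph>t</emph>) are expected, since the means and SDs used as inputs are themselves rounded.</p> <p>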
In the final model, Model 1.2, there were significant main effects of prior exam performance (<emph>b</emph> = 0.62, <emph>SE</emph> = 0.04, <emph>t</emph>(364.82) = 15.12, <emph>p</emph> &lt; 0.001), course difficulty (<emph>b</emph> = 1.37, <emph>SE</emph> = 0.32, <emph>t</emph>(5.85) = 4.32, <emph>p</emph> = 0.005), and grade predictions on the final exam (<emph>b</emph> = 0.19, <emph>SE</emph> = 0.08, <emph>t</emph>(365.19) = 2.54, <emph>p</emph> = 0.011). The hypothesized interaction between predictions and variability was absent: the strength of the association between predictions and grades did not differ between the low variability group and either the moderate (<emph>b</emph> = 0.04, <emph>SE</emph> = 0.12, <emph>t</emph>(366.39) = 0.34, <emph>p</emph> = 0.732) or the high (<emph>b</emph> = -0.03, <emph>SE</emph> = 0.10, <emph>t</emph>(365.86) = -0.31, <emph>p</emph> = 0.759) variability group. Interestingly, this interaction was notably closer to the <emph>p</emph> &lt; 0.05 cutoff before prior exam performance was added to the model (Model 1.1b; both <emph>p</emph>s = 0.053). 
This finding suggests that, although it might appear that prediction resolution decreased as grade variability increased, this potential relationship is confounded by students' prior grades because variability decreases as grades approach ceiling.</p> <p>Table 4 Mean variability in test scores and academic records by performance group in Study 2</p> <p> <ephtml> &lt;table frame="hsides" rules="groups"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th align="left" /&gt;&lt;th align="left"&gt;&lt;p&gt;&lt;italic&gt;N&lt;/italic&gt;&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Prior Exam Variability&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Academic Record (GPA) Variability&lt;/p&gt;&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;High Performers&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;135&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;6.88 (5.66)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;0.36 (0.28)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Middle Performers&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;132&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;6.99 (5.32)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;0.70 (0.28)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Low Performers&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;137&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;7.97 (5.30)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;0.80 (0.36)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p>Standard deviations are in parentheses</p> <p>Table 5 Full results from Model 1 for Study 2</p> <p> <ephtml> &lt;table frame="hsides" rules="groups"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th align="left" colspan="6"&gt;&lt;p&gt;Model 1 Parameter Estimates&lt;/p&gt;&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th align="left"&gt;&lt;p&gt;Parameters&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Model 
1.0&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Model 1.1a*&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Model 1.1b&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Model 1.1c&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Model 1.2&lt;/p&gt;&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Intercept&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;78.24 (1.68)*&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;78.24 (1.68)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;78.56 (1.84)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;78.17 (1.85)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;78.39 (1.07)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Final Exam Prediction&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;0.29 (0.05)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;0.48 (0.10)*&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;0.20 (0.08)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;0.15 (0.08) ~ &lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Prior Exam Variability (low vs. moderate)&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;-0.80 (1.35)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;0.03 (1.06)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-0.10 (1.07)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Prior Exam Variability (low vs. high)&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;-0.68 (1.35)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;1.00 (1.09)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;0.71 (1.11)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Final Exam Prediction x Prior Exam Variability (low vs. 
moderate)&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;-0.26 (0.13) ~ &lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;0.02 (0.11)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;0.04 (0.12)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Final Exam Prediction x Prior Exam Variability (low vs. high)&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;-0.25 (0.13) ~ &lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-0.08 (0.10)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-0.03 (0.10)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Prior Exam Average&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;0.63 (0.04)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;0.64 (0.04)**&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Prior Exam Average x Final Exam Prediction&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;-0.01 (0.01)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-0.02 (0.01)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Prior Exam Average x Final Exam Prediction x Prior Exam Variability (low vs. moderate)&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;0.01 (0.01)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;0.01 (0.01) ~ &lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Prior Exam Average x Final Exam Prediction x Prior Exam Variability (low vs. 
high)&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;0.01 (0.01)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;0.01 (0.01)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Course Average&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;1.37 (0.31)*&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Course Average x Final Exam Prediction&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;-0.04 (0.03)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Course Average x Final Exam Prediction x Prior Exam Variability (low vs. moderate)&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;0.02 (0.05)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Course Average x Final Exam Prediction x Prior Exam Variability (low vs. high)&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;0.04 (0.04)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p>Standard errors are in parentheses. 
Asterisks indicate the significance level of effects and improvement in overall model fit compared to the preceding model: ~ <emph>p</emph> &lt; .10, *<emph>p</emph> &lt; .05, **<emph>p</emph> &lt; .01, ***<emph>p</emph> &lt; .001</p> <hd id="AN0162469109-16">Variability in performance as measured across one's college career</hd> <p>A pair of Welch's <emph>t</emph>-tests indicated that low performers (<emph>M</emph> = 0.80, <emph>SD</emph> = 0.36) had more variable past academic records than high performers (<emph>M</emph> = 0.36, <emph>SD</emph> = 0.28; <emph>t</emph>(31.57) = 4.85, <emph>p</emph> &lt; 0.001, <emph>d</emph><subs><emph>s</emph></subs> = 1.41) but did not differ significantly from middle performers (<emph>M</emph> = 0.70, <emph>SD</emph> = 0.28; <emph>t</emph>(32.68) = 1.00, <emph>p</emph> = 0.325, <emph>d</emph><subs><emph>s</emph></subs> = 0.30). The multilevel modeling process began with the empty model (Model 2.0; ICC ρ<subs><emph>students</emph></subs> = 0.37, ρ<subs><emph>classes</emph></subs> = 0.12), and full model results are reported in Table 6. In the final model (Model 2.3), there were significant main effects of grade prediction (<emph>b</emph> = 0.75, <emph>SE</emph> = 0.34, <emph>t</emph>(157.39) = 2.21, <emph>p</emph> = 0.028), overall GPA (<emph>b</emph> = 8.07, <emph>SE</emph> = 2.55, <emph>t</emph>(71.61) = 3.16, <emph>p</emph> = 0.002), and course difficulty (<emph>b</emph> = 1.10, <emph>SE</emph> = 0.35, <emph>t</emph>(7.92) = 3.17, <emph>p</emph> = 0.013). Additionally, and most importantly, there was a significant interaction between grade predictions and GPA variability: the association between predictions and performance was stronger for the lowest variability group than for the highest variability group (<emph>b</emph> = -0.80, <emph>SE</emph> = 0.39, <emph>t</emph>(182.92) = -2.08, <emph>p</emph> = 0.039; see Fig. 
1).</p> <p>Table 6 Full results from Model 2 for Study 2</p> <p> <ephtml> &lt;table frame="hsides" rules="groups"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th align="left" colspan="6"&gt;&lt;p&gt;Model 2 Parameter Estimates&lt;/p&gt;&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th align="left"&gt;&lt;p&gt;Parameters&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Model 2.0&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Model 2.1&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Model 2.2a&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Model 2.2b&lt;/p&gt;&lt;/th&gt;&lt;th align="left"&gt;&lt;p&gt;Model 2.3 ~ &lt;/p&gt;&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Intercept&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;79.25 (2.19)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;78.86 (2.05)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;82.01 (2.02)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;79.08 (2.20)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;79.02 (1.87)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Grade Prediction&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;0.28 (0.07)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;0.64 (0.17)*&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;0.71 (0.35)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;0.75 (0.34)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;GPA Variability (low vs. moderate)&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;-3.18 (2.04)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-1.21 (2.08)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-0.42 (2.13)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;GPA Variability (low vs. 
high)&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;-8.35 (2.01)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-0.66 (3.05)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-0.95 (3.01)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Grade Prediction x GPA Variability (low vs. moderate)&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;-0.10 (0.23)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-0.19 (0.38)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-0.30 (0.37)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Grade Prediction x GPA Variability (low vs. high)&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;-0.49 (0.19)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-0.75 (0.39) ~ &lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-0.80 (0.39)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Overall GPA&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;9.50 (2.63) &lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;8.07 (2.55)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Overall GPA x Grade Prediction&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;-0.41 (0.70)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-0.33 (0.69)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Overall GPA x Grade Prediction x GPA Variability (low vs. 
moderate)&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;0.80 (0.86)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;0.73 (0.86)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Overall GPA x Grade Prediction x GPA Variability (low vs. high)&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;0.05 (0.79)&lt;/p&gt;&lt;/td&gt;&lt;td align="left"&gt;&lt;p&gt;-0.07 (0.78)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Course Average&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;1.10 (0.35)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Course Average x Grade Prediction&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;-0.07 (0.04)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Course Average x Grade Prediction x GPA Variability (low vs. moderate)&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;0.03 (0.08)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;&lt;p&gt;Course Average x Grade Prediction x GPA Variability (low vs. high)&lt;/p&gt;&lt;/td&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left" /&gt;&lt;td align="left"&gt;&lt;p&gt;0.10 (0.07)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p>Standard errors are in parentheses. 
Asterisks indicate the significance level of effects and improvement in overall model fit compared to the preceding model: ~ <emph>p</emph> &lt; .10, *<emph>p</emph> &lt; .05, **<emph>p</emph> &lt; .01, ***<emph>p</emph> &lt; .001</p> <p>Fig. 1 Plot of the interaction between GPA variability and grade predictions from Model 2.3</p> <hd id="AN0162469109-17">Discussion</hd> <p>In summary, we used multilevel modeling to examine the relationship between variability in prior performance and grade prediction accuracy in two ways: across tests within a single course and across students' prior coursework. Within a single course, we found that (<reflink idref="bib1" id="ref45">1</reflink>) higher performance on earlier exams was associated with higher performance on the final exam, (<reflink idref="bib2" id="ref46">2</reflink>) higher course averages (in other words, easier courses) were associated with higher final exam grades, and (<reflink idref="bib3" id="ref47">3</reflink>) higher grade predictions on the final exam were associated with higher grades on the final exam (indicating that students' predictions had some degree of accuracy, on average). Most importantly, however, the hypothesized interaction between predictions and variability (within the span of a single course) was absent. That said, this interaction was notably closer to the <emph>p</emph> &lt; 0.05 cutoff before average prior exam performance (not variability) was added to the model (Model 1.1b; both <emph>p</emph>s = 0.053). This pattern suggests that any apparent decrease in prediction resolution with increasing grade variability may be confounded by students' prior grades and the associated ceiling effect.</p> <p>When variability in performance was measured across courses, greater variability in the academic record was associated with reduced grade prediction accuracy. 
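</p> <p>The size of this moderation can be made concrete by combining the Model 2.3 coefficients in Table 6 into a simple slope of exam grade on grade prediction for each GPA variability group. The sketch below assumes dummy coding with the low variability group as the reference and evaluates the slopes where centered overall GPA and centered course average equal zero. The text describes centering for grade predictions and (in Model 1) for course average but does not say how overall GPA was centered, so this reference point is an assumption. Standard errors are ignored, so these are point estimates only.</p>

```python
# Model 2.3 fixed effects from Table 6 (point estimates only; standard errors ignored).
# Dummy coding with the low GPA variability group as the reference is assumed.
B_PRED = 0.75            # Grade Prediction
B_PRED_X_GPA = -0.33     # Overall GPA x Grade Prediction
B_PRED_X_COURSE = -0.07  # Course Average x Grade Prediction
B_PRED_X_VAR = {"low": 0.0, "moderate": -0.30, "high": -0.80}
B_PRED_X_GPA_X_VAR = {"low": 0.0, "moderate": 0.73, "high": -0.07}
B_PRED_X_COURSE_X_VAR = {"low": 0.0, "moderate": 0.03, "high": 0.10}


def prediction_slope(group: str, gpa_c: float = 0.0, course_c: float = 0.0) -> float:
    """Expected change in exam grade per 1-point increase in the centered grade prediction.

    gpa_c and course_c are centered overall GPA and centered course average;
    0.0 is an assumed reference point (an average student in an average course).
    """
    return (
        B_PRED
        + B_PRED_X_VAR[group]
        + (B_PRED_X_GPA + B_PRED_X_GPA_X_VAR[group]) * gpa_c
        + (B_PRED_X_COURSE + B_PRED_X_COURSE_X_VAR[group]) * course_c
    )


for group in ("low", "moderate", "high"):
    print(f"{group} GPA variability: slope = {prediction_slope(group):+.2f}")
```

<p>At this reference point, the estimated slope is 0.75 for the low variability group, 0.45 for the moderate group, and about -0.05 for the high variability group, which is consistent with the near-zero relationship between predictions and grades for high-variability students shown in Fig. 1.</p> <p>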
The main effects indicated that (<reflink idref="bib1" id="ref48">1</reflink>) higher grade predictions were associated with higher actual exam grades, (<reflink idref="bib2" id="ref49">2</reflink>) higher GPAs were associated with higher exam grades, and (<reflink idref="bib3" id="ref50">3</reflink>) higher course averages (easier courses) were associated with higher exam grades. Relevant to our hypothesis, there was a significant interaction between grade predictions and GPA variability: the association between predictions and performance was stronger for the lowest variability group than for the highest variability group (see Fig. 1). This finding is consistent with our hypothesis that variability in past performance, which is primarily a problem for low-performing students, leads to worse metacognitive accuracy, at least when variability is defined across one's academic career. Importantly, in the final model, the interaction between GPA variability and grade predictions remained significant <emph>even after accounting for students' overall GPAs, despite the clear relationship between GPA and GPA variability</emph> (as GPA approaches ceiling, GPA variability must approach zero). This finding suggests that GPA variability has incremental validity in predicting the relationship between students' grade predictions and actual grades beyond a student's average level of ability; it also highlights the need to disentangle variability in performance from overall performance.</p> <hd id="AN0162469109-18">General discussion</hd> <p>Two studies were designed to test the hypotheses that low-performing students have greater variability in their past performance than high-performing students and that increased variability is associated with poor calibration. 
These hypotheses were tested by examining the variability in students' past performance across exams within a single course (Studies 1 and 2) and across courses within an academic career (Study 2). Results showed that across different exams (Study 1) and different courses (Study 2), low-performing students had more variable past performance than high-performing students. In Study 2, increased variability in performance across one's academic career (but not across exams within a course) was associated with increased errors in future exam and class predictions. Taken together, the results show that low performers have more variable past performance than high performers and that variability across one's academic career is associated with increased prediction errors.</p> <p>The lack of an association between variability across tests and calibration errors might have arisen because variability was somewhat limited in Study 2. For example, some courses had only three exams, which limited our estimates of variability to just the two exams preceding the one for which students made predictions. With more tests, it might be possible to observe greater variability in exam performance, and this variability might be associated with worse calibration. Consistent with this possibility, the average within-course variability in test performance among low performers was higher in Study 1, which used four tests, than in Study 2, which used between two and five tests. In Study 1, the average variability was 13.87 for low performers and 5.88 for high performers, whereas in Study 2, it was 7.97 for low performers and 6.88 for high performers.</p> <p>Of course, it is possible that variability within a single course is simply not associated with students' prediction accuracy on a later test. 
Other studies (Foster et al., [<reflink idref="bib10" id="ref51">10</reflink>]) have examined students' use of past performance in making predictions and found no relationship between students' prior exam performance and their predictions. However, Foster et al. ([<reflink idref="bib10" id="ref52">10</reflink>]) examined the role of performance on a single prior test, not the variability across all prior tests, in subsequent calibration. Still, the fact that students in their study did not rely on their performance on the most recent prior test when making predictions suggests that students may pay little attention to their prior test performance within a single class when predicting future test performance.</p> <p>Results from Study 2 showed that increased variability across one's academic career is associated with poor calibration. Low-performing students had far more varied academic records than high-performing students, and high variability in one's academic record was clearly associated with worse calibration: for students in the highest variability group, predictions showed a near-zero relationship with actual performance (see Fig. 1). Given that low-performing students had more varied prior performance than high-performing students, and that variability in prior course performance was related to calibration, variability in past performance might provide one explanation for why low-performing students exhibit poor calibration.</p> <p>We offer one explanation for why low-performing students may have poor calibration in general. However, the typical pattern is not simply one of poor calibration but also one of overpredicting performance. Thus, one may ask why low-performing students err in the direction of overconfidence, rather than sometimes underpredicting and sometimes overpredicting their performance. The design of Study 2 does not allow us to determine the reason for the typical direction of the calibration errors. 
However, we speculate that, with less reliable information available and a tendency to attribute poor performance to external rather than internal causes (Hacker et al., [<reflink idref="bib12" id="ref53">12</reflink>]), low-performing students may turn to other sources of information when making their predictions. For example, research suggests that low performers report relying on motivational factors, such as the grades they wish to earn, more than high performers do, and that this reliance on motivational factors predicts poor calibration in these students (see Serra &amp; DeMarre, [<reflink idref="bib28" id="ref54">28</reflink>]; Saenz et al., [<reflink idref="bib26" id="ref55">26</reflink>]). Because low-performing students' desired grades typically exceed their actual performance (Saenz et al., [<reflink idref="bib26" id="ref56">26</reflink>]), such reliance would be expected to produce overconfidence, especially when combined with a lack of reliable information. Thus, consistent with the information-motivation theory (Tirso &amp; Geraci, [<reflink idref="bib31" id="ref57">31</reflink>]), we suggest that, in the face of impoverished information about prior performance, low performers may rely on other factors that lead them to err in the direction of overconfidence.</p> <p>It is also possible that, with increased variability in past performance, low performers select whichever past performance they believe, or hope, is most representative of their abilities. That is, they may choose to remember their past successes when predicting the future, focusing on past "wins" rather than past "losses", so to speak. Because their past performance does not yield a clear pattern, this variability could allow people to focus more on their potential than on their past. 
Research shows that people tend to think of themselves in terms of who they want to be, rather than who they have been in the past (e.g., Williams &amp; Gilovich, [<reflink idref="bib34" id="ref58">34</reflink>]), and their predictions about the future tend to be based on their aspirations (Helzer &amp; Dunning, [<reflink idref="bib13" id="ref59">13</reflink>]). It may be, then, that highly variable past performance allows people to believe in future possibilities. Regardless of the mechanism underlying overconfidence, the current results show that low performers have more variable past performance than high performers and that greater variability in one's past academic record is associated with larger calibration errors.</p> <p>As with any study, the present studies have limitations. For example, our ability to examine the relationship between variability of past exam performance and predictions was limited by the number of exams in each course in Study 2. Future studies should examine the role of variable past performance in courses with more tests, which would yield better estimates of variability and a stronger test of the potential relationship between past performance variability and prediction accuracy. We examined only global performance predictions ("What is your predicted grade?"), but future studies might examine whether past performance variability influences local performance judgments ("What is the likelihood that this answer is correct?"). On the one hand, local judgments likely carry additional inherent variability, as even the top-performing students will miss some items and experience variable item performance. Therefore, one might predict a smaller variability difference between high and low performers for local judgments than for global judgments. 
Alternatively, one might predict that low performers would miss some items and get others correct, whereas high performers would get most of these items correct, such that differences in variability between these groups of students would persist. One might also assess the relationship between performance variability and postdictions (performance estimates made after the test is completed). One hypothesis is that postdictions might be relatively unaffected by past performance variability because people would have recent diagnostic test experience to inform their judgments. Also, the current studies examined the potential role of past performance variability in prediction accuracy in a sample of undergraduate students, so it is unknown how the results would generalize to other student or community samples. Further, we did not measure other factors that may influence prediction accuracy (e.g., course type, perceived course difficulty, type of prior courses, college major, and whether tests were cumulative). For example, it would be interesting to examine whether performance across courses varies less within a student's major than outside it, and whether this influences the accuracy of performance predictions. Future studies might also examine which courses students consider when assessing their future course performance, as well as the contribution of these and other factors to a potential association between performance variability and performance predictions. Finally, the sample size in Study 2 (<emph>N</emph> = 76) was relatively small because of the limited number of students who provided copies of their unofficial transcripts. Future research is needed to replicate these findings with a larger sample.</p> <p>There are several practical implications of these results. 
For example, the results have implications for the type of feedback that might help low performers, in particular, improve their calibration. One suggestion would be to provide students with the full range of their past performance, drawing their attention to both the low and the high grades. The results from Study 2 showed that variability in past academic performance was associated with calibration, which may mean that students selectively recall, or choose, which past grades to consider when predicting their future performance. Therefore, it may be important to provide students with the full range of their past performance to reduce the potential for selective recall. Even if students do consider the full range of their past performance, they may conclude that they will perform well on an upcoming exam because they have performed well before. After all, most students have performed well in at least one course. It may therefore be important to draw students' attention not only to the full range of their past performance but also to how frequently they earned each grade, or to their average performance. It may also be helpful to warn students about the potential pitfalls of variable past performance.</p> <p>In addition to having implications for the type of feedback that might help students improve their calibration, the current results offer another factor for instructors and institutional researchers to consider when determining which students might benefit from extra support, and what type of support they might need. The current results suggest that variability of past performance, and not just overall GPA, may signal a student in need of certain forms of academic support. A student with a 2.5 GPA and consistent performance may be quite different from a student with the same GPA but variable performance. 
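To make the equal-GPA contrast concrete, the Python sketch below summarizes two invented transcripts (hypothetical grades, not data from this study) that share a 2.5 mean on an assumed 0-4 grade-point scale. It reports the lowest, highest, and mean grade together with the sample standard deviation, used here only as one possible index of variability. The names `GradeSummary` and `summarize_grades` are illustrative, and course credit hours are ignored for simplicity. A summary of this kind is also one way the feedback suggested above (showing the full range and the average of past performance) might be presented.

```python
import statistics
from dataclasses import dataclass


@dataclass
class GradeSummary:
    """Hypothetical feedback summary of a student's past course grades."""

    lowest: float
    highest: float
    mean: float
    sd: float  # sample standard deviation, used here as one index of variability


def summarize_grades(grades: list[float]) -> GradeSummary:
    """Summarize past grades so the full range, not only the best grades, is visible.

    Assumes a 0-4 grade-point scale and weights every course equally
    (credit hours are ignored). statistics.stdev raises StatisticsError
    if fewer than two grades are supplied.
    """
    return GradeSummary(
        lowest=min(grades),
        highest=max(grades),
        mean=statistics.mean(grades),
        sd=statistics.stdev(grades),
    )


# Invented transcripts: both average 2.5, but they differ in variability.
consistent = [2.3, 2.7, 2.3, 2.7]
variable = [4.0, 1.0, 3.7, 1.3]

for label, grades in (("consistent", consistent), ("variable", variable)):
    s = summarize_grades(grades)
    print(f"{label}: range {s.lowest}-{s.highest}, mean {s.mean:.2f}, SD {s.sd:.2f}")
```

Because both invented records average 2.5, the mean alone cannot tell them apart, whereas the range and standard deviation can.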
Both students would likely benefit from extra support in the form of tutoring, instruction in effective study strategies, and help with time management, but the student with consistently poor performance may face different challenges than the student with variable performance, so the interventions for the two students might differ. Of course, future research is needed to test these hypotheses directly and to explore the efficacy of potential interventions, but for now the current results suggest that variability in past performance is a potentially important target for future interventions.</p> <p>In sum, we found that low-performing students had more variable past performance within a single course (Study 1) and across their academic careers (Study 2) than high-performing students. We also found that variability in one's academic career (but not in a single course) was related to calibration on an upcoming exam. These results suggest that variability in past performance may be an important factor that contributes to students' metacognitive knowledge, particularly among low-performing students.</p> <hd id="AN0162469109-19">Author contributions</hd> <p>The first author conceived of the studies, contributed to the design of the studies, facilitated data collection for Study 1, and contributed to the writing of the manuscript. The second author contributed to study design and data analysis for Study 1, and to the writing of the manuscript. The third author contributed to the study design, data collection, and data analysis for Study 2, and to the writing of the manuscript. The fourth author contributed to the data scoring and data analysis of Study 2, and to manuscript preparation. 
The last author contributed to the data analysis, data interpretation, and writing of the results for both studies.</p> <hd id="AN0162469109-20">Declarations</hd> <p></p> <hd id="AN0162469109-21">Conflict of interest</hd> <p>The authors report no relevant financial or non-financial interests to disclose. The authors declare that they have no conflict of interest. This study was approved by the Texas A&amp;M University Institutional Review Board (#2019-0783D).</p> <hd id="AN0162469109-22">Informed consent</hd> <p>Informed consent was obtained from all participants in the studies.</p> <hd id="AN0162469109-23">Publisher's note</hd> <p>Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.</p> <ref id="AN0162469109-24"> <title> References </title> <blist> <bibl id="bib1" idref="ref3" type="bt">1</bibl> <bibtext> Al-Harthy IS, Was CA, Hassan AS. Poor performers are poor predictors of performance and they know it: Can they improve their prediction accuracy? Journal of Global Research in Education and Social Science. 2015; 4; 2: 93-100</bibtext> </blist> <blist> <bibl id="bib2" idref="ref34" type="bt">2</bibl> <bibtext> Bates, D, Maechler, M, Bolker, B, &amp; Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01.</bibtext> </blist> <blist> <bibl id="bib3" idref="ref4" type="bt">3</bibl> <bibtext> de Bruin AB, Kok EM, Lobbestael J, de Grip A. The impact of an online tool for monitoring and regulating learning at university: Overconfidence, learning strategy, and personality. Metacognition and Learning. 2017; 12; 1: 21-43. 10.1007/s11409-016-9159-5</bibtext> </blist> <blist> <bibl id="bib4" idref="ref28" type="bt">4</bibl> <bibtext> Crocker, L, &amp; Algina, J. (1986). Introduction to classical and modern test theory. 
Holt, Rinehart and Winston.</bibtext> </blist> <blist> <bibl id="bib5" idref="ref17" type="bt">5</bibl> <bibtext> Dunlosky, J, &amp; Ariel, R. (2011). Self-regulated learning and the allocation of study time. Psychology of Learning and Motivation (Vol. 54, pp. 103–140). Academic Press.</bibtext> </blist> <blist> <bibl id="bib6" idref="ref16" type="bt">6</bibl> <bibtext> Dunlosky J, Hertzog C, Kennedy MR, Thiede KW. The self-monitoring approach for effective learning. Cognitive Technology. 2005; 10; 1: 4-11</bibtext> </blist> <blist> <bibl id="bib7" idref="ref18" type="bt">7</bibl> <bibtext> Dunlosky J, Rawson KA. Overconfidence produces underachievement: Inaccurate self evaluations undermine students' learning and retention. Learning and Instruction. 2012; 22; 4: 271-280. 10.1016/j.learninstruc.2011.08.003</bibtext> </blist> <blist> <bibl id="bib8" idref="ref31" type="bt">8</bibl> <bibtext> Dunlosky J, Thiede KW. Four cornerstones of calibration research: Why understanding students' judgments can improve their achievement. Learning and Instruction. 2013; 24: 58-61. 10.1016/j.learninstruc.2012.05.002</bibtext> </blist> <blist> <bibl id="bib9" idref="ref5" type="bt">9</bibl> <bibtext> Dunning D, Johnson K, Ehrlinger J, Kruger J. Why people fail to recognize their own incompetence. Current Directions in Psychological Science. 2003; 12; 3: 83-87. 10.1111/1467-8721.01235</bibtext> </blist> <blist> <bibtext> Foster NL, Was CA, Dunlosky J, Isaacson RM. Even after thirteen class exams, students are still overconfident: The role of memory for past exam performance in student predictions. Metacognition and Learning. 2017; 12; 1: 1-19. 10.1007/s11409-016-9158-6</bibtext> </blist> <blist> <bibtext> Hacker DJ, Bol L, Bahbahani K. Explaining calibration accuracy in classroom contexts: The effects of incentives, reflection, and explanatory style. Metacognition and Learning. 2008; 3; 2: 101-121. 
10.1007/s11409-008-9021-5</bibtext> </blist> <blist> <bibtext> Hacker DJ, Bol L, Horgan DD, Rakow EA. Test prediction and performance in a classroom context. Journal of Educational Psychology. 2000; 92; 1: 160-170. 10.1037/0022-0663.92.1.160</bibtext> </blist> <blist> <bibtext> Helzer EG, Dunning D. Why and when peer prediction is superior to self-prediction: The weight given to future aspirations versus past achievement. Journal of Personality and Social Psychology. 2012; 103; 1: 38-53. 10.1037/a0028124</bibtext> </blist> <blist> <bibtext> Händel M, Fritzsche ES. Unskilled but subjectively aware: Metacognitive monitoring ability and respective awareness in low-performing students. Memory &amp; Cognition. 2016; 44; 2: 229-241. 10.3758/s13421-015-0552-0</bibtext> </blist> <blist> <bibtext> Hughes, J. (2020). Reghelper: Helper functions for regression analysis. R package version 0.3.6. https://CRAN.R-project.org/package=reghelper.</bibtext> </blist> <blist> <bibtext> Karpicke JD, Roediger HL III. The critical importance of retrieval for learning. Science. 2008; 319; 5865: 966-968. 10.1126/science.1152408</bibtext> </blist> <blist> <bibtext> Knowles, J. E, &amp; Frederick, C. (2019). merTools: Tools for analyzing mixed effect regression models. R package version 0.5.0. https://CRAN.R-project.org/package=merTools.</bibtext> </blist> <blist> <bibtext> Koriat A. Monitoring one's own knowledge during study: A cue-utilization approach to judgments of learning. Journal of Experimental Psychology: General. 1997; 126; 4: 349-370. 10.1037/0096-3445.126.4.349</bibtext> </blist> <blist> <bibtext> Kruger J, Dunning D. Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology. 1999; 77; 6: 1121-1134. 10.1037/0022-3514.77.6.1121</bibtext> </blist> <blist> <bibtext> Kuznetsova, A, Brockhoff, P. B, &amp; Christensen, R. H. B. (2017). 
lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82, 1–26. https://doi.org/10.18637/jss.v082.i13.</bibtext> </blist> <blist> <bibtext> Long, J. A. (2020). Jtools: Analysis and presentation of social scientific data. R package version 2.1.0. https://CRAN.R-project.org/package=jtools.</bibtext> </blist> <blist> <bibtext> Miller TM, Geraci L. Training metacognition in the classroom: The influence of incentives and feedback on exam predictions. Metacognition and Learning. 2011; 6; 3: 303-314. 10.1007/s11409-011-9083-7</bibtext> </blist> <blist> <bibtext> Moser BK, Stevens GR, Watts CL. The two-sample t test versus Satterthwaite's approximate F test. Communications in Statistics: Theory and Methods. 1989; 18; 11: 3963-3975. 10.1080/03610928908830135</bibtext> </blist> <blist> <bibtext> Nguyen, D. T, Kim, E, Rodriguez de Gil, P, Kellermann, A, Chen, Y.-H, Kromrey, J. D, &amp; Bellara, A. (2016). Parametric tests for two population means under normal and non-normal distributions. Journal of Modern Applied Statistical Methods, 15(1), 9. https://doi.org/10.22237/jmasm/1462075680.</bibtext> </blist> <blist> <bibtext> RStudio Team (2020). RStudio: Integrated development for R. RStudio, Inc. <ulink href="http://www.rstudio.com/">http://www.rstudio.com/</ulink>.</bibtext> </blist> <blist> <bibtext> Saenz GD, Geraci L, Miller TM, Tirso R. Metacognition in the classroom: The association between students' exam predictions and their desired grades. Consciousness and Cognition. 2017; 51: 125-139. 10.1016/j.concog.2017.03.002</bibtext> </blist> <blist> <bibtext> Saenz GD, Geraci L, Tirso R. Improving metacognition: A comparison of interventions. Applied Cognitive Psychology. 2019; 33; 5: 918-929. 10.1002/acp.3556</bibtext> </blist> <blist> <bibtext> Serra MJ, DeMarree KG. Unskilled and unaware in the classroom: College students' desired grades predict their biased grade predictions. Memory &amp; Cognition. 2016; 44; 7: 1127-1137. 
10.3758/s13421-016-0624-9</bibtext> </blist> <blist> <bibtext> Stark, E, &amp; Sachau, D. (2016). Lake Wobegon's guns: Overestimating our gun-related competences. Journal of Social and Political Psychology, 4(1). https://doi.org/10.5964/jspp.v4i1.464.</bibtext> </blist> <blist> <bibtext> Thiede KW. The importance of monitoring and self-regulation during multitrial learning. Psychonomic Bulletin &amp; Review. 1999; 6; 4: 662-667. 10.3758/BF03212976</bibtext> </blist> <blist> <bibtext> Tirso R, Geraci L. Taking another perspective on overconfidence in cognitive ability: A comparison of self and other metacognitive judgments. Journal of Memory and Language. 2020; 114: 1-14. 10.1016/j.jml.2020.104132</bibtext> </blist> <blist> <bibtext> Tirso R, Geraci L, Saenz GD. Examining underconfidence among high-performing students: A test of the false consensus hypothesis. Journal of Applied Research in Memory and Cognition. 2019; 8; 2: 154-165. 10.1016/j.jarmac.2019.04.003</bibtext> </blist> <blist> <bibtext> Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag.</bibtext> </blist> <blist> <bibtext> Williams EF, Gilovich T. Do people really believe they are above average? Journal of Experimental Social Psychology. 2008; 44: 1121-1128. 10.1016/j.jesp.2008.01.002</bibtext> </blist> <blist> <bibtext> Worthen BR, White KR, Fan X, Sudweeks RR. Measurement and assessment in schools (2nd ed.). 1999; Allyn &amp; Bacon/Longman</bibtext> </blist> </ref> <ref id="AN0162469109-25"> <title> Footnotes </title> <blist> <bibtext> A Satterthwaite approximation for the degrees of freedom was used because it has been shown to have adequate Type I error control and statistical power given mild violations of normality and unequal variances (Moser et al., [23]; Nguyen et al., [24]).</bibtext> </blist> <blist> <bibtext> Because of a scheduling error, the questionnaire was administered after, rather than before, the first exam of the research methods course (Class 4). 
Analyses indicated that this may have strengthened the association between predictions and performance, but it did not affect the overall pattern of results: including or excluding the first-exam data from Class 4 did not noticeably change any results.</bibtext> </blist> <blist> <bibtext> The normality assumption was checked for the multilevel models, and the outcome was approximately normally distributed.</bibtext> </blist> <blist> <bibtext> For Classes 1 and 5, grade data were available for the first exam even though no prediction data were collected for those exams (see Table 3). Nevertheless, these grade data were included in course difficulty calculations to obtain a more accurate estimate of overall course difficulty. Students' average grade on their previous exams was also added to the analyses to account for the relationship between variability and grades, because variability decreases as grades approach the ceiling (r(402) = -.12, p = .017).</bibtext> </blist> </ref> <aug> <p>By Lisa Geraci; Nayantara Kurpad; Robert Tirso; Kathryn N.
Gray and Yan Wang</p> </aug> <nolink nlid="nl1" bibid="bib10" firstref="ref1"></nolink> <nolink nlid="nl2" bibid="bib11" firstref="ref2"></nolink> <nolink nlid="nl3" bibid="bib12" firstref="ref7"></nolink> <nolink nlid="nl4" bibid="bib14" firstref="ref9"></nolink> <nolink nlid="nl5" bibid="bib22" firstref="ref10"></nolink> <nolink nlid="nl6" bibid="bib26" firstref="ref11"></nolink> <nolink nlid="nl7" bibid="bib28" firstref="ref12"></nolink> <nolink nlid="nl8" bibid="bib31" firstref="ref13"></nolink> <nolink nlid="nl9" bibid="bib32" firstref="ref14"></nolink> <nolink nlid="nl10" bibid="bib29" firstref="ref15"></nolink> <nolink nlid="nl11" bibid="bib30" firstref="ref19"></nolink> <nolink nlid="nl12" bibid="bib19" firstref="ref20"></nolink> <nolink nlid="nl13" bibid="bib16" firstref="ref21"></nolink> <nolink nlid="nl14" bibid="bib27" firstref="ref24"></nolink> <nolink nlid="nl15" bibid="bib13" firstref="ref27"></nolink> <nolink nlid="nl16" bibid="bib35" firstref="ref29"></nolink> <nolink nlid="nl17" bibid="bib18" firstref="ref30"></nolink> <nolink nlid="nl18" bibid="bib322" firstref="ref33"></nolink> <nolink nlid="nl19" bibid="bib25" firstref="ref37"></nolink> <nolink nlid="nl20" bibid="bib20" firstref="ref40"></nolink> <nolink nlid="nl21" bibid="bib17" firstref="ref41"></nolink> <nolink nlid="nl22" bibid="bib21" firstref="ref42"></nolink> <nolink nlid="nl23" bibid="bib15" firstref="ref43"></nolink> <nolink nlid="nl24" bibid="bib33" firstref="ref44"></nolink> <nolink nlid="nl25" bibid="bib34" firstref="ref58"></nolink>
Accession Number:	EJ1371112