Effect Size Reporting Practices in Applied Linguistics Research: A Study of One Major Journal
| Title: | Effect Size Reporting Practices in Applied Linguistics Research: A Study of One Major Journal |
|---|---|
| Language: | English |
| Authors: | Wei, Rining; Hu, Yuhang |
| Source: | SAGE Open. Apr 2019 9(2). |
| Availability: | SAGE Publications. 2455 Teller Road, Thousand Oaks, CA 91320. Tel: 800-818-7243; Tel: 805-499-9774; Fax: 800-583-2665; e-mail: journals@sagepub.com; Web site: http://sagepub.com |
| Peer Reviewed: | Y |
| Page Count: | 11 |
| Publication Date: | 2019 |
| Document Type: | Journal Articles; Information Analyses |
| Descriptors: | Applied Linguistics, Language Research, Periodicals, Effect Size, Research Reports, Journal Articles |
| DOI: | 10.1177/2158244019850035 |
| ISSN: | 2158-2440 |
| Abstract: | Many surveys of effect size (ES) reporting practices have been conducted in social science fields such as psychology and education, but few such studies are available in applied linguistics. To bridge this gap and to echo the recent calls for more robust statistics from scholars in applied linguistics and beyond, this study represents the first attempt, in the field of applied linguistics, to focus upon ES reporting practices. With an innovative "two-standards" approach for coding, which overcomes the limitations with similar studies in other social science fields (e.g., communication), this study assesses the ES reporting practices over a span of 6 years in a major journal. Findings include the following: (a) the ES reporting rate is about 50% and (b) some improvement of ES reporting over time is in evidence. Future research directions (e.g., examining whether and how ES is interpreted after being reported) are suggested. |
| Abstractor: | As Provided |
| Entry Date: | 2019 |
| Accession Number: | EJ1221340 |
| Database: | ERIC |
| FullText | <title id="AN0137323678-1">Effect Size Reporting Practices in Applied Linguistics Research: A Study of One Major Journal</title> <p>Keywords: effect size; null hypothesis significance testing; quantitative method; p value; effect size reporting; effect size interpretation</p> <hd id="AN0137323678-2">Introduction</hd> <p>The importance of effect size vis-à-vis the inherent limitations of Null Hypothesis Significance Testing (NHST; including the significance level, viz.
the <emph>p</emph> value) has been emphasized for five decades (e.g., [<reflink idref="bib16" id="ref1">16</reflink>]). This emphasis has come not only from psychology ([<reflink idref="bib3" id="ref2">3</reflink>], [<reflink idref="bib4" id="ref3">4</reflink>]; Wilkinson &amp; [<reflink idref="bib51" id="ref4">51</reflink>]) and education ([<reflink idref="bib2" id="ref5">2</reflink>]) but also, more recently, from applied linguistics ([<reflink idref="bib24" id="ref6">24</reflink>]; [<reflink idref="bib36" id="ref7">36</reflink>]; [<reflink idref="bib37" id="ref8">37</reflink>]). The limitations of NHST are not delineated here because of space constraints (see [<reflink idref="bib22" id="ref9">22</reflink>] for an excellent and detailed account, and [<reflink idref="bib17" id="ref10">17</reflink>]; [<reflink idref="bib37" id="ref11">37</reflink>]; [<reflink idref="bib44" id="ref12">44</reflink>] for summaries). One major limitation is that the <emph>p</emph> value depends on sample size. For any nonzero effect, a sufficiently large sample will eventually yield a <emph>p</emph> value small enough to be declared significant (viz. <emph>p</emph> of .05 or smaller; [<reflink idref="bib7" id="ref13">7</reflink>]; [<reflink idref="bib48" id="ref14">48</reflink>]).</p> <p>By contrast, effect size, simply put, is "an objective and (usually) standardized measure of the magnitude of observed effect" ([<reflink idref="bib13" id="ref15">13</reflink>], p. 56). Compared with the <emph>p</emph> value, effect size is much less influenced by sample size (cf. [<reflink idref="bib12" id="ref16">12</reflink>]).[<reflink idref="bib3" id="ref17">3</reflink>] In this sense, effect size is as important as, if not more important than, the significance level. [<reflink idref="bib11" id="ref18">11</reflink>] uses the analogy of two sides of one coin to argue that the <emph>p</emph> value and effect size complement rather than substitute for each other, and recommends that researchers report both in their quantitative studies.
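</p> <p>The short Python sketch below illustrates how the p value depends on sample size. It is added here for illustration and is not part of the original study. The sketch holds a correlation fixed at r = .10 and computes a two-sided p value for the null hypothesis of zero correlation at increasing sample sizes. It uses the large-sample Fisher z approximation, z = atanh(r) × sqrt(n − 3), rather than an exact test. The function name p_value_for_r is hypothetical.</p>

```python
import math


def p_value_for_r(r: float, n: int) -> float:
    """Two-sided p value for H0: rho = 0 using the Fisher z approximation.

    z = atanh(r) * sqrt(n - 3) is referred to the standard normal
    distribution, so erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    Requires |r| < 1 and n > 3; it is a large-sample approximation.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


R = 0.10  # the same small correlation at every sample size
SAMPLE_SIZES = (50, 100, 400, 1000, 5000)

for n in SAMPLE_SIZES:
    print(f"n = {n:5}: r = {R:.2f}, p = {p_value_for_r(R, n):.4f}")
```

<p>The effect size is held constant throughout. Even so, p falls from well above .05 at n = 50 to well below it at n = 1,000, while r itself never changes.</p> <p>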
[<reflink idref="bib25" id="ref19">25</reflink>] goes a step further, stating that "effect size is much more important than a null significance hypothesis test" (p. 472), the <emph>p</emph> value included.</p> <p>The importance of effect size notwithstanding, only a handful of journals in applied linguistics make effect size reporting mandatory in their editorial policies. [<reflink idref="bib24" id="ref20">24</reflink>] (p. 114) claims that "currently, the only journal in the second language research field which requires effect sizes is <emph>Language Learning</emph>." To our knowledge, however, <emph>TESOL Quarterly</emph> has required the reporting of effect size measures in quantitative studies since the early 2000s. <emph>The Modern Language Journal</emph> has had a similar editorial policy since about the same time ([<reflink idref="bib40" id="ref21">40</reflink>]). To date, five other journals have similar requirements: <emph>Foreign Language Annals, Language Learning &amp; Technology, Language Testing, Second Language Research</emph>, and <emph>Studies in Second Language Acquisition</emph>.</p> <p>Despite the growing awareness of the importance of effect size, little is known about the current status of effect size reporting in applied linguistics. [<reflink idref="bib39" id="ref22">39</reflink>] and [<reflink idref="bib29" id="ref23">29</reflink>] note that the effect size reporting rates in their sampled papers are not high (25% and 49%, respectively), but neither study focuses on effect size reporting practices in the field.
In contrast, many studies in fields such as education, psychology, and communication (e.g., [<reflink idref="bib31" id="ref24">31</reflink>]; [<reflink idref="bib44" id="ref25">44</reflink>]) have focused on effect size reporting practices (see the "Literature Review" section).</p> <p>In view of this undue neglect of effect size reporting in applied linguistics, this article aims to contribute to our understanding by surveying such practices in <emph>System</emph>, subtitled <emph>An International Journal of Educational Technology and Applied Linguistics</emph>. This journal was selected for two reasons. First, it is a major journal in the field: it is indexed in the Social Sciences Citation Index (SSCI) and has been regarded as "major" by previous studies (e.g., [<reflink idref="bib6" id="ref26">6</reflink>]; [<reflink idref="bib19" id="ref27">19</reflink>]; [<reflink idref="bib47" id="ref28">47</reflink>]). Second, like most journals in the field, it does not mandate effect size reporting in its editorial policy. It may therefore reflect the general situation of effect size reporting in applied linguistics better than the few journals mentioned above that do.</p> <p>This exploratory study focuses on effect size reporting for five statistical procedures: the <emph>t</emph> test, analysis of variance (ANOVA),[<reflink idref="bib4" id="ref29">4</reflink>] correlation, regression, and the chi-square (χ<sups>2</sups>) test. They were selected primarily because they are "the top five" most frequently used methods in four major second language acquisition (SLA) journals ([<reflink idref="bib15" id="ref30">15</reflink>]). They are thus presumably also the most frequently used in the wider field of applied linguistics ([<reflink idref="bib25" id="ref31">25</reflink>]).
Furthermore, these five procedures are also the focal methods in studies from other social science fields such as communication ([<reflink idref="bib44" id="ref32">44</reflink>]) and education ([<reflink idref="bib1" id="ref33">1</reflink>]), so findings from our study may have implications beyond applied linguistics <emph>per se</emph>.</p> <p>Three research questions are pursued:</p> <ulist> <item> <bold> Research Question 1: </bold> To what extent are measures of effect size reported?</item> <item> <bold> Research Question 2: </bold> Do effect size reporting practices vary across the years?</item> <item> <bold> Research Question 3: </bold> For each of the five focal statistical methods, what is the effect size reporting rate, and what effect size measures are typically reported?</item> </ulist> <p>The remainder of this article proceeds as follows. We first give a more detailed introduction to the definition and use of effect size, illustrated with an example from a published study. We then review relevant studies from fields such as education and psychology as well as from applied linguistics. Next, we report on the data collection and analysis methods of our study. After presenting and discussing the major findings, we conclude by offering suggestions for effect size reporting practices and for further studies that can contribute to the ongoing methodological reform in applied linguistics.</p> <hd id="AN0137323678-3">Effect Size</hd> <hd id="AN0137323678-4">Definitions of Effect Size</hd> <p>In this study, effect size is defined as an objective and standardized measure of the magnitude of an observed effect. That is, we remove the word "(usually)" from the concise definition by [<reflink idref="bib13" id="ref34">13</reflink>] (p. 56) cited above.
Although some other definitions (e.g., [<reflink idref="bib31" id="ref35">31</reflink>]; [<reflink idref="bib44" id="ref36">44</reflink>]) are so broad that they include nonstandardized forms (e.g., raw mean difference), it is strongly recommended that effect size measures be confined to standardized forms. This maximizes their benefits, such as letting "the reader compare effects across groups" and letting "meta-analysts compare studies even if they use different original measures" ([<reflink idref="bib27" id="ref37">27</reflink>], p. 135). The danger of relying on raw mean differences, as opposed to standardized effect sizes, is illustrated below with an authentic example of effect size reporting.</p> <p>Dozens of effect size measures are available, each with relative strengths and weaknesses for particular purposes ([<reflink idref="bib10" id="ref38">10</reflink>]; [<reflink idref="bib17" id="ref39">17</reflink>]; [<reflink idref="bib21" id="ref40">21</reflink>]). Two types[<reflink idref="bib5" id="ref41">5</reflink>] of effect sizes highly relevant to applied linguistic research are the <emph>d</emph> family and the <emph>r</emph> family. The <emph>d</emph> family comprises standardized measures of mean differences (e.g., Cohen's <emph>d</emph>). The <emph>r</emph> family comprises standardized measures of the strength of association, expressed either as a correlation between two variables or as the proportion of variance accounted for (e.g., <emph>R</emph><sups>2</sups> in regression). Table 1 lists frequently used effect size measures for the focal statistical methods. Of these, only Cohen's <emph>d</emph> and Hedges's <emph>g</emph> belong to the <emph>d</emph> family; the others belong to the <emph>r</emph> family.</p> <p>Table 1. Effect Size Benchmarks.</p> <p> <ephtml> &lt;table&gt;&lt;colgroup&gt;&lt;col align="left" /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;/colgroup&gt;&lt;thead&gt;&lt;tr&gt;&lt;th align="left" rowspan="3"&gt;Method&lt;/th&gt;&lt;th align="center" rowspan="3"&gt;Frequently used effect size&lt;/th&gt;&lt;th align="center" colspan="4"&gt;Benchmark system&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th align="center" colspan="3"&gt;From Cohen&lt;/th&gt;&lt;th align="center" rowspan="2"&gt;From researchers in applied linguistics&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th align="center"&gt;Small&lt;/th&gt;&lt;th align="center"&gt;Medium&lt;/th&gt;&lt;th align="center"&gt;Large&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;italic&gt;t&lt;/italic&gt; test&lt;/td&gt;&lt;td&gt;&lt;italic&gt;r&lt;/italic&gt;&lt;/td&gt;&lt;td&gt;.10&lt;/td&gt;&lt;td&gt;.30&lt;/td&gt;&lt;td&gt;.50&lt;/td&gt;&lt;td&gt;&lt;xref ref-type="bibr" rid="bibr41"&gt;Plonsky and Oswald (2014)&lt;/xref&gt; for &lt;italic&gt;r&lt;/italic&gt;: .25 as small, .40 medium, and .60 large. &lt;xref ref-type="bibr" rid="bibr49"&gt;Wei and Hu (2018)&lt;/xref&gt; for &lt;italic&gt;R&lt;/italic&gt;&lt;sup&gt;2&lt;/sup&gt;: .005 as small, .01 "typical" or medium, .02 large, and .09 very large&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td /&gt;&lt;td&gt;Cohen's &lt;italic&gt;d&lt;/italic&gt;, Hedges's &lt;italic&gt;g&lt;/italic&gt;&lt;/td&gt;&lt;td&gt;0.20&lt;/td&gt;&lt;td&gt;0.50&lt;/td&gt;&lt;td&gt;0.80&lt;/td&gt;&lt;td /&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Correlation&lt;/td&gt;&lt;td&gt;Pearson's &lt;italic&gt;r&lt;/italic&gt;&lt;/td&gt;&lt;td&gt;.10&lt;/td&gt;&lt;td&gt;.30&lt;/td&gt;&lt;td&gt;.50&lt;/td&gt;&lt;td /&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td /&gt;&lt;td&gt;Spearman's rho&lt;/td&gt;&lt;td&gt;.10&lt;/td&gt;&lt;td&gt;.30&lt;/td&gt;&lt;td&gt;.50&lt;/td&gt;&lt;td /&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td /&gt;&lt;td&gt;&lt;italic&gt;r&lt;/italic&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/td&gt;&lt;td&gt;.01&lt;/td&gt;&lt;td&gt;.09&lt;/td&gt;&lt;td&gt;.25&lt;/td&gt;&lt;td /&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Chi-square&lt;/td&gt;&lt;td&gt;Cramer's &lt;italic&gt;V&lt;/italic&gt;, Phi&lt;/td&gt;&lt;td&gt;0.10&lt;/td&gt;&lt;td&gt;0.30&lt;/td&gt;&lt;td&gt;0.50&lt;/td&gt;&lt;td /&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Analysis of variance&lt;/td&gt;&lt;td&gt;&amp;#951;&lt;sup&gt;2&lt;/sup&gt;&lt;/td&gt;&lt;td&gt;0.01&lt;/td&gt;&lt;td&gt;0.06&lt;/td&gt;&lt;td&gt;0.14&lt;/td&gt;&lt;td /&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Regression&lt;/td&gt;&lt;td&gt;&lt;italic&gt;R&lt;/italic&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/td&gt;&lt;td&gt;.02&lt;/td&gt;&lt;td&gt;.13&lt;/td&gt;&lt;td&gt;.26&lt;/td&gt;&lt;td /&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p><emph>Source.</emph> Adapted from [<reflink idref="bib10" id="ref42">10</reflink>] and [<reflink idref="bib8" id="ref43">8</reflink>].</p> <p>Table 1 also lists benchmarks for interpreting effect sizes recommended by [<reflink idref="bib8" id="ref44">8</reflink>] and by researchers in applied linguistics. The benchmark system of [<reflink idref="bib8" id="ref45">8</reflink>] is best reserved "as a last resort" ([<reflink idref="bib10" id="ref46">10</reflink>], p. 42). Too many researchers, however, have used it as a set of iron-clad criteria without reference to the measurements taken, the study design, or the practical importance of the findings. Whenever possible, researchers should interpret effect sizes by grounding them in a meaningful context (e.g., comparisons with previous studies that used similar measurements and designs) or by assessing their contribution to knowledge (e.g., in terms of practical or clinical value).
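</p> <p>The minimal Python sketch below, added for illustration, shows why the choice of benchmark system matters. It maps an <emph>r</emph> value to a verbal label under Cohen's thresholds and under the Plonsky and Oswald thresholds listed in Table 1. The function name label and the extra category "below small" are hypothetical choices made for this sketch. The point is only that the same value can receive different labels, not to endorse mechanical interpretation.</p>

```python
from bisect import bisect_right

# Small / medium / large thresholds for r, as listed in Table 1.
COHEN_R = (0.10, 0.30, 0.50)
PLONSKY_OSWALD_R = (0.25, 0.40, 0.60)

# Hypothetical label set for this sketch; values under the "small"
# threshold are labelled "below small".
LABELS = ("below small", "small", "medium", "large")


def label(effect: float, thresholds: tuple) -> str:
    """Return the verbal label for |effect| under ascending thresholds.

    A value equal to a threshold counts as reaching that category.
    """
    return LABELS[bisect_right(thresholds, abs(effect))]


for r in (0.30, 0.55):
    print(f"r = {r:.2f}: Cohen -> {label(r, COHEN_R)}, "
          f"Plonsky and Oswald -> {label(r, PLONSKY_OSWALD_R)}")
```

<p>For example, <emph>r</emph> = .30 is "medium" by Cohen's thresholds but only "small" by those of Plonsky and Oswald. This is one reason to ground interpretation in context rather than in a single set of cut-offs.</p> <p>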
The two benchmark systems proposed by researchers in applied linguistics (see Table 1) offer more nuanced guidance than Cohen's system for interpreting the effect size in question. The benchmarks of [<reflink idref="bib41" id="ref47">41</reflink>] are highly relevant to experiment-based studies in what they call "L2 research." Those of [<reflink idref="bib49" id="ref48">49</reflink>] suit survey-based studies examining the effects of sociobiographical variables (e.g., gender and multilingualism) on (socio-)psychological variables (e.g., L2 joy and tolerance of ambiguity).</p> <hd id="AN0137323678-5">Consequences of Not Reporting Effect Size</hd> <p>As [<reflink idref="bib53" id="ref49">53</reflink>] point out, "not reporting effect size can be detrimental" (p. 212). One authentic example helps drive home the consequences of failing to report effect sizes. Table 2 is adapted from the analysis by [<reflink idref="bib50" id="ref50">50</reflink>] of respondents' self-reported spoken English proficiency and other variables, drawn from the largest language survey in China. Our main modification to the original table of [<reflink idref="bib50" id="ref51">50</reflink>] is an added column of Cohen's <emph>d</emph> values. We suggest that an effect size from either the <emph>r</emph> or the <emph>d</emph> family can be used. In fact, one can easily be converted into the other (see [<reflink idref="bib24" id="ref52">24</reflink>], pp. 117-119, for conversion formulas). Take <emph>t</emph> tests as an example: as Table 1 indicates, both <emph>r</emph> and Cohen's <emph>d</emph> can serve as effect size measures.
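</p> <p>The conversions mentioned above can be sketched in Python as follows. The sketch uses three standard formulas:</p> <ulist> <item>r = sqrt(t² / (t² + df)), for obtaining r from a t statistic;</item> <item>d = 2r / sqrt(1 − r²), for converting r to Cohen's d;</item> <item>r = d / sqrt(d² + 4), the inverse of the previous formula.</item> </ulist> <p>The last two formulas assume two independent groups of roughly equal size. Applied to the t values in Table 2 with df = n − 1, the first formula agrees with the reported r column to three decimal places. The d conversion is shown for the two-group case only. It is not meant to reproduce the Cohen's d column of Table 2, which is based on one-sample tests. The function names are hypothetical.</p>

```python
import math


def r_from_t(t: float, df: int) -> float:
    """Effect size r from a t statistic and its degrees of freedom."""
    return math.sqrt(t * t / (t * t + df))


def d_from_r(r: float) -> float:
    """Cohen's d from r, assuming two independent groups of equal size."""
    return 2 * r / math.sqrt(1 - r * r)


def r_from_d(d: float) -> float:
    """Inverse of d_from_r under the same equal-group assumption."""
    return d / math.sqrt(d * d + 4)


# (area, n, t, reported r) for the one-sample t tests in Table 2; df = n - 1.
TABLE_2 = [
    ("Beijing", 376, 5.981, 0.295),
    ("Shanghai", 357, 4.623, 0.238),
    ("Tianjin", 95, 6.753, 0.572),
    ("Guangzhou", 307, 1.842, 0.105),
    ("Shenzhen", 104, 3.499, 0.326),
    ("Chongqing", 248, 0.704, 0.045),
    ("Dalian", 159, 6.789, 0.475),
]

for area, n, t, reported_r in TABLE_2:
    r = r_from_t(t, n - 1)
    print(f"{area:10} r = {r:.3f} (reported {reported_r:.3f})")

# Round trip between the r and d families (two-group case only).
d = d_from_r(0.30)
print(f"r = 0.30 -> d = {d:.3f} -> r = {r_from_d(d):.3f}")
```

<p>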
Although many statistics textbooks rigidly recommend Cohen's <emph>d</emph> for <emph>t</emph> tests, the textbook by [<reflink idref="bib13" id="ref53">13</reflink>] is an interesting exception. Its author writes, "I'm going to stick with the effect size <emph>r</emph> because it's widely understood, frequently used, and yes, I'll admit it, I actually like it!" (p. 332).</p> <p>Table 2. An Authentic Example: Spoken Proficiency in English of People With English Learning Experience.</p> <p> <ephtml> &lt;table&gt;&lt;colgroup&gt;&lt;col align="left" /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;/colgroup&gt;&lt;thead&gt;&lt;tr&gt;&lt;th align="center"&gt;Area&lt;/th&gt;&lt;th align="center"&gt;&lt;italic&gt;SD&lt;/italic&gt;&lt;/th&gt;&lt;th align="center"&gt;&lt;italic&gt;M&lt;/italic&gt;&lt;/th&gt;&lt;th align="center"&gt;&lt;italic&gt;M&lt;/italic&gt; difference&lt;/th&gt;&lt;th align="center"&gt;&lt;italic&gt;t&lt;/italic&gt;&lt;/th&gt;&lt;th align="center"&gt;&lt;italic&gt;p&lt;/italic&gt;&lt;/th&gt;&lt;th align="center"&gt;&lt;italic&gt;r&lt;/italic&gt; (effect size)&lt;/th&gt;&lt;th align="center"&gt;Cohen's &lt;italic&gt;d&lt;/italic&gt; (effect size)&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Beijing (&lt;italic&gt;n&lt;/italic&gt;&lt;sub&gt;1&lt;/sub&gt; = 376)&lt;/td&gt;&lt;td&gt;0.868&lt;/td&gt;&lt;td&gt;2.194&lt;/td&gt;&lt;td&gt;0.269&lt;/td&gt;&lt;td&gt;5.981&lt;/td&gt;&lt;td&gt;.000&lt;/td&gt;&lt;td&gt;.295&lt;/td&gt;&lt;td&gt;0.270&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Shanghai (&lt;italic&gt;n&lt;/italic&gt;&lt;sub&gt;2&lt;/sub&gt; = 
357)&lt;/td&gt;&lt;td&gt;0.839&lt;/td&gt;&lt;td&gt;2.132&lt;/td&gt;&lt;td&gt;0.205&lt;/td&gt;&lt;td&gt;4.623&lt;/td&gt;&lt;td&gt;.000&lt;/td&gt;&lt;td&gt;.238&lt;/td&gt;&lt;td&gt;0.223&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Tianjin (&lt;italic&gt;n&lt;/italic&gt;&lt;sub&gt;3&lt;/sub&gt; = 95)&lt;/td&gt;&lt;td&gt;0.896&lt;/td&gt;&lt;td&gt;2.547&lt;/td&gt;&lt;td&gt;0.621&lt;/td&gt;&lt;td&gt;6.753&lt;/td&gt;&lt;td&gt;.000&lt;/td&gt;&lt;td&gt;.572&lt;/td&gt;&lt;td&gt;0.672&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Guangzhou (&lt;italic&gt;n&lt;/italic&gt;&lt;sub&gt;4&lt;/sub&gt; = 307)&lt;/td&gt;&lt;td&gt;0.794&lt;/td&gt;&lt;td&gt;2.010&lt;/td&gt;&lt;td&gt;0.083&lt;/td&gt;&lt;td&gt;1.842&lt;/td&gt;&lt;td&gt;.066&lt;/td&gt;&lt;td&gt;.105&lt;/td&gt;&lt;td&gt;0.089&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Shenzhen (&lt;italic&gt;n&lt;/italic&gt;&lt;sub&gt;5&lt;/sub&gt; = 104)&lt;/td&gt;&lt;td&gt;0.747&lt;/td&gt;&lt;td&gt;2.183&lt;/td&gt;&lt;td&gt;0.256&lt;/td&gt;&lt;td&gt;3.499&lt;/td&gt;&lt;td&gt;.001&lt;/td&gt;&lt;td&gt;.326&lt;/td&gt;&lt;td&gt;0.297&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Chongqing (&lt;italic&gt;n&lt;/italic&gt;&lt;sub&gt;6&lt;/sub&gt; = 248)&lt;/td&gt;&lt;td&gt;0.567&lt;/td&gt;&lt;td&gt;1.956&lt;/td&gt;&lt;td&gt;0.029&lt;/td&gt;&lt;td&gt;0.704&lt;/td&gt;&lt;td&gt;.482&lt;/td&gt;&lt;td&gt;.045&lt;/td&gt;&lt;td&gt;0.030&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Dalian (&lt;italic&gt;n&lt;/italic&gt;&lt;sub&gt;7&lt;/sub&gt; = 159)&lt;/td&gt;&lt;td&gt;0.779&lt;/td&gt;&lt;td&gt;2.346&lt;/td&gt;&lt;td&gt;0.420&lt;/td&gt;&lt;td&gt;6.789&lt;/td&gt;&lt;td&gt;.000&lt;/td&gt;&lt;td&gt;.475&lt;/td&gt;&lt;td&gt;0.454&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <ulist> <item><emph>Source.</emph> Adapted from [<reflink idref="bib50" id="ref54">50</reflink>] (p. 182).</item> <item><emph>Note.</emph> In each of the <emph>t</emph> tests, the degrees of freedom (<emph>df</emph>) equal the sample size minus one. A five-point Likert-type scale was used for self-rated spoken proficiency, with 5 = <emph>able to act as interpreters on formal occasions</emph>, 4 = <emph>able to converse quite fluently</emph>, 3 = <emph>able to conduct daily conversations</emph>, 2 = <emph>able to say some greetings</emph>, and 1 = <emph>able to utter a few words</emph>. The national average was 1.928 (<emph>SD</emph> = 0.922) based on 55,737 valid responses.</item> </ulist> <p>The research question corresponding to Table 2 asks whether, for each of the seven selected cities, the city average in spoken English proficiency differed significantly from the national average. The original authors answered it with a series of one-sample <emph>t</emph> tests (see Table 2).</p> <p>Two important observations can be made regarding Table 2. First, consider the raw mean differences (viz. the city mean minus the national mean) for Beijing (0.269) and Shenzhen (0.256). Relying on these, one might conclude that, relative to the national baseline, Beijing performed better than Shenzhen. The effect sizes point to the opposite conclusion: the effect size <emph>r</emph> for Shenzhen (.326) is larger than that for Beijing (.295). In this example, effect size, rather than raw mean difference, is the appropriate measure of the magnitude of the difference. In other words, relying on unstandardized measures (e.g., raw mean difference) instead of effect size can lead to a completely opposite conclusion. Second, many researchers with traditional training erroneously believe that "the smaller the <emph>p</emph> value, the larger the effect" ([<reflink idref="bib52" id="ref55">52</reflink>], p. 68). They might therefore conclude that Shanghai, Dalian, and Tianjin performed equally well, because their <emph>p</emph> values are all reported as .000.
However, the effect sizes reveal a different picture: Shanghai (<emph>r</emph> = .238) scored above the national average, Dalian (<emph>r</emph> = .475) did better than Shanghai, and Tianjin (<emph>r</emph> = .572) did even better than Dalian. Put differently, among these three cities the standardized difference from the national average (reflected by effect size rather than raw mean difference) was largest for Tianjin and smallest for Shanghai. In short, failing to report effect size along with the <emph>p</emph> value masks much important information in the results of an inferential statistical procedure.</p> <p>It is noteworthy that [<reflink idref="bib50" id="ref56">50</reflink>] explain why they do not attempt to interpret the <emph>r</emph> values after reporting them. They give two main justifications. First, simply drawing on the frequently used guidelines of [<reflink idref="bib8" id="ref57">8</reflink>] is not instructive. Second, no similar studies had reported relevant effect sizes against which their results could be compared. Given the random sampling approach and large sample size of their study, [<reflink idref="bib50" id="ref58">50</reflink>] suggest that future studies on similar topics use their effect size values as a baseline for cross-study comparisons. These practices resonate with the earlier suggestion that comparing effect sizes with values from relevant previous studies is a better way to interpret them than applying the benchmarks of [<reflink idref="bib8" id="ref59">8</reflink>].</p> <hd id="AN0137323678-6">Literature Review</hd> <p>Many surveys of effect size reporting (and, to a lesser extent, interpretation) practices have been conducted in fields such as psychology, education, and communication. For example, in gifted education research, [<reflink idref="bib38" id="ref60">38</reflink>] examined all 723 papers from six full volumes of three selected journals. They report (p. 69) that "28.9% of the quantitative research blocks contained effect size estimates." These "quantitative research blocks" comprise three subgroups (descriptive, univariate, and multivariate blocks), and the effect size reporting rates for the latter two were 17.9% and 52.2%, respectively. In these authors' view, papers using only descriptive statistics need not report effect sizes.</p> <p>More recently, in the fields of education and psychology, [<reflink idref="bib45" id="ref61">45</reflink>] surveyed 1,243 articles published over three full volumes (2005-2007) of 14 journals and found an effect size reporting rate of 49%. In the field of communication, after examining four full volumes (2003-2006) of four influential journals, [<reflink idref="bib44" id="ref62">44</reflink>] find a relatively high effect size reporting rate (about 75%) in their 224 sampled papers. One major limitation of the study by [<reflink idref="bib44" id="ref63">44</reflink>] is that their coding method tends to overestimate the effect size reporting rate. Consider an article that uses two or more focal statistical procedures (say, a <emph>t</emph> test and ANOVA) but reports an effect size for only one of them (say, the <emph>t</emph> test). [<reflink idref="bib44" id="ref64">44</reflink>] (p. 333) give such an article the "benefit of the doubt" by coding it as one that reports effect size. Under another, more stringent coding standard, an article using two or more focal statistical procedures must report effect sizes for all of them to be coded as reporting effect size.
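</p> <p>The two coding standards can be stated precisely with a short Python sketch. It assumes, for illustration only, that each article is recorded as a mapping from each focal procedure it uses to whether an effect size was reported for that procedure. The loose, "benefit of the doubt" standard then corresponds to Python's any(), and the stringent standard to all(). The four articles and the function names are hypothetical and are not drawn from any surveyed journal.</p>

```python
from typing import Callable, Dict, List

# Hypothetical record: focal procedure -> was an effect size reported for it?
Article = Dict[str, bool]


def reports_es_loose(article: Article) -> bool:
    """'Benefit of the doubt' standard: at least one procedure has an ES."""
    if not article:
        raise ValueError("an article must use at least one focal procedure")
    return any(article.values())


def reports_es_stringent(article: Article) -> bool:
    """Stringent standard: every focal procedure has an ES."""
    if not article:
        raise ValueError("an article must use at least one focal procedure")
    return all(article.values())


def reporting_rate(articles: List[Article],
                   standard: Callable[[Article], bool]) -> float:
    """Percentage of articles coded as reporting effect size under `standard`."""
    return 100 * sum(standard(a) for a in articles) / len(articles)


# Four hypothetical articles (illustrative only).
articles: List[Article] = [
    {"t test": True},
    {"t test": True, "ANOVA": False},  # ES for one of two procedures
    {"correlation": False},
    {"regression": True, "chi-square": True},
]

print(f"loose: {reporting_rate(articles, reports_es_loose):.1f}%")
print(f"stringent: {reporting_rate(articles, reports_es_stringent):.1f}%")
```

<p>In this toy set, the two standards disagree on only one article: the one that reports an effect size for its t test but not for its ANOVA. This is exactly the case the two standards are designed to treat differently.</p> <p>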
This more stringent standard can never yield a higher effect size reporting rate than the standard of [<reflink idref="bib44" id="ref66">44</reflink>], and it will usually yield a lower one.</p> <p>In applied linguistics, however, no studies have focused on effect size reporting practices. [<reflink idref="bib39" id="ref67">39</reflink>] finds an effect size reporting rate of 25% among 606 articles from <emph>Language Learning</emph> and <emph>Studies in Second Language Acquisition</emph>, which mandate effect size reporting in their submission guidelines. His study, however, focuses not on effect size reporting but on a wider range of features reflecting study quality (e.g., designs, statistical analyses, reporting practices, and outcomes). Understandably, insufficient coding details are provided regarding papers with two or more statistical procedures, even though up to 60% of the articles in the sample of [<reflink idref="bib39" id="ref68">39</reflink>] (p. 669) use multiple statistical techniques. Another study that sheds light on effect size reporting is [<reflink idref="bib29" id="ref69">29</reflink>]. It finds that 49% of the 96 (quasi-)experimental studies reported in 90 articles in 19 volumes (1997-2015) of <emph>Language Teaching Research</emph> report effect sizes. Again, it is unclear how papers or studies with two or more statistical procedures were handled.
Because [<reflink idref="bib39" id="ref70">39</reflink>] and [<reflink idref="bib29" id="ref71">29</reflink>] each report a single effect size reporting rate, each seems to have used only one standard for coding papers with multiple statistical procedures. Neither clarifies whether that standard was relatively loose, like that of [<reflink idref="bib44" id="ref72">44</reflink>], or more stringent, like the all-procedures standard described above.</p> <p>To date, no studies of effect size reporting in applied linguistic research have made explicit their coding standard for papers with multiple statistical procedures, let alone adopted two standards to arrive at a more comprehensive picture of reporting practices. Furthermore, no studies have surveyed papers from journals that do not mandate effect size reporting, as all the journals covered in [<reflink idref="bib39" id="ref73">39</reflink>] and [<reflink idref="bib29" id="ref74">29</reflink>] have such a mandate.</p> <hd id="AN0137323678-7">The Study</hd> <hd id="AN0137323678-8">Sampling</hd> <p>To contribute to the current understanding of effect size reporting in the field, we examined six full volumes (2011-2016) of <emph>System</emph>. A 6-year span was chosen for three reasons. First, 6 years of publications should suffice for an exploratory study such as the present one to show whether any systematic change in effect size reporting practices took place over time (see Research Question 2). Second, similar (and often shorter) time spans were adopted in studies of effect size reporting in other social science areas. Examples include communication (e.g., 4 years; see [<reflink idref="bib44" id="ref75">44</reflink>]), education (e.g., 2 years; see [<reflink idref="bib1" id="ref76">1</reflink>]), and psychology (e.g., 2 years; see [<reflink idref="bib9" id="ref77">9</reflink>]).
Third, the number of articles coded in this study (see below) is comparable to that in similar research in other fields (e.g., [<reflink idref="bib44" id="ref78">44</reflink>]) and is manageable for an exploratory study.</p> <p>The sampling frame comprised all 414 full-length research articles in the six selected volumes of <emph>System</emph>. Two rounds of initial review, guided by the first six questions on a checklist developed for the present study (see the appendix), identified 217 articles that should report effect sizes; these formed the core dataset of this study. Specifically, the first review identified 17 nonempirical articles, which were excluded as irrelevant to the core dataset. Empirical research articles are those that are data-based, characterized by systematic collection and analysis of data (cf. [<reflink idref="bib14" id="ref79">14</reflink>]). The second review identified 96 empirical articles using only qualitative data and another 83 using only descriptive statistics (e.g., frequencies and percentages; cf. [<reflink idref="bib9" id="ref80">9</reflink>]; [<reflink idref="bib31" id="ref81">31</reflink>]). Excluding these articles left 218, one of which was a meta-analysis. We removed this meta-analysis from the core dataset, following the rationale of [<reflink idref="bib20" id="ref82">20</reflink>] (p. 117) that "reporting conventions for meta-analytic reviews are remarkably different from those for individual (primary) studies."</p> <p>Finally, the remaining 217 articles, each of which should report effect size(s), formed the core dataset.
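</p> <p>The exclusion steps above can be checked arithmetically. The Python sketch below applies them in order to the 414 articles and arrives at the 217-article core dataset. The article states that 217 is the denominator for the overall rates. Taking that as given, the sketch also shows that the percentages reported elsewhere in the article correspond to whole numbers of articles: 44 double-coded articles give 20.3%, and 159 and 113 articles give the overall reporting rates of 73.27% and 52.07%. The counts 159 and 113 are back-calculated for this sketch and are not reported in this section. All names are hypothetical.</p>

```python
# Exclusion steps from the Sampling section, applied in order.
TOTAL_ARTICLES = 414
EXCLUSIONS = [
    ("nonempirical", 17),
    ("qualitative data only", 96),
    ("descriptive statistics only", 83),
    ("meta-analysis", 1),
]

remaining = TOTAL_ARTICLES
for reason, n_excluded in EXCLUSIONS:
    remaining -= n_excluded
    print(f"excluded {n_excluded:3} ({reason}); {remaining} remain")

CORE_DATASET = remaining  # denominator for the overall rates


def pct(count: int, total: int = CORE_DATASET) -> float:
    """Percentage of the core dataset, unrounded."""
    return 100 * count / total


# Back-calculated counts (an assumption of this sketch) checked against
# the percentages reported in the text.
print(f"double-coded: {pct(44):.1f}%")         # Coding section: 20.3%
print(f"loose standard: {pct(159):.2f}%")      # reported: 73.27%
print(f"stringent standard: {pct(113):.2f}%")  # reported: 52.07%
```

<p>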
This total of 217 was used as the denominator for the overall effect size reporting rates in Research Question 1.</p> <hd id="AN0137323678-9">Coding</hd> <p>The unit of analysis was the article. Each article in the core dataset was coded for research topic, publication year, nature of the paper (empirical or not), types of statistical procedures, effect size reporting practices, types of effect size measures, and the authors' awareness of effect size (see the appendix). To obtain a more comprehensive picture of effect size reporting in applied linguistics and to facilitate comparison with findings from other fields, two coding standards were applied to papers using two or more of the focal statistical procedures: the standard of [<reflink idref="bib44" id="ref83">44</reflink>], which gives authors the "benefit of the doubt" and is therefore relatively loose, and the more stringent standard proposed in the "Effect Size" section.</p> <p>Although most of the coded variables are dichotomous and involve little subjective judgment (e.g., reported vs. not reported), applying the two standards might introduce an element of subjectivity. To ensure consistent application of the checklist, the first and second authors independently coded a common set of 44 articles (20.3% of the core dataset). Intercoder agreement was 93.1%, exceeding the acceptable level of 85% to 90% (cf., [<reflink idref="bib32" id="ref84">32</reflink>]), and disagreements were resolved through collegial discussion. Once consistency was established, the second author coded the remaining articles.</p> <hd id="AN0137323678-10">Data Analysis</hd> <p>After coding, descriptive and inferential statistics were generated with the statistical package SPSS 21.0.
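</p> <p>The two coding standards described above can be expressed as decision rules over the per-procedure codes of checklist item 8.1. The Python sketch below rests on an assumption about how the standards work: it reads the "benefit of the doubt" standard of Sun and Fan (2010) as crediting an article that reports an effect size for at least one of its major procedures, and the more stringent standard as requiring an effect size for every major procedure. The function names and the four example articles are hypothetical.</p>

```python
from typing import Callable, Dict, List

# One hypothetical coding record per article: procedure -> was an effect size
# reported for it? (cf. checklist item 8.1). Every article in the core dataset
# uses at least one focal procedure, so no record is empty.
Article = Dict[str, bool]


def loose_standard(article: Article) -> bool:
    """Assumed 'benefit of the doubt' rule: credit the article if any
    major procedure is accompanied by an effect size."""
    return any(article.values())


def stringent_standard(article: Article) -> bool:
    """Assumed stricter rule: every major procedure needs an effect size."""
    return all(article.values())


def reporting_rate(articles: List[Article], rule: Callable[[Article], bool]) -> float:
    """Percentage of articles credited under `rule`; the denominator is all
    articles that should report effect sizes."""
    return 100 * sum(rule(a) for a in articles) / len(articles)


# Hypothetical example articles (not taken from the sample).
articles = [
    {"t test": True, "ANOVA": True},
    {"t test": False, "correlation": True},
    {"chi-square": False},
    {"regression": True},
]

print(reporting_rate(articles, loose_standard))      # 75.0
print(reporting_rate(articles, stringent_standard))  # 50.0
```

<p>Applied to the totals in Table 3, the same ratio gives 159/217 = 73.27% under Sun and Fan's (2010) standard and 113/217 = 52.07% under the more stringent standard.</p> <p>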
For Research Question 1, on the extent of effect size reporting, only descriptive statistics (frequencies and percentages) were generated. For Research Question 2, on whether effect size reporting practices changed over time, two chi-square tests of independence between publication year and reporting status (one for each coding standard) were performed, with Cramer's <emph>V</emph> as the effect size. For Research Question 3, on the types of effect size typically reported, frequencies and percentages were used, supplemented by qualitatively enumerated examples.</p> <hd id="AN0137323678-11">Findings and Discussion</hd> <hd id="AN0137323678-12">Research Question 1: To What Extent Are Measures of Effect Size Reported?</hd> <p>As Table 3 shows, 73.27% (159 of 217) of the papers that should report effect sizes did so when the standard of [<reflink idref="bib44" id="ref85">44</reflink>] was applied to papers with two or more statistical procedures. This rate falls to 52.07% (113 of 217) under a standard more stringent than that of [<reflink idref="bib44" id="ref86">44</reflink>]. It is unfortunate that effect sizes, which are no less important than <emph>p</emph> values, are reported in only about half of the papers that should report them. These findings may be only the tip of the iceberg, pointing to a troubling lack of effect size reporting in applied linguistics research.</p> <p>Table 3. No. of Articles Reporting Effect Size in Selected Years.</p> <p> <ephtml> &lt;table&gt;&lt;colgroup&gt;&lt;col align="left" /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;/colgroup&gt;&lt;thead&gt;&lt;tr&gt;&lt;th align="left" rowspan="2"&gt;Year&lt;/th&gt;&lt;th align="center" rowspan="2"&gt;Total no. of articles that need to report effect size&lt;/th&gt;&lt;th align="center" colspan="2"&gt;No.
of articles reporting effect size (%)&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th align="center"&gt;By &lt;xref ref-type="bibr" rid="bibr44"&gt;Sun and Fan's (2010)&lt;/xref&gt; standard&lt;/th&gt;&lt;th align="center"&gt;By a more stringent standard than &lt;xref ref-type="bibr" rid="bibr44"&gt;Sun and Fan's (2010)&lt;/xref&gt;&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;2011&lt;/td&gt;&lt;td&gt;19&lt;/td&gt;&lt;td&gt;13 (68.42)&lt;/td&gt;&lt;td&gt;11 (57.89)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;2012&lt;/td&gt;&lt;td&gt;27&lt;/td&gt;&lt;td&gt;21 (77.78)&lt;/td&gt;&lt;td&gt;16 (59.26)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;2013&lt;/td&gt;&lt;td&gt;42&lt;/td&gt;&lt;td&gt;28 (66.67)&lt;/td&gt;&lt;td&gt;17 (40.48)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;2014&lt;/td&gt;&lt;td&gt;45&lt;/td&gt;&lt;td&gt;32 (71.11)&lt;/td&gt;&lt;td&gt;19 (42.22)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;2015&lt;/td&gt;&lt;td&gt;41&lt;/td&gt;&lt;td&gt;34 (82.93)&lt;/td&gt;&lt;td&gt;29 (70.73)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;2016&lt;/td&gt;&lt;td&gt;43&lt;/td&gt;&lt;td&gt;31 (72.09)&lt;/td&gt;&lt;td&gt;21 (48.84)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Total&lt;/td&gt;&lt;td&gt;217&lt;/td&gt;&lt;td&gt;159 (73.27)&lt;/td&gt;&lt;td&gt;113 (52.07)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p>These remarks may seem overly critical of the field of applied linguistics. To be fair, the discussion should be situated in a broader context: the underreporting of effect sizes has also been observed in other fields.
One comparable study is a survey of 256 papers published in the <emph>Journal of Counseling &amp; Development</emph> over 11 years, in which [<reflink idref="bib5" id="ref87">5</reflink>] found an effect size reporting rate of less than 50% among papers that conducted statistical significance tests and therefore needed to report effect sizes.</p> <hd id="AN0137323678-13">Research Question 2: Do the Effect Size Reporting Practices Vary Across the Years?</hd> <p>A chi-square test based on the reporting frequencies under the standard of [<reflink idref="bib44" id="ref89">44</reflink>], χ<sups>2</sups>(5) = 3.533, <emph>p</emph> = .618, revealed a small-to-medium association (Cramer's <emph>V</emph> = 0.128) between reporting status and publication year. A second chi-square test based on the reporting frequencies under the more stringent standard, χ<sups>2</sups>(5) = 10.730, <emph>p</emph> = .057, also revealed a small-to-medium association between these two variables (Cramer's <emph>V</emph> = 0.222). Although neither <emph>p</emph> value fell below the conventional significance level of .05, this does not diminish the importance of the associations reflected in the effect sizes. In the often-quoted words of [<reflink idref="bib43" id="ref91">43</reflink>], "surely, God loves the 0.06 nearly as much as the 0.05" (as cited in [<reflink idref="bib10" id="ref92">10</reflink>], p. 49). If an association of this size is real, a replication with a sufficiently large sample would be expected to yield a <emph>p</emph> value below .05.</p> <p>The effect size values reported above are higher than those reported in previous studies.
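</p> <p>The two tests above can be reproduced from the counts in Table 3. The Python sketch below (the function names are ours, not part of any statistics package) builds a year × reporting-status table for each coding standard, computes Pearson's χ<sups>2</sups> and Cramer's <emph>V</emph> = √(χ<sups>2</sups> / (<emph>N</emph>(<emph>k</emph> - 1))), where <emph>k</emph> is the smaller of the numbers of rows and columns, and takes the <emph>p</emph> value from the closed-form chi-square upper-tail probability for integer degrees of freedom, so only the Python standard library is needed. It is offered as a check, not as the procedure actually run in SPSS.</p>

```python
import math

# Counts transcribed from Table 3: year -> (articles that should report ES,
# articles reporting ES under Sun and Fan's (2010) standard,
# articles reporting ES under the more stringent standard).
TABLE3 = {
    2011: (19, 13, 11),
    2012: (27, 21, 16),
    2013: (42, 28, 17),
    2014: (45, 32, 19),
    2015: (41, 34, 29),
    2016: (43, 31, 21),
}


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)**i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail at sqrt(x) plus a finite series.
    chi = math.sqrt(x)
    total = math.erfc(chi / math.sqrt(2))
    term = math.sqrt(2 / math.pi) * math.exp(-x / 2) * chi
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def chi_square_test(table):
    """Pearson chi-square, df, p value, and Cramer's V for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    k = min(len(row_totals), len(col_totals))
    v = math.sqrt(stat / (n * (k - 1)))
    return stat, df, chi2_sf(stat, df), v


def run_for_standard(col: int):
    """col = 1 for Sun and Fan's (2010) standard, col = 2 for the stringent one."""
    # Rows are years; columns are (reported, not reported).
    table = [[counts[col], counts[0] - counts[col]] for counts in TABLE3.values()]
    return chi_square_test(table)


for label, col in (("Sun and Fan (2010)", 1), ("More stringent", 2)):
    stat, df, p, v = run_for_standard(col)
    print(f"{label}: chi2({df}) = {stat:.3f}, p = {p:.3f}, V = {v:.3f}")
# Sun and Fan (2010): chi2(5) = 3.533, p = 0.618, V = 0.128
# More stringent: chi2(5) = 10.730, p = 0.057, V = 0.222
```

<p>The printed values agree with those reported above, which is consistent with each test comparing reporting status (reported vs. not reported) across the six publication years for all 217 articles.</p> <p>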
The counterpart Cramer's <emph>V</emph> in [<reflink idref="bib45" id="ref93">45</reflink>] was only 0.07, although their <emph>p</emph> value was relatively small, χ<sups>2</sups>(2) = 5.66, <emph>p</emph> = .06, probably because of their large sample. In that study, therefore, the variable of interest (effect size reported or not) was associated with publication year only at a negligible-to-small level according to Cohen's benchmarks, depending on the discipline.</p> <p>All in all, the answer to Research Question 2 is that effect size reporting practices appear to vary over time, with a strength of association between the small and medium benchmarks of [<reflink idref="bib8" id="ref95">8</reflink>].</p> <hd id="AN0137323678-14">Research Question 3: For Each of the Five Focal Statistical Methods, What Is the Effect Size...</hd> <p>Of the papers that used correlation analysis, 94.29% (66 of 70; see Table 4) reported an effect size measure. This very high reporting rate can be attributed to the fact that the test statistic (i.e., the correlation coefficient) is itself an effect size ([<reflink idref="bib45" id="ref96">45</reflink>]). Similarly high reporting rates for correlation analysis have been found in other fields. For instance, in communication, [<reflink idref="bib44" id="ref97">44</reflink>] (p. 334) note that "nearly 100% of studies" that used Pearson correlation reported effect size measures, and in education the corresponding rate in [<reflink idref="bib1" id="ref98">1</reflink>] reached 100%. In this study, the effect size measures typically used were correlation coefficients such as Pearson's <emph>r</emph> and Spearman's rho.</p> <p>Table 4. No. of Articles Using the Focal Statistical Procedures and Reporting Effect Sizes.</p> <p> <ephtml> &lt;table&gt;&lt;colgroup&gt;&lt;col align="left" /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="."
/&gt;&lt;/colgroup&gt;&lt;thead&gt;&lt;tr&gt;&lt;th align="center"&gt;Procedure&lt;/th&gt;&lt;th align="center"&gt;No. of articles in which the procedure is used&lt;/th&gt;&lt;th align="center"&gt;No. of articles in which effect size is reported&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;italic&gt;t&lt;/italic&gt; test&lt;/td&gt;&lt;td&gt;67&lt;/td&gt;&lt;td&gt;23 (34.32%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Analysis of variance&lt;/td&gt;&lt;td&gt;78&lt;/td&gt;&lt;td&gt;50 (64.10%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Chi-square&lt;/td&gt;&lt;td&gt;23&lt;/td&gt;&lt;td&gt;7 (30.43%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Correlation&lt;/td&gt;&lt;td&gt;70&lt;/td&gt;&lt;td&gt;66 (94.29%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Regression&lt;/td&gt;&lt;td&gt;25&lt;/td&gt;&lt;td&gt;21 (84.00%)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p>Of the papers that used regression analysis, 84.00% (21 of 25; see Table 4) reported effect sizes. High reporting rates for regression analysis have also been found in other fields, such as communication (nearly 100%, see [<reflink idref="bib44" id="ref99">44</reflink>]) and education (100%, see [<reflink idref="bib1" id="ref100">1</reflink>]). The effect size measure most often used was adjusted <emph>R</emph><sups>2</sups>, which is consistent with observations from other fields (e.g., [<reflink idref="bib1" id="ref101">1</reflink>]; [<reflink idref="bib44" id="ref102">44</reflink>]). One likely reason, as noted by [<reflink idref="bib21" id="ref103">21</reflink>], is that adjusted <emph>R</emph><sups>2</sups> is readily available in the regression output of popular statistics packages such as SPSS.</p> <p>More than 60% (64.10%; see Table 4) of the papers using ANOVA reported effect size measures.
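</p> <p>Because the ANOVA effect sizes discussed here are eta-squared and partial eta-squared, which are easily confused, a small numerical contrast may help. Eta-squared divides an effect's sum of squares by the total sum of squares, whereas partial eta-squared divides it by the sum of that effect's and the error sums of squares only. The Python sketch below uses made-up sums of squares for a hypothetical two-factor between-subjects design; the function names are ours.</p>

```python
def eta_squared(ss_effect: float, ss_total: float) -> float:
    """Eta-squared: the effect's share of the total sum of squares."""
    return ss_effect / ss_total


def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Partial eta-squared: the effect relative to itself plus error only."""
    return ss_effect / (ss_effect + ss_error)


# Hypothetical sums of squares for a two-factor between-subjects ANOVA.
ss = {"A": 30.0, "B": 45.0, "A x B": 15.0, "error": 110.0}
ss_total = sum(ss.values())  # 200.0

for effect in ("A", "B", "A x B"):
    eta = eta_squared(ss[effect], ss_total)
    partial = partial_eta_squared(ss[effect], ss["error"])
    print(f"{effect}: eta^2 = {eta:.3f}, partial eta^2 = {partial:.3f}")

# One-way case: SS_total = SS_effect + SS_error, so the two indices coincide.
print(eta_squared(30.0, 200.0) == partial_eta_squared(30.0, 170.0))  # True
```

<p>In a one-way design the total sum of squares is just the effect plus error sums of squares, so the two indices coincide; with additional factors, partial eta-squared is larger than eta-squared for the same effect, so reporting one under the label of the other can misstate the size of an effect.</p> <p>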
This rate was similar to the 56.5% and 57% reported by [<reflink idref="bib44" id="ref104">44</reflink>] and [<reflink idref="bib1" id="ref105">1</reflink>], respectively. The reporting rate for ANOVA was lower than that for regression, partly because statistics packages do not make ANOVA effect sizes as readily available as regression effect sizes. Take SPSS as an example, in which an ANOVA can be run in three ways. The most common way is to select "Compare Means → One-way ANOVA," but this route does not output eta-squared, an effect size measure for ANOVA, which has misled many researchers into believing that SPSS does not provide eta-squared for ANOVA ([<reflink idref="bib52" id="ref106">52</reflink>]). However, this effect size can be obtained through the two less commonly used routes in SPSS (cf., [<reflink idref="bib42" id="ref107">42</reflink>]).<sups>6</sups> It is therefore unsurprising that, when effect sizes were reported, (partial) eta-squared<sups>7</sups> was reported most often, which is consistent with earlier findings (e.g., [<reflink idref="bib44" id="ref110">44</reflink>]). Given the observation from communication research that "researchers arbitrarily selected one of these two" (i.e., eta-squared and partial eta-squared; [<reflink idref="bib44" id="ref111">44</reflink>], p. 338) and a recent discussion of the misuse of (partial) eta-squared in L2 research ([<reflink idref="bib35" id="ref112">35</reflink>]), future research needs to investigate whether these effect sizes have been used correctly when reported.</p> <p>Twenty-three (34.32%) of the 67 articles that used <emph>t</emph> tests reported effect sizes. Similarly moderate reporting rates have been found in other fields.
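</p> <p>Because SPSS (in the version used here) does not output an effect size for <emph>t</emph> tests, authors have to compute one from the reported statistics. For an independent-samples <emph>t</emph> test, two standard conversions are <emph>d</emph> = <emph>t</emph> × √(1/n1 + 1/n2), which recovers Cohen's <emph>d</emph> based on the pooled standard deviation, and the point-biserial <emph>r</emph> = √(<emph>t</emph><sups>2</sups> / (<emph>t</emph><sups>2</sups> + <emph>df</emph>)). The Python sketch below applies both to a hypothetical result; the function names and numbers are illustrative only.</p>

```python
import math


def cohens_d_from_t(t: float, n1: int, n2: int) -> float:
    """Pooled-SD Cohen's d recovered from an independent-samples t statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)


def r_from_t(t: float, df: int) -> float:
    """Point-biserial r from t and its degrees of freedom, keeping the sign of t."""
    return math.copysign(math.sqrt(t ** 2 / (t ** 2 + df)), t)


# Hypothetical result: t(38) = 2.50 with 20 learners in each group.
t, n1, n2 = 2.50, 20, 20
df = n1 + n2 - 2
print(f"d = {cohens_d_from_t(t, n1, n2):.2f}, r = {r_from_t(t, df):.2f}")  # d = 0.79, r = 0.38
```

<p>Paired-samples and one-sample tests need different formulas (e.g., dividing <emph>t</emph> by the square root of the number of pairs gives the standardized mean of the difference scores), so the sketch should not be applied to them unchanged.</p> <p>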
In the papers sampled by [<reflink idref="bib1" id="ref113">1</reflink>] from five education journals that do not mandate effect size reporting, the corresponding rate was 31%. In the papers sampled by [<reflink idref="bib44" id="ref114">44</reflink>] from two communication journals that do not require effect sizes for statistically significant results, it was 25%. These lower reporting rates for <emph>t</emph> tests, compared with those for ANOVA, are understandable given that SPSS 21.0 does not provide effect size measures for any of the <emph>t</emph> tests (independent samples, one sample, or paired samples); these measures have to be calculated by hand or by entering the relevant values (e.g., <emph>t</emph> and the degrees of freedom) into online calculators (cf., [<reflink idref="bib10" id="ref115">10</reflink>]). Most papers in our sample used Cohen's <emph>d</emph>; only six reported <emph>r</emph> as the effect size measure for <emph>t</emph> tests, and another two used eta-squared. One of these six papers justifies choosing <emph>r</emph> over <emph>d</emph> as follows: "Two commonly used effect sizes of <emph>t</emph>-tests are Cohen's <emph>d</emph> and a point-biserial correlation coefficient (i.e., <emph>r</emph>), and this study adopted the latter as <emph>r</emph> ranges from 0 (no effect) to 1 (a perfect effect)" ([<reflink idref="bib23" id="ref116">23</reflink>], p. 176). This practice echoes our suggestion in the "Effect Size" section that an effect size index from either the <emph>r</emph> or the <emph>d</emph> family can be used, although some textbooks recommend only Cohen's <emph>d</emph> for <emph>t</emph> tests.</p> <p>Seven (30.43%) of the 23 articles that used chi-square tests reported effect sizes. Similarly low reporting rates are in evidence elsewhere.
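</p> <p>For a 2 × 2 table, the effect sizes at issue in this paragraph (φ, Cramer's <emph>V</emph>, and the odds ratio) are easy to compute by hand. The Python sketch below, with hypothetical counts and function names of our own, computes φ from the cell counts, Cramer's <emph>V</emph> from the Pearson χ<sups>2</sups> statistic, and the odds ratio.</p>

```python
import math


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi for the 2 x 2 table [[a, b], [c, d]]; the sign shows the direction."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def cramers_v_2x2(a: int, b: int, c: int, d: int) -> float:
    """Cramer's V via the Pearson chi-square statistic; min(rows, cols) - 1 = 1."""
    table = [[a, b], [c, d]]
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    chi2 = sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )
    return math.sqrt(chi2 / n)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio (a/b) / (c/d); undefined if b or c is zero."""
    return (a * d) / (b * c)


# Hypothetical counts: rows = group, columns = outcome (yes, no).
a, b, c, d = 30, 10, 20, 40
print(round(phi_coefficient(a, b, c, d), 3),
      round(cramers_v_2x2(a, b, c, d), 3),
      odds_ratio(a, b, c, d))  # 0.408 0.408 6.0
```

<p>For a 2 × 2 table, <emph>V</emph> equals the absolute value of φ, so the choice between them matters only for larger tables or when the direction of the association should be kept.</p> <p>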
In the papers sampled by [<reflink idref="bib1" id="ref117">1</reflink>] from five education journals that do not require effect size reporting, the corresponding rate was 17%. In the papers sampled by [<reflink idref="bib44" id="ref118">44</reflink>] from two communication journals without effect size reporting requirements, none of the five papers that used chi-square tests reported an effect size; to account for this, the authors speculate that "it is likely that neither Cramer's <emph>V</emph> nor <emph>φ</emph> is well known to communication researchers" ([<reflink idref="bib44" id="ref119">44</reflink>], p. 338). In our sample, most papers correctly used Cramer's <emph>V</emph>, and only two used odds ratios.</p> <hd id="AN0137323678-15">Conclusion</hd> <p>This study has examined effect size reporting practices in one major applied linguistics journal. These practices seem to have improved in recent years, but the observed reporting rate of about 50% remains inadequate. Although this improvement is encouraging, evidence from other disciplines suggests that gains in effect size reporting can be lost without continued vigilance ([<reflink idref="bib30" id="ref120">30</reflink>]). Journal editors, researchers, and research trainers therefore need to (continue to) encourage and/or implement good reporting practices (e.g., reporting effect sizes along with exact <emph>p</emph> values).</p> <p>Although this exploratory study is innovative in its "two-standards" approach to coding and its choice of target journal, it has three major limitations. First, it would have benefited from a larger sample. The findings and conclusions above are therefore tentative and require verification and/or falsification in future research. In terms of generalizability, the results may not be representative of effect size use in applied linguistics in general, as this study focused on only one journal in the field.
Second, the findings provide limited information about effect size reporting for statistical procedures other than the five focal ones (such as factor analysis and structural equation modeling). Third, the study provides evidence on how often effect sizes were reported in the focal journal but does not indicate whether they were applied correctly (see [<reflink idref="bib35" id="ref121">35</reflink>] for a review of the misuse of [partial] eta-squared in L2 research).</p> <p>To contribute to the ongoing methodological reform in applied linguistics ([<reflink idref="bib27" id="ref122">27</reflink>]), more studies of effect size reporting are needed. Future studies would benefit from larger samples and/or comparisons of reporting practices across journal types (journals with vs. without an effect size reporting requirement). It would also be useful to examine whether effect sizes are reported more often for statistically significant results than for nonsignificant ones, as [<reflink idref="bib39" id="ref123">39</reflink>] notes that some authors tend to report effect sizes only for statistically significant results, although such information was "not coded for throughout the entire sample" in his study. Furthermore, future studies of effect size reporting need to incorporate effect size interpretation more systematically, as reporting effect sizes should not be treated "as an end in itself" ([<reflink idref="bib27" id="ref124">27</reflink>], p. 135); it is also important to know how effect sizes are interpreted once reported.</p> <p>[<reflink idref="bib10" id="ref125">10</reflink>] predicts that "If history is anything to go by, statistical reforms adopted in psychology will eventually spread to other social science disciplines" (p. xiv).
Recently, the editors of <emph>Basic and Applied Social Psychology</emph> ([<reflink idref="bib46" id="ref126">46</reflink>]) issued a journal-wide ban on NHST. This ban represents a natural progression of long-standing critiques<sups>8</sups> of NHST and a strong call for more robust statistics (e.g., effect sizes) in reporting practices. We firmly believe that applied linguistics will soon be one of the disciplines covered by Ellis's prediction.</p> <p>Appendix. Coding Checklist.</p> <p> <ephtml> &lt;table&gt;&lt;colgroup&gt;&lt;col align="left" /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;/colgroup&gt;&lt;thead&gt;&lt;tr&gt;&lt;th align="center"&gt;No.&lt;/th&gt;&lt;th align="center"&gt;Item&lt;/th&gt;&lt;th align="center"&gt;Note/check&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;Title&lt;/td&gt;&lt;td&gt;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;Year&lt;/td&gt;&lt;td&gt;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;3&lt;/td&gt;&lt;td&gt;Issue number&lt;/td&gt;&lt;td&gt;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;4&lt;/td&gt;&lt;td&gt;Is this paper empirical?&lt;/td&gt;&lt;td&gt;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;Which type of empirical research was adopted (quantitative, qualitative, or mixed methods)?&lt;/td&gt;&lt;td&gt;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;6&lt;/td&gt;&lt;td&gt;Are the statistics purely
descriptive?&lt;/td&gt;&lt;td&gt;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="3"&gt;&lt;italic&gt;Is each of the following used as a major statistical procedure?&lt;/italic&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;7.1&lt;/td&gt;&lt;td&gt;&lt;italic&gt;t&lt;/italic&gt; test&lt;/td&gt;&lt;td&gt;&amp;#9633;Yes&amp;#9633;No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;7.2&lt;/td&gt;&lt;td&gt;Analysis of variance&lt;/td&gt;&lt;td&gt;&amp;#9633;Yes&amp;#9633;No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;7.3&lt;/td&gt;&lt;td&gt;Chi-square&lt;/td&gt;&lt;td&gt;&amp;#9633;Yes&amp;#9633;No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;7.4&lt;/td&gt;&lt;td&gt;Correlation&lt;/td&gt;&lt;td&gt;&amp;#9633;Yes&amp;#9633;No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;7.5&lt;/td&gt;&lt;td&gt;Regression&lt;/td&gt;&lt;td&gt;&amp;#9633;Yes&amp;#9633;No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;7.6&lt;/td&gt;&lt;td&gt;Other procedures&lt;/td&gt;&lt;td&gt;&amp;#9633;Yes&amp;#9633;No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="3"&gt;&lt;italic&gt;For each of the major statistical procedures used&lt;/italic&gt;:&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;8.1&lt;/td&gt;&lt;td&gt;Is effect size reported?&lt;/td&gt;&lt;td&gt;&amp;#9633;Yes&amp;#9633;No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;8.2&lt;/td&gt;&lt;td&gt;Is effect size reported with clear awareness on the part of the author(s)?&lt;/td&gt;&lt;td&gt;&amp;#9633;Yes&amp;#9633;No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;8.3&lt;/td&gt;&lt;td&gt;What effect size measure(s) is (are) used?&lt;/td&gt;&lt;td&gt;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&amp;#95;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;8.4&lt;/td&gt;&lt;td&gt;Is effect size interpreted?&lt;/td&gt;&lt;td&gt;&amp;#9633;Yes&amp;#9633;No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;9&lt;/td&gt;&lt;td&gt;Overall, does this paper report effect sizes according to &lt;xref
ref-type="bibr" rid="bibr44"&gt;Sun and Fan's (2010)&lt;/xref&gt; standard?&lt;/td&gt;&lt;td&gt;&amp;#9633;Yes&amp;#9633;No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;10&lt;/td&gt;&lt;td&gt;Overall, does this paper report effect sizes according to a more stringent standard?&lt;/td&gt;&lt;td&gt;&amp;#9633;Yes&amp;#9633;No&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p><emph>Note.</emph> This checklist is adapted from [<reflink idref="bib1" id="ref128">1</reflink>] and [<reflink idref="bib45" id="ref129">45</reflink>]. A major statistical procedure is defined as a method used to directly address at least one research question in the article concerned.</p> <p>The authors would like to extend their sincere thanks to the anonymous reviewers and the article editor for their constructive comments on an earlier version of this article. All remaining inadequacies are the authors' responsibility.</p> <hd id="AN0137323678-16">Author Biographies</hd> <p>Rining Wei (Tony), PhD, teaches undergraduate and postgraduate courses on bilingualism and research methods in the Department of English, Xi'an Jiaotong-Liverpool University. He has supervised master's and doctoral dissertation projects on bilingual education, TESOL, and language policy. He has published in journals including English Today and World Englishes, and he serves on the editorial board of the TESOL International Journal. Yuhang Hu (Sophie) is a master's student in the Department of Linguistics at Georgetown University, with a concentration in Applied Linguistics. Her research interests include (socio-)psychological variables in bilingualism and quantitative methodology.
She will begin her PhD studies in Applied Linguistics at Northern Arizona University this fall. Jianhui Xiong, PhD, conducts research concerning educational policy and comparative education at the National Center for Education Development Research, Ministry of Education of the People's Republic of China. His recent research interests include internationalization of education and the use of big data in education.</p> <ref id="AN0137323678-17"> <title> References </title> <blist> <bibl id="bib1" idref="ref33" type="bt">1</bibl> <bibtext> Alhija F., Levy A. (2009). Effect size reporting practices in published articles. Educational and Psychological Measurement, 69, 245-265.</bibtext> </blist> <blist> <bibl id="bib2" idref="ref5" type="bt">2</bibl> <bibtext> American Educational Research Association. (2006). Standards for reporting on empirical social science research in AERA publications. Educational Researcher, 35(6), 33-40.</bibtext> </blist> <blist> <bibl id="bib3" idref="ref2" type="bt">3</bibl> <bibtext> American Psychological Association. (1994). Publication manual of the American Psychological Association (4th ed.). Washington, DC: Author.</bibtext> </blist> <blist> <bibl id="bib4" idref="ref3" type="bt">4</bibl> <bibtext> American Psychological Association. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: Author.</bibtext> </blist> <blist> <bibl id="bib5" idref="ref41" type="bt">5</bibl> <bibtext> Bangert A. W., Baumberger J. P. (2005). Research and statistical techniques used in the Journal of Counseling &amp; Development: 1990-2001. Journal of Counseling &amp; Development, 83, 480-487.</bibtext> </blist> <blist> <bibl id="bib6" idref="ref26" type="bt">6</bibl> <bibtext> Benson P., Chik A., Gao X., Huang J., Wang W. (2009). Qualitative research in language teaching and learning journals, 1997–2006.
The Modern Language Journal, 93, 79-90.</bibtext> </blist> <blist> <bibl id="bib7" idref="ref13" type="bt">7</bibl> <bibtext> Biskin B. H. (1998). Comment on significance testing. Measurement and Evaluation in Counseling and Development, 31, 58-62.</bibtext> </blist> <blist> <bibl id="bib8" idref="ref43" type="bt">8</bibl> <bibtext> Cohen J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.</bibtext> </blist> <blist> <bibl id="bib9" idref="ref77" type="bt">9</bibl> <bibtext> Dunleavy E. M., Barr C. D., Glenn D. M., Miller K. M. (2006). Effect size reporting in applied psychology: How are we doing? The Industrial-Organizational Psychologist, 43(4), 29-37.</bibtext> </blist> <blist> <bibtext> Ellis P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. Cambridge, UK: Cambridge University Press.</bibtext> </blist> <blist> <bibtext> Fan X. (2001). Statistical significance and effect size in education research: Two sides of a coin. The Journal of Educational Research, 94, 275-282.</bibtext> </blist> <blist> <bibtext> Fan X., Konold T. R. (2010). Statistical significance versus effect size. In Peterson P., Baker E., McGaw B. (Eds.), International encyclopedia of education (Vol. 7, pp. 444-450). Oxford: Elsevier.</bibtext> </blist> <blist> <bibtext> Field A. P. (2009). Discovering statistics using SPSS (and sex and drugs and rock 'n' roll) (3rd ed.). Los Angeles, CA: Sage.</bibtext> </blist> <blist> <bibtext> Gao Y., Li L., Lü J. (2001). Trends in research methods in applied linguistics: China and the West. English for Specific Purposes, 20, 1-14.</bibtext> </blist> <blist> <bibtext> Gass S. (2009). A historical survey of SLA research. In Ritchie W. C., Bhatia T. K. (Eds.), The new handbook of second language acquisition (pp. 3-28). Bingley, UK: Emerald.</bibtext> </blist> <blist> <bibtext> Hays W. L. (1963). Statistics for psychologists.
New York, NY: Holt, Rinehart and Winston.</bibtext> </blist> <blist> <bibtext> Henson R. K. (2006). Effect-size measures and meta-analytic thinking in counseling psychology research. The Counseling Psychologist, 34, 601-629.</bibtext> </blist> <blist> <bibtext> Huberty C. J., Lowman L. L. (2000). Group overlap as a basis for effect size. Educational and Psychological Measurement, 60, 543-563.</bibtext> </blist> <blist> <bibtext> Jung U. O. (2004). Paris in London revisited or the foreign language teacher's top-most journals. System, 32, 357-361.</bibtext> </blist> <blist> <bibtext> Keaton S. A., Bodie G. D. (2013). The statistical and methodological acuity of scholarship appearing in the "International Journal of Listening" (1987-2011). International Journal of Listening, 27, 115-135.</bibtext> </blist> <blist> <bibtext> Kirk R. E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746-759.</bibtext> </blist> <blist> <bibtext> Kline R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research (1st ed.). Washington, DC: American Psychological Association.</bibtext> </blist> <blist> <bibtext> Koga T. (2010). Dynamicity of motivation, anxiety and cooperativeness in a semester course. System, 38, 172-184.</bibtext> </blist> <blist> <bibtext> Larson-Hall J. (2010). A guide to doing statistics in second language research using SPSS. New York, NY: Routledge.</bibtext> </blist> <blist> <bibtext> Larson-Hall J. (2012). How to run statistical analyses. In Mackey A., Gass S. M. (Eds.), Research methods in second language acquisition: A practical guide (pp. 245-274). Chichester, UK: Wiley-Blackwell.</bibtext> </blist> <blist> <bibtext> Larson-Hall J. (2016). A guide to doing statistics in second language research using SPSS and R. New York, NY: Routledge.</bibtext> </blist> <blist> <bibtext> Larson-Hall J., Plonsky L. (2015).
Reporting and interpreting quantitative research findings: What gets reported and recommendations for the field. Language Learning, 65(S1), 127-159.</bibtext> </blist> <blist> <bibtext> Levine T. R., Hullett C. R. (2002). Eta squared, partial eta squared, and misreporting of effect size in communication research. Human Communication Research, 28, 612-625.</bibtext> </blist> <blist> <bibtext> Lindstromberg S. (2016). Inferential statistics in language teaching research: A review and ways forward. Language Teaching Research, 20, 741-768.</bibtext> </blist> <blist> <bibtext> Loewen S., Lavolette E., Spino L. A., Papi M., Schmidtke J., Sterling S., Wolff D. (2014). Statistical literacy among applied linguists and second language acquisition researchers. TESOL Quarterly, 48, 360-388.</bibtext> </blist> <blist> <bibtext> Meline T., Wang B. (2004). Effect-size reporting practices in AJSLP and other ASHA journals, 1999–2003. American Journal of Speech-Language Pathology, 13, 202-207.</bibtext> </blist> <blist> <bibtext> Miles M. B., Huberman A. M., Saldana J. (2014). Qualitative data analysis: A methods sourcebook (3rd ed.). Los Angeles, CA: Sage.</bibtext> </blist> <blist> <bibtext> Norouzian R., de Miranda M. A., Plonsky L. (2018). The Bayesian revolution in second language research: An applied approach. Language Learning, 68, 1032-1075.</bibtext> </blist> <blist> <bibtext> Norouzian R., de Miranda M. A., Plonsky L. (2019). A Bayesian approach to measuring evidence in L2 research: An empirical investigation. The Modern Language Journal, 103, 248-261.</bibtext> </blist> <blist> <bibtext> Norouzian R., Plonsky L. (2018). Eta- and partial eta-squared in L2 research: A cautionary review and guide to more appropriate usage. Second Language Research, 34, 257-271.</bibtext> </blist> <blist> <bibtext> Norris J. M., Ortega L. (2007). The future of research synthesis in applied linguistics: Beyond art or science. 
TESOL Quarterly, 41, 805-815.</bibtext> </blist> <blist> <bibtext> Oswald F. L., Plonsky L. (2010). Meta-analysis in second language research: Choices and challenges. Annual Review of Applied Linguistics, 30, 85-110.</bibtext> </blist> <blist> <bibtext> Paul K. M., Plucker J. A. (2004). Two steps forward, one step back: Effect size reporting in gifted education research from 1995–2000. Roeper Review, 26, 68-72.</bibtext> </blist> <blist> <bibtext> Plonsky L. (2013). Study quality in SLA. Studies in Second Language Acquisition, 35, 655-687.</bibtext> </blist> <blist> <bibtext> Plonsky L., Gass S. (2011). Quantitative research methods, study quality, and outcomes: The case of interaction research. Language Learning, 61, 325-366.</bibtext> </blist> <blist> <bibtext> Plonsky L., Oswald F. L. (2014). How big is "big"? Interpreting effect sizes in L2 research. Language Learning, 64, 878-912.</bibtext> </blist> <blist> <bibtext> Plonsky L., Oswald F. L. (2017). Multiple regression as a flexible alternative to ANOVA in L2 research. Studies in Second Language Acquisition, 39, 579-592.</bibtext> </blist> <blist> <bibtext> Rosnow R. L., Rosenthal R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist, 44, 1276-1284.</bibtext> </blist> <blist> <bibtext> Sun S., Fan X. (2010). Effect size reporting practices in communication research. Communication Methods and Measures, 4, 331-340.</bibtext> </blist> <blist> <bibtext> Sun S., Pan W., Wang L. L. (2010). A comprehensive review of effect size reporting and interpreting practices in academic journals in education and psychology. Journal of Educational Psychology, 102, 989-1004.</bibtext> </blist> <blist> <bibtext> Trafimow D., Marks M. (2015). Editorial. Basic and Applied Social Psychology, 37(1), 1-2.</bibtext> </blist> <blist> <bibtext> Wang W., Gao X. (2008). English language education in China: A review of selected research. 
Journal of Multilingual and Multicultural Development, 29, 280-299.</bibtext> </blist> <blist> <bibtext> Wasserstein R. L., Lazar N. A. (2016). The ASA's statement on p-values: Context, process, and purpose. The American Statistician, 70, 129-133.</bibtext> </blist> <blist> <bibtext> Wei R., Hu Y. (2018). Exploring the relationship between multilingualism and tolerance of ambiguity: A survey study from an EFL context. Bilingualism: Language and Cognition. Advance online publication. doi:10.1017/S1366728918000998</bibtext> </blist> <blist> <bibtext> Wei R., Su J. (2015). Surveying the English language across China. World Englishes, 34, 175-189.</bibtext> </blist> <blist> <bibtext> Wilkinson L., &amp; Task Force on Statistical Inference, APA Board of Scientific Affairs. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.</bibtext> </blist> <blist> <bibtext> Zhang S. L. (2009). Xiaoying fudu: Waiyu dingliang yanjiu buneng hushi de cedu zhi [Effect size: Measures that cannot be ignored in L2 quantitative research]. Waiyu Jiaoxue Lilun Yu Shijian / Foreign Language Learning: Theory and Practice, 3, 67-70, 96.</bibtext> </blist> <blist> <bibtext> Zientek L. R., Capraro M. M., Capraro R. M. (2008). Reporting practices in quantitative teacher education research: One look at the evidence cited in the AERA panel. 
Educational Researcher, 37, 208-216.</bibtext> </blist> </ref> <ref id="AN0137323678-18"> <title> Footnotes </title> <blist> <bibtext> The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.</bibtext> </blist> <blist> <bibtext> The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The writing of this article was supported by the Educational Science Research Fund of Jiangsu Province (D/2018/01/18) and the Research Development Fund of Xi'an Jiaotong-Liverpool University (RDF-16-01-61).</bibtext> </blist> <blist> <bibtext> This general statement is about <emph>both</emph> sample effect size <emph>and</emph> population effect size. Although some authors (e.g., [31], p. 204) state that effect size is "unaffected by sample size," this holds for population effect size only. Because "sample effect size itself is a random variable" ([12], p. 448), it is affected by sampling variability, which is inversely related to sample size. 
Put differently, when we wish to estimate effect size as a <emph>population parameter</emph> from different values of sample effect size, sample size matters.</bibtext> </blist> <blist> <bibtext> In this exploratory study, analysis of variance (ANOVA) was confined to one-way ANOVA; other variants (e.g., analysis of covariance [ANCOVA] and multivariate analysis of variance [MANOVA]) were not covered.</bibtext> </blist> <blist> <bibtext> Although most of the dozens of available effect size measures can be categorized into these two types, some indices, such as the odds ratio and the <emph>I</emph> index for hit rate (cf. [18]), do not fall neatly into either category.</bibtext> </blist> <blist> <bibtext> In SPSS, an effect size for ANOVA can be generated in two ways: (a) Analyze→General Linear Model→Univariate→move variables into "Dependent Variable" and "Fixed Factor(s)"→click "Options"→check "Estimates of effect size"; and (b) Analyze→Compare Means→Means→move variables into the "Dependent List" and "Independent List"→click "Options"→check "ANOVA table and eta." Note that SPSS mislabeled partial eta-squared as eta-squared until version 11.0 ([26]).</bibtext> </blist> <blist> <bibtext> We tend not to distinguish between these two indices here because some papers may have reported one as the other, partly owing to the mislabeling problem in SPSS (cf. [28]; [35]).</bibtext> </blist> <blist> <bibtext> In response to long-standing critiques of Null Hypothesis Significance Testing (NHST), most recently in applied linguistics, the Bayesian approach to hypothesis testing has been recommended as an alternative to NHST. 
See [33] for the Bayesian estimation of effect sizes, and [34] for Bayesian equivalents of <emph>p</emph> values.</bibtext> </blist> <blist> <bibtext> Yuhang Hu https://orcid.org/0000-0003-3867-1179</bibtext> </blist> </ref> <aug> <p>By Rining Wei; Yuhang Hu and Jianhui Xiong</p> </aug> |
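The footnote on sample versus population effect size notes that sample effect size is a random variable whose sampling variability is inversely related to sample size. The Python sketch below illustrates this point under several assumptions. It uses two independent, normally distributed groups of equal size and a hypothetical population standardized mean difference of 0.5. Cohen's d is computed with the pooled standard deviation, and the common large-sample approximation to the standard error of d serves as a comparison. The function names, seed, and parameter values are illustrative only and do not come from the article.

```python
import math
import random
import statistics


def cohens_d(x, y):
    """Sample Cohen's d for two independent groups, using the pooled SD."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)
    ss2 = sum((v - m2) ** 2 for v in y)
    pooled_sd = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd


def approx_se_d(d, n1, n2):
    """Large-sample approximation to the standard error of d (assumed formula)."""
    return math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))


def simulate_sd_of_d(true_d, n, reps=2000, seed=1):
    """Empirical SD of sample d across `reps` simulated studies with n per group.

    Hypothetical setup: both groups are normal with SD 1, and the treatment
    mean is shifted by true_d, so the population effect size is exactly true_d.
    """
    rng = random.Random(seed)
    ds = []
    for _ in range(reps):
        treatment = [rng.gauss(true_d, 1.0) for _ in range(n)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ds.append(cohens_d(treatment, control))
    return statistics.stdev(ds)


if __name__ == "__main__":
    TRUE_D = 0.5  # hypothetical population effect size
    for n in (10, 40, 160):
        print(f"n = {n:3d} per group: "
              f"simulated SD(d) = {simulate_sd_of_d(TRUE_D, n):.3f}, "
              f"approx. SE(d) = {approx_se_d(TRUE_D, n, n):.3f}")
```

The population value stays fixed at 0.5 while the spread of the sample estimates shrinks roughly in proportion to 1/√n. The large-sample approximation is somewhat optimistic (too small) at very small group sizes.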
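Two footnotes concern eta-squared and partial eta-squared. One states that the study covers only one-way ANOVA, and another notes that SPSS's mislabeling may have led papers to report one index as the other. The small Python sketch below is our illustration, not the authors' procedure, and it makes the relationship concrete. Eta-squared divides the effect sum of squares by the total sum of squares, whereas partial eta-squared divides it by the effect plus error sums of squares. In a one-way design the total sum of squares equals the between-group plus error sums of squares, so the two indices coincide. In designs with more than one factor they generally differ. The scores and the two-factor sums of squares below are hypothetical.

```python
import math


def one_way_ss(groups):
    """Return (SS_between, SS_error) for a one-way ANOVA on lists of scores."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return ss_between, ss_error


def total_ss(groups):
    """Total sum of squares of all scores around the grand mean."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    return sum((x - grand_mean) ** 2 for x in scores)


def eta_squared(ss_effect, ss_total):
    """Eta-squared: SS_effect / SS_total."""
    return ss_effect / ss_total


def partial_eta_squared(ss_effect, ss_error):
    """Partial eta-squared: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


if __name__ == "__main__":
    # Hypothetical scores for three groups in a one-way design.
    groups = [[3, 4, 5], [6, 7, 8], [9, 10, 11]]
    ss_b, ss_e = one_way_ss(groups)
    ss_t = total_ss(groups)
    # In a one-way design, SS_total = SS_between + SS_error.
    assert math.isclose(ss_t, ss_b + ss_e)
    print(f"one-way: eta^2 = {eta_squared(ss_b, ss_t):.3f}, "
          f"partial eta^2 = {partial_eta_squared(ss_b, ss_e):.3f}")

    # Hypothetical two-factor sums of squares (interaction SS assumed to be 0).
    ss_a, ss_factor_b, ss_err = 30.0, 50.0, 120.0
    ss_tot = ss_a + ss_factor_b + ss_err
    print(f"two-factor, factor A: eta^2 = {eta_squared(ss_a, ss_tot):.3f}, "
          f"partial eta^2 = {partial_eta_squared(ss_a, ss_err):.3f}")
```

With these toy scores, both indices equal 0.900 in the one-way case. With the hypothetical two-factor values, eta-squared for factor A is 0.150, while partial eta-squared is 0.200.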