
Are Words Easier to Learn from Infant- than Adult-Directed Speech? A Quantitative Corpus-Based Investigation


Bibliographic Details
Title:	Are Words Easier to Learn from Infant- than Adult-Directed Speech? A Quantitative Corpus-Based Investigation
Language:	English
Authors:	Guevara-Rukoz, Adriana, Cristia, Alejandrina, Ludusan, Bogdan, Thiollière, Roland, Martin, Andrew, Mazuka, Reiko, Dupoux, Emmanuel
Source:	Cognitive Science. Jul 2018 42(5):1586-1617.
Availability:	Wiley-Blackwell. 350 Main Street, Malden, MA 02148. Tel: 800-835-6770; Tel: 781-388-8598; Fax: 781-388-8232; e-mail: cs-journals@wiley.com; Web site: http://www.wiley.com/WileyCDA
Peer Reviewed:	Y
Page Count:	32
Publication Date:	2018
Document Type:	Journal Articles Reports - Research
Descriptors:	Statistical Analysis, Phonemes, Phonology, Infants, Japanese, Language Acquisition, Acoustics, Adults, Databases, Learning Processes, Speech Communication, Vocabulary Development, Role, Interpersonal Communication, Comparative Analysis
DOI:	10.1111/cogs.12616
ISSN:	0364-0213
Abstract:	We investigate whether infant-directed speech (IDS) could facilitate word form learning when compared to adult-directed speech (ADS). To study this, we examine the distribution of word forms at two levels, acoustic and phonological, using a large database of spontaneous speech in Japanese. At the acoustic level we show that, as has been documented before for phonemes, the realizations of words are more variable and less discriminable in IDS than in ADS. At the phonological level, we find an effect in the opposite direction: The IDS lexicon contains more distinctive words (such as onomatopoeias) than the ADS counterpart. Combining the acoustic and phonological metrics together in a global discriminability score reveals that the bigger separation of lexical categories in the phonological space does not compensate for the opposite effect observed at the acoustic level. As a result, IDS word forms are still globally less discriminable than ADS word forms, even though the effect is numerically small. We discuss the implication of these findings for the view that the functional role of IDS is to improve language learnability.
Abstractor:	As Provided
Entry Date:	2018
Accession Number:	EJ1185186
Database:	ERIC

<anid>AN0130770052;cgn01jul.18;2018Jul19.12:31;v2.2.500</anid> <title id="AN0130770052-1">Are Words Easier to Learn From Infant‐ Than Adult‐Directed Speech? A Quantitative Corpus‐Based Investigation </title> <p>Abstract: We investigate whether infant‐directed speech (IDS) could facilitate word form learning when compared to adult‐directed speech (ADS). To study this, we examine the distribution of word forms at two levels, acoustic and phonological, using a large database of spontaneous speech in Japanese. At the acoustic level we show that, as has been documented before for phonemes, the realizations of words are more variable and less discriminable in IDS than in ADS. At the phonological level, we find an effect in the opposite direction: The IDS lexicon contains more distinctive words (such as onomatopoeias) than the ADS counterpart. Combining the acoustic and phonological metrics together in a global discriminability score reveals that the bigger separation of lexical categories in the phonological space does not compensate for the opposite effect observed at the acoustic level. As a result, IDS word forms are still globally less discriminable than ADS word forms, even though the effect is numerically small. 
We discuss the implication of these findings for the view that the functional role of IDS is to improve language learnability.</p> <p>Speech perception; Psycholinguistics; Language development; Word learning; Infant‐directed speech; Hyperspeech</p> <hd id="AN0130770052-2">Introduction</hd> <p>Infants’ language acquisition proceeds at an amazing speed despite the inherent difficulties in discovering linguistic units such as phonemes and words from continuous speech. A popular view holds that part of the problem may be alleviated by the infants’ caregivers, who may simplify the learning task when they speak to their infants in a particular register called infant‐directed speech (IDS). In this paper, we compare IDS and adult‐directed speech (ADS) in terms of dimensions that are relevant to the learnability of sound categories. We first review alternative hypotheses about a possible facilitatory role of IDS.</p> <hd id="AN0130770052-3">IDS‐ADS differences in the context of learnability</hd> <p>The notion that particular speech registers may have articulatory and acoustic properties that enhance speech perception may have been first introduced by Lindblom in the context of his Hyper and Hypo‐articulation (H&amp;H) theory (1990). In the case of hyperarticulation, the resulting listener‐oriented modifications are referred to as ‘hyperspeech’. Here, the priority is to enhance differences among contrasting elements, and it runs counter to the speaker‐oriented tendency to produce more economical articulatory sequences.</p> <p>Fernald ([<reflink idref="bib23" id="ref1">23</reflink>] ) proposed a more general definition of hyperspeech in the context of language acquisition. 
The idea is that parents may manipulate linguistic levels other than articulatory ones, such as information relating to word frequency or neighborhood density, resulting in facilitated perception:</p> <p>[T]he hyperspeech notion should not be confined to articulatory factors at the segmental level, but should be extended to a wider range of factors in speech that facilitate comprehension by the infant.</p> <p>While the hyperspeech notion initially refers to a modification of language so as to enhance perception, Kuhl et al. ([<reflink idref="bib44" id="ref2">44</reflink>] ) go one step further, positing that IDS register‐specific modifications may also enhance learning:</p> <p>Our findings demonstrate that language input to infants has culturally universal characteristics designed to promote language learning.</p> <p>We call this last hypothesis the Hyper Learnability Hypothesis (HLH). It goes beyond the hyperspeech hypothesis in that it refers not to perception but to the language learning processes operating in the infant. Importantly, these two notions may not necessarily be aligned. 
In some instances, both hyperspeech and HLH are congruent with the usually reported properties of IDS: exaggerated prosody and articulation (Fernald et al., [<reflink idref="bib25" id="ref3">25</reflink>] ; Soderstrom, [<reflink idref="bib86" id="ref4">86</reflink>] ), shorter sentences (Fernald et al., [<reflink idref="bib25" id="ref5">25</reflink>] ; Newport, Gleitman, &amp; Gleitman, [<reflink idref="bib70" id="ref6">70</reflink>] ; Phillips, [<reflink idref="bib75" id="ref7">75</reflink>] ), simpler syntax (Newport et al., [<reflink idref="bib70" id="ref8">70</reflink>] ; Phillips, [<reflink idref="bib75" id="ref9">75</reflink>] ), and slower speech rate (Englund &amp; Behne, [<reflink idref="bib19" id="ref10">19</reflink>] ; Fernald et al., [<reflink idref="bib25" id="ref11">25</reflink>] ) (see Golinkoff, Can, Soderstrom, &amp; Hirsh‐Pasek, [<reflink idref="bib27" id="ref12">27</reflink>] ; Soderstrom, [<reflink idref="bib86" id="ref13">86</reflink>] , for more comprehensive reviews). All of these properties are plausible candidates for facilitating both language perception and language learning at the relevant linguistic levels—namely phonetic, prosodic, lexical and syntactic—by making these features more salient or more contrastive to the infant. Yet, in other instances, perception and learning may diverge. 
As Kuhl ([<reflink idref="bib43" id="ref14">43</reflink>] ) notes:</p> <p>Mothers addressing infants also increase the variety of exemplars they use, behaving in a way that makes mothers resemble many different talkers, a feature shown to assist category learning in second‐language learners.</p> <p>In this case, increase in variability, which is known to negatively affect speech perception in both adults and children (see Bergmann, Cristia, &amp; Dupoux, [<reflink idref="bib7" id="ref15">7</reflink>] ; Mullennix, Pisoni, &amp; Martin, [<reflink idref="bib69" id="ref16">69</reflink>] ; Ryalls &amp; Pisoni, [<reflink idref="bib80" id="ref17">80</reflink>] ) is nevertheless hypothesized to positively affect learning in infants. Work by Rost and McMurray ([<reflink idref="bib78" id="ref18">78</reflink>] ) suggests that this might be the case for 14‐month‐old infants learning novel word‐object mappings. However, it appears that not any kind of variability will do; only increased variability in certain cues—specifically those irrelevant to the contrasts of interest— promoted learning of word‐object mappings (Rost &amp; McMurray, [<reflink idref="bib79" id="ref19">79</reflink>] ). This illustrates the very important point that HLH cannot be empirically tested independently of a specific hypothesis or theory of the learning process in infants. Ideally, the hypothesis or theory should be explicit enough that it could be implemented as an algorithm, which derives numerical predictions on learning outcomes when run on speech corpora of ADS and IDS (Dupoux, [<reflink idref="bib16" id="ref20">16</reflink>] ). Unfortunately, as of today, such algorithms are not yet available for modeling early language acquisition in infants. Yet a reasonable alternative is to resort to measurements that act as a proxy for learning outcomes within a given theory.</p> <p>In the following, we focus on a component of language processing which has been particularly well studied: speech categories. 
For this component, a variety of theories have been proposed, which can be separated into two types: bottom‐up theories and top‐down theories. We review these two types in the following sections and discuss possible proxies for them.</p> <hd id="AN0130770052-4">Bottom‐up theories: Discriminability as a proxy</hd> <p>Bottom‐up theories propose that phonetic categories emerge from the speech signal; they are extracted by attending to certain phonetic dimensions (Jusczyk, Bertoncini, Bijeljac‐Babic, Kennedy, &amp; Mehler, [<reflink idref="bib36" id="ref21">36</reflink>] ), or by identifying category prototypes (Kuhl, [<reflink idref="bib42" id="ref22">42</reflink>] ). More explicitly, Maye, Werker, and Gerken ([<reflink idref="bib60" id="ref23">60</reflink>] ) proposed that infants construct categories by tracking statistical modes in phonetic space. This idea can be made even more computationally explicit by using unsupervised clustering algorithms, such as Gaussian mixture estimation (De Boer &amp; Kuhl, [<reflink idref="bib14" id="ref24">14</reflink>] ; Lake, Vallabha, &amp; McClelland, [<reflink idref="bib46" id="ref25">46</reflink>] ; McMurray, Aslin, &amp; Toscano, [<reflink idref="bib63" id="ref26">63</reflink>] ; Vallabha, McClelland, Pons, Werker, &amp; Amano, [<reflink idref="bib91" id="ref27">91</reflink>] ), or self‐organizing neural maps (Guenther &amp; Gjaja, [<reflink idref="bib29" id="ref28">29</reflink>] ; Kohonen, [<reflink idref="bib41" id="ref29">41</reflink>] ; Vallabha et al., [<reflink idref="bib91" id="ref30">91</reflink>] ). Given the existence of such computational algorithms, it would seem easy to test if IDS enhances learning by running them on IDS and ADS data, and then evaluating the quality of the resulting clusters.</p> <p>However, this is not so simple for two reasons. 
First, each of the above‐mentioned algorithms makes different assumptions about the number, granularity, and shape of phonetic categories, parameters which could potentially lead to different outcomes. Even more problematic is that this subset of algorithms does not exhaust the space of possible clustering algorithms.</p> <p>Since we do not know which of these assumptions and algorithms are those that best approximate computational mechanisms used by infants, applying these algorithms to data may not get us any closer to a definitive answer. Second, these particular algorithms have only been validated on artificially simplified data (e.g., representing categories as formant measurements extracted from hand‐segmented data) and not on a corpus of realistic speech. In fact, when similar algorithms are run on real speech, they fail to learn phonetic categories; instead, they learn smaller and more context‐dependent units (e.g., Varadarajan, Khudanpur, &amp; Dupoux, [<reflink idref="bib92" id="ref31">92</reflink>] ; see also Antetomaso et al., [<reflink idref="bib2" id="ref32">2</reflink>] ). The unsupervised discovery of phonetic units is currently an unsolved problem which gives rise to a variety of approaches (see Versteegh, Anguera, Jansen, &amp; Dupoux, [<reflink idref="bib16" id="ref33">16</reflink>] , for a review).</p> <p>Given the unavailability of effective phoneme discovery algorithms that could test the bottom‐up version of HLH, many researchers have adopted a more indirect approach using descriptive measures of phonetic category distributions as a proxy for learnability. Here, we review two such proxies: category separation and category discriminability.</p> <p>Category separation corresponds to the distance between the center of these categories in phonetic space. Kuhl et al. 
([<reflink idref="bib44" id="ref34">44</reflink>] ) measured the center of the ‘point’ vowels /a/, /i/, and /u/ in formant space, in ADS and IDS, across three languages (American English, Russian, and Swedish). Results revealed that the spatial separation between the center of these vowels was increased in IDS compared to ADS. This observation has been replicated in several studies (Andruski, Kuhl, &amp; Hayashi, [<reflink idref="bib1" id="ref35">1</reflink>] ; Bernstein Ratner, [<reflink idref="bib9" id="ref36">9</reflink>] ; Burnham, Kitamura, &amp; Vollmer‐Conna, [<reflink idref="bib10" id="ref37">10</reflink>] ; Cristia &amp; Seidl, [<reflink idref="bib11" id="ref38">11</reflink>] ; Liu, Kuhl, &amp; Tsao, [<reflink idref="bib51" id="ref39">51</reflink>] ; McMurray, Kovack‐Lesh, Goodwin, &amp; McEchron, [<reflink idref="bib64" id="ref40">64</reflink>] ; Uther, Knoll, &amp; Burnham, [<reflink idref="bib90" id="ref41">90</reflink>] ; although see Benders, [<reflink idref="bib6" id="ref42">6</reflink>] ). However, it is less clear that separation generalizes to other segments beyond the three point vowels. For instance, Cristia and Seidl ([<reflink idref="bib11" id="ref43">11</reflink>] ) attested increased separation of the point vowels in speech spoken to 4‐ and 11‐month‐old learners of American English, but not for other vowel contrasts (e.g., [i‐I]). The between‐category distance among the latter vowel categories was not larger in IDS than in ADS (see also McMurray et al., [<reflink idref="bib64" id="ref44">64</reflink>] , for similar results). This is problematic for learnability because one might argue on computational grounds that the vowels that are difficult to learn are probably not the point vowels which are situated at the extreme of the vocal space, but rather the ones that are in the middle and have several competitors with which they can be confused.</p> <p>There is another reason to doubt that separation is a very good proxy in the first place. 
As shown in Fig. , categories are defined not only by their center, but also by their variability. If, for instance, IDS not only increases the separation between category centers compared to ADS, but also increases within‐category variability, the two effects could cancel each other out or even wind up making IDS more difficult to learn. In fact, as we mentioned above, Kuhl et al. ([<reflink idref="bib44" id="ref45">44</reflink>] ) reported that parents tend to be more variable in their vowel productions in IDS than ADS. This was confirmed in later studies (Cristia &amp; Seidl, [<reflink idref="bib11" id="ref46">11</reflink>] ; Kirchhoff &amp; Schimmel, [<reflink idref="bib40" id="ref47">40</reflink>] ; McMurray et al., [<reflink idref="bib64" id="ref48">64</reflink>] ). If so, what is the net effect of these two opposing tendencies on category learnability?</p> <p>Previous work by Schatz ([<reflink idref="bib84" id="ref49">84</reflink>] ) has shown that the performance of unsupervised clustering algorithms can be predicted by a psychophysically inspired measure: the ABX discrimination score. The intuition behind this measure is illustrated in Fig. : it is defined as the probability that tokens within a category are closer to one another than between categories. If the two categories are completely overlapping, the ABX score is 0.5. If, on the other hand, the two categories are well segregated, the score can reach 1.[<reflink idref="bib1" id="ref50">1</reflink>] This work has demonstrated that the ABX score tends to be more statistically stable than standard clustering algorithms (k‐nearest neighbors, spectral clustering, hierarchical clustering, k‐means, etc.) while predicting their outcomes better than they predict each other's outcomes. 
All in all, this method is independent of specific learning algorithms, is non‐parametric (i.e., it does not assume particular shapes of distributions) and can operate on any featural representation including raw acoustic features. It can therefore be used as a stable proxy of unsupervised clustering and, therefore, of bottom‐up learnability.</p> <p>Using this measure, Martin et al. ([<reflink idref="bib58" id="ref51">58</reflink>] ) systematically studied the discriminability of 46 phonemic contrasts of Japanese by running the ABX discriminability test on a speech corpus with features derived from an auditory model, namely mel spectral features. The outcome was that, on average, phonemic categories were actually less discriminable in IDS than in ADS. While most contrasts did not differ between the two registers, the few that systematically differed pointed rather toward a decrease in acoustic contrastiveness in IDS at the phonemic level.</p> <p>To sum up, if one uses ABX‐discriminability as a proxy for bottom‐up learnability, we can conclude that the HLH is not supported by the data available. However, bottom‐up learning is not the only theoretical option available to account for phonetic learning in infants. Next, we examine top‐down theories.</p> <hd id="AN0130770052-5">Top‐down theories: Three learnability subproblems</hd> <p>Top‐down theories of phonetic category learning share with linguists the intuition that phonemes are defined, not so much through their acoustic properties, but rather through their function. The function of phonemes is to carry meaning contrasts at the lexical level. Top‐down theories therefore posit that phonemes emerge from the lexicon. 
As stated by Werker and Curtin ([<reflink idref="bib94" id="ref52">94</reflink>] ) (see also Beckman &amp; Edwards, [<reflink idref="bib4" id="ref53">4</reflink>] ):</p> <p>As the vocabulary expands and more words with overlapping features are added, higher order regularities emerge from the multidimensional clusters. These higher order regularities gradually coalesce into a system of contrastive phonemes. (p. 217)</p> <p>There are many ways to flesh out these ideas in terms of computational mechanisms. All of them involve at least the requirement that (some) word forms are learned and that these forms constrain the acquisition of phonetic categories. This can be summarized in terms of three subproblems (Fig. B): (a) segmenting word tokens from continuous speech, (b) clustering said word tokens into types, and (c) using said types to learn phonetic categories via a contrastive mechanism. Arguably, these three subproblems are interdependent (in fact, some models address several of them jointly, for example, Feldman, Griffiths, &amp; Morgan, [<reflink idref="bib21" id="ref54">21</reflink>] , or iteratively, for example, Versteegh, Anguera, Jansen, &amp; Dupoux, [<reflink idref="bib93" id="ref55">93</reflink>] ), and only a fully specified model would make it possible to fully test the functional impact of IDS for learnability under such a theory. Yet, as above, we claim that one can develop measures that can act as proxies for learnability, even in the absence of a full model.</p> <p>In what follows, we focus on the second subproblem, that is, the clustering of word types, which we take to be of central importance for phonetic category learning. 
Indeed, in case of a failure to solve subproblem 1 (e.g., infants undersegment “the dog” into “thedog,” or oversegment “butterfly” into “butter fly”), it is still possible to use contrastive learning with badly segmented proto‐words to learn phonetic categories (Fourtassi &amp; Dupoux, [<reflink idref="bib26" id="ref56">26</reflink>] ). In contrast, in case of a failure to solve subproblem 2 (e.g., infants merge “cat” and “dog” into a single word type, or split “tomato” into many context‐ or speaker‐dependent variants), it is much more dubious that contrastive learning can be of any help to establish phonetic categories. Our experiments therefore only address subproblem 2, and we come back to the other two subproblems in the General Discussion.</p> <hd id="AN0130770052-6">The present study: Word form discriminability</hd> <p>The construction of word form categories is a similar computational problem to the problem of constructing phonetic categories discussed above. Both can be formulated as unsupervised clustering problems, the only difference being the granularity and number of categories being formed. Instead of sorting out instances of ‘i’, ‘a’, and ‘o’ into clusters, the problem is to sort out instances of ‘cat’, ‘dog’, and ‘tomato’ into clusters. Therefore, in both instances, it is possible to use ABX discriminability as a proxy for the (bottom‐up) learnability of these categories. Of course, words being composed of phonemes, one would expect a correlation between ABX discriminability on phonemes and on words. However, the word form level introduces two specific types of effects making such a correlation far from trivially true.</p> <p>First, the word level typically introduces specific patterns of phonetic variability. For instance, the word ‘tomato’ can be produced in a variety of ways: etc. Some of these variations are dependent on the dialect but others can surface freely within speaker, or depending on context, speaking style, or speaking rate. 
Such phonetic effects translate into distinct acoustic realizations of the word forms, potentially complicating the task of word form category learning. Could it be that IDS limits this source of variation, thereby helping infants to construct word form categories? Some studies have shown the use of more canonical forms in IDS than ADS (e.g., Dilley, Millett, McAuley, &amp; Bergeson, [<reflink idref="bib15" id="ref57">15</reflink>] ), while others have not (e.g., Fais, Kajikawa, Amano, &amp; Werker, [<reflink idref="bib20" id="ref58">20</reflink>] ; Lahey &amp; Ernestus, [<reflink idref="bib45" id="ref59">45</reflink>] ), but to our knowledge no study has looked at the global effect of these variations on word discriminability, and done so systematically. This is what we will examine in Experiment 1.</p> <p>Second, and setting aside phonetic realization to focus on abstract phonological characteristics, words tend to occupy sparse regions of phonological space. Put differently, there are many more unused possible word forms than actual ones. This results in minimal pairs being generally rare. For instance, a corpus analysis reveals that, in English, Dutch, French, and German, minimal pairs will concern less than 0.1% of all pairs (Dautriche, Mahowald, Gibson, Christophe, &amp; Piantadosi, [<reflink idref="bib13" id="ref60">13</reflink>] ); in fact, two words selected at random will differ in more than 90% of their phonemes on average. This should make word form clustering an easier task than phonetic clustering, a welcome result for top‐down theories. However, it could be that IDS modulates this effect by containing a different set of words than the vocabulary directed to adults. 
Corpora descriptions of IDS suggest that this is the case: Caregivers use a reduced vocabulary (Henning, Striano, &amp; Lieven, [<reflink idref="bib30" id="ref61">30</reflink>] ; Kaye, [<reflink idref="bib39" id="ref62">39</reflink>] ; Phillips, [<reflink idref="bib75" id="ref63">75</reflink>] ), which often includes a set of lexical items with special characteristics, such as syllabic reduplications and mimetics (Ferguson, [<reflink idref="bib22" id="ref64">22</reflink>] ; Fernald &amp; Morikawa, [<reflink idref="bib24" id="ref65">24</reflink>] ; Mazuka, Kondo, &amp; Hayashi, [<reflink idref="bib62" id="ref66">62</reflink>] ). May IDS boost learning by containing more phonologically distinct word forms than ADS? This is what we will examine in Experiment 2.</p> <p>The overall learnability of word forms, as far as clustering is concerned, is the combined effect of phonetic/acoustic discriminability (isolated in Experiment 1) and phonological discriminability (isolated in Experiment 2). As these two factors may go in different directions, we study the global discriminability of IDS versus ADS word form lexicons in Experiment 3.</p> <hd id="AN0130770052-7">Japanese IDS</hd> <p>Like other variants of IDS around the globe (Ferguson, [<reflink idref="bib22" id="ref67">22</reflink>] ), Japanese IDS is characterized by the presence of Infant‐Directed Vocabulary (IDV), ‘babytalk’ specifically used when interacting with infants. According to a survey and corpora studies by Mazuka et al. ([<reflink idref="bib62" id="ref68">62</reflink>] ), these words are mostly phonologically unrelated to words in the ADS lexicon. 
In particular, IDV presents many instances of reduplications (around 65%) and onomatopoeias/mimetic words (around 40%).[<reflink idref="bib2" id="ref69">2</reflink>] Phonological structures found in IDV are, in fact, more similar to phonological patterns produced by Japanese infants earlier in development than to patterns found in the adult lexicon (Tsuji, Nishikawa, &amp; Mazuka, [<reflink idref="bib89" id="ref70">89</reflink>] ; a list of 50 earlier produced words is given by Iba, [<reflink idref="bib31" id="ref71">31</reflink>] ). In addition to pattern repetition within words, IDS also presents more content word repetition, as well as more frequent and longer pauses, making utterances in IDS shorter than in ADS (Martin, Igarashi, Jincho, &amp; Mazuka, [<reflink idref="bib56" id="ref72">56</reflink>] ).</p> <p>Regarding the phonetics of Japanese IDS, it presents pitch‐range expansion (Igarashi, Nishikawa, Tanaka, &amp; Mazuka, [<reflink idref="bib32" id="ref73">32</reflink>] ), but it is not slower than ADS when taking into account local speech rate (Martin et al., [<reflink idref="bib56" id="ref74">56</reflink>] ). More related to our question of phonetic categories, vowel space expansion in F1 x F2 space has been attested in Japanese IDS (Andruski et al., [<reflink idref="bib1" id="ref75">1</reflink>] ; Miyazawa, Shinya, Martin, Kikuchi, &amp; Mazuka, [<reflink idref="bib67" id="ref76">67</reflink>] ); however, IDS categories presented higher variability and overlap (Miyazawa et al., [<reflink idref="bib67" id="ref77">67</reflink>] ), consistent with the decrease in acoustic discriminability observed by Martin et al. ([<reflink idref="bib58" id="ref78">58</reflink>] ). In fact, contrary to intuition, IDS appears to present more devoicing of non‐high vowels than ADS (i.e., less canonical and identifiable tokens), due to breathiness (Martin, Utsugi, &amp; Mazuka, [<reflink idref="bib59" id="ref79">59</reflink>] ). 
This paralinguistic modification of speech, which is thought to convey affect, is more prevalent in IDS than ADS (Miyazawa et al., [<reflink idref="bib67" id="ref80">67</reflink>] ).</p> <hd id="AN0130770052-8">Corpus</hd> <p>Most of the Japanese studies cited above, as well as the work described in this paper, have used data from the RIKEN Japanese Mother‐Infant Conversation Corpus, R‐JMICC (Mazuka, Igarashi, &amp; Nishikawa, [<reflink idref="bib61" id="ref81">61</reflink>] ), a corpus of spoken Japanese produced by 22 mothers in two listener‐dependent registers: IDS and ADS (Igarashi et al., [<reflink idref="bib32" id="ref82">32</reflink>] ).</p> <p>For our study, a word was defined as a set of co‐occurring phonemes with word boundaries following the gold standard for words in Japanese, roughly corresponding to dictionary entries. Lexical derivations were considered to belong to a separate type category with respect to their corresponding lemmas. For instance, /nai/ and /aru/, inflections of the verb /aru/ (English: to be), were evaluated as separate words. Homophones were collapsed into the same word category in the analyses.</p> <p>Because of the emphasis given to phonological structure when defining word categories, devoiced vowels were considered to be phonologically identical to their voiced counterparts, and similarly for abnormally elongated vowels or consonants that did not result in lexical modifications (i.e., use of gemination for emphasis). Additionally, fragmented, mispronounced, and unintelligible words were not included in our analyses (approximately 5% of the initial corpus). 
The resulting corpus is henceforth referred to as the base corpus; information about its content can be found in Table .</p> <p>Description of the base corpora for adult‐directed speech (ADS) and infant‐directed speech (IDS)</p> <p> <ephtml> &lt;table border="1" cellpadding="3"&gt;&lt;tr&gt;&lt;th /&gt;&lt;th&gt;ADS&lt;/th&gt;&lt;th&gt;IDS&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Duration&lt;/td&gt;&lt;td&gt;3&amp;#160;h&lt;/td&gt;&lt;td&gt;11&amp;#160;h&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Types&lt;/td&gt;&lt;td&gt;1,382&lt;/td&gt;&lt;td&gt;1,765&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Tokens&lt;/td&gt;&lt;td&gt;12,248&lt;/td&gt;&lt;td&gt;34,253&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt; </ephtml> </p> <hd id="AN0130770052-9">Experiment 1: Acoustic distribution of word tokens</hd> <p>In this experiment, we ask whether caregivers articulate words in a more or less ‘distinctive’ manner when addressing their infants. Our aim is to answer this question at a purely acoustic level, that is, taking into account phonetic and acoustic variability, after removing influences from other aspects that vary across registers (e.g., lexical structure). Therefore, the following analyses have been restricted to the lexicon of words that are common to IDS and ADS for each parent.</p> <p>Our main measure is ABX discriminability applied to entire words. As in Martin et al. ([<reflink idref="bib58" id="ref83">58</reflink>] ), we use the ABX<subs>score</subs> which shows classification at chance with a value of 0.5, while perfect discrimination yields a score of 1. 
As such, a higher ABX<subs>score</subs> for IDS than ADS would mean that, on average, parents make their word categories more acoustically discriminable when addressing their infants, making these words easier to learn according to top‐down theories.</p> <p>The ABX discriminability measure implies computing the acoustic distance between word tokens, and computing the probability that two tokens belonging to the same word type are closer to one another than two tokens belonging to two distinct word types.</p> <p>Since this is the first time that such a discriminability measure is used at the word level, we validate it in a control condition in which there are a priori reasons to expect differences in discriminability between two speech registers. Namely, we assess the discrimination of words common to ADS and read speech (RS). This register is typically articulated in a slower, clearer, and more canonical fashion than spontaneous speech. Knowing this, we expect the ABX<subs>score</subs> to be higher in read speech (RS) than in spontaneous speech (ADS).</p> <p>Moreover, in order to further validate the application of our method to word units, two additional submeasures are explored, following the distinctions introduced in Fig. : between‐category separation and within‐category variability.</p> <hd id="AN0130770052-10">Methods</hd> <hd id="AN0130770052-11">Control corpus</hd> <p>The Read Speech (RS) subsection of the RIKEN corpus consists of recordings from a subset of 20 out of the 22 parents who had also previously been recorded in the ADS and IDS registers. Participants read 115 sentences containing phonemes in frequencies similar to those of typical adult‐directed speech (Sagisaka et al., [<reflink idref="bib81" id="ref84">81</reflink>] ). We extracted the words that were common to the read and the ADS subcorpora for each individual parent. We obtained between 19 and 32 words, each of them having between 2 and 49 occurrences. 
All of these word tokens were selected for subsequent analysis in the control ADS versus RS comparison.</p> <hd id="AN0130770052-12">Experimental corpus</hd> <p>All 22 participants had data in the IDS and ADS registers. For each participant, we selected the words that were common to the two registers. We obtained between 43 and 64 word types (individual numbers can be seen in the Appendix Table ). All of the word tokens for these types were selected for subsequent analyses in the experimental condition comparing ADS versus IDS. To maximize the reliability of the metrics, we did not match IDS and ADS on the number of tokens per type. Since ABX is an unbiased metric of discriminability, the size of a corpus will only modulate the standard error, not the average of the metric. Corpus size therefore cannot bias the discriminability score in IDS versus ADS; the ADS scores are simply noisier than the IDS scores because they are estimated from a smaller corpus. Matching the IDS corpus size to that of ADS would only increase the noise in the IDS scores. The total number of tokens per speaker is shown in Fig. .</p> <hd id="AN0130770052-13">Acoustic distance</hd> <p>The three acoustic measures that were computed, namely separation, variability, and discriminability (ABX<subs>score</subs>), all depend on a common core function which provides the measure of acoustic distance between two word tokens.</p> <p>As in Martin et al. 
([<reflink idref="bib58" id="ref85">58</reflink>] ), we represented word tokens using compressed Mel filterbanks, which corresponds to the first stage of an auditory model (Moore, [<reflink idref="bib68" id="ref86">68</reflink>] ; Schatz, [<reflink idref="bib84" id="ref87">84</reflink>] ).</p> <p>Specifically, the audio file of each token was converted into a sequence of auditory spectral frames sampled 100 times per second, obtained by running speech through a bank of 13 band‐pass filters centered on frequencies spread according to a Mel scale between 100 and 6855 Hz (Schatz et al., [<reflink idref="bib85" id="ref88">85</reflink>] ). The energy of the output of each of the 13 filters was computed and their dynamic range was compressed by applying a cubic root. In summary, word tokens were represented as sequences of frames, which are vectors with 13‐dimensions (i.e., 1 value per filter).</p> <p>The distance between a pair of tokens was computed as follows. First, the two tokens of interest were realigned in the time domain by performing dynamic time warping (DTW; Sakoe &amp; Chiba, [<reflink idref="bib83" id="ref89">83</reflink>] ): This algorithm searches the optimal alignment path between the sequences of frames of the two tokens that are being compared. The distance between two aligned frames being compared was set to be the angle between the two 13‐dimensional feature vectors representing said frames. Secondly, the average of the frame‐wise distances along the optimal alignment path was set as the distance between that pair of tokens.</p> <p>Each of the three measures was computed separately for each speaker, both for IDS and for ADS.</p> <hd id="AN0130770052-14">Discriminability</hd> <p>Discriminability calculations were performed as in Martin et al. ([<reflink idref="bib58" id="ref90">58</reflink>] ) by estimating the probability that two tokens within a category are less distant than two tokens in two different categories. 
This score is computed for each pair of word types, and then aggregated by averaging across all of these pairs (ABX<subs>score</subs>). The calculations were done using the ABXpy package available on https://github.com/bootphon/ABXpy.</p> <p>More specifically, for each pair of word types A and B, we compiled the list of all possible (a,b,x) triplets where a was a token of category A, b a token of category B and x a token of either A or B. For instance, for word types A = /nai/ and B = /aru/, there could be a triplet with tokens a = [nai]<subs>1</subs>, b = [aru]<subs>1</subs>, and x = [nai]<subs>2</subs>. The distance d(a, x) between tokens a and x was compared to the distance d(b, x) between tokens b and x. In this example, since both a and x are tokens of category A, we expect the acoustic distance between them to be smaller than their distance to a token belonging to a different category (i.e., token b of type B).</p> <p>As such, if d(a, x) &gt; d(b, x) (i.e., [nai]<subs>2</subs> more similar to [aru]<subs>1</subs> than to [nai]<subs>1</subs>), the response given by the algorithm was deemed to be incorrect and an ABX<subs>score</subs> of 0 was assigned to that specific triplet. On the other hand, if as expected d(a, x) &lt; d(b, x), the algorithm returned a response deemed as correct and a score of 1 was given to the triplet. A final mean ABX<subs>score</subs> for all triplets was then computed for each speaker, separately for IDS and ADS, only taking into account word pairs that were observed in both speech registers.</p> <hd id="AN0130770052-15">Separation</hd> <p>For each pair of word types, we computed the distance between their medoids. A medoid is defined as the word token which minimizes the average distance to all of the other tokens in that word type. In case of ties, we used a set of medoids, and their scores were averaged. 
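</p> <p>The discriminability and separation definitions above can be illustrated with a minimal Python sketch. It is not the ABXpy implementation: the function names, the toy one‐dimensional tokens, and the absolute‐difference distance toy_dist are assumptions made for illustration, standing in for the DTW‐based angular distance described under Acoustic distance. Scoring ties as 0.5 is also an assumption, since the text only specifies the outcome for strict inequalities.</p>

```python
from itertools import product
from statistics import mean


def toy_dist(u, v):
    """Hypothetical stand-in for the DTW-based angular distance between tokens."""
    return abs(u - v)


def abx_score(tokens_a, tokens_b, dist):
    """Mean correctness over all (a, b, x) triplets for one pair of word types.

    x is drawn from either type; the token of the same type as x must be a
    different token from x itself.
    """
    outcomes = []
    for same, other in ((tokens_a, tokens_b), (tokens_b, tokens_a)):
        for i, x in enumerate(same):
            for j, a in enumerate(same):
                if i == j:
                    continue  # never compare x with itself
                for b in other:
                    d_same, d_other = dist(a, x), dist(b, x)
                    if d_same < d_other:
                        outcomes.append(1.0)  # correct response
                    elif d_same > d_other:
                        outcomes.append(0.0)  # incorrect response
                    else:
                        outcomes.append(0.5)  # assumption: a tie counts as chance
    return mean(outcomes)


def medoids(tokens, dist):
    """Tokens minimising the average distance to all other tokens of the type.

    Returns a list so that ties yield a set of medoids.
    """
    if len(tokens) == 1:
        return list(tokens)
    avg = [
        mean(dist(t, u) for j, u in enumerate(tokens) if j != i)
        for i, t in enumerate(tokens)
    ]
    best = min(avg)
    return [t for t, v in zip(tokens, avg) if v == best]


def separation(tokens_a, tokens_b, dist):
    """Distance between medoids of two types, averaged over tied medoids."""
    pairs = product(medoids(tokens_a, dist), medoids(tokens_b, dist))
    return mean(dist(ma, mb) for ma, mb in pairs)


if __name__ == "__main__":
    far_a, far_b = [0.0, 1.0, 2.0], [5.0, 6.0]   # well separated toy types
    mix_a, mix_b = [0.0, 4.0], [3.0, 5.0]        # overlapping toy types
    print(abx_score(far_a, far_b, toy_dist))
    print(abx_score(mix_a, mix_b, toy_dist))
    print(separation(far_a, far_b, toy_dist))
```

<p>With these toy values, the well separated pair of types scores 1.0 and the overlapping pair scores 0.375, that is, below the chance level of 0.5. The second type of the separated pair has two tied medoids, so its separation from the first type is the average of the two medoid distances, 4.5.</p> <p>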
Separation can be viewed as a generalization of the notion of phonetic expansion, except that it applies to entire word forms instead of particular segments (e.g., vowels).</p> <hd id="AN0130770052-16">Variability</hd> <p>For each word type, variability was computed as the average distance between each token and every other token within the same word type. By definition, only word types with more than one token were included in the calculation. One can view this measure as analogous to the standard deviation in univariate distributions.</p> <hd id="AN0130770052-17">Results and discussion</hd> <p>Regarding the control condition, we compared the acoustic discriminability of the word types common to ADS and RS. We obtained an average ABX discriminability score per speaker per register (ADS or RS). A paired Student's t‐test revealed that words were significantly more discriminable in RS than in ADS (t(19) = 8.74; p &lt; .0001; Cohen's d = 2.68), with RS having an ABX<subs>score</subs> 0.09 points higher than ADS, on average (ABX<subs>score</subs> of 92% vs. 83%, respectively). As shown in Fig.  (panels D and H), all 20 parents showed this effect; individual scores can be found in the Appendix Table . In other words, on average the algorithm made twice as many errors classifying word tokens into categories in ADS compared to RS. This confirms that the ABX measure is able to capture the expected effects of read versus spontaneous speech on acoustic discriminability.</p> <p>Focusing on the experimental condition, for each of the three measures (discriminability, separation, variability), we computed an aggregate score across word types separately for each parent and register (individual scores can be found in the Appendix Table ). We then analyzed the effect of register by running a paired Student's t‐test across parents.</p> <p>The results are visually represented in Fig. . 
First, the analysis revealed a numerically small but statistically reliable degradation in acoustic discriminability of words in IDS compared to ADS (ABX<subs>score</subs> IDS: 80% vs. ADS: 84%; t(21) = −4.73; p &lt; .001; Cohen's d = −0.84). This is consistent with the degradation in discriminability previously observed at the level of individual phonemes (Martin et al., [<reflink idref="bib58" id="ref93">58</reflink>] ; McMurray et al., [<reflink idref="bib64" id="ref94">64</reflink>] ). Second, the trend for greater separation of word categories in IDS compared to ADS was not statistically significant (IDS: 0.47 rad vs. ADS: 0.46 rad; t(21) = 1.23; p &gt; .05; Cohen's d = 0.21). Finally, there was a reliable increase in variability in IDS relative to ADS (IDS: 0.38 rad vs. ADS: 0.35 rad; t(21) = 4.28; p &lt; .001; Cohen's d = 1.0). This increased variability is consistent with what has been observed at the level of individual phonemes (Cristia &amp; Seidl, [<reflink idref="bib11" id="ref97">11</reflink>] ; McMurray et al., [<reflink idref="bib64" id="ref98">64</reflink>] ).</p> <p>In sum, we found that word discrimination is more easily achieved in ADS than in IDS. This can be analyzed as being due to a large increase in variability in IDS which is not being compensated for by a necessary increase in separation. This is in contrast to predictions posited by the HLH, but consistent with previous work at the phonemic level (Martin et al., [<reflink idref="bib58" id="ref99">58</reflink>] ). In a way, this is not a totally surprising result, since by virtue of matching word types across registers, the effect of register on phoneme variability and discriminability is passed on to the level of words. What is new, however, is that the IDS register does not compensate for the phonetic variability by producing more canonical word forms. 
Next, we examine the content of the lexicon in the two registers.</p> <hd id="AN0130770052-18">Experiment 2: Phonological density</hd> <p>In this experiment, we focus on the phonological structure of the IDS and ADS lexicons. The core question is whether parents would select a set of words that are somewhat more ‘distinctive’ in IDS, yielding a sparser lexicon. Such a sparse lexicon could compensate for the increased phonetic variability measured in Experiment 1, thereby helping infants to cluster word forms into types.</p> <p>We use normalized edit distance (NED) as our main measure of the sparseness of the IDS and ADS lexicons. Normalized edit distance is defined as the proportion of changes (i.e., segmental additions, deletions, and substitutions) to be performed in order to transform one word into another. The smaller the edit distance between two words, the more structurally similar they are.</p> <p>NED takes into consideration not only phonological neighbors (i.e., words that differ by one phoneme), but also higher order neighbors when evaluating variation in the phonological structure of the lexicon in a psychologically relevant way. It is the direct phonological equivalent of the separation metric used in Experiment 1. Indeed, both metrics measure the average distance between word categories: separation measures acoustic distance, while NED measures phonological distance. Experiment 1 showed that parents do not reliably expand the acoustic space when using IDS; Experiment 2 asks: Are they expanding the phonological space when using this register?</p> <p>Before moving on to the analysis, we point out that mean NED may vary with lexicon size. Indeed, as more and more words are added to a lexicon, changes in the neighborhood structure are to be expected. Typically, short words tend to have denser neighborhoods as the lexicon size increases (as the combinatorial possibilities for constructing distinct short words quickly saturate). 
At the same time, the ratio between short and long words tends to decrease with lexicon size, because most new additions in a lexicon tend to be long, and long words tend to have sparser neighborhoods than short words. In order to limit the influence of such properties on our results, IDS and ADS corpora were matched in lexicon size before any comparison was performed.</p> <hd id="AN0130770052-19">Methods</hd> <hd id="AN0130770052-20">Sampling</hd> <p>As can be seen in Table , the volume of data available for the two speaking registers in the base corpus was imbalanced; the IDS subset of the corpus contains more words (types and tokens) than its ADS counterpart. In order to account for this mismatch, we performed a frequency‐dependent sampling of word types that matched their number in both speech registers. Types which were more frequently uttered by a speaker had a higher probability of being included in a sample than rarer ones. Moreover, since the measurement used in this section heavily relies on the nature of the words sampled, and as a way to increase estimation reliability, sampling was performed 100 times per speaker per register. For instance, if a speaker uttered 82 word types in ADS and 237 in IDS, we created 100 subsets of the IDS lexicon by sampling 82 types from the 237 available 100 times. The final metric for said speaker in a given speech register was the mean NED obtained from the corresponding 100 samples. On average, a sample contained 179.64 ± 49 word types (see Table  of the Appendix for more information).</p> <hd id="AN0130770052-21">Normalized edit distance</hd> <p>For each parent, within each speech register, we computed the edit distance (ED) between every possible pair of types in the sampled lexicons. ED, also called the Levenshtein distance, is defined as the minimal number of additions, deletions or substitutions needed to transform one string into another. 
It is computed using an algorithm very similar to the Dynamic Time Warping (DTW) algorithm used in Experiment 1; the algorithm finds a path that minimizes the total number of edits (insertions, deletions and substitutions, all of them equally weighted). The maximal number of changes max(x, y) is defined as the maximum length of the two types X and Y under comparison. Normalized edit distances (NEDs) were therefore derived as follows: \(\mathrm{NED}(X, Y) = \mathrm{ED}(X, Y) / \max(x, y)\), where x and y correspond to the phonemic lengths of two distinct words X and Y. For instance, the ED between ‘tall’ /tɔl/ and ‘ball’ /bɔl/ is 1 (one substitution: /t/ ⇒ /b/). Both words are 3 phonemes long, so max(x, y) = 3. Therefore, the NED between these types is \(1/3 \approx 0.33\). The more structurally similar two types are, the closer their NED will be to zero.</p> <hd id="AN0130770052-22">Results and discussion</hd> <p>The distribution of the difference in mean NEDs for IDS and ADS across parents is shown on panels A and C of Fig. . Individual scores can be found in the Appendix Table . A paired Student's t‐test showed a systematic pattern of larger normalized edit distances in IDS than ADS (IDS: 0.877 vs. ADS: 0.871; t(21) = 5.00; p &lt; .0001; Cohen's d = 1.38). This difference shows that, overall, the IDS lexicon contains words that are phonologically more distinctive than those in the ADS lexicon. In hindsight, a difference of this sort may have been expected as IDS has been found to contain “babytalk” or infant‐directed vocabulary, that is, a special vocabulary which includes onomatopoeias and phonological reduplications (Ferguson, [<reflink idref="bib22" id="ref101">22</reflink>] ; Fernald &amp; Morikawa, [<reflink idref="bib24" id="ref102">24</reflink>] ). 
This hypothesis was verified in our dataset; we found that onomatopoeias and mimetic words (hereafter referred to solely as “onomatopoeias”) constituted approximately 30% of the average sample of IDS word types used in this experiment, whereas they represented less than 2% of an average ADS sample (cf. Appendix Table ), this latter frequency being consistent with the use of mimetic words in Japanese observed in previous work (Saji &amp; Imai, [<reflink idref="bib82" id="ref103">82</reflink>] ).</p> <p>In order to study the effect of onomatopoeias on phonological discriminability, we performed a post hoc analysis by resampling words after removing all onomatopoeias from the base corpus. We then re‐computed the mean NED for ADS and IDS. Individual scores can be found in the right side of the Appendix Table . A paired Student's t‐test revealed that the previously noted difference between IDS and ADS mean NED scores was no longer significant after onomatopoeia removal (IDS: 0.872 vs. ADS: 0.870; t(21) = 1.14; p &gt; .05; Cohen's d = 0.31, visual representation on panels B and D of Fig. ). Therefore, the IDS lexicon was found to be globally sparser than the ADS lexicon, and this effect seems to be principally driven by the unequal presence of onomatopoetic sounds in both speech registers.</p> <p>Infant‐directed words may facilitate lexical development not only by decreasing the overall phonological density of the lexicon, which directly impacts the clustering subproblem detailed in the introduction, but also in virtue of other intrinsic learning properties that would be relevant to a more complete model of early word learning. In the introduction, we focused on the three key word learning subproblems of segmentation, word clustering, and phonetic categorization. 
At this point, it is imperative to point out that there are other factors that impact word learning in infancy above and beyond these particular processes.</p> <p>When asked about vocabulary specifically used when addressing infants, Japanese women report a set of words of which 40% of the items are sound‐symbolic (Mazuka et al., [<reflink idref="bib62" id="ref105">62</reflink>] ). An iconic relationship between an acoustic form and the semantics of the referent (Imai &amp; Kita, [<reflink idref="bib33" id="ref106">33</reflink>] ) has been shown to help 14‐month‐old infants find a word's referent (Miyazaki et al., [<reflink idref="bib66" id="ref107">66</reflink>] ), and it also helps pre‐school children identify the specific features of the action that a verbal word form refers to (Imai, Kita, Nagumo, &amp; Okada, [<reflink idref="bib34" id="ref108">34</reflink>] ; Kantartzis, Imai, &amp; Kita, [<reflink idref="bib38" id="ref109">38</reflink>] ). Additionally, around 65% of the reported items contain reduplication of phonological patterns (Mazuka et al., [<reflink idref="bib62" id="ref110">62</reflink>] ), which may impact learning at a range of levels. Repetitive patterns may be more salient and generalizable than other equally complex patterns (Endress, Dehaene‐Lambertz, &amp; Mehler, [<reflink idref="bib17" id="ref111">17</reflink>] ; Endress, Nespor, &amp; Mehler, [<reflink idref="bib18" id="ref112">18</reflink>] ), and this salience could facilitate lexical acquisition in infants. This is supported by recent data showing that 9‐month‐old English‐learning infants segment words containing reduplications (e.g., neenee) from running speech more easily than words without reduplications (e.g., neefoo) (Ota &amp; Skarabela, [<reflink idref="bib73" id="ref113">73</reflink>] ). 
Furthermore, English‐learning 18‐month‐old infants appear to better learn novel object labels when these contain reduplications (Ota &amp; Skarabela, [<reflink idref="bib72" id="ref114">72</reflink>] ). In fact, reduplication has been found to be a characteristic shared by many items from the specialized set of “babytalk” words in various languages (Ferguson, [<reflink idref="bib22" id="ref115">22</reflink>] ), in spite of the tendency to avoid such repetitive patterns in adult language (Leben, [<reflink idref="bib48" id="ref116">48</reflink>] ).</p> <p>Similarly to what was observed in the survey by Mazuka et al. ([<reflink idref="bib62" id="ref117">62</reflink>] ), the majority of the word types tagged as onomatopoeias in our IDS corpus (i.e., around 30% of the types) present reduplication and/or sound symbolism (e.g., /waNwaN/ dog; /korokoro/ light object rolling repeatedly). Since infants seem to have a learning bias for words with these phonological characteristics, the higher proportion of onomatopoeias in IDS compared to ADS may provide an additional anchor for infant word learning.</p> <p>As a reviewer pointed out, it may seem counterintuitive at first to focus on the enhanced learnability of IDS‐specific words, since children are expected to eventually master all words, whether they are specific to IDS or present in both IDS and ADS. However, we are not concerned here with all of language acquisition, but only with the possibility that top‐down cues affecting sound category learning are more helpful in IDS compared to ADS. Thus, even if the words that are learned are not part of a general target lexicon, they might nonetheless present an easier word clustering subproblem, and in that way lead to a lexicon that can be used as seed for subsequent sound category extraction routines.</p> <p>In sum, we have found that IDS contains a higher proportion of onomatopoeias and mimetic words than ADS. 
Aside from their remarkable distinctiveness and salience, these items seem to contribute to decreasing the global density of the IDS lexicon. While words in IDS seem to be more spread in phonological space than words in ADS, phoneme‐like representations may not yet be available to infants until a larger vocabulary is amassed (Beckman, Munson, &amp; Edwards, [<reflink idref="bib5" id="ref118">5</reflink>] ; Lindblom, [<reflink idref="bib50" id="ref119">50</reflink>] ; Metsala &amp; Walley, [<reflink idref="bib65" id="ref120">65</reflink>] ; Pierrehumbert, [<reflink idref="bib76" id="ref121">76</reflink>] ). As such, one may wonder if, similarly, words may be more distant in the acoustic space when taking the structural differences into account. Indeed, we notice that the effect size is almost twice as large in absolute value for the phonological NED (Cohen's d = 1.38) as for the acoustic discriminability (Cohen's d = −0.84). However, given that they are not based on exactly the same tokens, it remains possible that the phonological advantage does not compensate for the acoustic disadvantage. Indeed, the difference in mean NED between IDS and ADS, while statistically significant, is numerically very small, representing a difference of less than one percent of a word. The following experiment examines the question of the effect of phonological structure on acoustic discriminability, by integrating both factors in one global discriminability measure.</p> <hd id="AN0130770052-23">Experiment 3: Net discriminability</hd> <p>In Experiment 1, we found that when we looked at the exact same word types in both registers, the IDS tokens were acoustically more confusable than the ADS tokens, due to the increased variability in IDS word categories in the acoustic space. In other words, when removing the influence of structural peculiarities of the lexicons, IDS does not present an advantage over ADS in acoustic discriminability. 
We then saw in Experiment 2 that the lexicons of IDS and ADS differed structurally. Words from the IDS lexicon were phonologically more distinct than those in the ADS lexicon, in part due to onomatopoeias and mimetic words.</p> <p>Here, we put these two previous results together and ask the following question: When accounting for register‐specific lexical structure, is the IDS lexicon acoustically clearer than the ADS lexicon? In other words, if we take a random pair of word tokens from two different word types found in the IDS recordings, are these tokens more or less acoustically distinct than a like‐built pair in the ADS recordings?</p> <hd id="AN0130770052-24">Method</hd> <hd id="AN0130770052-25">Sampling</hd> <p>In order to observe the combined effects of the differences in phonological structure on acoustic discriminability, the same sampled lexicons used for Experiment 2 were used for this section, that is, 100 lexicon subsets per register per speaker, matched in number of word types across speech registers.</p> <p>As in Experiment 1, the number of tokens per type was not matched, in order to maximize the reliability of the ABX metric. Individual numbers of types can be seen in Table  of the Appendix, with the total number of tokens shown in Fig. .</p> <hd id="AN0130770052-26">Computing acoustic discriminability</hd> <p>Acoustic discriminability was computed as described in Experiment 1. A mean ABX score was computed per sampled lexicon subset. ABX scores were collapsed by computing the mean ABX score per speaker per register.</p> <hd id="AN0130770052-27">Results and Discussion</hd> <p>We compared the mean ABX scores for ADS and IDS obtained on the sampled lexicons used in Experiment 2 (Fig. ). Individual scores can be found in the Appendix Table . A paired Student's t‐test revealed that mean ABX<subs>score</subs> values were significantly larger for ADS than for IDS, whether onomatopoeias were included in the lexicon subsets (ABX<subs>score</subs> IDS: 86% vs. 
ADS: 87%; t(21) = −2.37, p &lt; .05; Cohen's d = −0.41) or not (ABX<subs>score</subs> IDS: 85% vs. ADS: 87%; t(21) = −2.57, p &lt; .05; Cohen's d = −0.43). As such, similar to what was found in Experiment 1, words are less discriminable in IDS than in ADS even after taking into account the phonological specificities of the infant‐directed lexicon.</p> <p>This result underlines the importance of assessing effects of language acquisition enhancers not only in terms of their statistical significance across parents (p values, Cohen's d), but also quantitatively, that is, in terms of their numerical strength when combined together. To see this more clearly, we computed the increase or decrease in the score under study as a percentage relative to the ADS score taken as a baseline.</p> <p>In Experiment 1, the decrement in discriminability in IDS was 4% relative to ADS, and this effect was robust across participants (Cohen's d = −0.84). In Experiment 2, the increase in NED represented a numerically smaller effect of less than 1% for IDS relative to ADS. This effect was actually even more robust across participants (Cohen's d = 1.38). Interestingly, when the two effects are combined (Experiment 3), the outcome is not determined by which effect was more statistically robust across participants, but by which one was numerically larger. Indeed, the outcome yields a numerically small (1% relative) decrement in discriminability, which is also much weaker across participants (Cohen's d = −0.41).</p> <hd id="AN0130770052-28">General discussion</hd> <p>The Hyper Learnability Hypothesis (HLH) states that when talking to their infants, parents modify the linguistic properties of their speech in order to facilitate the learning process. 
In this paper, we focused on the learning of phonetic categories and reviewed two classes of theories in order to quantitatively assess the HLH: (a) bottom‐up theories assume that phonetic categories emerge through the unsupervised clustering of acoustic information, (b) top‐down theories assume that phonetic categories emerge through contrastive feedback from learned word types. Previous work has already addressed bottom‐up theories: Martin et al. ([<reflink idref="bib58" id="ref124">58</reflink>] ) examined phonemes in a corpus of Japanese laboratory recordings and found that phonemes produced by caregivers addressing their 18‐ to 24‐month old infants were less discriminable than ADS phonemes. This rules out the HLH for that corpus and bottom‐up theories. In this study, we focused on top‐down theories using the same corpus and investigated the acoustic discriminability of word types.</p> <p>In Experiment 1, we compared the acoustic discriminability of words that are common to both speech registers, and found that words are less discriminable in IDS than in ADS (an absolute decrease in ABX<subs>score</subs> of 4%), likely because of increased within‐category variability. This result parallels the increase in phonetic variability found in previous studies (Cristia &amp; Seidl, [<reflink idref="bib11" id="ref125">11</reflink>] ; Kirchhoff &amp; Schimmel, [<reflink idref="bib40" id="ref126">40</reflink>] ; McMurray et al., [<reflink idref="bib64" id="ref127">64</reflink>] ), and it is consistent with the decreased phoneme discriminability measured by Martin et al. ([<reflink idref="bib58" id="ref128">58</reflink>] ). It is not consistent, however, with the claim that words in IDS are uttered in a more canonical way than in ADS (Dilley et al., [<reflink idref="bib15" id="ref129">15</reflink>] ; but see Fais et al., [<reflink idref="bib20" id="ref130">20</reflink>] ; Lahey &amp; Ernestus, [<reflink idref="bib45" id="ref131">45</reflink>] ). 
In Experiment 2, we turned to the structure of the phonological lexicon. We found that the IDS lexicon was globally more spread out than that of ADS, as shown by a larger normalized edit distance between words for the former. Interestingly, this effect was attributable mostly to a higher prevalence of onomatopoeias and mimetic words in IDS. These words have idiosyncratic phonological properties, such as reduplications, which are likely responsible for the increase in global distinctiveness found in the IDS lexicon, compared to the ADS lexicon. In Experiment 3, a final analysis measured the net effect of the opposite trends found in Experiments 1 and 2, and found that, on average, words were still less acoustically discriminable in IDS than in ADS, although the effect was now considerably reduced (an absolute decrease in ABX<subs>score</subs> of 1%).</p> <p>Overall, then, the word form clustering subproblem is not easier to solve by using IDS input than with ADS input; quite to the contrary, there is a numerically small but consistent trend in the opposite direction. Does this undermine the HLH for top‐down theories of phonetic learning as a whole? Clearly, the answer is “no,” since – as explained in the Introduction – HLH actually encompasses two other learning subproblems (cf. Fig. ). We discuss relevant evidence on IDS‐ADS differences bearing on each subproblem in turn.</p> <p>Regarding the problem of finding word token boundaries, Ludusan and colleagues have started studying word form segmentation using either raw acoustics or text‐like phonological representations as input. Ludusan, Seidl, Dupoux, and Cristia ([<reflink idref="bib55" id="ref132">55</reflink>] ) studied the performance of acoustic word form discovery systems on a corpus of American English addressed to 4‐ or 11‐month‐olds versus adults. 
The overall results are similar to those of Experiment 3; that is, the two registers give similar outcomes, if anything, with a very small difference in favor of ADS, rather than the expected IDS. Computational models of word segmentation from running speech represented via acoustics are, however, well‐known to underperform compared to models that represent speech via textual representations (Versteegh et  al., [<reflink idref="bib93" id="ref133">93</reflink>] ). Thus, in Ludusan, Mazuka, Bernard, Cristia, and Dupoux ([<reflink idref="bib54" id="ref134">54</reflink>] ), we studied word form segmentation from text‐like representations using the same RIKEN corpus as input, and a selection of state‐of‐the‐art cognitively based models of infant word segmentation. Results showed an advantage of IDS over ADS for most algorithms and settings.</p> <p>Beyond the question of whether segmentation is easier in IDS versus ADS, we cannot move on to the next learning subproblem without pointing out that, for future work to assess the net effect of register on word segmentation, one would need to know more about the size and composition of infants’ early lexicon. In fact, most accounts propose that the phonological system is extracted from the long‐term lexicon, rather than on the fly from experience with the running spoken input (discussed in Bergmann, Tsuji, &amp; Cristia, [<reflink idref="bib8" id="ref135">8</reflink>] ). In the present paper, we have done a systematic study of word discriminability across the whole set of words present in the corpus, as if infants could segment the corpus exactly as adults do. This is, of course, unlikely. 
In fact, recent evidence suggests that infants may be using a suboptimal segmentation algorithm (Larsen, Dupoux, &amp; Cristia, [<reflink idref="bib47" id="ref136">47</reflink>] ), which leads them to accumulate a “protolexicon” containing not only words, but also over‐ or under‐segmented tokens that do not belong to the adult‐like lexicon (Ngon et al., [<reflink idref="bib71" id="ref137">71</reflink>] ). Such protowords can nonetheless help with contrastive learning (Fourtassi &amp; Dupoux, [<reflink idref="bib26" id="ref138">26</reflink>] ; Martin, Peperkamp, &amp; Dupoux, [<reflink idref="bib57" id="ref139">57</reflink>] ).</p> <p>Regarding contrastive learning of phonetic categories, it is too early to know whether the net effect of register will be beneficial or detrimental. For instance, a detrimental effect of phonetic variability in a bottom‐up setting can become beneficial in a top‐down setting, by presenting infants with more varied input, and therefore preparing them for future between‐speaker variability. This is illustrated in the supervised learning of phonetic categories in adults (Lively, Logan, &amp; Pisoni, [<reflink idref="bib52" id="ref140">52</reflink>] ). However, as suggested by Rost and McMurray ([<reflink idref="bib79" id="ref141">79</reflink>] ), variability should be limited to acoustic cues that are not relevant to phonetic contrasts in order to promote learning. In order to fully assess the net effect of register, two important elements have to be clarified. First, one would need to have a fully specified model of contrastive learning itself. 
Candidate computational models have been proposed (e.g., Feldman et al., [<reflink idref="bib21" id="ref142">21</reflink>] ; Fourtassi &amp; Dupoux, [<reflink idref="bib26" id="ref143">26</reflink>] ), but have not been fully validated with realistic infant‐directed speech corpora (but see Versteegh et al., [<reflink idref="bib93" id="ref144">93</reflink>] , for an application to ADS corpora).</p> <p>Throughout the above discussion, an important take‐home message is that it is essential to posit well‐defined, testable theories of infant learning, which can be evaluated using quantitative measures, even when fully specified computational models are not yet available. Individual studies focus on only a few pieces of the puzzle, and the magnitude of each evaluated effect must be considered relative to other effects. For instance, in our study, even the relatively large effect of IDS versus ADS on the discriminability of word forms found in Experiment 1 has to be compared to the much larger effect (by a factor of 2) of read versus spontaneous speech found within the ADS register. What we propose as a methodology is to break down theories of language acquisition into component parts, and to derive proxy measures for each component in order to gain a more systematic grasp of the quantitative effects of register. Before closing, we would like to discuss two limitations of this study, one regarding the corpus and the other regarding the theory tested (the HLH).</p> <p>The main limitation of the RIKEN corpus is that it was recorded in the laboratory and did not include naturalistic interactions between adults as they may occur in the home environment. The presence of an experimenter and props (toys, etc.) in the laboratory setting may induce some degree of non‐naturalness in the interaction, both with the infant and with the adult. 
Johnson, Lahey, Ernestus, and Cutler ([<reflink idref="bib35" id="ref145">35</reflink>] ) found that in Dutch, ADS is not a homogeneous register, and that it bears similarities with IDS when the addressed adult is familiar as opposed to unfamiliar.[<reflink idref="bib3" id="ref146">3</reflink>] It remains to be assessed whether similar results are obtained in more ecological and representative IDS and ADS samples. In addition, this study is limited by the relatively small size of the corpus. Because we analyzed each parent separately, the size of the analyzed lexicons was between 82 and 260 words, which may under‐represent the range of words heard in a home setting. Finally, our analysis is limited to Japanese. There is evidence that vowel hyperarticulation varies across languages (Benders, [<reflink idref="bib6" id="ref147">6</reflink>] ; Englund &amp; Behne, [<reflink idref="bib19" id="ref148">19</reflink>] ; Kuhl et al., [<reflink idref="bib44" id="ref149">44</reflink>] ), and more generally that the specifics of the IDS register vary across cultures (e.g., Fernald &amp; Morikawa, [<reflink idref="bib24" id="ref150">24</reflink>] ; Igarashi et al., [<reflink idref="bib32" id="ref151">32</reflink>] ). It would therefore be important to replicate our methods in more ecological, cross‐linguistic corpora. Fortunately, the availability of wearable recording systems such as the LENA© device (Greenwood, Thiemann‐Bourque, Walker, Buzhardt, &amp; Gilkerson, [<reflink idref="bib28" id="ref152">28</reflink>] ) increases the prospects of automating the collection and analysis of naturalistic speech (Soderstrom &amp; Wittebolle, [<reflink idref="bib87" id="ref153">87</reflink>] ).</p> <p>The second limitation of this study is that we restricted our quantitative analysis to the testing of the HLH. However, the HLH is not the only hypothesis that can be addressed. 
Other theories have been proposed regarding the etiology and role of IDS in the linguistic development of infants (i.e., why caregivers use it, and what its actual effects on the child are). Some modifications of the input may indeed have pedagogical functions (enhancing learnability), while other modifications may decrease learnability while increasing some other factor in the parent–infant interaction. For instance, it has been documented that mothers sometimes violate the grammar of their language when teaching new words, probably in order to place the novel word in a sentence‐final position (Aslin, Woodward, LaMendola, &amp; Bever, [<reflink idref="bib3" id="ref154">3</reflink>] ), which is salient because of properties of short‐term memory. Similarly, it has sometimes been suggested that caregivers inadvertently sacrifice phonetic precision in order to make infants more comfortable and/or more receptive to the input (Papoušek &amp; Hwang, [<reflink idref="bib74" id="ref155">74</reflink>] ; Reilly &amp; Bellugi, [<reflink idref="bib77" id="ref156">77</reflink>] ). Increased phonetic variability in IDS at the phonemic level may stem from a slower speaking rate (McMurray et al., [<reflink idref="bib64" id="ref157">64</reflink>] ), or from exaggerated prosodic variations (Fernald et al., [<reflink idref="bib25" id="ref158">25</reflink>] ; Martin et al., [<reflink idref="bib56" id="ref159">56</reflink>] ; Soderstrom, [<reflink idref="bib86" id="ref160">86</reflink>] ), or possibly from gestural modifications that convey positive affect, such as smiling (Benders, [<reflink idref="bib6" id="ref161">6</reflink>] ), increased breathiness (Miyazawa et al., [<reflink idref="bib67" id="ref162">67</reflink>] ) or even a vocal tract that is shortened to resemble the child's own (Kalashnikova, Carignan, &amp; Burnham, [<reflink idref="bib37" id="ref163">37</reflink>] ). According to a study by Trueswell et al. 
([<reflink idref="bib88" id="ref164">88</reflink>] ), successful word learning interactions tend to be those in which actions performed by both caregivers and infants are precisely synchronized, with time‐locking of gaze, speech and gestures. By focusing on efficiently capturing the infant's attention, caregivers could create an optimal learning environment, in spite of a potential degradation of lexical acoustic clarity. A similar interpretation is held by authors such as Csibra and Gergely ([<reflink idref="bib12" id="ref165">12</reflink>] ), who argue that one of the main roles of IDS is to inform the infant that speech is being directed to her, thus highlighting the pedagogical nature of the interaction as a whole. In this view, the goal of caregivers would not be to provide clearer input, but to make language interactions and their attached learning situations more exciting and attractive to infants.</p> <p>An entirely different direction is to propose that IDS may help infants to produce language. Ferguson ([<reflink idref="bib22" id="ref166">22</reflink>] ) describes “babytalk” as a subset of words that are phonologically simplified through reduced consonant clusters, use of coronals instead of velars, word shortening, etc. These adaptations would make it easier for developing infants to imitate the words, and/or they may be inspired by previous generations’ production errors. In fact, previous work performed on our corpus shows that, if anything, the structural properties of words in our IDS sample better fit early patterns of Japanese infant speech production than those of words in ADS (Tsuji et al., [<reflink idref="bib89" id="ref167">89</reflink>] ). 
While the causal relationship between babytalk use and infant word production should be further assessed experimentally, the phonological properties of our IDS corpus suggest that, to some extent, parental input may be encouraging infant word production.</p> <p>In brief, while the HLH focuses on the change in informational content of IDS which may boost (or hinder) the learnability of particular linguistic structures, IDS could have a beneficial effect on completely different grounds: enhancing overall attention or positive emotions which would increase depth of processing and retention, or facilitating production, thereby counteracting the inadvertent acoustic degradation of local units of speech such as words and phonemes. For these alternatives to the HLH to be testable within our quantitative approach, we would need to formulate them with enough precision that they can either be implemented or yield proxies for analyzing realistic corpora of caregiver–infant interactions.</p> <p>To conclude, over the last 50 years we have learned a great deal about how IDS and ADS differ, yet much remains to be understood. We believe it is crucial in this quest to bear in mind a detailed model of early language acquisition, and to submit predictions of this model to systematic, quantitative tests.</p> <hd id="AN0130770052-29">Acknowledgments</hd> <p>This work was supported by the European Research Council (Grant ERC‐2011‐AdG‐295810 BOOTPHON), the Agence Nationale de la Recherche (Grants ANR‐2010‐BLAN‐1901‐1 BOOTLANG, ANR‐14‐CE30‐0003 MechELex, ANR‐10‐IDEX‐0001‐02 PSL, and ANR‐10‐LABX‐0087 IEC), the James S. McDonnell Foundation, the Fondation de France, the Japan Society for the Promotion of Science (Kakenhi Grant 24520446, to A. Martin), and the Canon Foundation in Europe. We thank Bob McMurray and two anonymous reviewers for helpful feedback.</p> <hd id="AN0130770052-30">Author contributions</hd> <p>R. Mazuka oversaw the collection and coding of the corpus. 
A. Martin wrote the algorithms for extracting words and their phonological structure. R. Thiollière provided coding support with the ABX task. A. Cristia directed the literature review. B. Ludusan assisted with preparation of the ADS‐RS comparison. A. Guevara‐Rukoz and E. Dupoux carried out the acoustical and phonological analyses and, along with A. Cristia, produced the first draft. All authors contributed to the writing of this manuscript.</p> <ref id="AN0130770052-31"> <title>Notes</title> <blist> <bibl id="bib1" idref="ref35" type="bt">1</bibl> <bibtext>Schatz (2016) has shown that an ABX score of 1 between categories A and B implies that the two categories can be discovered without error by the clustering algorithm k‐means. </bibtext> </blist> <blist> <bibl id="bib2" idref="ref32" type="bt">2</bibl> <bibtext>In a study by Fernald and Morikawa (1993), Japanese mothers used onomatopoetic words more readily than American mothers. </bibtext> </blist> <blist> <bibl id="bib3" idref="ref146" type="bt">3</bibl> <bibtext>In addition to these effects, Japanese and many other languages have a set of specialized morphemes that depend on familiarity between the talkers; this could have artificially increased the difference between IDS and ADS in the present corpus. </bibtext> </blist> </ref> <ref id="AN0130770052-32"> <title>References</title> <blist> <bibtext>Andruski, J. E., Kuhl, P. K., &amp; Hayashi, A. (1999). The acoustics of vowels in Japanese women's speech to infants and adults. In J. J. Ohala, Y. Hasegawa, M. Ohala, D. Granville &amp; A. C. Bailey (Eds.), Proceedings of the 14th International Congress on Phonetic Sciences (Vol. 3, pp. 2177–2179). San Francisco, CA. </bibtext> </blist> <blist> <bibtext>Antetomaso, S., Miyazawa, K., Feldman, N., Elsner, M., Hitczenko, K., &amp; Mazuka, R. (2016). Modeling phonetic category learning from natural acoustic data. In M. LaMendola &amp; J. 
Scott (Eds.), Proceedings of the 41st Annual Boston University Conference on Language Development (pp. 32–45). Somerville, MA: Cascadilla Press. </bibtext> </blist> <blist> <bibtext>Aslin, R. N., Woodward, J. Z., LaMendola, N. P., &amp; Bever, T. G. (1996). Models of word segmentation in fluent maternal speech to infants. In J. L. Morgan &amp; K. Demuth (Eds.), Signal to syntax: Bootstrapping from speech to grammar in early acquisition (pp. 117–134). Hillsdale, NJ: Erlbaum. </bibtext> </blist> <blist> <bibl id="bib4" idref="ref53" type="bt">4</bibl> <bibtext>Beckman, M. E., &amp; Edwards, J. (2000). The ontogeny of phonological categories and the primacy of lexical learning in linguistic development. Child Development, 71(1), 240–249. </bibtext> </blist> <blist> <bibl id="bib5" idref="ref118" type="bt">5</bibl> <bibtext>Beckman, M. E., Munson, B., &amp; Edwards, J. (2007). Vocabulary growth and the developmental expansion of types of phonological knowledge. Laboratory Phonology, 9, 241–264. </bibtext> </blist> <blist> <bibl id="bib6" idref="ref42" type="bt">6</bibl> <bibtext>Benders, T. (2013). Mommy is only happy! Dutch mothers’ realisation of speech sounds in infant‐directed speech expresses emotion, not didactic intent. Infant Behavior and Development, 36(4), 847–862. </bibtext> </blist> <blist> <bibl id="bib7" idref="ref15" type="bt">7</bibl> <bibtext>Bergmann, C., Cristia, A., &amp; Dupoux, E. (2016). Discriminability of sound contrasts in the face of speaker variation quantified. In A. Papafragou, D. Grodner, D. Mirman &amp; J. C. Trueswell (Eds.), Proceedings of the 38th Annual Conference of the Cognitive Science Society (pp. 1331–1336). Austin, TX: Cognitive Science Society. </bibtext> </blist> <blist> <bibl id="bib8" idref="ref135" type="bt">8</bibl> <bibtext>Bergmann, C., Tsuji, S., &amp; Cristia, A. (2017). Top‐down versus bottom‐up theories of phonological acquisition: A big data approach. In Proceedings of Interspeech 2017 (pp. 2103–2107). 
Available at https://osf.io/vypwu/ https://doi.org/10.21437/interspeech.2017-1443. </bibtext> </blist> <blist> <bibl id="bib9" idref="ref36" type="bt">9</bibl> <bibtext>Bernstein Ratner, N. (1984). Patterns of vowel modification in mother–child speech. Journal of Child Language, 11(03), 557–578. </bibtext> </blist> <blist> <bibl id="bib10" idref="ref37" type="bt">10</bibl> <bibtext>Burnham, D., Kitamura, C., &amp; Vollmer‐Conna, U. (2002). What's new, pussycat? On talking to babies and animals. Science, 296(5572), 1435. </bibtext> </blist> <blist> <bibl id="bib11" idref="ref38" type="bt">11</bibl> <bibtext>Cristia, A., &amp; Seidl, A. (2014). The hyperarticulation hypothesis of infant‐directed speech. Journal of Child Language, 41, 913–934. </bibtext> </blist> <blist> <bibl id="bib12" idref="ref165" type="bt">12</bibl> <bibtext>Csibra, G., &amp; Gergely, G. (2006). Social learning and social cognition: The case for pedagogy. Processes of Change in Brain and Cognitive Development. Attention and Performance XXI, 21, 249–274. </bibtext> </blist> <blist> <bibl id="bib13" idref="ref60" type="bt">13</bibl> <bibtext>Dautriche, I., Mahowald, K., Gibson, E., Christophe, A., &amp; Piantadosi, S. T. (2017). Words cluster phonetically beyond phonotactic regularities. Cognition, 163, 128–145. </bibtext> </blist> <blist> <bibl id="bib14" idref="ref24" type="bt">14</bibl> <bibtext>De Boer, B., &amp; Kuhl, P. K. (2003). Investigating the role of infant‐directed speech with a computer model. Acoustics Research Letters Online, 4(4), 129–134. </bibtext> </blist> <blist> <bibl id="bib15" idref="ref57" type="bt">15</bibl> <bibtext>Dilley, L. C., Millett, A. L., McAuley, J. D., &amp; Bergeson, T. R. (2014). Phonetic variation in consonants in infant‐directed and adult‐directed speech: The case of regressive place assimilation in word‐final alveolar stops. Journal of Child Language, 41(01), 155–175. 
</bibtext> </blist> <blist> <bibl id="bib16" idref="ref20" type="bt">16</bibl> <bibtext>Dupoux, E. (2016). Cognitive science in the era of artificial intelligence: A roadmap for reverse‐engineering the infant language‐learner. Cognition, 173, 43–59. </bibtext> </blist> <blist> <bibl id="bib17" idref="ref111" type="bt">17</bibl> <bibtext>Endress, A. D., Dehaene‐Lambertz, G., &amp; Mehler, J. (2007). Perceptual constraints and the learnability of simple grammars. Cognition, 105(3), 577–614. </bibtext> </blist> <blist> <bibl id="bib18" idref="ref112" type="bt">18</bibl> <bibtext>Endress, A. D., Nespor, M., &amp; Mehler, J. (2009). Perceptual and memory constraints on language acquisition. Trends in Cognitive Sciences, 13(8), 348–353. </bibtext> </blist> <blist> <bibl id="bib19" idref="ref10" type="bt">19</bibl> <bibtext>Englund, K. T., &amp; Behne, D. M. (2005). Infant directed speech in natural interaction — Norwegian vowel quantity and quality. Journal of Psycholinguistic Research, 34(3), 259–280. </bibtext> </blist> <blist> <bibl id="bib20" idref="ref58" type="bt">20</bibl> <bibtext>Fais, L., Kajikawa, S., Amano, S., &amp; Werker, J. F. (2010). Now you hear it, now you don't: Vowel devoicing in Japanese infant‐directed speech. Journal of Child Language, 37(02), 319–340. </bibtext> </blist> <blist> <bibl id="bib21" idref="ref54" type="bt">21</bibl> <bibtext>Feldman, N. H., Griffiths, T. L., &amp; Morgan, J. L. (2009). Learning phonetic categories by learning a lexicon. In N. Taatgen &amp; H. van Rijn (Eds.), Proceedings of the 31st Annual Conference of the Cognitive Science Society (pp. 2208–2213). </bibtext> </blist> <blist> <bibl id="bib22" idref="ref64" type="bt">22</bibl> <bibtext>Ferguson, C. A. (1964). Baby talk in six languages. American Anthropologist, 66, 103–114. </bibtext> </blist> <blist> <bibl id="bib23" idref="ref1" type="bt">23</bibl> <bibtext>Fernald, A. (2000). Speech to infants as hyperspeech: Knowledge‐driven processes in early word recognition. 
Phonetica, 57(2–4), 242–254. </bibtext> </blist> <blist> <bibl id="bib24" idref="ref65" type="bt">24</bibl> <bibtext>Fernald, A., &amp; Morikawa, H. (1993). Common themes and cultural variations in Japanese and American mothers’ speech to infants. Child Development, 64(3), 637–656. </bibtext> </blist> <blist> <bibl id="bib25" idref="ref3" type="bt">25</bibl> <bibtext>Fernald, A., Taeschner, T., Dunn, J., Papousek, M., de Boysson‐Bardies, B., &amp; Fukui, I. (1989). A cross‐language study of prosodic modifications in mothers’ and fathers’ speech to preverbal infants. Journal of Child Language, 16(03), 477–501. </bibtext> </blist> <blist> <bibl id="bib26" idref="ref56" type="bt">26</bibl> <bibtext>Fourtassi, A., &amp; Dupoux, E. (2014). A rudimentary lexicon and semantics help bootstrap phoneme acquisition. In R. Morante &amp; W. Yih (Eds.), Proceedings of the 18th Conference on Computational Natural Language Learning (pp. 191–200). Association for Computational Linguistics. </bibtext> </blist> <blist> <bibl id="bib27" idref="ref12" type="bt">27</bibl> <bibtext>Golinkoff, R. M., Can, D. D., Soderstrom, M., &amp; Hirsh‐Pasek, K. (2015). (Baby) talk to me: The social context of infant‐directed speech and its effects on early language acquisition. Current Directions in Psychological Science, 24(5), 339–344. </bibtext> </blist> <blist> <bibl id="bib28" idref="ref152" type="bt">28</bibl> <bibtext>Greenwood, C. R., Thiemann‐Bourque, K., Walker, D., Buzhardt, J., &amp; Gilkerson, J. (2011). Assessing children's home language environments using automatic speech recognition technology. Communication Disorders Quarterly, 32(2), 83–92. </bibtext> </blist> <blist> <bibl id="bib29" idref="ref28" type="bt">29</bibl> <bibtext>Guenther, F. H., &amp; Gjaja, M. N. (1996). The perceptual magnet effect as an emergent property of neural map formation. Journal of the Acoustical Society of America, 100, 1111–1121. 
</bibtext> </blist> <blist> <bibl id="bib30" idref="ref61" type="bt">30</bibl> <bibtext>Henning, A., Striano, T., &amp; Lieven, E. V. (2005). Maternal speech to infants at 1 and 3 months of age. Infant Behavior and Development, 28(4), 519–536. </bibtext> </blist> <blist> <bibl id="bib31" idref="ref71" type="bt">31</bibl> <bibtext>Iba, M. (2000). An analysis of the first 50 words acquired by young Japanese children. Language and Culture: The Journal of the Institute for Language and Culture, 4, 45–56. </bibtext> </blist> <blist> <bibl id="bib32" idref="ref73" type="bt">32</bibl> <bibtext>Igarashi, Y., Nishikawa, K., Tanaka, K., &amp; Mazuka, R. (2013). Phonological theory informs the analysis of intonational exaggeration in Japanese infant‐directed speech. The Journal of the Acoustical Society of America, 134(2), 1283–1294. </bibtext> </blist> <blist> <bibl id="bib33" idref="ref106" type="bt">33</bibl> <bibtext>Imai, M., &amp; Kita, S. (2014). The sound symbolism bootstrapping hypothesis for language acquisition and language evolution. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 369(1651), 20130298. </bibtext> </blist> <blist> <bibl id="bib34" idref="ref108" type="bt">34</bibl> <bibtext>Imai, M., Kita, S., Nagumo, M., &amp; Okada, H. (2008). Sound symbolism facilitates early verb learning. Cognition, 109(1), 54–65. </bibtext> </blist> <blist> <bibl id="bib35" idref="ref145" type="bt">35</bibl> <bibtext>Johnson, E. K., Lahey, M., Ernestus, M., &amp; Cutler, A. (2013). A multimodal corpus of speech to infant and adult listeners. The Journal of the Acoustical Society of America, 134, EL534–EL540. </bibtext> </blist> <blist> <bibl id="bib36" idref="ref21" type="bt">36</bibl> <bibtext>Jusczyk, P. W., Bertoncini, J., Bijeljac‐Babic, R., Kennedy, L. J., &amp; Mehler, J. (1990). The role of attention in speech perception by young infants. Cognitive Development, 5(3), 265–286. 
</bibtext> </blist> <blist> <bibl id="bib37" idref="ref163" type="bt">37</bibl> <bibtext>Kalashnikova, M., Carignan, C., &amp; Burnham, D. (2017). The origins of babytalk: Smiling, teaching or social convergence? Royal Society Open Science, 4(8), 170306. </bibtext> </blist> <blist> <bibl id="bib38" idref="ref109" type="bt">38</bibl> <bibtext>Kantartzis, K., Imai, M., &amp; Kita, S. (2011). Japanese sound‐symbolism facilitates word learning in English‐speaking children. Cognitive Science, 35(3), 575–586. </bibtext> </blist> <blist> <bibl id="bib39" idref="ref62" type="bt">39</bibl> <bibtext>Kaye, K. (1980). Why we don't talk ‘baby talk’ to babies. Journal of Child Language, 7(03), 489–507. </bibtext> </blist> <blist> <bibl id="bib40" idref="ref47" type="bt">40</bibl> <bibtext>Kirchhoff, K., &amp; Schimmel, S. (2005). Statistical properties of infant‐directed versus adult‐directed speech: Insights from speech recognition. The Journal of the Acoustical Society of America, 117(4), 2238–2246. </bibtext> </blist> <blist> <bibl id="bib41" idref="ref29" type="bt">41</bibl> <bibtext>Kohonen, T. (1988). The ‘neural’ phonetic typewriter. Computer, 21(3), 11–22. </bibtext> </blist> <blist> <bibl id="bib42" idref="ref22" type="bt">42</bibl> <bibtext>Kuhl, P. K. (1993). Early linguistic experience and phonetic perception: Implications for theories of developmental speech perception. Journal of Phonetics, 21, 125–139. </bibtext> </blist> <blist> <bibl id="bib43" idref="ref14" type="bt">43</bibl> <bibtext>Kuhl, P. K. (2000). A new view of language acquisition. Proceedings of the National Academy of Sciences, 97(22), 11850–11857. </bibtext> </blist> <blist> <bibl id="bib44" idref="ref2" type="bt">44</bibl> <bibtext>Kuhl, P. K., Andruski, J. E., Chistovich, I. A., Chistovich, L. A., Kozhevnikova, E. V., Ryskina, V. L., &amp; Lacerda, F. (1997). Cross‐language analysis of phonetic units in language addressed to infants. Science, 277(5326), 684–686. 
</bibtext> </blist> <blist> <bibl id="bib45" idref="ref59" type="bt">45</bibl> <bibtext>Lahey, M., &amp; Ernestus, M. (2014). Pronunciation variation in infant‐directed speech: Phonetic reduction of two highly frequent words. Language Learning and Development, 10(4), 308–327. </bibtext> </blist> <blist> <bibl id="bib46" idref="ref25" type="bt">46</bibl> <bibtext>Lake, B., Vallabha, G., &amp; McClelland, J. (2009). Modeling unsupervised perceptual category learning. IEEE Transactions on Autonomous Mental Development, 1(1), 35–43. https://doi.org/10.1109/TAMD.2009.2021703. </bibtext> </blist> <blist> <bibl id="bib47" idref="ref136" type="bt">47</bibl> <bibtext>Larsen, E., Dupoux, E., &amp; Cristia, A. (2017). Relating unsupervised word segmentation to reported vocabulary acquisition. In Proceedings of Interspeech (pp. 2198–2202). </bibtext> </blist> <blist> <bibl id="bib48" idref="ref116" type="bt">48</bibl> <bibtext>Leben, W. R. (1973). Suprasegmental phonology. (Unpublished doctoral dissertation). Cambridge, MA: Massachusetts Institute of Technology. </bibtext> </blist> <blist> <bibl id="bib49" type="bt">49</bibl> <bibtext>Lindblom, B. (1990). Explaining phonetic variation: A sketch of the H&amp;H theory. In Hardcastle &amp; Marchal (Eds.), Speech production and speech modelling (pp. 403–439). Dordrecht, the Netherlands: Springer. </bibtext> </blist> <blist> <bibl id="bib50" idref="ref119" type="bt">50</bibl> <bibtext>Lindblom, B. (1992). Phonological units as adaptive emergents of lexical development. Phonological Development: Models, Research, Implications, 131, 163. </bibtext> </blist> <blist> <bibl id="bib51" idref="ref39" type="bt">51</bibl> <bibtext>Liu, H.‐M., Kuhl, P. K., &amp; Tsao, F.‐M. (2003). An association between mothers’ speech clarity and infants’ speech discrimination skills. Developmental Science, 6(3), F1–F10. </bibtext> </blist> <blist> <bibl id="bib52" idref="ref140" type="bt">52</bibl> <bibtext>Lively, S. E., Logan, J. S., &amp; Pisoni, D. B. 
(1993). Training Japanese listeners to identify English /r/ and /l/. II: The role of phonetic environment and talker variability in learning new perceptual categories. The Journal of the Acoustical Society of America, 94(3), 1242–1255. </bibtext> </blist> <blist> <bibl id="bib53" type="bt">53</bibl> <bibtext>Ludusan, B., Cristia, A., Martin, A., Mazuka, R., &amp; Dupoux, E. (2016). Learnability of prosodic boundaries: Is infant‐directed speech easier? The Journal of the Acoustical Society of America, 140(2), 1239–1250. </bibtext> </blist> <blist> <bibl id="bib54" idref="ref134" type="bt">54</bibl> <bibtext>Ludusan, B., Mazuka, R., Bernard, M., Cristia, A., &amp; Dupoux, E. (2017). The role of prosody and speech register in word segmentation: A computational modelling perspective. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 178–183). </bibtext> </blist> <blist> <bibl id="bib55" idref="ref132" type="bt">55</bibl> <bibtext>Ludusan, B., Seidl, A., Dupoux, E., &amp; Cristia, A. (2015). Motif discovery in infant‐ and adult‐directed speech. In Proceedings of CogACLL2015, Conference on Empirical Methods in Natural Language Processing (pp. 93–102). </bibtext> </blist> <blist> <bibl id="bib56" idref="ref72" type="bt">56</bibl> <bibtext>Martin, A., Igarashi, Y., Jincho, N., &amp; Mazuka, R. (2016). Utterances in infant‐directed speech are shorter, not slower. Cognition, 156, 52–59. </bibtext> </blist> <blist> <bibl id="bib57" idref="ref139" type="bt">57</bibl> <bibtext>Martin, A., Peperkamp, S., &amp; Dupoux, E. (2013). Learning phonemes with a proto‐lexicon. Cognitive Science, 37(1), 103–124. </bibtext> </blist> <blist> <bibl id="bib58" idref="ref51" type="bt">58</bibl> <bibtext>Martin, A., Schatz, T., Versteegh, M., Miyazawa, K., Mazuka, R., Dupoux, E., &amp; Cristia, A. (2015). Mothers speak less clearly to infants than to adults: A comprehensive test of the hyperarticulation hypothesis. 
Psychological Science, 26(3), 341–347. </bibtext> </blist> <blist> <bibl id="bib59" idref="ref79" type="bt">59</bibl> <bibtext>Martin, A., Utsugi, A., &amp; Mazuka, R. (2014). The multidimensional nature of hyperspeech: Evidence from Japanese vowel devoicing. Cognition, 132(2), 216–228. </bibtext> </blist> <blist> <bibl id="bib60" idref="ref23" type="bt">60</bibl> <bibtext>Maye, J., Werker, J. F., &amp; Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition, 82(3), B101–B111. </bibtext> </blist> <blist> <bibl id="bib61" idref="ref81" type="bt">61</bibl> <bibtext>Mazuka, R., Igarashi, Y., &amp; Nishikawa, K. (2006). Input for Learning Japanese: RIKEN Japanese Mother‐Infant Conversation Corpus. Technical report of IEICE, TL2006‐16, 106 (165), 11–15. </bibtext> </blist> <blist> <bibl id="bib62" idref="ref66" type="bt">62</bibl> <bibtext>Mazuka, R., Kondo, T., &amp; Hayashi, A. (2008). Japanese mothers’ use of specialized vocabulary in infant‐directed speech: Infant‐directed vocabulary in Japanese. In N. Matasaka (Ed.), The origins of language (pp. 39–58). New York: Springer. </bibtext> </blist> <blist> <bibl id="bib63" idref="ref26" type="bt">63</bibl> <bibtext>McMurray, B., Aslin, R. N., &amp; Toscano, J. C. (2009). Statistical learning of phonetic categories: Insights from a computational approach. Developmental Science, 12(3), 369–378. https://doi.org/10.1111/j.1467-7687.2009.00822.x. </bibtext> </blist> <blist> <bibl id="bib64" idref="ref40" type="bt">64</bibl> <bibtext>McMurray, B., Kovack‐Lesh, K. A., Goodwin, D., &amp; McEchron, W. (2013). Infant directed speech and the development of speech perception: Enhancing development or an unintended consequence? Cognition, 129(2), 362–378. </bibtext> </blist> <blist> <bibl id="bib65" idref="ref120" type="bt">65</bibl> <bibtext>Metsala, J. L., &amp; Walley, A. C. (1998). 
Spoken vocabulary growth and the segmental restructuring of lexical representations: Precursors to phonemic awareness and early reading ability. In J. Metsala, &amp; L. Ehri (Eds.), Word recognition in beginning literacy (pp. 89–120). Mahwah, NJ: Lawrence Erlbaum Associates Publishers. </bibtext> </blist> <blist> <bibl id="bib66" idref="ref107" type="bt">66</bibl> <bibtext>Miyazaki, M., Hidaka, S., Imai, M., Yeung, H. H., Kantartzis, K., Okada, H., &amp; Kita, S. (2013). The facilitatory role of sound symbolism in infant word learning. In M. Knauff, M. Pauen, N. Sebanz &amp; N. Matasaka (Eds.), Proceedings of the 35th Annual Conference of the Cognitive Science Society (Vol. 1, pp. 3080–3085). Austin, TX: Cognitive Science Society. </bibtext> </blist> <blist> <bibl id="bib67" idref="ref76" type="bt">67</bibl> <bibtext>Miyazawa, K., Shinya, T., Martin, A., Kikuchi, H., &amp; Mazuka, R. (2017). Vowels in infant‐directed speech: More breathy and more variable, but not clearer. Cognition, 166, 84–93. </bibtext> </blist> <blist> <bibl id="bib68" idref="ref86" type="bt">68</bibl> <bibtext>Moore, B. C. J. (2012). An introduction to the psychology of hearing (6th ed.). Leiden: Brill. </bibtext> </blist> <blist> <bibl id="bib69" idref="ref16" type="bt">69</bibl> <bibtext>Mullennix, J. W., Pisoni, D. B., &amp; Martin, C. S. (1989). Some effects of talker variability on spoken word recognition. The Journal of the Acoustical Society of America, 85(1), 365–378. </bibtext> </blist> <blist> <bibl id="bib70" idref="ref6" type="bt">70</bibl> <bibtext>Newport, E., Gleitman, H., &amp; Gleitman, L. (1977). Mother, I'd rather do it myself: Some effects and non‐effects of maternal speech style. In C. Snow &amp; C. Ferguson (Eds.), Talking to children: Language input and acquisition. New York: Cambridge University Press. </bibtext> </blist> <blist> <bibl id="bib71" idref="ref137" type="bt">71</bibl> <bibtext>Ngon, C., Martin, A., Dupoux, E., Cabrol, D., Dutat, M., &amp; Peperkamp, S. 
(2013). (Non)words, (non)words, (non)words: Evidence for a protolexicon during the first year of life. Developmental Science, 16(1), 24–34. </bibtext> </blist> <blist> <bibl id="bib72" idref="ref114" type="bt">72</bibl> <bibtext>Ota, M., &amp; Skarabela, B. (2016). Reduplicated words are easier to learn. Language Learning and Development, 12, 380–397. </bibtext> </blist> <blist> <bibl id="bib73" idref="ref113" type="bt">73</bibl> <bibtext>Ota, M., &amp; Skarabela, B. (2017). Reduplication facilitates early word segmentation. Journal of Child Language, 45, 204–218. </bibtext> </blist> <blist> <bibl id="bib74" idref="ref155" type="bt">74</bibl> <bibtext>Papoušek, M., &amp; Hwang, S.‐F. C. (1991). Tone and intonation in Mandarin babytalk to presyllabic infants: Comparison with registers of adult conversation and foreign language instruction. Applied Psycholinguistics, 12(04), 481–504. </bibtext> </blist> <blist> <bibl id="bib75" idref="ref7" type="bt">75</bibl> <bibtext>Phillips, J. R. (1973). Syntax and vocabulary of mothers’ speech to young children: Age and sex comparisons. Child Development, 44, 182–185. </bibtext> </blist> <blist> <bibl id="bib76" idref="ref121" type="bt">76</bibl> <bibtext>Pierrehumbert, J. B. (2003). Phonetic diversity, statistical learning, and acquisition of phonology. Language and Speech, 46(2–3), 115–154. </bibtext> </blist> <blist> <bibl id="bib77" idref="ref156" type="bt">77</bibl> <bibtext>Reilly, J. S., &amp; Bellugi, U. (1996). Competition on the face: Affect and language in ASL motherese. Journal of Child Language, 23(1), 219–239. </bibtext> </blist> <blist> <bibl id="bib78" idref="ref18" type="bt">78</bibl> <bibtext>Rost, G. C., &amp; McMurray, B. (2009). Speaker variability augments phonological processing in early word learning. Developmental Science, 12(2), 339–349. </bibtext> </blist> <blist> <bibl id="bib79" idref="ref19" type="bt">79</bibl> <bibtext>Rost, G. C., &amp; McMurray, B. (2010). 
Finding the signal by adding noise: The role of noncontrastive phonetic variability in early word learning. Infancy, 15(6), 608–635. </bibtext> </blist> <blist> <bibl id="bib80" idref="ref17" type="bt">80</bibl> <bibtext>Ryalls, B. O., &amp; Pisoni, D. B. (1997). The effect of talker variability on word recognition in preschool children. Developmental Psychology, 33(3), 441. </bibtext> </blist> <blist> <bibl id="bib81" idref="ref84" type="bt">81</bibl> <bibtext>Sagisaka, Y., Takeda, K., Abel, M., Katagiri, S., Umeda, T., &amp; Kuwabara, H. (1990). A large‐scale Japanese speech database. In Proceedings of International Conference on Spoken Language Processing (ICSLP'90, Kobe), Vol. 2, pp. 1089–1092. </bibtext> </blist> <blist> <bibl id="bib82" idref="ref103" type="bt">82</bibl> <bibtext>Saji, N., &amp; Imai, M. (2013). Onomatope kenkyu no shatei—chikadzuku oto to imi [The Role of Iconicity in Lexical Development]. In K. Shinohara &amp; R. Uno (Eds.), Onomatope kenkyu no shatei ‐ chikadzuku oto to imi (Sound Symbolism and Mimetics) (pp. 151–166). Tokyo, Japan: Hituji Syobo. </bibtext> </blist> <blist> <bibl id="bib83" idref="ref89" type="bt">83</bibl> <bibtext>Sakoe, H., &amp; Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1), 43–49. </bibtext> </blist> <blist> <bibl id="bib84" idref="ref49" type="bt">84</bibl> <bibtext>Schatz, T. (2016). ABX‐Discriminability Measures and Applications (Unpublished doctoral dissertation). Paris: Ecole Normale Supérieure. </bibtext> </blist> <blist> <bibl id="bib85" idref="ref88" type="bt">85</bibl> <bibtext>Schatz, T., Peddinti, V., Bach, F., Jansen, A., Hermansky, H., &amp; Dupoux, E. (2013). Evaluating speech features with the Minimal‐Pair ABX task: Analysis of the classical MFC/PLP pipeline. In F. Bimbot et al. (Ed.), Proceedings of Interspeech (pp. 1781–1785). 
</bibtext> </blist> <blist> <bibl id="bib86" idref="ref4" type="bt">86</bibl> <bibtext>Soderstrom, M. (2007). Beyond babytalk: Re‐evaluating the nature and content of speech input to preverbal infants. Developmental Review, 27(4), 501–532. </bibtext> </blist> <blist> <bibl id="bib87" idref="ref153" type="bt">87</bibl> <bibtext>Soderstrom, M., &amp; Wittebolle, K. (2013). When do caregivers talk? The influences of activity and time of day on caregiver speech and child vocalizations in two childcare environments. PLoS ONE, 8(11), e80646. </bibtext> </blist> <blist> <bibl id="bib88" idref="ref164" type="bt">88</bibl> <bibtext>Trueswell, J. C., Lin, Y., Armstrong, B., Cartmill, E. A., Goldin‐Meadow, S., &amp; Gleitman, L. R. (2016). Perceiving referential intent: Dynamics of reference in natural parent–child interactions. Cognition, 148, 117–135. </bibtext> </blist> <blist> <bibl id="bib89" idref="ref70" type="bt">89</bibl> <bibtext>Tsuji, S., Nishikawa, K., &amp; Mazuka, R. (2014). Segmental distributions and consonant‐vowel association patterns in Japanese infant‐and adult‐directed speech. Journal of Child Language, 41(06), 1276–1304. </bibtext> </blist> <blist> <bibl id="bib90" idref="ref41" type="bt">90</bibl> <bibtext>Uther, M., Knoll, M. A., &amp; Burnham, D. (2007). Do you speak E‐NG‐LI‐SH? A comparison of foreigner‐and infant‐directed speech. Speech Communication, 49(1), 2–7. </bibtext> </blist> <blist> <bibl id="bib91" idref="ref27" type="bt">91</bibl> <bibtext>Vallabha, G. K., McClelland, J. L., Pons, F., Werker, J. F., &amp; Amano, S. (2007). Unsupervised learning of vowel categories from infant‐directed speech. Proceedings of the National Academy of Sciences, 104(33), 13273–13278. </bibtext> </blist> <blist> <bibl id="bib92" idref="ref31" type="bt">92</bibl> <bibtext>Varadarajan, B., Khudanpur, S., &amp; Dupoux, E. (2008). Unsupervised learning of acoustic sub‐word units. 
In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers (pp. 165–168). Association for Computational Linguistics. </bibtext> </blist> <blist> <bibl id="bib93" idref="ref55" type="bt">93</bibl> <bibtext>Versteegh, M., Anguera, X., Jansen, A., &amp; Dupoux, E. (2016). The Zero Resource Speech Challenge 2015: Proposed approaches and results. Procedia Computer Science, 81, 67–72. </bibtext> </blist> <blist> <bibl id="bib94" idref="ref52" type="bt">94</bibl> <bibtext>Werker, J. F., &amp; Curtin, S. (2005). PRIMIR: A developmental framework of infant speech processing. Language Learning and Development, 1(2), 197–234. </bibtext> </blist> </ref> <p>PHOTO (COLOR): Schematic view of separation, variability, and discriminability between two categories A and B (left), and a possible clustering obtained from the distributions (right). Separation measures the distance between the center of categories A and B; it is computed as the distance between the medoids mA and mB. Variability measures the spatial spread of tokens within a given category; it is computed as the average distance between tokens in a category. Discriminability depends on both variability and separation; it is quantified with an ABX score as the probability that a given token x (say, of A) is less distant to another token a of A than to a token b of B.</p> <p>PHOTO (COLOR): Schematic view of (A) bottom‐up and (B) top‐down models of phonetic learning, together with ABX discriminability as a proxy for measuring the effect of adult‐directed speech (ADS) versus infant‐directed speech (IDS) on learnability.</p> <p>PHOTO (COLOR): Number of tokens used in Exp. 1 (A) and Exp. 3, with (B) and without (C) onomatopoeias, per speaker. For Exp. 
3, boxplots show the distribution of number of tokens within the 100 sampled lexicons.</p> <p>PHOTO (COLOR): Acoustic distinctiveness scores computed on word types common to infant‐directed speech (IDS) and adult‐directed speech (ADS) (panels A, B, C, E, F, G), or computed on word types common to ADS and RS (control condition; panels D and H). Upper panels display the distribution of the scores across speakers, as well as means within a speech register (red horizontal lines). Gray lines connect data points corresponding to the same caregiver in both registers (either ADS‐IDS or ADS‐RS). Bottom panels show the distribution of IDS minus ADS (or RS minus ADS) score differences. Densities to the right of the red zero line denote higher scores for IDS (or RS). A, E: Mean between‐category separation (ADS vs. IDS). B, F: Mean within‐category variability (ADS vs. IDS). C, G: Mean ABX discrimination score (ADS vs. IDS). D, H: Mean ABX discrimination score (ADS vs. RS; control condition). N.S., Non‐significant difference. p &lt; .001. p &lt; .0001.</p> <p>PHOTO (COLOR): Global phonological density scores (mean normalized edit distance) for adult‐directed speech (ADS) and infant‐directed speech (IDS), computed on lexicons matched for number of types across the two registers. Upper panels display the distribution of the scores across individual speakers, as well as means within a speech register (red horizontal lines). Gray lines connect data points corresponding to the same caregiver in both registers. Bottom panels show the distribution of IDS minus ADS score differences. Densities to the right of the red zero line denote higher scores for IDS. A, C: Samples from base corpus. B, D: Samples from base corpus after onomatopoeia removal. N.S., Non‐significant difference. 
*p &lt; .0001.</p> <p>PHOTO (COLOR): Acoustic‐based ABX word discrimination error in adult‐directed speech (ADS) and infant‐directed speech (IDS) computed on lexicons matched for number of word types across the two registers. Upper panels display the distribution of the scores across speakers, as well as means within a speech register (red horizontal lines). Gray lines connect data points corresponding to the same caregiver in both registers. Bottom panels show the distribution of IDS minus ADS score differences. Densities to the right of the red zero line denote higher error rates for IDS. A, C: Samples from base corpus. B, D: Samples from base corpus after onomatopoeia removal. p &lt; .05.</p> <p>PHOTO (COLOR): Summary of infant‐directed speech (IDS) characteristics relative to adult‐directed speech (ADS) in a top‐down model of phonetic category learning for the RIKEN corpus. Enhanced characteristics of IDS relative to ADS are shown in green, while those for which the opposite trend is observed are shown in red.</p> <aug> <p>By Adriana Guevara‐Rukoz; Alejandrina Cristia; Bogdan Ludusan; Roland Thiollière; Andrew Martin; Reiko Mazuka and Emmanuel Dupoux</p> </aug>
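<p>The three measures defined in the schematic caption above (separation as the distance between medoids, variability as the average distance between tokens of one category, and ABX discriminability as the probability that a token x of A is closer to another token a of A than to a token b of B) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: tokens are toy fixed-length vectors, the distance is plain Euclidean, ties count as half a success, and the function names are hypothetical. It is not the pipeline used in the study.</p>

```python
from itertools import combinations
from statistics import mean


def dist(u, v):
    """Euclidean distance between two equal-length feature vectors.

    Assumption: a stand-in for whatever token-to-token distance is used.
    """
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def medoid(tokens):
    """Return the token whose summed distance to all tokens is smallest."""
    return min(tokens, key=lambda t: sum(dist(t, other) for other in tokens))


def separation(cat_a, cat_b):
    """Distance between the medoids of two categories."""
    return dist(medoid(cat_a), medoid(cat_b))


def variability(tokens):
    """Average distance over all unordered token pairs (needs >= 2 tokens)."""
    return mean(dist(u, v) for u, v in combinations(tokens, 2))


def abx_score(cat_a, cat_b):
    """Estimate P(d(x, a) < d(x, b)) with x, a distinct tokens of A, b in B.

    Assumption: ties count as half a success.
    Needs at least two tokens in A and one token in B.
    """
    total, count = 0.0, 0
    for i, x in enumerate(cat_a):
        for j, a in enumerate(cat_a):
            if i == j:
                continue  # x and a must be different tokens
            for b in cat_b:
                d_a, d_b = dist(x, a), dist(x, b)
                total += 1.0 if d_a < d_b else 0.5 if d_a == d_b else 0.0
                count += 1
    return total / count


if __name__ == "__main__":
    # Hypothetical toy categories: two tight, well-separated clusters.
    cat_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    cat_b = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
    print("separation:", separation(cat_a, cat_b))
    print("variability of A:", variability(cat_a))
    print("ABX score (A vs. B):", abx_score(cat_a, cat_b))
```

<p>In this sketch, a larger separation or a smaller variability both push the ABX score toward 1, which is why the caption describes discriminability as depending on both quantities.</p>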
Header	DbId: eric DbLabel: ERIC An: EJ1185186 AccessLevel: 3 PubType: Academic Journal PubTypeId: academicJournal PreciseRelevancyScore: 0
PLink	https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=eric&AN=EJ1185186