What's in a Word Family? The Assumptions of Lexical Units
| Title: | What's in a Word Family? The Assumptions of Lexical Units |
|---|---|
| Language: | English |
| Authors: | Phil Bennett |
| Source: | Vocabulary Learning and Instruction. 2026 15. |
| Availability: | Castledown Publishers. Ground Level, 470 St Kilda Road, Melbourne, 3004, Australia. Tel: +61-3-7003-8355; e-mail: contact@castledown.com; Web site: https://www.castledown.com/journals/vli |
| Peer Reviewed: | Y |
| Page Count: | 26 |
| Publication Date: | 2026 |
| Document Type: | Journal Articles; Reports - Research |
| Descriptors: | English, Morphemes, Etymology, Word Lists, Vocabulary, Form Classes (Languages), Dictionaries |
| ISSN: | 2981-9954 |
| Abstract: | Lemmas, flemmas, and level 6 word families (WF6) are three commonly discussed lexical units. Because each makes differing assumptions about learner knowledge, the selection of one unit over another in research or pedagogy has a great impact on interpretations of the lexical challenge. It is therefore important to fully understand these assumptions so that practitioners can select the most suitable unit for a given purpose. This study introduces an enhanced version of Nation's BNC-COCA word lists that can be used to quantify several features of lexical units. The original WF6 lists were adapted by including flemma and lemma groupings, part-of-speech (POS) tags, morphological codings, frequency data, and an expanded list of proper nouns. Analyses of lexical unit composition reveal that, owing to their much greater inclusivity than flemmas or lemmas, WF6 units provide rapid corpus coverage over the 1-2k bands, and that irregular forms make up a considerable proportion of 1k tokens regardless of the unit chosen. Accuracy checks then suggest that POS-tagged lists offer an improvement over untagged lists due to the latter's overestimation of coverage and blocking of homographic concepts. Finally, lexical and morphological profiling shows that threshold coverage values are unlikely to be reached without knowledge of at least mid-frequency headwords and tens of derivational affixes in most genres. |
| Abstractor: | As Provided |
| Notes: | https://osf.io/4mz6y |
| Entry Date: | 2026 |
| Accession Number: | EJ1501238 |
| Database: | ERIC |