A new corpus platform for the Texas German Dialect Project.
| Title: | A new corpus platform for the Texas German Dialect Project. |
|---|---|
| Authors: | Boas, Hans C.¹ (hcb@mail.utexas.edu); Schmidt, Thomas² (thomas@linguisticbits.de); Blevins, Margaret¹ (mblevins@utexas.edu) |
| Source: | Language Resources & Evaluation, Sep 2026, Vol. 60, Issue 3, pp. 1-34 (34 pages). |
| Abstract: | Texas German is a contact variety that is the result of dialect mixing of several German dialects brought to Texas from central Europe starting in the 1830s. Since 2001, the Texas German Dialect Project has been assembling a large collection of spoken data documenting this unique variety. The present paper describes how a substantial part of this collection was developed into an annotated corpus and how the corpus is now available through a corpus platform based on the ZuMult technology. We start with an outline of the project’s development and its established processes of data collection, transcription, and dissemination. We then explain the process by which the data were cleaned up and enriched with language tagging, orthographic normalization, lemmatization, and part-of-speech tagging. Finally, we illustrate how the new corpus platform makes these annotated data available for systematic browsing and querying. In the outlook, we sketch prospects for future development of the data and for their role in a larger landscape of comparable speech island data. [ABSTRACT FROM AUTHOR] |
| Copyright: | Copyright of Language Resources & Evaluation is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: | Engineering Source |
| Publication type: | Academic Journal |
| Language: | English |
| DOI: | 10.1007/s10579-025-09893-6 |
| ISSN (print): | 1574-020X |
| Accession number: | 194789935 |
| Permalink: | https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=194789935 |