A new corpus platform for the Texas German Dialect Project.

Bibliographic Details
Title: A new corpus platform for the Texas German Dialect Project.
Authors: Boas, Hans C. [1] (hcb@mail.utexas.edu); Schmidt, Thomas [2] (thomas@linguisticbits.de); Blevins, Margaret [1] (mblevins@utexas.edu)
Source: Language Resources & Evaluation. Sep 2026, Vol. 60, Issue 3, pp. 1-34 (34 pages).
Abstract: Texas German is a contact variety that is the result of dialect mixing of several German dialects brought to Texas from central Europe starting in the 1830s. Since 2001, the Texas German Dialect Project has been assembling a large collection of spoken data documenting this unique variety. The present paper describes how a substantial part of this collection was developed into an annotated corpus and how the corpus is now available through a corpus platform based on the ZuMult technology. We start with an outline of the project’s development and its established processes of data collection, transcription, and dissemination. We then explain the process by which the data were cleaned up and enriched with language tagging, orthographic normalization, lemmatization, and part-of-speech tagging. Finally, we illustrate how the new corpus platform makes these annotated data available for systematic browsing and querying. In the outlook, we sketch prospects for future development of the data and for their role in a larger landscape of comparable speech island data. [ABSTRACT FROM AUTHOR]
Copyright of Language Resources & Evaluation is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
Header DbId: egs
DbLabel: Engineering Source
An: 194789935
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=194789935
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1007/s10579-025-09893-6
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 34
        StartPage: 1
    Titles:
      – TitleFull: A new corpus platform for the Texas German Dialect Project.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Boas, Hans C.
      – PersonEntity:
          Name:
            NameFull: Schmidt, Thomas
      – PersonEntity:
          Name:
            NameFull: Blevins, Margaret
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 09
              Text: Sep2026
              Type: published
              Y: 2026
          Identifiers:
            – Type: issn-print
              Value: 1574020X
          Numbering:
            – Type: volume
              Value: 60
            – Type: issue
              Value: 3
          Titles:
            – TitleFull: Language Resources & Evaluation
              Type: main