Dynamic language modeling for European Portuguese

Bibliographic Details
Title: Dynamic language modeling for European Portuguese
Authors: Martins, Ciro (1,2); Teixeira, António (1), ajst@ua.pt; Neto, João (2)
Source: Computer Speech & Language. Oct. 2010, Vol. 24, Issue 4, pp. 750-773 (24 pages).
Subjects: Portuguese language, Dylan (Computer program language), Vocabulary, Speech perception, Language & languages, Syntax (Grammar), Broadcast journalism, Information storage & retrieval systems
Abstract: This paper reports on the work done on vocabulary and language model daily adaptation for a European Portuguese broadcast news transcription system. The proposed adaptation framework takes into consideration European Portuguese language characteristics, such as its high level of inflection and complex verbal system. A multi-pass speech recognition framework using contemporary written texts available daily on the Web is proposed. It uses morpho-syntactic knowledge (part-of-speech information) about an in-domain training corpus for daily selection of an optimal vocabulary. Using an information retrieval engine and the ASR hypotheses as query material, relevant documents are extracted from a dynamic and large-size dataset to generate a story-based language model. When applied to a daily and live closed-captioning system of live TV broadcasts, it was shown to be effective, with a relative reduction of out-of-vocabulary word rate (69%) and WER (12.0%) when compared to the results obtained by the baseline system with the same vocabulary size. [Copyright Elsevier]
Copyright of Computer Speech & Language is the property of Academic Press Inc. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
FullText Text:
  Availability: 0
Header DbId: egs
DbLabel: Engineering Source
An: 49809186
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=49809186
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1016/j.csl.2010.02.003
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 24
        StartPage: 750
    Subjects:
      – SubjectFull: Portuguese language
        Type: general
      – SubjectFull: Dylan (Computer program language)
        Type: general
      – SubjectFull: Vocabulary
        Type: general
      – SubjectFull: Speech perception
        Type: general
      – SubjectFull: Language & languages
        Type: general
      – SubjectFull: Syntax (Grammar)
        Type: general
      – SubjectFull: Broadcast journalism
        Type: general
      – SubjectFull: Information storage & retrieval systems
        Type: general
    Titles:
      – TitleFull: Dynamic language modeling for European Portuguese
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Martins, Ciro
      – PersonEntity:
          Name:
            NameFull: Teixeira, António
      – PersonEntity:
          Name:
            NameFull: Neto, João
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 10
              Text: Oct2010
              Type: published
              Y: 2010
          Identifiers:
            – Type: issn-print
              Value: 08852308
          Numbering:
            – Type: volume
              Value: 24
            – Type: issue
              Value: 4
          Titles:
            – TitleFull: Computer Speech & Language
              Type: main