Dynamic language modeling for European Portuguese
| Title: | Dynamic language modeling for European Portuguese |
|---|---|
| Authors: | Martins, Ciro (1,2); Teixeira, António (1), ajst@ua.pt; Neto, João (2) |
| Source: | Computer Speech & Language. Oct. 2010, Vol. 24, Issue 4, pp. 750-773. 24 pp. |
| Subjects: | Portuguese language, Dylan (Computer program language), Vocabulary, Speech perception, Language & languages, Syntax (Grammar), Broadcast journalism, Information storage & retrieval systems |
| Abstract: | This paper reports on the work done on vocabulary and language model daily adaptation for a European Portuguese broadcast news transcription system. The proposed adaptation framework takes into consideration European Portuguese language characteristics, such as its high level of inflection and complex verbal system. A multi-pass speech recognition framework using contemporary written texts available daily on the Web is proposed. It uses morpho-syntactic knowledge (part-of-speech information) about an in-domain training corpus for daily selection of an optimal vocabulary. Using an information retrieval engine and the ASR hypotheses as query material, relevant documents are extracted from a dynamic and large-size dataset to generate a story-based language model. When applied to a daily and live closed-captioning system of live TV broadcasts, it was shown to be effective, with a relative reduction of out-of-vocabulary word rate (69%) and WER (12.0%) when compared to the results obtained by the baseline system with the same vocabulary size. [Copyright Elsevier] |
| Copyright of Computer Speech & Language is the property of Academic Press Inc. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) | |
| Database: | Engineering Source |
| FullText | Text: Availability: 0 |
|---|---|
| Header | DbId: egs DbLabel: Engineering Source An: 49809186 AccessLevel: 6 PubType: Academic Journal PubTypeId: academicJournal PreciseRelevancyScore: 0 |
| PLink | https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=49809186 |
| RecordInfo | DOI: 10.1016/j.csl.2010.02.003. Language: English (eng). Pagination: start page 750, 24 pages. Title: Dynamic language modeling for European Portuguese. Contributors: Martins, Ciro; Teixeira, António; Neto, João. Subjects: Portuguese language; Dylan (Computer program language); Vocabulary; Speech perception; Language & languages; Syntax (Grammar); Broadcast journalism; Information storage & retrieval systems. Published in: Computer Speech & Language, Vol. 24, Issue 4, 01 Oct 2010. Print ISSN: 08852308. |
| ResultId | 1 |
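
The abstract describes using ASR hypotheses as queries to an information retrieval engine in order to select relevant documents for a story-based language model. The record does not say which retrieval engine or ranking function the authors used, so the following is only a minimal sketch of the general idea, assuming a plain TF-IDF cosine ranking over whitespace-tokenized text. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `retrieve`) and the sample sentences are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs_tokens):
    """Build one sparse TF-IDF vector per document, plus the IDF table."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))
    # Smoothed IDF so that every term seen in the collection has weight > 0.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in docs_tokens
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(hypothesis, documents, top_k=2):
    """Return indices of the top_k documents most similar to the hypothesis.

    Assumption: the ASR hypothesis is used directly as the query text.
    Documents with zero similarity are never returned.
    """
    vectors, idf = tfidf_vectors([tokenize(d) for d in documents])
    query_counts = Counter(tokenize(hypothesis))
    # Query terms unseen in the collection cannot match anything; skip them.
    query = {t: c * idf[t] for t, c in query_counts.items() if t in idf}
    scored = sorted(
        ((cosine(query, vec), idx) for idx, vec in enumerate(vectors)),
        reverse=True,
    )
    return [idx for score, idx in scored[:top_k] if score > 0.0]


if __name__ == "__main__":
    # Hypothetical mini-collection standing in for the daily Web texts.
    documents = [
        "o governo aprovou o orçamento do estado",
        "o benfica venceu o jogo de futebol",
        "o parlamento discutiu o orçamento",
    ]
    print(retrieve("governo discutiu orçamento", documents))
```

In a pipeline like the one the abstract outlines, the selected documents would then serve as adaptation text for the language model used in a later recognition pass. That step is not shown here.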