Automatic Expansion of Abbreviations in Chinese News Text: A Hybrid Approach.
| Title: | Automatic Expansion of Abbreviations in Chinese News Text: A Hybrid Approach. |
|---|---|
| Authors: | Guohong Fu (ghfu@hotmail.com); Kang-Kwong Luke (kkluke@hkusua.hku.hk); Jonathan J. Webster (ctjjw@cityu.edu.hk) |
| Source: | International Journal of Computer Processing of Oriental Languages. Jun 2007, Vol. 20, Issue 2/3, pp. 165-179. 15 pp. 1 Diagram, 5 Charts. |
| Subjects: | Chinese abbreviations, Chinese language, Signs & symbols, Hidden Markov models, Language & languages |
| Abstract: | Chinese news texts often contain abbreviations whose full forms are not explicitly defined. Expanding abbreviations to their original full forms therefore plays an important role in improving the accuracy of information extraction and retrieval systems for Chinese. In this paper, we present a hybrid approach to the automatic expansion of abbreviations in Chinese news texts. Generally, Chinese abbreviations are produced from their original full forms via reduction, elimination or generalization. To ensure that every abbreviation can be expanded, each abbreviation under expansion is assumed, in turn, to have been created by each of these three methods. Based on this assumption, a mapping table between shortened words and their matrix words and a dictionary of short-form/full-form pairs are used to generate all possible expansions of an abbreviation. For an ambiguous abbreviation with multiple expansion candidates, hidden Markov models are employed to rank the candidates and select the one with the maximum score. To further improve expansion performance, linguistic knowledge such as discourse information and abbreviation patterns is used to correct possible expansion errors. Evaluation on an abbreviation-expanded corpus built from the Peking University Corpus showed that our approach achieves, on average, 86.3% precision and 83.8% recall for various types of abbreviations in Chinese news texts. [ABSTRACT FROM AUTHOR] |
| Copyright of International Journal of Computer Processing of Oriental Languages is the property of World Scientific Publishing Company and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) | |
| Database: | Engineering Source |
| Publication type: | Academic Journal |
| Language: | English |
| ISSN: | 0219-4279 (print) |
| Accession number: | 30028073 |
| Permalink: | https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=30028073 |
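
The generate-then-rank pipeline summarized in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy tables `FULL_FORM_DICT`, `MATRIX_WORDS` and `BIGRAM_LOGPROB`, their entries and probabilities, and the function names are hypothetical and are not taken from the paper. The ranking step uses a plain word-bigram chain as a stand-in for the paper's hidden Markov models, whose details the abstract does not give, and the discourse-based error correction step is omitted.

```python
import math
from itertools import product

# Hypothetical toy resources (not data from the paper).
# Dictionary of short-form -> known full forms, each full form as a word tuple.
FULL_FORM_DICT = {"北大": [("北京", "大学")]}
# Mapping table: shortened character -> possible matrix words.
MATRIX_WORDS = {"北": ["北京", "北方"], "大": ["大学", "大会"]}
# Toy bigram log-probabilities; "<s>" marks the start of the expansion.
BIGRAM_LOGPROB = {
    ("<s>", "北京"): math.log(0.6),
    ("<s>", "北方"): math.log(0.4),
    ("北京", "大学"): math.log(0.7),
    ("北京", "大会"): math.log(0.1),
    ("北方", "大学"): math.log(0.2),
    ("北方", "大会"): math.log(0.1),
}
UNSEEN_LOGPROB = math.log(1e-6)  # floor for word pairs missing from the table


def generate_candidates(abbr):
    """Collect dictionary full forms plus every per-character combination."""
    candidates = list(FULL_FORM_DICT.get(abbr, []))
    # A character without a table entry is kept unchanged.
    per_char = [MATRIX_WORDS.get(ch, [ch]) for ch in abbr]
    for combo in product(*per_char):
        if combo not in candidates:
            candidates.append(combo)
    return candidates


def score(words):
    """Sum bigram log-probabilities along the candidate word sequence."""
    total, prev = 0.0, "<s>"
    for word in words:
        total += BIGRAM_LOGPROB.get((prev, word), UNSEEN_LOGPROB)
        prev = word
    return total


def expand(abbr):
    """Return the highest-scoring expansion as a string."""
    best = max(generate_candidates(abbr), key=score)
    return "".join(best)


if __name__ == "__main__":
    print(expand("北大"))
```

With these toy tables, `generate_candidates("北大")` yields four candidates and `expand("北大")` selects 北京大学. Over-generating candidates first and ranking them afterwards mirrors the abstract's stated aim that every abbreviation can be expanded, with disambiguation left to the scoring model.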