Automatic Expansion of Abbreviations in Chinese News Text: A Hybrid Approach.

Bibliographic Details
Title: Automatic Expansion of Abbreviations in Chinese News Text: A Hybrid Approach.
Authors: Guohong Fu (ghfu@hotmail.com); Kang-Kwong Luke (kkluke@hkusua.hku.hk); Jonathan J. Webster (ctjjw@cityu.edu.hk)
Source: International Journal of Computer Processing of Oriental Languages. Jun 2007, Vol. 20, Issue 2/3, pp. 165-179. 15 pp. 1 Diagram, 5 Charts.
Subjects: Chinese abbreviations, Chinese language, Signs & symbols, Hidden Markov models, Language & languages
Abstract: Chinese news texts often contain abbreviations whose full forms are never explicitly defined. Expanding abbreviations to their original full forms therefore plays an important role in improving the accuracy of information extraction and retrieval systems for Chinese. In this paper, we present a hybrid approach to the automatic expansion of abbreviations in Chinese news texts. Chinese abbreviations are generally produced from their full forms via reduction, elimination, or generalization. To ensure that every abbreviation can be expanded, each abbreviation is assumed, in turn, to have been created by each of these three methods. Based on this assumption, a mapping table between shortened words and their matrix words, together with a dictionary of short-form/full-form pairs, is used to generate all possible expansions. For an ambiguous abbreviation with multiple expansion candidates, hidden Markov models are employed to rank the candidates and select the one with the maximum score. To further improve expansion performance, linguistic knowledge such as discourse information and abbreviation patterns is used to correct possible expansion errors. Evaluation on an abbreviation-expanded corpus built from the Peking University Corpus showed that our approach achieves, on average, 86.3% precision and 83.8% recall across various types of abbreviations in Chinese news texts. [ABSTRACT FROM AUTHOR]
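The generate-then-rank pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy tables `PAIR_DICT`, `MATRIX_WORDS`, and `BIGRAM_LOGP`, the function names, and the scoring scheme are hypothetical and are not taken from the paper. In particular, a simple bigram score over the neighbouring words stands in for the hidden Markov models, whose formulation the abstract does not give, and only character-by-character reduction is covered (elimination, generalization, and the knowledge-based correction step are omitted).

```python
from itertools import product

# Hypothetical toy resources (not from the paper).
# Dictionary of short-form -> full-form pairs, each full form as a word tuple.
PAIR_DICT = {"北大": [("北京", "大学")]}
# Mapping table: shortened character -> possible matrix words.
MATRIX_WORDS = {"北": ["北京", "北方"], "大": ["大学", "大会"]}
# Toy bigram log-probabilities used as a stand-in for HMM scores.
BIGRAM_LOGP = {
    ("考入", "北京"): -1.0,
    ("北京", "大学"): -0.5,
    ("大学", "中文系"): -1.0,
}


def generate_candidates(abbr, pair_dict, matrix_words):
    """Collect expansion candidates from the dictionary and the mapping table."""
    candidates = list(pair_dict.get(abbr, []))
    # Reduction: assume each character abbreviates one matrix word.
    per_char = [matrix_words.get(ch) for ch in abbr]
    if all(per_char):
        for combo in product(*per_char):
            if combo not in candidates:
                candidates.append(combo)
    return candidates


def score(words, left, right, bigram_logp, floor=-10.0):
    """Sum bigram log-probabilities over left context, candidate, right context."""
    seq = [left, *words, right]
    return sum(bigram_logp.get(pair, floor) for pair in zip(seq, seq[1:]))


def expand(abbr, left, right, pair_dict, matrix_words, bigram_logp):
    """Return the highest-scoring expansion, or the abbreviation itself if none."""
    candidates = generate_candidates(abbr, pair_dict, matrix_words)
    if not candidates:
        return abbr
    best = max(candidates, key=lambda c: score(c, left, right, bigram_logp))
    return "".join(best)


if __name__ == "__main__":
    print(expand("北大", "考入", "中文系", PAIR_DICT, MATRIX_WORDS, BIGRAM_LOGP))
```

Generating every combination first and ranking afterwards mirrors the abstract's stated goal that every abbreviation receives at least one expansion whenever the tables cover it; unseen bigrams fall back to a fixed floor penalty, which is a simplification rather than a proper smoothing method.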
Copyright of International Journal of Computer Processing of Oriental Languages is the property of World Scientific Publishing Company and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
ISSN: 0219-4279