Automatic Expansion of Abbreviations in Chinese News Text: A Hybrid Approach.

Bibliographic Details
Title: Automatic Expansion of Abbreviations in Chinese News Text: A Hybrid Approach.
Authors: Guohong Fu (ghfu@hotmail.com); Kang-Kwong Luke (kkluke@hkusua.hku.hk); Jonathan J. Webster (ctjjw@cityu.edu.hk)
Source: International Journal of Computer Processing of Oriental Languages. Jun 2007, Vol. 20 Issue 2/3, p165-179. 15p. 1 Diagram, 5 Charts.
Subjects: Chinese abbreviations, Chinese language, Signs & symbols, Hidden Markov models, Language & languages
Abstract: Chinese news texts often contain abbreviations whose full forms are never explicitly defined. Expanding abbreviations to their original full forms therefore plays an important role in improving the accuracy of Chinese information extraction and retrieval systems. In this paper, we present a hybrid approach to the automatic expansion of abbreviations in Chinese news texts. Chinese abbreviations are generally produced from their full forms by reduction, elimination or generalization. To ensure that every abbreviation can be expanded, each abbreviation under expansion is assumed in turn to have been created by each of these three methods. Based on this assumption, a mapping table between shortened words and their matrix words, together with a dictionary of short-form/full-form pairs, is used to generate all possible expansions of an abbreviation. For an ambiguous abbreviation with multiple expansion candidates, hidden Markov models are then employed to rank all the candidates and select the one with the maximum score. To further improve expansion performance, linguistic knowledge such as discourse information and abbreviation patterns is used to correct possible expansion errors. An evaluation on an abbreviation-expanded corpus built from the Peking University Corpus showed that the approach achieves an average precision of 86.3% and an average recall of 83.8% across various types of abbreviations in Chinese news texts. [ABSTRACT FROM AUTHOR]
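The abstract outlines a two-stage pipeline: candidate generation from a mapping table and a short-form/full-form dictionary, followed by HMM ranking. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The tables `CHAR_TO_WORDS`, `ABBR_DICT`, `START`, `TRANS` and `EMIT` are hypothetical toy data; hidden states are assumed to be matrix words that emit the shortened characters; and candidates are ranked by brute-force enumeration of their HMM joint log-probability (Viterbi decoding would find only the top candidate, more efficiently). The discourse-based error-correction step is not modelled.

```python
import math
from itertools import product

# All tables below are hypothetical toy values for illustration only;
# they are not the resources or parameters used in the paper.

# Mapping table: shortened character -> candidate matrix words (reduction).
CHAR_TO_WORDS = {
    "北": ["北京", "北方"],
    "大": ["大学", "大会"],
}

# Short-form/full-form dictionary for pairs that cannot be rebuilt
# character by character (e.g. elimination: 清华 from 清华大学).
ABBR_DICT = {
    "清华": ["清华大学"],
}

# Assumed first-order HMM: hidden states are matrix words,
# observations are the shortened characters they contribute.
START = {"北京": 0.6, "北方": 0.4}
TRANS = {
    ("北京", "大学"): 0.7, ("北京", "大会"): 0.3,
    ("北方", "大学"): 0.2, ("北方", "大会"): 0.8,
}
EMIT = {
    ("北京", "北"): 0.5, ("北方", "北"): 0.5,
    ("大学", "大"): 0.5, ("大会", "大"): 0.5,
}
FLOOR = 1e-6  # crude floor probability for unseen events


def log_p(table, key):
    """Log-probability lookup with a floor for unseen keys."""
    return math.log(table.get(key, FLOOR))


def hmm_score(words, chars):
    """Log joint probability log P(words, chars) under the toy HMM."""
    score = log_p(START, words[0]) + log_p(EMIT, (words[0], chars[0]))
    for prev, cur, ch in zip(words, words[1:], chars[1:]):
        score += log_p(TRANS, (prev, cur)) + log_p(EMIT, (cur, ch))
    return score


def expand(abbr):
    """Return (full_form, log_score) candidates for abbr, best first."""
    candidates = {}
    # Reduction hypothesis: each character comes from one matrix word.
    options = [CHAR_TO_WORDS.get(ch, []) for ch in abbr]
    if abbr and all(options):
        for words in product(*options):
            full = "".join(words)
            s = hmm_score(words, abbr)
            candidates[full] = max(s, candidates.get(full, -math.inf))
    # Elimination/generalization hypothesis: look up stored pairs.
    # Pairs not already produced by reduction get log-score 0.0, so they
    # outrank HMM candidates; a real system needs a principled comparison.
    for full in ABBR_DICT.get(abbr, []):
        candidates.setdefault(full, 0.0)
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    print(expand("北大")[0][0])  # highest-scoring reduction candidate
    print(expand("清华"))        # dictionary-only candidate
```

With these toy numbers, 北京大学 scores highest for 北大 (joint probability 0.6 × 0.5 × 0.7 × 0.5 = 0.105), while 清华 is expanded only through the dictionary because its characters are absent from the toy mapping table.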
Copyright of International Journal of Computer Processing of Oriental Languages is the property of World Scientific Publishing Company and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
Header DbId: egs
DbLabel: Engineering Source
An: 30028073
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=30028073
RecordInfo BibRecord:
  BibEntity:
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 15
        StartPage: 165
    Subjects:
      – SubjectFull: Chinese abbreviations
        Type: general
      – SubjectFull: Chinese language
        Type: general
      – SubjectFull: Signs & symbols
        Type: general
      – SubjectFull: Hidden Markov models
        Type: general
      – SubjectFull: Language & languages
        Type: general
    Titles:
      – TitleFull: Automatic Expansion of Abbreviations in Chinese News Text: A Hybrid Approach.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Guohong Fu
      – PersonEntity:
          Name:
            NameFull: Kang-Kwong Luke
      – PersonEntity:
          Name:
            NameFull: Webster, Jonathan J.
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 06
              Text: Jun2007
              Type: published
              Y: 2007
          Identifiers:
            – Type: issn-print
              Value: 02194279
          Numbering:
            – Type: volume
              Value: 20
            – Type: issue
              Value: 2/3
          Titles:
            – TitleFull: International Journal of Computer Processing of Oriental Languages
              Type: main