Use this url to cite publication: https://hdl.handle.net/20.500.12259/50111
The influence of collocation segmentation and top 10 items to keyword assignment performance
Type of publication
Article in conference proceedings in Web of Science and Scopus database (P1a)
Is part of
Computational linguistics and intelligent text processing : 11th international conference, CICLing 2010, Iasi, Romania, March 21-27, 2010 : proceedings / editor Gelbukh A. Berlin : Springer, 2010
Date Issued
2010
Publisher
Berlin : Springer, 2010
Extent
p. 648-660
Abstract
Automatic document annotation from a controlled conceptual thesaurus is useful for establishing precise links between similar documents. This study presents a language-independent document annotation system based on features derived from a novel collocation segmentation method. Using the multilingual conceptual thesaurus EuroVoc, we evaluate filtered and unfiltered versions of the method, comparing them against other language-independent methods based on single words and bigrams. Testing our new method against the manually tagged multilingual corpus Acquis Communautaire 3.0 (AC), using all descriptors found there, we attain improvements in keyword assignment precision from 18 to 29 percent and in F-measure from 17.2 to 27.6 for 5 keywords assigned to a document. Further filtering out the top 10 most frequent items improves precision by 4 percent, and collocation segmentation improves precision by 9 percent, on average over the 21 languages tested.
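The abstract reports precision and F-measure for 5 keywords assigned per document but does not spell out the scoring. The following is a minimal sketch, assuming the standard definitions of precision, recall, and balanced F-measure applied to the top k ranked descriptors of a single document. The function name, the descriptor labels, and the per-document scoring are illustrative assumptions, not the authors' evaluation code; how the paper aggregates scores across documents and languages is not stated here.

```python
from typing import Sequence, Set, Tuple


def precision_recall_f1_at_k(
    ranked: Sequence[str], gold: Set[str], k: int = 5
) -> Tuple[float, float, float]:
    """Score the top-k assigned descriptors against the manually assigned set."""
    top_k = list(ranked[:k])
    # A hit is an assigned descriptor that also appears in the manual annotation.
    hits = sum(1 for d in top_k if d in gold)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(gold) if gold else 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if hits else 0.0
    return precision, recall, f1


# Hypothetical descriptor labels, for illustration only.
ranked = ["fisheries policy", "import", "customs tariff", "EU aid", "agriculture"]
gold = {"import", "customs tariff", "trade agreement", "export"}
p, r, f = precision_recall_f1_at_k(ranked, gold, k=5)
```

In this made-up example, two of the five assigned descriptors match the four manual ones, so precision is 2/5 and recall is 2/4.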
Series/Report no.
(Lecture Notes in Computer Science. Vol. 6008, ISSN 0302-9743)
Type of document
type::text::journal::journal article::research article
Language
English (en)
Coverage Spatial
Germany (DE)