Text documents clustering
Author | Affiliation | |||
---|---|---|---|---|
LT | ||||
LT | ||||
LT | Baltijos pažangių technologijų institutas, Vilnius | LT |
Date |
---|
2014 |
Large amounts of textual information are generated every day, and existing techniques can hardly cope with such a flow of information. Users nevertheless expect fast and accurate tools for information management and retrieval. Clustering is a well-known technique for grouping similar data, which makes the data more manageable and usable. Text clustering adapts clustering to a very specific kind of data: documents. However, it does not transfer directly to every language; the specifics of a language strongly influence performance, as results for English and other well-investigated languages show. In this paper we apply different distance measures and clustering approaches to Lithuanian data, discuss the results, and provide recommendations for clustering documents in Lithuanian.