Annotation of cybersecurity terminology: methodology, problems and results
Author | Affiliation
---|---
Rackevičienė, Sigita |
Date |
---|
2021 |
Currently, most terminology extraction projects are based on deep learning systems, whose development depends on large amounts of text and training data. Training data are obtained by manually annotating the terminology used in domain-specific texts, a task usually performed by terminology researchers in cooperation with domain experts. This presentation describes the monolingual and bilingual terminology annotation methodology used to annotate terms of the cybersecurity (CS) domain, the problems encountered during annotation, and the initial results. A dedicated software tool, QuickTag, was developed for the annotation work. It provides a toolkit for annotating terms and appellations in monolingual texts and in bilingual parallel texts, and its functions allow annotators to add various types of metadata about lexical units used in coherent texts. First, the main annotation function allows terms and appellations to be tagged with predefined tags indicating their conceptual characteristics: terms of the CS domain, terms related to the CS domain, and appellations of the CS domain. Appellations can additionally be tagged with their semantic class according to the nature of the referent (documents, institutions, software, etc.). Second, QuickTag allows annotators to add metadata about certain usage- and formation-related features of the tagged lexical units; e.g. an annotator can give the full form of a tagged abbreviated term, or specify the term's formation type or origin. [...]
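The tag scheme described above can be pictured as a small data model. The sketch below is a hypothetical Python illustration, not QuickTag's actual storage format or API: the record layout, the character-offset spans, the field names, the `annotate` helper and the closed set of semantic classes are all assumptions made for illustration. Only the three conceptual tags, the restriction of semantic classes to appellations, and the metadata types (full form, formation type, origin) come from the abstract.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ConceptTag(Enum):
    """The three conceptual tags named in the abstract."""
    CS_TERM = "term of the CS domain"
    CS_RELATED_TERM = "term related to the CS domain"
    CS_APPELLATION = "appellation of the CS domain"


# Hypothetical closed set of appellation classes; the abstract lists
# documents, institutions and software, followed by "etc.".
APPELLATION_CLASSES = {"document", "institution", "software"}


@dataclass
class Annotation:
    """One tagged lexical unit in a text (hypothetical record layout)."""
    start: int                             # character offset where the unit begins
    end: int                               # character offset just past the unit
    text: str                              # surface form as it appears in the text
    tag: ConceptTag
    semantic_class: Optional[str] = None   # only allowed for appellations
    full_form: Optional[str] = None        # full form of an abbreviated term
    formation_type: Optional[str] = None   # hypothetical free-text label
    origin: Optional[str] = None           # hypothetical free-text label

    def __post_init__(self) -> None:
        # Reject semantic classes on anything that is not an appellation,
        # and reject classes outside the assumed inventory.
        if self.semantic_class is not None:
            if self.tag is not ConceptTag.CS_APPELLATION:
                raise ValueError("semantic classes apply only to appellations")
            if self.semantic_class not in APPELLATION_CLASSES:
                raise ValueError(f"unknown semantic class: {self.semantic_class}")


def annotate(sentence: str, surface: str, tag: ConceptTag, **metadata) -> Annotation:
    """Hypothetical helper: tag the first occurrence of `surface` in `sentence`."""
    start = sentence.find(surface)
    if start < 0:
        raise ValueError(f"{surface!r} not found in sentence")
    return Annotation(start, start + len(surface), surface, tag, **metadata)


if __name__ == "__main__":
    sentence = "The DDoS attack was logged by Wireshark."
    term = annotate(sentence, "DDoS attack", ConceptTag.CS_TERM,
                    full_form="distributed denial-of-service attack")
    name = annotate(sentence, "Wireshark", ConceptTag.CS_APPELLATION,
                    semantic_class="software")
    print(term)
    print(name)
```

Checking the semantic-class rule in `__post_init__` mirrors the constraint that only appellations receive semantic classes. A bilingual variant would additionally need to link annotations across aligned source and target segments, which this sketch does not attempt.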