Krilavičius, Tomas
Application of machine learning methods for financial distress detection
Item type: ETD, [Mašininio mokymo metodų taikymas finansinius sunkumus patiriančių įmonių identifikavimui] doctoral thesis [2024] [ETD_DR] [N009]
In this dissertation, a methodology for financial distress detection using machine learning methods was created. The methodology makes it possible to start from any feature set and build a viable financial distress detection model by combining different dimensionality reduction, feature construction, class balancing, and machine learning methods. A novel, data-based definition of financial distress was proposed. Because the definition is adaptive, it allows beneficiaries' different levels of risk tolerance to be assessed. The methodology can be used to create a financial distress identifier, which could serve as a stand-alone tool for evaluating distress in small and medium-sized enterprises. The proposed methodology analyzes financial distress from the worst-case scenario perspective. However, the suggested financial distress conditions can be modified to match the beneficiary's risk tolerance level, and the retrained model can then be used to identify financial distress under the adjusted conditions. Additionally, the methodology is simple to adapt to different financial distress or bankruptcy datasets.
Building a Python framework for forestry data analysis: integration of the database with calculation models
Item type: ETD, [Python sistemos kūrimas miškininkystės duomenų analizei: duomenų bazės ir skaičiavimo modelių integracija] bachelor thesis [2024] [ETD_BAK] [N009]
The fast development and complex requirements of forest management and research call for advanced analytical capabilities that support sustainable processes and practices. This bachelor thesis identifies a critical need for a Python environment that can easily integrate a forestry database with different computational models. The main problem addressed is the fragmentation between the existing forestry database and the computational models used to analyse and evaluate the data. The proposed Python-based library aims to enable efficient data processing, analysis, and operation by integrating these two components. The methodology is based on understanding the existing forestry database in Lithuania, collaborating with computational model developers, and creating a structured framework that supports smooth data integration and manipulation. The main tasks cover validation and analysis of the existing forestry database, definition of the requirements, and development of a Python platform architecture that follows modularity and scalability principles. The results show that the developed environment provides a single platform, improves analytical capabilities, and simplifies data processing and integration. By enabling more accurate data analysis, the platform aids informed decision-making and sustainable forest management. The thesis concludes that challenges such as data compatibility and system adaptation can be managed effectively through strategic planning and ongoing collaboration with computational model developers. This type of integration between the forestry database and Python-based computational models offers significant long-term benefits, making the framework a valuable tool for analysing and managing forestry data.
Analyzing and identifying Russian disinformation in Lithuania using graph kernels
Item type: ETD, [Rusijos dezinformacijos Lietuvoje analizė ir identifikavimas naudojant grafų branduolių metodus] master thesis [2024] [ETD_MGR] [N001]
Monitoring social and traditional media means handling vast amounts of unstructured data. In such scenarios, techniques that extract only key information, reducing volume and organizing it into structured data, are often beneficial. This study examined a similar approach using knowledge graphs. It explored disinformation occurrences on the messaging app Telegram through knowledge graphs formed from extracted Subject-Verb-Object (SVO) triplets. Disinformation was identified by computing graph kernels between knowledge graphs from Telegram conversations and graphs based on verified disinformation cases from the EUvsDisinfo database. This method assessed over 1 million messages from 27 Russian and Belarusian Telegram channels. According to Lithuanian security agencies, these channels are popular among certain Lithuanian groups and often contain disinformation, primarily targeting the Russo-Ukrainian War. The results showed that disinformation was present in all channels, with a specific focus on biological and nuclear weapons and the political views of residents in occupied territories. The study found that the Shortest Path Graph Kernel was the most effective for detecting disinformation, uniquely identifying graphs with significant disinformation and disregarding those with minimal disinformation.
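As a rough illustration of the kernel named in the abstract above, the following is a minimal sketch of a shortest-path graph kernel on small labelled graphs. It is not the thesis implementation: the graph representation (a dict of node labels plus an edge list), the treatment of edges as undirected and unweighted, the exact-match comparison of labels and distances, and the toy labels "X", "Y", "Z" are all assumptions made for this example.

```python
from collections import Counter, deque
from math import sqrt


def shortest_path_features(nodes, edges):
    """Count (label_a, label_b, distance) triples over all connected node pairs.

    nodes: dict mapping node id -> label (hypothetical representation)
    edges: iterable of (u, v) pairs, treated as undirected and unweighted
    """
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    feats = Counter()
    for src in nodes:
        # Breadth-first search gives shortest hop counts from src.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        for dst, d in dist.items():
            if src < dst:  # count each unordered pair once
                a, b = sorted((nodes[src], nodes[dst]))
                feats[(a, b, d)] += 1
    return feats


def sp_kernel(g1, g2):
    """Number of matching shortest paths (same end labels, same length)."""
    f1 = shortest_path_features(*g1)
    f2 = shortest_path_features(*g2)
    return sum(f1[k] * f2[k] for k in f1.keys() & f2.keys())


def normalized_sp_kernel(g1, g2):
    """Kernel value scaled by the self-similarities; 0.0 if either is zero."""
    denom = sqrt(sp_kernel(g1, g1) * sp_kernel(g2, g2))
    return sp_kernel(g1, g2) / denom if denom else 0.0


# Toy graphs: a path X-Y-Z and a triangle on the same labels.
path = ({1: "X", 2: "Y", 3: "Z"}, [(1, 2), (2, 3)])
triangle = ({1: "X", 2: "Y", 3: "Z"}, [(1, 2), (2, 3), (1, 3)])
similarity = normalized_sp_kernel(path, triangle)
```

In a setting like the one described, one graph would come from a channel's messages and the other from a verified disinformation case, with a higher kernel value suggesting more shared structure. How the thesis built, labelled, and thresholded its graphs is not stated in the abstract.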
Malware and injection attack detection using artificial intelligence
Item type: ETD, [Kenkėjiškų programų ir injekcinių atakų aptikimas naudojant dirbtinį intelektą] master thesis [2024] [ETD_MGR] [N009]
Mahmoud Ahmed Abdelhamid Abourady
We used the Aegean Wi-Fi Intrusion Dataset (AWID) to build and test robust machine learning models for detecting network intrusions. The models were tested in various scenarios covering both normal and anomalous network behavior. The main aim was to improve the performance and precision of intrusion detection systems (IDS), ultimately protecting digital assets and network infrastructure. We chose random forest and decision tree classifiers because they perform well and are easy to interpret. The dataset was extensively pre-processed: the data was cleaned, features were standardized, categorical variables were encoded, and missing values were handled, so that the models were trained on high-quality input. Both models performed very well; the results show how effectively machine learning models, the random forest in particular, identify and classify attacks on networks. This thesis shows the capability of machine learning to improve network security through effective classification of network traffic. The models created and assessed here provide a strong foundation for building more robust and adaptable network security systems, potentially leading to significant improvements in the detection and mitigation of cyberattacks.
Predictive modeling of code quality: exploring feature significance through machine learning analysis of "GitHub" repositories
Item type: ETD, [Nuspėjamasis kodo kokybės modeliavimas: savybių svarbos tyrimas "GitHub" repozitorijose pasitelkiant mašininio mokymosi modelius] master thesis [2024] [ETD_MGR] [N009]
Lapienis, Paulius
Present-day code quality systems are based on static code quality metrics. These metrics are old, and their utility is questionable, so new ways of performing code analysis are required. In this thesis, popular static code analysis metrics are analyzed, applied, and compared with graph-based methodologies. The comparison is made at the codebase level, using a new dataset presented in the thesis. The dataset consists of seven different static code quality metrics, five different graphs featurized from the source code, and one external code quality metric, in this case weighted "GitHub" issues. This featurization was performed on each distinct version control system commit. Two different machine learning methods are applied to the static code quality metrics, and three graph-based neural networks are applied to each of the five featurization methods, giving seventeen experiments in total, from which code featurization techniques and machine learning architectures are compared. Graph-based methodologies were found to have only a slight edge in predicting code quality, but were much more interpretable.
Formation of cargo delivery routes in a dynamic environment
Item type: ETD, [Krovinių pristatymo maršrutų formavimas dinamiškoje aplinkoje] master thesis [2024] [ETD_MGR] [N009]
In this work, we analyzed the relevance of optimizing the formation of delivery routes and the limitations of existing solutions. The study focuses on forming delivery routes for small shipments within a city, taking into account the dynamic nature of cargo delivery. We present a survey of algorithms and methods applicable to the Vehicle Routing Problem (VRP). In addition, we evaluated the effectiveness of electric cargo bicycles as an alternative to trucks for transporting small consignments in the city. The work explains in detail how a genetic algorithm and reinforcement learning can be applied to plan optimal cargo delivery routes in a dynamic environment. We introduced a reinforcement learning model architecture for solving vehicle routing problems and demonstrated its strong generalization capability. The methods proposed in our study outperform Google OR-Tools, which relies on heuristic methods, and we provide a detailed comparison of the applied methods. Furthermore, we developed a simulation model that can model the generation of delivery routes in dynamic environments using different algorithms. This simulation model was implemented as a web application.
Hate speech detection using artificial intelligence
Item type: ETD, [Neapykantos kalbos atpažinimas panaudojant dirbtinį intelektą] master thesis [2023] [ETD_MGR] [N009]
This thesis presents a hate speech detection solution and methodology for the Lithuanian language. Three deep learning models are used to detect hate speech: multilingual BERT, LitLat BERT, and Electra. All three models were adapted to classify Lithuanian comments into three classes: hate speech, offensive language, and neutral language. To adapt the models to recognize hate speech, an annotated dataset of 27,358 Lithuanian comments was prepared. The comments were annotated by four annotators. In this dataset, 15% of the comments contain hate speech, 29% contain offensive language, and 56% are neutral. The resulting hate speech detection methodology for Lithuanian is based on LitLat BERT, a pre-trained transformer neural network model, which was taken as the base and further trained on the prepared corpus. LitLat BERT was chosen as the base because, during the applied research stage, it showed the best results among the tested models: the F1 score for hate speech reached 82% and the overall accuracy reached 75%. The trained models were evaluated using accuracy, recall, precision, and F1 metrics.
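For readers unfamiliar with the evaluation metrics named in the entry above (accuracy, precision, recall, F1), the sketch below shows how they are commonly computed for one class of a three-class problem. The label names and the six toy predictions are invented for illustration and are not data from the thesis.

```python
def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, F1) for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # true positives
    fp = sum(t != label and p == label for t, p in pairs)  # false positives
    fn = sum(t == label and p != label for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Share of examples whose predicted label matches the annotated label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented toy annotations and predictions (hypothetical, for illustration only).
y_true = ["hate", "hate", "offensive", "neutral", "neutral", "neutral"]
y_pred = ["hate", "offensive", "offensive", "neutral", "neutral", "hate"]

hate_precision, hate_recall, hate_f1 = per_class_scores(y_true, y_pred, "hate")
overall_accuracy = accuracy(y_true, y_pred)
```

Reporting F1 per class alongside overall accuracy is useful when classes are imbalanced, as with the 15/29/56 split described in the entry, because accuracy alone can be dominated by the majority class.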
Adaptive IT motivational performance system
Item type: ETD, [Prisitaikanti IT motyvacinės veiklos sistema] master thesis [2023] [ETD_MGR] [N009]
Kinderis, Denas
The IT industry in Lithuania has grown rapidly in recent years and has become one of the most significant sectors of the country's economy. As demand for qualified IT specialists increases, many Lithuanian companies seek to attract talented specialists by investing in innovative technologies and strategies. This master's thesis examines an alternative motivational system for the IT sector that emphasizes evaluation of the work done by the team, collaboration, and continuous feedback collection through an alternative methodology. The system consists of clear goals and measurable parameters so that employee performance can be effectively assessed and improved. The motivational performance system combines elements of existing systems and performance development tools such as OKR, KPI, Agile Performance Management, and 360-Degree Feedback. It has four main components: goal setting and alignment, open communication, continuous feedback and development, and integration of recognition and reward. Applying these components, the thesis explains how the features of the proposed motivational system help employees achieve a desired result that can be easily measured and adapted to different teams, sectors, and companies. The thesis also provides insights into goal implementation, possible challenges, and recommendations for improving the system.
Solution for agriculture vehicle work planning and tracking
Item type: ETD, [Žemės ūkio transporto priemonių darbo planavimo ir sekimo sprendimas] bachelor thesis [2022] [ETD_BAK] [N009]
Sabraliyeva, Tomiris
Precision agriculture, a new direction in agriculture, has opened the door for many researchers and companies to develop technologically advanced ways to improve the work process, one of which is GPS tracking. However, this relatively young market does not yet offer a wide range of products that let farmers benefit from the technology without investing in and installing additional hardware. The system proposed in this thesis is a GPS-based vehicle tracking and work planning system that allows agriculture workers to track their vehicles, monitor tasks, and plan the work process without extra expenses: the application relies on the GPS receivers embedded in modern phones. The output of the work includes the system requirements, initial designs, an early prototype that showcases the look of the future system, and a developed cross-platform application with an administrative web page. The analysis conducted during the work showed that GPS tracking is the best-suited way to perform tracking, and that developing the application as a cross-platform product is not only efficient but also makes it more accessible to potential users.
Encoding musical style using music information retrieval and deep learning methods
Item type: ETD, [Muzikinio stiliaus užkodavimas naudojant muzikos informacijos modeliavimo ir giliojo mokymosi metodus] bachelor thesis [2022] [ETD_BAK] [N001]
Nowadays, music is more accessible than ever before. With the increased popularity of online music streaming companies, people spend more and more time choosing the songs they actually like. This creates a need for a fast and accurate music recommendation method that lets users skip the large quantities of songs and choose precisely what they like. This work presents a method to compare music based entirely on its audio signal properties. For this, we used seven different signal processing and deep learning methods: Mel Frequency Cepstral Coefficients, Chromagram, Tempogram, Zero-Crossing Rate, Autoencoder, Variational Autoencoder, and OpenL3 embeddings. All experiments were performed on a database consisting of the 4039 most popular songs from 11 different genres. The methods were evaluated by comparing each algorithm's results with the music similarity results given by experts and by counting the number of same-genre and same-artist songs in the recommendation list. The evaluation showed that the best model for finding similar music was the Chromagram model. Models that captured the music's rhythm scored lowest in both evaluation tests.