Application of large language models in generating Ukrainian corpora for training text classification systems

Vyacheslav Nykytyuk
Andrii Dolinskyi

Abstract

This study deals with the application of modern artificial intelligence technologies, in particular deep learning methods based on recurrent neural networks of the Long Short-Term Memory (LSTM) type, to the construction and analysis of Ukrainian-language corpora. The research focuses on the automated classification of text sentiment (tonality), which involves assigning texts to positive, neutral, and negative categories. This, in turn, demonstrated the effectiveness of the proposed algorithmic solutions and their suitability for analyzing the emotional tone of Ukrainian-language content.
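For illustration, the following minimal sketch (not the authors' implementation) shows how such an LSTM-based classifier with three sentiment classes could be assembled in TensorFlow/Keras; the vocabulary size, sequence length, and layer widths are assumed placeholders.

```python
# Illustrative sketch only (not the code from the paper): a three-way LSTM
# sentiment classifier (positive / neutral / negative) for tokenized Ukrainian
# text. VOCAB_SIZE, MAX_LEN and layer sizes are assumed placeholder values.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20_000   # assumed vocabulary size after tokenization
MAX_LEN = 128         # assumed padded sequence length in tokens
NUM_CLASSES = 3       # positive, neutral, negative

model = tf.keras.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128),              # learned word embeddings
    layers.LSTM(64),                                 # recurrent (LSTM) encoder
    layers.Dense(32, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"), # class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```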

Author Biography

Vyacheslav Nykytyuk, Ternopil Ivan Puluj National Technical University, Ternopil

PhD, Associate Professor, Department of Computer Science

References

[1] O. Zalutska, Method for analyzing the Ukrainian language texts sentiment using natural language processing, in: Information Control Systems and Intelligent Technologies. Advances and Applications, Liha-Pres, Lviv, 2022, pp. 122–137. https://doi.org/10.36059/978-966-397-538-2-7 (in Ukrainian)

[2] T.B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, et al., Language Models are Few-Shot Learners, (2020). https://doi.org/10.48550/arXiv.2005.14165 (Accessed 15 October 2025)

[3] T. Chen, Y. Chen, T. Gui, Q. Huang, L. Xue, Q. Zhang, A Survey on Large Language Models in Natural Language Processing, (2023). https://doi.org/10.48550/arXiv.2303.18223 (Accessed 15 October 2025)

[4] R. Khrabatyn, V. Zaiats, Technologies for designing the structure of the information system for monitoring the technical condition of bridge structures, Scientific Journal of the Ternopil National Technical University. 109 (2023) 72–79. https://doi.org/10.33108/visnyk_tntu2023.01.072 (in Ukrainian)

[5] O.A. Pastukh, O.V. Tkach, Brain-computer interaction based on motor imagery using machine learning, Scientific Journal of the Ternopil National Technical University. 112 (2023) 26–31. https://doi.org/10.33108/visnyk_tntu2023.04.026 (in Ukrainian)

[6] F. Li, C. Cui, Y. Hu, L. Wang, Sentiment Analysis of User Comment Text based on LSTM, WSEAS Transactions on Signal Processing. 19 (2023) 19–31. https://doi.org/10.37394/232014.2023.19.3

[7] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, L. Kaiser, I. Polosukhin, Attention Is All You Need, (2017). https://doi.org/10.48550/arXiv.1706.03762 (Accessed 15 October 2025)

[8] M. Prytula, Fine-tuning BERT, DistilBERT, XLM-RoBERTa and Ukr-RoBERTa models for sentiment analysis of Ukrainian language reviews, Artificial Intelligence. 2 (2024) 85–97. https://doi.org/10.15407/jai2024.02.085 (in Ukrainian)

[9] B. Ding, J. Zhou, Z. Li, X. Long, X. Li, Z. Wu, S. Gao, Data Augmentation using LLMs: Data Perspectives, Learning Paradigms and Challenges, (2024). https://doi.org/10.48550/arXiv.2403.02990 (Accessed 15 October 2025)

[10] H. Dai, Z. Liu, W. Liao, X. Huang, Y. Cao, Z. Wu, S. Zhao, W. Zhu, S. Wei, T. Liu, N.S. Peng, AugGPT: Leveraging ChatGPT for Text Data Augmentation, (2023). https://doi.org/10.48550/arXiv.2302.13007 (Accessed 15 October 2025)

[11] M. Bayer, M.A. Kaufhold, C. Reuter, A Survey on Data Augmentation for Text Classification, ACM Computing Surveys. 55 (2023) 1–39. https://doi.org/10.1145/3544558

[12] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, Cambridge, 2016. https://www.deeplearningbook.org

[13] S. Hochreiter, J. Schmidhuber, Long Short-Term Memory, Neural Computation. 9 (1997) 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735

[14] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient Estimation of Word Representations in Vector Space, (2013). https://doi.org/10.48550/arXiv.1301.3781 (Accessed 15 October 2025)

[15] A. Simarmata, Anthony, Tiffany, M. Phanie, Sentiment Analysis On Twitter Posts About The Russia and Ukraine War With Long Short-Term Memory, Sinkron. 8 (2023) 762–772. https://doi.org/10.33395/sinkron.v8i2.12235
