Application of large language models in generating Ukrainian corpora for training text classification systems

Vyacheslav Nykytyuk
Andrii Dolinskyi

Abstract

This study deals with the application of modern artificial intelligence technologies, in particular deep learning methods based on recurrent neural networks of the Long Short-Term Memory (LSTM) type, to the construction and analysis of Ukrainian-language corpora. The research focuses on the automated classification of text sentiment (tonality), which involves assigning texts to positive, neutral, and negative categories. This, in turn, demonstrated the effectiveness of the proposed algorithmic solutions and their suitability for analyzing the emotional tone of Ukrainian-language content.
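For illustration, the following minimal sketch (not the authors' implementation) shows how such an LSTM-based classifier with three sentiment classes could be assembled in TensorFlow/Keras; the vocabulary size, sequence length, and layer widths are assumed placeholders.

```python
# Illustrative sketch only (not the code from the paper): a three-way LSTM
# sentiment classifier (positive / neutral / negative) for tokenized Ukrainian
# text. VOCAB_SIZE, MAX_LEN and layer sizes are assumed placeholder values.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20_000   # assumed vocabulary size after tokenization
MAX_LEN = 128         # assumed padded sequence length in tokens
NUM_CLASSES = 3       # positive, neutral, negative

model = tf.keras.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128),              # learned word embeddings
    layers.LSTM(64),                                 # recurrent (LSTM) encoder
    layers.Dense(32, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"), # class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```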

Author Biography

Vyacheslav Nykytyuk, Ternopil Ivan Puluj National Technical University, Ternopil

PhD, Associate Professor, Department of Computer Science

References

[1] O. Zalutska, Method for analyzing the Ukrainian language texts sentiment using natural language processing, in: Information Control Systems and Intelligent Technologies. Advances and Applications, Liha-Pres, Lviv, 2022, pp. 122–137. https://doi.org/10.36059/978-966-397-538-2-7 (in Ukrainian)

[2] T.B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, et al., Language Models are Few-Shot Learners, (2020). https://doi.org/10.48550/arXiv.2005.14165 (Accessed 15 October 2025)

[3] T. Chen, Y. Chen, T. Gui, Q. Huang, L. Xue, Q. Zhang, A Survey on Large Language Models in Natural Language Processing, (2023). https://doi.org/10.48550/arXiv.2303.18223 (Accessed 15 October 2025)

[4] R. Khrabatyn, V. Zaiats, Technologies for designing the structure of the information system for monitoring the technical condition of bridge structures, Scientific Journal of the Ternopil National Technical University. 109 (2023) 72–79. https://doi.org/10.33108/visnyk_tntu2023.01.072 (in Ukrainian)

[5] O.A. Pastukh, O.V. Tkach, Brain-computer interaction based on motor imagery using machine learning, Scientific Journal of the Ternopil National Technical University. 112 (2023) 26–31. https://doi.org/10.33108/visnyk_tntu2023.04.026 (in Ukrainian)

[6] F. Li, C. Cui, Y. Hu, L. Wang, Sentiment Analysis of User Comment Text based on LSTM, WSEAS Transactions on Signal Processing. 19 (2023) 19–31. https://doi.org/10.37394/232014.2023.19.3

[7] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, L. Kaiser, I. Polosukhin, Attention Is All You Need, (2017). https://doi.org/10.48550/arXiv.1706.03762 (Accessed 15 October 2025)

[8] M. Prytula, Fine-tuning BERT, DistilBERT, XLM-RoBERTa and Ukr-RoBERTa models for sentiment analysis of Ukrainian language reviews, Artificial Intelligence. 2 (2024) 85–97. https://doi.org/10.15407/jai2024.02.085 (in Ukrainian)

[9] B. Ding, J. Zhou, Z. Li, X. Long, X. Li, Z. Wu, S. Gao, Data Augmentation using LLMs: Data Perspectives, Learning Paradigms and Challenges, (2024). https://doi.org/10.48550/arXiv.2403.02990 (Accessed 15 October 2025)

[10] H. Dai, Z. Liu, W. Liao, X. Huang, Y. Cao, Z. Wu, S. Zhao, W. Zhu, S. Wei, T. Liu, N.S. Peng, AugGPT: Leveraging ChatGPT for Text Data Augmentation, (2023). https://doi.org/10.48550/arXiv.2302.13007 (Accessed 15 October 2025)

[11] M. Bayer, M.A. Kaufhold, C. Reuter, A Survey on Data Augmentation for Text Classification, ACM Computing Surveys. 55 (2023) 1–39. https://doi.org/10.1145/3544558

[12] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, Cambridge, 2016. https://www.deeplearningbook.org

[13] S. Hochreiter, J. Schmidhuber, Long Short-Term Memory, Neural Computation. 9 (1997) 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735

[14] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient Estimation of Word Representations in Vector Space, (2013). https://doi.org/10.48550/arXiv.1301.3781 (Accessed 15 October 2025)

[15] A. Simarmata, Anthony, Tiffany, M. Phanie, Sentiment Analysis On Twitter Posts About The Russia and Ukraine War With Long Short-Term Memory, Sinkron. 8 (2023) 762–772. https://doi.org/10.33395/sinkron.v8i2.12235
