The semantic power of text content as a flow of a vector field of embeddings


Viktor Stashkiv
Andrii Khamarchuk
Kyrylo Chornopyskyi
Vladyslav Shumeiko
Maksym Chorniak
Karina Yarosh
Valentyna Tserkovniuk
Oleh Pastukh

Abstract

The growing volume of textual data demands advanced methods for evaluating both content effectiveness and semantic structure. While current Natural Language Processing (NLP) techniques offer powerful tools, they often lack metrics for quantifying intrinsic semantic intensity or conceptual coherence. This paper introduces “semantic power” – a novel quantitative measure designed to analyze the conceptual structure and semantic richness of texts, grounded in principles of field theory. The proposed methodology draws on the Ostrogradsky–Gauss theorem and the divergence operator, establishing a theoretical link between local semantic properties of a text (derived from LaBSE vector embeddings) and their global influence. The approach involves computing a semantic centroid, representing the point of highest meaning concentration, and measuring semantic power using a model that assumes an inverse-square decay of vector influence. For further analysis, Gaussian Mixture Model (GMM) clustering is applied, and Principal Component Analysis (PCA) is used for dimensionality reduction and visualization. Experiments on philosophical texts by key Early Modern thinkers – G. W. Leibniz, R. Descartes, and I. Kant – reveal distinct and meaningful variations in semantic power (0.6010, 0.5633, and 0.5787, respectively) and in the resulting clustering patterns (2, 7, and 2 clusters). These findings suggest that semantic power is not merely a numerical descriptor but a quantity that correlates with the authors' established intellectual styles and methodological orientations. As such, semantic power emerges as an objective metric for assessing the deep cognitive and semantic dimensions of textual content, with potential applications in philology, cognitive science, computational linguistics, and related disciplines.
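For orientation, the field-theoretic backbone named in the abstract is the Ostrogradsky–Gauss (divergence) theorem, which equates the flux of a vector field $\mathbf{F}$ through a closed surface with the integral of its divergence over the enclosed volume:

$$\oint_{\partial V} \mathbf{F} \cdot d\mathbf{S} = \int_{V} \left( \nabla \cdot \mathbf{F} \right)\, dV .$$

The computational pipeline the abstract describes can be sketched as follows. This is a minimal illustration assuming the sentence-transformers release of LaBSE and scikit-learn for GMM and PCA; the function names and the regularized inverse-square weighting are assumptions made for exposition, not the authors' exact published formula.

```python
# Minimal sketch of the pipeline: LaBSE embeddings -> semantic centroid ->
# inverse-square influence score -> GMM clustering -> PCA projection.
# The 1/(1 + d^2) regularization is an illustrative assumption.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import PCA

model = SentenceTransformer("sentence-transformers/LaBSE")

def semantic_power(sentences: list[str]) -> float:
    # Embed each sentence into the 768-dimensional LaBSE space.
    vectors = model.encode(sentences)
    # Semantic centroid: the point of highest meaning concentration.
    centroid = vectors.mean(axis=0)
    # Inverse-square decay of each vector's influence at the centroid,
    # regularized so a vector located at the centroid contributes 1.0.
    distances = np.linalg.norm(vectors - centroid, axis=1)
    influence = 1.0 / (1.0 + distances ** 2)
    return float(influence.mean())

def cluster_and_project(sentences: list[str], n_clusters: int):
    vectors = model.encode(sentences)
    # GMM clustering in the full embedding space; diagonal covariances
    # keep the 768-dimensional fit tractable.
    labels = GaussianMixture(
        n_components=n_clusters, covariance_type="diag", random_state=0
    ).fit_predict(vectors)
    # PCA to two components, purely for visualization.
    coords = PCA(n_components=2).fit_transform(vectors)
    return labels, coords
```

Under this sketch, the per-author scores reported above (e.g., 0.6010 for Leibniz) would correspond to semantic_power evaluated over each author's sentence set, with the GMM component count (2, 7, or 2) selected by a model-selection criterion such as BIC.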


References

1. Yao, L., Mao, C., & Luo, Y. (2019). Graph convolutional networks for text classification. Proceedings of the AAAI Conference on Artificial Intelligence, 33, 7370–7377. https://doi.org/10.1609/aaai.v33i01.33017370

2. Kozlowski, D., Dusdal, J., Pang, J., & Zilian, A. (2021). Semantic and relational spaces in science of science: Deep learning models for article vectorisation. Scientometrics. https://doi.org/10.1007/s11192-021-03984-1

3. Liu, B., Guan, W., Yang, C., Fang, Z., & Lu, Z. (2023). Transformer and graph convolutional network for text classification. International Journal of Computational Intelligence Systems, 16(1). https://doi.org/10.1007/s44196-023-00337-z

4. Wang, B., Li, Q., Melucci, M., & Song, D. (2019). Semantic Hilbert space for text representation learning. In The World Wide Web Conference. ACM Press. https://doi.org/10.1145/3308558.3313516

5. Vyas, Y., Niu, X., & Carpuat, M. (2018). Identifying semantic divergences in parallel text without annotations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics. https://doi.org/10.18653/v1/n18-1136

6. Zeng, D., Zha, E., Kuang, J., & Shen, Y. (2024). Multi-label text classification based on semantic-sensitive graph convolutional network. Knowledge-Based Systems, 284, 111303. https://doi.org/10.1016/j.knosys.2023.111303

7. Tekgöz, H., İlhan Omurca, S., Koç, K. Y., Topçu, U., & Çelik, O. (2022). Semantic similarity comparison between production line failures for predictive maintenance. Advances in Artificial Intelligence Research. https://doi.org/10.54569/aair.1142568

8. Premalatha, M., Viswanathan, V., & Čepová, L. (2022). Application of semantic analysis and LSTM-GRU in developing a personalized course recommendation system. Applied Sciences, 12(21), 10792. https://doi.org/10.3390/app122110792

9. Narendra, G. O., & Hashwanth, S. (2022). Named entity recognition based resume parser and summarizer. International Journal of Advanced Research in Science, Communication and Technology, 728–735. https://doi.org/10.48175/ijarsct-3029

10. Venkatesh, D., & Raman, S. (2024). BITS Pilani at SemEval-2024 Task 1: Using text-embedding-3-large and LaBSE embeddings for semantic textual relatedness. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024). Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.semeval-1.124

11. Feng, F., Yang, Y., Cer, D., Arivazhagan, N., & Wang, W. (2022). Language-agnostic BERT sentence embedding. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.acl-long.62

12. Kesiraju, S., Plchot, O., Burget, L., & Gangashetty, S. V. (2020). Learning document embeddings along with their uncertainties. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 2319–2332. https://doi.org/10.1109/taslp.2020.3012062

13. Hu, C., Wu, T., Liu, S., Liu, C., Ma, T., & Yang, F. (2024). Joint unsupervised contrastive learning and robust GMM for text clustering. Information Processing & Management, 61(1), 103529. https://doi.org/10.1016/j.ipm.2023.103529

14. Chesanovsky, I., & Levhunets, D. (2017). Representation of narrow-band radio signals with angular modulation in trunked radio systems using the principal component analysis. Scientific Journal of the Ternopil National Technical University, 86(2), 117–121. https://elartu.tntu.edu.ua/handle/lib/22368

15. Musil, T. (2019). Examining structure of word embeddings with PCA. In Text, Speech, and Dialogue (pp. 211–223). Springer International Publishing. https://doi.org/10.1007/978-3-030-27947-9_18
