The semantic power of text content as a flow of a vector field of embeddings
Abstract
The growing volume of textual data demands advanced methods for evaluating both content effectiveness and semantic structure. While current Natural Language Processing (NLP) techniques offer powerful tools, they often lack metrics for quantifying intrinsic semantic intensity or conceptual coherence. This paper introduces “semantic power” – a novel quantitative measure designed to analyze the conceptual structure and semantic richness of texts, grounded in principles of field theory. The proposed methodology draws on the Ostrogradsky–Gauss theorem and the divergence operator, establishing a theoretical link between the local semantic properties of a text (derived from LaBSE vector embeddings) and their global influence. The approach involves computing a semantic centroid, representing the point of highest meaning concentration, and measuring semantic power using a model that assumes an inverse-square decay of vector influence. For further analysis, Gaussian Mixture Model (GMM) clustering is applied, and Principal Component Analysis (PCA) is used for dimensionality reduction and visualization. Experiments on philosophical texts by key Early Modern thinkers – G. W. Leibniz, R. Descartes, and I. Kant – reveal distinct and meaningful variations in semantic power (0.6010, 0.5633, and 0.5787, respectively) and in the resulting clustering patterns (2, 7, and 2 clusters). These findings suggest that semantic power is not merely a numerical descriptor but a quantity that correlates with the established intellectual styles and methodological orientations of the authors. Semantic power thus emerges as an objective metric for assessing the deep cognitive and semantic dimensions of textual content, with potential applications in philology, cognitive science, computational linguistics, and related disciplines.
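To make the described pipeline concrete, the following is a minimal sketch of how such measurements could be implemented with the sentence-transformers and scikit-learn libraries. The epsilon guard, the averaging over sentences, and the fixed GMM component count are illustrative assumptions, not the authors' exact formulation; in the paper the component count is selected per text.

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import PCA

sentences = [
    "Every substance is a world apart, independent of everything else.",
    "The monad has no windows through which anything could enter or leave.",
    "I think, therefore I am.",
]

# 1. Embed each sentence with LaBSE (768-dimensional vectors).
model = SentenceTransformer("sentence-transformers/LaBSE")
embeddings = model.encode(sentences)  # shape: (n_sentences, 768)

# 2. Semantic centroid: the mean vector, taken as the point of
#    highest meaning concentration.
centroid = embeddings.mean(axis=0)

# 3. Semantic power under an assumed inverse-square decay: each
#    sentence's contribution falls off as 1 / r^2 with its distance r
#    from the centroid (epsilon avoids division by zero).
eps = 1e-8
distances = np.linalg.norm(embeddings - centroid, axis=1)
semantic_power = float(np.mean(1.0 / (distances**2 + eps)))

# 4. GMM clustering of the embeddings; n_components is fixed here for
#    illustration but would normally be selected per text.
gmm = GaussianMixture(n_components=2, random_state=0).fit(embeddings)
labels = gmm.predict(embeddings)

# 5. PCA to two components for visualizing the clustering pattern.
coords = PCA(n_components=2).fit_transform(embeddings)
```

With a full text split into sentences, `semantic_power` would be compared across authors and `coords` plotted with the GMM labels to reproduce the kind of per-author clustering patterns reported above.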
Article Details
This work is licensed under a Creative Commons Attribution 4.0 International License.
References
1. Yao, L., Mao, C., & Luo, Y. (2019). Graph convolutional networks for text classification. Proceedings of the AAAI Conference on Artificial Intelligence, 33, 7370–7377. https://doi.org/10.1609/aaai.v33i01.33017370
2. Kozlowski, D., Dusdal, J., Pang, J., & Zilian, A. (2021). Semantic and relational spaces in science of science: Deep learning models for article vectorisation. Scientometrics. https://doi.org/10.1007/s11192-021-03984-1
3. Liu, B., Guan, W., Yang, C., Fang, Z., & Lu, Z. (2023). Transformer and graph convolutional network for text classification. International Journal of Computational Intelligence Systems, 16(1). https://doi.org/10.1007/s44196-023-00337-z
4. Wang, B., Li, Q., Melucci, M., & Song, D. (2019). Semantic Hilbert space for text representation learning. In The World Wide Web Conference. ACM Press. https://doi.org/10.1145/3308558.3313516
5. Vyas, Y., Niu, X., & Carpuat, M. (2018). Identifying semantic divergences in parallel text without annotations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics. https://doi.org/10.18653/v1/n18-1136
6. Zeng, D., Zha, E., Kuang, J., & Shen, Y. (2024). Multi-label text classification based on semantic-sensitive graph convolutional network. Knowledge-Based Systems, 284, 111303. https://doi.org/10.1016/j.knosys.2023.111303
7. Tekgöz, H., İlhan Omurca, S., Koç, K. Y., Topçu, U., & Çelik, O. (2022). Semantic similarity comparison between production line failures for predictive maintenance. Advances in Artificial Intelligence Research. https://doi.org/10.54569/aair.1142568
8. Premalatha, M., Viswanathan, V., & Čepová, L. (2022). Application of semantic analysis and LSTM-GRU in developing a personalized course recommendation system. Applied Sciences, 12(21), 10792. https://doi.org/10.3390/app122110792
9. Narendra, G. O., & Hashwanth, S. (2022). Named entity recognition based resume parser and summarizer. International Journal of Advanced Research in Science, Communication and Technology, 728–735. https://doi.org/10.48175/ijarsct-3029
10. Venkatesh, D., & Raman, S. (2024). BITS Pilani at SemEval-2024 Task 1: Using text-embedding-3-large and LaBSE embeddings for semantic textual relatedness. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024). Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.semeval-1.124
11. Feng, F., Yang, Y., Cer, D., Arivazhagan, N., & Wang, W. (2022). Language-agnostic BERT sentence embedding. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.acl-long.62
12. Kesiraju, S., Plchot, O., Burget, L., & Gangashetty, S. V. (2020). Learning document embeddings along with their uncertainties. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 2319–2332. https://doi.org/10.1109/taslp.2020.3012062
13. Hu, C., Wu, T., Liu, S., Liu, C., Ma, T., & Yang, F. (2024). Joint unsupervised contrastive learning and robust GMM for text clustering. Information Processing & Management, 61(1), 103529. https://doi.org/10.1016/j.ipm.2023.103529
14. Chesanovsky, I., & Levhunets, D. (2017). Representation of narrow-band radio signals with angular modulation in trunked radio systems using the principal component analysis. Scientific Journal of the Ternopil National Technical University, 86(2), 117–121. https://elartu.tntu.edu.ua/handle/lib/22368
15. Musil, T. (2019). Examining structure of word embeddings with PCA. In Text, Speech, and Dialogue (pp. 211–223). Springer International Publishing. https://doi.org/10.1007/978-3-030-27947-9_18