Implementation of Topic modeling for Multilingual Document Summarization based on Bag of Itemset
DOI:
https://doi.org/10.29407/gj.v10i2.28393
Keywords:
Bag of itemset, LDA, Summarization, Topic Summarization, Multilingual document summarization
Abstract
As the number of electronic text documents grows, searching and processing information has become increasingly complex, especially when the documents come from multiple sources and languages. Document summarization methods are therefore needed to help users retrieve important information more quickly. However, existing multilingual summarization methods, such as ELSA, are limited by dataset size and by the need to pre-determine themes. This study aims to improve the quality of multilingual document summarization by integrating the Bag of Itemset representation with the Latent Dirichlet Allocation Algorithm Modification (LDA-AM) approach. The proposed method first uses topic modeling to group the multilingual documents into several topics. Then, for each topic, a sentence selection process generates a topic-based summary, and the topic-based summaries are combined into a general summary. Experiments using the ROUGE evaluation metric compared the proposed method with a baseline. The results show that the proposed method outperforms the baseline, with a ROUGE-1 score of 0.2623, a ROUGE-2 score of 0.1802, and a ROUGE-L score of 0.1231. These results indicate that combining the Bag of Itemset representation with LDA-AM can improve summary quality when summarizing multilingual documents.
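For readers unfamiliar with the ROUGE scores reported above, the following is a minimal sketch of how ROUGE-N and ROUGE-L can be computed for a single candidate summary against a single reference. It is an illustration under stated assumptions, not the evaluation code used in the study: tokenization is plain lowercased whitespace splitting with no stemming or stopword handling, the function names `rouge_n`, `rouge_l`, `ngrams`, and `lcs_length` are hypothetical, and the abstract does not say whether the reported values are recall, precision, or F1, so the sketch returns all three.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the multiset (Counter) of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _scores(overlap, ref_total, cand_total):
    """Turn an overlap count into (recall, precision, f1)."""
    recall = overlap / max(ref_total, 1)
    precision = overlap / max(cand_total, 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return recall, precision, f1


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N based on clipped n-gram overlap.

    Assumption: whitespace tokenization, lowercasing, no stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    return _scores(overlap, sum(ref.values()), sum(cand.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Simplified summary-level ROUGE-L based on a single LCS over all tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return _scores(lcs_length(cand, ref), len(ref), len(cand))


if __name__ == "__main__":
    candidate = "the cat sat on the mat"
    reference = "the cat is on the mat"
    for name, score in [
        ("ROUGE-1", rouge_n(candidate, reference, 1)),
        ("ROUGE-2", rouge_n(candidate, reference, 2)),
        ("ROUGE-L", rouge_l(candidate, reference)),
    ]:
        print(name, "recall=%.4f precision=%.4f f1=%.4f" % score)
```

Published ROUGE implementations typically add stemming, sentence splitting for ROUGE-L, and multi-reference handling, so scores from this sketch are not directly comparable with the values reported in the abstract.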
License
Copyright (c) 2026 Bambang Subeno

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.