Implementation of Topic modeling for Multilingual Document Summarization based on Bag of Itemset

Authors

  • Bambang Subeno, Universitas Telkom

DOI:

https://doi.org/10.29407/gj.v10i2.28393

Keywords:

Bag of itemset, LDA, Summarization, Topic Summarization, Multilingual document summarization

Abstract

As the number of electronic text documents grows, searching and processing information becomes increasingly complex, especially when the documents come from multiple sources and languages. Document summarization methods are therefore needed to help users retrieve important information more quickly. However, existing multilingual summarization methods, such as ELSA, are limited by dataset size and by the need to determine themes in advance. This study aims to improve the quality of multilingual document summarization by integrating the Bag of Itemset representation with the Latent Dirichlet Allocation Algorithm Modification (LDA-AM) approach. The proposed method first uses topic modeling to divide the multilingual documents into several topics. Then, for each topic, a sentence selection process generates a topic-based summary, and the topic-based summaries are combined into a general summary. Experiments using the ROUGE evaluation metric compared the proposed method with a baseline. The results show that the proposed method performs better than the baseline, with a ROUGE-1 of 0.2623, a ROUGE-2 of 0.1802, and a ROUGE-L of 0.1231. These results indicate that combining the Bag of Itemset representation with LDA-AM can improve summary quality when summarizing multilingual documents.
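
For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how ROUGE-N overlap between a generated summary and a reference summary can be computed. It is not the evaluation code used in the study: the function names `ngrams` and `rouge_n`, the lowercase whitespace tokenization, and the choice to return precision, recall, and F1 together are all assumptions made for illustration, and the abstract does not state which of these variants the reported scores correspond to.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the multiset (Counter) of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Compute ROUGE-N precision, recall, and F1.

    Assumption: both texts are lowercased and split on whitespace;
    a real multilingual setup would need language-aware tokenization.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipped matches).
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    reference = "the cat is on the mat"
    for n in (1, 2):
        p, r, f = rouge_n(generated, reference, n=n)
        print(f"ROUGE-{n}: precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

ROUGE-L, the third metric reported above, is based on the longest common subsequence rather than fixed-length n-grams and is not covered by this sketch.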

Published

2026-06-04

How to Cite

Implementation of Topic modeling for Multilingual Document Summarization based on Bag of Itemset. (2026). Generation Journal, 10(2), 98-105. https://doi.org/10.29407/gj.v10i2.28393
