Unveiling Insights: A Knowledge Discovery Approach to Comparing Topic Modeling Techniques in Digital Health Research

Siti Rohajawati; Puji Rahayu; Afny Tazkiyatul Misky; Khansha Nafi Rasyidatus Sholehah; Normala Rahim; R.R. Hutanti Setyodewi

doi:10.29407/intensif.v8i1.22058

Authors

Siti Rohajawati Universitas Bakrie http://orcid.org/0000-0002-6775-8997
Puji Rahayu Universitas Mercubuana http://orcid.org/0000-0002-6684-9774
Afny Tazkiyatul Misky Universitas Mercubuana http://orcid.org/0009-0005-4555-5901
Khansha Nafi Rasyidatus Sholehah Universitas Mercubuana http://orcid.org/0009-0008-1259-5837
Normala Rahim Universiti Sultan Zainal Abidin http://orcid.org/0000-0002-2094-7694
R.R. Hutanti Setyodewi DR. Gerard sp. z o.o http://orcid.org/0009-0008-3937-652X

Keywords:

Knowledge Discovery, Topic Modeling, Digital Health

Abstract

This paper introduces a knowledge discovery approach for comparing topic modeling techniques in digital health research. Knowledge discovery has been applied to massive data repositories (databases) and in various field studies, where its techniques are used to find patterns in the data, determine which models and parameters are suitable, and search for patterns of interest in a specific representational form. However, research on using Latent Dirichlet Allocation (LDA) and Pachinko Allocation Models (PAM) as generative probabilistic models in knowledge discovery is still limited. The study's findings position PAM as the superior technique, with the greatest number of distinct tokens per topic and the fastest processing time. PAM identifies 87 unique tokens across 10 topics, compared with only 27 for LDA Gensim. PAM also processes 404 documents in 0.000118970870 seconds, whereas LDA Gensim takes 0.368770837783 seconds. The study concludes that PAM is the optimal method for topic modeling in digital health research, offering high efficiency in analyzing extensive digital health text data.
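The two comparison criteria named in the abstract, distinct tokens across topics and processing time, can be illustrated with a minimal sketch. This is not the authors' code: the helper names `unique_token_count` and `timed` are hypothetical, the topic word lists are made-up toy data, and a single wall-clock measurement is assumed to be an acceptable (if noisy) proxy for processing time.

```python
import time
from typing import Callable, Iterable, Sequence, Tuple


def unique_token_count(topics: Iterable[Sequence[str]]) -> int:
    """Count distinct tokens across the top-word lists of all topics."""
    return len({token for topic in topics for token in topic})


def timed(fn: Callable[[], object]) -> Tuple[object, float]:
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Toy top-word lists (invented for illustration, not the paper's output).
# model_a repeats words across topics; model_b does not.
model_a = [["health", "digital", "patient"], ["health", "data", "patient"]]
model_b = [["telemedicine", "care", "app"], ["record", "privacy", "sensor"]]

count_a, seconds_a = timed(lambda: unique_token_count(model_a))
count_b, seconds_b = timed(lambda: unique_token_count(model_b))

print(f"model_a: {count_a} unique tokens")
print(f"model_b: {count_b} unique tokens")
```

In a real comparison, the lambda passed to `timed` would wrap the training or inference call of each topic model, and the word lists would come from each model's top-words output; repeating the measurement several times would give a more stable timing estimate.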

Author Biographies

Siti Rohajawati, Universitas Bakrie

Department of Information Systems, Universitas Bakrie

Puji Rahayu, Universitas Mercubuana

Department of Informatics Engineering, Universitas Mercubuana

Afny Tazkiyatul Misky, Universitas Mercubuana

Department of Informatics Engineering, Universitas Mercubuana

Khansha Nafi Rasyidatus Sholehah, Universitas Mercubuana

Department of Informatics Engineering, Universitas Mercubuana

Normala Rahim, Universiti Sultan Zainal Abidin

Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin, Malaysia

R.R. Hutanti Setyodewi, DR. Gerard sp. z o.o

DR. Gerard sp. z o.o., Industries, Poland
