Enhancing the Decision Tree Algorithm to Improve Performance Across Various Datasets
Abstract
Background: The Village Fund is a central government initiative to promote equitable regional development. However, it has also given rise to corruption. Many Indonesians share their opinions on the Village Fund on social media platforms such as X (formerly Twitter), and it receives extensive news coverage on portals such as detik.com.
Objective: This study aims to classify social media posts and news coverage about the Village Fund in order to better understand how it is discussed and reported.
Methods: The research improves the decision tree algorithm by combining it with other algorithms and techniques, such as XGBoost and SMOTE. High accuracy is vital if the public is to find machine learning classifications credible. The study uses two different datasets, which require different testing approaches. The news portal dataset undergoes a single test with seven labels, followed by enhancement with XGBoost. The X dataset undergoes two tests, with 1200 and 3078 entries respectively, using three labels.
Conclusion: The evaluation results indicate that the highest accuracy achieved on the news portal data was 82%, owing to a combination of decision tree algorithms with various parameters and the balancing effect of SMOTE. For the X dataset with 3078 entries, the highest accuracy reached 95%, attributed to the application of ensemble techniques, particularly boosting.
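The abstract names SMOTE as the balancing step but gives no implementation details. As a minimal sketch of the general idea only (not the authors' code), the following pure-Python example oversamples smaller classes by interpolating between a sample and one of its nearest same-class neighbours. It assumes the texts have already been converted to numeric feature vectors; the function names `smote` and `balance`, the parameters `k` and `seed`, the placeholder labels, and the toy data are all hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic vectors from the vectors of a single class."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two samples in the class")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest same-class neighbours, excluding the base.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_label = {}
    for row, label in zip(X, y):
        by_label.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_label.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_label.items():
        if len(rows) < target:
            extra = smote(rows, target - len(rows), k=k, seed=seed)
            X_out.extend(extra)
            y_out.extend([label] * len(extra))
    return X_out, y_out


# Toy data (hypothetical): four "majority" points and two "minority" points.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y = ["majority"] * 4 + ["minority"] * 2
X_bal, y_bal = balance(X, y)
```

After balancing, both classes have the same number of rows, and every synthetic point lies on the segment between two original points of its class. In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.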
Copyright (c) 2024 Pandu Pratama Putra, M Khairul Anam, Sarjon Defit, Arda Yunianta
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.