Student Dropout Prediction Using Random Forest and XGBoost Method
DOI:
https://doi.org/10.29407/intensif.v9i1.21191
Keywords:
Student Dropout, Prediction, Random Forest, XGBoost
Abstract
Background: The increasing dropout rate in Indonesia poses significant challenges to the education system, particularly as students advance to higher levels of education. Accurately predicting student attrition can help institutions intervene in time to improve retention.
Objective: This study evaluates the effectiveness of the Random Forest and XGBoost algorithms in predicting student attrition from demographic, socioeconomic, and academic performance factors.
Methods: A quantitative study was conducted on a dataset of 4,424 instances with 34 attributes, each instance labeled as Dropout, Graduate, or Enrolled. Random Forest and XGBoost were compared on accuracy, specificity, and sensitivity.
Results: Random Forest achieved the higher accuracy at 80.56%, with a specificity of 76.41% and a sensitivity of 72.42%, outperforming XGBoost. Although XGBoost was slightly less accurate, it remained a competitive approach for predicting student attrition.
Conclusion: The findings highlight Random Forest's robustness in handling extensive datasets with diverse attributes, making it a reliable tool for identifying at-risk students. This study underscores the potential of machine learning in addressing educational challenges. Future research should explore advanced ensemble techniques, such as the Ensemble Voting Classifier, or deep learning models to further improve prediction accuracy and scalability.
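The three evaluation measures named in the abstract can be derived from a confusion matrix over the Dropout, Enrolled, and Graduate classes. The sketch below is only an illustration of that arithmetic: the abstract does not state how per-class sensitivity and specificity were aggregated, so the one-vs-rest computation and the macro average are assumptions, and the matrix `toy_cm` is invented for demonstration and does not come from the study.

```python
from typing import Dict, List

# Class order used for the rows (true label) and columns (predicted label).
LABELS = ["Dropout", "Enrolled", "Graduate"]


def accuracy(cm: List[List[int]]) -> float:
    """Share of all instances that lie on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_metrics(cm: List[List[int]]) -> Dict[str, Dict[str, float]]:
    """One-vs-rest sensitivity and specificity for each class."""
    total = sum(sum(row) for row in cm)
    metrics: Dict[str, Dict[str, float]] = {}
    for k, label in enumerate(LABELS):
        tp = cm[k][k]                          # class k predicted as k
        fn = sum(cm[k]) - tp                   # class k predicted as another class
        fp = sum(row[k] for row in cm) - tp    # another class predicted as k
        tn = total - tp - fn - fp              # everything else
        metrics[label] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return metrics


def macro_average(metrics: Dict[str, Dict[str, float]], name: str) -> float:
    """Unweighted mean of one metric across classes (an assumed aggregation)."""
    return sum(m[name] for m in metrics.values()) / len(metrics)


# Invented toy matrix, NOT results from the paper.
toy_cm = [
    [8, 1, 1],  # true Dropout
    [2, 5, 3],  # true Enrolled
    [0, 1, 9],  # true Graduate
]

toy_metrics = per_class_metrics(toy_cm)
print(f"accuracy:          {accuracy(toy_cm):.4f}")
print(f"macro sensitivity: {macro_average(toy_metrics, 'sensitivity'):.4f}")
print(f"macro specificity: {macro_average(toy_metrics, 'specificity'):.4f}")
```

Applying the same functions to the confusion matrix of each fitted model would give comparable figures for the two classifiers, provided both are evaluated on the same held-out split.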
License
Copyright (c) 2025 Lalu Ganda Rady Putra, Didik Dwi Prasetya, Mayadi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
1. Copyright on any article is retained by the author(s).
2. The author grants the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work’s authorship and initial publication in this journal.
3. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
4. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
5. The article and any associated published material are distributed under the Creative Commons Attribution-ShareAlike 4.0 International License.