Student Dropout Prediction Using Random Forest and XGBoost Method

Authors

Lalu Ganda Rady Putra, Didik Dwi Prasetya, Mayadi

DOI:

https://doi.org/10.29407/intensif.v9i1.21191

Keywords:

Student Dropout, Prediction, Random Forest, XGBoost

Abstract

Background: The increasing dropout rate in Indonesia poses significant challenges to the education system, particularly as students advance through higher education levels. Predicting student attrition accurately can help institutions implement timely interventions to improve retention. Objective: This study evaluates the effectiveness of the Random Forest and XGBoost algorithms in predicting student attrition from demographic, socioeconomic, and academic performance factors. Methods: A quantitative study was conducted on a dataset of 4,424 student records with 34 attributes, each record labeled Dropout, Graduate, or Enrolled. The performance of Random Forest and XGBoost was compared in terms of accuracy, specificity, and sensitivity. Results: Random Forest achieved the highest accuracy at 80.56%, with a specificity of 76.41% and a sensitivity of 72.42%, outperforming XGBoost. Although XGBoost was slightly less accurate, it remained a competitive approach for student attrition prediction. Conclusion: The findings highlight Random Forest's robustness in handling large datasets with diverse attributes, making it a reliable tool for identifying at-risk students, and underscore the potential of machine learning in addressing educational challenges. Future research should explore advanced ensemble techniques, such as an ensemble voting classifier, or deep learning models to further improve prediction accuracy and scalability.
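
For readers who want to see what the comparison described in the abstract could look like in practice, the sketch below outlines one plausible workflow; it is not the authors' code. It assumes the student dataset is available as a local CSV (the file name students.csv, the Target column name, the 80/20 split, and the hyperparameters are illustrative assumptions), trains scikit-learn's RandomForestClassifier and xgboost's XGBClassifier, and derives macro-averaged sensitivity and specificity from the multiclass confusion matrix over the three classes (Dropout, Graduate, Enrolled).

# Minimal sketch (not the authors' implementation): compare Random Forest and
# XGBoost on a three-class student outcome dataset.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

# Hypothetical CSV export with a categorical "Target" column
# (Dropout / Graduate / Enrolled) and numeric feature columns.
df = pd.read_csv("students.csv")
X = df.drop(columns=["Target"])
y = LabelEncoder().fit_transform(df["Target"])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "XGBoost": XGBClassifier(eval_metric="mlogloss", random_state=42),
}

def macro_sensitivity_specificity(cm):
    # Per-class sensitivity TP/(TP+FN) and specificity TN/(TN+FP),
    # averaged over the classes of the multiclass confusion matrix.
    sens, spec = [], []
    for k in range(cm.shape[0]):
        tp = cm[k, k]
        fn = cm[k, :].sum() - tp
        fp = cm[:, k].sum() - tp
        tn = cm.sum() - tp - fn - fp
        sens.append(tp / (tp + fn))
        spec.append(tn / (tn + fp))
    return np.mean(sens), np.mean(spec)

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    sens, spec = macro_sensitivity_specificity(cm)
    print(f"{name}: accuracy={accuracy_score(y_test, y_pred):.4f}, "
          f"sensitivity={sens:.4f}, specificity={spec:.4f}")

Numbers obtained from this sketch will differ from those reported in the abstract (e.g., 80.56% accuracy for Random Forest) depending on preprocessing, the train/test split, and hyperparameter choices that the abstract does not detail.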


Author Biographies

  • Lalu Ganda Rady Putra, Universitas Negeri Malang

    Department of Electrical Engineering and Informatics, Universitas Negeri Malang

  • Didik Dwi Prasetya, Universitas Negeri Malang

    Department of Electrical Engineering and Informatics, Universitas Negeri Malang

  • Mayadi, Universiti Teknologi Mara

    College of Computing, Informatics and Media, Universiti Teknologi Mara

Published

2025-02-28

How to Cite

[1] L. G. R. Putra, D. D. Prasetya, and Mayadi, “Student Dropout Prediction Using Random Forest and XGBoost Method”, INTENSIF: J. Ilm. Penelit. dan Penerap. Tek. Sist. Inf., vol. 9, no. 1, pp. 147–157, Feb. 2025, doi: 10.29407/intensif.v9i1.21191.