Evaluation of the random forest model as a tool for predicting credit risk in university students
Abstract
Credit risk among university students is a growing problem in Peru, where financial inclusion is low. Many young people resort to informal loans or have difficulty accessing formal credit. Machine learning algorithms can be applied to credit scoring for such borrowers, and Random Forest (RF) is a popular choice because of its predictive capacity and its ability to handle complex variables. This study set out to identify the factors relevant to credit risk in the socioeconomic, academic, and financial behavior of Peruvian university students, and to compare the predictions of the RF model with those of a traditional model. The study design was quantitative, basic, non-experimental, and cross-sectional. Data were collected through questionnaires and database queries. Pareto diagrams, Ishikawa diagrams, and VOSviewer were used as analysis tools to support the theoretical framework. The data were preprocessed, and the Random Forest model was trained with cross-validation, using accuracy, recall, and F1 score as metrics. The model achieved 78% accuracy in credit risk classification. The key variables were family income, payment history, credit card usage, and academic performance. These results indicate that Random Forest is a robust model for predicting credit risk compared with traditional techniques. It can be used to improve financial decision-making, reduce delinquency, and support fairer and safer financing policies for university students.
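As a minimal illustration of the three evaluation metrics named in the abstract, the sketch below computes accuracy, recall, and F1 score from true and predicted labels. It is not the authors' code: the function names, the label encoding (1 = high credit risk, 0 = low credit risk), and the toy label lists are all hypothetical and are not data from the study.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            counts["tp"] += 1
        elif pred == positive:
            counts["fp"] += 1
        elif truth == positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, recall, and F1 score for the positive class."""
    c = confusion_counts(y_true, y_pred, positive)
    tp, fp, fn, tn = c["tp"], c["fp"], c["fn"], c["tn"]
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


# Hypothetical labels (assumption): 1 = high credit risk, 0 = low credit risk.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In a cross-validated setting, these metrics would typically be computed on each held-out fold and then averaged. Recall matters particularly in credit risk work because it measures the share of truly high-risk borrowers that the model flags.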

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The content of the articles is the sole responsibility of the authors.
Use must comply with the following terms of the CC BY-NC-ND license:
- Attribution: you must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial: you may not use the material for commercial purposes.
- NoDerivatives: if you remix, transform, or build upon the material, you may not distribute the modified material.
- No additional restrictions: you may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.