Reducing Lending Risk: SVM Model Development with SMOTE for Unbalanced Credit Data
DOI:
https://doi.org/10.21108/ijoict.v9i2.860
Keywords:
Lending, Machine Learning, Support Vector Machine, SMOTE
Abstract
Lending is an important way for banks to manage available funds. It is also a high-risk activity, because not all customers who borrow funds can meet the obligations of their loan agreements. A method that predicts customers' creditworthiness is therefore needed to minimize this risk. This research uses a machine learning method, the Support Vector Machine (SVM), to predict creditworthiness. The method is applied to historical credit data from the bank BPR NBP 16 Rantau Prapat, North Sumatra, and compared before and after applying the Synthetic Minority Oversampling Technique (SMOTE), with the best parameters found by grid search. Based on the Area Under the Receiver Operating Characteristic Curve (AUC-ROC), SVM with SMOTE performs better, at 96%, than SVM without SMOTE, at 56%.
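To make the two less familiar ingredients of the abstract concrete, the following is a minimal, standard-library-only sketch of the idea behind SMOTE (interpolating between a minority sample and one of its nearest minority neighbours) and of the AUC-ROC score (the probability that a randomly chosen positive is ranked above a randomly chosen negative). It is not the authors' implementation, which the abstract does not describe. The function names `smote` and `auc_roc`, the parameters `k` and `seed`, and the toy data are assumptions made for illustration, and the SVM and grid search steps are not shown.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def auc_roc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy minority class (hypothetical data, two features per customer)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6, k=2)

# Toy scores: one of the four positive/negative pairs is ranked wrongly
score = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In practice, oversampling of this kind is applied to the training split only, so that synthetic points do not leak into the data used to compute the AUC-ROC.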
License
A manuscript submitted to IJoICT must be the original work of the author(s), must contain no plagiarism, and must not have been published or be under consideration for publication in another journal. The author(s) agree to assign all copyright of the published article to IJoICT. Requests for future re-use or re-publication of major or substantial parts of the article must be directed to the editors of IJoICT.