Evaluate of Random Undersampling Method and Majority Weighted Minority Oversampling Technique in Resolve Imbalanced Dataset

Authors

  • Meida Cahyo Untoro, Institut Teknologi Sumatera
  • Muhammad Asyroful Nur Maulana Yusuf, Institut Teknologi Sumatera

DOI:

https://doi.org/10.25299/itjrd.2023.12412

Keywords:

Imbalanced, undersampling, oversampling, classification

Abstract

Classification builds a model that makes predictions from existing data. When the class labels are imbalanced, the result is misclassification and modeling errors that produce a poor classification model, so the data must be balanced before a reliable classifier can be trained. This study handles the imbalance with two methods: Random Undersampling and the Majority Weighted Minority Oversampling Technique (MWMOTE). The goal is to evaluate how well each method resolves imbalanced datasets and to measure the resulting model performance and accuracy. The data are open-source Kaggle datasets with different imbalance ratios: Diabetes, Bank Turnover, Stroke, and Credit Card. Models were built with a decision tree algorithm and evaluated with the confusion matrix, using precision, recall, F-measure, and accuracy. With Random Undersampling, the models reached a precision of 76.28%, recall of 76.74%, F-measure of 76.48%, and accuracy of 76.21%. With MWMOTE, they reached a precision of 86.04%, recall of 87.30%, F-measure of 86.66%, and accuracy of 86.61%. MWMOTE therefore outperforms Random Undersampling, since its average scores on all four confusion-matrix measures are higher.
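Random Undersampling balances the classes by discarding majority-class rows at random until the classes are the same size. The following is a minimal pure-Python sketch, assuming a 1:1 target ratio (the abstract does not state the ratio the authors used); the function name `random_undersample` is hypothetical. Libraries such as imbalanced-learn provide the same operation as `RandomUnderSampler`.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=None):
    """Randomly discard rows of the larger classes until every class has as
    many rows as the smallest class (1:1 for a binary problem)."""
    rng = random.Random(seed)
    target = min(Counter(y).values())  # size of the minority class

    # Group row indices by class label.
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)

    keep = []
    for idx in by_class.values():
        # Sample without replacement; the minority class keeps every row.
        keep.extend(rng.sample(idx, target))
    keep.sort()  # restore the original row order

    return [X[i] for i in keep], [y[i] for i in keep]


# Usage: 8 majority rows (label 0) and 2 minority rows (label 1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_undersample(X, y, seed=42)
print(sorted(Counter(y_bal).items()))  # [(0, 2), (1, 2)]
```

The trade-off is visible in the usage: six of the eight majority rows are thrown away, which is the information loss that oversampling methods try to avoid.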
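MWMOTE (Barua et al., 2014) oversamples the minority class by generating synthetic points near the class boundary. As it is commonly described, it removes noisy minority points, finds the borderline majority points and the informative minority points near them, weights each informative minority point by its closeness to and density around those majority points, clusters the minority class with average-linkage clustering, and interpolates between a weighted-sampled minority point and a random member of its cluster. The sketch below is a simplified, brute-force illustration of those steps, not the authors' implementation. The function name `mwmote`, the step numbering in the comments, and the default parameters (k1=5, k2=3, k3=|S_min|/2, C_p=3, C_f(th)=5, CMAX=2) are assumptions; the abstract does not report the settings used.

```python
import math
import random


def mwmote(X_min, X_maj, n_new, k1=5, k2=3, k3=None,
           cp=3.0, cf_th=5.0, cmax=2.0, seed=None):
    """Simplified MWMOTE sketch: return n_new synthetic minority points.

    X_min / X_maj are lists of equal-length numeric feature vectors.
    Brute-force neighbour search and naive clustering: small data only.
    """
    rng = random.Random(seed)
    dim = len(X_min[0])
    k3 = k3 or max(1, len(X_min) // 2)
    everyone = X_min + X_maj  # indices below len(X_min) are minority points

    def nearest(p, pool, k, skip=None):
        # Indices of the k points in pool closest to p, optionally skipping one index.
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(pool) if j != skip)
        return [j for _, j in ranked[:k]]

    # Step 1: drop minority points whose k1 neighbours are all majority (noise).
    s_minf = [i for i, p in enumerate(X_min)
              if any(j < len(X_min) for j in nearest(p, everyone, k1, skip=i))]
    # Step 2: borderline majority set = k2 nearest majority points of each kept point.
    s_bmaj = sorted({j for i in s_minf for j in nearest(X_min[i], X_maj, k2)})
    # Step 3: informative minority set = k3 nearest minority points of each borderline point.
    n_min = {y: nearest(X_maj[y], X_min, k3) for y in s_bmaj}
    s_imin = sorted({i for nbrs in n_min.values() for i in nbrs})
    if not s_imin:
        return []

    def closeness(y, x):
        # C_f(y, x): capped inverse of the dimension-normalised distance, 0 outside N_min(y).
        if x not in n_min[y]:
            return 0.0
        d = math.dist(X_maj[y], X_min[x]) / dim
        return min(1.0 / d if d > 0 else cf_th, cf_th) / cf_th * cmax

    # Step 4: selection weight S_w(x) = sum over borderline y of C_f(y, x) * D_f(y, x).
    weight = {x: 0.0 for x in s_imin}
    for y in s_bmaj:
        cf = {x: closeness(y, x) for x in s_imin}
        total = sum(cf.values())  # > 0, because N_min(y) is never empty
        for x in s_imin:
            weight[x] += cf[x] * (cf[x] / total)  # D_f(y, x) = C_f(y, x) / total

    # Step 5: average-linkage clustering of all minority points, threshold T_h = d_avg * C_p.
    if len(s_minf) > 1:
        d_avg = sum(min(math.dist(X_min[i], X_min[j]) for j in s_minf if j != i)
                    for i in s_minf) / len(s_minf)
    else:
        d_avg = 0.0

    def avg_link(c1, c2):
        # Mean pairwise distance between two clusters.
        return sum(math.dist(X_min[i], X_min[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(X_min))]
    while len(clusters) > 1:
        d, a, b = min((avg_link(clusters[p], clusters[q]), p, q)
                      for p in range(len(clusters)) for q in range(p + 1, len(clusters)))
        if d > d_avg * cp:
            break  # closest pair is farther apart than T_h: stop merging
        clusters[a].extend(clusters.pop(b))  # b > a, so index a is unaffected
    cluster_of = {i: c for c in clusters for i in c}

    # Step 6: draw x with probability S_w(x) / sum(S_w) (random.choices normalises),
    # draw a partner from x's cluster, and interpolate between them.
    sel_weights = [weight[x] for x in s_imin]
    synthetic = []
    for _ in range(n_new):
        x = rng.choices(s_imin, weights=sel_weights)[0]
        partner = rng.choice(cluster_of[x])
        alpha = rng.random()
        synthetic.append([u + alpha * (v - u) for u, v in zip(X_min[x], X_min[partner])])
    return synthetic


# Usage: a minority blob near the origin and a 5x5 majority grid further out.
minority = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [2, 2]]
majority = [[a, b] for a in range(3, 8) for b in range(3, 8)]
new_points = mwmote(minority, majority, n_new=5, seed=0)
print(len(new_points))  # 5
```

Because each synthetic point is interpolated between two minority points of the same cluster, it never lands in a region spanned only by majority points, which is the main difference from plain SMOTE-style interpolation between arbitrary neighbours.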
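The four evaluation measures named in the abstract come from confusion-matrix counts: precision = TP / (TP + FP), recall = TP / (TP + FN), F-measure = 2 · precision · recall / (precision + recall), and accuracy = (TP + TN) / total. A minimal sketch follows, assuming a binary task with the minority label treated as the positive class. The abstract does not say whether its figures are for the minority class or averaged over classes, so the function name `confusion_metrics` and the `y_pred` values (standing in for decision-tree predictions) are illustrative only.

```python
def confusion_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F-measure and accuracy from binary confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Return 0.0 instead of dividing by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "precision": precision,
            "recall": recall, "f_measure": f_measure, "accuracy": accuracy}


# Usage: label 1 is the minority (positive) class.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
m = confusion_metrics(y_true, y_pred)
print(m["tp"], m["fp"], m["fn"], m["tn"])  # 2 1 1 4
print(m["accuracy"])  # 0.75
```

On imbalanced data, accuracy alone can look high while recall on the minority class is poor, which is why the abstract reports all four measures side by side.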

Published

2023-08-18

How to Cite

Untoro, M. C., & Yusuf, M. A. N. M. (2023). Evaluate of Random Undersampling Method and Majority Weighted Minority Oversampling Technique in Resolve Imbalanced Dataset. IT Journal Research and Development, 8(1), 1–13. https://doi.org/10.25299/itjrd.2023.12412
