Predicting Heart Disease Using Machine Learning: An Evaluation of Logistic Regression, Random Forest, SVM, and KNN Models on the UCI Heart Disease Dataset
DOI: https://doi.org/10.25299/itjrd.2025.17941
Keywords:
Heart Disease Prediction, Machine Learning, Random Forest, SVM, UCI Dataset
Abstract
This study evaluates three machine learning models for predicting heart disease on the "Heart Disease UCI" dataset from Kaggle: Random Forest, Support Vector Machine (SVM), and Logistic Regression. The models were assessed on accuracy, precision, recall, and F1-score, both without feature selection and with two feature selection techniques, Chi-Square and Mutual Information.
Without feature selection, Random Forest performed best, with an accuracy of 89.7%, followed by SVM (87.0%) and Logistic Regression (84.2%). With Mutual Information feature selection, the accuracies were 85.3% for Random Forest, 87.0% for SVM, and 82.6% for Logistic Regression. With Chi-Square feature selection, Random Forest and Logistic Regression both reached 83.2%, while SVM reached 82.6%.
The results indicate that Random Forest performs well across all three scenarios (best without feature selection, tied for best with Chi-Square, and second only to SVM with Mutual Information), making it a robust choice for heart disease prediction. Feature selection did not improve accuracy for any model: SVM was unchanged with Mutual Information, and every other model and selector combination scored below its no-selection baseline. This suggests that the original features in the dataset are already highly relevant. These findings highlight the potential of machine learning, especially Random Forest, to support the clinical diagnosis of heart disease. Further research is needed to validate these models on larger, more diverse datasets and to explore more advanced feature selection techniques.
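The evaluation described in the abstract combines two ingredients: filter-style feature scoring (Chi-Square or Mutual Information) and four classification metrics. A minimal plain-Python sketch of both follows; it is not the authors' code. It assumes binary labels with 1 meaning heart disease is present, non-negative feature values for the Chi-Square score (observed versus expected per-class feature sums, similar to the formulation used by scikit-learn's `chi2`), and discrete feature values for the plug-in Mutual Information estimate. The helper names, the toy matrix `X`, the labels `y`, and the predictions are hypothetical.

```python
import math
from collections import Counter

# Hypothetical helpers: a sketch of the evaluation idea, not the study's implementation.


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task (positive = disease present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def chi2_score(column, y):
    """Chi-square score of one non-negative feature against the class labels.

    Observed = sum of the feature within each class; expected = that class's
    share of the samples times the feature's overall sum (assumption: values >= 0).
    """
    n = len(y)
    total = sum(column)
    score = 0.0
    for cls, count in Counter(y).items():
        observed = sum(v for v, label in zip(column, y) if label == cls)
        expected = total * count / n
        if expected > 0:
            score += (observed - expected) ** 2 / expected
    return score


def mutual_information(column, y):
    """Plug-in mutual information (in nats) between a discrete feature and the labels."""
    n = len(y)
    joint = Counter(zip(column, y))
    px = Counter(column)
    py = Counter(y)
    mi = 0.0
    for (xv, yv), c in joint.items():
        pxy = c / n
        mi += pxy * math.log(pxy / ((px[xv] / n) * (py[yv] / n)))
    return mi


def top_k_features(X, y, k, scorer):
    """Score every column of X (a list of rows) and return the indices of the k best."""
    columns = [list(col) for col in zip(*X)]
    scores = [scorer(col, y) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: 3 binary features; feature 0 happens to equal the label.
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1]]
y = [1, 1, 0, 0, 1, 0]

print(top_k_features(X, y, 1, chi2_score))          # → [0]
print(top_k_features(X, y, 1, mutual_information))  # → [0]
# Accuracy is 4/6; precision, recall and F1 are each 2/3 for these predictions.
print(classification_metrics(y, [1, 0, 0, 0, 1, 1]))
```

In a full pipeline, `y_pred` would come from a trained Random Forest, SVM, or Logistic Regression model, and only the columns returned by `top_k_features` would be passed to it; the training step and the choice of `k` are omitted here because the abstract does not specify them.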
License
Copyright (c) 2025 Nurliana Nasution, Mhd Arief Hasan, Feldiansyah Bakri Nasution

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
This is an open-access journal: all content is freely available without charge to users and their institutions. Copyright in the text of individual articles (including research articles, opinion articles, and abstracts) belongs to their respective authors, subject to a Creative Commons CC BY-SA licence granted to all others. ITJRD allows the author(s) to hold the copyright and retain publishing rights without restrictions.