Predicting Heart Disease Using Machine Learning: An Evaluation of Logistic Regression, Random Forest, SVM, and KNN Models on the UCI Heart Disease Dataset

Nurliana Nasution; Mhd Arief Hasan; Feldiansyah Bakri Nasution

doi:10.25299/itjrd.2025.17941

Authors

Nurliana Nasution, Informatic Engineering Study Program, Universitas Lancang Kuning
Mhd Arief Hasan, Informatic Engineering Study Program, Universitas Lancang Kuning
Feldiansyah Bakri Nasution, Informatic Engineering Study Program, Universitas Lancang Kuning

Keywords:

Heart Disease Prediction, Machine Learning, Random Forest, SVM, UCI Dataset

Abstract

This study evaluates the performance of three machine learning models, namely Random Forest, Support Vector Machine (SVM), and Logistic Regression, in predicting heart disease using the "Heart Disease UCI" dataset from Kaggle. The models were assessed on accuracy, precision, recall, and F1-score, both without feature selection and with two feature selection techniques, Chi-Square and Mutual Information.

Without feature selection, Random Forest achieved the highest performance, with an accuracy of 89.7%, followed by SVM with 87.0% and Logistic Regression with 84.2%. With Mutual Information feature selection, Random Forest achieved an accuracy of 85.3%, SVM 87.0%, and Logistic Regression 82.6%. With Chi-Square feature selection, Random Forest and Logistic Regression both reached an accuracy of 83.2%, while SVM achieved 82.6%.

The results indicate that Random Forest performs well across the tested scenarios: it achieved the highest accuracy without feature selection and remained among the top models with feature selection, making it a robust choice for heart disease prediction. Feature selection did not enhance model performance, since no model reached a higher accuracy with Chi-Square or Mutual Information than without feature selection; this suggests that the original features in the dataset are already highly relevant. These findings highlight the potential of machine learning, especially Random Forest, to aid the clinical diagnosis of heart disease. Further research is needed to validate these models on larger, more diverse datasets and to explore more advanced feature selection techniques.
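The abstract compares models using two filter-style feature selection criteria (Chi-Square and Mutual Information) and four evaluation metrics. The following is a minimal, dependency-free Python sketch of how those quantities can be computed; it is an illustration, not the authors' code. It assumes discrete (categorical or pre-binned) features and a binary label with 1 as the positive (disease) class. The function names `classification_metrics`, `chi_square_score`, `mutual_information`, and `select_top_k`, the column names, and the toy data are all hypothetical. Library scorers such as scikit-learn's `chi2` and `mutual_info_classif` may compute these quantities differently, so their scores are not directly interchangeable with this sketch.

```python
import math
from collections import Counter


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task (positive class = `positive`)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def chi_square_score(feature, target):
    """Pearson chi-square statistic of a feature-by-class contingency table."""
    n = len(target)
    observed = Counter(zip(feature, target))  # O[x, y]
    row = Counter(feature)                    # counts per feature value
    col = Counter(target)                     # counts per class
    score = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n    # E[x, y] under independence
            score += (observed[(x, y)] - expected) ** 2 / expected
    return score


def mutual_information(feature, target):
    """Empirical mutual information (in nats) between a discrete feature and the label."""
    n = len(target)
    joint = Counter(zip(feature, target))
    px = Counter(feature)
    py = Counter(target)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def select_top_k(columns, target, k, score_fn):
    """Score every feature with score_fn and keep the names of the k highest-scoring ones."""
    scores = {name: score_fn(values, target) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data, invented for illustration only (NOT taken from the UCI dataset).
y = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "feature_a": [1, 1, 1, 0, 0, 0, 0, 1],  # mostly agrees with the label
    "feature_b": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
}

print(select_top_k(columns, y, 1, chi_square_score))    # ['feature_a']
print(select_top_k(columns, y, 1, mutual_information))  # ['feature_a']
# Hypothetical predictions, scored with the four metrics used in the abstract.
print(classification_metrics(y, [1, 1, 1, 0, 0, 0, 0, 1]))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

In a full experiment, the selected feature subset would then be used to train each classifier (Random Forest, SVM, Logistic Regression), and the metrics would be computed on held-out predictions rather than on the training data.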
