Improving Parkinson’s Diagnosis: A Stacked Ensemble Learning with Vocal Biomarkers and Data Balancing
DOI:
https://doi.org/10.25299/itjrd.2025.22310
Keywords:
Parkinson’s Disease, Ensemble Learning, Vocal Biomarkers, Stacking Classifier, Machine Learning
Abstract
Parkinson’s disease (PD) presents diagnostic challenges due to subtle early symptoms and overlap with other movement disorders. This study proposes a stacked ensemble learning approach for early PD detection using vocal biomarkers. A Kaggle dataset with 195 voice recordings from 31 individuals (23 with PD) was used to train a model combining CatBoostClassifier and RandomForestClassifier as base learners, with Logistic Regression as the meta-learner. Class imbalance was addressed using RandomOverSampler, and 20-fold stratified cross-validation ensured robust performance evaluation. Key vocal features such as jitter, shimmer, pitch period entropy (PPE), spread1, and spread2 were extracted to distinguish PD patients from healthy controls. The model achieved 100% classification accuracy, a perfect ROC AUC of 1.00, and a low average Brier Score of 0.0071, reflecting excellent probability calibration. SHAP analysis identified spread2, spread1, and PPE as the most influential features, reinforcing pitch instability as a key PD biomarker. The classifier produced no false positives and few false negatives, indicating high reliability. To evaluate generalizability, the model was tested on the Parkinson’s Disease Smartwatch (PADS) dataset, which includes 469 participants. It maintained strong performance, supporting its potential as a non-invasive, voice-based screening tool for early PD diagnosis, particularly in telemedicine.
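The evaluation ideas named in the abstract (stratified cross-validation, random oversampling of the minority class, and the Brier Score) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' code: the function names `stratified_folds`, `random_oversample`, and `brier_score` are hypothetical, the labels are toy data rather than the study's recordings, and applying oversampling only to the training portion of each fold is an assumption, since the abstract does not say where in the pipeline RandomOverSampler was applied.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign every sample index to one of k folds with similar class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def random_oversample(indices, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy labels: 1 = PD, 0 = healthy control (not the study's data).
labels = [1] * 15 + [0] * 5
folds = stratified_folds(labels, k=5)

splits = []  # (balanced training indices, held-out test indices) per fold
for test_idx in folds:
    train_idx = [i for fold in folds if fold is not test_idx for i in fold]
    # Assumption: balance only the training portion, so that duplicated
    # samples never appear in the held-out fold.
    splits.append((random_oversample(train_idx, labels), test_idx))
    # A classifier would be fitted on the balanced indices and scored on test_idx here.

# Brier Score of three illustrative probability forecasts.
example_brier = brier_score([1, 0, 1], [0.9, 0.2, 0.6])
```

A lower Brier Score indicates predicted probabilities closer to the observed outcomes; a score of 0 corresponds to predicting exactly 1.0 for every positive case and 0.0 for every negative case.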
License
Copyright (c) 2025 Oluwasegun Abiodun Abioye, Abraham Evwiekpaefe, Philip Odion, Olalekan Joel Awujoola

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
This is an open access journal which means that all content is freely available without charge to the user or his/her institution. The copyright in the text of individual articles (including research articles, opinion articles, and abstracts) is the property of their respective authors, subject to a Creative Commons CC-BY-SA licence granted to all others. ITJRD allows the author(s) to hold the copyright without restrictions and allows the author to retain publishing rights without restrictions.