Analyzing classification and feature selection strategies for diabetes prediction across diverse diabetes datasets

Jayakumar Kaliappan; I. J. Saravana Kumar; S. Sundaravelan; T. Anesh; R. R. Rithik; Yashbir Singh; Diana V. Vera-Garcia; Yassine Himeur; Wathiq Mansoor; Shadi Atalla; Kathiravan Srinivasan

Journal Article, Open Access

Frontiers in Artificial Intelligence (2024) 7

DOI: 10.3389/frai.2024.1421751

Abstract

Introduction: In the evolving landscape of healthcare and medicine, combining extensive medical datasets with the capabilities of machine learning (ML) models presents a significant opportunity to transform diagnostics, treatment, and patient care.

Methods: This paper examines data-driven healthcare, focusing on two goals: identifying the most effective ML models for diabetes prediction and uncovering the features most critical to that prediction. Prediction performance is analyzed across numerous medical datasets using a variety of ML models, including Random Forest (RF), XGBoost (XGB), Linear Regression (LR), Gradient Boosting (GB), and Support Vector Machine (SVM). Feature importance is studied using Filter-based techniques, Wrapper-based techniques, and Explainable Artificial Intelligence (Explainable AI). Explainable AI techniques, specifically Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), are used to make the models' decision-making process transparent, thereby strengthening trust in AI-driven decisions.

Results: Features selected by RF in the Wrapper-based techniques and by the Chi-square test in the Filter-based techniques are shown to enhance prediction performance, with precision and recall values reaching up to 0.9 in predicting diabetes.

Discussion: Both approaches assign considerable importance to features such as age, family history of diabetes, polyuria, polydipsia, and high blood pressure, all of which are strongly associated with diabetes. In this age of data-driven healthcare, the research presented here aims to substantially improve healthcare outcomes.
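To illustrate what a Chi-square filter-based ranking does, here is a minimal, standard-library-only Python sketch. It is not the authors' code and does not use their datasets. It assumes binary (0/1) features and a binary diabetes label and scores each feature with the textbook Pearson chi-square statistic on a 2x2 contingency table, without continuity correction. It then reports precision and recall for a trivial one-feature predictor. The function names `chi_square_score`, `rank_features` and `precision_recall`, as well as the toy columns, are hypothetical and were chosen only for illustration.

```python
from collections import Counter


def chi_square_score(feature, label):
    """Pearson chi-square statistic for a binary feature vs. a binary label.

    Builds the 2x2 contingency table of observed counts and sums
    (observed - expected)^2 / expected over its four cells
    (no Yates continuity correction).
    """
    n = len(label)
    observed = Counter(zip(feature, label))
    score = 0.0
    for f in (0, 1):
        row_total = sum(observed[(f, y)] for y in (0, 1))
        for y in (0, 1):
            col_total = sum(observed[(g, y)] for g in (0, 1))
            expected = row_total * col_total / n
            if expected > 0:
                score += (observed[(f, y)] - expected) ** 2 / expected
    return score


def rank_features(columns, label):
    """Return feature names sorted by descending chi-square score, plus the scores."""
    scores = {name: chi_square_score(values, label) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN) for the positive class 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data (NOT from the paper): 1 = symptom present / diabetic.
label = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "polyuria":    [1, 1, 1, 0, 0, 0, 0, 0],
    "polydipsia":  [1, 1, 0, 0, 0, 0, 0, 1],
    "random_flag": [1, 0, 1, 0, 1, 0, 1, 0],
}

ranking, scores = rank_features(features, label)
print(ranking)  # ['polyuria', 'polydipsia', 'random_flag']

# Use the top-ranked feature directly as a (very naive) classifier.
print(precision_recall(label, features[ranking[0]]))  # (1.0, 0.75)
```

In this toy example, `random_flag` is independent of the label, so its score is 0. The sketch only ranks features. In a pipeline like the one described above, the top-ranked features would then be passed to a classifier such as RF or XGBoost rather than used directly as predictions. Library implementations of a chi-square selector may compute a related but not identical statistic, so their scores can differ from these.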

Citation (APA)

Kaliappan, J., Saravana Kumar, I. J., Sundaravelan, S., Anesh, T., Rithik, R. R., Singh, Y., … Srinivasan, K. (2024). Analyzing classification and feature selection strategies for diabetes prediction across diverse diabetes datasets. Frontiers in Artificial Intelligence, 7. https://doi.org/10.3389/frai.2024.1421751