Abstract Macrovascular complications are leading causes of morbidity and mortality in patients with type 2 diabetes mellitus (T2DM), yet early diagnosis of cardiovascular disease (CVD) in this population remains clinically challenging. This study aimed to develop a machine learning model that accurately predicts diabetic macroangiopathy in Chinese patients. A retrospective cross-sectional analytical study was conducted on 1566 hospitalized patients with T2DM. Feature selection was performed using recursive feature elimination (RFE) within the mlr3 framework. Twenty-nine machine learning (ML) models were benchmarked, and the ranger model (a random forest implementation) was selected for its superior performance. Hyperparameters were optimized through grid search with 5-fold cross-validation. Model interpretability was enhanced using SHapley Additive exPlanations (SHAP) values and partial dependence plots (PDPs). An external validation set of 106 patients was used to test the model. Key predictive variables included the duration of T2DM, age, fibrinogen, and serum urea nitrogen. The resulting predictive model for macroangiopathy showed good discrimination, with an accuracy of 0.716 and an area under the receiver operating characteristic curve (AUC) of 0.777 in the training set. Validation on the external dataset supported its robustness, with an AUC of 0.745. This study establishes a machine learning-based approach to feature selection and to the development of prediction tools for diabetic macroangiopathy.
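The two discrimination metrics reported above, accuracy and AUC, can be illustrated with a minimal Python sketch. This is not the study's code, which used the mlr3 framework. The function names `accuracy` and `roc_auc`, the 0.5 classification threshold, and the toy labels and probabilities are all assumptions made for illustration. AUC is computed here as the proportion of positive-negative case pairs in which the positive case receives the higher predicted probability, with ties counted as one half.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_prob: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of cases whose thresholded prediction matches the label.

    The 0.5 default threshold is an assumption; the abstract does not state one.
    """
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    if len(y_true) != len(y_prob):
        raise ValueError("inputs must be of equal length")
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration only (1 = macroangiopathy present);
# these are NOT values from the study.
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
toy_probs = [0.9, 0.4, 0.6, 0.3, 0.5, 0.2, 0.8, 0.7]

print("accuracy:", accuracy(toy_labels, toy_probs))
print("AUC:", roc_auc(toy_labels, toy_probs))
```

The pairwise form is quadratic in the number of cases, which is acceptable at the cohort sizes described here. It also shows why AUC, unlike accuracy, does not depend on a classification threshold.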