Abstract
The development of coronary artery disease (CAD), one of the most prevalent diseases in the world, is heavily influenced by several modifiable risk factors. Predictive models built with machine learning (ML) algorithms may help healthcare practitioners detect CAD in a timely manner and may ultimately improve outcomes. In this study, we applied six ML algorithms to predict the presence of CAD among patients in the “Cleveland dataset,” an openly available dataset from the University of California Irvine (UCI) Machine Learning Repository. All six algorithms achieved accuracies greater than 80%, and the “Neural Network” algorithm achieved an accuracy greater than 93%. The “Neural Network” model also achieved the highest recall of the six models (0.93). Five of the six algorithms produced very similar ROC curves. The ROC curve for the “Neural Network” algorithm is slightly steeper, implying that this model achieves a higher true positive rate at a given false positive rate. We also extracted the most important variables from the “Neural Network” model to support risk assessment. We have released the full source code developed in this study in the public domain as a preliminary step toward an open solution for predicting the presence of CAD in a given population, and we present a workflow model for implementing such a solution.
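As a rough illustration of the evaluation measures named above (accuracy, recall, and the area under the ROC curve), the following minimal Python sketch computes them for a binary classifier. The labels and scores are invented toy values, not data or results from this study, and the helper names `accuracy`, `recall`, and `roc_auc` are hypothetical; the code released with the study may compute these quantities differently.

```python
# Minimal sketch of three binary-classification metrics, standard library only.
# All values below are toy data chosen for illustration, NOT study results.


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """True positives divided by all actual positives (true positive rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = CAD present, 0 = absent) and model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(y_true, y_pred))
print("recall:  ", recall(y_true, y_pred))
print("ROC AUC: ", roc_auc(y_true, scores))
```

Accuracy and recall depend on the chosen threshold, whereas the area under the ROC curve summarizes ranking quality across all thresholds, which is why the two kinds of measure are usually reported together.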