Statistical guarantees for multiclass leukemia machine learning classifiers using conformal risk control
ML/AI session
Monday
Abstract
Quantifying the uncertainty of diagnostic machine learning models has become a key concern in precision oncology research. As the reliability of such models has increasingly come under scrutiny, conformal prediction has emerged as a distribution-free framework for providing statistical guarantees. In this study, conformal prediction was applied to three different machine learning models that use patient gene expression signatures to predict acute lymphoblastic leukemia (ALL) subtypes. Using predictions generated by these models from 1147 patient samples taken at diagnosis, a conformal predictor, ALLCoP, was calibrated and validated using conformal risk control, a technique adapted to support multiclass predictions. ALLCoP was further used to create statistically guaranteed prediction sets for 111 ALL samples whose subtype was unknown at diagnosis. Finally, ALLCoP was developed into conformist, a generalizable, open-source Python framework that allows conformal risk control to be implemented on top of existing classifiers, both within the medical domain and beyond.
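To make the calibration step concrete, the following is a minimal sketch of conformal risk control for multiclass prediction sets. It is not the ALLCoP implementation or the conformist API. It assumes a classifier that outputs class probabilities, a loss equal to 1 when the true class is missing from the prediction set (bounded by 1), and a simple grid search over the threshold. The function names, the synthetic data, the five-class setup, and the target risk of 0.1 are all hypothetical.

```python
import numpy as np


def calibrate_threshold(probs, labels, alpha, grid=None):
    """Pick the smallest lambda whose adjusted calibration risk is <= alpha.

    Prediction sets are {k : probs[k] >= 1 - lambda}, so larger lambda means
    larger sets and a miscoverage loss that can only decrease.
    """
    n = len(labels)
    if grid is None:
        grid = np.linspace(0.0, 1.0, 1001)  # assumed search grid
    true_class_prob = probs[np.arange(n), labels]
    for lam in grid:
        # Empirical risk: fraction of samples whose true class is excluded.
        risk = np.mean(true_class_prob < 1.0 - lam)
        # Finite-sample correction of conformal risk control (loss bound B = 1).
        if (n * risk + 1.0) / (n + 1) <= alpha:
            return float(lam)
    return 1.0  # fall back to sets containing every class


def prediction_sets(probs, lam):
    """Boolean matrix: entry (i, k) is True if class k is in sample i's set."""
    return probs >= 1.0 - lam


# Synthetic stand-in for classifier outputs (assumption: 5 classes, 500 samples).
rng = np.random.default_rng(0)
n_classes = 5
cal_probs = rng.dirichlet(np.ones(n_classes) * 0.3, size=500)
cal_labels = np.array([rng.choice(n_classes, p=p) for p in cal_probs])

lam_hat = calibrate_threshold(cal_probs, cal_labels, alpha=0.1)

# Apply the calibrated threshold to new, unlabeled samples.
new_probs = rng.dirichlet(np.ones(n_classes) * 0.3, size=3)
new_sets = prediction_sets(new_probs, lam_hat)
for row in new_sets:
    print(np.flatnonzero(row).tolist())
```

Under the usual exchangeability assumption between calibration and new samples, the threshold chosen this way keeps the expected miscoverage of the prediction sets at or below the chosen alpha. Samples with ambiguous predictions receive larger sets rather than a single, possibly wrong, label.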