Robust and interpretable machine learning models for cancer outcome prediction
Recently, high-throughput profiling techniques such as microarrays and next-generation sequencing have revolutionized biological science and enabled cancer research at an unprecedented scale. The abundance of gene expression profiles of tumor tissues from different cancer types has opened new research frontiers in preventive, personalized, and precision healthcare. Personalized and precision cancer outcome prediction is crucial for diagnosing cancer early and reducing deaths among cancer patients. In this dissertation, we address three key problems in cancer outcome prediction: (1) assessing the utility of network-based features, (2) improving the interpretability of machine learning models for cancer prediction so that bias and other undesirable behavior can be detected, and (3) developing personalized models for cancer patients to address the heterogeneous nature of cancer.
First, we compare both network features and edge features against gene-based features. Our results show that both network and edge features yield better prediction accuracy and more robust biomarkers. Second, we propose interpretable models trained on a small number of gene clusters, which achieve prediction accuracy similar to that of models trained on whole-genome gene expression. We also propose a post-hoc model interpretation method and a metric named Inter-Classifier Stability (ICS) for evaluating model interpretation methods. Our results indicate that the proposed method has better inter-classifier stability than state-of-the-art interpretation methods. Third, we propose a method to construct a personalized model for each individual patient by using the training patients that are most similar to, or most dissimilar from, the patient being tested. Results show that this approach improves prediction accuracy over models trained on the whole training dataset. In addition, personalized models achieve better prediction accuracy than models trained on separate subtypes. Overall, this dissertation proposes several novel methods that not only improve the prediction of cancer outcomes but also enhance the mechanistic understanding of cancer development and progression.
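
As a rough illustration of the personalized-model idea in the third contribution, the sketch below selects the training patients closest to (or farthest from) a test patient and fits a small classifier on that subset only. The abstract does not specify the similarity measure, the neighborhood size, or the classifier, so the Euclidean distance, the parameter k, the nearest-centroid classifier, and all function names and toy data here are assumptions made purely for illustration, not the dissertation's actual method.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression vectors (assumed similarity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_neighbors(train_X, test_x, k, most_similar=True):
    """Return indices of the k training patients most similar
    (or, if most_similar=False, most dissimilar) to the test patient."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: euclidean(train_X[i], test_x),
        reverse=not most_similar,
    )
    return order[:k]

def fit_nearest_centroid(X, y):
    """Compute one mean vector per class label (a stand-in for any classifier)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict_personalized(train_X, train_y, test_x, k=4, most_similar=True):
    """Train a model on the selected neighbors only, then classify the test patient."""
    idx = select_neighbors(train_X, test_x, k, most_similar)
    centroids = fit_nearest_centroid(
        [train_X[i] for i in idx], [train_y[i] for i in idx]
    )
    return min(centroids, key=lambda label: euclidean(centroids[label], test_x))

# Hypothetical toy data: six patients, two features, binary outcome.
train_X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.2, 4.9]]
train_y = [0, 0, 1, 1, 0, 1]

prediction = predict_personalized(train_X, train_y, [0.1, 0.0], k=4)
```

In this sketch a separate model is fit for every test patient, so the cost grows with the number of patients being tested; in practice the neighborhood size and the choice between similar and dissimilar patients would be tuned on validation data.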