Prediction and Analysis of Primary and Metastatic Tumors with Deep Learning
More than 200 different cancer types, each with distinct genetic mechanisms, have been identified through clinical tests. The lack of well-annotated samples and the error-prone nature of biological experiments have hampered biologists' efforts to understand and identify rare cancer types. In this dissertation, we propose a variety of deep learning models to predict known and unknown cancer types as well as their underlying biological functions. CancerSiamese, one of our main models, receives pairs of gene expression profiles and learns representations of similar and dissimilar cancer types through two parallel Convolutional Neural Networks joined by a similarity function. Network transfer learning is used to ease the training of CancerSiamese across primary and metastatic tumors. When tested on 10 unknown cancer types, this model improves accuracy over a comparable model by around 8% for primary tumors and around 4% for metastatic tumors. Moreover, a side-by-side investigation of common primary and metastatic tumors through interpretation of CancerSiamese gave us a better understanding of the gene markers that distinguish primary from metastatic tumors. Specifically, we used guided gradient saliency maps to reveal which genes CancerSiamese attends to when comparing similar cancer types. Despite the highly similar gene expression structure of primary and metastatic tumors within each cancer type, we are able to separate unknown primary and metastatic tumors using only the top 10 significant gene markers extracted by CancerSiamese. This is a notable result, as it sheds light on the genetic mechanisms of primary and metastatic tumors regardless of their cancer type or colonized tissue.
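To make the Siamese idea concrete, the following is a minimal sketch of two weight-sharing convolutional branches joined by a similarity function. It is an illustration under stated assumptions, not the CancerSiamese implementation: the function names `embed` and `similarity`, the kernel sizes, the L1-distance-plus-sigmoid similarity, and the random (untrained) parameters are all hypothetical choices, since the abstract does not specify them.

```python
import math
import random


def embed(expression, kernels):
    """One branch: 1D convolution -> ReLU -> global max pooling.

    Both inputs of a pair pass through this same function with the same
    kernels, which is what makes the two branches "Siamese".
    """
    features = []
    for kernel in kernels:
        width = len(kernel)
        best = 0.0  # starting at 0 applies the ReLU floor before pooling
        for start in range(len(expression) - width + 1):
            window = expression[start:start + width]
            activation = sum(w * x for w, x in zip(kernel, window))
            best = max(best, activation)
        features.append(best)
    return features


def similarity(expr_a, expr_b, kernels, weights, bias):
    """Assumed similarity function: sigmoid of a weighted L1 distance."""
    emb_a = embed(expr_a, kernels)
    emb_b = embed(expr_b, kernels)
    distance = [abs(a - b) for a, b in zip(emb_a, emb_b)]
    logit = sum(w * d for w, d in zip(weights, distance)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical toy setup: random kernels and synthetic expression vectors.
rng = random.Random(0)
NUM_GENES, NUM_KERNELS, KERNEL_WIDTH = 50, 4, 5
kernels = [[rng.uniform(-1, 1) for _ in range(KERNEL_WIDTH)]
           for _ in range(NUM_KERNELS)]
weights = [-1.0] * NUM_KERNELS  # negative: larger distance, lower score
bias = 2.0

sample_a = [rng.random() for _ in range(NUM_GENES)]
sample_b = [rng.random() for _ in range(NUM_GENES)]

same_score = similarity(sample_a, sample_a, kernels, weights, bias)
diff_score = similarity(sample_a, sample_b, kernels, weights, bias)
```

In this sketch a pair of identical profiles always receives the highest attainable score, and the score can only decrease as the two embeddings move apart. In a trained model, the kernels, weights, and bias would be learned from labeled pairs of same-type and different-type samples rather than drawn at random.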