Data-Driven Modeling of Biological Systems Using Network-Based Methods
As data-driven modeling gains precision, these methods can help combat the top two causes of death in the United States: heart disease and cancer. We first study macrophage cells, for which we build a dynamic Boolean network and verify the model's behavior against published literature. We propose the final state of the Boolean network as a possible phenotype for macrophage subtypes. We then turn to a cancer database, the Cancer Genome Atlas (TCGA), and perform a full analysis using Graph Convolutional Neural Networks (GCNN). The mRNA expression data in TCGA can be embedded in a graph on which the GCNN operates. We build a GCNN that classifies the 33 cancer types in TCGA along with a normal class made up of non-cancerous samples from all tissue types. The GCNN classifies the cancer types with above 94% accuracy. Because the goal is for the model to learn cancer-specific features, we investigate those features with a knockdown analysis, which identifies 428 potential biomarkers. The next step is survival analysis, using the GCNN to identify features important for cancer survival. Using the Cox-PH loss function, we develop GCN_SURV to predict survival for 13 cancer types, compare the model against other results, and interpret significant gene markers for breast cancer survival. Finally, we use HOTNET2 to identify modules in our graph that contain potential cancer markers, which match 230 well-known cancer genes. This research supports the development of new medical techniques.
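To make the Cox-PH loss mentioned above concrete, the following is a minimal sketch of the negative Cox partial log-likelihood, not the GCN_SURV implementation itself. The function name `cox_ph_loss` and its arguments are hypothetical. It assumes each sample has a scalar risk score (in a survival network this would presumably be the model output), a follow-up time, and an event flag, and it handles tied times in the simple Breslow style.

```python
import math


def cox_ph_loss(risk_scores, times, events):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk_scores: predicted log-risk per sample (hypothetical model output)
    times:       follow-up time per sample
    events:      1/True if the event was observed, 0/False if censored
    """
    total, n_events = 0.0, 0
    for score_i, t_i, observed in zip(risk_scores, times, events):
        if not observed:
            continue  # censored samples only contribute through risk sets
        # Risk set: every sample still under observation at time t_i.
        risk_set = [s for s, t in zip(risk_scores, times) if t >= t_i]
        # Numerically stable log-sum-exp over the risk set.
        m = max(risk_set)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in risk_set))
        total += score_i - log_denominator
        n_events += 1
    return -total / n_events if n_events else 0.0


# Usage: a model that ranks the earlier death as higher risk gets a lower loss.
good = cox_ph_loss([2.0, 0.0], times=[1.0, 2.0], events=[1, 1])
bad = cox_ph_loss([0.0, 2.0], times=[1.0, 2.0], events=[1, 1])
```

The loss depends only on the ordering and spacing of the risk scores within each risk set, which is why it can be attached to any model that outputs one score per patient.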