Novel Algorithms and Metrics for Functional Discoveries from Networked Biological Data
Advances in high-throughput assays and sequencing technologies have enabled biologists to produce large-scale "omic" data at an unprecedented rate. These data are not isolated or independent; rather, they represent data points from an intrinsically networked system. As these networked data grow ever larger and faster, the methods used to analyze them become critical. On one hand, sound methods play an important role in separating true signals from noise, generating testable hypotheses, and discovering the underlying mechanisms of disease. On the other hand, applying methods improperly can lead to misleading conclusions and missed opportunities to reduce medical and financial losses. In this dissertation, I focus on developing network-based approaches (1) to assist in interpreting large-scale biological data at a systems level and (2) to identify statistically significant and meaningful genes, loci, and network components. The project comprises three parts. First, I combine gene co-expression network analysis and differential expression analysis to identify core gene sets and subnetworks from large-scale gene expression microarray data. Second, I propose two complementary methods to associate experimentally identified genes with functional pathways within the context of protein-protein interaction networks, which helps interpret the genes in terms of their functional meaning. Third, I develop a novel algorithm and metrics to identify differentially interacting chromosome regions by constructing and comparing networks. Results from both statistical evaluations and collaborative work with biologists strongly support that these methods will be valuable tools to help biologists make new discoveries from otherwise complicated and potentially chaotic biological data.