A randomized steiner tree approach for biomarker discovery and classification of breast cancer metastasis
Motivation: DNA microarray has become an important tool to help identify biomarker genes for improving the prognosis of metastatic breast cancer - a leading cause of cancer-related deaths in women worldwide. Recently, pathway-level relationships between genes have been increasingly used to build more robust classification models which also can provide useful biological insights. Due to the unavailability of complete pathways, protein-protein interaction (PPI) network is becoming more popular to researcher and opens a way to investigate the developmental process of breast cancer. Here, a network-based method is proposed to combine microarray gene expression profiles and PPI network for biomarker discovery for breast cancer metastasis. The key idea is to identify a small number of genes to connect differentially expressed genes into a single component in a PPI network; these intermediate genes contain important information about the pathways involved in metastasis and have a high probability of being biomarkers.
Results: We applied this approach on three breast cancer microarray datasets, and identified significant numbers of well-known biomarker genes for breast cancer metastasis. Those selected genes are enriched with biological processes and pathways related to carcinogenic process, and, importantly, have higher stability across different datasets than in previous studies. Furthermore, those genes significantly increased cross-data classification accuracy in breast cancer metastasis. The randomized Steiner tree based approach described here is a new way to discover biomarker genes for breast cancer, and improves the prediction accuracy of metastasis. The analysis is limited here only to breast cancer, but can be easily applied to other diseases.