Application of the Cox Proportional Hazards Model for the Quantitative Analysis of LC-MS Proteomics Data
Alongside quantitative genomics, proteomics continues to grow as a field for determining gene and cellular functions at the protein level. Because liquid chromatography-mass spectrometry (LC-MS) experiments produce protein peak intensity data, statistical and computational techniques are required for quantitative proteomic analysis. LC-MS proteomics data often contain a large proportion of missing peak intensities due to censoring of low-abundance spectral features. As a result, the observed peak intensities are positive, skewed, and often left-censored. Classical survival analysis methods are therefore well suited to detecting differentially expressed proteins across groups. These methods include non-parametric rank sum (RS) tests, such as the Kolmogorov-Smirnov (KS) and Wilcoxon-Mann-Whitney (WMW) tests, and parametric survival models, such as the accelerated failure time (AFT) model with popular lifetime distributions: log-normal (LN), log-logistic (LL), and Weibull (W). As an alternative, we propose the Cox proportional hazards (PH) model, a popular semi-parametric model for survival data. This regression-based method leaves the baseline hazard function unspecified and so removes the need to assume a distribution-specific hazard. With the aim of gaining more biological insight into cellular functions at the protein level, we investigate the statistical properties of each method through a simulation study and an application to a Type I diabetes dataset.
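For orientation, the contrast between the semi-parametric PH model and the parametric AFT model can be written in their standard textbook forms. This is a sketch under stated assumptions, not the paper's own notation: the symbols x (a 0/1 group indicator), T (the survival-type response), beta, gamma, mu, sigma, and epsilon are introduced here for illustration only, and how the peak intensities are mapped onto T is not specified in this abstract.

```latex
% Cox PH model: the baseline hazard h_0(t) is left unspecified
% (x is an assumed 0/1 group indicator, beta the log hazard ratio)
h(t \mid x) = h_0(t)\,\exp(\beta x),
\qquad
\frac{h(t \mid x = 1)}{h(t \mid x = 0)} = \exp(\beta)

% A test for differential expression between the two groups
H_0 : \beta = 0 \quad \text{versus} \quad H_1 : \beta \neq 0

% AFT model for comparison: the distribution of the error term must be chosen
% (normal error gives LN, logistic gives LL, extreme-value gives W)
\log T = \mu + \gamma x + \sigma \varepsilon
```

Under the PH form the hazard ratio exp(beta) does not depend on h_0(t), which is why no lifetime distribution has to be chosen, whereas each AFT variant fixes the error distribution in advance.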