JURSW Volume 5

Permanent URI for this collection: https://hdl.handle.net/20.500.12588/71

Application of the Cox Proportional Hazards Model for the Quantitative Analysis of LC-MS Proteomics Data
(Office of the Vice President for Research, 2019) Arreola, Ivan; Han, David
Along with quantitative, analytical genomics, proteomics continues to be a growing field for determining gene and cellular functions at the protein level. Because liquid chromatography-mass spectrometry (LC-MS) experiments produce protein peak intensity data, statistical and computational techniques are required to conduct quantitative analytical proteomics. LC-MS proteomics data often have large quantities of missing peak intensities due to censoring of the low-abundance spectral features. As a result, the observed peak intensities from the LC-MS method are all positive, skewed, and often left-censored. Classical survival analysis methods are ideal for detecting differentially expressed proteins among different groups. These methods include non-parametric rank sum (RS) tests such as the Kolmogorov-Smirnov (KS) and Wilcoxon-Mann-Whitney (WMW) tests, and parametric survival models such as the accelerated failure time (AFT) model with popular lifetime distributions: log-normal (LN), log-logistic (LL), and Weibull (W) for modeling the peak intensity data. As an alternative approach, here we propose the Cox proportional hazards (PH) method, a popular semi-parametric model for survival data. The proposed regression-based method is lenient about the hazard function because it removes the requirement of a distribution-specific hazard. In the hope of gaining more insightful biological information about cellular functions at the protein level, the statistical properties of each method are investigated through a simulation study and an application to the Type I diabetes dataset.
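To make the idea in the abstract above concrete, here is a minimal Python sketch of the Cox partial log-likelihood for a single covariate. It is not the authors' implementation. The helper `flip_left_censored`, its parameters `detection_limit` and `upper`, and the "flip" transformation itself (turning a left-censored intensity into a right-censored pseudo-time) are assumptions made for illustration, and ties are not corrected for.

```python
import math

def flip_left_censored(intensities, detection_limit, upper):
    """Hypothetical helper: map left-censored intensities to right-censored times.

    A missing intensity (None) is assumed to lie below `detection_limit`,
    so its flipped value `upper - intensity` lies above `upper - detection_limit`.
    """
    times, events = [], []
    for value in intensities:
        if value is None:
            times.append(upper - detection_limit)  # censored at the limit
            events.append(0)
        else:
            times.append(upper - value)            # fully observed
            events.append(1)
    return times, events

def cox_partial_loglik(times, events, x, beta):
    """Cox partial log-likelihood for one covariate, without tie correction.

    times  : observed or censored times
    events : 1 if observed, 0 if censored
    x      : covariate per subject (for example a 0/1 group label)
    beta   : log hazard ratio
    """
    loglik = 0.0
    n = len(times)
    for i in range(n):
        if events[i] == 0:
            continue  # censored subjects contribute only through risk sets
        # Risk set: every subject whose time is at least times[i]
        risk = sum(math.exp(beta * x[j]) for j in range(n) if times[j] >= times[i])
        loglik += beta * x[i] - math.log(risk)
    return loglik

# Illustrative values only, not data from the paper
times, events = flip_left_censored([5.0, None, 3.0, 6.5], detection_limit=2.0, upper=10.0)
groups = [0, 1, 0, 1]
print(cox_partial_loglik(times, events, groups, beta=0.5))
```

The baseline hazard never appears in this function, which is the sense in which the Cox model avoids a distribution-specific hazard: only the ordering of the times and the covariates enter the likelihood.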
Comparison of Regression Methods to Identify Differential Expression in RNA-Sequencing Count Data from the Serial Analysis of Gene Expression
(Office of the Vice President for Research, 2019) Arreola, Ivan; Han, David
Comparative RNA-sequencing analysis for the Serial Analysis of Gene Expression (SAGE) can help identify changes in gene expression that are characteristic of human diseases. Since the RNA-sequencing experiment measures gene expression in the form of counts, usually with a large degree of skewness, analysis methods based on continuous probability distributions are generally inappropriate for modeling this type of data. Currently, the parametric regression techniques for solving this problem are based on well-known discrete probability distributions such as the Poisson and negative binomial. To overcome this modeling challenge with greater flexibility to account for a wide range of dispersion levels, here we introduce an alternative Generalized Linear Model (GLM) based on the Conway-Maxwell-Poisson distribution, also known as the COM-Poisson or CMP distribution. The CMP regression model generalizes the standard Poisson and negative binomial regressions, and it is suitable for fitting count data with varying degrees of over- and under-dispersion. Using simulated and real SAGE datasets, the performance of the proposed method is assessed in comparison to the Poisson- and negative binomial-based regression models.
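As a small illustration of how the CMP distribution handles dispersion, the sketch below evaluates its probability mass function, P(Y = y) proportional to lam^y / (y!)^nu, with the normalizing constant approximated by a truncated series. It is not the regression model from the paper. The function names, the truncation length `max_terms`, and the parameter values are assumptions chosen for the example.

```python
import math

def cmp_log_normalizer(lam, nu, max_terms=500):
    """Approximate log Z(lam, nu) = log sum_j lam^j / (j!)^nu by truncation.

    Assumes lam > 0 and that the series has effectively converged within
    `max_terms` terms (not true for nu close to 0 with lam >= 1).
    """
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(max_terms)]
    peak = max(log_terms)
    # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))

def cmp_pmf(y, lam, nu, max_terms=500):
    """CMP probability of count y; nu = 1 recovers the Poisson distribution."""
    log_z = cmp_log_normalizer(lam, nu, max_terms)
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_z)

def cmp_mean_var(lam, nu, max_terms=500):
    """Mean and variance from the truncated pmf (normalizer computed once)."""
    log_z = cmp_log_normalizer(lam, nu, max_terms)
    probs = [math.exp(j * math.log(lam) - nu * math.lgamma(j + 1) - log_z)
             for j in range(max_terms)]
    mean = sum(j * p for j, p in enumerate(probs))
    var = sum((j - mean) ** 2 * p for j, p in enumerate(probs))
    return mean, var

# nu < 1 gives variance above the mean, nu > 1 gives variance below it
for nu in (0.5, 1.0, 2.0):
    print(nu, cmp_mean_var(2.5, nu))
```

The single extra parameter `nu` is what lets one family cover over-dispersed, equi-dispersed, and under-dispersed counts, which a plain Poisson model cannot do.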
