Comparison of Regression Methods to Identify Differential Expression in RNA-Sequencing Count Data from the Serial Analysis of Gene Expression

Date

2019

Authors

Arreola, Ivan
Han, David

Journal Title

Journal ISSN

Volume Title

Publisher

Office of the Vice President for Research

Abstract

Comparative RNA-sequencing analysis for the Serial Analysis of Gene Expression (SAGE) can help identify changes in gene expression which are characteristic to human diseases. Since the RNA-sequencing experiment measures gene expressions in the form of counts, usually with a large degree of skewness, the analysis methods based on continuous probability distributions are generally inappropriate for modeling this type of data. Currently, the parametric regression techniques for solving this problem are based on the well-known discrete probability distributions such as Poisson and negative binomial. In order to overcome this modeling challenge with higher flexibilities to account for a wide range of dispersion levels, here we introduce an alternative Generalized Linear Model (GLM) based on the Conway-Maxwell-Poisson distribution, also known as COM-Poisson or CMP distribution. The CMP regression model generalizes the standard Poisson and negative binomial regressions, and it is suitable for fitting count data with varying degrees of over- and under-dispersions. Using simulated and real SAGE datasets, the performance of the proposed method is assessed in comparison to the Poisson- and negative binomial-based regression models.

Description

Keywords

Conway-Maxwell-Poisson Regression, Generalized Linear Models, RNA-Sequencing, Serial Analysis of Gene Expression

Citation

Department

Management Science and Statistics