Computational prediction of genome-wide microRNA targets and functions
Abstract
A microRNA (miRNA) is a non-coding RNA, 19 to 25 nucleotides long, that has been found to repress transcription and/or protein translation of hundreds of genes by binding to complementary sites in the 3' untranslated region (UTR) of target genes (Bartel 2004; Yue et al. 2009; Yue et al. 2012). MiRNAs have been shown to play important roles in many biological processes, including cell development, stress responses and viral infection (Grey et al. 2008). Predicting miRNA targets and understanding the functions and regulatory mechanisms of miRNAs are among the most active areas of research; such understanding will help identify new therapeutic targets for the effective treatment of various diseases (Alvarez-Garcia and Miska 2005; Lu et al. 2008).
Identifying the genes that miRNAs regulate is an important first step toward understanding the specific biological functions of miRNAs. First, a two-stage SVM-based algorithm, SVMicrO (Liu et al. 2008), is proposed for target prediction based on sequence information. A large number of positive and negative targets are carefully derived from the most up-to-date literature to build the training and evaluation datasets. Based on the known statistical characteristics of miRNA:target interactions as well as our own understanding of them, 113 and 30 novel features are extracted for constructing Site-SVM and UTR-SVM, respectively. mRMR (minimum Redundancy Maximum Relevance) (Ding and Peng 2005) and SFS (sequential forward search) (Peng et al. 2005) are used for feature selection. Sample weights and class weights are introduced into SVMicrO to deal with the imbalanced dataset. To validate its performance, SVMicrO is evaluated against the results of high-confidence target identification experiments and compared with several other popular algorithms. The results show that SVMicrO produces better prediction performance.
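The mRMR step mentioned above can be illustrated with a minimal sketch. It assumes discretized features and the common "relevance minus mean redundancy" form of the criterion; the feature names and toy values are hypothetical and are not taken from SVMicrO, whose actual feature set and selection settings are not given in this abstract.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

def mrmr_select(features, labels, k):
    """Greedy mRMR: repeatedly pick the feature with the highest
    relevance to the labels minus mean redundancy with features already chosen.
    `features` maps a feature name to a list of discrete values."""
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected, remaining = [], set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy data: 1 = validated target site, 0 = non-target site.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "seed_match":         [1, 1, 1, 0, 0, 0, 0, 0],  # informative
    "seed_match_dup":     [1, 1, 1, 0, 0, 0, 0, 0],  # exact duplicate, redundant
    "site_accessibility": [1, 1, 0, 1, 0, 0, 1, 0],  # weaker but complementary
}
selected = mrmr_select(features, labels, k=2)
print(selected)
```

In this toy case the duplicated feature is skipped in favor of the weaker but less redundant one, which is the behavior mRMR is designed to produce.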
Second, we consider integrating microarray data with sequence binding information for target prediction. In particular, a logistic regression model is first used to map the SVMicrO prediction result into probability space, and a Gaussian mixture model, whose parameters are estimated by the variational Bayesian expectation-maximization (VBEM) algorithm (Bernardo et al. 2003; Beal 2003), is then constructed to model gene expression profiling data. The evaluation results indicate that the proposed algorithm, which integrates the two types of information, outperforms prediction based on sequence data alone and prediction based on expression data alone.
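As an illustration of the first step, the following minimal sketch fits a one-dimensional logistic regression that maps raw classifier scores to probabilities. The scores, labels, learning rate and function names are assumptions made for the example; the abstract does not describe how the model was actually fitted, and the Gaussian mixture component is not covered here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_score_to_probability(scores, labels, lr=0.1, epochs=2000):
    """Fit p(target | s) = sigmoid(a * s + b) by batch gradient descent
    on the mean log loss. Returns the slope a and intercept b."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y   # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

# Hypothetical SVM decision values and known target status (1 = target).
scores = [-2.1, -1.3, -0.4, 0.2, -0.6, 0.5, 1.1, 1.9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

a, b = fit_score_to_probability(scores, labels)
probs = [sigmoid(a * s + b) for s in scores]
for s, p in zip(scores, probs):
    print(f"score {s:+.1f} -> probability {p:.3f}")
```

Once scores are expressed as probabilities, they can be combined with a probabilistic model of expression data on a common scale, which is the motivation for this mapping.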
Third, a Bayesian decision fusion approach is developed for miRNA target prediction (Yue et al. 2010). Because existing algorithms rely on different features and classifiers, their results agree poorly with one another. To benefit from the strengths of the different algorithms, we propose an algorithm called BCmicrO, which combines their predictions using a Bayesian network. BCmicrO was evaluated on training data and tested on proteomic data. The results show that BCmicrO improves on both the sensitivity and the specificity of each individual algorithm.
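The idea of Bayesian decision fusion can be sketched with the simplest possible Bayesian network, a naive Bayes model in which each predictor's call is assumed conditionally independent given the true target status. This is a simplification for illustration only: the structure of the network used by BCmicrO is not described in this abstract, and the predictor names, sensitivities, specificities and prior below are invented.

```python
def fuse_predictions(calls, sensitivity, specificity, prior=0.5):
    """Posterior probability that a gene is a true target, given binary
    calls from several predictors (naive Bayes, i.e. calls are assumed
    conditionally independent given the true status)."""
    odds = prior / (1.0 - prior)
    for name, call in calls.items():
        se, sp = sensitivity[name], specificity[name]
        if call:
            odds *= se / (1.0 - sp)        # likelihood ratio of a positive call
        else:
            odds *= (1.0 - se) / sp        # likelihood ratio of a negative call
    return odds / (1.0 + odds)

# Hypothetical per-predictor performance, as might be estimated on training data.
sensitivity = {"predictor_a": 0.7, "predictor_b": 0.6, "predictor_c": 0.8}
specificity = {"predictor_a": 0.8, "predictor_b": 0.9, "predictor_c": 0.6}
prior = 0.1  # assumed fraction of candidate genes that are true targets

all_yes = fuse_predictions(
    {"predictor_a": True, "predictor_b": True, "predictor_c": True},
    sensitivity, specificity, prior)
all_no = fuse_predictions(
    {"predictor_a": False, "predictor_b": False, "predictor_c": False},
    sensitivity, specificity, prior)
mixed = fuse_predictions(
    {"predictor_a": True, "predictor_b": False, "predictor_c": True},
    sensitivity, specificity, prior)
print(all_yes, mixed, all_no)
```

The fused posterior weighs each call by how reliable that predictor is, so disagreement among predictors yields an intermediate probability rather than a simple majority vote.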
Finally, to understand the functions of miRNAs, we propose an SVM-based algorithm, PathMicrO, that elucidates miRNA function by predicting miRNA-regulated pathways (Yue, Chen, Gao, and Huang 2010). PathMicrO combines sequence-level target predictions with gene expression profiles from miRNA transfection experiments. Its performance is evaluated with cross-validation on carefully constructed training data, on two independent test datasets, and on two miRNAs with known functions. PathMicrO is compared with another popular miRNA function prediction algorithm, SigTerms (Creighton et al. 2008). On the training data, PathMicrO attains a 31% higher area under the ROC curve (AUC). On the two independent test datasets, its AUC increases by 32.54% and 40.72%. When tested on the two miRNAs with known functions, PathMicrO recovers 200% and 66% more known functions than SigTerms among the top 40 predictions.
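For readers unfamiliar with the metric, the sketch below computes the area under the ROC curve from ranked predictions and expresses the difference between two methods as a relative gain. The scores are made up, and treating the reported percentages as relative improvements is an assumption; the abstract does not say whether they are relative or absolute.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def relative_gain(new, baseline):
    """Percentage improvement of `new` over `baseline`."""
    return 100.0 * (new - baseline) / baseline

# Hypothetical pathway labels (1 = truly regulated) and scores from two methods.
labels = [1, 1, 0, 1, 0, 0]
method_a_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
method_b_scores = [0.9, 0.4, 0.7, 0.6, 0.3, 0.5]

auc_a = roc_auc(method_a_scores, labels)
auc_b = roc_auc(method_b_scores, labels)
gain = relative_gain(auc_a, auc_b)
print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}, relative gain = {gain:.1f}%")
```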