Deep Learning Models for Predicting Genomic and Transcriptomic Functional Sites

Salekin, Sirajul

As biological science enters an era of “big data” owing to recent advances in high-throughput genomic and biomedical data, concurrent innovations in deep neural networks offer powerful means of addressing the problems posed by this rapidly growing volume of biological data. This study examines these two developments, focusing specifically on the application of deep learning (supervised, unsupervised, and generative models) to genomic and transcriptomic sequences. The first part of this research presents a supervised deep learning model developed to predict transcription factor (TF) binding locations at single-nucleotide resolution de novo from DNA sequence. The model adopts a novel deconvolutional neural network (deconvNet), inspired by the similarity of this task to image segmentation with deconvNets. The resulting architecture, known as DeepSNR, was trained using TF-specific data from ChIP-exonuclease (ChIP-exo) experiments and has been shown to outperform motif-search-based methods on several evaluation metrics. Although deep learning models can, in principle, serve as valuable tools for analyzing genomic data, many genomic applications, especially those that predict functional sites from genomic sequence, face the challenge of an extremely small amount of labeled data. One viable solution is to employ unsupervised representation-learning algorithms to discover a set of features or latent variables whose variations capture the underlying data-generating distribution. Hence, the second part of this study introduces our approach to extracting features from pre-mRNA sequences in an unsupervised manner using a state-of-the-art deep generative model, the Generative Adversarial Network (GAN). Specifically, an adversarial network was trained on a large compendium of pre-mRNA sequences from the whole human genome to learn a hidden representation. This representation helps classifiers trained on only a few labeled examples generalize to unseen parts of the data distribution. The results demonstrate that features extracted using the unsupervised learning framework distinguish between different types of RNA modifications with high precision, even when the data set is small and extremely imbalanced.

This item is available only to currently enrolled UTSA students, faculty or staff.
Deconvolutional network, Deep learning, Generative adversarial network, Genomics functional sites, Predicting RNA modifications, Unsupervised feature learning
Electrical and Computer Engineering