An investigation of Bayesian inference in bioinformatic signal processing
Abstract
This dissertation investigates the use of Bayesian inference in the analysis of bioinformatic data, specifically gene clustering from expression profile data and the mitigation of small-signal suppression in liquid chromatography/mass spectrometry profiles of peptide data.
A method for gene clustering from expression profiles using shape information is presented. Conventional clustering approaches such as K-means assume that genes with similar functions have similar expression levels, and hence allocate genes with similar expression levels to the same cluster. However, genes with similar function often exhibit similar signal shapes even when their expression magnitudes are far apart. This investigation therefore studies clustering according to signal shape similarity. The shape information is captured in the form of normalized and time-scaled forward first differences, which are then subjected to variational Bayes clustering combined with a non-Bayesian (Silhouette) cluster statistic. The statistic shows an improved ability to identify the correct number of clusters and to assign the members of each cluster. Initial results on both generated test data and E. coli microarray expression data, together with an initial validation of the E. coli results, indicate that the method shows promise for better clustering of time-series microarray data according to shape similarity.
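The shape representation and the Silhouette statistic described above can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function names `shape_features` and `silhouette` are hypothetical, the unit Euclidean norm is an assumed choice of normalization, and the variational Bayes clustering step itself is omitted, with cluster labels supplied by hand.

```python
import math


def shape_features(values, times):
    """Normalized, time-scaled forward first differences of one profile."""
    if len(values) != len(times) or len(values) < 2:
        raise ValueError("need at least two samples with matching time points")
    # Forward difference divided by the sampling interval (time scaling).
    diffs = [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    ]
    # Assumption: normalize to unit Euclidean norm so magnitude drops out.
    norm = math.sqrt(sum(d * d for d in diffs))
    if norm == 0.0:
        return diffs  # flat profile carries no shape information
    return [d / norm for d in diffs]


def silhouette(points, labels):
    """Mean silhouette width of a labeling with at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0.0:
            total += (b - a) / max(a, b)
    return total / len(labels)


# Illustrative (made-up) profiles on an unevenly sampled time axis.
times = [0.0, 1.0, 2.0, 4.0]
rise_low = shape_features([1.0, 2.0, 3.0, 1.0], times)
rise_high = shape_features([10.0, 20.0, 30.0, 10.0], times)  # same shape, 10x magnitude
fall_low = shape_features([3.0, 2.0, 1.0, 3.0], times)
fall_high = shape_features([30.0, 20.0, 10.0, 30.0], times)

# Hand-assigned labels stand in for the output of a clustering step.
score = silhouette([rise_low, rise_high, fall_low, fall_high], [0, 0, 1, 1])
print(score)
```

In this toy example the two rising profiles map to essentially the same feature vector despite a tenfold difference in magnitude, which is the property that magnitude-based K-means does not exploit. Comparing the mean silhouette width across candidate cluster counts is one common way to choose the number of clusters.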
Modeling and characterization of mass spectrometry data from Fourier Transform Mass Spectrometry coupled to an Electrospray Ionization liquid chromatography column (ESI-FTMS) are presented in terms of the steps necessary to correct data exhibiting small-signal suppression. This data characterization is preparatory to improving peak-picking algorithms. Current modeling work on the correction of suppressed data is presented with initial results. Although the method is not yet fully successful, the results show definite promise for using post-measurement correction to prepare the data for later identification analysis.