Intra- and inter- platform renormalization and analysis of microarray data from the NCBI GEO database

dc.contributor.author: Robbins, Kay A.
dc.contributor.author: Burkhardt, Cory
dc.description.abstract: Background: The availability of large repositories for microarray data such as NCBI GEO makes analysis across the entire set of available experiments an attractive prospect. However, large-scale comparison of expression levels from microarrays is hindered by quality control, technology differences, and a lack of standardized inter- and intra-platform normalization procedures. This study proposes a simple, platform-wide normalization scheme based on sample cumulative distributions and analyzes the implications of this renormalization for downstream analysis. We also show that platform-wide binormalization after renormalization is an effective method for removing intrinsic background correlation.
Results: We downloaded all available samples for the 18 most frequently used oligonucleotide platforms from NCBI GEO and calculated the large-scale statistical properties of the samples. A total of 57,933 unique microarray samples from 2,715 series were included in the study. By careful examination of the over-plotted sample cumulative distribution functions (CDFs), we were able to empirically distinguish the normalization properties of the samples without reference to sample metadata. We found specific CDF signatures for outliers, log-transformed expression, raw expression, and values reported as ratios. We used these characteristics to automatically convert raw expression to log-transformed expression and to eliminate all other types of samples. The renormalization resulted in a test corpus of 45,847 unique samples from 2,423 series. As an illustration of the usefulness of a platform-wide context for analysis, we examined several methods for computing sample correlation. We showed that platform-wide binormalization preserved expected correlation relations in the MicroArray Quality Control (MAQC) titration data, which contains replicates of titrated mixtures over multiple sites. The results suggest that platform-wide binormalization is a useful method of removing intrinsic background correlation without destroying actual data relationships. We also examined how normalization affected the performance of various metrics for differential expression in a cross-platform study of breast cancer.
Conclusions: The empirical CDF renormalization approach proposed in this paper allows microarray samples to be renormalized on a platform-wide scale, providing a context for examining sample characteristics such as differential expression and correlation. Researchers can use simple CDF signatures to detect outliers and determine the sample’s normalization type. An additional platform-wide binormalization step improves correlation relationships, making it a useful pre-processing step for subsequent clustering or classification. This work demonstrates the feasibility of large-scale, automatic intra- and inter-platform renormalization in heterogeneous collections of microarrays.
dc.description.department: Computer Science
dc.description.sponsorship: This work was supported by NIH Research Centers in Minority Institutions 2G12RR1364-06A1 and NIH-NSF CRCNS EIA-0217884.
dc.publisher: UTSA Department of Computer Science
dc.relation.ispartofseries: Technical Report; CS-TR-2007-007
dc.title: Intra- and inter- platform renormalization and analysis of microarray data from the NCBI GEO database
dc.type: Technical Report
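
The abstract describes using empirical CDF signatures to tell raw expression apart from log-transformed expression and ratio values, then converting raw samples to a log scale and discarding the rest. The abstract does not give the actual signatures, so the following is only a minimal sketch of that general idea. The function names, the percentile cut-offs, the `log_ceiling` threshold, and the clamping of values below 1 are all illustrative assumptions, not the authors' procedure.

```python
import math

def empirical_cdf(values):
    """Return the sorted values and their cumulative probabilities."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def quantile(sorted_xs, q):
    """Nearest-rank quantile of an already sorted list."""
    idx = min(len(sorted_xs) - 1, max(0, math.ceil(q * len(sorted_xs)) - 1))
    return sorted_xs[idx]

def classify_sample(values, log_ceiling=32.0):
    """Guess a sample's normalization type from the tails of its CDF.

    Hypothetical heuristic (an assumption, not taken from the report):
    - upper tail far above any plausible log2 intensity -> "raw"
    - otherwise, a clearly negative lower tail          -> "ratio"
    - otherwise                                         -> "log"
    """
    xs, _ = empirical_cdf(values)
    lower, upper = quantile(xs, 0.01), quantile(xs, 0.99)
    if upper > log_ceiling:
        return "raw"
    if lower < 0:
        return "ratio"
    return "log"

def renormalize(values):
    """Return log2-scale values, or None if the sample should be dropped."""
    kind = classify_sample(values)
    if kind == "log":
        return list(values)
    if kind == "raw":
        # Clamping at 1 before the log is an assumption made for this sketch.
        return [math.log2(max(v, 1.0)) for v in values]
    return None  # ratio-like samples are eliminated
```

In this sketch a sample is judged only by its 1st and 99th percentiles, which keeps the check cheap enough to run over an entire platform's samples; a real implementation would need thresholds calibrated against the over-plotted CDFs for each platform.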

