The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets - improving meta-analysis and prediction of prognosis.
Authors
Sims, Andrew H
Smethurst, Graeme J
Hey, Yvonne
Okoniewski, Michal J
Pepper, Stuart D
Howell, Anthony
Miller, Crispin J
Clarke, Robert B
Affiliation
Applied Bioinformatics of Cancer Research Group, Breakthrough Research Unit, Edinburgh Cancer Research Centre, Western General Hospital, Crewe Road South, Edinburgh, EH4 2XR, UK. andrew.sims@ed.ac.uk
Issue Date
2008
Abstract
BACKGROUND: The number of gene expression studies in the public domain is rapidly increasing, representing a highly valuable resource. However, dataset-specific bias precludes meta-analysis at the raw transcript level, even when the RNA is from comparable sources and has been processed on the same microarray platform using similar protocols. Here, we demonstrate, using Affymetrix data, that much of this bias can be removed, allowing multiple datasets to be legitimately combined for meaningful meta-analyses.
RESULTS: A series of validation datasets comparing breast cancer and normal breast cell lines (MCF7 and MCF10A) were generated to examine the variability between datasets generated using different amounts of starting RNA, alternative protocols, different generations of Affymetrix GeneChip or scanning hardware. We demonstrate that systematic, multiplicative biases are introduced at the RNA, hybridization and image-capture stages of a microarray experiment. Simple batch mean-centering was found to significantly reduce the level of inter-experimental variation, allowing raw transcript levels to be compared across datasets with confidence. By accounting for dataset-specific bias, we were able to assemble the largest gene expression dataset of primary breast tumours to date (1107 tumours), from six previously published studies. Using this meta-dataset, we demonstrate that combining greater numbers of datasets or tumours leads to a greater overlap in differentially expressed genes and more accurate prognostic predictions. However, this is highly dependent upon the composition of the datasets and patient characteristics.
CONCLUSION: Multiplicative, systematic biases are introduced at many stages of microarray experiments. When these are reconciled, raw data can be directly integrated from different gene expression datasets, leading to new biological findings with increased statistical power.
Citation
The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets - improving meta-analysis and prediction of prognosis. 2008, 1:42 BMC Med Genomics
Journal
BMC Medical Genomics
DOI
10.1186/1755-8794-1-42
PubMed ID
18803878
Type
Article
Language
en
ISSN
1755-8794
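Illustrative note
The batch mean-centering named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `batch_mean_center`, the nested-dictionary layout, and the use of log2-transformed values are all assumptions made for illustration. The reasoning is that a multiplicative bias on the raw intensity scale becomes an additive offset on the log scale, so subtracting each gene's within-dataset mean removes a dataset-specific offset.

```python
from statistics import mean


def batch_mean_center(log_expr_by_batch):
    """Subtract each gene's within-batch mean from its log2 expression values.

    log_expr_by_batch is a hypothetical layout:
        {batch_name: {gene_id: [log2 value per sample, ...]}}
    Returns a new structure of the same shape; the input is not modified.
    """
    centered = {}
    for batch, genes in log_expr_by_batch.items():
        centered[batch] = {}
        for gene, values in genes.items():
            gene_mean = mean(values)  # per-gene mean within this batch only
            centered[batch][gene] = [v - gene_mean for v in values]
    return centered


# Toy data (invented): study_B carries a 4-fold multiplicative bias,
# which appears as a +2 offset on the log2 scale.
toy = {
    "study_A": {"GENE1": [8.0, 10.0]},
    "study_B": {"GENE1": [10.0, 12.0]},
}
centered = batch_mean_center(toy)
print(centered["study_A"]["GENE1"])
print(centered["study_B"]["GENE1"])
```

After centering, the two toy studies have identical profiles for GENE1, because the constant offset has been removed. The sketch also shows why the abstract's caveat about dataset composition matters: centering removes any difference in mean expression between datasets, including genuine biological differences if the datasets contain different mixes of samples.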
Related articles
- Direct integration of intensity-level data from Affymetrix and Illumina microarrays improves statistical power for robust reanalysis.
- Authors: Turnbull AK, Kitchen RR, Larionov AA, Renshaw L, Dixon JM, Sims AH
- Issue date: 2012 Aug 21
- Relative impact of key sources of systematic noise in Affymetrix and Illumina gene-expression microarray experiments.
- Authors: Kitchen RR, Sabine VS, Simen AA, Dixon JM, Bartlett JM, Sims AH
- Issue date: 2011 Dec 1
- Consensus and Meta-analysis regulatory networks for combining multiple microarray gene expression datasets.
- Authors: Steele E, Tucker A
- Issue date: 2008 Dec
- A multi-platform normalization method for meta-analysis of gene expression data.
- Authors: Tihagam RD, Bhatnagar S
- Issue date: 2023 Sep
- Cross-platform comparison and visualisation of gene expression data using co-inertia analysis.
- Authors: Culhane AC, Perrière G, Higgins DG
- Issue date: 2003 Nov 21