The group collaborates with clinicians and basic scientists on the analysis of bioinformatics data.

Recent Submissions

  • Methods comparison for high-resolution transcriptional analysis of archival material on Affymetrix Plus 2.0 and Exon 1.0 microarrays.

    Linton, Kim M; Hey, Yvonne; Dibben, Sian; Miller, Crispin J; Freemont, Anthony J; Radford, John A; Pepper, Stuart D; Cancer Research UK Department of Medical Oncology, The Christie NHS Foundation Trust, Manchester, UK. (2009-07)
    Microarray gene expression profiling of formalin-fixed paraffin-embedded (FFPE) tissues is a new and evolving technique. This report compares transcript detection rates on Affymetrix U133 Plus 2.0 and Human Exon 1.0 ST GeneChips across several RNA extraction and target labeling protocols, using routinely collected archival FFPE samples. All RNA extraction protocols tested (Ambion-Optimum, Ambion-RecoverAll, and Qiagen-RNeasy FFPE) provided extracts suitable for microarray hybridization. Compared with Affymetrix One-Cycle labeled extracts, NuGEN system protocols utilizing oligo(dT) and random hexamer primers, and cDNA target preparations instead of cRNA, achieved percent present rates up to 55% on Plus 2.0 arrays. Based on two paired-sample analyses, at 90% specificity this equalled an average 30 percentage-point increase (from 50% to 80%) in FFPE transcript sensitivity relative to fresh frozen tissues, which we have assumed to have 100% sensitivity and specificity. The high content of Exon arrays, with multiple probe sets per exon, improved FFPE sensitivity to 92% at 96% specificity, corresponding to an absolute increase of ~600 genes over Plus 2.0 arrays. While larger series are needed to confirm high correspondence between fresh-frozen and FFPE expression patterns, these data suggest that both Plus 2.0 and Exon arrays are suitable platforms for FFPE microarray expression analyses.
  • rHVDM: an R package to predict the activity and targets of a transcription factor.

    Barenco, M; Papouli, E; Shah, S; Brewer, D; Miller, Crispin J; Hubank, M; Institute of Child Health, University College London, 30 Guilford street, London WC1N 1EH, UK. (2009-02-01)
    SUMMARY: Highly parallel genomic platforms like microarrays often present researchers with long lists of differentially expressed genes but contain little or no information on how these genes are regulated. rHVDM is a novel R package which uses gene expression time course data to predict the activity and targets of a transcription factor. In the first step, rHVDM uses a small number of known targets to derive the activity profile of a given transcription factor. Then, in a subsequent step, this activity profile is used to predict other putative targets of that transcription factor. A dynamic and mechanistic model of gene expression is at the heart of the technique. Measurement error is taken into account during the process, which allows an objective assessment of the robustness of fit and, therefore, the quality of the predictions. The package relies on efficient algorithms and vectorization to accomplish potentially time consuming tasks including optimization and differential equation integration. We demonstrate the efficiency and accuracy of rHVDM by examining the activity of the tumour-suppressing transcription factor, p53. AVAILABILITY: The version of the package presented here (1.8.1) is freely available from the Bioconductor Web site (
  • Quantitative proteomics analysis demonstrates post-transcriptional regulation of embryonic stem cell differentiation to hematopoiesis.

    Williamson, Andrew J K; Smith, Duncan L; Blinco, David; Unwin, Richard D; Pearson, Stella; Wilson, Claire L; Miller, Crispin J; Lancashire, Lee J; Lacaud, Georges; Kouskoff, Valerie; Whetton, Anthony D; Stem Cell and Leukemia Proteomics Laboratory, Faculty of Medical and Human Sciences, University of Manchester, Kinnaird House, Kinnaird Road, Manchester M20 4QL, United Kingdom. (2008-03)
    Embryonic stem (ES) cells can differentiate in vitro to produce the endothelial and hematopoietic precursor, the hemangioblasts, which are derived from the mesoderm germ layer. Differentiation of Bry(GFP/+) ES cells to hemangioblasts can be followed by the expression of the Bry(GFP/+) and Flk1 genes. Proteomic and transcriptomic changes during this differentiation process were analyzed to identify mechanisms for phenotypic change during early differentiation. Three populations of differentiating Bry(GFP) ES cells were obtained by flow cytometric sorting: GFP-Flk1- (epiblast), GFP+Flk1- (mesoderm), and GFP+Flk1+ (hemangioblast). Microarray analyses and relative quantification two-dimensional LC-LC-MS/MS on nuclear extracts were performed. We identified and quantified 2389 proteins, 1057 of which were associated with their microarray probe set. These included a variety of low abundance transcription factors, e.g. UTF1, Sox2, Oct4, and E2F4, demonstrating a high level of proteomic penetrance. When paired comparisons of changes in the mRNA and protein expression levels were performed, low levels of correlation were found. A strong correlation between isobaric tag-derived relative quantification and Western blot analysis was found for a number of nuclear proteins. Pathway and ontology analysis identified proteins known to be involved in the regulation of stem cell differentiation, and proteins with no described function in early ES cell development were also shown to change markedly at the proteome level only. ES cell development is regulated at the mRNA and protein levels.
  • Exon level integration of proteomics and microarray data.

    Bitton, Danny A; Okoniewski, Michal J; Connolly, Yvonne; Miller, Crispin J; Cancer Research UK, Applied Computational Biology and Bioinformatics Group, Paterson Institute for Cancer Research, The University of Manchester, Christie Hospital Site, Wilmslow Road, Manchester, M20 4BX, UK. (2008)
    BACKGROUND: Previous studies comparing quantitative proteomics and microarray data have generally found poor correspondence between the two. We hypothesised that this might in part be because the different assays were targeting different parts of the expressed genome and might therefore be subjected to confounding effects from processes such as alternative splicing. RESULTS: Using a genome database as a platform for integration, we combined quantitative protein mass spectrometry with Affymetrix Exon array data at the level of individual exons. We found significantly higher degrees of correlation than have been previously observed (r = 0.808). The study was performed using cell lines in equilibrium in order to reduce a major potential source of biological variation, thus allowing the analysis to focus on the data integration methods in order to establish their performance. CONCLUSION: We conclude that part of the variation observed when integrating microarray and proteomics data may occur as a consequence both of the data analysis and of the high granularity to which studies have until recently been limited. The approach opens up the possibility for the first time of considering combined microarray and proteomics datasets at the level of individual exons and isoforms, important given the high proportion of alternative splicing observed in the human genome.
  • Eight-channel iTRAQ enables comparison of the activity of six leukemogenic tyrosine kinases.

    Pierce, Andrew; Unwin, Richard D; Evans, Caroline A; Griffiths, Stephen D; Carney, Louise; Zhang, Liqun; Jaworska, Ewa; Lee, Chia-Fang; Blinco, David; Okoniewski, Michal J; Miller, Crispin J; Bitton, Danny A; Spooncer, Elaine; Whetton, Anthony D; Stem Cell and Leukaemia Proteomics Laboratory, University of Manchester, Christie Hospital, Kinnaird House, Kinnaird Road, Manchester M20 4QL, United Kingdom. (2008-05)
    There are a number of leukemogenic protein-tyrosine kinases (PTKs) associated with leukemic transformation. Although each is linked with a specific disease, their functional activity poses the question whether they have a degree of commonality in their effects upon target cells. Exon array analysis of the effects of six leukemogenic PTKs (BCR/ABL, TEL/PDGFRbeta, FIP1/PDGFRalpha, D816V KIT, NPM/ALK, and FLT3ITD) revealed few common effects on the transcriptome. It is apparent, however, that proteome changes are not directly governed by transcriptome changes. Therefore, we assessed and used a new generation of iTRAQ tagging, enabling eight-channel relative quantification discovery proteomics, to analyze the effects of these six leukemogenic PTKs. Again these were found to have disparate effects on the proteome with few common targets. BCR/ABL had the greatest effect on the proteome and had more effects in common with FIP1/PDGFRalpha. The proteomic effects of the four type III receptor kinases were relatively remotely related. The only protein commonly affected was eosinophil-associated ribonuclease 7. Five of six PTKs affected the motility-related proteins CAPG and vimentin, although this did not correspond to changes in motility. However, correlation of the proteomics data with that from the exon microarray not only showed poor levels of correlation between transcript and protein levels but also revealed alternative patterns of regulation of the CAPG protein by different oncogenes, illustrating the utility of such a combined approach.
  • X:Map: annotation and visualization of genome structure for Affymetrix exon array analysis.

    Yates, Tim; Okoniewski, Michal J; Miller, Crispin J; Cancer Research UK, Bioinformatics Group, Paterson Institute for Cancer Research, The University of Manchester, Christie Hospital Site, Wilmslow Road, Withington, Manchester, M20 4BX, UK. (2008-01)
    Affymetrix exon arrays aim to target every known and predicted exon in the human, mouse or rat genomes, and have reporters that extend beyond protein coding regions to other areas of the transcribed genome. This combination of increased coverage and precision is important because a substantial proportion of protein coding genes are predicted to be alternatively spliced, and because many non-coding genes are known also to be of biological significance. In order to fully exploit these arrays, it is necessary to associate each reporter on the array with the features of the genome it is targeting, and to relate these to gene and genome structure. X:Map is a genome annotation database that provides this information. Data can be browsed using a novel Google-maps based interface, and analysed and further visualized through an associated BioConductor package. The database can be found at