Establishing a colorectal cancer research database from routinely collected health data: the process and potential from a pilot study
Authors
Tamm, A.
Jones, H. J.
Perry, W.
Campbell, D.
Carten, R.
Davies, J.
Galdikas, A.
English, L.
Garbett, Alexander
Glampson, B.
Harris, S.
Khan, K.
Little, S.
Malcomson, Lee
Matharu, S.
Mayer, E.
Mercuri, L.
Morris, E. J.
Muirhead, R.
Norris, R.
O'Hara, Catherine
Papadimitriou, D.
Peek, N.
Renehan, Andrew G
Roadknight, G.
Starling, N.
Teare, M.
Turner, R.
Várnai, K. A.
Wasan, H.
Woods, K.
Cunningham, C.
Affiliation
NIHR Oxford Biomedical Research Centre, Oxford, UK
Issue Date
2022
Abstract
Objective: Colorectal cancer is a common cause of death and morbidity. A significant amount of data are routinely collected during patient treatment, but they are not generally available for research. The National Institute for Health Research Health Informatics Collaborative in the UK is developing infrastructure to enable routinely collected data to be used for collaborative, cross-centre research. This paper presents an overview of the process for collating colorectal cancer data and explores the potential of using this data source.
Methods: Clinical data were collected from three pilot Trusts, standardised and collated. Not all data were collected in a readily extractable format for research. Natural language processing (NLP) was used to extract relevant information from pseudonymised imaging and histopathology reports. Combining data from many sources allowed reconstruction of longitudinal histories for each patient that could be presented graphically.
Results: Three pilot Trusts submitted data, covering 12 903 patients with a diagnosis of colorectal cancer since 2012, with NLP implemented for 4150 patients. Timelines showing individual patient longitudinal history can be grouped into common treatment patterns, visually presenting clusters and outliers for analysis. Difficulties and gaps in data sources have been identified and addressed.
Discussion: Algorithms for analysing routinely collected data from a wide range of sites and sources have been developed and refined to provide a rich data set that will be used to better understand the natural history, treatment variation and optimal management of colorectal cancer.
Conclusion: The data set has great potential to facilitate research into colorectal cancer.
Journal
BMJ Health Care Inform
DOI
10.1136/bmjhci-2021-100535
PubMed ID
35738723
Additional Links
https://dx.doi.org/10.1136/bmjhci-2021-100535
Type
Article
Language
en
Related articles
- A method for cohort selection of cardiovascular disease records from an electronic health record system.
- Authors: Abrahão MTF, Nobre MRC, Gutierrez MA
- Issue date: 2017 Jun
- Using natural language processing to extract structured epilepsy data from unstructured clinic letters: development and validation of the ExECT (extraction of epilepsy clinical text) system.
- Authors: Fonferko-Shadrach B, Lacey AS, Roberts A, Akbari A, Thompson S, Ford DV, Lyons RA, Rees MI, Pickrell WO
- Issue date: 2019 Apr 1
- Designing an openEHR-Based Pipeline for Extracting and Standardizing Unstructured Clinical Data Using Natural Language Processing.
- Authors: Wulff A, Mast M, Hassler M, Montag S, Marschollek M, Jack T
- Issue date: 2020 Dec
- Challenges of Developing a Natural Language Processing Method With Electronic Health Records to Identify Persons With Chronic Mobility Disability.
- Authors: Agaronnik ND, Lindvall C, El-Jawahri A, He W, Iezzoni LI
- Issue date: 2020 Oct
- Automated medical chart review for breast cancer outcomes research: a novel natural language processing extraction system.
- Authors: Chen Y, Hao L, Zou VZ, Hollander Z, Ng RT, Isaac KV
- Issue date: 2022 May 12