Distributed learning on 20 000+ lung cancer patients - The Personal Health Train
Authors
Deist, TM
Dankers, FJWM
Ojha, P
Scott, MM
Janssen, T
Faivre-Finn, Corinne
Masciocchi, C
Valentini, V
Wang, J
Chen, J
Zhang, Z
Spezi, E
Button, M
Jan, NJ
Vernhout, R
van, SJ
Jochems, A
Monshouwer, R
Bussink, J
Price, G
Lambin, P
Dekker, A
Affiliation
Department of Radiation Oncology (MAASTRO), G
Department of Radiation Oncology, The Netherlands Cancer Institute - Antoni van Leeuwenhoek, Amsterdam, The Netherlands
Issue Date
2020
Abstract
BACKGROUND AND PURPOSE: Access to healthcare data is indispensable for scientific progress and innovation, yet sharing it is time-consuming and notoriously difficult because of privacy and regulatory concerns. The Personal Health Train (PHT) provides a privacy-by-design infrastructure that connects FAIR (Findable, Accessible, Interoperable, Reusable) data sources and allows distributed data analysis and machine learning. Patient data never leaves a healthcare institute.
MATERIALS AND METHODS: Lung cancer patient-specific databases (tumor staging and post-treatment survival information) of oncology departments were translated according to a FAIR data model and stored locally in a graph database. Software was installed locally to enable deployment of distributed machine learning algorithms via a central server. The algorithms (MATLAB, code and documentation publicly available) preserve patient privacy because only summary statistics and regression coefficients are exchanged with the central server. A logistic regression model to predict post-treatment two-year survival was trained and evaluated by receiver operating characteristic (ROC) curves, root mean square prediction error (RMSE) and calibration plots.
RESULTS: In 4 months, we used the PHT to connect databases with 23 203 patient cases across 8 healthcare institutes (in Amsterdam, Cardiff, Maastricht, Manchester, Nijmegen, Rome, Rotterdam and Shanghai) in 5 countries. Summary statistics were computed across databases. A distributed logistic regression model predicting post-treatment two-year survival was trained on 14 810 patients treated between 1978 and 2011 and validated on 8 393 patients treated between 2012 and 2015.
CONCLUSION: The PHT infrastructure demonstrably overcomes patient privacy barriers to healthcare data sharing and enables fast data analyses across multiple institutes from different countries with different regulatory regimes. This infrastructure promotes global evidence-based medicine while prioritizing patient privacy.
Citation
Deist TM, Dankers F, Ojha P, Scott Marshall M, Janssen T, Faivre-Finn C, et al. Distributed learning on 20 000+ lung cancer patients - The Personal Health Train. Radiother Oncol. 2020;144:189-200.
Journal
Radiotherapy and Oncology
DOI
10.1016/j.radonc.2019.11.019
PubMed ID
31911366
Additional Links
https://dx.doi.org/10.1016/j.radonc.2019.11.019
Type
Article
Language
en
Related articles
- Infrastructure platform for privacy-preserving distributed machine learning development of computer-assisted theragnostics in cancer.
- Authors: Field M, Thwaites DI, Carolan M, Delaney GP, Lehmann J, Sykes J, Vinod S, Holloway L
- Issue date: 2022 Oct
- Privacy-preserving federated machine learning on FAIR health data: A real-world application.
- Authors: Sinaci AA, Gencturk M, Alvarez-Romero C, Laleci Erturkmen GB, Martinez-Garcia A, Escalona-Cuaresma MJ, Parra-Calderon CL
- Issue date: 2024 Dec
- Colorectal cancer health and care quality indicators in a federated setting using the Personal Health Train.
- Authors: Choudhury A, Janssen E, Bongers BC, van Meeteren NLU, Dekker A, van Soest J
- Issue date: 2024 May 9
- Systematic Review of Privacy-Preserving Distributed Machine Learning From Federated Databases in Health Care.
- Authors: Zerka F, Barakat S, Walsh S, Bogowicz M, Leijenaar RTH, Jochems A, Miraglio B, Townend D, Lambin P
- Issue date: 2020 Mar
- Predicting 30-Day Readmission Risk for Patients With Chronic Obstructive Pulmonary Disease Through a Federated Machine Learning Architecture on Findable, Accessible, Interoperable, and Reusable (FAIR) Data: Development and Validation Study.
- Authors: Alvarez-Romero C, Martinez-Garcia A, Ternero Vega J, Díaz-Jimènez P, Jimènez-Juan C, Nieto-Martín MD, Román Villarán E, Kovacevic T, Bokan D, Hromis S, Djekic Malbasa J, Beslać S, Zaric B, Gencturk M, Sinaci AA, Ollero Baturone M, Parra Calderón CL
- Issue date: 2022 Jun 2
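Illustrative note
The abstract states that only summary statistics and regression coefficients are exchanged with the central server. The Python sketch below shows one simple way such an exchange could work. It is an assumption made for illustration, not the published MATLAB algorithm: each site returns a summed log-likelihood gradient and a patient count, and the server updates the coefficients by gradient ascent. All names (`local_gradient`, `train_distributed`, `make_site`) and the synthetic data are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_gradient(rows, labels, beta):
    """Runs inside one institute; returns only a summed gradient and a count."""
    grad = [0.0] * len(beta)
    for x, y in zip(rows, labels):
        residual = y - sigmoid(sum(b * v for b, v in zip(beta, x)))
        for j, v in enumerate(x):
            grad[j] += residual * v
    return grad, len(rows)


def train_distributed(sites, n_features, n_iter=200, lr=0.5):
    """Central server: send coefficients out, add up the replies, update."""
    beta = [0.0] * n_features
    for _ in range(n_iter):
        replies = [local_gradient(rows, labels, beta) for rows, labels in sites]
        total_n = sum(n for _, n in replies)
        for j in range(n_features):
            beta[j] += lr * sum(g[j] for g, _ in replies) / total_n
    return beta


def make_site(rng, n, true_beta):
    """Synthetic stand-in for one institute's data (intercept + features)."""
    rows, labels = [], []
    for _ in range(n):
        x = [1.0] + [rng.gauss(0.0, 1.0) for _ in true_beta[1:]]
        p = sigmoid(sum(b * v for b, v in zip(true_beta, x)))
        rows.append(x)
        labels.append(1.0 if rng.random() < p else 0.0)
    return rows, labels


rng = random.Random(0)
true_beta = [-0.5, 1.0, -2.0]  # hypothetical coefficients for the toy data
sites = [make_site(rng, n, true_beta) for n in (300, 500, 400)]
beta_distributed = train_distributed(sites, n_features=3)

# Reference run on pooled data, which a real deployment would never have.
pooled_rows = [x for rows, _ in sites for x in rows]
pooled_labels = [y for _, labels in sites for y in labels]
beta_pooled = train_distributed([(pooled_rows, pooled_labels)], n_features=3)
```

Because the gradient of the logistic log-likelihood is a sum over patients, adding the per-site sums gives the same update as training on pooled data, so `beta_distributed` and `beta_pooled` agree up to floating-point rounding. No patient-level row ever reaches the server in this sketch.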