Learning to identify Protected Health Information by integrating knowledge- and data-driven algorithms: A case study on psychiatric evaluation notes.
Affiliation
School of Computer Science, University of Manchester, Manchester, UK
Issue Date
2017-06-07
Abstract
De-identification of clinical narratives is one of the main obstacles to making healthcare free text available for research. In this paper we describe our experience in expanding and tailoring two existing tools as part of the 2016 CEGS N-GRID Shared Tasks Track 1, which evaluated de-identification methods on a set of psychiatric evaluation notes for up to 25 different types of Protected Health Information (PHI). The methods we used rely on machine learning on either a large or small feature space, with additional strategies, including two-pass tagging and multi-class models, which both proved to be beneficial. The results show that the integration of the proposed methods can identify Health Insurance Portability and Accountability Act (HIPAA) defined PHIs with overall F1-scores of ∼90% and above. Yet, some classes (Profession, Organization) proved again to be challenging given the variability of expressions used to reference given information.
Citation
Learning to identify Protected Health Information by integrating knowledge- and data-driven algorithms: A case study on psychiatric evaluation notes. 2017 J Biomed Inform
Journal
Journal of Biomedical Informatics
DOI
10.1016/j.jbi.2017.06.005
PubMed ID
28602908
Type
Article
Language
en
ISSN
1532-0480
Related articles
- Automated de-identification of free-text medical records.
- Authors: Neamatullah I, Douglass MM, Lehman LW, Reisner A, Villarroel M, Long WJ, Szolovits P, Moody GB, Mark RG, Clifford GD
- Issue date: 2008 Jul 24
- Combining knowledge- and data-driven methods for de-identification of clinical narratives.
- Authors: Dehghan A, Kovacevic A, Karystianis G, Keane JA, Nenadic G
- Issue date: 2015 Dec
- Sensitive Data Detection with High-Throughput Machine Learning Models in Electrical Health Records.
- Authors: Zhang K, Jiang X
- Issue date: 2023
- Automatic de-identification of textual documents in the electronic health record: a review of recent research.
- Authors: Meystre SM, Friedlin FJ, South BR, Shen S, Samore MH
- Issue date: 2010 Aug 2
- De-identification of free text data containing personal health information: a scoping review of reviews.
- Authors: Negash B, Katz A, Neilson CJ, Moni M, Nesca M, Singer A, Enns JE
- Issue date: 2023