Learning to identify Protected Health Information by integrating knowledge- and data-driven algorithms: A case study on psychiatric evaluation notes.
Affiliation: School of Computer Science, University of Manchester, Manchester, UK
Abstract: De-identification of clinical narratives is one of the main obstacles to making healthcare free text available for research. In this paper we describe our experience in expanding and tailoring two existing tools as part of the 2016 CEGS N-GRID Shared Tasks Track 1, which evaluated de-identification methods on a set of psychiatric evaluation notes for up to 25 different types of Protected Health Information (PHI). The methods we used rely on machine learning over either a large or a small feature space, with additional strategies, including two-pass tagging and multi-class models, both of which proved beneficial. The results show that integrating the proposed methods can identify PHI defined by the Health Insurance Portability and Accountability Act (HIPAA) with overall F1-scores of ∼90% and above. Yet some classes (Profession, Organization) again proved challenging, given the variability of the expressions used to refer to this information.
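For readers unfamiliar with the F1-score reported above, the following is a minimal sketch of how it can be computed over predicted PHI spans. It assumes strict matching (a prediction counts only if its offsets and PHI type both equal a gold annotation); the shared task's actual scoring script and matching criteria are not described in this record, and the `Span` type, the `f1_score` function, and the example offsets are all hypothetical.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start offset, end offset, PHI type).
Span = Tuple[int, int, str]


def f1_score(gold: Set[Span], predicted: Set[Span]) -> float:
    """Harmonic mean of precision and recall under strict span matching."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        # Covers empty inputs and the case of no overlap at all.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative (made-up) annotations for a single note.
gold = {(0, 10, "NAME"), (25, 35, "DATE"), (50, 58, "PROFESSION")}
predicted = {(0, 10, "NAME"), (25, 35, "DATE"), (60, 70, "ORGANIZATION")}

score = f1_score(gold, predicted)
print(f"F1 = {score:.3f}")
```

In this made-up example two of three predictions are correct and two of three gold spans are found, so precision and recall are both 2/3 and the F1-score is also 2/3.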
Citation: Learning to identify Protected Health Information by integrating knowledge- and data-driven algorithms: A case study on psychiatric evaluation notes. 2017 J Biomed Inform
Journal: Journal of Biomedical Informatics
- Automated de-identification of free-text medical records.
- Authors: Neamatullah I, Douglass MM, Lehman LW, Reisner A, Villarroel M, Long WJ, Szolovits P, Moody GB, Mark RG, Clifford GD
- Issue date: 2008 Jul 24
- Automatic de-identification of textual documents in the electronic health record: a review of recent research.
- Authors: Meystre SM, Friedlin FJ, South BR, Shen S, Samore MH
- Issue date: 2010 Aug 2
- Combining knowledge- and data-driven methods for de-identification of clinical narratives.
- Authors: Dehghan A, Kovacevic A, Karystianis G, Keane JA, Nenadic G
- Issue date: 2015 Dec
- De-identification of clinical notes via recurrent neural network and conditional random field.
- Authors: Liu Z, Tang B, Wang X, Chen Q
- Issue date: 2017 Nov
- Assessing the difficulty and time cost of de-identification in clinical narratives.
- Authors: Dorr DA, Phillips WF, Phansalkar S, Sims SA, Hurdle JF
- Issue date: 2006