Article Title

Using Natural Language Processing to Identify Health Plan Beneficiaries With Pulmonary Nodules

Publication Date



Keywords

pulmonary nodule, natural language processing


Background/Aims: The development of a portable, automated method for identifying individuals with lung nodules will facilitate the efficient conduct of population-based studies of nodule care and associated outcomes. We evaluated the performance of a previously developed natural language processing (NLP) algorithm for identifying health plan beneficiaries with pulmonary nodules.

Methods: We performed a cross-sectional study of 500 randomly selected adult, in-network health plan beneficiaries with continuous enrollment at Group Health Cooperative who underwent computed tomography (CT) of the chest in 2012, had no history of lung cancer, and had not undergone CT between 2009 and 2011. The NLP algorithm, originally developed at Kaiser Permanente Southern California, assessed electronic radiology reports using keywords and qualifiers relating to pulmonary nodules 5 to 30 mm in size among individuals who had undergone CT and had an International Classification of Diseases (ICD-9-CM) diagnostic code for a lung nodule. We applied this algorithm to our patient population and modified it to identify pulmonary nodules regardless of size. A trained chart abstractor reviewed radiology reports to determine whether the radiologist reported a lung nodule. An experienced, board-certified thoracic surgeon adjudicated radiology reports with unclear documentation of a nodule.
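
The abstract does not list the algorithm's keywords or qualifiers, so the following is only a minimal sketch of how a keyword-and-qualifier rule might flag a radiology report. The patterns, the function names `split_sentences` and `mentions_nodule`, and the `size_range_mm` parameter are all illustrative assumptions, not the Kaiser Permanente Southern California implementation. Passing `size_range_mm=(5, 30)` mimics the original size restriction; leaving it as `None` mimics the modification to accept nodules of any size.

```python
import re

# Hypothetical patterns: the study's actual term lists are not published here.
NODULE = re.compile(r"\bnodules?\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(?:no|without|negative for)\b", re.IGNORECASE)
SIZE_MM = re.compile(r"(\d+(?:\.\d+)?)\s*mm\b", re.IGNORECASE)


def split_sentences(report):
    """Split a report on sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.;])\s+", report) if s.strip()]


def mentions_nodule(report, size_range_mm=None):
    """Return True if any sentence affirms a nodule (optionally within a size range)."""
    for sentence in split_sentences(report):
        keyword = NODULE.search(sentence)
        if keyword is None:
            continue
        # Qualifier check: skip sentences where a negation precedes the keyword.
        if NEGATION.search(sentence[: keyword.start()]):
            continue
        if size_range_mm is None:
            return True
        low, high = size_range_mm
        sizes = [float(s) for s in SIZE_MM.findall(sentence)]
        if any(low <= s <= high for s in sizes):
            return True
    return False
```

A rule this simple would miss many real-world phrasings (for example, negations that follow the keyword), which is one reason validation against chart review matters.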

Results: The true prevalence of pulmonary nodules among individuals undergoing CT in 2012 (median age 65 years, 43% men, 84% white, 51% smokers) was 34%. Median nodule size was 6 mm (range 2–87 mm). NLP identified 218 (44%) individuals with a nodule. The performance of NLP was as follows: sensitivity 91%, specificity 81%, positive predictive value 72% and negative predictive value 95%.
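
For readers less familiar with these measures, the sketch below shows how the four values follow from a 2x2 table of NLP result versus chart review. The abstract does not report the underlying cell counts; the counts in `ILLUSTRATIVE` are an assumption, chosen only because they sum to 500, give 218 NLP-positive individuals, and reproduce the rounded percentages above. They should not be read as the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # NLP positive, chart review positive
    fp: int  # NLP positive, chart review negative
    fn: int  # NLP negative, chart review positive
    tn: int  # NLP negative, chart review negative


def diagnostic_metrics(c):
    """Compute standard diagnostic accuracy measures from a 2x2 table."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "prevalence": (c.tp + c.fn) / total,
    }


# Assumed counts, NOT reported by the authors; consistent with the rounded results.
ILLUSTRATIVE = ConfusionCounts(tp=157, fp=61, fn=15, tn=267)

if __name__ == "__main__":
    for name, value in diagnostic_metrics(ILLUSTRATIVE).items():
        print(f"{name}: {value:.0%}")
```

Other cell counts could round to the same percentages, so this table is one plausible reconstruction rather than the only one.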

Discussion: An automated method that applies NLP to electronic radiology text reports, originally developed at one Cancer Research Network (CRN) site, reasonably identifies health plan members with pulmonary nodules at another CRN site. This finding supports the notion that automated methods are portable across integrated health systems and institutions using electronic medical records. Ongoing work seeks to determine whether modifications to the NLP algorithm can improve performance. Given its high negative predictive value, NLP in its current form could be used to decrease the burden of chart abstraction in population-based studies of nodule care.
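
As a rough illustration of that last point, the sketch below estimates the abstraction workload under one possible triage rule: manually review only the reports that NLP flags. That rule and the function name `review_burden` are assumptions made for illustration; the abstract does not specify how NLP would be deployed. The inputs in the usage note are the reported figures (500 reports, 218 flagged, negative predictive value 95%).

```python
def review_burden(total_reports, nlp_positive, npv):
    """Estimate workload if only NLP-positive reports are manually abstracted."""
    skipped = total_reports - nlp_positive
    return {
        "reports_reviewed": nlp_positive,
        "reports_skipped": skipped,
        "fraction_skipped": skipped / total_reports,
        # Skipped reports expected to contain a nodule, given the NPV.
        "expected_missed_nodule_reports": skipped * (1 - npv),
    }


if __name__ == "__main__":
    print(review_burden(total_reports=500, nlp_positive=218, npv=0.95))
```

With those inputs, 282 of 500 reports (about 56%) would skip manual review, at the cost of roughly 14 skipped reports that in fact describe a nodule.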




March 27th, 2015


April 28th, 2015