Identifying pneumonia sub-types from electronic health records using rule-based algorithms


Advocate Aurora Research Institute


Background: International Classification of Diseases (ICD) coding classifies pneumonia by causal organism or with general pneumonia codes. This creates challenges for epidemiological evaluations, in which pneumonia is typically subtyped by setting, exposure and time of emergence. Subtype classification requires data available in electronic health records (EHR), frequently in unstructured formats such as radiological interpretations or clinical notes, which complicate electronic classification.

Objective: To develop a rule-based pneumonia subtyping algorithm that stratifies pneumonia by the setting in which it emerged, using information documented in the EHR.

Methods: Pneumonia subtype classification was developed by interrogating patient information within the EHR of a large private health system. ICD codes were mined from the EHR, applying a requirement for either a 'rule of two' pneumonia-related codes, or one ICD code together with radiologically confirmed pneumonia validated by natural language processing (NLP) and/or documented antibiotic prescriptions. A rule-based algorithm flow chart was created to support sub-classification based on features including the symptomatic patient's point of entry into the healthcare system, the timing of pneumonia emergence, and the identification of clinical, laboratory or medication orders that informed the definition of the sub-classification algorithm.
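The episode inclusion requirement described above can be sketched in code. This is a minimal illustration of one possible reading of the criteria (two or more pneumonia-related codes, or exactly one code supported by NLP-positive radiology and/or an antibiotic prescription), not the study's implementation. The class `EpisodeEvidence`, its fields, and the function `meets_inclusion` are hypothetical names introduced here, and the separate exclusion of episodes that NLP classified as 'negative' or 'unknown' is not modeled.

```python
from dataclasses import dataclass


@dataclass
class EpisodeEvidence:
    """Hypothetical summary of the evidence attached to one pneumonia episode."""
    pneumonia_icd_codes: int      # number of pneumonia-related ICD codes recorded
    radiology_nlp_status: str     # assumed labels: "positive", "negative", "unknown"
    antibiotic_prescribed: bool   # True if an antibiotic prescription is documented


def meets_inclusion(evidence: EpisodeEvidence) -> bool:
    """Apply one reading of the inclusion rule (an assumption, not the authors' code)."""
    # 'Rule of two': two or more pneumonia-related codes qualify on their own.
    if evidence.pneumonia_icd_codes >= 2:
        return True
    # A single code needs supporting evidence: NLP-positive radiology
    # and/or a documented antibiotic prescription.
    if evidence.pneumonia_icd_codes == 1:
        return (
            evidence.radiology_nlp_status == "positive"
            or evidence.antibiotic_prescribed
        )
    # No pneumonia-related code: not a candidate episode.
    return False
```

Under this reading, an episode with a single code and no supporting radiology or antibiotic evidence would not qualify, while one with two codes would qualify regardless of the other fields.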

Results: Data from 65,904 study-eligible patients with 91,998 episodes of pneumonia diagnoses, documented across 380,509 encounters, were analyzed. Of these, 8,611 episodes were excluded after NLP classified pneumonia status as 'negative' or 'unknown'. Subtyping of the remaining 83,387 episodes identified community-acquired (54.5%), hospital-acquired (20%), aspiration-related (10.7%), healthcare-acquired (5%) and ventilator-associated (0.4%) pneumonia; 9.4% of episodes were not classifiable by the algorithm.
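As a quick consistency check on the reported figures, the short sketch below confirms that the episode counts and subtype percentages agree with one another. The variable names are illustrative, and the percentages are stored as integer tenths of a percent purely to avoid floating-point rounding.

```python
# Episode counts as reported in the Results.
total_episodes = 91_998
excluded_by_nlp = 8_611
subtyped_episodes = total_episodes - excluded_by_nlp

# Reported subtype shares, in tenths of a percent (54.5% -> 545).
share_tenths = {
    "community-acquired": 545,
    "hospital-acquired": 200,
    "aspiration-related": 107,
    "healthcare-acquired": 50,
    "ventilator-associated": 4,
    "not classifiable": 94,
}

# The remaining episodes match the reported 83,387.
assert subtyped_episodes == 83_387
# The six shares account for 100.0% of subtyped episodes.
assert sum(share_tenths.values()) == 1000
```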

Conclusion: The study outcomes indicate that electronic pneumonia subtype classification is achievable by interrogating the large-scale data available in the EHR. The portability of the algorithm for rule-based pneumonia classification in other health systems remains to be explored.
