Recommended Citation
Biskis A, McKillip R, Chacko R. Improving Diagnosis of Urinary Tract Infections Using Machine Learning. Presented at Scientific Day; May 21, 2025; Park Ridge, IL.
Abstract
Background/Significance:
High rates of misdiagnosis and long turnaround time for urine cultures (UCX) demand novel methods for diagnosing urinary tract infections (UTIs) in the Emergency Department (ED), especially amid escalating rates of antibiotic resistance. Machine learning (ML) models trained on urinalysis (UA) and other chart data have shown promise by outperforming emergency physicians in predicting a positive UCX. It remains unclear if traditional ML algorithms used for classification tasks, such as Extreme Gradient Boosting (XGBoost), will continue to outperform novel untested algorithms as available sample sizes increase.
Purpose:
Investigate whether novel ML algorithms including deep neural networks outperform traditional models if trained on larger datasets for the diagnosis of UTIs.
Methods:
This was a retrospective analysis of patient encounters with a UA from 26 EDs within the Advocate Health Care and Aurora Health Care system between January 2015 and December 2024. Extracted data included UA and UCX results. Preprocessing consisted of dividing the data into predictor variables (i.e., features) and outcome data (i.e., the target label), smoothing and discretizing continuous data using k-means clustering, and cleaning free-text data with regular expression searches. Model training and validation consisted of randomly partitioning the data into an 80% (training)/20% (testing) split, with tenfold cross-validation on the training set for hyperparameter tuning. Baseline characteristics were analyzed using descriptive statistics. The performance of XGBoost and TabNet, an attention-based deep neural network for tabular data, was compared. Area under the receiver operating characteristic curve (AUROC), F1 scores, and computational time were measured for each model.
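The pipeline described above can be sketched as follows. This is a minimal, hypothetical illustration on synthetic data, not the study code: `KBinsDiscretizer` with `strategy="kmeans"` stands in for the k-means discretization step, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost; all sample sizes and parameters here are assumptions for demonstration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic stand-in for UA-derived features (X) and UCX positivity (y).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Discretize continuous predictors into k-means-derived bins.
X_binned = KBinsDiscretizer(
    n_bins=5, encode="ordinal", strategy="kmeans"
).fit_transform(X)

# 80% (training) / 20% (testing) random partition, as in the study design.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_binned, y, test_size=0.2, random_state=0
)

# Tenfold cross-validation on the training set; in practice this would
# drive hyperparameter tuning rather than score one fixed model.
model = GradientBoostingClassifier(random_state=0)
cv_auc = cross_val_score(model, X_tr, y_tr, cv=10, scoring="roc_auc")

# Final fit and held-out evaluation with the reported metrics.
model.fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]
test_f1 = f1_score(y_te, model.predict(X_te))
test_auc = roc_auc_score(y_te, proba)
print(f"CV AUROC {cv_auc.mean():.2f} | test F1 {test_f1:.2f} | test AUROC {test_auc:.2f}")
```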
Results:
We collected 1.9 million UAs, with approximately 650,000 linked UCXs. XGBoost outperformed TabNet (XGBoost F1=0.93, AUROC=0.97; TabNet F1=0.90, AUROC=0.96) in approximately one-thousandth of the computational time. XGBoost trained on the large-scale dataset also outperformed XGBoost trained on a small-scale dataset (large-scale F1=0.93, AUROC=0.97; small-scale F1=0.90, AUROC=0.96).
Conclusion:
The results highlight the importance of dataset size for classification performance. Despite underperforming XGBoost, TabNet showed nearly linear improvement in F1 and AUROC with additional computational time. Future efforts will include training TabNet for longer on a high-powered NVIDIA H200 GPU, which offers larger memory capacity and an architecture optimized for AI workloads.
Presentation Notes
Presented at Scientific Day; May 21, 2025; Park Ridge, IL.
Document Type
Oral/Podium Presentation
Improving Diagnosis of Urinary Tract Infections Using Machine Learning
Affiliations
Advocate Aurora Research Institute, Advocate Christ Medical Center