Wrangling real-world data: Optimizing clinical research through factor selection with LASSO regression
Recommended Citation
Howard KA, Anderson W, Podichetty JT, et al. Wrangling Real-World Data: Optimizing Clinical Research Through Factor Selection with LASSO Regression. Int J Environ Res Public Health. 2025;22(4):464. Published 2025 Mar 21. doi:10.3390/ijerph22040464
Abstract
Data-driven approaches to clinical research are necessary for understanding and effectively treating infectious diseases. However, challenges such as issues with data validity, lack of collaboration, and difficult-to-treat infectious diseases (e.g., those that are rare or newly emerging) hinder research. Prioritizing innovative methods to facilitate the continued use of data generated during routine clinical care for research, but in an organized, accelerated, and shared manner, is crucial. This study investigates the potential of CURE ID, an open-source platform to accelerate drug-repurposing research for difficult-to-treat diseases, with COVID-19 as a use case. Data from eight US health systems were analyzed using least absolute shrinkage and selection operator (LASSO) regression to identify key predictors of 28-day all-cause mortality in COVID-19 patients, including demographics, comorbidities, treatments, and laboratory measurements captured during the first two days of hospitalization. Key findings indicate that age, laboratory measures, severity of illness indicators, oxygen support administration, and comorbidities significantly influenced all-cause 28-day mortality, aligning with previous studies. This work underscores the value of collaborative repositories like CURE ID in providing robust datasets for prognostic research and the importance of factor selection in identifying key variables, helping to streamline future research and drug-repurposing efforts.
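As a rough illustration of the factor-selection idea described in the abstract, the sketch below fits an L1-penalized (LASSO) logistic regression by proximal gradient descent on synthetic data. It is not the study's code or data. The feature names, cohort size, penalty strength, and step size are all assumptions chosen for illustration, and the simple solver stands in for whatever software the authors actually used.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, set small values to 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(rows, labels, penalty, step=1.0, iterations=200):
    """Proximal gradient descent for L1-penalized logistic regression.

    The intercept is left unpenalized; only feature coefficients are shrunk.
    """
    n, p = len(rows), len(rows[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(iterations):
        # Predicted probability minus observed outcome for every patient.
        residuals = [
            sigmoid(intercept + sum(c * x for c, x in zip(coefs, row))) - label
            for row, label in zip(rows, labels)
        ]
        intercept -= step * sum(residuals) / n
        for j in range(p):
            gradient_j = sum(r * row[j] for r, row in zip(residuals, rows)) / n
            # Gradient step on the log-loss, then soft-thresholding for the L1 term.
            coefs[j] = soft_threshold(coefs[j] - step * gradient_j, step * penalty)
    return intercept, coefs


# Synthetic cohort (hypothetical): only the first two features drive the outcome.
random.seed(0)
feature_names = ["age", "lab_marker", "noise_1", "noise_2", "noise_3"]
true_coefs = [2.0, -1.5, 0.0, 0.0, 0.0]
rows = [[random.gauss(0.0, 1.0) for _ in feature_names] for _ in range(800)]
labels = [
    1 if random.random() < sigmoid(sum(b * x for b, x in zip(true_coefs, row))) else 0
    for row in rows
]

intercept, coefs = fit_l1_logistic(rows, labels, penalty=0.08)

# Factors with a nonzero coefficient are the ones the penalty retained.
selected = [name for name, c in zip(feature_names, coefs) if c != 0.0]
print("Selected factors:", selected)
```

The L1 penalty drives the coefficients of uninformative features to exactly zero, which is what makes LASSO usable as a factor-selection step rather than only a shrinkage method. In practice the penalty strength would typically be chosen by cross-validation rather than fixed by hand as it is here.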
Document Type
Article
PubMed ID
40283693
Affiliations
Advocate Christ Medical Center