USF-LVHN SELECT
Detection of Confounders and Potential Confounders in Computed Tomography Lung Datasets.
Publication/Presentation Date
11-11-2025
Abstract
Machine learning models trained on computed tomography (CT) images are highly sensitive to variations in imaging acquisition parameters. Even subtle inconsistencies, often unnoticeable to human radiologists, can significantly degrade model accuracy. In clinical practice, datasets frequently exhibit heterogeneity due to variations in imaging protocols and scanner characteristics, which makes the associated metadata a valuable but often underutilized resource for identifying sources of bias. To address this, we propose a novel unsupervised method that systematically identifies confounding and potentially confounding factors embedded in metadata. The key strengths of our method include automated detection of influential metadata attributes, minimal reliance on manual input, and the capability to proactively flag variables that could induce model drift after deployment. Empirical evaluation on two distinct CT datasets demonstrates that controlling for the factors identified by our method substantially improves model performance, increasing classification accuracy by 5 to 15% compared with datasets in which these factors remain uncontrolled. These comparative results underscore the potential of our approach to improve the robustness, consistency, and clinical applicability of radiomic machine learning models.
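As a generic illustration only (this is not the algorithm of Fetisov et al., whose details are not given in this record), one simple unsupervised way to screen metadata for potential confounders is to cluster image-derived features without using diagnostic labels and then measure how strongly each metadata attribute aligns with the resulting clusters. The sketch below uses Cramér's V as the association measure. The clustering step is assumed to have been done elsewhere; the function names `cramers_v` and `flag_metadata`, the 0.5 threshold, and the toy data are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two categorical sequences (0 = no association, 1 = perfect)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    joint = Counter(zip(xs, ys))
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts)) - 1
    if k == 0:
        # A constant attribute (or a single cluster) carries no association.
        return 0.0
    return sqrt(chi2 / (n * k))


def flag_metadata(clusters, metadata, threshold=0.5):
    """Return {attribute: V} for attributes whose Cramér's V with the clusters >= threshold.

    clusters:  cluster label per scan, assumed to come from an unsupervised
               clustering of image-derived features computed elsewhere.
    metadata:  dict mapping attribute name -> categorical value per scan.
    threshold: hypothetical cut-off chosen for illustration, not from the paper.
    """
    scores = {name: cramers_v(values, clusters) for name, values in metadata.items()}
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return {name: v for name, v in ranked if v >= threshold}


if __name__ == "__main__":
    # Toy data: six scans split into two feature clusters.
    clusters = [0, 0, 0, 1, 1, 1]
    metadata = {
        "manufacturer": ["A", "A", "A", "B", "B", "B"],  # tracks the clusters exactly
        "sex": ["F", "M", "F", "M", "F", "M"],           # only weakly related
    }
    print(flag_metadata(clusters, metadata))  # → {'manufacturer': 1.0}
```

In this toy case the scanner manufacturer explains the cluster split perfectly (V = 1), so it would be flagged as a potential confounder, while the weakly related attribute (V = 1/3) falls below the illustrative threshold. A real pipeline would also need to handle continuous acquisition parameters (for example, by binning them) and to separate acquisition effects from genuine clinical signal.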
ISSN
2948-2933
Published In/Presented At
Fetisov, N., Ho, W. L. J., Zamzmi, G., Hall, L., Goldgof, D., & Schabath, M. (2025). Detection of Confounders and Potential Confounders in Computed Tomography Lung Datasets. Journal of Imaging Informatics in Medicine. Advance online publication. https://doi.org/10.1007/s10278-025-01610-7
Disciplines
Medical Education | Medicine and Health Sciences
PubMedID
41217728
Department(s)
USF-LVHN SELECT Program, USF-LVHN SELECT Program Students
Document Type
Article