USF-LVHN SELECT

Detection of Confounders and Potential Confounders in Computed Tomography Lung Datasets.

Publication/Presentation Date

11-11-2025

Abstract

Machine learning models trained on computed tomography (CT) images are highly sensitive to variations in image acquisition parameters. Even subtle inconsistencies, often unnoticeable to human radiologists, can significantly degrade model accuracy. In clinical practice, datasets frequently exhibit heterogeneity due to variations in imaging protocols and scanner characteristics, which makes the associated metadata a valuable but often underutilized resource for identifying sources of bias. To address this, we propose a novel unsupervised method that systematically identifies confounding and potentially confounding factors embedded in metadata. The key strengths of our method include automated detection of influential metadata attributes, minimal reliance on manual input, and the capability to proactively flag variables that could induce model drift post-deployment. Empirical evaluation on two distinct CT datasets demonstrates that controlling for factors identified by our method drastically improves model performance, increasing classification accuracy by 5% to 15% compared with datasets where these factors remain uncontrolled. These comparative results underscore the potential of our approach to substantially improve the robustness, consistency, and clinical applicability of radiomic machine learning models.

ISSN

2948-2933

Disciplines

Medical Education | Medicine and Health Sciences

PubMedID

41217728

Department(s)

USF-LVHN SELECT Program, USF-LVHN SELECT Program Students

Document Type

Article
