Automated Patient Note Grading: Examining Scoring Reliability and Feasibility.
Publication/Presentation Date
11-1-2023
Abstract
PURPOSE: Scoring postencounter patient notes (PNs) yields significant insights into student performance, but the resource intensity of scoring limits its use. Recent advances in natural language processing (NLP) and machine learning allow automated short answer grading (ASAG) to be applied to this task. This retrospective study evaluated the psychometric characteristics and reliability of an ASAG system for PNs, as well as factors affecting implementation, including feasibility and the case-specific phrase annotation required to tune the system for a new case.
METHOD: PNs from standardized patient (SP) cases within a graduation competency exam were used to train the ASAG system, which applied a feed-forward neural network algorithm for scoring. Using faculty phrase-level annotation, 10 PNs per case were required to tune the ASAG system. After tuning, ASAG item-level ratings for 20 notes per case were compared with faculty ratings (4 cases, 80 pairings) and with nonfaculty ratings (2 cases, 40 pairings). Psychometric characteristics were examined using item analysis and Cronbach's alpha. Inter-rater reliability (IRR) was examined using kappa.
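The abstract reports IRR as kappa without naming the variant. The Python sketch below assumes Cohen's kappa computed on paired binary item ratings (1 = item credited, 0 = not credited) for one checklist item, with the mean and the standard error of the mean then taken across items. The function names and the toy ratings are hypothetical illustrations, not the study's code or data.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters (e.g. ASAG vs. faculty) scoring the same notes on one item.

    rater_a, rater_b: equal-length sequences of categorical ratings.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and paired")
    n = len(rater_a)
    # Observed agreement: share of notes where both raters gave the same rating.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal rating frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n ** 2
    if p_e == 1.0:
        # Both raters used one identical category throughout: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def mean_kappa_with_se(items):
    """items: list of (machine_ratings, human_ratings) pairs, one per checklist item.

    Returns the mean kappa across items and its standard error of the mean
    (an assumed definition of the reported SE).
    """
    kappas = [cohens_kappa(machine, human) for machine, human in items]
    se = stdev(kappas) / sqrt(len(kappas)) if len(kappas) > 1 else float("nan")
    return mean(kappas), se


# Toy data (hypothetical): two checklist items, four notes each.
toy_items = [([1, 1, 0, 0], [1, 1, 0, 0]), ([1, 1, 1, 0], [1, 1, 0, 0])]
print(mean_kappa_with_se(toy_items))
```

Kappa is preferred over raw percent agreement here because checklist items that most notes earn would otherwise inflate agreement by chance alone.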
RESULTS: ASAG scores showed sufficient variability to differentiate learner PN performance, and IRR between machine and human ratings was high. Across all items, the mean kappa for ASAG-faculty pairings was .83 (SE ± .02), and the kappa for ASAG-nonfaculty pairings was also .83 (SE ± .02). The ASAG scoring demonstrated high item discrimination. Internal consistency reliability at the case level ranged from a Cronbach's alpha of .65 to .77. The faculty time cost to train and supervise nonfaculty raters for 4 cases was approximately $1,856, whereas the faculty cost to tune the ASAG system was approximately $928.
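The item analysis and case-level internal consistency figures can be illustrated with a minimal Python sketch. It assumes each case yields a notes-by-items matrix of item scores, computes Cronbach's alpha for that case, and uses the corrected item-total correlation as the item discrimination index. The abstract does not state which discrimination index the study used, so that choice, the function names, and the toy matrix are all assumptions.

```python
from math import nan, sqrt
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for one case.

    scores: list of rows, one per note; each row holds that note's item scores.
    Population variance is used for items and totals alike; the n vs. n - 1
    choice cancels in the ratio, so alpha is unaffected.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    totals = [sum(row) for row in scores]
    total_var = pvariance(totals)
    if total_var == 0:
        return nan  # every note received the same total score
    item_var = sum(pvariance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var / total_var)


def pearson(x, y):
    """Pearson correlation; NaN if either variable is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return nan
    return sxy / sqrt(sxx * syy)


def item_discrimination(scores):
    """Corrected item-total correlation: each item vs. the sum of the other items."""
    totals = [sum(row) for row in scores]
    result = []
    for col in zip(*scores):
        rest = [t - x for t, x in zip(totals, col)]  # drop the item from its own total
        result.append(pearson(col, rest))
    return result


# Toy data (hypothetical): four notes scored on three binary items.
case_scores = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
print(cronbach_alpha(case_scores))
print(item_discrimination(case_scores))
```

Because alpha is a per-case statistic, a range such as the one reported comes from running the calculation once for each case's score matrix.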
CONCLUSIONS: NLP-based automated scoring of PNs demonstrated a high degree of reliability and psychometric confidence for use as learner feedback. The small number of phrase-level annotations required to tune the system to a new case enhances feasibility. ASAG-enabled PN scoring has broad implications for improving feedback in case-based learning contexts in medical education.
Volume
98
Issue
11S
First Page
90
Last Page
97
ISSN
1938-808X
Published In/Presented At
Bond, W. F., Zhou, J., Bhat, S., Park, Y. S., Ebert-Allen, R. A., Ruger, R. L., & Yudkowsky, R. (2023). Automated Patient Note Grading: Examining Scoring Reliability and Feasibility. Academic Medicine: Journal of the Association of American Medical Colleges, 98(11S), S90–S97. https://doi.org/10.1097/ACM.0000000000005357
Disciplines
Medicine and Health Sciences
PubMedID
37983401
Department(s)
Department of Emergency Medicine
Document Type
Article