Department of Emergency Medicine

Individual Gestalt Is Unreliable for the Evaluation of Quality in Medical Education Blogs: A METRIQ Study.

Publication/Presentation Date

9-1-2017

Abstract

STUDY OBJECTIVE: Open educational resources such as blogs are increasingly used for medical education. Gestalt is generally the evaluation method used for these resources; however, little information has been published on it. We aim to evaluate the reliability of gestalt in the assessment of emergency medicine blogs.

METHODS: We identified 60 English-language emergency medicine Web sites that posted clinically oriented blogs between January 1, 2016, and February 24, 2016. Ten Web sites were selected with a random-number generator. Medical students, emergency medicine residents, and emergency medicine attending physicians evaluated the 2 most recent clinical blog posts from each site for quality, using a 7-point Likert scale. The mean gestalt scores of each blog post were compared between groups with Pearson's correlations. Single and average measure intraclass correlation coefficients were calculated within groups. A generalizability study evaluated variance within gestalt and a decision study calculated the number of raters required to reliably (>0.8) estimate quality.

RESULTS: One hundred twenty-one medical students, 88 residents, and 100 attending physicians (93.6% of enrolled participants) evaluated all 20 blog posts. Single-measure intraclass correlation coefficients within groups were fair to poor (0.36 to 0.40). Average-measure intraclass correlation coefficients were more reliable (0.811 to 0.840). Mean gestalt ratings by attending physicians correlated strongly with those by medical students (r=0.92) and residents (r=0.99). The generalizability coefficient was 0.91 for the complete data set. The decision study found that 42 gestalt ratings were required to reliably evaluate quality (>0.8).

CONCLUSION: The mean gestalt quality ratings of blog posts between medical students, residents, and attending physicians correlate strongly, but individual ratings are unreliable. With sufficient raters, mean gestalt ratings provide a community standard for assessment.

Volume

70

Issue

3

First Page

394

Last Page

401

ISSN

1097-6760

Published In/Presented At

Thoma, B., Sebok-Syer, S. S., Krishnan, K., Siemens, M., Trueger, N. S., Colmers-Gray, I., Woods, R., Petrusa, E., Chan, T., & METRIQ Study Collaborators (2017). Individual Gestalt Is Unreliable for the Evaluation of Quality in Medical Education Blogs: A METRIQ Study. Annals of emergency medicine, 70(3), 394–401. https://doi.org/10.1016/j.annemergmed.2016.12.025

Disciplines

Medicine and Health Sciences

PubMedID

28262317

Department(s)

Department of Emergency Medicine

Document Type

Article
