Department of Surgery

Evaluation of Artificial Intelligence Chatbots for Facial Injection Planning: Comparative Performance and Safety Limitations.

Thomas Radulesco
Dario Ebode
Antonino Maniaci
Stéphane Gargula
Alberto M Saibene
Carlos Chiesa-Estomba
Isabelle Gengler MD, Lehigh Valley Health Network
Luigi Vaira
Priya Vishnumurthy
Jérôme R Lechien
Justin Michel

Publication/Presentation Date

7-16-2025

Abstract

BACKGROUND: To evaluate the performance of artificial intelligence (AI)-powered chatbots in generating treatment plans for facial aesthetic injections, focusing on their accuracy, safety, and clinical applicability.

METHODS: A comparative observational study was conducted in an otolaryngology tertiary care department according to STROBE guidelines. Patients seeking facial injections were recruited from July to October 2024. Forty patients (85% female; mean age: 45.8 years) underwent photographic documentation and received AI-generated treatment plans for botulinum toxin and hyaluronic acid injections. Six AI chatbots and three generative vision models were evaluated based on five criteria: product selection, injection strategy, facial analysis, alignment with patient preferences, and safety. Likert scale ratings, each ranging from -2 to +2, were analyzed using Friedman and Durbin-Conover pairwise tests to identify significant differences (p < 0.05). The sum of the five Likert scales provided an overall score ranging from -10 to +10.
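The scoring and omnibus test described in the methods can be illustrated with a minimal sketch. This is not the authors' analysis code: the criterion keys and function names below are hypothetical, the Friedman statistic is the standard tie-corrected form computed by hand, and the Durbin-Conover pairwise step is not shown. A p-value would additionally require the chi-square distribution with k - 1 degrees of freedom, for example via `scipy.stats.friedmanchisquare`.

```python
from typing import Dict, List, Sequence

# Hypothetical keys for the five criteria named in the abstract.
CRITERIA = (
    "product_selection",
    "injection_strategy",
    "facial_analysis",
    "patient_preferences",
    "safety",
)


def total_score(ratings: Dict[str, int]) -> int:
    """Sum five Likert ratings (-2..+2 each) into an overall score (-10..+10)."""
    for name in CRITERIA:
        if not -2 <= ratings[name] <= 2:
            raise ValueError(f"{name} rating out of range: {ratings[name]}")
    return sum(ratings[name] for name in CRITERIA)


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(scores: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Friedman chi-square.

    Rows are patients (blocks); columns are chatbots (treatments).
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        for v in set(row):
            t = list(row).count(v)
            tie_term += t ** 3 - t
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    correction = 1.0 - tie_term / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("every row is fully tied; the statistic is undefined")
    return q / correction
```

Each patient acts as a block here, so ranking is done within a patient's row before the rank sums are compared across chatbots. Likert data produce many ties, which is why the sketch uses average ranks and the tie correction rather than the uncorrected formula.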

RESULTS: ChatGPTo1 and ChatGPT4o achieved higher scores than other chatbots across most evaluation criteria, with mean total scores of 7.87 ± 0.29 and 7.85 ± 0.44, respectively (p = 0.295). Both chatbots were statistically superior (p < 0.05) to Claude, CopilotPro, and Llama in product selection (ChatGPT4o = 1.92 ± 0.05), injection strategy precision (ChatGPTo1 = 1.67 ± 0.08), alignment with patient preferences (ChatGPTo1 = 1.95 ± 0.03) and safety (ChatGPTo1 = 1.30 ± 0.17). Claude provided relevant facial analysis (1.50 ± 0.16) without significant difference compared to ChatGPT models (all p > 0.05). Generative vision models failed to produce relevant visual annotations.

CONCLUSION: Among the AI systems tested, ChatGPT-based chatbots demonstrated relatively superior performance in generating treatment plans for facial injections. However, safety limitations remain and preclude unsupervised clinical use.

LEVEL OF EVIDENCE IV: This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.

ISSN

1432-5241

Published In/Presented At

Radulesco, T., Ebode, D., Maniaci, A., Gargula, S., Saibene, A. M., Chiesa-Estomba, C., Gengler, I., Vaira, L., Vishnumurthy, P., Lechien, J. R., & Michel, J. (2025). Evaluation of Artificial Intelligence Chatbots for Facial Injection Planning: Comparative Performance and Safety Limitations. Aesthetic Plastic Surgery. Advance online publication. https://doi.org/10.1007/s00266-025-05010-8

Disciplines

Medicine and Health Sciences

PubMedID

40670654

Department(s)

Department of Surgery, Division of Otolaryngology

Document Type

Article
