Evaluation of Artificial Intelligence Chatbots for Facial Injection Planning: Comparative Performance and Safety Limitations.
Publication/Presentation Date
7-16-2025
Abstract
BACKGROUND: To evaluate the performance of artificial intelligence (AI)-powered chatbots in generating treatment plans for facial aesthetic injections, focusing on their accuracy, safety, and clinical applicability.
METHODS: A comparative observational study was conducted in an otolaryngology tertiary care department according to STROBE guidelines. Patients seeking facial injections were recruited from July to October 2024. Forty patients (85% female; mean age: 45.8 years) underwent photographic documentation and received AI-generated treatment plans for botulinum toxin and hyaluronic acid injections. Six AI chatbots and three generative vision models were evaluated based on five criteria: product selection, injection strategy, facial analysis, alignment with patient preferences, and safety. Likert scale ratings, each ranging from - 2 to + 2, were analyzed using Friedman and Durbin-Conover pairwise tests to identify significant differences (p < 0.05). The sum of the five Likert scales provided an overall score ranging from - 10 to + 10.
RESULTS: ChatGPTo1 and ChatGPT4o achieved higher scores than other chatbots across most evaluation criteria, with mean total scores of 7.87 ± 0.29 and 7.85 ± 0.44, respectively (p = 0.295). Both chatbots were statistically superior (p < 0.05) to Claude, CopilotPro, and Llama in product selection (ChatGPT4o = 1.92 ± 0.05), injection strategy precision (ChatGPTo1 = 1.67 ± 0.08), alignment with patient preferences (ChatGPTo1 = 1.95 ± 0.03) and safety (ChatGPTo1 = 1.30 ± 0.17). Claude provided relevant facial analysis (1.50 ± 0.16) without significant difference compared to ChatGPT models (all p > 0.05). Generative vision models failed to produce relevant visual annotations.
CONCLUSION: Among the AI systems tested, ChatGPT-based chatbots demonstrated relatively superior performance in generating treatment plans for facial injections. However, safety limitations remain and preclude unsupervised clinical use.
LEVEL OF EVIDENCE IV: This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266 .
ISSN
1432-5241
Published In/Presented At
Radulesco, T., Ebode, D., Maniaci, A., Gargula, S., Saibene, A. M., Chiesa-Estomba, C., Gengler, I., Vaira, L., Vishnumurthy, P., Lechien, J. R., & Michel, J. (2025). Evaluation of Artificial Intelligence Chatbots for Facial Injection Planning: Comparative Performance and Safety Limitations. Aesthetic plastic surgery, 10.1007/s00266-025-05010-8. Advance online publication. https://doi.org/10.1007/s00266-025-05010-8
Disciplines
Medicine and Health Sciences
PubMedID
40670654
Department(s)
Department of Surgery, Division of Otolaryngology
Document Type
Article