Dr. Norbert KISS
Doctor of Medicine (MD)
Co-authors: Mehdi Boostani, Giovanni Pellacani, Mohamad Goldust, Carmen Cantisani, Gyorgy Paragh, Paweł Pietkiewicz, András Bánvölgyi, Norbert Wikonkál, Péter Holló
Large language models for skin condition diagnosis: promise and pitfalls
Objectives: To provide attendees with an overview of the diagnostic potential and current limitations of multimodal large language models (LLMs) in dermatology, highlighting their performance across common and severe skin diseases. Learning outcomes include understanding LLM diagnostic accuracy, pitfalls in clinical use, and implications for patient safety and future integration.
Introduction: Patients increasingly use LLMs with vision capabilities for self-assessment of skin lesions. Our group conducted multiple prospective studies to evaluate their role in real-world dermatology, covering inflammatory and oncologic conditions. These investigations provide direct evidence of both promise and limitations.
Materials / method: We performed prospective image-based studies from 2022 to 2025 at Semmelweis University and collaborating centers. Standardized clinical and dermoscopic images from patients with acne, rosacea, hidradenitis suppurativa, actinic keratosis, squamous cell carcinoma, and melanoma were analyzed using GPT-4o, Gemini 2.0 Flash, and Claude Sonnet. Model outputs were compared with the diagnoses of board-certified dermatologists and, where applicable, with histopathology.
Results: LLMs achieved high diagnostic accuracy in acne and rosacea (up to 93%) and in melanoma dermoscopy (sensitivity >90%), while performance was variable in hidradenitis suppurativa (HS) staging and limited in distinguishing actinic keratosis (AK) from squamous cell carcinoma (SCC). Across conditions, negative predictive values remained low and outputs were inconsistent between models, raising safety concerns. Combined model use improved sensitivity but not overall reliability.
Conclusion: Our original investigations demonstrate that LLMs can reach dermatologist-level accuracy in selected tasks, yet their current limitations prevent safe stand-alone use. Integration into dermatology requires rigorous validation, careful oversight, and structured regulation to harness their potential while avoiding patient harm.
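Illustrative note on the reported metrics: the Results refer to sensitivity, negative predictive value (NPV), and combined model use. The following is a minimal sketch of how these quantities are computed from paired reference and model labels, and of why combining models can raise sensitivity while specificity falls. It assumes one possible combination rule (a lesion is flagged if either model flags it); the abstract does not state which rule was used. All labels and the names `truth`, `model_a`, `model_b`, and `combined` are hypothetical and are not study data.

```python
def confusion_counts(truth, pred):
    """Return (tp, fp, fn, tn) for paired boolean labels."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)
    return tp, fp, fn, tn


def metrics(truth, pred):
    """Sensitivity, specificity, and NPV; assumes no denominator is zero."""
    tp, fp, fn, tn = confusion_counts(truth, pred)
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that were flagged
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
        "npv": tn / (tn + fn),          # share of negative calls that were right
    }


# Hypothetical reference labels (True = malignant) for ten lesions.
truth = [True, True, True, True, False, False, False, False, False, False]

# Hypothetical outputs of two models on the same ten lesions.
model_a = [True, True, False, False, False, False, False, True, False, False]
model_b = [False, True, True, False, False, False, True, False, False, False]

# Assumed combination rule: flag a lesion if either model flags it.
combined = [a or b for a, b in zip(model_a, model_b)]

for name, pred in [("A", model_a), ("B", model_b), ("A or B", combined)]:
    print(name, metrics(truth, pred))
```

In this toy data the "either model" rule catches more true cases than each model alone, but it also accumulates the false positives of both models, so specificity drops. This is one way a gain in sensitivity can coexist with no gain in overall reliability.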