Review article

Artificial intelligence for skin lesion classification and diagnosis in dermatology: A narrative review

Detailed description of the selected articles.
Article reference | Authors | Country of origin | Year | Type of article | Objective | Main Findings
[13] | Han SS, et al. | Korea | 2020 | Experimental study | To train an algorithm for malignancy prediction and for suggesting primary treatment options. | The algorithm showed high AUCs for malignancy detection (Edinburgh: 0.928; SNU: 0.937) and for treatment suggestion (AUCs up to 0.918). Multi-class classification accuracy reached up to 56.7% (top-1) and 92.0% (top-5).
[14] | Gouabou AC, et al. | France | 2022 | Experimental study | To develop a new framework for automated melanoma diagnosis using artificial intelligence, specifically CNNs. | Achieved an area under the receiver operating characteristic curve of 0.93 for melanoma, 0.96 for nevus, and 0.97 for benign keratosis.
[15] | Guimarães, et al. | Germany | 2022 | Experimental study | To propose a fully automatic CNN-based approach for the analysis of multiphoton tomography data in dermatology, specifically for atopic dermatitis (AD). | The proposed algorithm correctly diagnosed AD in 97.0 ± 0.2% of all images presenting living cells, with a sensitivity of 0.966 ± 0.003, specificity of 0.977 ± 0.003, and F-score of 0.964 ± 0.002.
[16] | Fink A, et al. | Germany | 2020 | Diagnostic test accuracy study | To assess the diagnostic performance of a CNN, in comparison with dermatologists, for the differentiation of melanomas from combined naevi. | CNN sensitivity: 97.1%; CNN specificity: 78.8%. Dermatologists' sensitivity: 90.6%; dermatologists' specificity: 71.0%. A benefit was observed for 'beginners' whose diagnoses were verified by the CNN (DOR = 98).
[17] | Brinker, et al. | Germany | 2019 | Experimental study (CNN training) | To demonstrate the training of an image-classifier CNN that outperforms the winner of the ISBI 2016 CNN challenge using open-source images exclusively. | Achieved an average precision of 0.709 (vs. 0.637 for the ISBI winner) and an area under the receiver operating characteristic curve of 0.85.
[18] | Yoo KH, et al. | Korea | 2022 | Comparative study | To investigate errors in estimating body surface area (BSA) in psoriasis by comparing physicians' assessments with those of computer-assisted image analysis (CAIA). | The mean proportion of correct assessments by physicians was 49.4%. Physicians tended to overestimate the BSA of psoriatic lesions by 8.76% ± 8.82% compared with the CAIA.
[19] | Hekler A, et al. | Germany | 2019 | Comparative study | To investigate the potential benefit of combining human and artificial intelligence for skin cancer classification. | For the multiclass task, the combination of human and machine achieved an accuracy of 82.95%, 1.36% higher than the best individual classifier (the CNN).
[20] | Winkler, et al. | Germany | 2021 | Cross-sectional analysis | To investigate whether scale bars in dermoscopic images are associated with the diagnostic accuracy of a market-approved CNN. | In images without scale bars, the CNN had a sensitivity of 87.0%, specificity of 87.9%, and ROC-AUC of 0.953. Superimposed scale bars may impair the CNN's diagnostic accuracy by increasing false-positive diagnoses.
[21] | Sies K, et al. | Germany | 2020 | Cross-sectional study | Head-to-head comparison of a market-approved CNN (Moleanalyzer-Pro™, developed in 2018) with a conventional image analyzer (CIA; Moleanalyzer-3™/Dynamole™, developed in 2004). | CNN sensitivity: 77.6%, specificity: 95.3%, ROC-AUC: 0.945. CIA sensitivity: 53.4%, specificity: 86.6%, ROC-AUC: 0.738. Pairwise comparisons favored the CNN, indicating clear outperformance (all p < 0.001).
[22],[23] | Chanki Yu, et al. | Korea | 2018 | Experimental study | To evaluate the usefulness of a CNN in the early diagnosis of acral melanoma and benign nevi from dermoscopy images of the hands and feet, in comparison with dermatologists. | The CNN achieved accuracies of 83.51% and 80.23%, higher than the non-experts' evaluations (67.84%, 62.71%) and close to those of the experts (81.08%, 81.64%).
[24] | Ba W, et al. | China | 2022 | Experimental study | To evaluate the impact of CNN assistance on dermatologists' interpretation of clinical images of cutaneous tumors. | The CNN achieved an overall accuracy of 78.45% and a kappa of 0.73 in classifying cutaneous tumors. Dermatologists assisted by the CNN showed significantly higher accuracy and kappa than unassisted dermatologists in interpreting clinical images.
[25] | Hekler A, et al. | Germany | 2019 | Experimental study | To directly compare the performance of a CNN with that of 11 histopathologists in classifying histopathological melanoma images. | The CNN achieved a significantly higher mean sensitivity, specificity, and accuracy (76%, 60%, 68%) than the 11 histopathologists (51.8%, 66.5%, 59.2%) in classifying cropped histopathological melanoma images.
[26] | Zunair H, et al. | Canada | 2020 | Experimental study | To address the class-imbalance problem in skin lesion datasets and improve melanoma detection by proposing a two-stage framework. | The proposed approach outperformed standard baseline methods, achieving significant performance improvements in skin lesion classification.
[27] | Birkner, et al. | Germany | 2022 | Experimental study | To develop a deep CNN for analyzing wound photographs to aid the diagnosis of pyoderma gangrenosum (PG) and differentiate it from conventional leg ulcers (LU). | The CNN was trained on 422 expert-selected pictures of PG and LU. In a head-to-head comparison, 33 PG pictures and 36 LU pictures were presented for diagnosis to 18 dermatologists and to the CNN.
[28] | Yanagisawa, et al. | Japan | 2019 | Methodological study | To develop a CNN segmentation model that automatically extracts lesions and generates segmented images of the skin area from non-standardized conventional photographs. | The CNN segmentation model achieved approximately 90% sensitivity and specificity in differentiating atopic dermatitis from malignant diseases and complications.
[29] | Brinker, et al. | Germany | 2019 | Observational study | To compare the performance of a CNN trained exclusively with dermoscopic images against dermatologists in the clinical-image classification of melanoma. | Dermatologists achieved a mean sensitivity of 89.4% and a mean specificity of 64.4% with clinical images.
[30] | Felmingham C, et al. | Australia | 2022 | Prospective clinical study | To validate the performance of a CNN developed by MoleMap Ltd and Monash eResearch in diagnosing skin cancers in a real-world clinical setting. | The impact of the AI algorithm on diagnostic and management decisions will be evaluated by (1) comparing the registrar's initial management decision with their AI-assisted decision, and (2) comparing the benign-to-malignant ratio of biopsied lesions between the pre-intervention and post-intervention periods.
[31] | Tognetti, et al. | Italy | 2021 | Experimental study | To develop a deep CNN model to support dermatologists in the classification and management of atypical melanocytic skin lesions (aMSL) by integrating dermoscopic images with clinical data. | The deep CNN_aMSL model achieved the best accuracy, with an AUC of 90.3%, sensitivity of 86.5%, and specificity of 73.6%. It was particularly effective in supporting management decisions, reducing inappropriate excisions.
[32] | Burlina, et al. | USA | 2019 | Experimental study | To develop deep learning approaches using deep convolutional neural networks for detecting acute Lyme disease from erythema migrans images. | The model achieved an accuracy of 86.53%, ROC-AUC of 0.9510, and kappa of 0.7143 for detecting erythema migrans.
[33] | Filipescu, et al. | Romania | 2021 | Experimental study | To classify dermoscopic images from distinct categories of lesions, distinguishing between malignant and benign cases. | The model achieved an overall accuracy of 78.11%; of 5031 samples in the test subset, 3958 were correctly classified.
[6] | Young, et al. | USA | 2020 | Literature review | To review dermatological applications of deep learning in artificial intelligence. | AI accuracy matches or exceeds that of dermatologists in skin lesion diagnosis, but real-world clinical validation is lacking. Addresses applications in teledermatology, clinical assessment, and dermatopathology.
[34] | Pai V, et al. | India | 2021 | Narrative review | To assess the application of artificial intelligence in dermatology, emphasizing its impact on medical diagnosis, medical statistics, robotics, and human biology. | Highlights the application of deep learning methods, especially artificial neural networks, in the analysis of medical images such as pigmented skin lesions.
[35] | Lee, et al. | Korea | 2020 | Cross-sectional analysis | To develop a deep learning framework to determine the Severity of Alopecia Tool (SALT) score for the measurement of hair loss in patients with alopecia areata. | The computer-assisted approach improved accuracy and interrater reliability in the challenge study and increased the explanatory power of the SALT score for predicting hair regrowth.
[36] | Koo, et al. | Korea | 2021 | Experimental study | To develop a deep learning-based autodetection model for quicker, more convenient, and more consistent detection of hyphae in superficial fungal infections using microscopy images. | The model achieved high sensitivity and specificity, with values ranging from 93.2% to 100%, indicating reliable accuracy in detecting hyphae in microscopic images.
[37] | Han SS, et al. | Korea | 2018 | Observational study | To test a deep learning algorithm for the classification of clinical images of various skin diseases, comparing its performance with that of dermatologists. | The algorithm demonstrated high accuracy (area under the curve) for the diagnosis of basal cell carcinoma, squamous cell carcinoma, intraepithelial carcinoma, and melanoma across different datasets.
[38] | Han SS, et al. | Korea | 2018 | Observational study | To evaluate the diagnostic accuracy of deep learning models in the diagnosis of onychomycosis using standardized nail images and compare it with dermatologists' assessments. | The AI (ensemble model) demonstrated high sensitivity, specificity, and area under the curve values across different datasets.
[3] | Fujisawa Y, et al. | Japan | 2018 | Observational study | To determine the efficiency of a skin tumor classification system using deep learning technology with a small dataset of clinical images. | The deep CNN achieved an overall classification accuracy of 76.5%, with high sensitivity (96.3%) and specificity (89.5%).
[39] | Thomsen, et al. | Denmark | 2019 | Systematic review | To evaluate the utilization of machine learning in dermatology by conducting a systematic review of the existing literature. | Identified eight major categories in which machine learning tools were tested in dermatology; most systems involved image recognition tools aimed primarily at binary classification of malignant melanoma (MM).
[40] | Koller S, et al. | Austria | 2011 | Observational study | To investigate the applicability of an automated image analysis system using a machine learning algorithm for the diagnostic discrimination of benign and malignant melanocytic skin tumors in RCM. | The classification tree analysis correctly classified 93.60% of melanoma images and 90.40% of nevus images in the learning set. When applied to the independent test set, the system classified 46.71% of images of benign lesions and 55.68% of images of malignant lesions as 'malignant'.
[41] | Gareau, et al. | USA | 2017 | Observational study | To develop an automated approach for generating imaging biomarkers, applying machine learning algorithms to create an overall risk score. | The approach achieved 98% sensitivity and 36% specificity for melanoma detection, approaching the sensitivity/specificity of expert lesion evaluation.
[42] | Bruland, et al. | Germany | 2015 | Experimental study | To develop a software tool, PIACS (Prurigo Image Analyzing and Comparing System), for the automatic detection, categorization, and comparison of scratch-related skin lesions. | The software detected and categorized lesions with an overall sensitivity of 95.7% and an accuracy of 75.3%.
[43],[44] | Gareau, et al. | USA | 2010 | Analytic diagnostic accuracy study | To assess the potential of in-vivo reflectance confocal microscopy (RCM) for the early detection of superficial spreading melanoma by quantifying specific markers. | The pattern recognition algorithm identified PMs in all five superficial spreading melanomas and in none of the nevi.
[44] | Lim K, et al. | UK | 2021 | Cross-sectional questionnaire study | To obtain patient opinions on the use of AI in a dermatology setting, specifically in aiding the diagnosis of skin cancers. | 47% of respondents were not concerned if AI technology was used by a skin specialist to aid skin cancer diagnosis. 81% considered it important for a dermatologist to examine and confirm a diagnosis and to be present for the discussion of a cancer diagnosis.
[45] | Samaran R, et al. | France | 2021 | Cross-sectional questionnaire study | To assess the difficulties faced by general practitioners (GPs) in diagnosing non-pigmented skin tumors (NPSTs) and to determine their interest in using artificial intelligence software as a diagnostic tool for NPSTs. | 147 respondents (98%) had faced difficulties diagnosing NPSTs; 86% agreed that an AI diagnostic tool could be useful in a GP's office; 83% agreed that AI could change their practice; 68% would not be willing to pay for this kind of software.
[46] | Han SS, et al. | Korea | 2020 | Experimental study (algorithm development and evaluation) | To evaluate whether an algorithm could automatically locate suspected areas and predict the probability of a skin lesion being malignant. | The algorithm achieved an area under the receiver operating characteristic curve of 0.910 in the validation dataset. At a high-sensitivity cutoff threshold, the model demonstrated a sensitivity of 76.8% and specificity of 90.6%.
[47] | Hurault G, et al. | UK | 2020 | Experimental study | To develop a mechanistic machine learning model that predicts the patient-specific evolution of atopic dermatitis (AD) severity scores on a daily basis. | The model successfully predicted the daily evolution of AD severity scores at the individual level, improving on chance-level forecasts by 60%.
[48] | Jain A, et al. | USA | 2021 | Multiple-reader, multiple-case diagnostic study | To evaluate an AI-based tool that assists with the diagnosis of dermatologic conditions, used by 20 primary care physicians and 20 nurse practitioners. | AI assistance was significantly associated with higher agreement with reference diagnoses. For primary care physicians, the increase in diagnostic agreement was 10% (from 48% to 58%); for nurse practitioners, the increase was 12% (from 46% to 58%).
[49] | Moore R, et al. | USA | 2021 | Observational study | To assess whether automated digital analysis of tumor-infiltrating lymphocytes (TILs) improves the accuracy of predicting disease-specific survival (DSS) in early-stage melanomas. | ADTA, when applied to a validation cohort, stratified patients into high- and low-risk groups, with significant differences in Kaplan-Meier analysis (p ≤ 0.001). Multivariable Cox proportional hazards analysis showed that ADTA contributed to DSS prediction.
[50] | Sohn K, et al. | USA | 2021 | Retrospective study | To develop an algorithm to aid surgeons in detecting basal cell carcinoma (BCC) on frozen sections during Mohs micrographic surgery. | The final model achieved a ROC-AUC of 0.753, with a sensitivity of 70.6% at a specificity of 79.1%.
[51] | Chang W, et al. | Taiwan | 2013 | Retrospective study | To evaluate the feasibility of using computer-aided diagnosis (CADx) for diagnosing both melanocytic and non-melanocytic skin lesions. | CADx system performance: area under the ROC curve Az = 0.949, sensitivity = 85.63%, specificity = 87.65%, maximum accuracy = 90.64%.
[52] | Blum A, et al. | Germany | 2020 | Review and perspective | To examine how AI can allow medical work to focus on skin cancer patients and expedite treatment. | Advantages of AI in dermato-oncology include an optimized medical focus on skin cancer patients, faster treatment, and learning opportunities for users. Problematic aspects of AI use include medicolegal issues and remuneration challenges.
[53] | Goyal M, et al. | USA | 2020 | Review | To review and discuss advancements in digital image-based AI solutions for the diagnosis of skin cancer. | Although it is claimed that AI systems may achieve higher accuracy than dermatologists, their clinical application for assisting in skin cancer diagnosis is still in its early stages.
[54] | Han SS, et al. | Korea | 2022 | Single-center, parallel, unmasked, randomized controlled trial | To validate whether AI could augment the accuracy of non-expert physicians in real-world settings. | The accuracy of the AI-assisted group was significantly higher than that of the unaided group, particularly benefiting non-dermatology trainees.
[55] | Rajpara S, et al. | UK | 2009 | Systematic review and meta-analysis | To compare the diagnostic accuracy of different dermoscopic algorithms with each other and with digital dermoscopy/artificial intelligence for the detection of melanoma. | Pooled sensitivity for artificial intelligence was slightly higher than for dermoscopy (91% vs. 88%; p = 0.076). Pooled specificity for dermoscopy was significantly better than for artificial intelligence (86% vs. 79%; p < 0.001).

CNN, Convolutional Neural Network. AUC, Area Under the Curve. AD, Atopic Dermatitis. BSA, Body Surface Area. ROC, Receiver Operating Characteristic. PG, Pyoderma Gangrenosum. LU, Leg Ulcers. AI, Artificial Intelligence. SALT, Severity of Alopecia Tool. MM, Malignant Melanoma. PIACS, Prurigo Image Analyzing and Comparing System. RCM, Reflectance Confocal Microscopy. NPST, Non-Pigmented Skin Tumor. DSS, Disease-Specific Survival. USA, United States of America. UK, United Kingdom. DOR, Diagnostic Odds Ratio. ISBI, International Symposium on Biomedical Imaging. CIA, Conventional Image Analyzer. aMSL, atypical Melanocytic Skin Lesions. GP, General Practitioner. CAIA, Computer-Assisted Image Analysis. PMs, Pigmented Melanoma. TILs, Tumor-Infiltrating Lymphocytes. ADTA, Automated Digital TIL Analysis. BCC, Basal Cell Carcinoma. CADx, Computer-Aided Diagnosis.

Prepared by the authors based on the results of the study.
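The studies summarized above report a recurring set of diagnostic performance measures: sensitivity, specificity, accuracy, ROC-AUC, and the diagnostic odds ratio (DOR). As a purely illustrative aid, and not drawn from any of the reviewed algorithms, the following Python sketch shows how these quantities are typically derived from a classifier's scores on a labeled test set; the labels, scores, and 0.5 threshold are hypothetical.

```python
# Minimal, illustrative sketch of the diagnostic metrics reported in the table above.
# Labels: 1 = malignant, 0 = benign; `scores` stand in for hypothetical model outputs.
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return TP, FP, TN, FN for binary labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, fp, tn, fn

def threshold_metrics(y_true, scores, threshold=0.5):
    """Sensitivity, specificity, accuracy, and diagnostic odds ratio (DOR)
    at a single fixed operating point."""
    y_pred = (np.asarray(scores) >= threshold).astype(int)
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)            # true-positive rate
    specificity = tn / (tn + fp)            # true-negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    dor = (tp * tn) / (fp * fn) if fp and fn else float("inf")
    return sensitivity, specificity, accuracy, dor

def roc_auc(y_true, scores):
    """ROC-AUC via its rank (Mann-Whitney U) formulation: the probability that
    a randomly chosen positive case scores higher than a randomly chosen negative one."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]                   # hypothetical ground truth
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]   # hypothetical CNN outputs
    print(threshold_metrics(y_true, scores))            # (0.75, 0.75, 0.75, 9.0)
    print(roc_auc(y_true, scores))                      # 0.9375
```

At a fixed operating point, the DOR is simply (TP × TN)/(FP × FN), the quantity behind figures such as the DOR = 98 reported for CNN-verified 'beginners' in [16], whereas the ROC-AUC summarizes discrimination across all possible thresholds.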