Clinical evidence portfolio

Five peer-reviewed publications. Evidence across six countries. 2M+ API calls with zero adverse events. Built for MDR Class IIa, and for the scrutiny that comes with it.

1. Executive summary

Autoderm is an AI-powered dermatology decision support API that analyses smartphone images to screen for 70+ skin conditions, returning results in under 1 second. The software has been researched since 2018 and has processed over 2 million API calls across deployments with Boots UK, DocMorris Germany, myGP UK, and others. This document summarises Autoderm’s complete clinical evidence portfolio across nine categories: peer-reviewed performance studies, white paper performance studies, exploratory diversity evaluations, human-AI reader studies, real-world safety data, algorithm progression, demographic evidence, regulatory compliance, and evidence strength assessment.

Intended use

The Autoderm AI as a medical device is intended to be integrated into third party services for either of the following purposes

  • a decision support tool for healthcare workers to enhance decision making & diagnoses in clinical workflows
  • a skin analytics tool on skin diseases to be used as a search engine, symptom checker (assessment) or educational tool on skin disease and help find the right information for the possible skin disease to make informed decisions

Intended users

  • Healthcare professionals including, but not limited to, doctors, pharmacists, and nurses
  • Laypersons

Overall assessment

Autoderm’s clinical evidence portfolio is sufficient to transition from MDD Class I to MDR Class IIa. The combination of five peer-reviewed publications across five countries; five white papers providing reader studies, standalone performance validation, bias evaluation, and real-world deployment data; 2M+ API calls with zero adverse events; and FDA Breakthrough Device designation places Autoderm among the best-evidenced AI dermatology devices at Class IIa level.

Regulatory Note on White Papers

The Boots UK GP Reader Study (listed as white paper #6 below) is the same study as the Boots UK PMCF study described in Section 9.2. It is listed in both contexts because it serves dual purposes: as standalone clinical evidence and as a formal PMCF obligation under the Quality Management System.

5 Peer-Reviewed Studies
5 White Papers
2M+ API Calls
6 Countries
0 Adverse Events

2. Peer-reviewed publications

2.1 Standalone AI performance studies

Study Type Population Key Findings Model / Notes
Zhu et al. 2023
Chinese Journal of Dermatology
DOI: 10.35541/cjd.20220925
PMID: 37032592
Prospective, independent, peer-reviewed n=920, Chinese population, hospital dermatology (Zhengzhou) Top-1 sensitivity: 41.8% (95% CI: 24.2–59.5%)
Top-3 sensitivity: 65.8% (95% CI: 51.9–79.6%)
Specificity: 96.8% / 91.5%
Overall accuracy: 92.9% / 89.9%
κ = 0.420 / 0.464
In-distribution cases: 871/920 (94.7%)
Best performer: Acne 83.1% top-1; Worst: Eczema 2.0% top-1
Primary validation on Asian population (FST III-IV)
V2.1 (68 conditions)
Data collected: Feb 16 – Jul 4, 2022
Published: Aug 8, 2023
Ethics: 2022-KY-0268
Definitive performance study. Strongest peer-reviewed evidence.
Lu Feng et al. 2022
Henan Medical Research
DOI: 10.3969/j.issn.1004-437X.2022.08.010
Prospective, peer-reviewed (regional Chinese journal) n=761, Chinese population, hospital dermatology (Zhengzhou) Top-1 sensitivity: 53.4% (95% CI: 36.4–70.4)
Specificity: 96.3% (95% CI: 90.8–100.0)
Accuracy: 92.3% (95% CI: 86.3–98.2)
NHSO rejection: 22/30 (73.3%) correctly identified as non-skin
NHS photos: All 20 identified as skin
Skin detection cutoff: ~83 cm average
Same research group as Zhu 2023. Forms two-paper validation series.
V2.0 (44 conditions, ~60K training images)
Data collected: Feb – Aug 2021
Published: Apr 2022
Received: Feb 18, 2022
Preliminary study from same centre as Zhu 2023. Only study testing non-skin object rejection.
Escalé-Besa et al. 2023
Nature Scientific Reports
DOI: 10.1038/s41598-023-31340-1
Prospective, real-world, peer-reviewed n=100, Spanish primary care (Catalonia/CatSalut), consecutive patients. Note: 18/100 cases had conditions outside Autoderm’s trained scope. Analysis 1: All patients (n=100, including 18 out-of-scope conditions):
Top-1: AI 39% / GP 64% / Dermatologist 72%
Top-3: AI 61% / GP 76% / Dermatologist 90%
Top-5: AI 72%
Specificity: AI 98% / GP 99% / Dermatologist 99%

Analysis 2: In-scope conditions only (n=82):
Top-1: AI 48%
Top-3: AI 75% (comparable to GP Top-3 of 76%)
Top-5: AI 89% (comparable to Dermatologist Top-3 of 90%)
Specificity: 98–99%
92% GP satisfaction as decision support tool
60% aided correct GP final diagnosis
34% dermatology referrals avoidable

V2.0 (V2.3.0 is the current version and is expected to perform better)
Cornerstone publication. The n=82 in-scope analysis is the primary performance benchmark for regulatory purposes: AI Top-3 was comparable to GP Top-3, and AI Top-5 was comparable to dermatologist Top-3. The all-100 analysis demonstrates real-world robustness: no dangerous false positives on unfamiliar conditions.
Zaar et al. 2020
Acta Dermato-Venereologica
Sahlgrenska University Hospital
Retrospective, external, peer-reviewed n=521 images, 26 diagnoses, Sweden (FST I-II predominantly) Top-1 accuracy: 22.8%
Top-5 accuracy: 56.4%
Baseline benchmark for algorithm progression.
Tested V0.1 (MVP, trained on ~63K images)
V0.1 (MVP year 2018)
~63K training images
Same model as Kamulegeya 2023. Enables direct cross-population comparison.
Kamulegeya et al. 2023
African Health Sciences
PMC10782289
Retrospective observational, external, peer-reviewed n=123, Fitzpatrick Skin Type VI exclusively, Uganda (TMCG telehealth platform). Median age 23 yrs. Top-5 retrieval rate: 17% (21/123 cases)
Note: percentages are distribution scores, not standard diagnostic accuracy metrics.
Best-performing condition: Dermatitis (80% top-1 retrieval)
Worst-performing: Fungal diseases (0%), most prevalent condition in dataset
Performance higher in females vs males across conditions
V0.1 / classification system v0.1_33 skin diseases
~63K training images (~5–10% dark skin)
Same model as Zaar 2020. Together these two studies directly demonstrate the V0.1 FST bias: 56.4% top-5 (Sweden, FST I-II) vs 17% retrieval (Uganda, FST VI).
Motivated Fitzpatrick17K bias evaluation and dataset diversification in later versions.

3. Exploratory diversity evaluation

The following study was conducted as an external exploratory evaluation on Fitzpatrick Skin Type VI (African dark skin) populations using V0.1 of the algorithm. It is classified separately from the peer-reviewed performance studies due to important methodological differences in how accuracy was measured.

Study Type Population Key Findings & Interpretation Model / Notes
Kamulegeya et al. 2023
African Health Sciences
PMC10782289
Data collected: Jan–Mar 2018
Published: 2023
Retrospective observational, external, peer-reviewed n=123, Fitzpatrick Skin Type VI exclusively, Uganda (TMCG telehealth platform). Median age 23 yrs. Top-5 retrieval rate: 17% (21/123 cases)

IMPORTANT METHODOLOGICAL NOTE: The percentages reported in this study are distribution scores (similarity scores per condition that do not sum to 100%), not standard diagnostic accuracy metrics. The 17% figure represents cases where the correct diagnosis appeared anywhere in the ranked top-5 list — a retrieval rate, not sensitivity/specificity.

The paper’s comparison to 69.9% on ‘Caucasian skin’ refers to Autoderm’s own stated training performance — not a finding of this study.

Best-performing condition: Dermatitis (80% top-1 retrieval)
Worst-performing: Fungal diseases (0%), most prevalent condition in dataset
Performance higher in females vs males across conditions

V0.1 / classification system v0.1_33 skin diseases
~63K training images (~5–10% dark skin)

Same model as Zaar 2020. Together these two studies directly demonstrate the V0.1 FST bias: 56.4% top-5 (Sweden, FST I-II) vs 17% retrieval (Uganda, FST VI).

Motivated Fitzpatrick17K bias evaluation and dataset diversification in later versions.


Methodological note on Kamulegeya et al. 2023

This study used Autoderm’s distribution scores (per-condition similarity scores that do not sum to 100%) as the basis for ranking diagnoses. The scoring system recorded the rank position (1–5) of the correct diagnosis in the top-5 output list. The resulting 17% figure is therefore a top-5 retrieval rate (the proportion of cases where the correct diagnosis appeared anywhere in the ranked output), not a standard sensitivity or specificity metric. This methodology differs fundamentally from the top-1/top-3 sensitivity metrics used in Zhu 2023 and Lu 2022, and from the accuracy metrics in Zaar 2020. Figures from this study should not be directly compared to those from other studies in this portfolio without this caveat clearly stated. The paper’s reference to ‘69.9% on Caucasian skin’ was Autoderm’s own stated training performance figure used as a benchmark; it was not a finding of this study.
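
The retrieval-rate calculation described above can be sketched as follows. This is a minimal illustration, not the study's own code: the function name and the example rank list are hypothetical, and only the 21/123 count comes from the study as reported here.

```python
from typing import Optional, Sequence


def top_k_retrieval_rate(ranks: Sequence[Optional[int]], k: int = 5) -> float:
    """Share of cases whose correct diagnosis sits at rank <= k.

    ranks holds the 1-based position of the correct diagnosis in the
    ranked output, or None when it does not appear in the list at all.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Hypothetical rank positions for six cases (not study data).
example_ranks = [1, 3, None, 5, None, None]
print(top_k_retrieval_rate(example_ranks, k=5))  # 0.5

# The study's reported count: correct diagnosis in the top 5 for 21 of 123 cases.
print(round(21 / 123 * 100, 1))  # 17.1
```

Because the rate only counts whether the correct diagnosis appears within the first k positions, it says nothing about false positives, which is why it cannot be read as sensitivity or specificity.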

Strategic Value

Despite these methodological limitations, this study holds significant strategic and regulatory value. Together with Zaar 2020, it provides the only direct same-model (V0.1) cross-population comparison in the portfolio: 56.4% top-5 retrieval (Sweden, FST I-II) versus 17% top-5 retrieval (Uganda, FST VI). This contrast directly motivated the Fitzpatrick17K bias evaluation and subsequent dataset diversification in V2.x, forming a compelling ‘identified problem → implemented solution → validated improvement’ narrative for regulators and enterprise partners.

3.5 Standalone AI performance: White papers

Why White Papers?

The pace of AI model development frequently outstrips the peer-review publication cycle, which typically takes 12–24 months from submission to publication. By the time a study is published, the model under evaluation may already have been superseded by a newer version with improved performance. Autoderm therefore supplements its peer-reviewed evidence base with white papers that provide timely performance data on current or recent model versions, while peer-reviewed publications for the same studies are pursued in parallel where appropriate.

Study Type Population Key Findings Model / Notes
Coachella Study
Jayawickrama, Charalambous & Börve
May 2025
Retrospective, white paper n=91 images, 40 different skin conditions, First Derm platform Top-1 accuracy: 53%
Top-3 accuracy: 84%
Top-5 accuracy: 93%
Treatment accuracy: 95%
Reference standard: Board-certified dermatologists via First Derm platform (93% accuracy)
V2.2
Data collected: 2024 (exact dates TBC)
Published: May 29, 2025
Top-5 matches dermatologist-level accuracy. Treatment accuracy (95%) exceeds diagnostic accuracy.

4. Human-AI reader studies

Three reader studies demonstrate that clinicians and students using Autoderm as a decision support tool consistently outperform those working without AI assistance: the two white-paper studies tabulated below and the CatSalut primary care study (Escalé-Besa et al. 2023, Section 2.1). All three studies show the same directional result: AI + clinician > clinician alone.

Study Type Population Key Findings Model / Notes
UK GP Reader Study
Boots UK (Nov 2024)
White paper
Reader study, randomized control vs AI-assisted groups. PMCF study. n=10 UK GPs (8–25 yrs experience), 20 cases, V2.3 GP+AI: 69% top-1, 81% top-3
GP alone: 48% top-1, 74% top-3
Referrals reduced: 37→22 (-40%)
Time per case significantly reduced
100% would use AI in practice
100% said AI helped with differentials
V2.3
Completed PMCF study.
Also serves as PMCF Study 2 (see Section 9.2).
Medical Students Study
Sept 2024
White paper
Reader study, white paper n=20 4th-year medical students, 20 cases, V2.2 AI improved accuracy in 14/20 cases (70%)
Top-3 improved in 15/20 cases (75%)
Management decision improved in 16/20 (75%)
Time reduced ~40%
90% would use AI in practice
V2.2
Demonstrates utility for non-specialist users and educational settings.

5. Aggregate performance summary

The table below synthesises key performance metrics across all comparable studies. Note that direct numerical comparisons should be interpreted with caution given differences in study design, population, conditions evaluated, and model versions tested.

Metric Range Across Studies Notes
Top-1 accuracy (standalone AI) 22.8% – 53.4% V0.1 to V2.x; varies by population, setting, and conditions tested. Escalé-Besa 2023: 39% all-100 / 48% in-scope n=82. Coachella 2025: 53% (V2.2). Lu 2022: 53.4% (V2.0)
Top-3 accuracy (standalone AI) 61% – 84% Coachella 2025: 84%. Escalé-Besa 2023 n=82: 75% — comparable to GP top-3 (76%). Zhu 2023: 65.8%
Top-5 accuracy (standalone AI) 56.4% – 93% Coachella 2025: 93% (matches dermatologist accuracy). Escalé-Besa 2023 n=82: 89%. Zaar 2020 V0.1: 56.4%
Specificity 91.5% – 96.8% Zhu 2023; consistently high across conditions. Lu 2022: 96.3%
Treatment accuracy (standalone AI) 95% Coachella 2025: even when top-1 diagnosis incorrect, AI treatment recommendation correct in 95% of cases
Top-1 accuracy (GP+AI vs GP alone) 48%→69% (UK GP Study 2024) Human-AI team consistently outperforms clinician alone. 60% aided correct clinical assessment (Escalé-Besa 2023). Standalone AI (n=82): Top-3 comparable to GP Top-3 (75% vs 76%); Top-5 comparable to dermatologist Top-3 (89% vs 90%).
Referral reduction 34% – 40% Escalé-Besa 2023 (34%) and UK GP Study 2024 (40%)
Time savings 40% – 70% Across reader studies
Clinician acceptance 90% – 100% would use in practice Across all reader studies


Regulatory framing: Escalé-Besa 2023 dual analysis

The Escalé-Besa 2023 study contains two complementary analyses that serve distinct regulatory arguments. The all-patients analysis (n=100) demonstrates real-world robustness: even when presented with 18 conditions outside its trained scope, Autoderm maintained 98% specificity and did not generate dangerous false positives. The in-scope analysis (n=82) reflects clinical performance within the intended use, that is, on conditions the algorithm was designed to screen. Here Autoderm’s Top-3 accuracy was comparable to that of GPs (AI 75% vs GP 76%), and its Top-5 accuracy was comparable to dermatologists’ Top-3 accuracy (AI 89% vs dermatologist 90%). For FDA and MDR submissions, the n=82 analysis is the primary clinical performance benchmark; the all-100 analysis supports the safety and robustness argument. Both were conducted on V2.0; V2.3.0 is expected to perform better.
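
The relationship between the two analyses can be checked with simple arithmetic. The sketch below assumes that every out-of-scope case is scored as a miss in the all-patients analysis; that scoring rule is an assumption made for illustration, not something stated in the source. The percentages are those quoted in Section 2.1.

```python
N_ALL = 100       # all consecutive patients
N_IN_SCOPE = 82   # cases with a condition inside the trained scope

# In-scope accuracies as reported (fractions of the 82 in-scope cases).
in_scope = {"top-1": 0.48, "top-3": 0.75, "top-5": 0.89}
# All-patient accuracies as reported (fractions of all 100 cases).
reported_all = {"top-1": 0.39, "top-3": 0.61, "top-5": 0.72}


def implied_all_patient_rate(in_scope_rate: float) -> float:
    """All-patient rate if every out-of-scope case counts as a miss (assumption)."""
    return in_scope_rate * N_IN_SCOPE / N_ALL


for name, rate in in_scope.items():
    implied = implied_all_patient_rate(rate)
    print(f"{name}: implied {implied:.1%}, reported {reported_all[name]:.0%}")
```

Under that assumption the implied and reported all-patient figures agree to within about one percentage point, which is what rounding of the in-scope percentages would produce.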

6. Real-world safety record

Autoderm has operated continuously since 2018. Post-market surveillance data represents the strongest real-world safety evidence among comparable AI dermatology devices at MDR Class IIa level.

Safety Indicator Data Source / Period
Total API calls 2,000,000+ Cumulative since 2018
Adverse events reported Zero Post-market surveillance 2018–present
Fatalities or morbidity attributable to device None PMS vigilance framework; MHRA, BfArM, MAUDE database review
User dissatisfaction rate 18% Primarily attributed to image quality issues, not algorithm output errors
Continuous operation since 2018 6+ years uninterrupted deployment

The device is intended for use as a clinical decision support tool (CDST) under healthcare worker oversight, not as a standalone diagnostic system. All output is presented as a ranked list of possible skin conditions rather than definitive diagnoses, mitigating risk of over-reliance. Residual risks include false negatives for serious conditions (e.g. melanoma ~5% miss rate in top-5) and image quality dependency, both addressed in the Instructions for Use (IFU V3.11).

7. Algorithm progression

Autoderm’s algorithm has undergone documented continuous development from V0.1 (2017 MVP) to V2.3.0 (current production version). Each major version introduced expanded training data, improved condition coverage, and measurable accuracy gains validated by external studies.

Version Period Key Characteristics Validated By
V0.1 / v33.1 2017–2019 MVP. ~63K training images. 33 conditions. ~5–10% dark skin images. Zaar et al. 2020 (Sweden, n=521); Kamulegeya et al. 2023 (Uganda, n=123)
V1.0 / V1.1 2019–2021 Expanded training set. Improved condition coverage. Internal validation
V2.0 2021–2022 Major architecture update. Expanded to 70+ conditions. Fitzpatrick diversity improvements. Escalé-Besa et al. 2023 (Spain, n=100); Lu Feng et al. 2022 (China, n=761)
V2.1 / V2.2 2022–2023 Continued dataset expansion. ~150K annotated images. Performance improvements. Medical Students Study 2024 (V2.2); Coachella Study 2025 (V2.2, n=91)
V2.3.0 / V2.1 2023–present Current production version. 70+ conditions. 150K+ annotated images. Fitzpatrick17K bias evaluation completed (26 conditions, FST I-VI). Zhu et al. 2023 (China, n=920, tested V2.1); UK GP Reader Study 2024 (V2.3); myGP PMS Report 2024 (V2.3)

Training data note: The current V2.3.0 model is trained on 150,000+ dermatologist-annotated images from a proprietary dataset of 3M+ real-world smartphone images (100K new images per month). Annotations follow a consensus protocol requiring agreement from at least 2 of 3 independent board-certified dermatologists per image. Expanded training with ~220,680 images including augmentation has also been completed.
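
The 2-of-3 consensus rule can be illustrated with a short sketch. The function name, the label strings, and the choice to return None when no label reaches the threshold are all hypothetical; the source does not say how images without consensus are handled.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(labels: Sequence[str], min_agree: int = 2) -> Optional[str]:
    """Return the label given by at least min_agree annotators, else None."""
    if not labels:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= min_agree else None


# Hypothetical annotations from three dermatologists for two images.
print(consensus_label(["psoriasis", "psoriasis", "eczema"]))  # psoriasis
print(consensus_label(["psoriasis", "eczema", "tinea"]))      # None
```

With three annotators and a threshold of two, at most one label can reach consensus, so the result is unambiguous whenever it is not None.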

8. Demographic & diversity evidence

Autoderm’s portfolio includes validation data across five Fitzpatrick skin types (FST I-II, III-IV, and VI) and six geographies (Sweden, China, Spain, UK, Uganda, USA), providing unusually broad demographic coverage for a Class IIa SaMD.

Population / Skin Type Study Finding Model Version
FST I-II (Caucasian, Sweden) Zaar et al. 2020 Top-1: 22.8%, Top-5: 56.4%, baseline benchmark for V0.1 V0.1 (~63K images)
FST III-IV (Asian, China) Zhu et al. 2023 (n=920) Top-1: 41.8%, Top-3: 65.8%, Specificity: 96.8%. Primary validation for V2.x. V2.1 (68 conditions)
FST III-IV (Asian, China) Lu Feng et al. 2022 (n=761) Top-1: 53.4%, Specificity: 96.3%, Accuracy: 92.3%. Earlier validation from same centre. V2.0 (44 conditions)
FST VI (Dark skin, Uganda) Kamulegeya et al. 2023 (n=123) Top-5 retrieval rate: 17% on V0.1. Note: uses distribution scores, not standard accuracy metrics. Directly motivated dataset diversification. (See Section 3 for full methodological note.) V0.1 (~63K images, ~5–10% dark skin)
FST I-VI (All skin types) Fitzpatrick17K Bias Evaluation (internal) 26 conditions evaluated across FST I-VI. No systemic bias detected. Bias is condition-specific, not uniform. Most conditions cluster near zero bias. V2.3.0 (current)
FST III-V (Mixed, Spain) Escalé-Besa et al. 2023 (n=100) Real-world Catalonian primary care population. 92% GP satisfaction. V2.0

Outstanding gap: An absolute per-FST-type sensitivity/specificity table for V2.3.0 remains desirable for the CER but is not a blocking requirement given the Fitzpatrick17K evaluation already completed.
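
A per-FST sensitivity/specificity table of the kind described could be computed along the following lines. This is only a sketch under stated assumptions: the input format (one record per case and condition, with a group label, ground truth, and whether the device flagged the condition), the function name, and the example records are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def per_group_metrics(
    cases: Iterable[Tuple[str, bool, bool]]
) -> Dict[str, Dict[str, Optional[float]]]:
    """Sensitivity and specificity per group from (group, has_condition, flagged)."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for group, has_condition, flagged in cases:
        if has_condition:
            key = "tp" if flagged else "fn"
        else:
            key = "fp" if flagged else "tn"
        counts[group][key] += 1
    result: Dict[str, Dict[str, Optional[float]]] = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        result[group] = {
            # None marks a metric that is undefined for this group.
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return result


# Hypothetical records, for illustration only (not Autoderm data).
example_cases = [
    ("FST I-II", True, True), ("FST I-II", True, False), ("FST I-II", False, False),
    ("FST V-VI", True, True), ("FST V-VI", False, True), ("FST V-VI", False, False),
]
for group, metrics in per_group_metrics(example_cases).items():
    print(group, metrics)
```

Reporting absolute values per group, rather than only differences between groups, is what distinguishes such a table from a bias evaluation.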

9. Post-market clinical follow-up (PMCF)

Autoderm has an established PMCF programme under its ISO 13485:2021 Quality Management System, comprising three completed formal PMCF studies, an ongoing PMS vigilance monitoring framework, and a pipeline of additional planned studies.

9.1 PMCF study 1: Visiba Care (September 2021)

Field Details
Report date September 2021
Product version Autoderm V2.0 (44 skin disease classes, 20 evaluated)
Observations (n) 1,092 real-world clinical observations
Reference standard GP diagnoses from patient journals
Geographic scope Sweden, Norway, Finland, United Kingdom (Visiba Care platform)
Regulatory framework EU MDR Annex XIV PMCF requirements


Performance Results (Real-World, n=1,092)

Metric Recall Notes
Top-1 recall ~42% Single best match correct
Top-3 recall ~64% Correct diagnosis in top 3 results
Top-5 recall ~75% Correct diagnosis in top 5 (intended use output)


Notable per-condition performance (Top-5 recall)

  • Actinic Keratosis (L57.0): approaching 100% top-5 recall
  • Rosacea (L71.9): approaching 100% top-5 recall
  • Perioral Dermatitis (L71.0): approaching 100% top-5 recall
  • Seborrhoeic Keratosis (L82.9): ~90% top-5 recall
  • Borrelia/Lyme (A69.2): ~85–90% top-5 recall, relevant for FDA De Novo target condition
  • Pityriasis Versicolor (B36.0): ~85–90% top-5 recall
  • Vitiligo (L80.9): ~85–90% top-5 recall

Performance context

Pre-V0.1 closed-set testing (Visiba 2019 contract) documented 49.3% top-1, 70.1% top-3, 81.7% top-5. The real-world V2.0 figures (~42% top-1, ~75% top-5) reflect open-set clinical conditions where the correct diagnosis may fall outside the 44 evaluated conditions. The difference between closed-set and real-world performance is well-characterised in the AI/ML literature and expected.

Calibration analysis

A calibration plot comparing raw model output against a calibrated version demonstrated that raw model outputs over-predict (observed probability consistently exceeds predicted probability). This analysis formally supports the classification of Autoderm’s output percentages as distribution scores across ranked skin disease classes, not calibrated probability estimates. This framing is critical for both MDR CER accuracy and FDA De Novo CDS classification.
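
The comparison behind a calibration plot can be sketched as below: cases are grouped into score bins, and the mean score in each bin is set against the observed share of correct results. This is a generic illustration, not the method used in the PMCF report; the function name, the bin count, the assumption that scores are expressed as fractions in [0, 1], and the example data are all hypothetical.

```python
from typing import List, Sequence, Tuple


def reliability_bins(
    scores: Sequence[float], correct: Sequence[bool], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Return (mean score, observed hit rate, count) for each non-empty bin.

    Scores are assumed to lie in [0, 1].
    """
    if len(scores) != len(correct):
        raise ValueError("scores and correct must have the same length")
    buckets: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for s, c in zip(scores, correct):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        buckets[idx].append((s, c))
    rows = []
    for bucket in buckets:
        if bucket:
            mean_score = sum(s for s, _ in bucket) / len(bucket)
            hit_rate = sum(1 for _, c in bucket if c) / len(bucket)
            rows.append((mean_score, hit_rate, len(bucket)))
    return rows


# Hypothetical scores and outcomes, for illustration only.
scores = [0.10, 0.15, 0.45, 0.55, 0.90, 0.95]
correct = [False, True, True, False, True, True]
for mean_score, hit_rate, n in reliability_bins(scores, correct):
    print(f"mean score {mean_score:.2f}  observed {hit_rate:.2f}  n={n}")
```

For a calibrated probability the two columns would agree in every bin; a consistent gap between them is the pattern that supports treating the output as a distribution score instead.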

Healthcare professional feedback

GPs using Autoderm within Visiba Care reported satisfaction with diagnostic accuracy. No complaints about missing skin diseases were received. GPs requested addition of 7 new conditions (predominantly paediatric): Chickenpox (B01.9), Scarlet fever (A38.9), Hand/foot/mouth disease (B08.4), Measles (B05.9), Viral exanthema (L27.1), Fifth disease (B08.3), and Erysipelas (A46.9).

PMCF conclusion

The clinical data confirmed the overall benefit-risk profile of Autoderm V2.0 was satisfactory. No patient safety concerns were identified. No changes to the PMCF plan were deemed necessary.

9.2 PMCF study 2: Boots UK GP survey (November 2024)

Field Details
Report date November 2024
Product version Autoderm V2.3 (current generation)
Study design Randomized: Group A (control, n=5 GPs, no AI) vs Group B (study, n=5 GPs, with AI top-5)
Participants n=10 UK GPs (8–25 years clinical experience)
Cases 20 real patient cases from First Derm platform; consensus diagnoses from board-certified dermatologists
Geography United Kingdom (Boots UK)
Citation status White paper. Also listed as white paper #6 in Appendix (same study serves dual evidence purpose).


Performance results

Metric Without AI (Group A) With AI (Group B) Change
Top-1 diagnostic accuracy 48% 69% +21 percentage points
Top-3 diagnostic accuracy 74% 81% +7 percentage points
Dermatology referrals 37 cases 22 cases -40%
Consultation time Baseline Reduced ~60% -60%
Management recommendation accuracy Similar Similar No change
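
The Change column mixes two kinds of difference: accuracy changes are absolute (percentage points), while the referral change is relative to the control group's count. A minimal sketch of both calculations, using the figures from the table (the function names are illustrative):

```python
def percentage_point_change(before_pct: float, after_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return after_pct - before_pct


def relative_change_pct(before: float, after: float) -> float:
    """Change relative to the starting value, in percent."""
    return (after - before) / before * 100


print(percentage_point_change(48, 69))        # 21
print(percentage_point_change(74, 81))        # 7
print(round(relative_change_pct(37, 22), 1))  # -40.5
```

The referral figure of -40% in the table is therefore the rounded relative change from 37 to 22 cases.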


User acceptance (Study group)

  • 100% of GPs reported AI helped them consider additional differentials
  • 100% would use Autoderm in clinical practice
  • 40% felt AI assistance alone would have been sufficient for consultation resolution


PMCF conclusion

The study confirms that Autoderm, used as intended as a clinical decision support tool, improves primary care diagnostic performance and workflow efficiency in real-world UK general practice settings.

9.3 Post-market surveillance report: myGP UK (May 2024)

Field Details
Report date June 11, 2024
Product version Autoderm V2.3 (current generation)
Data period May 2024 (single month snapshot from ongoing deployment)
Analyses (n) 18,359 skin image analyses in May 2024 alone
Cumulative volume 370,000+ skin images screened since integration
Daily usage ~500 individuals per day
Platform myGP app by iPlato (Huma company), 3M+ subscribers
Geography United Kingdom


Disease Distribution (May 2024)

Top conditions flagged: Seborrhoeic Keratosis (350 cases), Actinic Keratosis (258), Lentigo (236), Atypical Melanocytic Nevus (205), Basal Cell Carcinoma (202), Dermatofibroma (192). Malignant tumours comprised 8.3% of all results.

Melanoma screening

Autoderm flagged 79 cases as possible melanoma during May 2024 (approximately 2–3 per day). With approximately 46 melanomas diagnosed daily in the UK, and assuming each flag corresponds to a true melanoma, up to approximately 5% of UK melanoma cases may have been first signposted via the myGP app. This illustrates the potential role of AI screening in raising early awareness at population scale.
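
The arithmetic behind the 5% estimate is sketched below. It treats every flag as a confirmed melanoma in a distinct patient, which is an assumption made here for illustration; flags are possible melanomas, so the result is best read as an upper bound.

```python
FLAGGED_IN_MAY = 79        # possible-melanoma flags reported for May 2024
DAYS_IN_MAY = 31
UK_MELANOMAS_PER_DAY = 46  # approximate figure quoted in the text

flags_per_day = FLAGGED_IN_MAY / DAYS_IN_MAY
upper_bound_share = flags_per_day / UK_MELANOMAS_PER_DAY

print(round(flags_per_day, 2))            # 2.55
print(round(upper_bound_share * 100, 1))  # 5.5
```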

Key findings

The disease distribution in the myGP consumer population (35.2% inflammatory, 32.2% benign tumours, 22.9% infectious, 8.3% malignant, 1.4% genital) is consistent with expected population-level skin disease epidemiology. The over-representation of benign tumours such as seborrhoeic keratosis reflects a population of health-conscious users seeking reassurance, directly supporting Autoderm’s intended use as a signposting tool that builds patient awareness and directs them to appropriate care.

9.4 User testimonials

Technical Integration Clinical Practice
Pardeep Kaushik
Chief Technology Officer, MedinyX Technologies GmbH
“The API is well documented, our engineers took a few hours to integrate it, any updates you guys come with, takes us less than 10 minutes to deploy.”
Dr Ulf Österstad
Operations Manager, Bra Liv nära, Sweden
“Autoderm is a perfect example of how AI should support providers. It works instantly and highlights possibilities I may not have initially considered, helping guide our clinical workflow.”

9.5 PMS vigilance monitoring framework

As part of the ISO 13485:2021 QMS, Autoderm has established a programme of post-market surveillance to identify, investigate, and reduce to an acceptable level any risks associated with Autoderm’s quality and performance. Ongoing PMS vigilance monitoring analyses clinical data to:

  • Confirm the safety and performance of Autoderm
  • Confirm the continued acceptability of identified risks
  • Identify previously unknown side-effects or emerging risks
  • Ensure the continued acceptability of the benefit-risk ratio
  • Identify possible systematic misuse or off-label use

Adverse event database searches

Section 11 of the CER summarises adverse event searches across: MHRA Medical Device Alerts and Field Safety Notices (UK); BfArM (Germany); FDA MAUDE database (US); and ClinicalTrials.gov. No alerts related to the safety of Autoderm were identified across any database.

9.6 Planned PMCF studies

Additional PMCF studies comparing primary care professionals with and without Autoderm are planned. Data from these studies will be presented in the next CER update and will be used to:

  • Gather clinical data derived from use of Autoderm according to its intended use
  • Further substantiate the clinical benefit of the device
  • Gather usability feedback from end users (lay persons, GPs, and dermatologists) to identify new risks
  • Gather real clinical data on less common dermatological conditions

10. Regulatory status & certifications

Designation Details Significance
CE Marking, MDD Class I (legacy, transitioning to MDR Class IIa) EU MDD 93/42/EEC, Class I legacy device. Transitioning to MDR 2017/745, Class IIa. ISO 13485:2021 certified. Enables commercial deployment across EU/EEA under MDD legacy provisions. MDR Class IIa technical file submission planned for 2026.
FDA Breakthrough Device Designation Granted for AI-powered dermatology screening. Expedited FDA review pathway. Accelerated US market entry. Signals FDA recognition of clinical unmet need.
ISO 13485:2021 Quality Management System. IEC 62304 software lifecycle compliance. Foundation for all regulatory submissions.

11. Evidence strength assessment

Overall Assessment: SUFFICIENT for MDR Class IIa transition, subject to addressing documented gaps.

Evidence Category Status Rationale
Clinical performance data Available 5 peer-reviewed studies (Zhu 2023, Lu 2022, Escalé-Besa 2023, Zaar 2020, Kamulegeya 2023) + 5 white papers providing consistent performance data across 6 countries.
Human-AI team performance Available 3 reader studies (CatSalut Spain, UK GPs, Medical Students) all demonstrate AI + clinician outperforms clinician alone. Directly supports CDST intended use.
Safety data Available 2M+ API calls, zero adverse events since 2018. Strongest post-market safety evidence among comparable AI dermatology devices.
Demographic diversity Available Fitzpatrick17K bias evaluation (26 conditions, FST I-VI) — no systemic bias detected. Asian validation (Zhu 2023, n=920). Exploratory African evaluation (Kamulegeya 2023, V0.1). International studies across Spain, Sweden, China, UK, Uganda.
Algorithm progression Available Documented V0.1 (2017) → V2.3.0 (current) with measurable accuracy improvements at each version. V0.1 bias on dark skin identified and addressed.
PMCF framework Available Three completed PMCF studies: Visiba Care 2021 (n=1,092, formal MDR Annex XIV report), Boots UK GP Survey 2024 (n=10 GPs, randomized), and myGP PMS Report 2024 (18,359 analyses, 370K+ cumulative). PMS vigilance framework active. Zero adverse events across all database searches (MHRA, BfArM, MAUDE). Additional studies planned.

The two remaining critical gaps for MDR submission are: (1) intended purpose language reconciliation between IFU V3.11 and the 2022 CER, which is a documentation decision rather than an evidence deficiency; and (2) the equivalence pathway pivot: Autoderm should proceed on the basis of its own clinical data rather than the weak SkinVision equivalence claim in the 2022 CER. Both gaps are on track for resolution by June 2026 per the MDR Evidence Mapping roadmap.

Appendix: Full publication & study references

Peer-reviewed publications

  1. Zhu Y, Lu F, Syed Mohammad Nooruddin M, Liu X, Li X, Yu J, Dong H. Evaluation of the performance of a 34-layer ResNet model-based artificial intelligence application, in the diagnosis of skin diseases. Chinese Journal of Dermatology. 2023. DOI: 10.35541/cjd.20220925. PMID: 37032592. Data collected: Feb 16 – Jul 4, 2022.
  2. Lu Feng, Liu Xin, Zhu Yajie, Li Xiaohong, Yu Jianbin, Dong Huiting. Accuracy of Artificial Intelligence Multi-class Algorithm in the Diagnosis of Common Skin Diseases. Henan Medical Research. 2022;31(8):1387-1392. DOI: 10.3969/j.issn.1004-437X.2022.08.010. Data collected: Feb – Aug 2021.
  3. Escalé-Besa A et al. Using artificial intelligence to improve the diagnostic and management capabilities of primary care physicians in skin lesions with a risk of malignancy. Nature Scientific Reports. 2023. DOI: 10.1038/s41598-023-31340-1
  4. Zaar O et al. Comparison of artificial intelligence and dermatologists for the diagnosis of skin disease. Acta Dermato-Venereologica. 2020.
  5. Kamulegeya L et al. Using artificial intelligence on dermatology conditions in Uganda: a case for diversity in training data sets for machine learning. African Health Sciences. 2023;23(2):753-763. PMC10782289. DOI: 10.4314/ahs.v23i2.86

White papers

  1. Boots UK GP Reader Study. Börve A et al. Autoderm Inc / Boots UK. November 2024. Autoderm V2.3. Randomized reader study, n=10 UK GPs, 20 cases. White paper. (Same study as PMCF Study 2, Section 9.2.)
  2. Medical Students Reader Study. September 2024. White paper.
  3. Fitzpatrick17K Bias Evaluation. Internal study. Autoderm V2.3.0. 26 conditions, FST I-VI. 2024.
  4. Jayawickrama M, Charalambous K, Börve A. AI dermatology in action: How its diagnostic accuracy compares to dermatologists (Coachella Study). White paper. May 29, 2025. Autoderm V2.2, n=91, 40 conditions.
  5. Börve A. Post Market Surveillance Report: myGP app. White paper. June 11, 2024. Autoderm V2.3, n=18,359 analyses (May 2024), 370K+ cumulative screened.

PMCF studies

  1. Visiba Care Post-Market Clinical Follow-Up Report. iDoc24 AB / Visiba Group AB. September 2021. Autoderm V2.0, n=1,092 observations, 20 ICD-10 coded conditions. EU MDR Annex XIV.
  2. Boots UK GP Survey: see white paper #6 above (same study).

Historical partnership evidence

  1. Cooperation Agreement: iDoc24 AB / Medicoo Svenska AB (April 2018). First commercial deployment, pre-V0.1, Sweden.
  2. Service Trial Agreement & DPA: iDoc24 Inc / Visiba Group AB (December 2019). Pre-V0.1 production model, Sweden/Norway/Finland/UK.