Clinical evidence portfolio

Five peer-reviewed publications. Evidence across six countries. 2M+ API calls with zero adverse events. Built for MDR Class IIa, and for the scrutiny that comes with it.

1. Executive summary

Autoderm is an AI-powered dermatology decision support API that analyses smartphone images to screen for 70+ skin conditions, returning results in under 1 second. The software has been researched since 2018 and has processed over 2 million API calls across deployments with Boots UK, DocMorris Germany, myGP UK, and others. This document summarises Autoderm’s complete clinical evidence portfolio across nine categories: peer-reviewed performance studies, white paper performance studies, exploratory diversity evaluations, human-AI reader studies, real-world safety data, algorithm progression, demographic evidence, regulatory compliance, and evidence strength assessment.

Intended use

The Autoderm AI as a medical device is intended to be integrated into third-party services for either of the following purposes:

a decision support tool for healthcare workers to enhance decision making & diagnoses in clinical workflows
a skin analytics tool on skin diseases to be used as a search engine, symptom checker (assessment) or educational tool on skin disease and help find the right information for the possible skin disease to make informed decisions

Intended users

Healthcare professionals including, but not limited to, doctors, pharmacists, and nurses
Laypersons

Overall assessment

Autoderm’s clinical evidence portfolio is sufficient to transition from MDD Class I to MDR Class IIa. The combination of five peer-reviewed publications across five countries; five white papers providing reader studies, standalone performance validation, bias evaluation, and real-world deployment data; 2M+ API calls with zero adverse events; and FDA Breakthrough Device designation places Autoderm among the best-evidenced AI dermatology devices at Class IIa level.

Regulatory Note on White Papers

The Boots UK GP Reader Study (listed as white paper #6 below) is the same study as the Boots UK PMCF study described in Section 9.2. It is listed in both contexts because it serves dual purposes: as standalone clinical evidence and as a formal PMCF obligation under the Quality Management System.

Peer-Reviewed Studies	White Papers	API Calls	Countries	Adverse Events
5	5	2M+	6	0

2. Peer-reviewed publications

2.1 Standalone AI performance studies

Study	Type	Population	Key Findings	Model / Notes
Zhu et al. 2023 Chinese Journal of Dermatology DOI: 10.35541/cjd.20220925 PMID: 37032592	Prospective, independent, peer-reviewed	n=920, Chinese population, hospital dermatology (Zhengzhou)	Top-1 sensitivity: 41.8% (95% CI: 24.2–59.5%) Top-3 sensitivity: 65.8% (95% CI: 51.9–79.6%) Specificity: 96.8% / 91.5% Overall accuracy: 92.9% / 89.9% κ = 0.420 / 0.464 In-distribution cases: 871/920 (94.7%) Best performer: Acne 83.1% top-1; Worst: Eczema 2.0% top-1 Primary validation on Asian population (FST III-IV)	V2.1 (68 conditions) Data collected: Feb 16 – Jul 4, 2022 Published: Aug 8, 2023 Ethics: 2022-KY-0268 Definitive performance study. Strongest peer-reviewed evidence.
Lu Feng et al. 2022 Henan Medical Research DOI: 10.3969/j.issn.1004-437X.2022.08.010	Prospective, peer-reviewed (regional Chinese journal)	n=761, Chinese population, hospital dermatology (Zhengzhou)	Top-1 sensitivity: 53.4% (95% CI: 36.4–70.4) Specificity: 96.3% (95% CI: 90.8–100.0) Accuracy: 92.3% (95% CI: 86.3–98.2) NHSO rejection: 22/30 (73.3%) correctly identified as non-skin NHS photos: All 20 identified as skin Skin detection cutoff: ~83 cm average Same research group as Zhu 2023. Forms two-paper validation series.	V2.0 (44 conditions, ~60K training images) Data collected: Feb – Aug 2021 Published: Apr 2022 Received: Feb 18, 2022 Preliminary study from same centre as Zhu 2023. Only study testing non-skin object rejection.
Escalé-Besa et al. 2023 Nature Scientific Reports DOI: 10.1038/s41598-023-31340-1	Prospective, real-world, peer-reviewed	n=100, Spanish primary care (Catalonia/CatSalut), consecutive patients. Note: 18/100 cases had conditions outside Autoderm’s trained scope.	Analysis 1: All patients (n=100, including 18 out-of-scope conditions): Top-1: AI 39% / GP 64% / Dermatologist 72% Top-3: AI 61% / GP 76% / Dermatologist 90% Top-5: AI 72% Specificity: AI 98% / GP 99% / Dermatologist 99% Analysis 2: In-scope conditions only (n=82): Top-1: AI 48% Top-3: AI 75% (comparable to GP Top-3 of 76%) Top-5: AI 89% (comparable to Dermatologist Top-3 of 90%) Specificity: 98–99% 92% GP satisfaction as decision support tool 60% aided correct GP final diagnosis 34% dermatology referrals avoidable	V2.0 (V2.3.0 is current and is expected to perform better) Cornerstone publication. The n=82 in-scope analysis is the primary performance benchmark for regulatory purposes: AI matched GPs at Top-3 and matched dermatologists at Top-5. The all-100 analysis demonstrates real-world robustness: no dangerous false positives on unfamiliar conditions.
Zaar et al. 2020 Acta Dermato-Venereologica Sahlgrenska University Hospital	Retrospective, external, peer-reviewed	n=521 images, 26 diagnoses, Sweden (FST I-II predominantly)	Top-1 accuracy: 22.8% Top-5 accuracy: 56.4% Baseline benchmark for algorithm progression. Tested V0.1 (MVP, trained on ~63K images)	V0.1 (MVP year 2018) ~63K training images Same model as Kamulegeya 2023. Enables direct cross-population comparison.
Kamulegeya et al. 2023 African Health Sciences PMC10782289	Retrospective observational, external, peer-reviewed	n=123, Fitzpatrick Skin Type VI exclusively, Uganda (TMCG telehealth platform). Median age 23 yrs.	Top-5 retrieval rate: 17% (21/123 cases) Note: percentages are distribution scores, not standard diagnostic accuracy metrics. Best-performing condition: Dermatitis (80% top-1 retrieval) Worst-performing: Fungal diseases (0%), most prevalent condition in dataset Performance higher in females vs males across conditions	V0.1 / classification system v0.1_33 skin diseases ~63K training images (~5–10% dark skin) Same model as Zaar 2020. Together these two studies directly demonstrate the V0.1 FST bias: 56.4% top-5 (Sweden, FST I-II) vs 17% retrieval (Uganda, FST VI). Motivated Fitzpatrick17K bias evaluation and dataset diversification in later versions.

3. Exploratory diversity evaluation

The following study was conducted as an external exploratory evaluation on Fitzpatrick Skin Type VI (African dark skin) populations using V0.1 of the algorithm. It is classified separately from the peer-reviewed performance studies due to important methodological differences in how accuracy was measured.

Study	Type	Population	Key Findings & Interpretation	Model / Notes
Kamulegeya et al. 2023 African Health Sciences PMC10782289 Data collected: Jan–Mar 2018 Published: 2023	Retrospective observational, external, peer-reviewed	n=123, Fitzpatrick Skin Type VI exclusively, Uganda (TMCG telehealth platform). Median age 23 yrs.	Top-5 retrieval rate: 17% (21/123 cases) IMPORTANT METHODOLOGICAL NOTE: The percentages reported in this study are distribution scores (similarity scores per condition that do not sum to 100%), not standard diagnostic accuracy metrics. The 17% figure represents cases where the correct diagnosis appeared anywhere in the ranked top-5 list: a retrieval rate, not sensitivity/specificity. The paper’s comparison to 69.9% on ‘Caucasian skin’ refers to Autoderm’s own stated training performance, not a finding of this study. Best-performing condition: Dermatitis (80% top-1 retrieval) Worst-performing: Fungal diseases (0%), most prevalent condition in dataset Performance higher in females vs males across conditions	V0.1 / classification system v0.1_33 skin diseases ~63K training images (~5–10% dark skin) Same model as Zaar 2020. Together these two studies directly demonstrate the V0.1 FST bias: 56.4% top-5 (Sweden, FST I-II) vs 17% retrieval (Uganda, FST VI). Motivated Fitzpatrick17K bias evaluation and dataset diversification in later versions.

Methodological note on Kamulegeya et al. 2023

This study used Autoderm’s distribution scores (per-condition similarity scores that do not sum to 100%) as the basis for ranking diagnoses. The scoring system recorded the rank position (1–5) of the correct diagnosis in the top-5 output list. The resulting 17% figure is therefore a top-5 retrieval rate (the proportion of cases where the correct diagnosis appeared anywhere in the ranked output), not a standard sensitivity or specificity metric. This methodology differs fundamentally from the top-1/top-3 sensitivity metrics used in Zhu 2023 and Lu 2022, and from the accuracy metrics in Zaar 2020. Figures from this study should not be directly compared to those from other studies in this portfolio without this caveat clearly stated. The paper’s reference to ‘69.9% on Caucasian skin’ was Autoderm’s own stated training performance figure, used as a benchmark; it was not a finding of this study.
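
The retrieval-rate definition above can be illustrated with a minimal Python sketch. The function name and the per-case rank positions are hypothetical; only the totals (21 hits in 123 cases) come from the study as reported here.

```python
from typing import Optional, Sequence


def top_k_retrieval_rate(ranks: Sequence[Optional[int]], k: int = 5) -> float:
    """Share of cases whose correct diagnosis appears at rank 1..k.

    `ranks` holds, per case, the rank position of the correct diagnosis,
    or None when it was absent from the output list.
    """
    if not ranks:
        raise ValueError("at least one case is required")
    hits = sum(1 for r in ranks if r is not None and 1 <= r <= k)
    return hits / len(ranks)


# Illustrative data only: 21 hits in 123 cases, matching the reported totals.
# The individual rank positions below are made up for the example.
example_ranks = [1] * 5 + [3] * 8 + [5] * 8 + [None] * 102
rate = top_k_retrieval_rate(example_ranks, k=5)
print(f"top-5 retrieval rate: {rate:.1%}")
```

Because a hit at rank 5 counts the same as a hit at rank 1, this figure says nothing about how often the first suggestion was correct, which is why it is not interchangeable with top-1 sensitivity.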

Strategic Value

Despite these methodological limitations, this study holds significant strategic and regulatory value. Together with Zaar 2020, it provides the only direct same-model (V0.1) cross-population comparison in the portfolio: 56.4% top-5 retrieval (Sweden, FST I-II) versus 17% top-5 retrieval (Uganda, FST VI). This contrast directly motivated the Fitzpatrick17K bias evaluation and subsequent dataset diversification in V2.x, forming a compelling ‘identified problem → implemented solution → validated improvement’ narrative for regulators and enterprise partners.

3.5 Standalone AI performance: White papers

Why White Papers?

The pace of AI model development frequently outstrips the peer-review publication cycle, which typically takes 12–24 months from submission to publication. By the time a study is published, the model under evaluation may already have been superseded by a newer version with improved performance. Autoderm therefore supplements its peer-reviewed evidence base with white papers that provide timely performance data on current or recent model versions, while peer-reviewed publications for the same studies are pursued in parallel where appropriate.

Study	Type	Population	Key Findings	Model / Notes
Coachella Study Jayawickrama, Charalambous & Börve May 2025	Retrospective, white paper	n=91 images, 40 different skin conditions, First Derm platform	Top-1 accuracy: 53% Top-3 accuracy: 84% Top-5 accuracy: 93% Treatment accuracy: 95% Reference standard: Board-certified dermatologists via First Derm platform (93% accuracy)	V2.2 Data collected: 2024 (exact dates TBC) Published: May 29, 2025 Top-5 matches dermatologist-level accuracy. Treatment accuracy (95%) exceeds diagnostic accuracy.

4. Human-AI reader studies

Three reader studies demonstrate that clinicians and students using Autoderm as a decision support tool consistently outperform those working without AI assistance. All three studies show the same directional result: AI + clinician > clinician alone.

Study	Type	Population	Key Findings	Model / Notes
UK GP Reader Study Boots UK (Nov 2024) White paper	Reader study, randomized, control vs AI-assisted groups. PMCF study.	n=10 UK GPs (8–25 yrs experience), 20 cases, V2.3	GP+AI: 69% top-1, 81% top-3 GP alone: 48% top-1, 74% top-3 Referrals reduced: 37→22 (-40%) Time per case significantly reduced 100% would use AI in practice 100% said AI helped with differentials	V2.3 Completed PMCF study. Also serves as PMCF Study 2 (see Section 9.2).
Medical Students Study Sept 2024 White paper	Reader study, white paper	n=20 4th-year medical students, 20 cases, V2.2	AI improved accuracy in 14/20 cases (70%) Top-3 improved in 15/20 cases (75%) Management decision improved in 16/20 (75%) Time reduced ~40% 90% would use AI in practice	V2.2 Demonstrates utility for non-specialist users and educational settings.

5. Aggregate performance summary

The table below synthesises key performance metrics across all comparable studies. Note that direct numerical comparisons should be interpreted with caution given differences in study design, population, conditions evaluated, and model versions tested.

Metric	Range Across Studies	Notes
Top-1 accuracy (standalone AI)	22.8% – 53.4%	V0.1 to V2.x; varies by population, setting, and conditions tested. Escalé-Besa 2023: 39% all-100 / 48% in-scope n=82. Coachella 2025: 53% (V2.2). Lu 2022: 53.4% (V2.0)
Top-3 accuracy (standalone AI)	61% – 84%	Coachella 2025: 84%. Escalé-Besa 2023 n=82: 75% — comparable to GP top-3 (76%). Zhu 2023: 65.8%
Top-5 accuracy (standalone AI)	56.4% – 93%	Coachella 2025: 93% (matches dermatologist accuracy). Escalé-Besa 2023 n=82: 89%. Zaar 2020 V0.1: 56.4%
Specificity	91.5% – 96.8%	Zhu 2023; consistently high across conditions. Lu 2022: 96.3%
Treatment accuracy (standalone AI)	95%	Coachella 2025: even when top-1 diagnosis incorrect, AI treatment recommendation correct in 95% of cases
Top-1 accuracy (GP+AI vs GP alone)	48%→69% (UK GP Study 2024)	Human-AI team consistently outperforms clinician alone. 60% aided correct clinical assessment (Escalé-Besa 2023). Standalone AI (n=82) matched GP Top-3 at Top-3 (75% vs 76%) and matched dermatologist Top-3 at Top-5 (89% vs 90%).
Referral reduction	34% – 40%	Escalé-Besa 2023 (34%) and UK GP Study 2024 (40%)
Time savings	40% – 70%	Across reader studies
Clinician acceptance	90% – 100% would use in practice	Across all reader studies

Regulatory framing: Escalé-Besa 2023 dual analysis

The Escalé-Besa 2023 study contains two complementary analyses that serve distinct regulatory arguments. The all-patients analysis (n=100) demonstrates real-world robustness: even when presented with 18 conditions outside its trained scope, Autoderm maintained 98% specificity and did not generate dangerous false positives. The in-scope analysis (n=82) reflects true clinical performance within the intended use, that is, on the conditions the algorithm was designed to screen. In this analysis Autoderm matched GPs at Top-3 (AI 75% vs GP 76%) and, when assessed at Top-5, matched the dermatologist Top-3 (AI 89% vs dermatologist 90%). For FDA and MDR submissions, the n=82 analysis is the primary clinical performance benchmark; the all-100 analysis supports the safety and robustness argument. Both were conducted on V2.0; V2.3.0 is expected to perform better.

6. Real-world safety record

Autoderm has operated continuously since 2018. Post-market surveillance data represents the strongest real-world safety evidence among comparable AI dermatology devices at MDR Class IIa level.

Safety Indicator	Data	Source / Period
Total API calls	2,000,000+	Cumulative since 2018
Adverse events reported	Zero	Post-market surveillance 2018–present
Fatalities or morbidity attributable to device	None	PMS vigilance framework; MHRA, BfArM, MAUDE database review
User dissatisfaction rate	18%	Primarily attributed to image quality issues, not algorithm output errors
Continuous operation since	2018	6+ years uninterrupted deployment

The device is intended for use as a clinical decision support tool (CDST) under healthcare worker oversight, not as a standalone diagnostic system. All output is presented as a ranked list of possible skin conditions rather than definitive diagnoses, mitigating risk of over-reliance. Residual risks include false negatives for serious conditions (e.g. melanoma ~5% miss rate in top-5) and image quality dependency, both addressed in the Instructions for Use (IFU V3.11).

7. Algorithm progression

Autoderm’s algorithm has undergone documented continuous development from V0.1 (2017 MVP) to V2.3.0 (current production version). Each major version introduced expanded training data, improved condition coverage, and measurable accuracy gains validated by external studies.

Version	Period	Key Characteristics	Validated By
V0.1 / v33.1	2017–2019	MVP. ~63K training images. 33 conditions. ~5–10% dark skin images.	Zaar et al. 2020 (Sweden, n=521); Kamulegeya et al. 2023 (Uganda, n=123)
V1.0 / V1.1	2019–2021	Expanded training set. Improved condition coverage.	Internal validation
V2.0	2021–2022	Major architecture update. Expanded to 70+ conditions. Fitzpatrick diversity improvements.	Escalé-Besa et al. 2023 (Spain, n=100); Lu Feng et al. 2022 (China, n=761)
V2.1 / V2.2	2022–2023	Continued dataset expansion. ~150K annotated images. Performance improvements.	Medical Students Study 2024 (V2.2); Coachella Study 2025 (V2.2, n=91)
V2.3.0 / V2.1	2023–present	Current production version. 70+ conditions. 150K+ annotated images. Fitzpatrick17K bias evaluation completed (26 conditions, FST I-VI).	Zhu et al. 2023 (China, n=920, tested V2.1); UK GP Reader Study 2024 (V2.3); myGP PMS Report 2024 (V2.3)

Training data note: The current V2.3.0 model is trained on 150,000+ dermatologist-annotated images from a proprietary dataset of 3M+ real-world smartphone images (100K new images per month). Annotations follow a consensus protocol requiring agreement from at least 2 of 3 independent board-certified dermatologists per image. Expanded training with ~220,680 images including augmentation has also been completed.
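
The 2-of-3 consensus rule described above can be sketched in a few lines of Python. This is an illustration of the stated rule only; the function name is hypothetical, and how the actual annotation pipeline handles images without consensus is not stated in this document (the sketch simply returns None).

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(labels: Sequence[str], min_agree: int = 2) -> Optional[str]:
    """Return the label shared by at least `min_agree` annotators, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agree else None


# Two of three annotators agree, so the image receives a consensus label.
agreed = consensus_label(["psoriasis", "psoriasis", "eczema"])
# All three disagree, so no consensus label is assigned (assumed handling).
unresolved = consensus_label(["psoriasis", "eczema", "tinea"])
print(agreed, unresolved)
```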

8. Demographic & diversity evidence

Autoderm’s portfolio includes validation data across five Fitzpatrick skin type categories (FST I-II, III-IV, VI) and six geographies (Sweden, China, Spain, UK, Uganda, USA), providing unusually broad demographic coverage for a Class IIa SaMD.

Population / Skin Type	Study	Finding	Model Version
FST I-II (Caucasian, Sweden)	Zaar et al. 2020	Top-1: 22.8%, Top-5: 56.4%, baseline benchmark for V0.1	V0.1 (~63K images)
FST III-IV (Asian, China)	Zhu et al. 2023 (n=920)	Top-1: 41.8%, Top-3: 65.8%, Specificity: 96.8%. Primary validation for V2.x.	V2.1 (68 conditions)
FST III-IV (Asian, China)	Lu Feng et al. 2022 (n=761)	Top-1: 53.4%, Specificity: 96.3%, Accuracy: 92.3%. Earlier validation from same centre.	V2.0 (44 conditions)
FST VI (Dark skin, Uganda)	Kamulegeya et al. 2023 (n=123)	Top-5 retrieval rate: 17% on V0.1. Note: uses distribution scores, not standard accuracy metrics. Directly motivated dataset diversification. (See Section 3 for full methodological note.)	V0.1 (~63K images, ~5–10% dark skin)
FST I-VI (All skin types)	Fitzpatrick17K Bias Evaluation (internal)	26 conditions evaluated across FST I-VI. No systemic bias detected. Bias is condition-specific, not uniform. Most conditions cluster near zero bias.	V2.3.0 (current)
FST III-V (Mixed, Spain)	Escalé-Besa et al. 2023 (n=100)	Real-world Catalonian primary care population. 92% GP satisfaction.	V2.0

Outstanding gap: An absolute per-FST-type sensitivity/specificity table for V2.3.0 remains desirable for the CER but is not a blocking requirement given the Fitzpatrick17K evaluation already completed.

9. Post-market clinical follow-up (PMCF)

Autoderm has an established PMCF programme under its ISO 13485:2021 Quality Management System, comprising three completed formal PMCF studies, an ongoing PMS vigilance monitoring framework, and a pipeline of additional planned studies.

9.1 PMCF study 1: Visiba Care (September 2021)

Field	Details
Report date	September 2021
Product version	Autoderm V2.0 (44 skin disease classes, 20 evaluated)
Observations (n)	1,092 real-world clinical observations
Reference standard	GP diagnoses from patient journals
Geographic scope	Sweden, Norway, Finland, United Kingdom (Visiba Care platform)
Regulatory framework	EU MDR Annex XIV PMCF requirements

Performance Results (Real-World, n=1,092)

Metric	Recall	Notes
Top-1 recall	~42%	Single best match correct
Top-3 recall	~64%	Correct diagnosis in top 3 results
Top-5 recall	~75%	Correct diagnosis in top 5 (intended use output)

Notable per-condition performance (Top-5 recall)

Actinic Keratosis (L57.0): approaching 100% top-5 recall
Rosacea (L71.9): approaching 100% top-5 recall
Perioral Dermatitis (L71.0): approaching 100% top-5 recall
Seborrhoeic Keratosis (L82.9): ~90% top-5 recall
Borrelia/Lyme (A69.2): ~85–90% top-5 recall, relevant for FDA De Novo target condition
Pityriasis Versicolor (B36.0): ~85–90% top-5 recall
Vitiligo (L80.9): ~85–90% top-5 recall

Performance context

Pre-V0.1 closed-set testing (Visiba 2019 contract) documented 49.3% top-1, 70.1% top-3, 81.7% top-5. The real-world V2.0 figures (~42% top-1, ~75% top-5) reflect open-set clinical conditions where the correct diagnosis may fall outside the 44 evaluated conditions. The difference between closed-set and real-world performance is well-characterised in the AI/ML literature and expected.

Calibration analysis

A calibration plot comparing raw model output against a calibrated version demonstrated that raw model outputs over-predict (observed probability consistently exceeds predicted probability). This analysis formally supports the classification of Autoderm’s output percentages as distribution scores across ranked skin disease classes, not calibrated probability estimates. This framing is critical for both MDR CER accuracy and FDA De Novo CDS classification.
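
For readers unfamiliar with calibration plots, the sketch below shows one common way to tabulate the data behind such a plot: group cases by output score and compare the mean score in each bin with the fraction of cases that were actually correct. The binning scheme, function name, and demo values are assumptions for illustration; the PMCF report's own method is not described here.

```python
from typing import List, Sequence, Tuple


def reliability_table(
    scores: Sequence[float], correct: Sequence[bool], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Return (mean score, observed fraction correct, case count) per non-empty bin."""
    if len(scores) != len(correct):
        raise ValueError("scores and correct must have the same length")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for score, is_correct in zip(scores, correct):
        # Equal-width bins over [0, 1]; clamp so that 1.0 lands in the last bin.
        index = min(max(int(score * n_bins), 0), n_bins - 1)
        bins[index].append((score, is_correct))
    rows = []
    for members in bins:
        if members:
            mean_score = sum(s for s, _ in members) / len(members)
            observed = sum(1 for _, ok in members if ok) / len(members)
            rows.append((mean_score, observed, len(members)))
    return rows


# Made-up scores and outcomes, purely to show the output shape.
demo_scores = [0.9, 0.9, 0.9, 0.9, 0.3, 0.3]
demo_correct = [True, True, False, False, False, True]
demo_rows = reliability_table(demo_scores, demo_correct)
for mean_score, observed, count in demo_rows:
    print(f"mean score {mean_score:.2f} | observed {observed:.2f} | n={count}")
```

A well-calibrated output would show the two columns roughly equal in every bin; a systematic gap in either direction is what justifies treating the scores as rankings rather than probabilities.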

Healthcare professional feedback

GPs using Autoderm within Visiba Care reported satisfaction with diagnostic accuracy. No complaints about missing skin diseases were received. GPs requested addition of 7 new conditions (predominantly paediatric): Chickenpox (B01.9), Scarlet fever (A38.9), Hand/foot/mouth disease (B08.4), Measles (B05.9), Viral exanthema (L27.1), Fifth disease (B08.3), and Erysipelas (A46.9).

PMCF conclusion

The clinical data confirmed the overall benefit-risk profile of Autoderm V2.0 was satisfactory. No patient safety concerns were identified. No changes to the PMCF plan were deemed necessary.

9.2 PMCF study 2: Boots UK GP survey (November 2024)

Field	Details
Report date	November 2024
Product version	Autoderm V2.3 (current generation)
Study design	Randomized: Group A (control, n=5 GPs, no AI) vs Group B (study, n=5 GPs, with AI top-5)
Participants	n=10 UK GPs (8–25 years clinical experience)
Cases	20 real patient cases from First Derm platform; consensus diagnoses from board-certified dermatologists
Geography	United Kingdom (Boots UK)
Citation status	White paper. Also listed as white paper #6 in Appendix (same study serves dual evidence purpose).

Performance results

Metric	Without AI (Group A)	With AI (Group B)	Change
Top-1 diagnostic accuracy	48%	69%	+21 percentage points
Top-3 diagnostic accuracy	74%	81%	+7 percentage points
Dermatology referrals	37 cases	22 cases	-40%
Consultation time	Baseline	Reduced ~60%	-60%
Management recommendation accuracy	Similar	Similar	No change
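
The Change column mixes two kinds of measure: accuracy changes are absolute differences in percentage points, while the referral change is relative to the control group's count. A short Python sketch using the figures from the table makes the distinction explicit (the function names are illustrative, not part of the study):

```python
def percentage_point_change(before_pct: float, after_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return after_pct - before_pct


def relative_change(before: float, after: float) -> float:
    """Change relative to the starting value, as a fraction."""
    return (after - before) / before


top1_change = percentage_point_change(48, 69)   # accuracy, Group A vs Group B
top3_change = percentage_point_change(74, 81)
referral_change = relative_change(37, 22)        # referral counts, Group A vs Group B

print(f"Top-1: {top1_change:+.0f} percentage points")
print(f"Top-3: {top3_change:+.0f} percentage points")
print(f"Referrals: {referral_change:+.1%}")
```

The referral figure works out to about -40.5%, which the table reports as -40%.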

User acceptance (Study group)

100% of GPs reported AI helped them consider additional differentials
100% would use Autoderm in clinical practice
40% felt AI assistance alone would have been sufficient for consultation resolution

PMCF conclusion

The study confirms Autoderm’s intended use as a clinical decision support tool improves primary care diagnostic performance and workflow efficiency in real-world UK general practice settings.

9.3 Post-market surveillance report: myGP UK (May 2024)

Field	Details
Report date	June 11, 2024
Product version	Autoderm V2.3 (current generation)
Data period	May 2024 (single month snapshot from ongoing deployment)
Analyses (n)	18,359 skin image analyses in May 2024 alone
Cumulative volume	370,000+ skin images screened since integration
Daily usage	~500 individuals per day
Platform	myGP app by iPlato (Huma company), 3M+ subscribers
Geography	United Kingdom

Disease Distribution (May 2024)

Top conditions flagged: Seborrhoeic Keratosis (350 cases), Actinic Keratosis (258), Lentigo (236), Atypical Melanocytic Nevus (205), Basal Cell Carcinoma (202), Dermatofibroma (192). Malignant tumours comprised 8.3% of all results.

Melanoma screening

Autoderm flagged 79 cases as possible melanoma during May 2024 (approximately 2–3 per day). With approximately 46 melanomas diagnosed daily in the UK, this suggests approximately 5% of UK melanoma cases may have been first signposted via the myGP app, demonstrating the potential role of AI screening in raising early awareness at population scale.
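
The arithmetic behind the approximately 5% estimate can be reproduced as follows. The sketch treats every flagged case as a confirmed melanoma, an assumption this report does not verify, so the result is best read as an upper bound on the share signposted.

```python
FLAGGED_IN_MAY = 79          # possible-melanoma flags in May 2024 (from the report)
DAYS_IN_MAY = 31
UK_MELANOMAS_PER_DAY = 46    # approximate UK daily diagnoses (as quoted above)

flagged_per_day = FLAGGED_IN_MAY / DAYS_IN_MAY
# Assumption: each flag corresponds to one true melanoma case.
share_of_uk_cases = flagged_per_day / UK_MELANOMAS_PER_DAY

print(f"flagged per day: {flagged_per_day:.2f}")
print(f"share of UK daily melanoma diagnoses: {share_of_uk_cases:.1%}")
```

This gives roughly 2.5 flags per day and a share of about 5.5%, consistent with the rounded figures quoted in the text.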

Key findings

The disease distribution in the myGP consumer population (35.2% inflammatory, 32.2% benign tumours, 22.9% infectious, 8.3% malignant, 1.4% genital) is consistent with expected population-level skin disease epidemiology. The over-representation of benign tumours such as seborrhoeic keratosis reflects a population of health-conscious users seeking reassurance, directly supporting Autoderm’s intended use as a signposting tool that builds patient awareness and directs them to appropriate care.

9.4 User testimonials

Technical Integration	Clinical Practice
Pardeep Kaushik Chief Technology Officer, MedinyX Technologies GmbH “The API is well documented, our engineers took a few hours to integrate it, any updates you guys come with, takes us less than 10 minutes to deploy.”	Dr Ulf Österstad Operations Manager, Bra Liv nära, Sweden “Autoderm is a perfect example of how AI should support providers. It works instantly and highlights possibilities I may not have initially considered, helping guide our clinical workflow.”

9.5 PMS vigilance monitoring framework

As part of the ISO 13485:2021 QMS, Autoderm has established a programme of post-market surveillance to identify, investigate, and reduce to an acceptable level any risks associated with Autoderm’s quality and performance. Ongoing PMS vigilance monitoring analyses clinical data to:

Confirm the safety and performance of Autoderm
Confirm the continued acceptability of identified risks
Identify previously unknown side-effects or emerging risks
Ensure the continued acceptability of the benefit-risk ratio
Identify possible systematic misuse or off-label use

Adverse event database searches

Section 11 of the CER summarises adverse event searches across: MHRA Medical Device Alerts and Field Safety Notices (UK); BfArM (Germany); FDA MAUDE database (US); and ClinicalTrials.gov. No alerts related to the safety of Autoderm were identified across any database.

9.6 Planned PMCF studies

Additional PMCF studies comparing primary care professionals with and without Autoderm are planned. Data from these studies will be presented in the next CER update and will be used to:

Gather clinical data derived from use of Autoderm according to its intended use
Further substantiate the clinical benefit of the device
Gather usability feedback from end users (lay persons, GPs, and dermatologists) to identify new risks
Gather real clinical data on less common dermatological conditions

10. Regulatory status & certifications

Designation	Details	Significance
CE Marking, MDD Class I (legacy, transitioning to MDR Class IIa)	EU MDD 93/42/EEC, Class I legacy device. Transitioning to MDR 2017/745, Class IIa. ISO 13485:2021 certified.	Enables commercial deployment across EU/EEA under MDD legacy provisions. MDR Class IIa technical file submission planned for 2026.
FDA Breakthrough Device Designation	Granted for AI-powered dermatology screening. Expedited FDA review pathway.	Accelerated US market entry. Signals FDA recognition of clinical unmet need.
ISO 13485:2021	Quality Management System. IEC 62304 software lifecycle compliance.	Foundation for all regulatory submissions.

11. Evidence strength assessment

Overall Assessment: SUFFICIENT for MDR Class IIa transition, subject to addressing documented gaps.

Evidence Category	Status	Rationale
Clinical performance data	✓ Available	5 peer-reviewed studies (Zhu 2023, Lu 2022, Escalé-Besa 2023, Zaar 2020, Kamulegeya 2023) + 5 white papers providing consistent performance data across 6 countries.
Human-AI team performance	✓ Available	3 reader studies (CatSalut Spain, UK GPs, Medical Students) all demonstrate AI + clinician outperforms clinician alone. Directly supports CDST intended use.
Safety data	✓ Available	2M+ API calls, zero adverse events since 2018. Strongest post-market safety evidence among comparable AI dermatology devices.
Demographic diversity	✓ Available	Fitzpatrick17K bias evaluation (26 conditions, FST I-VI) — no systemic bias detected. Asian validation (Zhu 2023, n=920). Exploratory African evaluation (Kamulegeya 2023, V0.1). International studies across Spain, Sweden, China, UK, Uganda.
Algorithm progression	✓ Available	Documented V0.1 (2017) → V2.3.0 (current) with measurable accuracy improvements at each version. V0.1 bias on dark skin identified and addressed.
PMCF framework	✓ Available	Three completed PMCF studies: Visiba Care 2021 (n=1,092, formal MDR Annex XIV report), Boots UK GP Survey 2024 (n=10 GPs, randomized), and myGP PMS Report 2024 (18,359 analyses, 370K+ cumulative). PMS vigilance framework active. Zero adverse events across all database searches (MHRA, BfArM, MAUDE). Additional studies planned.

The two remaining critical gaps for MDR submission are: (1) intended purpose language reconciliation between IFU V3.11 and the 2022 CER, which is a documentation decision rather than an evidence deficiency; and (2) the equivalence pathway pivot: Autoderm should proceed on the basis of its own clinical data rather than the weak SkinVision equivalence claim in the 2022 CER. Both gaps are on track for resolution by June 2026 per the MDR Evidence Mapping roadmap.

Appendix: Full publication & study references

Peer-reviewed publications

Zhu Y, Lu F, Syed Mohammad Nooruddin M, Liu X, Li X, Yu J, Dong H. Evaluation of the performance of a 34-layer ResNet model-based artificial intelligence application, in the diagnosis of skin diseases. Chinese Journal of Dermatology. 2023. DOI: 10.35541/cjd.20220925. PMID: 37032592. Data collected: Feb 16 – Jul 4, 2022.
Lu Feng, Liu Xin, Zhu Yajie, Li Xiaohong, Yu Jianbin, Dong Huiting. Accuracy of Artificial Intelligence Multi-class Algorithm in the Diagnosis of Common Skin Diseases. Henan Medical Research. 2022;31(8):1387-1392. DOI: 10.3969/j.issn.1004-437X.2022.08.010. Data collected: Feb – Aug 2021.
Escalé-Besa A et al. Using artificial intelligence to improve the diagnostic and management capabilities of primary care physicians in skin lesions with a risk of malignancy. Nature Scientific Reports. 2023. DOI: 10.1038/s41598-023-31340-1
Zaar O et al. Comparison of artificial intelligence and dermatologists for the diagnosis of skin disease. Acta Dermato-Venereologica. 2020.
Kamulegeya L et al. Using artificial intelligence on dermatology conditions in Uganda: a case for diversity in training data sets for machine learning. African Health Sciences. 2023;23(2):753-763. PMC10782289. DOI: 10.4314/ahs.v23i2.86

White papers

Boots UK GP Reader Study. Börve A et al. Autoderm Inc / Boots UK. November 2024. Autoderm V2.3. Randomized reader study, n=10 UK GPs, 20 cases. White paper. (Same study as PMCF Study 2, Section 9.2.)
Medical Students Reader Study. September 2024. White paper.
Fitzpatrick17K Bias Evaluation. Internal study. Autoderm V2.3.0. 26 conditions, FST I-VI. 2024.
Jayawickrama M, Charalambous K, Börve A. AI dermatology in action: How its diagnostic accuracy compares to dermatologists (Coachella Study). White paper. May 29, 2025. Autoderm V2.2, n=91, 40 conditions.
Börve A. Post Market Surveillance Report: myGP app. White paper. June 11, 2024. Autoderm V2.3, n=18,359 analyses (May 2024), 370K+ cumulative screened.

PMCF studies

Visiba Care Post-Market Clinical Follow-Up Report. iDoc24 AB / Visiba Group AB. September 2021. Autoderm V2.0, n=1,092 observations, 20 ICD-10 coded conditions. EU MDR Annex XIV.
Boots UK GP Survey: see white paper #6 above (same study).

Historical partnership evidence

Cooperation Agreement: iDoc24 AB / Medicoo Svenska AB (April 2018). First commercial deployment, pre-V0.1, Sweden.
Service Trial Agreement & DPA: iDoc24 Inc / Visiba Group AB (December 2019). Pre-V0.1 production model, Sweden/Norway/Finland/UK.