Clinical evidence portfolio
Five peer-reviewed publications. Evidence across six countries. 2M+ API calls with zero adverse events. Built for MDR Class IIa, and for the scrutiny that comes with it.
1. Executive summary
Autoderm is an AI-powered dermatology decision support API that analyses smartphone images to screen for 70+ skin conditions, returning results in under 1 second. The software has been researched since 2018 and has processed over 2 million API calls across deployments with Boots UK, DocMorris Germany, myGP UK, and others. This document summarises Autoderm’s complete clinical evidence portfolio across nine categories: peer-reviewed performance studies, white paper performance studies, exploratory diversity evaluations, human-AI reader studies, real-world safety data, algorithm progression, demographic evidence, regulatory compliance, and evidence strength assessment.
Intended use
The Autoderm AI, as a medical device, is intended to be integrated into third-party services for either of the following purposes:
- a decision support tool for healthcare workers to enhance decision making & diagnoses in clinical workflows
- a skin analytics tool for skin diseases, used as a search engine, symptom checker (assessment) or educational tool, to help users find the right information about a possible skin disease and make informed decisions
Intended users
- Healthcare professionals including, but not limited to, doctors, pharmacists and nurses
- Laypersons
Overall assessment
Autoderm’s clinical evidence portfolio is sufficient to support the transition from MDD Class I to MDR Class IIa. The combination of five peer-reviewed publications across four countries; five white papers covering reader studies, standalone performance validation, bias evaluation and real-world deployment data; 2M+ API calls with zero adverse events; and FDA Breakthrough Device designation places Autoderm among the best-evidenced AI dermatology devices at Class IIa level.
Regulatory Note on White Papers
The Boots UK GP Reader Study (listed as white paper #6 below) is the same study as the Boots UK PMCF study described in Section 9.2. It is listed in both contexts because it serves dual purposes: as standalone clinical evidence and as a formal PMCF obligation under the Quality Management System.
| Peer-Reviewed Studies | White Papers | API Calls | Countries | Adverse Events |
| 5 | 5 | 2M+ | 6 | 0 |
2. Peer-reviewed publications
2.1 Standalone AI performance studies
| Study | Type | Population | Key Findings | Model / Notes |
| Zhu et al. 2023, Chinese Journal of Dermatology, DOI: 10.35541/cjd.20220925, PMID: 37032592 | Prospective, independent, peer-reviewed | n=920, Chinese population, hospital dermatology (Zhengzhou) | Top-1 sensitivity: 41.8% (95% CI: 24.2–59.5%); Top-3 sensitivity: 65.8% (95% CI: 51.9–79.6%); Specificity: 96.8% / 91.5%; Overall accuracy: 92.9% / 89.9%; κ = 0.420 / 0.464; In-distribution cases: 871/920 (94.7%); Best performer: Acne 83.1% top-1; Worst: Eczema 2.0% top-1; Primary validation on Asian population (FST III-IV) | V2.1 (68 conditions); Data collected: Feb 16 – Jul 4, 2022; Published: Aug 8, 2023; Ethics: 2022-KY-0268. Definitive performance study. Strongest peer-reviewed evidence. |
| Lu Feng et al. 2022, Henan Medical Research, DOI: 10.3969/j.issn.1004-437X.2022.08.010 | Prospective, peer-reviewed (regional Chinese journal) | n=761, Chinese population, hospital dermatology (Zhengzhou) | Top-1 sensitivity: 53.4% (95% CI: 36.4–70.4%); Specificity: 96.3% (95% CI: 90.8–100.0%); Accuracy: 92.3% (95% CI: 86.3–98.2%); NHSO rejection: 22/30 (73.3%) correctly identified as non-skin; NHS photos: all 20 identified as skin; Skin detection cutoff: ~83 cm average. Same research group as Zhu 2023, forming a two-paper validation series. | V2.0 (44 conditions, ~60K training images); Data collected: Feb – Aug 2021; Received: Feb 18, 2022; Published: Apr 2022. Preliminary study from the same centre as Zhu 2023. Only study testing non-skin object rejection. |
| Escalé-Besa et al. 2023, Scientific Reports, DOI: 10.1038/s41598-023-31340-1 | Prospective, real-world, peer-reviewed | n=100, Spanish primary care (Catalonia/CatSalut), consecutive patients. Note: 18/100 cases had conditions outside Autoderm’s trained scope. | Analysis 1, all patients (n=100, including 18 out-of-scope conditions): Top-1: AI 39% / GP 64% / Dermatologist 72%; Top-3: AI 61% / GP 76% / Dermatologist 90%; Top-5: AI 72%; Specificity: AI 98% / GP 99% / Dermatologist 99%. Analysis 2, in-scope conditions only (n=82): Top-1: AI 48%; Top-3: AI 75% (GP 76%); Top-5: AI 89% (dermatologist Top-3: 90%). | V2.0 (V2.3.0 is current and is expected to perform better). Cornerstone publication. The n=82 in-scope analysis is the primary performance benchmark for regulatory purposes: AI Top-3 matched GP Top-3, and AI Top-5 matched dermatologist Top-3. The all-100 analysis demonstrates real-world robustness: no dangerous false positives on unfamiliar conditions. |
| Zaar et al. 2020, Acta Dermato-Venereologica, Sahlgrenska University Hospital | Retrospective, external, peer-reviewed | n=521 images, 26 diagnoses, Sweden (FST I-II predominantly) | Top-1 accuracy: 22.8%; Top-5 accuracy: 56.4%. Baseline benchmark for algorithm progression. Tested V0.1 (MVP, trained on ~63K images). | V0.1 (MVP, 2018), ~63K training images. Same model as Kamulegeya 2023, enabling direct cross-population comparison. |
| Kamulegeya et al. 2023, African Health Sciences, PMC10782289 | Retrospective observational, external, peer-reviewed | n=123, Fitzpatrick Skin Type VI exclusively, Uganda (TMCG telehealth platform). Median age 23 yrs. | Top-5 retrieval rate: 17% (21/123 cases). Note: percentages are distribution scores, not standard diagnostic accuracy metrics. Best-performing condition: Dermatitis (80% top-1 retrieval); Worst-performing: Fungal diseases (0%), the most prevalent condition in the dataset; Performance higher in females than males across conditions. | V0.1 / classification system v0.1_33 skin diseases; ~63K training images (~5–10% dark skin). Same model as Zaar 2020. Together these two studies directly demonstrate the V0.1 FST bias: 56.4% top-5 (Sweden, FST I-II) vs 17% retrieval (Uganda, FST VI). Motivated the Fitzpatrick17K bias evaluation and dataset diversification in later versions. |
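The Zhu 2023 row above reports a top-1 sensitivity of 41.8% alongside a specificity of 96.8%. In a multi-class setting, specificity is usually computed one-vs-rest, so each condition's negatives include every other condition, which tends to keep specificity well above sensitivity. The sketch below shows that calculation, together with Cohen's κ, on a small set of hypothetical labels. It assumes one-vs-rest definitions and simple top-1 agreement; the cited papers may have computed their figures differently, and the function names are illustrative only.

```python
from collections import Counter

# Hypothetical toy labels (reference diagnosis vs. AI top-1 prediction); not study data.
Y_TRUE = ["acne"] * 4 + ["eczema"] * 4 + ["psoriasis"] * 2
Y_PRED = ["acne", "acne", "acne", "eczema",
          "psoriasis", "eczema", "acne", "psoriasis",
          "psoriasis", "psoriasis"]


def one_vs_rest(y_true, y_pred, label):
    """Sensitivity and specificity for one condition, treating every other condition as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


def cohens_kappa(y_true, y_pred):
    """Agreement between AI and reference labels, corrected for agreement expected by chance."""
    n = len(y_true)
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    for condition in sorted(set(Y_TRUE)):
        sens, spec = one_vs_rest(Y_TRUE, Y_PRED, condition)
        print(f"{condition}: sensitivity={sens:.2f}, specificity={spec:.2f}")
    print(f"kappa={cohens_kappa(Y_TRUE, Y_PRED):.3f}")
```

In this toy data, eczema has a sensitivity of 0.25 while its one-vs-rest specificity is about 0.83, which illustrates how a condition can be frequently missed while still rarely being predicted wrongly for other conditions.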
3. Exploratory diversity evaluation
The following study was conducted as an external exploratory evaluation on Fitzpatrick Skin Type VI (African dark skin) populations using V0.1 of the algorithm. It is classified separately from the peer-reviewed performance studies due to important methodological differences in how accuracy was measured.
| Study | Type | Population | Key Findings & Interpretation | Model / Notes |
| Kamulegeya et al. 2023, African Health Sciences, PMC10782289; Data collected: Jan–Mar 2018; Published: 2023 | Retrospective observational, external, peer-reviewed | n=123, Fitzpatrick Skin Type VI exclusively, Uganda (TMCG telehealth platform). Median age 23 yrs. | Top-5 retrieval rate: 17% (21/123 cases). IMPORTANT METHODOLOGICAL NOTE: The percentages reported in this study are distribution scores (similarity scores per condition that do not sum to 100%), not standard diagnostic accuracy metrics. The 17% figure represents cases where the correct diagnosis appeared anywhere in the ranked top-5 list: a retrieval rate, not sensitivity/specificity. The paper’s comparison to 69.9% on ‘Caucasian skin’ refers to Autoderm’s own stated training performance, not a finding of this study. Best-performing condition: Dermatitis (80% top-1 retrieval). | V0.1 / classification system v0.1_33 skin diseases; ~63K training images (~5–10% dark skin). Same model as Zaar 2020. Together these two studies directly demonstrate the V0.1 FST bias: 56.4% top-5 (Sweden, FST I-II) vs 17% retrieval (Uganda, FST VI). Motivated the Fitzpatrick17K bias evaluation and dataset diversification in later versions. |
Methodological note on Kamulegeya et al. 2023
This study used Autoderm’s distribution scores (per-condition similarity scores that do not sum to 100%) as the basis for ranking diagnoses. The scoring system recorded the rank position (1–5) of the correct diagnosis in the top-5 output list. The resulting 17% figure is therefore a top-5 retrieval rate, the proportion of cases where the correct diagnosis appeared anywhere in the ranked output, not a standard sensitivity or specificity metric. This methodology differs fundamentally from the top-1/top-3 sensitivity metrics used in Zhu 2023 and Lu 2022, and from the accuracy metrics in Zaar 2020. Figures from this study should not be directly compared to those from other studies in this portfolio without this caveat clearly stated. The paper’s reference to ‘69.9% on Caucasian skin’ was Autoderm’s own stated training performance figure, used as a benchmark; it was not a finding of this study.
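The retrieval metric described above can be sketched as follows, assuming each case is stored as the model's list of conditions already ordered by distribution score, plus the reference diagnosis. The function names and toy cases are hypothetical. Because only the ordering of the scores is used, it does not matter that distribution scores do not sum to 100%.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical cases: (conditions ordered by distribution score, reference diagnosis).
CASES: Sequence[Tuple[Sequence[str], str]] = [
    (["dermatitis", "psoriasis", "tinea", "acne", "urticaria"], "dermatitis"),     # rank 1
    (["acne", "rosacea", "dermatitis", "tinea", "vitiligo"], "tinea"),             # rank 4
    (["psoriasis", "dermatitis", "lichen planus", "acne", "urticaria"], "tinea"),  # not retrieved
]


def rank_of_correct(ranked: Sequence[str], truth: str, k: int = 5) -> Optional[int]:
    """1-based rank of the reference diagnosis within the top-k list, or None if absent."""
    top_k = list(ranked[:k])
    return top_k.index(truth) + 1 if truth in top_k else None


def top_k_retrieval_rate(cases, k: int = 5) -> float:
    """Share of cases whose reference diagnosis appears anywhere in the top-k list.

    Rank 1 and rank 5 count equally and wrong suggestions are not penalised,
    which is why this is not a sensitivity or specificity estimate.
    """
    hits = sum(rank_of_correct(ranked, truth, k) is not None for ranked, truth in cases)
    return hits / len(cases)


if __name__ == "__main__":
    print([rank_of_correct(r, t) for r, t in CASES])  # [1, 4, None]
    print(f"top-1: {top_k_retrieval_rate(CASES, k=1):.2f}, top-5: {top_k_retrieval_rate(CASES):.2f}")
    # The reported 17% corresponds to 21 of 123 cases retrieved within the top 5.
    print(f"{21 / 123:.1%}")
```

Recording the rank position, as the study did, keeps more information than the single retrieval rate: the same data can be summarised at any cut-off k.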
Strategic Value
Despite these methodological limitations, this study holds significant strategic and regulatory value. Together with Zaar 2020, it provides the only direct same-model (V0.1) cross-population comparison in the portfolio: 56.4% top-5 retrieval (Sweden, FST I-II) versus 17% top-5 retrieval (Uganda, FST VI). This contrast directly motivated the Fitzpatrick17K bias evaluation and subsequent dataset diversification in V2.x, forming a compelling ‘identified problem → implemented solution → validated improvement’ narrative for regulators and enterprise partners.
3.5 Standalone AI performance: White papers
Why White Papers?
The pace of AI model development frequently outstrips the peer-review publication cycle, which typically takes 12–24 months from submission to publication. By the time a study is published, the model under evaluation may already have been superseded by a newer version with improved performance. Autoderm therefore supplements its peer-reviewed evidence base with white papers that provide timely performance data on current or recent model versions, while peer-reviewed publications for the same studies are pursued in parallel where appropriate.
| Study | Type | Population | Key Findings | Model / Notes |
| Coachella Study, Jayawickrama, Charalambous & Börve, May 2025 | Retrospective, white paper | n=91 images, 40 different skin conditions, First Derm platform | Top-1 accuracy: 53%; Top-3 accuracy: 84%; Top-5 accuracy: 93%; Treatment accuracy: 95%; Reference standard: board-certified dermatologists via the First Derm platform (93% accuracy) | V2.2; Data collected: 2024 (exact dates TBC); Published: May 29, 2025. Top-5 matches dermatologist-level accuracy. Treatment accuracy (95%) exceeds diagnostic accuracy. |
4. Human-AI reader studies
Three reader studies (Escalé-Besa 2023 in Catalan primary care, the Boots UK GP study and the Medical Students study) demonstrate that clinicians and students using Autoderm as a decision support tool consistently outperform those working without AI assistance. All three studies show the same directional result: AI + clinician > clinician alone.
| Study | Type | Population | Key Findings | Model / Notes |
| UK GP Reader Study, Boots UK (Nov 2024), white paper | Reader study, randomized control vs AI-assisted groups. PMCF study. | n=10 UK GPs (8–25 yrs experience), 20 cases, V2.3 | GP+AI: 69% top-1, 81% top-3; GP alone: 48% top-1, 74% top-3; Referrals reduced: 37→22 (-40%); Time per case significantly reduced; 100% would use AI in practice; 100% said AI helped with differentials | V2.3. Completed PMCF study. Also serves as PMCF Study 2 (see Section 9.2). |
| Medical Students Study, Sept 2024, white paper | Reader study, white paper | n=20 4th-year medical students, 20 cases, V2.2 | AI improved accuracy in 14/20 cases (70%); Top-3 improved in 15/20 cases (75%); Management decision improved in 16/20 (80%); Time reduced ~40%; 90% would use AI in practice | V2.2. Demonstrates utility for non-specialist users and educational settings. |
5. Aggregate performance summary
The table below synthesises key performance metrics across all comparable studies. Note that direct numerical comparisons should be interpreted with caution given differences in study design, population, conditions evaluated, and model versions tested.
| Metric | Range Across Studies | Notes |
| Top-1 accuracy (standalone AI) | 22.8% – 53.4% | V0.1 to V2.x; varies by population, setting, and conditions tested. Escalé-Besa 2023: 39% all-100 / 48% in-scope n=82. Coachella 2025: 53% (V2.2). Lu 2022: 53.4% (V2.0) |
| Top-3 accuracy (standalone AI) | 61% – 84% | Coachella 2025: 84%. Escalé-Besa 2023 n=82: 75% — comparable to GP top-3 (76%). Zhu 2023: 65.8% |
| Top-5 accuracy (standalone AI) | 56.4% – 93% | Coachella 2025: 93% (matches dermatologist accuracy). Escalé-Besa 2023 n=82: 89%. Zaar 2020 V0.1: 56.4% |
| Specificity | 91.5% – 96.8% | Zhu 2023; consistently high across conditions. Lu 2022: 96.3% |
| Treatment accuracy (standalone AI) | 95% | Coachella 2025: even when top-1 diagnosis incorrect, AI treatment recommendation correct in 95% of cases |
| Top-1 accuracy (GP+AI vs GP alone) | 48%→69% (UK GP Study 2024) | Human-AI team consistently outperforms clinician alone. 60% aided correct clinical assessment (Escalé-Besa 2023). Standalone AI (n=82) matched GP Top-3 (75% vs 76%), and its Top-5 matched dermatologist Top-3 (89% vs 90%). |
| Referral reduction | 34% – 40% | Escalé-Besa 2023 (34%) and UK GP Study 2024 (40%) |
| Time savings | 40% – 70% | Across reader studies |
| Clinician acceptance | 90% – 100% would use in practice | Across all reader studies |
Regulatory framing: Escalé-Besa 2023 dual analysis
The Escalé-Besa 2023 study contains two complementary analyses that serve distinct regulatory arguments. The all-patients analysis (n=100) demonstrates real-world robustness: even when presented with 18 conditions outside its trained scope, Autoderm maintained 98% specificity and did not generate dangerous false positives. The in-scope analysis (n=82) reflects clinical performance within the intended use, that is, on the conditions the algorithm was designed to screen. In this analysis Autoderm matched GPs at Top-3 (AI 75% vs GP 76%), and its Top-5 matched dermatologists’ Top-3 (AI 89% vs dermatologist 90%). For FDA and MDR submissions, the n=82 analysis is the primary clinical performance benchmark; the all-100 analysis supports the safety and robustness argument. Both were conducted on V2.0; V2.3.0 is expected to perform better.
6. Real-world safety record
Autoderm has operated continuously since 2018. Post-market surveillance data represents the strongest real-world safety evidence among comparable AI dermatology devices at MDR Class IIa level.
| Safety Indicator | Data | Source / Period |
| Total API calls | 2,000,000+ | Cumulative since 2018 |
| Adverse events reported | Zero | Post-market surveillance 2018–present |
| Fatalities or morbidity attributable to device | None | PMS vigilance framework; MHRA, BfArM, MAUDE database review |
| User dissatisfaction rate | 18% | Primarily attributed to image quality issues, not algorithm output errors |
| Continuous operation since | 2018 | 6+ years uninterrupted deployment |
The device is intended for use as a clinical decision support tool (CDST) under healthcare worker oversight, not as a standalone diagnostic system. All output is presented as a ranked list of possible skin conditions rather than definitive diagnoses, mitigating risk of over-reliance. Residual risks include false negatives for serious conditions (e.g. melanoma ~5% miss rate in top-5) and image quality dependency, both addressed in the Instructions for Use (IFU V3.11).
7. Algorithm progression
Autoderm’s algorithm has undergone documented continuous development from V0.1 (2017 MVP) to V2.3.0 (current production version). Each major version introduced expanded training data, improved condition coverage, and measurable accuracy gains validated by external studies.
| Version | Period | Key Characteristics | Validated By |
| V0.1 / v33.1 | 2017–2019 | MVP. ~63K training images. 33 conditions. ~5–10% dark skin images. | Zaar et al. 2020 (Sweden, n=521); Kamulegeya et al. 2023 (Uganda, n=123) |
| V1.0 / V1.1 | 2019–2021 | Expanded training set. Improved condition coverage. | Internal validation |
| V2.0 | 2021–2022 | Major architecture update. Expanded to 44 conditions. Fitzpatrick diversity improvements. | Escalé-Besa et al. 2023 (Spain, n=100); Lu Feng et al. 2022 (China, n=761) |
| V2.1 / V2.2 | 2022–2023 | Continued dataset expansion. ~150K annotated images. Performance improvements. | Medical Students Study 2024 (V2.2); Coachella Study 2025 (V2.2, n=91) |
| V2.3.0 / V2.1 | 2023–present | Current production version. 70+ conditions. 150K+ annotated images. Fitzpatrick17K bias evaluation completed (26 conditions, FST I-VI). | Zhu et al. 2023 (China, n=920, tested V2.1); UK GP Reader Study 2024 (V2.3); myGP PMS Report 2024 (V2.3) |
Training data note: The current V2.3.0 model is trained on 150,000+ dermatologist-annotated images drawn from a proprietary dataset of 3M+ real-world smartphone images, which grows by about 100K new images per month. Annotations follow a consensus protocol requiring agreement from at least 2 of 3 independent board-certified dermatologists per image. An expanded training run on ~220,680 images, including augmented images, has also been completed.
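The 2-of-3 consensus rule in the training data note above can be sketched as a simple majority check, assuming each image carries exactly three independent labels. What happens to images without consensus (exclusion, adjudication or re-review) is not stated here, so this hypothetical helper simply returns None for them.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(annotations: Sequence[str], min_agree: int = 2) -> Optional[str]:
    """Return the label chosen by at least `min_agree` annotators, or None if there is no consensus.

    With three annotators and min_agree=2, at most one label can reach the threshold,
    so the result is unambiguous.
    """
    if not annotations:
        return None
    label, count = Counter(annotations).most_common(1)[0]
    return label if count >= min_agree else None


if __name__ == "__main__":
    print(consensus_label(["rosacea", "rosacea", "acne"]))              # rosacea
    print(consensus_label(["rosacea", "acne", "perioral dermatitis"]))  # None
```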
8. Demographic & diversity evidence
Autoderm’s portfolio includes validation data across Fitzpatrick skin type groups FST I-II, III-IV, III-V and VI, plus an FST I-VI bias evaluation, and across six geographies (Sweden, China, Spain, UK, Uganda, USA), providing unusually broad demographic coverage for a Class IIa SaMD.
| Population / Skin Type | Study | Finding | Model Version |
| FST I-II (Caucasian, Sweden) | Zaar et al. 2020 | Top-1: 22.8%, Top-5: 56.4%, baseline benchmark for V0.1 | V0.1 (~63K images) |
| FST III-IV (Asian, China) | Zhu et al. 2023 (n=920) | Top-1: 41.8%, Top-3: 65.8%, Specificity: 96.8%. Primary validation for V2.x. | V2.1 (68 conditions) |
| FST III-IV (Asian, China) | Lu Feng et al. 2022 (n=761) | Top-1: 53.4%, Specificity: 96.3%, Accuracy: 92.3%. Earlier validation from same centre. | V2.0 (44 conditions) |
| FST VI (Dark skin, Uganda) | Kamulegeya et al. 2023 (n=123) | Top-5 retrieval rate: 17% on V0.1. Note: uses distribution scores, not standard accuracy metrics. Directly motivated dataset diversification. (See Section 3 for full methodological note.) | V0.1 (~63K images, ~5–10% dark skin) |
| FST I-VI (All skin types) | Fitzpatrick17K Bias Evaluation (internal) | 26 conditions evaluated across FST I-VI. No systemic bias detected. Bias is condition-specific, not uniform. Most conditions cluster near zero bias. | V2.3.0 (current) |
| FST III-V (Mixed, Spain) | Escalé-Besa et al. 2023 (n=100) | Real-world Catalonian primary care population. 92% GP satisfaction. | V2.0 |
Outstanding gap: An absolute per-FST-type sensitivity/specificity table for V2.3.0 remains desirable for the CER but is not a blocking requirement given the Fitzpatrick17K evaluation already completed.
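The Fitzpatrick17K row in the table above reports that bias is condition-specific and that most conditions cluster near zero. The bias metric itself is not defined in this document, so the following sketch assumes one plausible definition: for each condition, top-1 accuracy on FST I–III images minus top-1 accuracy on FST IV–VI images, where a value near zero means little difference between the groups. The grouping, function name and records are hypothetical.

```python
from collections import defaultdict

LIGHT = frozenset({"I", "II", "III"})  # assumed grouping, for illustration only
DARK = frozenset({"IV", "V", "VI"})

# Hypothetical records: (condition, Fitzpatrick skin type, was the AI top-1 correct?)
RECORDS = [
    ("acne", "II", True), ("acne", "V", True), ("acne", "III", False), ("acne", "VI", False),
    ("vitiligo", "I", True), ("vitiligo", "II", True), ("vitiligo", "V", True), ("vitiligo", "VI", False),
]


def per_condition_gap(records, light=LIGHT, dark=DARK):
    """Return {condition: accuracy on light-group images - accuracy on dark-group images}."""
    tallies = defaultdict(lambda: {"light": [0, 0], "dark": [0, 0]})  # [correct, total]
    for condition, fst, correct in records:
        if fst in light:
            group = "light"
        elif fst in dark:
            group = "dark"
        else:
            continue  # unknown skin type: skip the record
        tallies[condition][group][0] += int(correct)
        tallies[condition][group][1] += 1
    gaps = {}
    for condition, groups in tallies.items():
        (lc, ln), (dc, dn) = groups["light"], groups["dark"]
        if ln and dn:  # only report conditions observed in both groups
            gaps[condition] = lc / ln - dc / dn
    return gaps


if __name__ == "__main__":
    print(per_condition_gap(RECORDS))  # {'acne': 0.0, 'vitiligo': 0.5}
```

A difference of accuracies like this shows where a gap exists but not its clinical direction; the absolute per-FST sensitivity/specificity table mentioned in the outstanding gap would instead stratify one-vs-rest counts by skin type.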
9. Post-market clinical follow-up (PMCF)
Autoderm has an established PMCF programme under its ISO 13485:2021 Quality Management System, comprising three completed formal PMCF studies, an ongoing PMS vigilance monitoring framework, and a pipeline of additional planned studies.
9.1 PMCF study 1: Visiba Care (September 2021)
| Field | Details |
| Report date | September 2021 |
| Product version | Autoderm V2.0 (44 skin disease classes, 20 evaluated) |
| Observations (n) | 1,092 real-world clinical observations |
| Reference standard | GP diagnoses from patient journals |
| Geographic scope | Sweden, Norway, Finland, United Kingdom (Visiba Care platform) |
| Regulatory framework | EU MDR Annex XIV PMCF requirements |
Performance Results (Real-World, n=1,092)
| Metric | Recall | Notes |
| Top-1 recall | ~42% | Single best match correct |
| Top-3 recall | ~64% | Correct diagnosis in top 3 results |
| Top-5 recall | ~75% | Correct diagnosis in top 5 (intended use output) |
Notable per-condition performance (Top-5 recall)
- Actinic Keratosis (L57.0): approaching 100% top-5 recall
- Rosacea (L71.9): approaching 100% top-5 recall
- Perioral Dermatitis (L71.0): approaching 100% top-5 recall
- Seborrhoeic Keratosis (L82.9): ~90% top-5 recall
- Borrelia/Lyme (A69.2): ~85–90% top-5 recall, relevant for FDA De Novo target condition
- Pityriasis Versicolor (B36.0): ~85–90% top-5 recall
- Vitiligo (L80.9): ~85–90% top-5 recall
Performance context
Pre-V0.1 closed-set testing (Visiba 2019 contract) documented 49.3% top-1, 70.1% top-3 and 81.7% top-5. The real-world V2.0 figures (~42% top-1, ~75% top-5) reflect open-set clinical conditions, where the correct diagnosis may fall outside the model’s 44 classes. The gap between closed-set and real-world performance is well characterised in the AI/ML literature and is expected.
Calibration analysis
A calibration plot comparing raw model output against a calibrated version showed that the raw outputs are not calibrated probabilities: observed probability consistently exceeded predicted probability. This analysis formally supports classifying Autoderm’s output percentages as distribution scores across ranked skin disease classes, not calibrated probability estimates. This framing is critical for both MDR CER accuracy and FDA De Novo CDS classification.
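The calibration analysis above is described only by its result. A reliability table of the kind that underlies such a plot can be sketched as follows, assuming equal-width bins over the top-1 score and a binary outcome recording whether the top-1 condition matched the reference diagnosis. The binning scheme, function names and scores are hypothetical; in this toy data the observed hit rate exceeds the mean score in every bin, the same direction described above.

```python
from typing import List, Sequence, Tuple

# Hypothetical top-1 scores and correctness flags (1 = top-1 matched the reference); not Visiba data.
SCORES = [0.10, 0.15, 0.30, 0.35, 0.50, 0.55, 0.90, 0.95]
OUTCOMES = [0, 1, 1, 0, 1, 1, 1, 1]


def reliability_table(scores: Sequence[float], outcomes: Sequence[int],
                      n_bins: int = 5) -> List[Tuple[int, int, float, float]]:
    """Rows of (bin index, count, mean predicted score, observed hit rate) for non-empty bins."""
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, outcomes):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of exactly 1.0 goes in the last bin
        bins[idx].append((s, y))
    rows = []
    for i, members in enumerate(bins):
        if members:
            mean_score = sum(s for s, _ in members) / len(members)
            hit_rate = sum(y for _, y in members) / len(members)
            rows.append((i, len(members), mean_score, hit_rate))
    return rows


def expected_calibration_error(rows, total: int) -> float:
    """Count-weighted mean absolute gap between observed hit rate and mean predicted score."""
    return sum(n / total * abs(hit - mean) for _, n, mean, hit in rows)


if __name__ == "__main__":
    rows = reliability_table(SCORES, OUTCOMES)
    for i, n, mean, hit in rows:
        print(f"bin {i}: n={n}, mean score={mean:.3f}, observed={hit:.3f}")
    print(f"ECE={expected_calibration_error(rows, len(SCORES)):.3f}")
```

Plotting the mean score against the observed rate per bin gives the familiar reliability diagram; points above the diagonal correspond to observed frequencies exceeding the raw scores.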
Healthcare professional feedback
GPs using Autoderm within Visiba Care reported satisfaction with diagnostic accuracy. No complaints about missing skin diseases were received. GPs requested addition of 7 new conditions (predominantly paediatric): Chickenpox (B01.9), Scarlet fever (A38.9), Hand/foot/mouth disease (B08.4), Measles (B05.9), Viral exanthema (L27.1), Fifth disease (B08.3), and Erysipelas (A46.9).
PMCF conclusion
The clinical data confirmed the overall benefit-risk profile of Autoderm V2.0 was satisfactory. No patient safety concerns were identified. No changes to the PMCF plan were deemed necessary.
9.2 PMCF study 2: Boots UK GP survey (November 2024)
| Field | Details |
| Report date | November 2024 |
| Product version | Autoderm V2.3 (current generation) |
| Study design | Randomized: Group A (control, n=5 GPs, no AI) vs Group B (study, n=5 GPs, with AI top-5) |
| Participants | n=10 UK GPs (8–25 years clinical experience) |
| Cases | 20 real patient cases from First Derm platform; consensus diagnoses from board-certified dermatologists |
| Geography | United Kingdom (Boots UK) |
| Citation status | White paper. Also listed as white paper #6 in Appendix (same study serves dual evidence purpose). |
Performance results
| Metric | Without AI (Group A) | With AI (Group B) | Change |
| Top-1 diagnostic accuracy | 48% | 69% | +21 percentage points |
| Top-3 diagnostic accuracy | 74% | 81% | +7 percentage points |
| Dermatology referrals | 37 cases | 22 cases | -40% |
| Consultation time | Baseline | Reduced ~60% | -60% |
| Management recommendation accuracy | Similar | Similar | No change |
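The table above mixes two kinds of change: the accuracy rows are differences in percentage points, while the referral row is a relative reduction (15 fewer referrals out of 37). The short sketch below makes the distinction explicit using only the figures reported in the table; the helper names are hypothetical.

```python
def percentage_point_change(before_pct: float, after_pct: float) -> float:
    """Absolute difference between two percentages (e.g. 48% -> 69% is +21 points)."""
    return after_pct - before_pct


def relative_change_pct(before: float, after: float) -> float:
    """Change relative to the starting value, expressed as a percentage."""
    return (after - before) / before * 100


if __name__ == "__main__":
    print(percentage_point_change(48, 69))        # 21
    print(f"{relative_change_pct(48, 69):.1f}%")  # 43.8% relative gain in top-1 accuracy
    print(f"{relative_change_pct(37, 22):.1f}%")  # -40.5%, reported as -40% referrals
```

Reporting both forms avoids ambiguity: the +21 percentage-point top-1 gain is a 43.8% relative improvement over the unaided GPs.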
User acceptance (Study group)
- 100% of GPs reported AI helped them consider additional differentials
- 100% would use Autoderm in clinical practice
- 40% felt AI assistance alone would have been sufficient for consultation resolution
PMCF conclusion
The study confirms that Autoderm, used as intended as a clinical decision support tool, improves primary care diagnostic performance and workflow efficiency in real-world UK general practice settings.
9.3 Post-market surveillance report: myGP UK (May 2024)
| Field | Details |
| Report date | June 11, 2024 |
| Product version | Autoderm V2.3 (current generation) |
| Data period | May 2024 (single month snapshot from ongoing deployment) |
| Analyses (n) | 18,359 skin image analyses in May 2024 alone |
| Cumulative volume | 370,000+ skin images screened since integration |
| Daily usage | ~500 individuals per day |
| Platform | myGP app by iPlato (a Huma company), 3M+ subscribers |
| Geography | United Kingdom |
Disease Distribution (May 2024)
Top conditions flagged: Seborrhoeic Keratosis (350 cases), Actinic Keratosis (258), Lentigo (236), Atypical Melanocytic Nevus (205), Basal Cell Carcinoma (202), Dermatofibroma (192). Malignant tumours comprised 8.3% of all results.
Melanoma screening
Autoderm flagged 79 cases as possible melanoma during May 2024 (approximately 2–3 per day). With approximately 46 melanomas diagnosed daily in the UK, this suggests approximately 5% of UK melanoma cases may have been first signposted via the myGP app, demonstrating the potential role of AI screening in raising early awareness at population scale.
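The ~5% estimate can be reproduced from the figures above, assuming May 2024 has 31 days and, more strongly, that every flag corresponds to a distinct melanoma not already signposted through another route. The PMS data alone do not establish this, and any false-positive flags would lower the true share.

$$
\frac{79\ \text{flags} \,/\, 31\ \text{days}}{46\ \text{melanomas per day}} \approx \frac{2.55}{46} \approx 0.055
$$

That is roughly 5.5%, in line with the "approximately 5%" stated.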
Key findings
The disease distribution in the myGP consumer population (35.2% inflammatory, 32.2% benign tumours, 22.9% infectious, 8.3% malignant, 1.4% genital) is consistent with expected population-level skin disease epidemiology. The over-representation of benign tumours such as seborrhoeic keratosis reflects a population of health-conscious users seeking reassurance, directly supporting Autoderm’s intended use as a signposting tool that builds patient awareness and directs them to appropriate care.
9.4 User testimonials
| Technical Integration | Clinical Practice |
| Pardeep Kaushik, Chief Technology Officer, MedinyX Technologies GmbH: “The API is well documented, our engineers took a few hours to integrate it, any updates you guys come with, takes us less than 10 minutes to deploy.” | Dr Ulf Österstad, Operations Manager, Bra Liv nära, Sweden: “Autoderm is a perfect example of how AI should support providers. It works instantly and highlights possibilities I may not have initially considered, helping guide our clinical workflow.” |
9.5 PMS vigilance monitoring framework
As part of the ISO 13485:2021 QMS, Autoderm has established a programme of post-market surveillance to identify, investigate, and reduce to an acceptable level any risks associated with Autoderm’s quality and performance. Ongoing PMS vigilance monitoring analyses clinical data to:
- Confirm the safety and performance of Autoderm
- Confirm the continued acceptability of identified risks
- Identify previously unknown side-effects or emerging risks
- Ensure the continued acceptability of the benefit-risk ratio
- Identify possible systematic misuse or off-label use
Adverse event database searches
Section 11 of the CER summarises adverse event searches across: MHRA Medical Device Alerts and Field Safety Notices (UK); BfArM (Germany); FDA MAUDE database (US); and ClinicalTrials.gov. No alerts related to the safety of Autoderm were identified across any database.
9.6 Planned PMCF studies
Additional PMCF studies comparing primary care professionals with and without Autoderm are planned. Data from these studies will be presented in the next CER update and will be used to:
- Gather clinical data derived from use of Autoderm according to its intended use
- Further substantiate the clinical benefit of the device
- Gather usability feedback from end users (lay persons, GPs, and dermatologists) to identify new risks
- Gather real clinical data on less common dermatological conditions
10. Regulatory status & certifications
| Designation | Details | Significance |
| CE Marking, MDD Class I (legacy, transitioning to MDR Class IIa) | EU MDD 93/42/EEC, Class I legacy device. Transitioning to MDR 2017/745, Class IIa. ISO 13485:2021 certified. | Enables commercial deployment across EU/EEA under MDD legacy provisions. MDR Class IIa technical file submission planned for 2026. |
| FDA Breakthrough Device Designation | Granted for AI-powered dermatology screening. Expedited FDA review pathway. | Accelerated US market entry. Signals FDA recognition of clinical unmet need. |
| ISO 13485:2021 | Quality Management System. IEC 62304 software lifecycle compliance. | Foundation for all regulatory submissions. |
11. Evidence strength assessment
Overall Assessment: SUFFICIENT for MDR Class IIa transition, subject to addressing documented gaps.
| Evidence Category | Status | Rationale |
| Clinical performance data | ✓ Available | 5 peer-reviewed studies (Zhu 2023, Lu 2022, Escalé-Besa 2023, Zaar 2020, Kamulegeya 2023) + 5 white papers providing consistent performance data across 6 countries. |
| Human-AI team performance | ✓ Available | 3 reader studies (CatSalut Spain, UK GPs, Medical Students) all demonstrate AI + clinician outperforms clinician alone. Directly supports CDST intended use. |
| Safety data | ✓ Available | 2M+ API calls, zero adverse events since 2018. Strongest post-market safety evidence among comparable AI dermatology devices. |
| Demographic diversity | ✓ Available | Fitzpatrick17K bias evaluation (26 conditions, FST I-VI) — no systemic bias detected. Asian validation (Zhu 2023, n=920). Exploratory African evaluation (Kamulegeya 2023, V0.1). International studies across Spain, Sweden, China, UK, Uganda. |
| Algorithm progression | ✓ Available | Documented V0.1 (2017) → V2.3.0 (current) with measurable accuracy improvements at each version. V0.1 bias on dark skin identified and addressed. |
| PMCF framework | ✓ Available | Three completed PMCF studies: Visiba Care 2021 (n=1,092, formal MDR Annex XIV report), Boots UK GP Survey 2024 (n=10 GPs, randomized), and myGP PMS Report 2024 (18,359 analyses, 370K+ cumulative). PMS vigilance framework active. Zero adverse events across all database searches (MHRA, BfArM, MAUDE). Additional studies planned. |
The two remaining critical gaps for MDR submission are: (1) reconciling the intended purpose language between IFU V3.11 and the 2022 CER, which is a documentation decision rather than an evidence deficiency; and (2) the equivalence pathway pivot, under which Autoderm should proceed on the basis of its own clinical data rather than the weak SkinVision equivalence claim in the 2022 CER. Both gaps are on track for resolution by June 2026 per the MDR Evidence Mapping roadmap.
Appendix: Full publication & study references
Peer-reviewed publications
- Zhu Y, Lu F, Syed Mohammad Nooruddin M, Liu X, Li X, Yu J, Dong H. Evaluation of the performance of a 34-layer ResNet model-based artificial intelligence application, in the diagnosis of skin diseases. Chinese Journal of Dermatology. 2023. DOI: 10.35541/cjd.20220925. PMID: 37032592. Data collected: Feb 16 – Jul 4, 2022.
- Lu Feng, Liu Xin, Zhu Yajie, Li Xiaohong, Yu Jianbin, Dong Huiting. Accuracy of Artificial Intelligence Multi-class Algorithm in the Diagnosis of Common Skin Diseases. Henan Medical Research. 2022;31(8):1387-1392. DOI: 10.3969/j.issn.1004-437X.2022.08.010. Data collected: Feb – Aug 2021.
- Escalé-Besa A et al. Using artificial intelligence to improve the diagnostic and management capabilities of primary care physicians in skin lesions with a risk of malignancy. Scientific Reports. 2023. DOI: 10.1038/s41598-023-31340-1
- Zaar O et al. Comparison of artificial intelligence and dermatologists for the diagnosis of skin disease. Acta Dermato-Venereologica. 2020.
- Kamulegeya L et al. Using artificial intelligence on dermatology conditions in Uganda: a case for diversity in training data sets for machine learning. African Health Sciences. 2023;23(2):753-763. PMC10782289. DOI: 10.4314/ahs.v23i2.86
White papers
- Boots UK GP Reader Study. Börve A et al. Autoderm Inc / Boots UK. November 2024. Autoderm V2.3. Randomized reader study, n=10 UK GPs, 20 cases. White paper. (Same study as PMCF Study 2, Section 9.2.)
- Medical Students Reader Study. September 2024. White paper.
- Fitzpatrick17K Bias Evaluation. Internal study. Autoderm V2.3.0. 26 conditions, FST I-VI. 2024.
- Jayawickrama M, Charalambous K, Börve A. AI dermatology in action: How its diagnostic accuracy compares to dermatologists (Coachella Study). White paper. May 29, 2025. Autoderm V2.2, n=91, 40 conditions.
- Börve A. Post Market Surveillance Report: myGP app. White paper. June 11, 2024. Autoderm V2.3, n=18,359 analyses (May 2024), 370K+ cumulative screened.
PMCF studies
- Visiba Care Post-Market Clinical Follow-Up Report. iDoc24 AB / Visiba Group AB. September 2021. Autoderm V2.0, n=1,092 observations, 20 ICD-10 coded conditions. EU MDR Annex XIV.
- Boots UK GP Survey: see white paper #6 above (same study).
Historical partnership evidence
- Cooperation Agreement: iDoc24 AB / Medicoo Svenska AB (April 2018). First commercial deployment, pre-V0.1, Sweden.
- Service Trial Agreement & DPA: iDoc24 Inc / Visiba Group AB (December 2019). Pre-V0.1 production model, Sweden/Norway/Finland/UK.