Template: SOP Machine Learning Model Development (AI)

1. General Information

This Standard Operating Procedure (SOP) outlines the approach we use for developing, training, and updating machine learning (ML) models for integration into our medical devices. Adhering to this process helps ensure that ML models are deployed and updated as intended.

Process Owner: CTO
Key Performance Indicators (KPI):

Regulatory references:

ISO 13485:2016 Chapter 7.5

Relevant other documentation:

  • SOP Integrated Software Development
  • Intended use
  • Data acquisition instructions
  • Data annotation instructions
  • Software requirements list
  • Applicable regulations list
  • Medical device list
  • Algorithm Validation Report

1.1. Development and Integration Requirements

When establishing the requirements for an ML model, it is crucial to consider the overall medical device requirements:

  • General Medical Device Information:
    Includes intended use and a description of the device.
  • List of Software Requirements:
    Specifies key performance requirements for the ML model, such as maximum latency, maximum response time, and minimum accuracy.
  • Software Development and Maintenance Plan:
    Outlines hardware constraints and interoperability standards.
  • List of Applicable Regulations:
    May include requirements for data privacy, IT security, and compliance.

1.2. Project Management Tool

[Project management tool] is used to track the integration process and version control for ML projects. It also documents details of the ML model architecture, such as hyperparameters and evaluation metrics.
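
As an illustration of what a tracked entry might contain, the following minimal sketch serializes one model version with its hyperparameters and evaluation metrics. The class name, field names, and all values are hypothetical placeholders, not a prescribed schema and not real results.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ModelRecord:
    """One tracked ML model version (all field names are illustrative)."""
    model_name: str
    version: str
    hyperparameters: dict
    evaluation_metrics: dict


def to_json(record: ModelRecord) -> str:
    # sort_keys gives a stable text form, so two versions can be diffed easily
    return json.dumps(asdict(record), sort_keys=True, indent=2)


# Placeholder values for illustration only, not measured results
record = ModelRecord(
    model_name="example-classifier",
    version="0.1.0",
    hyperparameters={"learning_rate": 0.001, "epochs": 20},
    evaluation_metrics={"accuracy": 0.93, "f1": 0.91},
)
serialized = to_json(record)
```

The serialized text can be attached to the corresponding entry in whichever tool the organization uses.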

2. Process Overview

2.1. Configuration of Development Environment

The development environment is where the creation, enhancement, and validation of ML models occur. It must be separate from the production environment to ensure a controlled deployment process that includes model evaluation, testing, refinement, and release.

The development server acts as a testing environment where developers and data scientists can experiment with new ML features and test them before they move to production.

Participants: ML developer
Input: ML development infrastructure (e.g., cloud server)
Output: Configured development environment

2.2. Define Instructions on Data Acquisition and Annotation

Based on the initial device description from the SOP Integrated Software Development, the Medical team gathers and records relevant medical background information.

Instructions for Data Acquisition

The Operations and Machine Learning teams create instructions for data acquisition necessary for ML model training and development. These instructions should detail:

  • Type of data (e.g., patient population, medical modality, data format, etc.)
  • Required data volume and backup measures
  • Data sources (e.g., outpatient clinics, hospital information systems)

These instructions outline dataset composition, size, and other technical specifications, including the rationale for their suitability for model development. They must be detailed enough to guide employees in acquiring the correct data and serve as evidence for validation. Approval by the CTO is required.

Instructions for Data Annotation

The Medical and Machine Learning teams prepare instructions for data annotation. These should specify how datasets must be labeled by medical experts, including the style, accuracy, and format of annotations. The instructions must provide enough detail for evaluation and quality checks. The rationale behind label selection and the methodology for establishing ground truth should be documented. Approval is needed from at least the CTO, QMO, and a medical expert.

Both the data acquisition and annotation instructions should be monitored for relevance and completeness, with updates made as needed.

Participants: Operations team, Medical team, ML team
Input: Device description
Output: Instructions for Data Acquisition, Instructions for Data Annotation

2.3. Collection and Annotation of Data

The Operations team is responsible for obtaining data as per the Data Acquisition instructions from partner organizations.

The Operations team must ensure that all partners are aware of the legal and data privacy requirements. Data exports must be approved by the data protection officers of both the sending and the receiving entity.

If data has been previously annotated, it must be reviewed against the Data Annotation instructions. Data not meeting the standards is discarded. For raw data, the Operations team must hire medical experts to perform the required annotation according to the instructions. The annotation workforce must be contracted according to the organization’s procurement procedures.
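
The review of previously annotated data could be partly automated. The sketch below assumes annotations arrive as dictionaries. The required fields and the label set are hypothetical stand-ins for whatever the Data Annotation instructions actually specify.

```python
# Hypothetical rules; the real ones come from the Data Annotation instructions
ALLOWED_LABELS = {"benign", "malignant"}
REQUIRED_FIELDS = ("sample_id", "label", "annotator_id")


def check_annotation(record: dict) -> list:
    """Return a list of problems found in one annotation record."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not record.get(name):
            problems.append(f"missing field: {name}")
    label = record.get("label")
    if label and label not in ALLOWED_LABELS:
        problems.append(f"unknown label: {label}")
    return problems


def split_by_conformity(records):
    """Separate conforming records from those to be discarded."""
    accepted, rejected = [], []
    for record in records:
        problems = check_annotation(record)
        if problems:
            rejected.append((record, problems))  # keep the reasons for the audit trail
        else:
            accepted.append(record)
    return accepted, rejected
```

A check like this only covers format and completeness. Whether a label is medically correct still requires review by a medical expert.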

Participants: Operations team
Input: Instructions for Data Acquisition, Instructions for Data Annotation, SOP Purchasing (if applicable)
Output: Annotated data

2.4. Data Pre-Processing

After data collection, the Machine Learning team conducts necessary pre-processing steps such as data cleaning, normalization, and/or feature extraction to create a refined dataset as specified in the Data Acquisition instructions. This may include:

  • Conducting exploratory data analysis (EDA) to understand data distribution, detect missing values, and identify outliers.
  • Addressing missing data through imputation or removal while considering patient privacy and data integrity.
  • Handling outliers and anomalies using statistical techniques and domain expertise.
  • Normalizing or scaling data to ensure consistency and reduce the impact of feature scale differences.
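
The steps above can be sketched for a single numeric feature using only the Python standard library. Median imputation, Tukey's interquartile-range rule, and min-max scaling are chosen here purely as examples. This SOP does not prescribe them, and the function names are hypothetical.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outlier_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def min_max_scale(values):
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]
```

The outlier flags are meant as candidates for review by someone with domain expertise, not as an automatic removal rule.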

The final dataset is split into training, validation, and test sets, each stored separately. The split ratio and dataset locations are recorded in [project management tool] to enable replication if needed.
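
A reproducible split could look like the following sketch. It assumes each sample has a unique identifier. The 70/15/15 ratio, the seed, and the function names are illustrative assumptions rather than requirements of this SOP.

```python
import random


def split_dataset(sample_ids, ratios=(0.7, 0.15, 0.15), seed=42):
    """Shuffle deterministically, then cut into train/validation/test."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    ids = sorted(sample_ids)          # fixed starting order, independent of input order
    random.Random(seed).shuffle(ids)  # seeded shuffle makes the split repeatable
    n_train = round(len(ids) * ratios[0])
    n_val = round(len(ids) * ratios[1])
    return {
        "train": ids[:n_train],
        "validation": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


def split_manifest(splits, ratios, seed):
    """Summarize what has to be recorded to reproduce the split."""
    return {
        "ratios": list(ratios),
        "seed": seed,
        "sizes": {name: len(ids) for name, ids in splits.items()},
    }
```

Recording the seed together with the ratio means the same three sets can be regenerated from the same list of identifiers.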

Participants: Machine Learning team
Input: Annotated data
Output: Prepared dataset for model development

2.5. ML Model Development

The ML team selects suitable algorithms and techniques based on the development and integration requirements.

Considerations include interpretability, robustness, and performance.

The training dataset is used for model training, with performance evaluated on the validation dataset using metrics such as accuracy, precision, recall, or F1 score. The model is iteratively refined by adjusting algorithms, features, or hyperparameters based on these results.
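
For a binary classifier, the metrics named above follow directly from the confusion matrix counts. The sketch below is a plain-Python illustration with hypothetical function names. An established ML library would normally be used in practice.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 score."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Which metric matters most depends on the intended use. For example, recall reflects how many positive cases are found, which is often the priority when a missed finding is costly.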

Final performance testing is conducted on an independent dataset or using cross-validation techniques. Pass/fail criteria are set before testing. Additional validations, including cross-validation or external assessments, may be performed to confirm model generalizability.
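
One way to make the pass/fail criteria explicit is to fix them as data before the test set is touched and then compare the measured metrics against them. The thresholds and names below are hypothetical examples, not values required by this SOP.

```python
# Hypothetical minimum values, agreed and recorded before final testing
ACCEPTANCE_CRITERIA = {"recall": 0.90, "precision": 0.85}


def evaluate_against_criteria(metrics, criteria):
    """Return (overall_pass, per_metric_result) for the given metrics."""
    results = {}
    for name, minimum in criteria.items():
        if name not in metrics:
            raise KeyError(f"metric not reported: {name}")
        results[name] = metrics[name] >= minimum
    return all(results.values()), results
```

Keeping the criteria separate from the evaluation code makes it easier to show that they were not adjusted after the results were known.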

The best-performing model is chosen based on evaluation metrics, balancing accuracy, interpretability, and clinical importance. The Machine Learning team then compiles the Algorithm Validation Report, which includes:

  • Details of development resources (e.g., team qualifications, infrastructure, SOUP, and frameworks used)
  • Information on dataset acquisition, annotation, and pre-processing
  • A description of the final model, including architecture, hyperparameters, evaluation metrics, and limitations

Participants: Machine Learning team
Input: Final dataset for model development
Output: Algorithm Validation Report

2.6. Production Deployment of ML Model

The finalized ML model is prepared for deployment and integration into the medical device, ensuring compatibility with memory, processing capabilities, and real-time processing needs.
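
A simple check of the real-time processing requirement is to time the model on representative inputs and compare a high percentile of the latencies with the budget from the software requirements. In this sketch, `predict` stands for any callable that wraps the model. The 95th percentile and all names are assumptions.

```python
import math
import time


def measure_latencies(predict, inputs):
    """Time each call to predict and return the durations in seconds."""
    latencies = []
    for item in inputs:
        start = time.perf_counter()
        predict(item)
        latencies.append(time.perf_counter() - start)
    return latencies


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def meets_latency_budget(latencies, budget_seconds, q=95):
    """True if the q-th percentile latency is within the budget."""
    return percentile(latencies, q) <= budget_seconds
```

Measurements are only meaningful when taken on hardware comparable to the target device, since latency on a development server can differ considerably.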

The ML team configures the production environment, which hosts the deployed ML models for real-time data processing as part of the medical device. Detailed deployment instructions can be found in SOP Deployment.

Participants: ML developer
Input: ML development infrastructure (e.g., cloud server)
Output: Configured production environment

2.7. Handling of Updates

ML model updates are initiated through the SOP Change Management and SOP Integrated Software Development.

New data is continuously collected and analyzed to keep the model relevant and aligned with the current state of the art.

The ML model is regularly re-evaluated and, where necessary, updated by repeating the steps outlined above.
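
A recurring evaluation could compare metrics computed on newly collected data with the values documented at release and flag any notable drop. The tolerance of 0.02 and the function name in this sketch are hypothetical.

```python
def performance_drops(baseline, current, max_drop=0.02):
    """Return metrics whose value fell by more than max_drop versus baseline."""
    drops = {}
    for name, reference in baseline.items():
        if name in current and reference - current[name] > max_drop:
            drops[name] = round(reference - current[name], 6)
    return drops
```

A non-empty result would be a reason to raise the issue through the change process rather than to retrain automatically.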

Participants: Machine Learning team
Input: Updated ML model requirements
Output: Updated ML model

This template is copyrighted by fdatoday.com and is used under their template license. Kindly retain this notice, even if you make modifications to the contents of the template. 

fdatoday.com templates are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
