MLSECOPS: Secure your Large Language Model (LLM) applications

Lê Thành Phúc
6 min readAug 2, 2024

--

In today’s era of advanced technology, machine learning (ML) systems, particularly large language models (LLMs), are revolutionizing various industries. However, with the increasing adoption of these systems, ensuring their security, reliability, and integrity is paramount. ML SecOps (Machine Learning Security Operations) provides a comprehensive approach to integrating security throughout the ML lifecycle. Built on the principles of DevSecOps, the ML SecOps pipeline aims to seamlessly incorporate security practices into every stage of the machine learning workflow, from data collection and model training to deployment and monitoring. Similar to my article on DevSecOps, I have also conducted research on protecting my LLM applications during my master’s thesis. I’m excited to share it with you now.

Proposed MLO Pipeline

A Journey MLSecOps Pipeline

Overview

The MLSecOps pipeline for Large Language Models (LLMs) comprises 10 stages, each designed to ensure the security, integrity, and quality of the machine learning models throughout their lifecycle. Here’s a brief overview of each stage and its purpose:

This article will not delve into the specifics of training large language models (LLMs), including data preprocessing, classification and labeling, configuring training hyper-parameters, and evaluating the model. Perhaps in a future article, I will explore the topic of LLM applications within the security field.

Trigger Pipeline:

  • Events: Merge, Commit, Timer, Jenkins job.
  • Purpose: Automatically initiates the pipeline based on predefined events to maintain continuous integration and deployment.

Load Artifact:

  • Components: Datasets, Pre-trained models, Jupiter Notebook, Library requirements.
  • Purpose: To load the necessary artifacts into the system for the preparation of training and evaluating the model.

Store Artifact:

  • Locations: S3 bucket, Nexus, FTP.
  • Purpose: Artifacts are stored in locations such as S3 buckets, Nexus, or via FTP for future reference and access.

Security and Quality Tools:

  • Tools: Gitleak, Sonarqube, Trivy, OWASP Dependency-Check, NB Defense, OpenPubKey, compliance-checker or custom script for validation and compliance check on dataset.
  • Purpose: Various tools are utilized to scan and test the code and its dependencies for vulnerabilities and quality issues. Additionally, testing is conducted to ensure the training dataset complies with policies and standards.

Quality Gate:

  • Policies and Rules: Ensure compliance with security and quality standards.
  • Purpose: Applies policies and rules to ensure only compliant artifacts proceed further.

Train Model:

  • Technical: Early stopping, KFold.
  • Purpose: Train machine learning models using robust techniques to ensure accuracy and reliability.

Evaluation Model:

  • Metrics: Scoring, Threshold.
  • Purpose: Evaluates the trained model against predefined metrics and thresholds to ensure it meets performance criteria.

Model Testing:

  • Focus: OWASP Top 10 for LLM App, Prompt Injection, Data Poisoning, Sensitive Information Disclosure.
  • Purpose: Conducts comprehensive security testing to identify and mitigate vulnerabilities.

Model Quality Gate:

  • Policies and Rules: Reapply to ensure the model’s quality before final steps.
  • Purpose: Ensure the model adheres to all quality standards before signing.

Sign Model:

  • Artifacts: model.h5, tokenizer.pickle.
  • Purpose: Signs the model and tokenizer to verify their integrity and origin.

Save Model:

  • Locations: S3 bucket, Nexus, FTP.
  • Purpose: Stores the signed model and tokenizer securely.

Monitoring & Key Management:

  • Monitoring Tools: Slack, Telegram, Defect Dojo, ELK, Grafana.
  • Key Management Tools: Hashicorp Vault, KMS, Secret Manager.
  • Purpose: Continuously monitors the pipeline for vulnerabilities and manages security keys throughout all stages.

Detailed Analysis

Stage 1 — Trigger the Pipeline

The pipeline trigger when:

  • Merge request
  • Commit code
  • Timer
  • Jenkins job

At this stage, the pipeline is triggered by specific events. These events include merge requests, code commits, scheduled times (timers), or the execution of a Jenkins job.

Stage 2 — Load Artifact

Load artifact:

  • Datasets
  • Pre-trained models
  • Notebook
  • Library requirements

At this stage, all necessary components are loaded to ensure the model can be trained and evaluated. These components include datasets, pre-trained models, notebooks, and library requirements.

Store Artifacts:

  • AWS S3 bucket
  • Nexus
  • FTP

Artifacts are stored in solutions such as AWS S3 buckets, Nexus repositories, or via FTP. It’s crucial to maintain a whitelist for external libraries and components used in training, keep them on our system, and conduct regular vulnerability assessments. Directly downloading these from the internet is not recommended.

Stage 3 — Testing Tools

Tools:

  • Gitleak
  • Sonarqube
  • Trivy
  • OWASP dependency-check
  • NB Defense
  • compliance-checker
  • Custom script
  • OpenPubKey

At this phase, conducting security testing on all artifacts before training is essential. These tools are utilized to scrutinize the source code, pinpoint potential security risks, evaluate code quality, and verify their integrity and origin, as well as ensure compliance with policies and standards.

Compiance-checker or Custom scripts will be utilized to conduct checks on training datasets, ensuring the absence of sensitive information. The following are essential categories of information to scrutinize:

A table of information categories to consider.

Stage 4 — Quality Gate

Gate:

  • Policy/Rule check

Implement policies and regulations to guarantee that only code and artifacts meeting the established requirements are allowed to advance to subsequent stages.

It’s widely recognized that training data is crucial for machine learning models. Ensuring this data does not contain sensitive or personally identifiable information (PII) is paramount. Thus, the Quality Gate process will mirror DevSecOps, augmented by a compliance check of the training data prior to its use in model training.

Quality Gate — PASS = Security AND Compliance

Stage 5 — Train Model

Technical:

  • Early stopping
  • Kfold

Training models using techniques like early stopping and KFold cross-validation is essential for ensuring robustness and accuracy in model performance. These methods are among the numerous training strategies that contribute to achieving optimal performance.

Regularization by Early Stopping — GeeksforGeeks
3.1. Cross-validation: evaluating estimator performance — scikit-learn 1.5.1 documentation

Stage 6 — Evaluation Model

Metrics:

  • Metrics/Scoring
  • Threshold

The model will be evaluated using a range of metrics and scoring methods to ensure it satisfies the required performance criteria. The appropriate thresholds for the newly trained model will be determined based on these metrics.

ML model evaluation metrics

Stage 7 — Final Testing

Standard & Technical:

  • OWASP Top 10 for LLM App
  • Prompt Injection
  • Data Poisoning
  • Sensitive Information Disclosure

Tools:

  • modelscan
  • Vigil
  • Garak

Conducting comprehensive security tests is crucial to identify vulnerabilities like prompt injections, data poisoning, and sensitive information disclosures. Specialized tools are employed to examine the model for a range of security vulnerabilities to ensure it adheres to security standards.

Stage 8 — Final Quality Gate

Gate:

  • Policy/Rule check

Ensure the final product meets all quality standards prior to its release for use.

Quality Gate — PASS = Security

Stage 9 — Sign Model

Signing:

  • Signing trained mode is crucial to ensure their integrity and origin.

Tools:

  • OpenPubKey

Stage 10 — Store Signed Model

Store Artifacts:

  • AWS S3 bucket
  • Nexus
  • FTP

Save the signed model in the storage systems.

Key Management

Tools:

  • Hasicorp Vault
  • AWS Secret Manager
  • AWS Key Management Service

Managing security keys, including secret keys, public keys, and private keys, is crucial for protecting sensitive data at all stages.

Monitoring

Tools:

  • Telegram/ Slack
  • Defect Dojo
  • ELK
  • PagerDuty
  • Prometheus & Grafana

Notifications, vulnerability management, and event/logging should be implemented across all stages to effectively monitor and manage vulnerabilities.

To enhance security for larger language models (LLM) using an MLSecOps pipeline, this structured approach ensures that every phase of the machine learning model lifecycle is secured. From development and testing to training and monitoring, this method enhances the overall robustness and security of ML systems. By integrating security best practices at each stage, potential vulnerabilities are identified and mitigated early, ensuring that the LLMs operate within a secure and resilient framework. This approach not only protects the models themselves but also safeguards the data and infrastructure they rely on, providing a comprehensive security posture for advanced machine learning applications.

Contribution

We welcome contributions from the community to help us expand and improve this MLSecOps pipeline and repository. If you have suggestions, tools, or resources that you believe should be included, please feel free to submit a pull request or open an issue.

Resources:

Github Repository: MLSecOps-DevSecOps-Awesome

--

--

Lê Thành Phúc
Lê Thành Phúc

No responses yet