Skip to content

Vulnerability Scanning

Model vulnerability scanning is the process of inspecting a model for known attack vectors before it enters your environment. Like scanning container images or dependencies, this is a pre-deployment gate, not a one-time activity.

What you are scanning for

Backdoors and trojans

A backdoored model behaves normally on standard inputs but produces attacker-chosen outputs when a specific trigger is present. Triggers can be a particular word, phrase, pixel pattern, or data structure.

How they get introduced:

  • During pre-training by a malicious actor
  • During fine-tuning with poisoned data
  • Through model modification after training (weight manipulation)

Why they are dangerous:

  • The model passes standard evaluations and benchmarks
  • The trigger can be subtle and difficult to detect
  • Once deployed, the backdoor is active in production

Adversarial triggers

Unlike backdoors (which are intentionally implanted), adversarial triggers exploit natural model vulnerabilities. Specific inputs cause the model to produce incorrect, harmful, or unexpected outputs.

For LLMs, this includes:

  • Prompt injection attacks
  • Jailbreak sequences
  • Instruction-following bypasses
  • Context window manipulation

Serialisation exploits

As covered in Provenance and Integrity, certain model file formats can execute arbitrary code. Scanning includes checking:

  • Pickle files for embedded code execution
  • Model files for unexpected file references
  • Archive formats for path traversal attacks

Model extraction indicators

Signs that a model may be an unauthorised copy or extraction of another model:

  • Suspiciously similar outputs to a known commercial model
  • Model card claims that do not match observed capabilities
  • Licence violations in the model's lineage

Scanning approaches

Static analysis

Inspect the model artefact without running it.

Technique What it catches Tools
Format validation Serialisation exploits, malformed files fickling (pickle scanner), safetensors validation
Weight analysis Statistical anomalies suggesting tampering Custom scripts, pytorch inspection
Metadata inspection Missing or suspicious metadata Model card validators
Dependency audit Vulnerable framework versions Standard dependency scanners

Dynamic analysis

Run the model in a controlled environment and observe behaviour.

Technique What it catches Approach
Trigger scanning Known backdoor triggers Run model against trigger datasets
Behavioural testing Unexpected outputs, safety bypasses Red-team prompt sets, adversarial inputs
Differential testing Output divergence from expected behaviour Compare against reference model outputs
Stress testing Edge case failures, degradation patterns Extreme and unusual inputs

Continuous scanning

Scanning is not a one-time gate. Models should be rescanned when:

  • A new vulnerability class is discovered
  • Scanning tools are updated with new signatures
  • The model is modified (fine-tuning, quantisation, distillation)
  • The model is moved to a new environment

Building a scanning pipeline

A practical model scanning pipeline includes these stages:

Stage 1: Format and integrity

  • Verify file format (reject pickle from untrusted sources)
  • Verify cryptographic hashes
  • Validate model metadata and structure

Stage 2: Static analysis

  • Scan for serialisation exploits
  • Analyse weight distributions for anomalies
  • Check for known vulnerability signatures

Stage 3: Behavioural evaluation

  • Run against standard benchmark suite
  • Run against adversarial test suite
  • Run against organisation-specific test cases
  • Compare outputs against expected baselines

Stage 4: Documentation and approval

  • Record all scan results
  • Flag any findings that require review
  • Route to appropriate approver if findings are present
  • Block deployment if critical findings are unresolved

Scanning is necessary but not sufficient

No scanning tool catches every vulnerability. Scanning reduces risk; it does not eliminate it. Combine scanning with provenance verification, access controls, and runtime monitoring for defence in depth. Once deployed, runtime controls provide the next layer of protection.

Current tooling landscape

The model scanning ecosystem is still maturing. Key tools to evaluate:

Tool Focus Notes
fickling Pickle file analysis Detects code execution in pickle files
ModelScan ML model security scanning Scans for unsafe operations in model files
Rebuff LLM prompt injection detection Focuses on prompt-level attacks
Garak LLM vulnerability scanning Broad LLM vulnerability probe framework
NB Defence Jupyter notebook scanning Secures the development environment

Build your own test suites

Generic scanning tools are a starting point. Build organisation-specific test suites that reflect your use cases, threat model, and acceptable behaviour boundaries. These custom tests often catch more relevant issues than generic scanners.