Vulnerability Scanning¶

Model vulnerability scanning is the process of inspecting a model for known attack vectors before it enters your environment. Like scanning container images or dependencies, this is a pre-deployment gate, not a one-time activity.

What you are scanning for¶

Backdoors and trojans¶

A backdoored model behaves normally on standard inputs but produces attacker-chosen outputs when a specific trigger is present. Triggers can be a particular word, phrase, pixel pattern, or data structure.

How they get introduced:

During pre-training by a malicious actor
During fine-tuning with poisoned data
Through model modification after training (weight manipulation)

Why they are dangerous:

The model passes standard evaluations and benchmarks
The trigger can be subtle and difficult to detect
Once deployed, the backdoor is active in production

Adversarial triggers¶

Unlike backdoors (which are intentionally implanted), adversarial triggers exploit natural model vulnerabilities. Specific inputs cause the model to produce incorrect, harmful, or unexpected outputs.

For LLMs, this includes:

Prompt injection attacks
Jailbreak sequences
Instruction-following bypasses
Context window manipulation

Serialisation exploits¶

As covered in Provenance and Integrity, certain model file formats can execute arbitrary code. Scanning includes checking:

Pickle files for embedded code execution
Model files for unexpected file references
Archive formats for path traversal attacks

Model extraction indicators¶

Signs that a model may be an unauthorised copy or extraction of another model:

Suspiciously similar outputs to a known commercial model
Model card claims that do not match observed capabilities
Licence violations in the model's lineage

Scanning approaches¶

Static analysis¶

Inspect the model artefact without running it.

Technique	What it catches	Tools
Format validation	Serialisation exploits, malformed files	`fickling` (pickle scanner), `safetensors` validation
Weight analysis	Statistical anomalies suggesting tampering	Custom scripts, `pytorch` inspection
Metadata inspection	Missing or suspicious metadata	Model card validators
Dependency audit	Vulnerable framework versions	Standard dependency scanners

Dynamic analysis¶

Run the model in a controlled environment and observe behaviour.

Technique	What it catches	Approach
Trigger scanning	Known backdoor triggers	Run model against trigger datasets
Behavioural testing	Unexpected outputs, safety bypasses	Red-team prompt sets, adversarial inputs
Differential testing	Output divergence from expected behaviour	Compare against reference model outputs
Stress testing	Edge case failures, degradation patterns	Extreme and unusual inputs

Continuous scanning¶

Scanning is not a one-time gate. Models should be rescanned when:

A new vulnerability class is discovered
Scanning tools are updated with new signatures
The model is modified (fine-tuning, quantisation, distillation)
The model is moved to a new environment

Building a scanning pipeline¶

A practical model scanning pipeline includes these stages:

Stage 1: Format and integrity

Verify file format (reject pickle from untrusted sources)
Verify cryptographic hashes
Validate model metadata and structure

Stage 2: Static analysis

Scan for serialisation exploits
Analyse weight distributions for anomalies
Check for known vulnerability signatures

Stage 3: Behavioural evaluation

Run against standard benchmark suite
Run against adversarial test suite
Run against organisation-specific test cases
Compare outputs against expected baselines

Stage 4: Documentation and approval

Record all scan results
Flag any findings that require review
Route to appropriate approver if findings are present
Block deployment if critical findings are unresolved

Scanning is necessary but not sufficient

No scanning tool catches every vulnerability. Scanning reduces risk; it does not eliminate it. Combine scanning with provenance verification, access controls, and runtime monitoring for defence in depth. Once deployed, runtime controls provide the next layer of protection.

Current tooling landscape¶

The model scanning ecosystem is still maturing. Key tools to evaluate:

Tool	Focus	Notes
fickling	Pickle file analysis	Detects code execution in pickle files
ModelScan	ML model security scanning	Scans for unsafe operations in model files
Rebuff	LLM prompt injection detection	Focuses on prompt-level attacks
Garak	LLM vulnerability scanning	Broad LLM vulnerability probe framework
NB Defence	Jupyter notebook scanning	Secures the development environment

Build your own test suites

Generic scanning tools are a starting point. Build organisation-specific test suites that reflect your use cases, threat model, and acceptable behaviour boundaries. These custom tests often catch more relevant issues than generic scanners.

References