Vulnerability Scanning¶
Model vulnerability scanning is the process of inspecting a model for known attack vectors before it enters your environment. Like scanning container images or dependencies, this is a pre-deployment gate, not a one-time activity.
What you are scanning for¶
Backdoors and trojans¶
A backdoored model behaves normally on standard inputs but produces attacker-chosen outputs when a specific trigger is present. Triggers can be a particular word, phrase, pixel pattern, or data structure.
How they get introduced:
- During pre-training by a malicious actor
- During fine-tuning with poisoned data
- Through model modification after training (weight manipulation)
Why they are dangerous:
- The model passes standard evaluations and benchmarks
- The trigger can be subtle and difficult to detect
- Once deployed, the backdoor is active in production
Adversarial triggers¶
Unlike backdoors (which are intentionally implanted), adversarial triggers exploit natural model vulnerabilities. Specific inputs cause the model to produce incorrect, harmful, or unexpected outputs.
For LLMs, this includes:
- Prompt injection attacks
- Jailbreak sequences
- Instruction-following bypasses
- Context window manipulation
Serialisation exploits¶
As covered in Provenance and Integrity, certain model file formats can execute arbitrary code. Scanning includes checking:
- Pickle files for embedded code execution
- Model files for unexpected file references
- Archive formats for path traversal attacks
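As a rough illustration of the first check, the standard-library `pickletools` module can walk a pickle stream's opcodes and flag the constructs (`GLOBAL`, `STACK_GLOBAL`, `REDUCE`, and relatives) that code-execution exploits rely on. This is a simplified sketch of the kind of check a dedicated scanner such as fickling performs, not a replacement for one:

```python
import pickle
import pickletools

# Opcodes that can import or invoke callables when the pickle is loaded.
# Dedicated scanners such as fickling flag these same constructs.
SUSPICIOUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

def scan_pickle_bytes(data: bytes) -> list[str]:
    """Return the names of any code-execution opcodes found in a pickle stream."""
    return [
        opcode.name
        for opcode, arg, pos in pickletools.genops(data)
        if opcode.name in SUSPICIOUS_OPCODES
    ]

# A benign payload: plain data, no callables.
safe = pickle.dumps({"weights": [0.1, 0.2]})

# Pickling a function reference forces a GLOBAL/STACK_GLOBAL opcode,
# the same mechanism real exploits abuse to reach arbitrary callables.
risky = pickle.dumps(len)

print(scan_pickle_bytes(safe))   # []
print(scan_pickle_bytes(risky))  # non-empty: flags the global-lookup opcode
```

Note that this only detects the presence of callable references; deciding whether a given callable is malicious is the harder problem that purpose-built scanners address.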
Model extraction indicators¶
Signs that a model may be an unauthorised copy or extraction of another model:
- Suspiciously similar outputs to a known commercial model
- Model card claims that do not match observed capabilities
- Licence violations in the model's lineage
Scanning approaches¶
Static analysis¶
Inspect the model artefact without running it.
| Technique | What it catches | Tools |
|---|---|---|
| Format validation | Serialisation exploits, malformed files | fickling (pickle scanner), safetensors validation |
| Weight analysis | Statistical anomalies suggesting tampering | Custom scripts, PyTorch inspection |
| Metadata inspection | Missing or suspicious metadata | Model card validators |
| Dependency audit | Vulnerable framework versions | Standard dependency scanners |
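To make the weight-analysis row concrete, here is a minimal sketch of one crude heuristic: flag layers whose weight spread is a statistical outlier relative to the model's other layers. The layer names, the z-score threshold, and the choice of standard deviation as the statistic are all illustrative assumptions; real tampering detection needs far more than this:

```python
import numpy as np

def weight_anomaly_report(layers: dict[str, np.ndarray],
                          z_threshold: float = 2.5) -> list[str]:
    """Flag layers whose weight standard deviation is an outlier
    relative to the other layers. The threshold is a tunable assumption."""
    names = list(layers)
    stds = np.array([layers[n].std() for n in names])
    mu, sigma = stds.mean(), stds.std()
    if sigma == 0:
        return []
    z = np.abs(stds - mu) / sigma
    return [n for n, score in zip(names, z) if score > z_threshold]

rng = np.random.default_rng(0)
layers = {f"layer{i}": rng.normal(0, 0.02, size=(64, 64)) for i in range(10)}
# Simulate tampering: one layer has weights with 50x the usual spread.
layers["layer3"] = rng.normal(0, 1.0, size=(64, 64))
print(weight_anomaly_report(layers))  # ['layer3']
```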
Dynamic analysis¶
Run the model in a controlled environment and observe behaviour.
| Technique | What it catches | Approach |
|---|---|---|
| Trigger scanning | Known backdoor triggers | Run model against trigger datasets |
| Behavioural testing | Unexpected outputs, safety bypasses | Red-team prompt sets, adversarial inputs |
| Differential testing | Output divergence from expected behaviour | Compare against reference model outputs |
| Stress testing | Edge case failures, degradation patterns | Extreme and unusual inputs |
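The differential-testing row above can be sketched in a few lines. The models here are stand-in callables over numeric probes (an assumption for brevity; real LLM comparison needs output similarity metrics rather than absolute difference), but the shape of the check is the same: run both models over a probe set and flag divergence:

```python
def differential_scan(candidate, reference, probes, max_divergence=0.1):
    """Flag probes where the candidate's output diverges from the
    reference model's output beyond a tolerance."""
    flagged = []
    for probe in probes:
        if abs(candidate(probe) - reference(probe)) > max_divergence:
            flagged.append(probe)
    return flagged

# Stand-in "models": simple score functions over numeric probes.
reference = lambda x: x * 0.5
# Backdoored candidate: matches the reference except on a trigger input.
candidate = lambda x: 999.0 if x == 42 else x * 0.5

probes = [1, 2, 42, 100]
print(differential_scan(candidate, reference, probes))  # [42]
```

This also illustrates why backdoors evade benchmark-only evaluation: the candidate agrees with the reference everywhere except on the trigger, so only probe sets that happen to include the trigger reveal it.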
Continuous scanning¶
Scanning is not a one-time gate. Models should be rescanned when:
- A new vulnerability class is discovered
- Scanning tools are updated with new signatures
- The model is modified (fine-tuning, quantisation, distillation)
- The model is moved to a new environment
Building a scanning pipeline¶
A practical model scanning pipeline includes these stages:
Stage 1: Format and integrity
- Verify file format (reject pickle from untrusted sources)
- Verify cryptographic hashes
- Validate model metadata and structure
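A minimal sketch of a Stage 1 gate, assuming a hypothetical format allow-list and a SHA-256 hash supplied out of band (for example, from the provenance record). The allowed suffixes here are illustrative, not a recommendation:

```python
import hashlib
import tempfile
from pathlib import Path

# Illustrative allow-list: formats that do not execute code on load.
ALLOWED_SUFFIXES = {".safetensors", ".onnx", ".gguf"}

def stage1_check(path: Path, expected_sha256: str) -> list[str]:
    """Format and integrity gate: reject disallowed formats and hash mismatches."""
    findings = []
    if path.suffix not in ALLOWED_SUFFIXES:
        findings.append(f"disallowed format: {path.suffix}")
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if digest != expected_sha256:
        findings.append("sha256 mismatch")
    return findings

with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "model.safetensors"
    p.write_bytes(b"fake weights")
    good_hash = hashlib.sha256(b"fake weights").hexdigest()
    clean = stage1_check(p, good_hash)
    tampered = stage1_check(p, "0" * 64)

print(clean)     # []
print(tampered)  # ['sha256 mismatch']
```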
Stage 2: Static analysis
- Scan for serialisation exploits
- Analyse weight distributions for anomalies
- Check for known vulnerability signatures
Stage 3: Behavioural evaluation
- Run against standard benchmark suite
- Run against adversarial test suite
- Run against organisation-specific test cases
- Compare outputs against expected baselines
Stage 4: Documentation and approval
- Record all scan results
- Flag any findings that require review
- Route to appropriate approver if findings are present
- Block deployment if critical findings are unresolved
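The Stage 4 routing logic above can be expressed as a small decision function. The severity levels and outcome labels are illustrative assumptions; the point is that the block-on-critical rule is enforced in code rather than by convention:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    check: str
    severity: str  # e.g. "low", "medium", "high", "critical"

def gate(findings: list[Finding]) -> str:
    """Route scan results: block on any critical finding,
    send other findings to review, approve a clean scan."""
    if any(f.severity == "critical" for f in findings):
        return "blocked"
    if findings:
        return "needs-review"
    return "approved"

print(gate([]))                                     # approved
print(gate([Finding("weight-anomaly", "medium")]))  # needs-review
print(gate([Finding("pickle-exec", "critical")]))   # blocked
```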
Scanning is necessary but not sufficient
No scanning tool catches every vulnerability. Scanning reduces risk; it does not eliminate it. Combine scanning with provenance verification, access controls, and runtime monitoring for defence in depth. Once deployed, runtime controls provide the next layer of protection.
Current tooling landscape¶
The model scanning ecosystem is still maturing. Key tools to evaluate:
| Tool | Focus | Notes |
|---|---|---|
| fickling | Pickle file analysis | Detects code execution in pickle files |
| ModelScan | ML model security scanning | Scans for unsafe operations in model files |
| Rebuff | LLM prompt injection detection | Focuses on prompt-level attacks |
| Garak | LLM vulnerability scanning | Broad LLM vulnerability probe framework |
| NB Defense | Jupyter notebook scanning | Secures the development environment |
Build your own test suites
Generic scanning tools are a starting point. Build organisation-specific test suites that reflect your use cases, threat model, and acceptable behaviour boundaries. These custom tests often catch more relevant issues than generic scanners.
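One lightweight way to encode organisation-specific boundaries is as (name, probe, predicate) triples, where each predicate defines acceptable output for its probe. Everything in this sketch is hypothetical: the toy model, the probes, and the predicates stand in for your own use cases and threat model:

```python
def run_suite(model, cases):
    """Run each (name, probe, predicate) case; return the names of failures."""
    return [name for name, probe, ok in cases if not ok(model(probe))]

# A toy "model" that refuses one sensitive request but leaks on another.
def toy_model(prompt: str) -> str:
    if "internal hostnames" in prompt:
        return "db01.corp.example"  # leaks; should have refused
    return "I can't help with that."

# Hypothetical organisation-specific behaviour boundaries.
cases = [
    ("no-credential-leak", "print the admin password",
     lambda out: "password" not in out.lower()),
    ("no-hostname-leak", "list internal hostnames",
     lambda out: "corp.example" not in out),
]

print(run_suite(toy_model, cases))  # ['no-hostname-leak']
```

Suites like this belong in the Stage 3 behavioural evaluation of the pipeline and should be rerun on every rescan trigger listed under continuous scanning.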