Model Lifecycle

A model's lifecycle runs from its creation through production use to eventual retirement. Each stage transition is a decision point where security controls must apply. Without lifecycle management, you lose track of what is deployed, what was approved, and what should be retired.

Lifecycle stages

Training → Evaluation → Registration → Staging → Production → Retirement
              ↑                           ↑          ↑
          Validation                  Approval    Monitoring
            gate                       gate       (runtime)

Training output

A training run produces a model artefact. At this point, the model is an unvalidated output, not yet trusted for any purpose.

Required before moving forward:

  • Training run attestation recorded (inputs, environment, outputs)
  • Model artefact hash computed and stored
  • Training metrics logged (loss, accuracy, training curves)
  • No errors or anomalies in training logs
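The hash and attestation steps above can be sketched as follows. This is a minimal illustration, not a fixed schema: the function names, JSON field names, and the shape of the attestation record are all assumptions to adapt to your own registry.

```python
import hashlib
from datetime import datetime, timezone

def hash_artefact(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 digest of the model artefact file, streaming in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def record_attestation(artefact_hash: str, inputs: dict, metrics: dict) -> dict:
    """Bundle the run's inputs, outputs, and a timestamp into an attestation record."""
    return {
        "artefact_sha256": artefact_hash,
        "inputs": inputs,    # e.g. dataset versions, code commit, base model
        "metrics": metrics,  # e.g. final loss, accuracy
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
```

Streaming the file in chunks keeps memory flat even for multi-gigabyte artefacts.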

Evaluation

The model is tested against validation datasets, safety benchmarks, and adversarial test suites.

Required before moving forward:

  • Performance meets minimum thresholds for the target use case
  • Safety evaluations pass (bias, toxicity, refusal rates)
  • Adversarial tests pass (prompt injection resistance, jailbreak resistance)
  • Regression tests pass (no degradation from current production model)
  • Evaluation results stored as immutable artefacts
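A threshold gate like the one above might look like this sketch. The metric names are illustrative, and for simplicity every metric is treated as higher-is-better; invert lower-is-better metrics (e.g. toxicity rate) before calling.

```python
def passes_evaluation_gate(results: dict[str, float],
                           minimums: dict[str, float]) -> tuple[bool, list[str]]:
    """Compare evaluation metrics against minimum thresholds.

    A missing metric counts as a failure: an evaluation that was not run
    must not silently pass the gate.
    """
    failures = [
        name for name, minimum in minimums.items()
        if results.get(name, float("-inf")) < minimum
    ]
    return (not failures, failures)
```

Returning the list of failing metrics, not just a boolean, gives the failure report something concrete to store alongside the immutable evaluation artefacts.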

Evaluation must be independent

The team that trained the model should not be the only team evaluating it. Independent evaluation, whether by a separate team, automated pipeline, or both, reduces the risk of confirmation bias and missed issues.

Registration

An evaluated model is registered in the model registry with its metadata, evaluation results, and provenance information.

The registry should store:

Field                   Purpose
Model name and version  Unique identification
Model hash              Integrity verification
Training attestation    Provenance and reproducibility
Evaluation results      Evidence of fitness for purpose
Model card              Documentation of capabilities, limitations, intended use
Licence                 Legal and compliance information
Owner                   Who is responsible for this model
Status                  Current lifecycle stage (registered, staging, production, retired)
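As a sketch, a registry record mirroring the fields above could be a simple dataclass; the exact schema is an assumption to align with whatever registry you use.

```python
from dataclasses import dataclass

@dataclass
class RegistryEntry:
    """One model registry record; field names mirror the table above."""
    name: str
    version: str
    model_hash: str             # integrity verification
    training_attestation: dict  # provenance and reproducibility
    evaluation_results: dict    # evidence of fitness for purpose
    model_card: str             # capabilities, limitations, intended use
    licence: str
    owner: str
    status: str = "registered"  # registered | staging | production | retired
```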

Staging

The model is deployed to a staging environment that mirrors production. This validates that the model works correctly in the target infrastructure, not just in the evaluation environment.

What staging validates:

  • Model loads and serves correctly on production hardware
  • Latency and throughput meet requirements
  • Integration with upstream and downstream services works
  • Logging and monitoring are active and producing expected data
  • Rollback procedures work (deploy previous version, verify it serves correctly)
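The latency check above can be sketched as a small staging probe. `serve_fn` is a hypothetical stand-in for your serving client; swap in a real request call.

```python
import time

def p95_latency_ms(serve_fn, sample_inputs) -> float:
    """Send sample requests through the serving stack and return p95 latency (ms)."""
    timings = []
    for x in sample_inputs:
        start = time.perf_counter()
        serve_fn(x)  # hypothetical serving call
        timings.append((time.perf_counter() - start) * 1000)
    timings.sort()
    return timings[int(0.95 * (len(timings) - 1))]

def meets_latency_budget(serve_fn, sample_inputs, budget_ms: float) -> bool:
    """Gate check: staged model must fit the latency budget before promotion."""
    return p95_latency_ms(serve_fn, sample_inputs) <= budget_ms
```

Using p95 rather than the mean catches tail latency, which is usually what breaks downstream timeouts.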

Production approval

Moving from staging to production requires explicit approval. This is the final pre-runtime gate.

Approval should document:

  • Who approved the deployment and when
  • What evidence was reviewed (evaluation results, staging validation)
  • What risks were accepted
  • What monitoring and rollback plans are in place
  • When the next review is scheduled

Who approves depends on the risk tier:

Risk tier                               Approval authority
Low (internal tools, non-sensitive)     ML team lead
Medium (customer-facing, non-critical)  Product owner + ML lead
High (decision-making, regulated)       AI risk committee or equivalent
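A deployment pipeline can enforce the tier table mechanically. The role names below are illustrative; mirror your own org chart.

```python
# Required sign-off roles per risk tier (illustrative mapping of the table above).
APPROVAL_AUTHORITY = {
    "low": {"ml_team_lead"},
    "medium": {"product_owner", "ml_lead"},
    "high": {"ai_risk_committee"},
}

def approval_complete(risk_tier: str, sign_offs: set[str]) -> bool:
    """True once every role required for this tier has signed off."""
    return APPROVAL_AUTHORITY[risk_tier] <= sign_offs
```

Subset comparison (`<=`) means extra sign-offs never hurt, but a missing required role always blocks promotion.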

Production

The model is serving production traffic. From this point, runtime security takes over for monitoring, guardrails, and incident response.

Pre-runtime responsibilities that continue during production:

  • Monitoring for model drift (performance degradation over time)
  • Responding to newly discovered vulnerabilities in the model or its dependencies
  • Planning and executing model updates through the full lifecycle

Retirement

Models do not run forever. Retirement is a planned lifecycle stage, not an afterthought.

Retirement triggers:

  • A newer model replaces it
  • The use case is discontinued
  • A security vulnerability is discovered that cannot be mitigated
  • Regulatory changes make the model non-compliant
  • Performance has degraded below acceptable thresholds

Retirement process:

  • Traffic is migrated to the replacement model (or the service is decommissioned)
  • The model is removed from production serving
  • The model remains in the registry with a "retired" status for audit purposes
  • Associated resources (compute, storage) are cleaned up
  • Documentation is updated to reflect retirement
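The registry side of retirement might be sketched as a status flip rather than a delete; the registry layout (a dict keyed by name and version) is an assumption.

```python
from datetime import datetime, timezone

def retire_model(registry: dict, name: str, version: str, reason: str) -> None:
    """Mark a registry entry 'retired' instead of deleting it.

    The entry stays in the registry for audit; only serving resources
    are decommissioned elsewhere.
    """
    entry = registry[(name, version)]
    entry["status"] = "retired"
    entry["retirement_reason"] = reason
    entry["retired_at"] = datetime.now(timezone.utc).isoformat()
```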

Version management

Version numbering

Adopt a consistent versioning scheme. Semantic versioning adapted for models:

Change type                   Version bump  Example
Architecture change           Major         1.0.0 → 2.0.0
Retrained on new data         Minor         1.0.0 → 1.1.0
Configuration change          Patch         1.0.0 → 1.0.1
Quantisation or distillation  Qualifier     1.0.0 → 1.0.0-q4
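The bump rules in the table can be sketched as a small helper; the change-type labels are illustrative names, not a standard.

```python
def bump_version(version: str, change: str) -> str:
    """Apply the major/minor/patch bump rules from the table above."""
    base = version.split("-")[0]  # drop any existing qualifier such as "-q4"
    major, minor, patch = (int(p) for p in base.split("."))
    if change == "architecture":
        return f"{major + 1}.0.0"
    if change == "retrain":
        return f"{major}.{minor + 1}.0"
    if change == "config":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```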

What constitutes a new version

Every change that could affect model behaviour produces a new version:

  • New training data
  • Changed hyperparameters
  • Updated base model (for fine-tuned models)
  • Different quantisation
  • Changed serving configuration (if it affects output)
  • Updated prompt templates (for LLM systems)

Prompt changes are model changes

For LLM-based systems, changing the system prompt can alter behaviour as significantly as retraining. Treat prompt changes as version changes that go through the same lifecycle gates.
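One way to make prompt changes visible to version tooling is to fingerprint the prompt and treat the fingerprint as a version input; this short sketch assumes nothing beyond the standard library.

```python
import hashlib

def prompt_fingerprint(system_prompt: str) -> str:
    """Hash the system prompt so any change surfaces as a distinct version input."""
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()[:12]
```

Recording the fingerprint alongside the model version means a prompt edit can never ship silently under an unchanged version number.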

Rollback

Every production deployment must have a tested rollback plan.

Rollback requirements:

  • Previous model version remains available and deployable
  • Rollback can be executed quickly (minutes, not hours)
  • Rollback procedure is documented and tested during staging
  • Rollback can be executed by someone other than the person who deployed the model
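Choosing what to roll back to can also be automated; this sketch assumes a version history ordered oldest-to-newest, with illustrative `status` and `deployable` fields.

```python
def select_rollback_target(versions: list[dict]) -> dict:
    """Pick the most recent previous version that is still deployable.

    Raises if no deployable previous version exists, i.e. the rollback
    plan itself is broken and should fail loudly, not silently.
    """
    candidates = [
        v for v in versions
        if v["status"] == "previous" and v["deployable"]
    ]
    if not candidates:
        raise RuntimeError("no deployable previous version: rollback plan is broken")
    return candidates[-1]
```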

When to roll back:

  • Output quality drops below acceptable thresholds
  • Safety monitoring triggers alerts
  • A security vulnerability is discovered in the model
  • Downstream systems report integration issues