Model Lifecycle

A model's lifecycle runs from its creation through production use to eventual retirement. Each stage transition is a decision point where security controls must apply. Without lifecycle management, you lose track of what is deployed, what was approved, and what should be retired.

Lifecycle stages

Training → Evaluation → Registration → Staging → Production → Retirement
              ↑                           ↑          ↑
          Validation                  Approval    Monitoring
            gate                       gate       (runtime)

Training output

A training run produces a model artefact. At this point, the model is an unvalidated output, not yet trusted for any purpose.

Required before moving forward:

  • Training run attestation recorded (inputs, environment, outputs)
  • Model artefact hash computed and stored
  • Training metrics logged (loss, accuracy, training curves)
  • No errors or anomalies in training logs
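The hash and attestation steps above can be sketched as follows. This is a minimal illustration, not a fixed schema: the function names, JSON field names, and the shape of the attestation record are all assumptions to adapt to your own registry.

```python
import hashlib
from datetime import datetime, timezone

def hash_artefact(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 digest of the model artefact file, streaming in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def record_attestation(artefact_hash: str, inputs: dict, metrics: dict) -> dict:
    """Bundle the run's inputs, outputs, and a timestamp into an attestation record."""
    return {
        "artefact_sha256": artefact_hash,
        "inputs": inputs,    # e.g. dataset versions, code commit, base model
        "metrics": metrics,  # e.g. final loss, accuracy
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
```

Streaming the file in chunks keeps memory flat even for multi-gigabyte artefacts.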

Evaluation

The model is tested against validation datasets, safety benchmarks, and adversarial test suites.

Required before moving forward:

  • Performance meets minimum thresholds for the target use case
  • Safety evaluations pass (bias, toxicity, refusal rates)
  • Adversarial tests pass (prompt injection resistance, jailbreak resistance)
  • Regression tests pass (no degradation from current production model)
  • Evaluation results stored as immutable artefacts
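A threshold gate like the one above might look like this sketch. The metric names are illustrative, and for simplicity every metric is treated as higher-is-better; invert lower-is-better metrics (e.g. toxicity rate) before calling.

```python
def passes_evaluation_gate(results: dict[str, float],
                           minimums: dict[str, float]) -> tuple[bool, list[str]]:
    """Compare evaluation metrics against minimum thresholds.

    A missing metric counts as a failure: an evaluation that was not run
    must not silently pass the gate.
    """
    failures = [
        name for name, minimum in minimums.items()
        if results.get(name, float("-inf")) < minimum
    ]
    return (not failures, failures)
```

Returning the list of failing metrics, not just a boolean, gives the failure report something concrete to store alongside the immutable evaluation artefacts.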

Evaluation must be independent

The team that trained the model should not be the only team evaluating it. Independent evaluation, whether by a separate team, automated pipeline, or both, reduces the risk of confirmation bias and missed issues.

Registration

An evaluated model is registered in the model registry with its metadata, evaluation results, and provenance information.

The registry should store:

Field                   Purpose
Model name and version  Unique identification
Model hash              Integrity verification
Training attestation    Provenance and reproducibility
Evaluation results      Evidence of fitness for purpose
Model card              Documentation of capabilities, limitations, intended use
Licence                 Legal and compliance information
Owner                   Who is responsible for this model
Status                  Current lifecycle stage (registered, staging, production, retired)
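As a sketch, a registry record mirroring the fields above could be a simple dataclass; the exact schema is an assumption to align with whatever registry you use.

```python
from dataclasses import dataclass

@dataclass
class RegistryEntry:
    """One model registry record; field names mirror the table above."""
    name: str
    version: str
    model_hash: str             # integrity verification
    training_attestation: dict  # provenance and reproducibility
    evaluation_results: dict    # evidence of fitness for purpose
    model_card: str             # capabilities, limitations, intended use
    licence: str
    owner: str
    status: str = "registered"  # registered | staging | production | retired
```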

Staging

The model is deployed to a staging environment that mirrors production. This validates that the model works correctly in the target infrastructure, not just in the evaluation environment.

What staging validates:

  • Model loads and serves correctly on production hardware
  • Latency and throughput meet requirements
  • Integration with upstream and downstream services works
  • Logging and monitoring are active and producing expected data
  • Rollback procedures work (deploy previous version, verify it serves correctly)
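The latency check above can be sketched as a small staging probe. `serve_fn` is a hypothetical stand-in for your serving client; swap in a real request call.

```python
import time

def p95_latency_ms(serve_fn, sample_inputs) -> float:
    """Send sample requests through the serving stack and return p95 latency (ms)."""
    timings = []
    for x in sample_inputs:
        start = time.perf_counter()
        serve_fn(x)  # hypothetical serving call
        timings.append((time.perf_counter() - start) * 1000)
    timings.sort()
    return timings[int(0.95 * (len(timings) - 1))]

def meets_latency_budget(serve_fn, sample_inputs, budget_ms: float) -> bool:
    """Gate check: staged model must fit the latency budget before promotion."""
    return p95_latency_ms(serve_fn, sample_inputs) <= budget_ms
```

Using p95 rather than the mean catches tail latency, which is usually what breaks downstream timeouts.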

Production approval

Moving from staging to production requires explicit approval. This is the final pre-runtime gate.

Approval should document:

  • Who approved the deployment and when
  • What evidence was reviewed (evaluation results, staging validation)
  • What risks were accepted
  • What monitoring and rollback plans are in place
  • When the next review is scheduled

Who approves depends on the risk tier:

Risk tier                               Approval authority
Low (internal tools, non-sensitive)     ML team lead
Medium (customer-facing, non-critical)  Product owner + ML lead
High (decision-making, regulated)       AI risk committee or equivalent
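A deployment pipeline can enforce the tier table mechanically. The role names below are illustrative; mirror your own org chart.

```python
# Required sign-off roles per risk tier (illustrative mapping of the table above).
APPROVAL_AUTHORITY = {
    "low": {"ml_team_lead"},
    "medium": {"product_owner", "ml_lead"},
    "high": {"ai_risk_committee"},
}

def approval_complete(risk_tier: str, sign_offs: set[str]) -> bool:
    """True once every role required for this tier has signed off."""
    return APPROVAL_AUTHORITY[risk_tier] <= sign_offs
```

Subset comparison (`<=`) means extra sign-offs never hurt, but a missing required role always blocks promotion.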

Production

The model is serving production traffic. From this point, runtime security takes over for monitoring, guardrails, and incident response.

Pre-runtime responsibilities that continue during production:

  • Monitoring for model drift (performance degradation over time)
  • Responding to newly discovered vulnerabilities in the model or its dependencies
  • Planning and executing model updates through the full lifecycle

Retirement

Models do not run forever. Retirement is a planned lifecycle stage, not an afterthought.

Retirement triggers:

  • A newer model replaces it
  • The use case is discontinued
  • A security vulnerability is discovered that cannot be mitigated
  • Regulatory changes make the model non-compliant
  • Performance has degraded below acceptable thresholds

Retirement process:

  • Traffic is migrated to the replacement model (or the service is decommissioned)
  • The model is removed from production serving
  • The model remains in the registry with a "retired" status for audit purposes
  • Associated resources (compute, storage) are cleaned up
  • Documentation is updated to reflect retirement
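The registry side of retirement might be sketched as a status flip rather than a delete; the registry layout (a dict keyed by name and version) is an assumption.

```python
from datetime import datetime, timezone

def retire_model(registry: dict, name: str, version: str, reason: str) -> None:
    """Mark a registry entry 'retired' instead of deleting it.

    The entry stays in the registry for audit; only serving resources
    are decommissioned elsewhere.
    """
    entry = registry[(name, version)]
    entry["status"] = "retired"
    entry["retirement_reason"] = reason
    entry["retired_at"] = datetime.now(timezone.utc).isoformat()
```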

Version management

Version numbering

Adopt a consistent versioning scheme. Semantic versioning adapted for models:

Change type                   Version bump  Example
Architecture change           Major         1.0.0 → 2.0.0
Retrained on new data         Minor         1.0.0 → 1.1.0
Configuration change          Patch         1.0.0 → 1.0.1
Quantisation or distillation  Qualifier     1.0.0 → 1.0.0-q4
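The bump rules in the table can be sketched as a small helper; the change-type labels are illustrative names, not a standard.

```python
def bump_version(version: str, change: str) -> str:
    """Apply the major/minor/patch bump rules from the table above."""
    base = version.split("-")[0]  # drop any existing qualifier such as "-q4"
    major, minor, patch = (int(p) for p in base.split("."))
    if change == "architecture":
        return f"{major + 1}.0.0"
    if change == "retrain":
        return f"{major}.{minor + 1}.0"
    if change == "config":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```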

What constitutes a new version

Every change that could affect model behaviour produces a new version:

  • New training data
  • Changed hyperparameters
  • Updated base model (for fine-tuned models)
  • Different quantisation
  • Changed serving configuration (if it affects output)
  • Updated prompt templates (for LLM systems)

Prompt changes are model changes

For LLM-based systems, changing the system prompt can alter behaviour as significantly as retraining. Treat prompt changes as version changes that go through the same lifecycle gates.
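One way to make prompt changes visible to version tooling is to fingerprint the prompt and treat the fingerprint as a version input; this short sketch assumes nothing beyond the standard library.

```python
import hashlib

def prompt_fingerprint(system_prompt: str) -> str:
    """Hash the system prompt so any change surfaces as a distinct version input."""
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()[:12]
```

Recording the fingerprint alongside the model version means a prompt edit can never ship silently under an unchanged version number.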

Rollback

Every production deployment must have a tested rollback plan.

Rollback requirements:

  • Previous model version remains available and deployable
  • Rollback can be executed quickly (minutes, not hours)
  • Rollback procedure is documented and tested during staging
  • Rollback can be executed by someone other than the person who deployed the model
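Choosing what to roll back to can also be automated; this sketch assumes a version history ordered oldest-to-newest, with illustrative `status` and `deployable` fields.

```python
def select_rollback_target(versions: list[dict]) -> dict:
    """Pick the most recent previous version that is still deployable.

    Raises if no deployable previous version exists, i.e. the rollback
    plan itself is broken and should fail loudly, not silently.
    """
    candidates = [
        v for v in versions
        if v["status"] == "previous" and v["deployable"]
    ]
    if not candidates:
        raise RuntimeError("no deployable previous version: rollback plan is broken")
    return candidates[-1]
```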

When to roll back:

  • Output quality drops below acceptable thresholds
  • Safety monitoring triggers alerts
  • A security vulnerability is discovered in the model
  • Downstream systems report integration issues