Skip to content

AI-Aware SDLC

Traditional software development lifecycles assume you are building deterministic systems from code you write. AI systems break that assumption. The model is not your code. The training data is not your database. The behaviour is not fully predictable. An effective SDLC for AI must account for these differences at every phase, not bolt AI considerations onto the end.

This page maps the complete lifecycle from ideation through to production operation. Each phase links to the detailed guidance elsewhere on this site and on AI Runtime Security. The goal is a single reference that shows where everything fits and what happens when.

AI-Aware SDLC

Phase 1: Ideation and use case evaluation

Before any technical work begins, evaluate whether AI is the right tool for the problem. Not every automation needs a model. Not every model needs to be an LLM.

Is AI the right approach?

Question If yes If no
Does the problem require handling ambiguous, unstructured input? AI is likely appropriate Consider rules, workflows, or traditional automation
Does the problem need natural language understanding or generation? LLM-based AI is likely appropriate Consider simpler ML models, search, or templating
Can the problem be solved with deterministic rules? AI adds complexity without proportionate benefit Use rules. AI is not a universal upgrade.
Is the cost of a wrong answer high? AI may still be appropriate, but requires higher-tier controls Simpler approaches may offer better predictability
Does the problem require real-time autonomous action? AI with agentic capabilities, plan for CRITICAL tier Consider whether human-in-the-loop achieves the same outcome more safely

The best AI decision might be no AI

If deterministic logic, keyword matching, or a simple lookup solves the problem reliably, adding AI introduces non-determinism, a larger attack surface, and ongoing operational cost. Use AI where it provides genuine capability that simpler approaches cannot match.

Use case definition

For each proposed AI use case, document:

  • Problem statement. What specific problem does this solve? What does success look like?
  • User context. Who uses it? Internal experts, internal general staff, external customers, or no direct user (autonomous)?
  • Data requirements. What data does the AI need access to? Is any of it sensitive, regulated, or confidential?
  • Decision impact. Does the AI make decisions, recommend decisions, or provide information?
  • Action capability. Will the AI take actions (write data, call APIs, send messages) or only read and respond?
  • Failure consequences. What happens when the AI is wrong? What is the blast radius?

This use case definition feeds directly into risk classification. The clearer the use case, the more accurate the classification.

Single agent or multi-agent?

Most AI use cases are single-agent: one model, one set of tools, one task. Multi-agent architectures (orchestrators, delegators, specialist agents) are appropriate when:

  • The task requires multiple distinct capabilities that cannot be served by one model with tools
  • Different parts of the workflow need different permission levels
  • The system needs to decompose complex goals into subtasks autonomously

If multi-agent, the MASO Framework applies, with additional multi-agent controls and delegation chain requirements. Each agent needs independent risk classification, and the system starts at MASO Tier 1 (Supervised) regardless of how confident you are in the design.

For single-agent systems, the Foundation Framework applies: core controls (the three-layer pattern) plus infrastructure controls (80 technical controls covering IAM, logging, network, data protection, secrets, supply chain, and incident response).

Phase 2: Risk classification

Once the use case is defined, classify it. This is not a one-time activity. Classification is revisited at every phase as the system's scope, data access, and capabilities become clearer.

Initial classification

Follow the risk classification process:

  1. Score impact dimensions (decision authority, reversibility, data sensitivity, audience, scale, regulatory)
  2. Apply the highest tier across all dimensions
  3. Factor in host application risk alignment
  4. Check regulatory requirements and adjust the floor upward if regulation demands it
  5. Document the classification with driving factors

What the tier unlocks

The tier determines the rigour required at every subsequent phase. Higher tiers do not just mean more controls. They mean more scrutiny, more independence in testing, more formal approvals, and more documentation.

Phase LOW MEDIUM HIGH CRITICAL
Model selection Documented choice Documented evaluation Formal assessment with alternatives Independent review of assessment
Platform selection Standard requirements Data residency check Compliance-driven selection Board-level platform approval
Build Standard CI/CD AI-specific pipeline gates Pipeline integrity verification Immutable, auditable pipelines
Test Basic prompt testing Structured test suite Red-team exercise Independent adversarial assessment
Deploy Standard deployment Subset rollout Canary with monitoring Shadow then canary with approval gates
Run Basic monitoring Guardrails + batch Judge Guardrails + near-real-time Judge + human oversight Full three-layer pattern with real-time Judge

Phase 3: Design

Design is where you make the decisions that constrain everything downstream. Three decisions dominate this phase: which model, which platform, and what architecture.

Model selection

Choose the model based on capability needs, risk tier requirements, and security posture.

Decision Guidance
Open-weight vs. closed API Risk assessment covers the tradeoffs. Open-weight gives you control and auditability. Closed API gives you managed infrastructure and regular updates.
Model provenance Provenance and integrity. At HIGH and CRITICAL tiers, you must verify the model's origin, training process, and integrity before use.
Model trust Trust and evaluation. Evaluate the model against your specific use case, not just generic benchmarks.
Vulnerability exposure Vulnerability scanning. Scan for known vulnerabilities, backdoors, and adversarial weaknesses.
Threat landscape Model threat landscape. Understand the current threats to the type of model you are selecting.

Platform selection

Choose the platform based on data requirements, compliance obligations, and operational capability.

Pattern When to use Guidance
Cloud AI services Most use cases. Managed infrastructure, rapid development, provider handles model serving. Cloud AI services
Self-hosted Regulated data that cannot leave your environment. Full control requirements. Air-gapped deployments. Self-hosted infrastructure
Hybrid Sensitive processing on-premises, less sensitive in the cloud. Gradual migration. Multi-cloud resilience. Hybrid patterns

Architecture decisions

Document the system architecture, including:

  • How users interact with the AI (API, chat interface, embedded in application)
  • What data flows into and out of the model
  • What tools or actions the model has access to
  • How the AI component connects to the host application
  • What the PACE resilience plan looks like for each critical component
  • Which infrastructure controls apply based on the risk tier

For agentic systems, also document:

For multi-agent systems, additionally document:

  • Agent roles and responsibilities
  • Delegation boundaries (what each agent can and cannot do, depth limits, privilege constraints)
  • Communication channels between agents
  • The orchestration pattern (central orchestrator, peer-to-peer, hierarchical)

Infrastructure control selection

During design, identify which of the 80 infrastructure controls apply to your system. The controls are organised by domain and tagged by risk tier, so you can quickly filter to what is relevant.

Domain Controls Key design decisions
Identity and access 8 controls Authentication model, least privilege, control/data plane separation
Logging and observability 10 controls What to log, retention, redaction, SIEM integration
Network and segmentation 8 controls Zone architecture, guardrail bypass prevention, egress controls
Data protection 8 controls Classification, minimisation, PII handling, RAG access controls
Secrets and credentials 8 controls Vault strategy, context window isolation, rotation policy
Supply chain 8 controls Model provenance, RAG data integrity, AI-BOM
Incident response 8 controls AI-specific categories, containment, rollback, investigation

Not every control applies to every system. Select based on your tier, deselect what does not apply, and document the rationale.

Platform-specific implementation patterns

If you are deploying on a specific cloud platform, the infrastructure section includes reference patterns for AWS Bedrock, Microsoft Foundry, and Databricks that map controls to platform-native services.

Phase 4: Build

Build the system using AI-aware DevOps and MLOps practices. The build phase is where pipeline integrity, data governance, and secrets management are established.

Pipeline security

Your CI/CD pipeline for AI has requirements that traditional software pipelines do not.

Area Guidance
CI/CD adaptation CI/CD for AI. Add model validation gates, artefact integrity checks, and AI-specific test stages.
Infrastructure Infrastructure as code. Define GPU provisioning, inference endpoints, and environment configuration as code.
Secrets Secrets management. AI systems have an expanded credential surface: model endpoints, vector databases, tool APIs, experiment trackers.
Integration Integration security. The AI component connects to web frontends, databases, APIs, and MCP servers. Secure every integration point.

Data governance

Data shapes model behaviour. Govern it accordingly.

Area Guidance
Data lineage Data governance. Know where your training and RAG data comes from, who modified it, and what it contains.
ML pipelines Secure ML pipelines. Ensure training integrity, reproducibility, and attestation.
Model lifecycle Model lifecycle. Manage transitions from training through staging to production with approval gates.
Experiments Experiment tracking. Protect intellectual property in hyperparameters, results, and experiment configurations.

Guardrail configuration

Configure input and output guardrails during the build phase, not after deployment. The control matrix specifies what is required at each tier, and the controls page details the three-layer pattern (guardrails, Judge, human oversight).

At a minimum:

  • Input guardrails: injection detection, content policy enforcement, rate limiting
  • Output guardrails: content filtering, PII detection, grounding checks (where applicable)
  • Logging: configured to the retention and protection level required by the tier
  • Network controls: guardrail bypass prevention at the network level (ensure traffic cannot reach the model without passing through guardrails)

For specialised deployments, also configure during build:

Resilience

Build PACE resilience into the application from the start. Every critical component (model endpoint, vector database, tool connections, pipeline infrastructure) needs a documented and tested fallback path. The PACE controls section defines fail postures for each control layer, and the PACE checklist provides verification criteria.

Phase 5: Test

Testing for AI systems goes beyond functional verification. You need to verify that the system behaves correctly, and that it resists deliberate attempts to make it misbehave.

Functional testing

  • Does the model produce useful, accurate outputs for the intended use case?
  • Do guardrails fire correctly (blocking what should be blocked, allowing what should be allowed)?
  • Do tools and integrations work as expected?
  • Does the PACE fallback logic trigger correctly when components fail?

Adversarial testing

Follow the adversarial testing requirements for your tier:

Tier Minimum requirement before deployment
LOW Basic prompt injection test suite
MEDIUM Structured adversarial test suite, no HIGH-severity findings
HIGH Red-team exercise completed, all findings remediated or risk-accepted
CRITICAL Independent adversarial assessment, senior stakeholder review, remediation plan approved

Domain-specific testing

Generic safety testing is insufficient. Test guardrails against the specific risk categories relevant to your use case. A model that refuses to generate malware may readily give inappropriate financial advice. See domain-specific guardrail tuning for guidance.

Risk classification review

After testing, revisit the risk classification. Testing may reveal:

  • The model accesses or generates data you did not anticipate
  • Attack surface is broader than assumed
  • Guardrails are less effective in your specific domain than in generic benchmarks

Adjust the tier if needed. It is better to upgrade before deployment than after an incident.

Phase 6: Deploy

Deployment is the gate between pre-runtime and runtime. No system passes through without meeting the requirements for its tier.

Deployment gate

Follow the production readiness checklist for your tier. The gate is a hard requirement, not a suggestion.

Key gate items:

  • Risk classification documented and approved
  • Adversarial testing passed at the required level
  • Guardrails configured and verified
  • PACE resilience plan documented and tested
  • Logging operational
  • Rollback procedure tested
  • Runtime security team notified and briefed

Staged deployment

Tier Strategy
LOW Direct deployment, monitor for one week
MEDIUM Subset of users first, expand after one to two weeks
HIGH Canary deployment, gradual expansion over weeks
CRITICAL Shadow deployment first (parallel run, no live impact), then canary, then gradual expansion with explicit approval at each stage

Runtime handoff

The handoff to AI Runtime Security is structured, not informal. Transfer:

  • Risk classification and accepted risks
  • Control configuration (guardrails, Judge policies, oversight rules)
  • Adversarial test results and known residual risks
  • PACE plans for all critical components
  • Escalation paths and incident response runbooks
  • Model documentation and known limitations

For HIGH and CRITICAL tiers, conduct a formal handoff meeting. See production readiness: the runtime security handoff.

Phase 7: Run

Once deployed, the three-layer control pattern takes over: guardrails prevent known-bad inputs and outputs in real-time, the Judge evaluates interactions asynchronously for unknown-bad patterns, and human oversight handles high-consequence decisions.

The quantitative risk assessment methodology lets you measure how effectively each layer reduces residual risk, using the same NIST AI RMF alignment that structures the rest of the framework.

Pre-runtime responsibilities do not end at deployment.

Ongoing pre-runtime obligations

Activity Cadence Trigger
Risk classification review Annual minimum Also triggered by scope changes, incidents, or model updates
Model update assessment Every model version change Treat as a new deployment through the SDLC
Adversarial re-testing Per tier cadence (quarterly to continuous) Also triggered by new threat intelligence or incidents
Regulatory review Quarterly Also triggered by new regulation or enforcement actions
PACE plan review When system changes Also after any PACE level transition in production

Model updates as SDLC cycles

A model update is not a patch. It can change the system's behaviour in unpredictable ways. Every model update should cycle back through the relevant SDLC phases:

  • Minor version update (same provider, same model family): Re-run adversarial test suite, verify guardrail effectiveness, monitor closely for the first week
  • Major version update or model change: Full cycle from Phase 3 (Design) onward, including model evaluation, adversarial testing, and staged deployment
  • Provider-side update (API models updated by the provider): Re-run adversarial test suite immediately, review outputs for behavioural changes, escalate if guardrail effectiveness has changed

Feedback loop

Production observations feed back into the SDLC:

  • Guardrail trigger patterns inform adversarial test suite updates
  • Judge evaluation findings reveal gaps in pre-deployment testing
  • Incidents trigger risk classification review and potential tier upgrade
  • User behaviour patterns may reveal use case drift that changes the risk profile

This is not a linear process with a defined end. It is a continuous cycle where production experience improves pre-runtime decisions, and pre-runtime decisions improve production outcomes.

Mapping the lifecycle to your organisation

Where does this fit with existing SDLC processes?

This AI-aware SDLC does not replace your existing development process. It augments it. If your organisation uses agile sprints, the AI phases map to sprint activities. If you use a stage-gate process, the AI gates sit alongside your existing gates.

The key additions to a standard SDLC:

  1. Ideation adds AI suitability assessment before committing to an AI approach
  2. Risk classification is a new activity that does not exist in traditional software development
  3. Design adds model and platform selection as first-class architectural decisions
  4. Build adds AI-specific pipeline gates, data governance, and guardrail configuration
  5. Test adds adversarial testing alongside functional and integration testing
  6. Deploy adds a structured runtime handoff that traditional deployments do not require
  7. Run adds ongoing model assessment and classification review that traditional software does not need

Roles and responsibilities

Each role has a stakeholder view that provides role-specific guidance, reading paths, and concrete first actions.

Role Primary SDLC responsibilities Stakeholder view
Product owner Use case definition, risk classification approval, tier change decisions Product owners
AI/ML engineer Model selection, pipeline build, guardrail configuration, model lifecycle AI engineers
Security leader Security strategy, control framework, adversarial testing oversight Security leaders
Enterprise architect Platform selection, infrastructure controls, integration patterns Enterprise architects
Governance/compliance Regulatory alignment, audit trail, classification review, policy enforcement Compliance and legal
Risk management Risk quantification, board reporting, control effectiveness measurement Risk and governance
CIO/CTO AI portfolio governance, platform strategy, technology standards Chief information officers
Business owner Business case, cost/benefit, operational risk across product lines Business owners

Low-risk fast lane

Not every AI system needs the full process. For LOW-tier systems (internal, read-only, no regulated data, no personal data), a streamlined path is appropriate:

  1. Document the use case and classify as LOW
  2. Select model and platform (standard choices, no formal evaluation required)
  3. Build with basic guardrails
  4. Run basic prompt injection tests
  5. Deploy directly with one-week monitoring
  6. Annual review

The fast lane is only for systems that genuinely meet LOW-tier criteria. If there is any doubt, use the full process.

The SDLC connects both sites

Phases 1 through 6 are pre-runtime (this site). Phase 7 is runtime (AI Runtime Security). The SDLC is the thread that connects them. Risk classification, established in Phase 2, carries through every phase and determines the intensity of both pre-runtime and runtime controls.