AI-Aware SDLC¶
Traditional software development lifecycles assume you are building deterministic systems from code you write. AI systems break that assumption. The model is not your code. The training data is not your database. The behaviour is not fully predictable. An effective SDLC for AI must account for these differences at every phase, not bolt AI considerations onto the end.
This page maps the complete lifecycle from ideation through to production operation. Each phase links to the detailed guidance elsewhere on this site and on AI Runtime Security. The goal is a single reference that shows where everything fits and what happens when.
Phase 1: Ideation and use case evaluation¶
Before any technical work begins, evaluate whether AI is the right tool for the problem. Not every automation needs a model. Not every model needs to be an LLM.
Is AI the right approach?¶
| Question | If yes | If no |
|---|---|---|
| Does the problem require handling ambiguous, unstructured input? | AI is likely appropriate | Consider rules, workflows, or traditional automation |
| Does the problem need natural language understanding or generation? | LLM-based AI is likely appropriate | Consider simpler ML models, search, or templating |
| Can the problem be solved with deterministic rules? | AI adds complexity without proportionate benefit | Use rules. AI is not a universal upgrade. |
| Is the cost of a wrong answer high? | AI may still be appropriate, but requires higher-tier controls | Simpler approaches may offer better predictability |
| Does the problem require real-time autonomous action? | AI with agentic capabilities, plan for CRITICAL tier | Consider whether human-in-the-loop achieves the same outcome more safely |
The best AI decision might be no AI
If deterministic logic, keyword matching, or a simple lookup solves the problem reliably, adding AI introduces non-determinism, a larger attack surface, and ongoing operational cost. Use AI where it provides genuine capability that simpler approaches cannot match.
Use case definition¶
For each proposed AI use case, document:
- Problem statement. What specific problem does this solve? What does success look like?
- User context. Who uses it? Internal experts, internal general staff, external customers, or no direct user (autonomous)?
- Data requirements. What data does the AI need access to? Is any of it sensitive, regulated, or confidential?
- Decision impact. Does the AI make decisions, recommend decisions, or provide information?
- Action capability. Will the AI take actions (write data, call APIs, send messages) or only read and respond?
- Failure consequences. What happens when the AI is wrong? What is the blast radius?
This use case definition feeds directly into risk classification. The clearer the use case, the more accurate the classification.
Single agent or multi-agent?¶
Most AI use cases are single-agent: one model, one set of tools, one task. Multi-agent architectures (orchestrators, delegators, specialist agents) are appropriate when:
- The task requires multiple distinct capabilities that cannot be served by one model with tools
- Different parts of the workflow need different permission levels
- The system needs to decompose complex goals into subtasks autonomously
If multi-agent, the MASO Framework applies, with additional multi-agent controls and delegation chain requirements. Each agent needs independent risk classification, and the system starts at MASO Tier 1 (Supervised) regardless of how confident you are in the design.
For single-agent systems, the Foundation Framework applies: core controls (the three-layer pattern) plus infrastructure controls (80 technical controls covering IAM, logging, network, data protection, secrets, supply chain, and incident response).
Phase 2: Risk classification¶
Once the use case is defined, classify it. This is not a one-time activity. Classification is revisited at every phase as the system's scope, data access, and capabilities become clearer.
Initial classification¶
Follow the risk classification process:
- Score impact dimensions (decision authority, reversibility, data sensitivity, audience, scale, regulatory)
- Apply the highest tier across all dimensions
- Factor in host application risk alignment
- Check regulatory requirements and adjust the floor upward if regulation demands it
- Document the classification with driving factors
What the tier unlocks¶
The tier determines the rigour required at every subsequent phase. Higher tiers do not just mean more controls. They mean more scrutiny, more independence in testing, more formal approvals, and more documentation.
| Phase | LOW | MEDIUM | HIGH | CRITICAL |
|---|---|---|---|---|
| Model selection | Documented choice | Documented evaluation | Formal assessment with alternatives | Independent review of assessment |
| Platform selection | Standard requirements | Data residency check | Compliance-driven selection | Board-level platform approval |
| Build | Standard CI/CD | AI-specific pipeline gates | Pipeline integrity verification | Immutable, auditable pipelines |
| Test | Basic prompt testing | Structured test suite | Red-team exercise | Independent adversarial assessment |
| Deploy | Standard deployment | Subset rollout | Canary with monitoring | Shadow then canary with approval gates |
| Run | Basic monitoring | Guardrails + batch Judge | Guardrails + near-real-time Judge + human oversight | Full three-layer pattern with real-time Judge |
Phase 3: Design¶
Design is where you make the decisions that constrain everything downstream. Three decisions dominate this phase: which model, which platform, and what architecture.
Model selection¶
Choose the model based on capability needs, risk tier requirements, and security posture.
| Decision | Guidance |
|---|---|
| Open-weight vs. closed API | Risk assessment covers the tradeoffs. Open-weight gives you control and auditability. Closed API gives you managed infrastructure and regular updates. |
| Model provenance | Provenance and integrity. At HIGH and CRITICAL tiers, you must verify the model's origin, training process, and integrity before use. |
| Model trust | Trust and evaluation. Evaluate the model against your specific use case, not just generic benchmarks. |
| Vulnerability exposure | Vulnerability scanning. Scan for known vulnerabilities, backdoors, and adversarial weaknesses. |
| Threat landscape | Model threat landscape. Understand the current threats to the type of model you are selecting. |
Platform selection¶
Choose the platform based on data requirements, compliance obligations, and operational capability.
| Pattern | When to use | Guidance |
|---|---|---|
| Cloud AI services | Most use cases. Managed infrastructure, rapid development, provider handles model serving. | Cloud AI services |
| Self-hosted | Regulated data that cannot leave your environment. Full control requirements. Air-gapped deployments. | Self-hosted infrastructure |
| Hybrid | Sensitive processing on-premises, less sensitive in the cloud. Gradual migration. Multi-cloud resilience. | Hybrid patterns |
Architecture decisions¶
Document the system architecture, including:
- How users interact with the AI (API, chat interface, embedded in application)
- What data flows into and out of the model
- What tools or actions the model has access to
- How the AI component connects to the host application
- What the PACE resilience plan looks like for each critical component
- Which infrastructure controls apply based on the risk tier
For agentic systems, also document:
- Tool access controls: what tools are available, what permissions they have, how invocations are constrained
- Sandbox patterns: how generated code is isolated and executed
- Agentic controls: plan approval, action constraints, circuit breakers
For multi-agent systems, additionally document:
- Agent roles and responsibilities
- Delegation boundaries (what each agent can and cannot do, depth limits, privilege constraints)
- Communication channels between agents
- The orchestration pattern (central orchestrator, peer-to-peer, hierarchical)
Infrastructure control selection¶
During design, identify which of the 80 infrastructure controls apply to your system. The controls are organised by domain and tagged by risk tier, so you can quickly filter to what is relevant.
| Domain | Controls | Key design decisions |
|---|---|---|
| Identity and access | 8 controls | Authentication model, least privilege, control/data plane separation |
| Logging and observability | 10 controls | What to log, retention, redaction, SIEM integration |
| Network and segmentation | 8 controls | Zone architecture, guardrail bypass prevention, egress controls |
| Data protection | 8 controls | Classification, minimisation, PII handling, RAG access controls |
| Secrets and credentials | 8 controls | Vault strategy, context window isolation, rotation policy |
| Supply chain | 8 controls | Model provenance, RAG data integrity, AI-BOM |
| Incident response | 8 controls | AI-specific categories, containment, rollback, investigation |
Not every control applies to every system. Select based on your tier, deselect what does not apply, and document the rationale.
Platform-specific implementation patterns
If you are deploying on a specific cloud platform, the infrastructure section includes reference patterns for AWS Bedrock, Microsoft Foundry, and Databricks that map controls to platform-native services.
Phase 4: Build¶
Build the system using AI-aware DevOps and MLOps practices. The build phase is where pipeline integrity, data governance, and secrets management are established.
Pipeline security¶
Your CI/CD pipeline for AI has requirements that traditional software pipelines do not.
| Area | Guidance |
|---|---|
| CI/CD adaptation | CI/CD for AI. Add model validation gates, artefact integrity checks, and AI-specific test stages. |
| Infrastructure | Infrastructure as code. Define GPU provisioning, inference endpoints, and environment configuration as code. |
| Secrets | Secrets management. AI systems have an expanded credential surface: model endpoints, vector databases, tool APIs, experiment trackers. |
| Integration | Integration security. The AI component connects to web frontends, databases, APIs, and MCP servers. Secure every integration point. |
Data governance¶
Data shapes model behaviour. Govern it accordingly.
| Area | Guidance |
|---|---|
| Data lineage | Data governance. Know where your training and RAG data comes from, who modified it, and what it contains. |
| ML pipelines | Secure ML pipelines. Ensure training integrity, reproducibility, and attestation. |
| Model lifecycle | Model lifecycle. Manage transitions from training through staging to production with approval gates. |
| Experiments | Experiment tracking. Protect intellectual property in hyperparameters, results, and experiment configurations. |
Guardrail configuration¶
Configure input and output guardrails during the build phase, not after deployment. The control matrix specifies what is required at each tier, and the controls page details the three-layer pattern (guardrails, Judge, human oversight).
At a minimum:
- Input guardrails: injection detection, content policy enforcement, rate limiting
- Output guardrails: content filtering, PII detection, grounding checks (where applicable)
- Logging: configured to the retention and protection level required by the tier
- Network controls: guardrail bypass prevention at the network level (ensure traffic cannot reach the model without passing through guardrails)
For specialised deployments, also configure during build:
- Multimodal controls if processing images, audio, or video
- Streaming controls if using streaming responses
- Memory and context controls if using persistent memory or long context windows
- Reasoning model controls if using chain-of-thought reasoning models
Resilience¶
Build PACE resilience into the application from the start. Every critical component (model endpoint, vector database, tool connections, pipeline infrastructure) needs a documented and tested fallback path. The PACE controls section defines fail postures for each control layer, and the PACE checklist provides verification criteria.
Phase 5: Test¶
Testing for AI systems goes beyond functional verification. You need to verify that the system behaves correctly, and that it resists deliberate attempts to make it misbehave.
Functional testing¶
- Does the model produce useful, accurate outputs for the intended use case?
- Do guardrails fire correctly (blocking what should be blocked, allowing what should be allowed)?
- Do tools and integrations work as expected?
- Does the PACE fallback logic trigger correctly when components fail?
Adversarial testing¶
Follow the adversarial testing requirements for your tier:
| Tier | Minimum requirement before deployment |
|---|---|
| LOW | Basic prompt injection test suite |
| MEDIUM | Structured adversarial test suite, no HIGH-severity findings |
| HIGH | Red-team exercise completed, all findings remediated or risk-accepted |
| CRITICAL | Independent adversarial assessment, senior stakeholder review, remediation plan approved |
Domain-specific testing¶
Generic safety testing is insufficient. Test guardrails against the specific risk categories relevant to your use case. A model that refuses to generate malware may readily give inappropriate financial advice. See domain-specific guardrail tuning for guidance.
Risk classification review¶
After testing, revisit the risk classification. Testing may reveal:
- The model accesses or generates data you did not anticipate
- Attack surface is broader than assumed
- Guardrails are less effective in your specific domain than in generic benchmarks
Adjust the tier if needed. It is better to upgrade before deployment than after an incident.
Phase 6: Deploy¶
Deployment is the gate between pre-runtime and runtime. No system passes through without meeting the requirements for its tier.
Deployment gate¶
Follow the production readiness checklist for your tier. The gate is a hard requirement, not a suggestion.
Key gate items:
- Risk classification documented and approved
- Adversarial testing passed at the required level
- Guardrails configured and verified
- PACE resilience plan documented and tested
- Logging operational
- Rollback procedure tested
- Runtime security team notified and briefed
Staged deployment¶
| Tier | Strategy |
|---|---|
| LOW | Direct deployment, monitor for one week |
| MEDIUM | Subset of users first, expand after one to two weeks |
| HIGH | Canary deployment, gradual expansion over weeks |
| CRITICAL | Shadow deployment first (parallel run, no live impact), then canary, then gradual expansion with explicit approval at each stage |
Runtime handoff¶
The handoff to AI Runtime Security is structured, not informal. Transfer:
- Risk classification and accepted risks
- Control configuration (guardrails, Judge policies, oversight rules)
- Adversarial test results and known residual risks
- PACE plans for all critical components
- Escalation paths and incident response runbooks
- Model documentation and known limitations
For HIGH and CRITICAL tiers, conduct a formal handoff meeting. See production readiness: the runtime security handoff.
Phase 7: Run¶
Once deployed, the three-layer control pattern takes over: guardrails prevent known-bad inputs and outputs in real-time, the Judge evaluates interactions asynchronously for unknown-bad patterns, and human oversight handles high-consequence decisions.
The quantitative risk assessment methodology lets you measure how effectively each layer reduces residual risk, using the same NIST AI RMF alignment that structures the rest of the framework.
Pre-runtime responsibilities do not end at deployment.
Ongoing pre-runtime obligations¶
| Activity | Cadence | Trigger |
|---|---|---|
| Risk classification review | Annual minimum | Also triggered by scope changes, incidents, or model updates |
| Model update assessment | Every model version change | Treat as a new deployment through the SDLC |
| Adversarial re-testing | Per tier cadence (quarterly to continuous) | Also triggered by new threat intelligence or incidents |
| Regulatory review | Quarterly | Also triggered by new regulation or enforcement actions |
| PACE plan review | When system changes | Also after any PACE level transition in production |
Model updates as SDLC cycles¶
A model update is not a patch. It can change the system's behaviour in unpredictable ways. Every model update should cycle back through the relevant SDLC phases:
- Minor version update (same provider, same model family): Re-run adversarial test suite, verify guardrail effectiveness, monitor closely for the first week
- Major version update or model change: Full cycle from Phase 3 (Design) onward, including model evaluation, adversarial testing, and staged deployment
- Provider-side update (API models updated by the provider): Re-run adversarial test suite immediately, review outputs for behavioural changes, escalate if guardrail effectiveness has changed
Feedback loop¶
Production observations feed back into the SDLC:
- Guardrail trigger patterns inform adversarial test suite updates
- Judge evaluation findings reveal gaps in pre-deployment testing
- Incidents trigger risk classification review and potential tier upgrade
- User behaviour patterns may reveal use case drift that changes the risk profile
This is not a linear process with a defined end. It is a continuous cycle where production experience improves pre-runtime decisions, and pre-runtime decisions improve production outcomes.
Mapping the lifecycle to your organisation¶
Where does this fit with existing SDLC processes?¶
This AI-aware SDLC does not replace your existing development process. It augments it. If your organisation uses agile sprints, the AI phases map to sprint activities. If you use a stage-gate process, the AI gates sit alongside your existing gates.
The key additions to a standard SDLC:
- Ideation adds AI suitability assessment before committing to an AI approach
- Risk classification is a new activity that does not exist in traditional software development
- Design adds model and platform selection as first-class architectural decisions
- Build adds AI-specific pipeline gates, data governance, and guardrail configuration
- Test adds adversarial testing alongside functional and integration testing
- Deploy adds a structured runtime handoff that traditional deployments do not require
- Run adds ongoing model assessment and classification review that traditional software does not need
Roles and responsibilities¶
Each role has a stakeholder view that provides role-specific guidance, reading paths, and concrete first actions.
| Role | Primary SDLC responsibilities | Stakeholder view |
|---|---|---|
| Product owner | Use case definition, risk classification approval, tier change decisions | Product owners |
| AI/ML engineer | Model selection, pipeline build, guardrail configuration, model lifecycle | AI engineers |
| Security leader | Security strategy, control framework, adversarial testing oversight | Security leaders |
| Enterprise architect | Platform selection, infrastructure controls, integration patterns | Enterprise architects |
| Governance/compliance | Regulatory alignment, audit trail, classification review, policy enforcement | Compliance and legal |
| Risk management | Risk quantification, board reporting, control effectiveness measurement | Risk and governance |
| CIO/CTO | AI portfolio governance, platform strategy, technology standards | Chief information officers |
| Business owner | Business case, cost/benefit, operational risk across product lines | Business owners |
Low-risk fast lane¶
Not every AI system needs the full process. For LOW-tier systems (internal, read-only, no regulated data, no personal data), a streamlined path is appropriate:
- Document the use case and classify as LOW
- Select model and platform (standard choices, no formal evaluation required)
- Build with basic guardrails
- Run basic prompt injection tests
- Deploy directly with one-week monitoring
- Annual review
The fast lane is only for systems that genuinely meet LOW-tier criteria. If there is any doubt, use the full process.
The SDLC connects both sites
Phases 1 through 6 are pre-runtime (this site). Phase 7 is runtime (AI Runtime Security). The SDLC is the thread that connects them. Risk classification, established in Phase 2, carries through every phase and determines the intensity of both pre-runtime and runtime controls.