AI DevOps
DevOps for AI is not the same as DevOps for traditional software. The principles are familiar (automation, repeatability, version control, testing), but AI workloads introduce new artefact types, larger files, different testing requirements, and additional secrets to manage.
Teams that apply standard DevOps practices without adaptation leave gaps, such as unverified model artefacts or unmanaged experiment tracker tokens, that can become security vulnerabilities.
What is different about AI DevOps
| Traditional DevOps | AI DevOps |
|---|---|
| Code is the primary artefact | Code, models, data, and configuration are all primary artefacts |
| Artefacts are small (MB) | Model artefacts are large (GB to TB) |
| Tests are deterministic | Tests are probabilistic (pass/fail thresholds, not exact matches) |
| Build reproducibility is straightforward | Training reproducibility depends on hardware, random seeds, data order |
| Secrets are API keys and database credentials | Secrets also include model endpoints, data store credentials, experiment tracker tokens |
| Deployment is rolling out new code | Deployment is rolling out new models, which may behave differently in production than they did in testing |
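The "probabilistic tests" row is easier to picture with a concrete check. The following is a minimal sketch, not a prescribed implementation: the names `GateResult` and `accuracy_gate` and the default threshold of 0.9 are assumptions made for illustration. It shows a test that passes when a metric clears a threshold instead of requiring every prediction to match exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    """Outcome of a threshold-based evaluation gate (hypothetical type)."""
    passed: bool
    accuracy: float
    threshold: float


def accuracy_gate(predictions, labels, threshold=0.9):
    """Pass when accuracy meets the threshold; individual misses are tolerated.

    The threshold default is an illustrative assumption, not a recommendation.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("evaluation set must not be empty")

    correct = sum(p == y for p, y in zip(predictions, labels))
    accuracy = correct / len(labels)
    return GateResult(passed=accuracy >= threshold,
                      accuracy=accuracy,
                      threshold=threshold)


if __name__ == "__main__":
    # Three of four predictions match, so accuracy is 0.75.
    result = accuracy_gate(
        predictions=["cat", "dog", "cat", "dog"],
        labels=["cat", "dog", "dog", "dog"],
        threshold=0.7,
    )
    print(result)
```

A pipeline step could fail the build when `passed` is false. In practice the metric, the evaluation set, and the threshold would all be versioned alongside the model so that the gate itself is reproducible.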
Core topics
- CI/CD for AI covers how to adapt continuous integration and delivery pipelines for AI workloads, including model validation gates and artefact management.
- Infrastructure as Code covers defining and managing AI infrastructure declaratively, from GPU provisioning to inference endpoint configuration.
- Secrets Management covers the expanded secrets surface of AI systems and how to manage credentials for models, data stores, experiment trackers, and API endpoints.
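Two of the themes above, artefact management and the expanded secrets surface, can be sketched as a small pre-deployment check. This is an illustrative sketch under stated assumptions: the function names and the idea of pinning an expected SHA-256 digest per model file are hypothetical, not something the linked pages are known to specify. It uses only the Python standard library.

```python
import hashlib
import hmac
import os
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so multi-gigabyte artefacts are never fully loaded into memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artefact_matches(path, expected_sha256):
    """Return True when the artefact's digest equals the pinned digest (hypothetical helper)."""
    return hmac.compare_digest(sha256_of_file(path), expected_sha256.lower())


def missing_secrets(required_names, environ=None):
    """List required secret names that are unset or empty in the environment.

    Only names are returned; secret values are never logged or echoed.
    """
    env = os.environ if environ is None else environ
    return [name for name in required_names if not env.get(name)]
```

A pipeline might call `artefact_matches` before promoting a model and `missing_secrets` with names such as `EXPERIMENT_TRACKER_TOKEN` (a made-up variable name) before starting a job, failing fast in either case. Reading secrets from the environment is one common option; a dedicated secrets manager would replace the `environ` lookup.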
DevOps secures the pipeline. MLOps secures the ML lifecycle.
DevOps and MLOps overlap but are not the same. DevOps focuses on the infrastructure and delivery pipeline. MLOps focuses on the ML-specific lifecycle (training, evaluation, registry, deployment). Both must be secured. This section covers the DevOps side.