This article was originally published in The Artificially Intelligent Enterprise Newsletter.
As enterprises adopt AI, what are the systems management considerations?
Last week I got to spend a few days with some of the leaders in the DevOps community now working in the artificial intelligence space, graciously hosted by Alan Shimel of TechStrong. His company runs the DevOps.com portal along with Techstrong.ai. It got me thinking…
As I collaborated with many of the OGs of DevOps, it made me wonder how we can take the methodology we use to manage cloud computing infrastructure and apply it to these rapidly emerging AI technologies. I know that AI is just software, and much of it is delivered via SaaS models or on-prem, but there are some additional considerations.
For those not familiar, DevOps is a philosophy, a culture, and a set of practices that bridge the gap between software development (Dev) and IT operations (Ops). It's about breaking down silos, fostering collaboration, and delivering quality software faster. DevOps practices, like continuous integration and continuous delivery, ensure that AI models are always up-to-date, optimized, and delivering value. As AI emerges in the enterprise, we'll want to pursue that value with the same level of rigor in our systems management practices.
DevOps in AI
Given that DevOps is a methodology that has served the industry well for many years, I believe much of its tooling and ideas should translate, at least to some degree, to artificial intelligence. DevOps is often broken down into four key characteristics: Culture, Automation, Measurement, and Sharing. For this conversation, I will focus on how we might automate and measure AI. That's not to downplay culture or sharing; the full topic is simply too broad for a single discussion.
Metrics for Measuring AI
I don’t have the answers, just thoughts. Here are some of the metrics that have been established for machine learning that I believe will become commonplace in the Artificially Intelligent Enterprise.
Accuracy
The ratio of correctly predicted instances to the total instances in the dataset.
Use Case: Classification problems where the distribution of classes is relatively balanced.
Data Drift
The change in data distribution over time.
Use Case: Monitoring AI systems in production to ensure they remain relevant as the nature of the input data evolves.
Model Explainability
The degree to which a model’s predictions can be understood and interpreted.
Use Case: In industries like finance or healthcare, understanding the reasoning behind decisions is crucial for trust and compliance.
Model Inference Time
The time it takes for the model to make a prediction once it’s trained.
Use Case: Real-time applications where rapid predictions are essential, such as autonomous driving.
Model Training Time
The amount of time it takes to train a model.
Use Case: Comparing the efficiency of different algorithms or assessing the feasibility of retraining models frequently.
Precision
The ratio of correctly predicted positive observations to the total predicted positives.
Use Case: Situations with a high cost of false positives, such as spam email detection.
Recall (Sensitivity)
The ratio of correctly predicted positive observations to all the actual positives.
Use Case: Medical diagnoses where missing a positive (false negative) case can have serious consequences.
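To make these definitions concrete, here is a minimal sketch, assuming scikit-learn is available, of computing accuracy, precision, and recall for a binary classifier on a handful of illustrative predictions:

```python
# Illustrative labels and predictions only; in practice these come from your eval set.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions

print("Accuracy: ", accuracy_score(y_true, y_pred))   # correct / total
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN)
```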
I also believe that for those deeper in AI/ML operations, like data scientists and machine learning professionals, we'll see a need for understanding some combination of the following metrics:
- F1-Score
- Area Under the Receiver Operating Characteristic Curve (AUC-ROC)
- Mean Absolute Error (MAE)
- Root Mean Square Error (RMSE)
- Log Loss
- Confusion Matrix
- Model Size
- Concept Drift
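Most of these deeper metrics are served by the same tooling. Here is a hedged sketch, again assuming scikit-learn, that computes F1, AUC-ROC, log loss, and a confusion matrix from the same kind of prediction data:

```python
# Illustrative values: y_pred are hard labels, y_score are predicted probabilities.
from sklearn.metrics import f1_score, roc_auc_score, log_loss, confusion_matrix

y_true  = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3]

print("F1:       ", f1_score(y_true, y_pred))
print("AUC-ROC:  ", roc_auc_score(y_true, y_score))
print("Log loss: ", log_loss(y_true, y_score))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```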
AI Observability
Once we understand how to measure AI operations, the next step is the tooling to do so. Observability is the ability to understand the internal state of a system from its external outputs (the measurement of AI). In the context of AI-powered software, observability becomes crucial due to the inherent complexity and unpredictability of AI models.
- Model Monitoring: AI models can drift over time as they encounter new and unseen data in production. Observability tools can monitor model predictions in real time, flagging anomalies and ensuring the model remains accurate.
- System Health: The entire software ecosystem must be monitored beyond just the AI model. This includes data pipelines, model-serving infrastructure, and user interactions. Observability ensures that any bottlenecks or failures in the system are promptly identified and addressed.
The DevOps landscape has grown, introducing numerous tools and standards that provide comprehensive observability for conventional software. Modern ML pipelines demand an equivalent level of observability. It is inevitable that tools like Datadog, Honeycomb, and others will add AI observability features, but how well they translate to AI/ML infrastructure remains to be seen.
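As a small illustration of what model monitoring can look like under the hood, here is a minimal drift check, assuming scipy, that compares a training-time feature distribution against live production values with a Kolmogorov-Smirnov test. The simulated data and the alert threshold are assumptions; in practice the result would be exported to whatever observability platform you already run.

```python
# Minimal drift-monitoring sketch, not a full observability stack.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)   # feature values at training time
production = rng.normal(loc=0.4, scale=1.0, size=5_000)  # live values, shifted to simulate drift

statistic, p_value = ks_2samp(reference, production)
if p_value < 0.01:  # threshold is an assumption; tune per feature and traffic volume
    print(f"Data drift suspected (KS statistic={statistic:.3f}, p={p_value:.3g})")
else:
    print("No significant drift detected")
```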
AI Continuous Integration and Delivery (CI/CD)
AI CI/CD refers to the integration of Continuous Integration (CI) and Continuous Deployment (CD) practices specifically for Artificial Intelligence (AI) and Machine Learning (ML) projects.
Continuous Integration (CI) with Large Language Models
CI is the practice of frequently integrating code changes into a shared repository. For AI-powered software, CI ensures that the codebase and the AI models stay in sync.
- Automated Testing: Every code or model change can be automatically tested to ensure there are no regressions. This is especially important for AI, where small changes can have unintended consequences.
- Version Control: AI models evolve over time. Version control systems allow developers to track changes, compare model versions, and roll back to previous states if needed.
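Version control for models tends to look less like diffing source files and more like fingerprinting artifacts and recording their lineage. Here is a hedged sketch of that idea; the registry file format and helper name are illustrative assumptions, not a specific tool's API:

```python
# Record a trained model artifact's content hash alongside the code revision,
# so any prediction in production can be traced back to the model that made it.
import hashlib, json, subprocess, datetime, pathlib

def register_model(artifact_path: str, registry_path: str = "model_registry.jsonl") -> dict:
    digest = hashlib.sha256(pathlib.Path(artifact_path).read_bytes()).hexdigest()
    entry = {
        "model_sha256": digest,
        "git_commit": subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip(),
        "registered_at": datetime.datetime.utcnow().isoformat() + "Z",
        "artifact": artifact_path,
    }
    with open(registry_path, "a") as registry:
        registry.write(json.dumps(entry) + "\n")  # append-only history of model versions
    return entry
```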
Continuous Deployment (CD)
CD is the practice of automatically deploying code changes to production after passing CI tests. In AI, CD ensures that models are seamlessly updated in production environments.
- Model Serving: Once a model is trained, it needs to be served to end-users. CD practices can automate the deployment of models to serving infrastructure, ensuring users always have access to the latest and most accurate predictions.
- Rollbacks: CD allows for quick rollbacks to previous, stable versions if a new model version performs poorly in production.
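Building on the registry sketch above, a rollback can be as simple as repointing production at the last known-good entry. This is a minimal sketch under the same assumed file formats, not a description of any particular serving platform:

```python
# Serve traffic from whatever version the "current" pointer names; rolling back
# just moves the pointer to the previous registry entry.
import json

def rollback(pointer_path: str = "current_model.json", registry_path: str = "model_registry.jsonl") -> dict:
    with open(registry_path) as registry:
        versions = [json.loads(line) for line in registry]
    if len(versions) < 2:
        raise RuntimeError("No earlier version to roll back to")
    previous = versions[-2]  # naive choice: the entry before the latest
    with open(pointer_path, "w") as pointer:
        json.dump(previous, pointer, indent=2)
    return previous
```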
Considerations for CI/CD with LLMs
Training LLMs with corporate data introduces a layer of complexity. This data is often proprietary, sensitive, and subject to strict regulatory guidelines. Ensuring that the model doesn't inadvertently leak or misuse this data during deployment is paramount. However, it's inevitable that you will run into a situation where data inadvertently "leaks" into your AI infrastructure. How do you put the genie back in the bottle, if that's even possible?
This shift presents a unique challenge for CI/CD. Unlike conventional software, changes to an ML application are driven both by code alterations and by the data used to train the model.
It's more than likely that you will be patching models using a technique like LoRA (Low-Rank Adaptation). LoRA fine-tunes a model by training small, low-rank update matrices while the base model's weights stay frozen, which avoids retraining the entire model and reduces computational cost. In concept, this is similar to how we patch server software.
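As a rough illustration, here is what a LoRA-style patch can look like with the Hugging Face transformers and peft libraries. The base model, rank, and target modules here are assumptions for the sketch; the point is that only small adapter matrices are trained while the base weights stay frozen:

```python
# Wrap a frozen base model with trainable low-rank adapters (LoRA).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative base model

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor for the updates
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection in GPT-2; varies by architecture
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically a small fraction of the base model's weights
```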
Quality control poses another significant hurdle. In standard software projects, unit, integration, and regression tests determine whether a change can be safely incorporated and deployed. If tests pass, the change proceeds; if not, it's halted. This approach doesn't align well with ML models for several reasons. First, the relationship between input and output can be fluid, given that certain ML models exhibit non-deterministic behavior. Second, ML model inputs, like high-dimensional vectors or images, can be highly complex. Crafting such inputs the way developers typically write tests would be inefficient and daunting, if not infeasible. Typical CI/CD tools like Jenkins will either need to be augmented or replaced for these AI/ML workflows.
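One pragmatic pattern is to replace exact-match assertions with a statistical quality gate: evaluate the candidate model on a held-out set and block the deploy only if an aggregate score drops below an agreed floor. The sketch below assumes scikit-learn for scoring; `model.predict`, the evaluation data, and the threshold are hypothetical stand-ins for your own harness:

```python
# A quality gate for a non-deterministic model: score on a held-out set and
# fail the pipeline if aggregate quality regresses below an agreed floor.
from sklearn.metrics import accuracy_score

ACCURACY_FLOOR = 0.92  # agreed-on minimum; the exact value is an assumption

def gate_release(model, eval_examples, eval_labels) -> None:
    predictions = [model.predict(example) for example in eval_examples]
    score = accuracy_score(eval_labels, predictions)
    if score < ACCURACY_FLOOR:
        raise SystemExit(f"Blocking deploy: accuracy {score:.3f} < {ACCURACY_FLOOR}")
    print(f"Quality gate passed: accuracy {score:.3f}")
```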
A robust CD system for LLMs should have a clear mechanism to roll back deployments quickly. If a model exhibits signs of overfitting (a common phenomenon in machine learning and statistics where a model learns the training data too closely, including its noise and outliers), drift (where the model's performance degrades over time due to changing data patterns), or any other unforeseen issue, it's crucial to revert to a stable version swiftly to maintain service integrity and data security.
AI CI/CD is about streamlining the process of integrating new data and code changes, testing the AI models, and deploying them to production, all while ensuring optimal model performance and reliability.
DevSecOps for AI
DevSecOps refers to the inclusion of security best practices in DevOps; it's a later sub-movement that brings security into the DevOps culture. Data is the lifeblood of AI systems, and that data often includes sensitive information, making security paramount. DevSecOps practices emphasize security at every stage of the software lifecycle.
- Data Protection: AI models are only as good as the data they are trained on. Ensuring data integrity and protecting it from breaches is essential. Encryption, access controls, and regular audits can safeguard data.
- Model Security: Adversaries can exploit AI models through techniques like model inversion or adversarial attacks. Integrating security checks and robust testing can help in identifying and mitigating such vulnerabilities.
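As one small, concrete example of a data protection control, here is a minimal sketch that redacts obvious PII patterns before text reaches a training or fine-tuning pipeline. The regexes are simplistic assumptions; a real program would pair this with encryption, access controls, and purpose-built PII detection:

```python
# Redact obvious PII patterns from text before it is ingested for training.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def redact(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label.upper()}]", text)
    return text

print(redact("Contact Jane at jane.doe@example.com or 555-867-5309."))
```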
Considerations for Securing AI
I covered this topic in a previous newsletter, Security in the Age of Generative AI. The introduction of any new infrastructure can increase your attack surface. At the very least, you should consider how your data is governed, how your models are accessed, and how the outputs from those models are used.
DevOps for AI: How to Operationalize AI
As AI continues to permeate every sector, the challenges associated with its development and deployment will only grow. Observability ensures transparency and monitoring of AI systems. Security practices protect sensitive data and models from breaches and exploitation. Continuous Integration ensures that the software and AI components evolve cohesively, while Continuous Deployment guarantees that users always have access to the latest innovations without disruptions.
By embracing DevOps, organizations can navigate the complexities of AI software delivery, ensuring efficiency, reliability, and excellence in the AI-driven future.