Chapter 21 — MLOps
MLOps: Deploy, Monitor, Retrain
A model in a notebook delivers zero value. This chapter covers the lifecycle most courses skip: packaging, deployment, monitoring, drift detection, and retraining.
MLOps is what turns a one-off analysis into a reliable product. The work doesn't end at model.fit(); it ends when the model keeps delivering value in production and you'd notice the day it stops.
21.1 The ML lifecycle
- Train
- Track
- Package
- Deploy
- Monitor
- Detect drift
- Retrain
21.2 Experiment tracking
Log every run (params, metrics, data version, code commit) so results are reproducible and comparable. Tools: MLflow, Weights & Biases, DVC (for data versioning).
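```python
import mlflow
from sklearn.metrics import roc_auc_score

# Assumes `model`, X_train, y_train, X_test and y_test are already defined.
with mlflow.start_run():
    mlflow.log_params({'n_estimators': 400, 'lr': 0.05})
    model.fit(X_train, y_train)
    # AUC on the held-out set, from the positive-class probability
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    mlflow.log_metric('auc', auc)
    mlflow.sklearn.log_model(model, 'model')
```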
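The "data version" and "code commit" parts of a run can be captured without extra tooling. The sketch below is one possible approach, not an MLflow feature: it hashes the training file and asks git for the current commit. The helper names `file_sha256`, `current_git_commit` and `run_metadata` are made up for illustration.

```python
import hashlib
import subprocess
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def current_git_commit(default="unknown"):
    """Return the current commit hash, or `default` outside a git checkout."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return default


def run_metadata(data_path):
    """Collect tags that identify the data and the code behind a run."""
    return {
        "data_sha256": file_sha256(data_path),
        "git_commit": current_git_commit(),
    }
```

Inside an active run, the resulting dictionary could be attached with `mlflow.set_tags(run_metadata('train.csv'))`, where `train.csv` stands in for whatever file the model was trained on.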
21.3 Deployment patterns
| Pattern | Use when | Avoid when |
|---|---|---|
| Batch scoring | Predictions needed daily/hourly (churn lists) | Instant response required |
| Real-time API | Low-latency per-request (fraud at checkout) | Huge volumes scored offline |
| Edge / embedded | Offline devices, privacy, low latency | Model too large for device |
| Streaming | Continuous event data | Simple periodic batches suffice |
```python
# Minimal real-time model API with FastAPI
from fastapi import FastAPI
import joblib
import pandas as pd

app = FastAPI()
model = joblib.load('model.joblib')  # loaded once at startup

@app.post('/predict')
def predict(features: dict):
    X = pd.DataFrame([features])  # one-row frame from the JSON body
    proba = model.predict_proba(X)[0, 1]
    return {'churn_probability': float(proba)}
```
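Because the endpoint accepts an arbitrary dict, a payload with a missing, extra, or reordered key can reach the model unnoticed. A small guard like the one below is one way to reject such requests early. The column names in `EXPECTED_COLUMNS` are hypothetical and would need to match the features used at training time.

```python
# Hypothetical training-time feature order; replace with the real column list.
EXPECTED_COLUMNS = ["tenure_months", "monthly_charges", "support_tickets"]


def to_model_row(features, expected_columns=EXPECTED_COLUMNS):
    """Return feature values in training order, rejecting mismatched payloads."""
    missing = [c for c in expected_columns if c not in features]
    unexpected = sorted(set(features) - set(expected_columns))
    if missing or unexpected:
        raise ValueError(f"missing={missing}, unexpected={unexpected}")
    return [features[c] for c in expected_columns]
```

In the endpoint, the row could then be built with `pd.DataFrame([to_model_row(features)], columns=EXPECTED_COLUMNS)`, with the `ValueError` translated into an HTTP 422 response.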
21.4 Monitoring & drift detection
| What to watch | Meaning | Action |
|---|---|---|
| Data drift | Input feature distribution shifts vs training | Alert; investigate; consider retraining |
| Concept drift | Relationship between X and y changes | Retrain on recent data |
| Prediction drift | Output distribution shifts | Check upstream data pipeline |
| Performance decay | Live metric drops once labels arrive | Retrain / rollback |
| Operational | Latency, error rate, throughput | Scale infra; fix serving bugs |
Detect drift with the population stability index (PSI), the Kolmogorov-Smirnov (KS) test, or tools like Evidently / NannyML. Always log live predictions so you can measure performance once true labels arrive.
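As a rough illustration of PSI, the sketch below bins a feature by the quantiles of the training sample and compares the share of training and live values in each bin. The bin count, the `eps` floor for empty bins, and the function name are assumptions made for this example. Tools like Evidently or NannyML may compute it differently.

```python
import numpy as np


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a reference sample (training) and a live sample."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Bin edges come from quantiles of the reference sample.
    edges = np.unique(np.quantile(expected, np.linspace(0.0, 1.0, n_bins + 1)))
    # Keep interior edges only, so out-of-range live values fall in the end bins.
    inner = edges[1:-1]
    n = len(inner) + 1

    exp_counts = np.bincount(np.searchsorted(inner, expected, side="right"), minlength=n)
    act_counts = np.bincount(np.searchsorted(inner, actual, side="right"), minlength=n)

    # Floor the fractions so empty bins do not produce log(0).
    exp_frac = np.clip(exp_counts / exp_counts.sum(), eps, None)
    act_frac = np.clip(act_counts / act_counts.sum(), eps, None)

    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.25 as worth investigating, and above 0.25 as a significant shift. These cut-offs are conventions, not guarantees, and are best tuned per feature. For the KS-test, `scipy.stats.ks_2samp(train_values, live_values)` returns a statistic and p-value for the same two samples.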
21.5 Retraining strategy
When to retrain?
- Scheduled (stable domain): weekly / monthly cadence
- Triggered (drift / decay alert): retrain on recent window
- Continuous (fast-moving data): online / streaming updates
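The scheduled and triggered branches can be combined into a single check that a daily job runs. The sketch below is illustrative only: the `RetrainPolicy` thresholds are assumed defaults, not recommendations from this chapter, and the metric is taken to be one where higher is better (such as AUC).

```python
from dataclasses import dataclass


@dataclass
class RetrainPolicy:
    max_age_days: int = 30         # scheduled cadence (assumed)
    psi_threshold: float = 0.25    # drift trigger (assumed rule of thumb)
    max_metric_drop: float = 0.05  # tolerated drop vs. the baseline metric (assumed)


def retrain_reason(policy, model_age_days, psi=None,
                   baseline_metric=None, live_metric=None):
    """Return why a retrain is due, or None if no rule fires."""
    # Performance decay is checked first: it is the most direct evidence.
    if baseline_metric is not None and live_metric is not None:
        if baseline_metric - live_metric > policy.max_metric_drop:
            return "performance_decay"
    # Drift can be measured before labels arrive.
    if psi is not None and psi > policy.psi_threshold:
        return "data_drift"
    # Fall back to the fixed schedule.
    if model_age_days >= policy.max_age_days:
        return "scheduled"
    return None
```

Returning the reason, not just a boolean, makes it easier to log why each retrain happened and to review whether the thresholds are too sensitive.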
Professional recommendation
Start simple: batch scoring + scheduled retraining + a drift dashboard. Add real-time serving and automated triggers only when the business case demands it. Always keep the previous model version so you can roll back instantly.
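To make "keep the previous model version" concrete, here is a deliberately small file-based sketch: each promoted artifact is copied under a versioned name, and a JSON pointer records which one is live. The `ModelRegistry` class and its file layout are invented for this example. A managed registry such as the one in MLflow would normally replace it, and the pointer write here is not atomic.

```python
import json
import shutil
from pathlib import Path


class ModelRegistry:
    """Versioned artifacts plus a pointer to the live one."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.pointer = self.root / "current.json"

    def _load(self):
        if self.pointer.exists():
            return json.loads(self.pointer.read_text())
        return {"current": None, "history": []}

    def _save(self, state):
        self.pointer.write_text(json.dumps(state))

    def promote(self, artifact_path, version):
        """Copy an artifact into the registry and make it the live version."""
        artifact_path = Path(artifact_path)
        target = self.root / f"model-{version}{artifact_path.suffix}"
        shutil.copy2(artifact_path, target)
        state = self._load()
        if state["current"] is not None:
            state["history"].append(state["current"])
        state["current"] = target.name
        self._save(state)
        return target

    def rollback(self):
        """Point back at the previously live artifact; nothing is deleted."""
        state = self._load()
        if not state["history"]:
            raise RuntimeError("no previous version to roll back to")
        state["current"] = state["history"].pop()
        self._save(state)
        return self.root / state["current"]

    def current(self):
        """Return the path of the live artifact, or None if none is promoted."""
        name = self._load()["current"]
        return None if name is None else self.root / name
```

The serving code would load whatever `current()` returns, so a rollback is a pointer change followed by a reload, not a redeploy.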
21.6 Common mistakes
- Deploying with different preprocessing than training (train/serve skew)
- No monitoring — discovering decay only when a stakeholder complains
- Retraining on drifted data without checking label quality
- No model versioning or rollback path
- Hard-coding paths/secrets instead of config and environment variables
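For the last item in the list above, a minimal sketch of reading paths and secrets from environment variables is shown here. The variable names `MODEL_PATH`, `API_KEY` and `LOG_LEVEL`, and the `ServingConfig` class, are assumptions chosen for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServingConfig:
    model_path: str
    api_key: str
    log_level: str = "INFO"


def load_config(env=None):
    """Build the config from environment variables, failing fast if any are missing."""
    env = os.environ if env is None else env
    missing = [key for key in ("MODEL_PATH", "API_KEY") if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {missing}")
    return ServingConfig(
        model_path=env["MODEL_PATH"],
        api_key=env["API_KEY"],
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing a plain dict as `env` keeps the function easy to test, and failing at startup is usually preferable to discovering a missing secret on the first request.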
Common mistakes to avoid
- Skipping business context before running technical steps
- Not writing assumptions and limitations explicitly
- Treating one metric as the full story
Quick cheatsheet
- mlflow.log_metric() -> Track experiment results
- joblib.dump(model, 'model.joblib') -> Persist the trained model
- FastAPI / @app.post -> Serve real-time predictions
- PSI / KS-test -> Detect data drift
- model registry / version -> Enable rollback