Chapter 21 — MLOps

MLOps: Deploy, Monitor, Retrain

A model in a notebook delivers zero value. This chapter covers the lifecycle most courses skip: packaging, deployment, monitoring, drift detection, and retraining.

MLOps is what turns a one-off analysis into a reliable product. The work doesn't end at model.fit() — it ends when the model keeps delivering value in production and you'd notice the day it stops.

21.1 The ML lifecycle

Train -> Track -> Package -> Deploy -> Monitor -> Detect drift -> Retrain

21.2 Experiment tracking

Log every run (params, metrics, data version, code commit) so results are reproducible and comparable. Tools: MLflow, Weights & Biases, and DVC (for data versioning).

python

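import mlflow
import mlflow.sklearn
from sklearn.metrics import roc_auc_score

# Assumes model, X_train, y_train, X_test and y_test are already defined.
with mlflow.start_run():
    # Record the hyperparameters this run was trained with
    mlflow.log_params({'n_estimators': 400, 'lr': 0.05})
    model.fit(X_train, y_train)
    # Score on held-out data using the positive-class probability
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    mlflow.log_metric('auc', auc)
    # Store the fitted model as an artifact of this run
    mlflow.sklearn.log_model(model, 'model')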
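The run above logs params and a metric, but not the data version or code commit. Below is a minimal sketch of capturing those two, using only the standard library. The helper names (`data_fingerprint`, `current_commit`, `build_run_record`) and the sample values are hypothetical and not part of any tracking tool.

```python
import hashlib
import json
import subprocess
from datetime import datetime, timezone


def data_fingerprint(raw_bytes: bytes) -> str:
    """Return a short SHA-256 fingerprint identifying a dataset snapshot."""
    return hashlib.sha256(raw_bytes).hexdigest()[:12]


def current_commit() -> str:
    """Return the current git commit hash, or 'unknown' outside a git repo."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (subprocess.CalledProcessError, OSError):
        return "unknown"


def build_run_record(params: dict, metrics: dict, raw_bytes: bytes) -> dict:
    """Bundle what is needed to reproduce and compare a run."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "params": params,
        "metrics": metrics,
        "data_version": data_fingerprint(raw_bytes),
        "code_commit": current_commit(),
    }


record = build_run_record(
    params={"n_estimators": 400, "lr": 0.05},
    metrics={"auc": 0.91},  # placeholder value, for illustration only
    raw_bytes=b"customer_id,churned\n1,0\n2,1\n",  # stands in for a real data file
)
print(json.dumps(record, indent=2))
```

With MLflow, the same two values could be attached to a run as tags, for example through `mlflow.set_tags`. Hashing the raw file is only one way to version data; a tool such as DVC handles this more thoroughly.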

21.3 Deployment patterns

Pattern	Use when	Avoid when
Batch scoring	Predictions needed daily/hourly (churn lists)	Instant response required
Real-time API	Low-latency per-request (fraud at checkout)	Huge volumes scored offline
Edge / embedded	Offline devices, privacy, low latency	Model too large for device
Streaming	Continuous event data	Simple periodic batches suffice

python

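# Minimal real-time model API with FastAPI
from fastapi import FastAPI
import joblib
import pandas as pd

app = FastAPI()
# Load the model once at startup, not on every request
model = joblib.load('model.joblib')

@app.post('/predict')
def predict(features: dict):
    # features is parsed from the JSON request body; keys must match the training columns
    X = pd.DataFrame([features])
    # Probability of the positive class for the single submitted row
    proba = model.predict_proba(X)[0, 1]
    return {'churn_probability': float(proba)}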
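For contrast with the real-time API, the batch scoring row of the table can be sketched as a job that reads records in chunks and writes out scores. This is an illustrative sketch only: `StubModel`, its `predict_proba_rows` method, the column names, and the toy scoring rule are all hypothetical stand-ins for a real trained model and a real data extract.

```python
import csv
import io
from itertools import islice


class StubModel:
    """Hypothetical stand-in for a trained model; replace with a real one."""

    def predict_proba_rows(self, rows):
        # Toy rule for illustration: more support tickets -> higher churn score.
        return [min(1.0, 0.1 * float(r["support_tickets"])) for r in rows]


def score_in_batches(reader, model, batch_size=2):
    """Yield (customer_id, score) pairs, scoring batch_size rows at a time."""
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            break
        scores = model.predict_proba_rows(batch)
        for row, score in zip(batch, scores):
            yield row["customer_id"], score


# A real job would read a file or table extract; a string keeps the sketch self-contained.
raw = "customer_id,support_tickets\nA,0\nB,3\nC,12\n"
reader = csv.DictReader(io.StringIO(raw))
results = list(score_in_batches(reader, StubModel()))
for customer_id, score in results:
    print(customer_id, round(score, 2))
```

Processing in chunks keeps memory use bounded regardless of input size, which is the main practical difference from scoring one request at a time.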

21.4 Monitoring & drift detection

What to watch	Meaning	Action
Data drift	Input feature distribution shifts vs training	Alert; investigate; consider retraining
Concept drift	Relationship between X and y changes	Retrain on recent data
Prediction drift	Output distribution shifts	Check upstream data pipeline
Performance decay	Live metric drops once labels arrive	Retrain / rollback
Operational	Latency, error rate, throughput	Scale infra; fix serving bugs
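The "Performance decay" row depends on being able to join past predictions to labels that arrive later. A minimal sketch of that join follows, assuming each request carries an ID. The in-memory list, the function names, and the sample values are hypothetical; a production system would use a table or an append-only log.

```python
from datetime import datetime, timezone

prediction_log = []  # in production this would be a table or append-only file


def log_prediction(request_id: str, features: dict, proba: float) -> None:
    """Record what the model saw and what it returned."""
    prediction_log.append({
        "request_id": request_id,
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "features": features,
        "proba": proba,
    })


def live_accuracy(log: list, labels: dict, threshold: float = 0.5):
    """Accuracy over logged predictions whose true label has arrived.

    labels maps request_id -> 0/1. Returns None if no labels have arrived yet.
    """
    matched = [(e["proba"] >= threshold, labels[e["request_id"]])
               for e in log if e["request_id"] in labels]
    if not matched:
        return None
    correct = sum(int(pred) == truth for pred, truth in matched)
    return correct / len(matched)


log_prediction("r1", {"tenure": 3}, 0.82)
log_prediction("r2", {"tenure": 40}, 0.10)
log_prediction("r3", {"tenure": 7}, 0.65)

# Labels arrive later, keyed by the same request IDs.
arrived = {"r1": 1, "r2": 0, "r3": 0}
print(live_accuracy(prediction_log, arrived))
```

Accuracy is used here only to keep the sketch short; the same join supports whichever metric was used at training time.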

Detect drift with the population stability index (PSI), the two-sample Kolmogorov-Smirnov (KS) test, or tools such as Evidently or NannyML. Always log live predictions so you can measure performance once the true labels arrive.
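To make PSI concrete: bin a feature using the training (reference) sample, compare the share of live values in each bin, and sum (live share - reference share) * ln(live share / reference share). The sketch below is one simple variant, assuming equal-width bins and a small floor on empty bins; the bin count, the floor, and the synthetic data are illustrative choices.

```python
import math
import random


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population stability index between a reference and a live sample.

    Bins are equal-width over the reference range; live values outside that
    range fall into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor each proportion at eps so the log term is always defined.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


rng = random.Random(0)
train = [rng.gauss(0, 1) for _ in range(5000)]
live_same = [rng.gauss(0, 1) for _ in range(5000)]     # same distribution
live_shifted = [rng.gauss(1, 1) for _ in range(5000)]  # mean shifted by one std

print("no drift:", psi(train, live_same))
print("shifted: ", psi(train, live_shifted))
```

A commonly cited rule of thumb treats PSI above roughly 0.2 to 0.25 as a shift worth investigating. That is a convention, not a guarantee, and the value depends on how the bins are chosen.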

21.5 Retraining strategy

decision tree

When to retrain?
│
├── Scheduled ─ stable domain ──────► Weekly / monthly cadence
├── Triggered ─ drift / decay alert ─► Retrain on recent window
└── Continuous ─ fast-moving data ──► Online / streaming updates
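The scheduled and triggered branches of the tree above can be expressed as a small policy check that runs on a schedule. This is a sketch under stated assumptions: the `RetrainPolicy` class, its threshold values, and the reason strings are hypothetical, and the continuous (online) branch is omitted because it updates the model in place and needs no such check.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RetrainPolicy:
    # All thresholds are illustrative assumptions; tune them per use case.
    max_age: timedelta = timedelta(days=30)  # scheduled cadence
    psi_alert: float = 0.25                  # drift trigger
    max_metric_drop: float = 0.05            # decay trigger, in metric points


def retrain_reason(policy, last_trained, today, psi_value,
                   baseline_metric, live_metric):
    """Return why a retrain is due, or None if the current model can stay."""
    if psi_value >= policy.psi_alert:
        return "triggered: data drift"
    # live_metric is None until enough true labels have arrived.
    if live_metric is not None and baseline_metric - live_metric >= policy.max_metric_drop:
        return "triggered: performance decay"
    if today - last_trained >= policy.max_age:
        return "scheduled: model age"
    return None


policy = RetrainPolicy()
print(retrain_reason(policy, date(2024, 1, 1), date(2024, 1, 10),
                     psi_value=0.40, baseline_metric=0.90, live_metric=0.89))
```

Returning the reason, not just a boolean, makes the decision auditable when someone later asks why a model was replaced.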

Professional recommendation

Start simple: batch scoring + scheduled retraining + a drift dashboard. Add real-time serving and automated triggers only when the business case demands it. Always keep the previous model version so you can roll back instantly.
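Keeping the previous version only helps if switching back is a single step. The toy registry below sketches the idea with one file per version and a pointer to the live one. `ModelRegistry`, the file layout, and the byte strings are hypothetical; a real model registry would replace all of this.

```python
import json
import tempfile
from pathlib import Path


class ModelRegistry:
    """Toy registry: one file per version plus a pointer to the live version."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self.pointer = self.root / "current.json"

    def _state(self) -> dict:
        if self.pointer.exists():
            return json.loads(self.pointer.read_text())
        return {"current": None, "previous": None}

    def promote(self, version: str, model_bytes: bytes) -> None:
        """Store a new version and make it live, remembering the old one."""
        (self.root / f"model_{version}.bin").write_bytes(model_bytes)
        state = self._state()
        self.pointer.write_text(
            json.dumps({"current": version, "previous": state["current"]})
        )

    def rollback(self) -> str:
        """Point back at the previous version; its file was never deleted."""
        state = self._state()
        if state["previous"] is None:
            raise RuntimeError("no previous version to roll back to")
        self.pointer.write_text(
            json.dumps({"current": state["previous"], "previous": None})
        )
        return state["previous"]

    def current(self):
        """Return the version label that is currently live."""
        return self._state()["current"]


registry = ModelRegistry(Path(tempfile.mkdtemp()) / "registry")
registry.promote("v1", b"serialized model v1")  # placeholder bytes
registry.promote("v2", b"serialized model v2")
registry.rollback()
print(registry.current())
```

Rollback here only rewrites the pointer, so it is fast and does not depend on retraining or rebuilding anything.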

21.6 Common mistakes

Deploying with different preprocessing than training (train/serve skew)
No monitoring — discovering decay only when a stakeholder complains
Retraining on drifted data without checking label quality
No model versioning or rollback path
Hard-coding paths/secrets instead of config and environment variables
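The last item in the list above is the cheapest to fix. Below is a minimal sketch of reading paths, secrets, and thresholds from environment variables. The variable names (`SCORING_API_KEY`, `MODEL_PATH`, `PSI_ALERT`), the defaults, and the `ServingConfig` class are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServingConfig:
    model_path: str
    api_key: str
    psi_alert: float


def load_config(env=os.environ) -> ServingConfig:
    """Read settings from the environment; fail fast if a secret is missing."""
    api_key = env.get("SCORING_API_KEY")
    if not api_key:
        raise RuntimeError("SCORING_API_KEY is not set")
    return ServingConfig(
        model_path=env.get("MODEL_PATH", "model.joblib"),
        api_key=api_key,
        psi_alert=float(env.get("PSI_ALERT", "0.25")),
    )


# A plain dict stands in for os.environ so the sketch runs anywhere.
config = load_config({
    "SCORING_API_KEY": "example-only",
    "MODEL_PATH": "/models/churn.joblib",
})
print(config.model_path, config.psi_alert)
```

Failing at startup when a secret is missing is a deliberate choice: it surfaces a misconfigured deployment immediately, not on the first request.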

Common mistakes to avoid

Skipping business context before running technical steps
Not writing assumptions and limitations explicitly
Treating one metric as the full story

Quick cheatsheet

mlflow.log_metric() -> Track experiment results

joblib.dump(model, 'model.joblib') -> Persist the trained model

FastAPI / @app.post -> Serve real-time predictions

PSI / KS-test -> Detect data drift

model registry / version -> Enable rollback