The Professional Decision Handbook

Data Analytics Notebook

A decision handbook for data analysis, machine learning, and AI projects. Stop asking “how do I use pandas?” and start asking “I have this data and this goal — what should I do next?” Pick a chapter to begin.

Start here — Interactive Tools

⚡ Interactive

Decision Assistant

Answer a few questions about your data and goal — get a specific, reasoned recommendation for missing values, encoding, scaling, metrics, models, and statistical tests, each linked to the full explanation.

🗺 Guided path

Learning Roadmap

A clear order to learn in, from your first import to job-ready. Eleven stages span all 35 chapters, track your progress, and tell you what to study next.

🔎 Ctrl K

Search Everything

Full-text search across every chapter — methods, metrics, code, and pitfalls. Jump straight to the exact section. Press Ctrl K (or ⌘K) anywhere on the site.

📖 Reference

Glossary

Plain-English definitions of every key term — p-value, leakage, ROC-AUC, SHAP and more. The same definitions pop up as tooltips on key words inside each chapter.

🐍 Run code

Python Playground

Run real pandas and NumPy in your browser, with no install. Edit and execute the handbook's analytics snippets live; nothing leaves your machine.

🗄 Run SQL

SQL Playground

Query a sample database in your browser. Practice the joins, GROUP BY, CTEs, and window functions from the SQL chapter, with instant results.

Foundations

Chapter 01

Environment & Libraries

Install and import all essential Python libraries for data analytics. Copy this template at the start of every project.

Chapter 02

Data Loading

How to load data from different sources — CSV, Excel, JSON, databases, and web APIs.

Core Workflow

Chapter 03

Exploratory Data Analysis

Understand the structure, shape, and content of your dataset before doing anything else. Never skip this step.

Chapter 04

Data Cleaning

Handle missing values, duplicates, outliers, incorrect types, and messy strings. Cleaning is often 60–70% of the work on a project.

Chapter 05

Data Transformation

Reshape, aggregate, merge, and manipulate your data into the form needed for analysis.

Chapter 06

Feature Engineering

Create new meaningful columns from existing data to improve your analysis and model performance.

Chapter 07

Statistical Analysis

Use descriptive and inferential statistics to understand distributions, relationships, and test hypotheses.

Chapter 08

Data Visualization

Create clear and insightful charts. Choose the right chart for the right question.

Data & SQL

Chapter 24

SQL for Analysts

SQL is the #1 daily tool for analysts. Query order, joins, aggregation, window functions, CTEs, and the patterns you actually use in real work.

Chapter 25

Data Engineering & Pipelines

Where data lives and how it flows: warehouses vs lakes, ETL vs ELT, batch vs streaming, orchestration, dbt, and what scales beyond pandas.

Statistics & Experimentation

Chapter 23

Probability & Statistics Foundations

The math under every model and test: distributions, the Central Limit Theorem, confidence intervals, bootstrapping, and what a p-value actually means.

Chapter 26

A/B Testing & Experimentation

Design experiments that give trustworthy answers: hypotheses, power and sample size, running the test, reading results, and the traps that produce false wins.

Chapter 27

Causal Inference

When you can't run an experiment, how do you still claim X caused Y? Confounding, the methods (diff-in-diff, matching, IV, RDD), and how not to fool yourself.

Time Series

Chapter 28

Time Series Analysis & Forecasting

Ordered data breaks the i.i.d. assumption. Stationarity, decomposition, ARIMA vs Prophet vs ML, time-aware validation, and forecasting metrics.

Modeling

Chapter 09

Machine Learning Basics

Build, train, and evaluate predictive models. Learn the workflow that applies to every ML project.

Chapter 10

Model Evaluation

Measure model performance correctly. The right metric depends on your problem type.

Chapter 29

Model Validation & Hyperparameter Tuning

How to estimate true performance and tune without fooling yourself: cross-validation strategies, search methods, nested CV, and probability calibration.

Chapter 30

Imbalanced Learning

When one class is rare (fraud, churn, disease), accuracy lies and naive models predict the majority. Resampling, class weights, thresholds, and the right metrics.

Chapter 31

Unsupervised Learning

No labels, just structure. Clustering (K-Means, DBSCAN, hierarchical), choosing k, and dimensionality reduction (PCA, t-SNE, UMAP) for compression and visualization.

Chapter 32

Model Interpretability & Explainability

A model you can't explain is one you can't trust or ship in regulated settings. Global vs local explanations, SHAP, permutation importance, PDP, and their pitfalls.

Decision Guides

Chapter 15

Data Analytics Decision Matrix

The core of the handbook. Stop guessing. Use these decision trees to choose the right method for missing values, outliers, encoding, scaling, feature selection, and metrics.

Chapter 16

Statistical Test Selection Guide

Don't memorise tests — learn to choose them. Answer what you want to know and the tree points to the correct test, its assumptions, and its non-parametric backup.

Chapter 17

Machine Learning Model Selection Guide

Which algorithm should you actually use? Match your target, dataset size, and need for explainability to the right model — with honest trade-off ratings for speed, accuracy, and interpretability.

Chapter 18

Real Industry Case Studies

End-to-end recommended pipelines for the five most common analytics projects. Each shows the goal, the recommended stack, the metric professionals report, and the traps to avoid.

Chapter 19

Data Science Troubleshooting Guide

When something breaks or looks wrong, find the symptom here. Each entry gives the likely cause and the concrete fix — for errors, modeling problems, and silent data issues.

Chapter 20

Data Leakage Prevention Guide

Leakage is the #1 reason a model looks brilliant offline and fails in production. Learn the four classic leaks and the correct pattern for each — side by side, wrong vs right.

Delivery

Chapter 11

Reporting & Communication

Export results, format outputs, and communicate your findings clearly to stakeholders.

Chapter 12

Documentation, Ethics & Optimization

Professional analytics workflow: reproducibility, data privacy, bias checks, and performance scaling for large datasets.

Chapter 13

Portfolio & Career Execution

Turn analysis work into interview-ready portfolio projects with clear business impact.

Chapter 14

Project Documentation Pack

Ready-to-use templates and standards for README, data dictionary, validation log, and handover notes.

Chapter 35

Reproducibility, Testing & Tooling

Work others (and future-you) can rerun and trust: environments, seeds, data validation, testing data pipelines, version control, and project structure.

Advanced

Chapter 21