Data Analytics Notebook
A decision handbook for data analysis, machine learning, and AI projects. Stop asking “how do I use pandas?” and start asking “I have this data and this goal — what should I do next?” Pick a chapter to begin.
Start here — Interactive Tools
Decision Assistant
Answer a few questions about your data and goal — get a specific, reasoned recommendation for missing values, encoding, scaling, metrics, models, and statistical tests, each linked to the full explanation.
🗺 Guided path: Learning Roadmap
A clear order to learn in, from your first import to job-ready. Eleven stages span all 35 chapters, track your progress, and tell you what to study next.
🔎 Ctrl K: Search Everything
Full-text search across every chapter — methods, metrics, code, and pitfalls. Jump straight to the exact section. Press Ctrl K (or ⌘K) anywhere on the site.
📖 Reference: Glossary
Plain-English definitions of every key term — p-value, leakage, ROC-AUC, SHAP and more. The same definitions pop up as tooltips on key words inside each chapter.
🐍 Run code: Python Playground
Run real pandas & numpy in your browser — no install. Edit and execute the handbook's analytics snippets live; nothing leaves your machine.
🗄 Run SQL: SQL Playground
Query a sample database in your browser. Practice the joins, GROUP BY, CTEs, and window functions from the SQL chapter, with instant results.
Foundations
Core Workflow
Exploratory Data Analysis
Understand the structure, shape, and content of your dataset before doing anything else. Never skip this step.
Chapter 04: Data Cleaning
Handle missing values, duplicates, outliers, incorrect types, and messy strings. This is often 60–70% of the work.
Chapter 05: Data Transformation
Reshape, aggregate, merge, and manipulate your data into the form needed for analysis.
Chapter 06: Feature Engineering
Create new meaningful columns from existing data to improve your analysis and model performance.
Chapter 07: Statistical Analysis
Use descriptive and inferential statistics to understand distributions, relationships, and test hypotheses.
Chapter 08: Data Visualization
Create clear and insightful charts. Choose the right chart for the right question.
Data & SQL
SQL for Analysts
SQL is the #1 daily tool for analysts. Query order, joins, aggregation, window functions, CTEs, and the patterns you actually use in real work.
Chapter 25: Data Engineering & Pipelines
Where data lives and how it flows: warehouses vs lakes, ETL vs ELT, batch vs streaming, orchestration, dbt, and what scales beyond pandas.
Statistics & Experimentation
Probability & Statistics Foundations
The math under every model and test: distributions, the Central Limit Theorem, confidence intervals, bootstrapping, and what a p-value actually means.
Chapter 26: A/B Testing & Experimentation
Design experiments that give trustworthy answers: hypotheses, power and sample size, running the test, reading results, and the traps that produce false wins.
Chapter 27: Causal Inference
When you can't run an experiment, how do you still claim X caused Y? Confounding, the methods (diff-in-diff, matching, IV, RDD), and how not to fool yourself.
Time Series
Modeling
Machine Learning Basics
Build, train, and evaluate predictive models. Learn the workflow that applies to every ML project.
Chapter 10: Model Evaluation
Measure model performance correctly. The right metric depends on your problem type.
Chapter 29: Model Validation & Hyperparameter Tuning
How to estimate true performance and tune without fooling yourself: cross-validation strategies, the search methods, nested CV, and probability calibration.
Chapter 30: Imbalanced Learning
When one class is rare (fraud, churn, disease), accuracy lies and naive models predict the majority. Resampling, class weights, thresholds, and the right metrics.
Chapter 31: Unsupervised Learning
No labels, just structure. Clustering (K-Means, DBSCAN, hierarchical), choosing k, and dimensionality reduction (PCA, t-SNE, UMAP) for compression and visualization.
Chapter 32: Model Interpretability & Explainability
A model you can't explain is one you can't trust or ship in regulated settings. Global vs local explanations, SHAP, permutation importance, PDP, and their pitfalls.
Decision Guides
Data Analytics Decision Matrix
The core of the handbook. Stop guessing. Use these decision trees to choose the right method for missing values, outliers, encoding, scaling, feature selection, and metrics.
Chapter 16: Statistical Test Selection Guide
Don't memorise tests — learn to choose them. Answer what you want to know and the tree points to the correct test, its assumptions, and its non-parametric backup.
Chapter 17: Machine Learning Model Selection Guide
Which algorithm should you actually use? Match your target, dataset size, and need for explainability to the right model — with honest trade-off ratings for speed, accuracy, and interpretability.
Chapter 18: Real Industry Case Studies
End-to-end recommended pipelines for the five most common analytics projects. Each shows the goal, the recommended stack, the metric professionals report, and the traps to avoid.
Chapter 19: Data Science Troubleshooting Guide
When something breaks or looks wrong, find the symptom here. Each entry gives the likely cause and the concrete fix — for errors, modeling problems, and silent data issues.
Chapter 20: Data Leakage Prevention Guide
Leakage is the #1 reason a model looks brilliant offline and fails in production. Learn the four classic leaks and the correct pattern for each — side by side, wrong vs right.
Delivery
Reporting & Communication
Export results, format outputs, and communicate your findings clearly to stakeholders.
Chapter 12: Documentation, Ethics & Optimization
Professional analytics workflow: reproducibility, data privacy, bias checks, and performance scaling for large datasets.
Chapter 13: Portfolio & Career Execution
Turn analysis work into interview-ready portfolio projects with clear business impact.
Chapter 14: Project Documentation Pack
Ready-to-use templates and standards for README, data dictionary, validation log, and handover notes.
Chapter 35: Reproducibility, Testing & Tooling
Work others (and future-you) can rerun and trust: environments, seeds, data validation, testing data pipelines, version control, and project structure.
Advanced
MLOps: Deploy, Monitor, Retrain
A model in a notebook delivers zero value. This chapter covers the lifecycle most courses skip: packaging, deployment, monitoring, drift detection, and retraining.
Chapter 22: AI & LLM Analytics
Modern analysts increasingly work with embeddings, vector search, and LLMs. This chapter covers RAG, embeddings, vector databases, prompt analytics, and how to evaluate an LLM system.
Chapter 33: NLP Fundamentals
Working with text: preprocessing, representations from bag-of-words to embeddings, the core tasks, and choosing between classical ML and transformers.
Chapter 34: Deep Learning Fundamentals
When (and when not) to use neural networks. Neurons, the training loop, the main architectures, transfer learning, and the practical knobs that decide success.