Chapter 16 — Stat Tests

Statistical Test Selection Guide

Don't memorise tests; learn to choose them. Start from what you want to know, and the trees and table below point to the right test, its assumptions, and its non-parametric backup.

16.0 What do you want to know?

master decision tree

What is your question?
│
├── Compare means / averages
│   ├── 1 group vs known value ───────► One-sample t-test
│   ├── 2 groups
│   │   ├── Independent ──────────────► Independent t-test
│   │   └── Paired (before/after) ────► Paired t-test
│   └── 3+ groups ───────────────────► One-way ANOVA
│
├── Compare categories / proportions
│   └── Counts in a table ───────────► Chi-square test of independence
│
└── Measure a relationship
    ├── Linear, numeric ─────────────► Pearson correlation
    └── Monotonic / ranked ──────────► Spearman correlation

16.1 Parametric vs non-parametric backup

assumptions check

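Is the data roughly normal (or n large enough for the CLT), and numeric rather than ordinal?
│
├── YES → use a parametric test
│   ├── 2 groups ──► t-test (independent or paired)
│   └── 3+ groups ─► ANOVA
│
└── NO (skewed with small n, ordinal, heavy outliers) → use a non-parametric test
    ├── 2 independent groups ──► Mann-Whitney U
    ├── 2 paired groups ───────► Wilcoxon signed-rank
    └── 3+ groups ─────────────► Kruskal-Wallis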
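Below is one way to turn this check into code for two independent groups. It is a sketch built on two assumed choices, not fixed rules. First, Shapiro-Wilk (`scipy.stats.shapiro`) serves as the normality check. Second, 30 observations per group counts as "large enough" on its own, because the central limit theorem makes the t-test robust to non-normal data at larger n. `compare_two_groups` is a hypothetical helper name. Normality tests have little power at small n, so a histogram or Q-Q plot is still worth a look.

```python
import numpy as np
from scipy import stats


def compare_two_groups(a, b, alpha=0.05, min_n=30):
    """Choose Welch's t-test or Mann-Whitney U for two independent groups.

    Assumed heuristic: a group is acceptable for the t-test if it has at least
    `min_n` observations, or if Shapiro-Wilk finds no evidence against
    normality at level `alpha`.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    def parametric_ok(x):
        if len(x) >= min_n:
            return True
        return stats.shapiro(x).pvalue > alpha

    if parametric_ok(a) and parametric_ok(b):
        res = stats.ttest_ind(a, b, equal_var=False)   # Welch's t-test
        return "Welch t-test", res.pvalue
    res = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "Mann-Whitney U", res.pvalue


# Small, heavily skewed groups (one extreme value each) fail the normality check
skewed_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1000]
skewed_b = [2, 2, 2, 2, 2, 2, 2, 2, 2, 900]
print(compare_two_groups(skewed_a, skewed_b)[0])  # Mann-Whitney U
```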

16.2 Full selection table

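| Goal | Data | Test (`scipy.stats`) | Non-parametric backup |
|---|---|---|---|
| Mean vs a fixed value | 1 numeric group | `ttest_1samp` | Wilcoxon signed-rank (`wilcoxon`) |
| Compare 2 group means | 2 independent groups | `ttest_ind` | Mann-Whitney U (`mannwhitneyu`) |
| Before vs after | 2 paired groups | `ttest_rel` | Wilcoxon signed-rank (`wilcoxon`) |
| Compare 3+ group means | 3+ independent groups | `f_oneway` (one-way ANOVA) | Kruskal-Wallis (`kruskal`) |
| Category association | 2 categorical variables | `chi2_contingency` | Fisher's exact (`fisher_exact`), when expected counts are small |
| Linear relationship | 2 numeric variables | `pearsonr` | Spearman (`spearmanr`) |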
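The sketch below runs each row of the table on simulated data, placing the parametric test next to its backup. The data comes from NumPy's random generator with a fixed seed, and the 2×2 count table is made up. The printed p-values therefore only illustrate the calls and say nothing about real data. The code assumes a recent SciPy release, where the result objects expose a `.pvalue` attribute.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)   # simulated data, illustration only
results = {}                       # row name -> (parametric p, backup p)

# Mean vs a fixed value (H0: population mean = 50)
x = rng.normal(52, 5, size=40)
results["one-sample"] = (
    stats.ttest_1samp(x, popmean=50).pvalue,
    stats.wilcoxon(x - 50).pvalue,             # signed-rank test on differences from 50
)

# Two independent groups
a, b = rng.normal(10, 2, 30), rng.normal(11, 2, 30)
results["independent"] = (
    stats.ttest_ind(a, b, equal_var=False).pvalue,
    stats.mannwhitneyu(a, b, alternative="two-sided").pvalue,
)

# Before vs after on the same units (paired)
before = rng.normal(70, 8, 25)
after = before - rng.normal(1.5, 2, 25)
results["paired"] = (
    stats.ttest_rel(before, after).pvalue,
    stats.wilcoxon(before, after).pvalue,
)

# Three or more independent groups
g1, g2, g3 = rng.normal(5, 1, 20), rng.normal(5.5, 1, 20), rng.normal(6, 1, 20)
results["3+ groups"] = (
    stats.f_oneway(g1, g2, g3).pvalue,
    stats.kruskal(g1, g2, g3).pvalue,
)

# Two categorical variables: a made-up 2x2 table of counts
table = np.array([[30, 10], [20, 25]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
_, p_fisher = stats.fisher_exact(table)
results["categorical"] = (p_chi2, p_fisher)

# Two numeric variables with a monotonic, non-linear link
u = rng.uniform(0, 3, 50)
v = np.exp(u) + rng.normal(0, 0.5, 50)
results["relationship"] = (
    stats.pearsonr(u, v).pvalue,
    stats.spearmanr(u, v).pvalue,
)

for row, (p_param, p_backup) in results.items():
    print(f"{row:<13} parametric p={p_param:.4f}  backup p={p_backup:.4f}")
```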

16.3 Reading the result

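At the conventional α = 0.05: p < 0.05 → reject the null hypothesis (the result is statistically significant); p ≥ 0.05 → not enough evidence to reject it, which is not the same as showing there is no effect. Always report an effect size too (Cohen's d for t-tests, Cramér's V for chi-square, r for correlations). With a large sample, even a tiny difference can be statistically significant while mattering little to the business.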

python

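from scipy import stats

# Two independent groups: does the pricing variant change conversion?
# Assumes `df` is a pandas DataFrame with a 'variant' column ('A' or 'B')
# and a numeric 'conversion' column (e.g. 0/1 per user).
group_a = df.loc[df['variant'] == 'A', 'conversion']
group_b = df.loc[df['variant'] == 'B', 'conversion']

# equal_var=False runs Welch's t-test (no equal-variance assumption)
t, p = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t={t:.3f}  p={p:.4f}")
print("Significant" if p < 0.05 else "Not significant")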
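The example above reports only t and p. Following the effect-size advice takes just a few more lines for the two most common measures. This sketch makes two choices. Cohen's d uses the pooled-standard-deviation form. Cramér's V uses the chi-square statistic without Yates' correction. `cohens_d` and `cramers_v` are hypothetical helper names. For correlations, the r returned by `pearsonr` is already the effect size.

```python
import numpy as np
from scipy import stats


def cohens_d(a, b):
    """Cohen's d for two independent groups (difference in means / pooled SD)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def cramers_v(table):
    """Cramér's V for an r x c table of counts: sqrt(chi2 / (n * (min(r, c) - 1)))."""
    table = np.asarray(table, dtype=float)
    chi2, _, _, _ = stats.chi2_contingency(table, correction=False)
    n = table.sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))


print(cohens_d([2, 4, 6], [1, 3, 5]))   # 0.5
print(cramers_v([[10, 0], [0, 10]]))     # 1.0 (perfect association)
```

Recent SciPy releases also provide `scipy.stats.contingency.association(table, method="cramer")`, which should give the same value as this hand-rolled version.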

Professional recommendation

A/B test → two-proportion z-test (conversion rates) or t-test (continuous metrics such as revenue)

3+ variants → ANOVA + post-hoc Tukey HSD

Survey / Likert → Spearman + Chi-square

Skewed metrics → Mann-Whitney U
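The sketch below covers the A/B and 3+ variant recommendations. It writes the two-proportion z-test out by hand with the pooled proportion, which avoids an extra dependency; statsmodels offers a ready-made `proportions_ztest`. `two_proportion_ztest` is a hypothetical helper name. The conversion counts are made up and the revenue figures are simulated. `scipy.stats.tukey_hsd` requires SciPy 1.8 or later.

```python
import numpy as np
from scipy import stats


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test of H0: rate_a == rate_b, using the pooled proportion."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    p_value = 2 * stats.norm.sf(abs(z))   # two-sided tail probability
    return z, p_value


# A/B test with made-up counts: 120/1000 vs 150/1000 conversions
z, p = two_proportion_ztest(120, 1000, 150, 1000)
print(f"z={z:.3f}  p={p:.4f}")

# 3+ variants: one-way ANOVA first, then Tukey HSD for the pairwise comparisons
rng = np.random.default_rng(7)   # simulated revenue per user, illustration only
v1, v2, v3 = rng.normal(10, 3, 60), rng.normal(10.5, 3, 60), rng.normal(12, 3, 60)
print("ANOVA p =", stats.f_oneway(v1, v2, v3).pvalue)
tukey = stats.tukey_hsd(v1, v2, v3)
print(tukey.pvalue)   # 3x3 matrix of pairwise p-values
```

For a 2×2 table, the squared pooled z statistic equals the chi-square statistic computed without Yates' correction. As a result, `chi2_contingency(..., correction=False)` on the same counts gives the same p-value.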

16.4 Common mistakes

Running a t-test on heavily skewed, small-sample data instead of Mann-Whitney U
Using Pearson on a non-linear but monotonic relationship (use Spearman; neither captures a U-shaped pattern)
Running many tests and reporting only the significant one (p-hacking); correct for multiple comparisons, e.g. with Bonferroni
Reporting p < 0.05 with no effect size or confidence interval
Treating "not significant" as proof of no effect (the sample may simply be too small to detect it)
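A Bonferroni correction is simple to apply by hand. With m tests, either compare each p-value to α/m, or multiply each p-value by m (capped at 1) and compare the result to α; the two are equivalent. The sketch below uses the second form. `bonferroni` is a hypothetical helper name, and the p-values are made up for illustration.

```python
import numpy as np


def bonferroni(pvalues, alpha=0.05):
    """Return Bonferroni-adjusted p-values and which tests stay significant."""
    p = np.asarray(pvalues, dtype=float)
    adjusted = np.minimum(p * len(p), 1.0)   # multiply by number of tests, cap at 1
    return adjusted, adjusted < alpha


# Four metrics tested on the same experiment (made-up p-values)
raw = [0.01, 0.04, 0.03, 0.20]
adjusted, significant = bonferroni(raw)
print(adjusted)      # values 0.04, 0.16, 0.12, 0.8
print(significant)   # only the first test survives
```

Without the correction, three of the four made-up p-values fall below 0.05; after it, only one does. Bonferroni is conservative, so it costs a lot of power when there are many tests.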

Broader analysis mistakes to avoid

Running tests before the business question and context are clear
Leaving assumptions and limitations unstated
Treating one metric as the full story

Quick cheatsheet

stats.ttest_ind() -> Compare 2 independent group means

stats.f_oneway() -> ANOVA — compare 3+ groups

stats.chi2_contingency() -> Association between categories

stats.pearsonr() -> Linear correlation + p-value

stats.mannwhitneyu() -> Non-parametric 2-group test