Chapter 16 — Stat Tests
Statistical Test Selection Guide
Don't memorise tests — learn to choose them. Answer what you want to know and the tree points to the correct test, its assumptions, and its non-parametric backup.
16.0 What do you want to know?
master decision tree
What is your question?
│
├── Compare means / averages
│   ├── 1 group vs known value ───────► One-sample t-test
│   ├── 2 groups
│   │   ├── Independent ──────────────► Independent t-test
│   │   └── Paired (before/after) ────► Paired t-test
│   └── 3+ groups ───────────────────► ANOVA
│
├── Compare categories / proportions
│   └── Counts in a table ───────────► Chi-square test
│
└── Measure a relationship
    ├── Linear, numeric ─────────────► Pearson correlation
    └── Monotonic / ranked ──────────► Spearman correlation
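The tree above can be written as a small lookup function. This is only an illustrative sketch: the function name `choose_test` and the labels `"means"`, `"categories"`, and `"relationship"` are hypothetical and not part of any library.

```python
def choose_test(goal, groups=None, paired=False, linear=True):
    """Return the test name the decision tree points to.

    goal: "means", "categories", or "relationship" (hypothetical labels).
    """
    if goal == "means":
        if groups == 1:
            return "One-sample t-test"
        if groups == 2:
            return "Paired t-test" if paired else "Independent t-test"
        if groups is not None and groups >= 3:
            return "ANOVA"
        raise ValueError("groups must be a positive integer for goal='means'")
    if goal == "categories":
        return "Chi-square test"
    if goal == "relationship":
        # Non-linear but monotonic (or ranked) data goes to Spearman
        return "Pearson correlation" if linear else "Spearman correlation"
    raise ValueError(f"unknown goal: {goal!r}")


print(choose_test("means", groups=2, paired=True))   # Paired t-test
print(choose_test("relationship", linear=False))     # Spearman correlation
```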
16.1 Parametric vs non-parametric backup
assumptions check
Is data ~ normal AND sample large enough?
│
├── YES → use parametric test
│   ├── 2 groups ──► t-test
│   └── 3+ groups ─► ANOVA
│
└── NO (skewed, ordinal, small n) → use non-parametric
    ├── 2 groups ──► Mann-Whitney U
    └── 3+ groups ─► Kruskal-Wallis
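One way to automate this check for two groups is sketched below. It uses the Shapiro-Wilk test as the normality check, which is an assumption on our part (the guide does not name a specific check). The helper name `compare_two_groups` and the `min_n=30` cut-off are likewise illustrative choices, and the data is synthetic.

```python
import numpy as np
from scipy import stats


def compare_two_groups(a, b, alpha=0.05, min_n=30):
    """Pick Welch's t-test or Mann-Whitney U from a simple assumptions check."""
    # Shapiro-Wilk null hypothesis: the sample comes from a normal distribution,
    # so a p-value above alpha means normality was not rejected.
    looks_normal = (stats.shapiro(a).pvalue > alpha
                    and stats.shapiro(b).pvalue > alpha)
    large_enough = min(len(a), len(b)) >= min_n  # min_n is a rule-of-thumb assumption
    if looks_normal and large_enough:
        result = stats.ttest_ind(a, b, equal_var=False)
        return "Welch t-test", result.pvalue
    result = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "Mann-Whitney U", result.pvalue


rng = np.random.default_rng(0)
skewed_a = rng.exponential(scale=1.0, size=200)  # synthetic, heavily right-skewed
skewed_b = rng.exponential(scale=1.5, size=200)
name, p = compare_two_groups(skewed_a, skewed_b)
print(name, f"p={p:.4f}")
```

A formal normality test is only one input. With large samples it flags trivial deviations, so a histogram or Q-Q plot is worth checking as well.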
16.2 Full selection table
| Goal | Data | Test | Non-parametric backup |
|---|---|---|---|
| Mean vs a fixed value | 1 numeric group | ttest_1samp | Wilcoxon signed-rank |
| Compare 2 group means | 2 independent groups | ttest_ind | Mann-Whitney U |
| Before vs after | 2 paired groups | ttest_rel | Wilcoxon signed-rank |
| Compare 3+ group means | 3+ groups | f_oneway (ANOVA) | Kruskal-Wallis |
| Category association | 2 categorical vars | chi2_contingency | Fisher's exact (small n) |
| Linear relationship | 2 numeric vars | pearsonr | Spearman |
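As an example of pairing a test with its backup, the sketch below covers the "Category association" row. It runs `chi2_contingency` and switches to `fisher_exact` when a 2x2 table has small expected counts. The "expected count below 5" cut-off is a common rule of thumb that we assume here, the helper name `category_association` is hypothetical, and the counts are made up.

```python
import numpy as np
from scipy import stats


def category_association(table):
    """Chi-square test, falling back to Fisher's exact for sparse 2x2 tables."""
    table = np.asarray(table)
    chi2, p, dof, expected = stats.chi2_contingency(table)
    # Assumed rule of thumb: expected counts below 5 make the
    # chi-square approximation unreliable.
    if table.shape == (2, 2) and (expected < 5).any():
        _, p = stats.fisher_exact(table)
        return "Fisher's exact", p
    return "Chi-square", p


# Made-up counts: rows = variant A / B, columns = converted / not converted
print(category_association([[30, 70], [45, 55]]))
print(category_association([[3, 7], [8, 2]]))
```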
16.3 Reading the result
p-value < 0.05 (the conventional threshold) → reject the null hypothesis (the effect is statistically significant). p ≥ 0.05 → not enough evidence to reject it, which is not the same as proof of no effect. Always report the effect size (Cohen's d, Cramér's V, r) too: significance with a tiny effect rarely matters to the business.
python
from scipy import stats

# Two independent groups: does pricing change conversion?
# Assumes df is a pandas DataFrame with 'variant' and 'conversion' columns.
group_a = df[df['variant'] == 'A']['conversion']
group_b = df[df['variant'] == 'B']['conversion']

# equal_var=False runs Welch's t-test, which does not assume equal variances
t, p = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t={t:.3f} p={p:.4f}")
print("Significant" if p < 0.05 else "Not significant")
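The t-test above gives a p-value but no effect size. A minimal sketch of Cohen's d with the pooled standard deviation follows, using only the standard library; the function name `cohens_d` and the sample values are illustrative.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled sample std dev."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


a = [2.0, 4.0, 6.0]   # made-up values, mean 4, sample variance 4
b = [1.0, 3.0, 5.0]   # made-up values, mean 3, sample variance 4
print(cohens_d(a, b))  # 0.5
```

Cohen's conventional benchmarks read roughly 0.2 as small, 0.5 as medium, and 0.8 as large, though what counts as meaningful depends on the business context.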
Professional recommendation
- A/B test: Two-proportion z / t-test
- 3+ variants: ANOVA + post-hoc Tukey
- Survey / Likert: Spearman + Chi-square
- Skewed metrics: Mann-Whitney U
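For the A/B case, a two-proportion z-test can be written directly from its formula. The sketch below uses the pooled proportion for the standard error and a two-sided alternative. The function name `two_proportion_ztest` is our own and the conversion counts are made up.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for equal conversion rates, using the pooled proportion."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up data: 120 of 1000 users converted on A, 150 of 1000 on B
z, p = two_proportion_ztest(120, 1000, 150, 1000)
print(f"z={z:.3f} p={p:.4f}")
```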
16.4 Common mistakes
- Running a t-test on heavily skewed data instead of Mann-Whitney U
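- Using Pearson on a clearly non-linear relationship (use Spearman if the relationship is at least monotonic)
- Running many tests and reporting only the significant one (p-hacking: report every test you ran and correct for multiple comparisons, e.g. with Bonferroni)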
- Reporting p < 0.05 with no effect size or confidence interval
- Treating "not significant" as "proven no effect"
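The Bonferroni correction mentioned above multiplies each p-value by the number of tests run (capped at 1) before comparing it with the threshold. A minimal sketch with made-up p-values follows; the helper name `bonferroni` is illustrative.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-value, reject?) pairs under Bonferroni correction."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj < alpha) for p_adj in adjusted]


# Made-up p-values from four tests run on the same experiment
raw = [0.004, 0.03, 0.04, 0.20]
for p_raw, (p_adj, reject) in zip(raw, bonferroni(raw)):
    print(f"raw={p_raw:.3f} adjusted={p_adj:.3f} reject={reject}")
```

Three of the four raw p-values are below 0.05, but only the first survives the correction. Bonferroni is conservative; it is shown here because it is the method the list names.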
Common mistakes to avoid
- Skipping business context before running technical steps
- Not writing assumptions and limitations explicitly
- Treating one metric as the full story
Quick cheatsheet
stats.ttest_ind() -> Compare 2 independent group means
stats.f_oneway() -> ANOVA: compare 3+ groups
stats.chi2_contingency() -> Association between categories
stats.pearsonr() -> Linear correlation + p-value
stats.mannwhitneyu() -> Non-parametric 2-group test