IB Mathematics: Applications & Interpretation · Topic 4 · SL & HL
Statistics & Probability Formulae
Every formula, definition, condition, and worked example you need for IB Math AI Topic 4 — from descriptive statistics and probability rules to binomial, normal, Poisson, chi-squared, regression, and hypothesis testing. Clear SL / HL labels throughout.
⚠️ Formula Booklet vs. Memorisation
The IB provides a formula booklet in all exams. However, knowing when to apply each formula, understanding the notation and conditions, and being able to interpret results is entirely on you. This guide teaches you how to use the formula booklet, not just what is in it.
1. Descriptive Statistics 📘 SL 4.1 – 4.3
1.1 Measures of Central Tendency
Use the second form (Σ f·x / Σ f) for grouped or frequency table data. The GDC calculates this automatically — enter data into lists and use 1-Var Stats.
1.2 Measures of Spread
IQR = Q₃ − Q₁, where Q₁ = lower quartile (25th percentile) and Q₃ = upper quartile (75th percentile). The IQR measures the spread of the middle 50% of the data and is robust to outliers.
s = √[ Σ(x − x̄)² / (n−1) ] (sample)
Always use GDC. IB uses σx (population) in most SL contexts. Use sx (sample) for t-tests. Never calculate by hand in an exam.
Variance is the square of standard deviation. The formula E(X²) − [E(X)]² is the computational shortcut used in the formula booklet and is particularly useful for discrete random variable calculations.
1.3 Outliers
IB Outlier Rule: A data point is an outlier if it lies more than 1.5 × IQR below Q₁ or above Q₃: x < Q₁ − 1.5 × IQR or x > Q₃ + 1.5 × IQR
Outliers are shown as individual points (×) beyond the whiskers of a box-and-whisker plot.
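The outlier rule above can be sketched in Python. The data set is hypothetical, and the quartile method (median of each half, excluding the overall median for odd n) is one common GDC convention; other calculators use slightly different quartile definitions.

```python
# Sketch of the IB outlier rule on a hypothetical data set.
import statistics

def quartiles(data):
    """(Q1, Q3) via the median-of-halves method used by many GDCs."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]  # skip the median if n is odd
    return statistics.median(lower), statistics.median(upper)

def outliers(data):
    """All points more than 1.5 * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lo or x > hi]

data = [2, 3, 4, 5, 5, 6, 7, 8, 30]   # hypothetical; 30 lies above Q3 + 1.5*IQR
```

Here Q₁ = 3.5 and Q₃ = 7.5, so the fences are −2.5 and 13.5, and only 30 is flagged.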
2. Correlation & Regression 📘 SL 4.4
2.1 Pearson's Correlation Coefficient (r)
Where: Sxy = Σxy − n·x̄·ȳ | Sxx = Σx² − n·x̄² | Syy = Σy² − n·ȳ²
2.2 Regression Line (y on x)
- a = gradient (slope) — for each 1-unit increase in x, y increases by a units
- b = y-intercept — value of y when x = 0
- The line always passes through the mean point (x̄, ȳ)
- Use for interpolation (within data range) — not extrapolation (outside range)
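The Sxy / Sxx / Syy shortcuts and the regression line can be sketched directly from the formulas above. The data is hypothetical; on a perfect line y = 2x you should recover r = 1, gradient 2, intercept 0.

```python
# Sketch: Pearson's r and the y-on-x regression line from the
# Sxy / Sxx / Syy shortcut formulas (hypothetical data).
from math import sqrt

def regression(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * mx * my
    sxx = sum(x * x for x in xs) - n * mx * mx
    syy = sum(y * y for y in ys) - n * my * my
    r = sxy / sqrt(sxx * syy)
    a = sxy / sxx           # gradient
    b = my - a * mx         # intercept: the line passes through (x-bar, y-bar)
    return r, a, b

r, a, b = regression([1, 2, 3, 4], [2, 4, 6, 8])
```

In an exam the GDC's LinReg output gives the same r, a, b in one step; this sketch only shows where those numbers come from.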
3. Probability Rules 📘 SL 4.5 – 4.6
For mutually exclusive events: P(A ∩ B) = 0, so P(A ∪ B) = P(A) + P(B)
P(A') = 1 − P(A), the probability that event A does not occur. Extremely useful — P(at least one) is often fastest to compute as 1 − P(none).
P(A ∩ B) = P(A) × P(B) is only valid when A and B are independent. Two events are independent if P(A|B) = P(A) — the occurrence of B does not affect the probability of A.
Venn Diagram Reference
| Region | Meaning | Formula |
|---|---|---|
| A only | In A but not B | P(A) − P(A ∩ B) |
| A ∩ B | In both A and B | P(A) + P(B) − P(A ∪ B) |
| A ∪ B | In A or B or both | P(A) + P(B) − P(A ∩ B) |
| (A ∪ B)' | In neither A nor B | 1 − P(A ∪ B) |
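The Venn region formulas in the table can be checked with a short sketch (hypothetical values P(A) = 0.6, P(B) = 0.5, P(A ∩ B) = 0.3):

```python
# Sketch: each Venn region from P(A), P(B), P(A ∩ B) (hypothetical values).
def venn_regions(pA, pB, pAB):
    union = pA + pB - pAB            # P(A or B or both)
    return {
        "A only": pA - pAB,
        "B only": pB - pAB,
        "A and B": pAB,
        "neither": 1 - union,        # P((A union B)')
    }

regions = venn_regions(0.6, 0.5, 0.3)
```

A quick sanity check: the four regions always sum to 1.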
4. Conditional Probability 📘 SL 4.6
P(A | B) = P(A ∩ B) / P(B). Read as: "The probability of A given that B has already occurred." B becomes the new sample space.
Solution (given P(Maths) = 0.7 and P(Maths ∩ Physics) = 0.5): P(Physics | Maths) = 0.5 / 0.7 = 5/7 ≈ 0.714
Tree Diagrams
Tree diagrams are the primary tool for solving multi-stage probability problems in IB. Two rules apply:
Multiply Along Branches
Multiply probabilities along a complete path through the tree to get the probability of that sequence of outcomes.
Add Across Branches
To find the probability of any outcome, add together the probabilities of all branches that lead to it.
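The two tree rules can be sketched on a hypothetical two-stage problem: a bag holds 3 red and 2 blue counters, and two are drawn without replacement. Fractions keep the branch arithmetic exact.

```python
# Sketch of the two tree-diagram rules (hypothetical bag problem:
# 3 red, 2 blue, two draws without replacement).
from fractions import Fraction as F

p_red1 = F(3, 5)
p_blue1 = F(2, 5)

# Rule 1 -- multiply along branches: one full path each.
p_rb = p_red1 * F(2, 4)    # red then blue: 2 blue remain out of 4
p_br = p_blue1 * F(3, 4)   # blue then red: 3 red remain out of 4

# Rule 2 -- add across branches: both paths give "one of each colour".
p_one_each = p_rb + p_br
```

Each path gives 3/10, and adding the two paths gives P(one of each colour) = 3/5.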
5. Discrete Random Variables 📘 SL 4.7
E(X) = Σ x · P(X = x). Also written as μ. The long-run average value of a random variable over many trials. Not necessarily a value the variable can actually take.
Var(X) = E(X²) − [E(X)]², where E(X²) = Σ x² · P(X = x). Also Var(X) = σ². Standard deviation = √Var(X).
Essential Condition for any Discrete Probability Distribution: Σ P(X = x) = 1, with 0 ≤ P(X = x) ≤ 1 for every x.
If asked to find an unknown probability in a distribution table, set up the equation Σ P = 1 and solve.
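The "solve Σ P = 1" technique, followed by E(X) and the Var(X) = E(X²) − [E(X)]² shortcut, can be sketched on a hypothetical distribution table with unknown k = P(X = 2):

```python
# Sketch: unknown probability via sum(P) = 1, then mean and variance
# (hypothetical distribution table).
xs = [1, 2, 3]
ps = [0.2, None, 0.5]        # P(X = 2) is the unknown k

k = 1 - sum(p for p in ps if p is not None)   # sum of all P must equal 1
ps[1] = k

ex = sum(x * p for x, p in zip(xs, ps))       # E(X)
ex2 = sum(x * x * p for x, p in zip(xs, ps))  # E(X^2)
var = ex2 - ex ** 2                           # Var(X) = E(X^2) - [E(X)]^2
```

Here k = 0.3, E(X) = 2.3 and Var(X) = 5.9 − 2.3² = 0.61.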
6. Binomial Distribution 📘 SL 4.8
If X ~ B(n, p), the probability of exactly r successes is:
P(X = r) = ⁿCᵣ · pʳ · (1 − p)ⁿ⁻ʳ
Where ⁿCᵣ = n! / [r!(n−r)!] is the binomial coefficient (combinations). Use GDC: binomialpdf(n, p, r).
Mean of Binomial: E(X) = np
Variance of Binomial: Var(X) = np(1 − p)
Standard Deviation: σ = √[np(1 − p)]
• P(X = r): binomialpdf(n, p, r)
• P(X ≤ r): binomialcdf(n, p, r)
• P(X ≥ r) = 1 − binomialcdf(n, p, r−1)
• P(a ≤ X ≤ b) = binomialcdf(n, p, b) − binomialcdf(n, p, a−1)
✅ Worked Example
A biased coin has P(heads) = 0.3. It is tossed 8 times. Find P(exactly 3 heads).
X ~ B(8, 0.3) → P(X = 3) = ⁸C₃ × (0.3)³ × (0.7)⁵ = 56 × 0.027 × 0.16807 ≈ 0.254
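A GDC-free sketch of binomialpdf/binomialcdf using `math.comb`, checking the worked example and the P(X ≥ r) off-by-one rule:

```python
# Sketch: binomialpdf / binomialcdf equivalents for X ~ B(n, p).
from math import comb

def binom_pdf(n, p, r):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def binom_cdf(n, p, r):
    """P(X <= r)."""
    return sum(binom_pdf(n, p, k) for k in range(r + 1))

p3 = binom_pdf(8, 0.3, 3)                 # the worked example, about 0.254
p_at_least_3 = 1 - binom_cdf(8, 0.3, 2)   # P(X >= 3) = 1 - P(X <= 2)
```

Note the cdf argument is r − 1 = 2 for P(X ≥ 3), exactly as in the GDC recipe above.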
7. Normal Distribution 📘 SL 4.9
Key Properties of the Normal Distribution:
- Bell-shaped, perfectly symmetric about the mean
- Mean = Median = Mode (all equal to μ)
- Total area under the curve = 1
- Approximately 68% of data lies within ±1σ of μ
- Approximately 95% of data lies within ±2σ of μ
- Approximately 99.7% of data lies within ±3σ of μ (68-95-99.7 rule)
Standardisation (Z-Score)
z = (x − μ) / σ
The z-score tells you how many standard deviations x is from the mean. Z ~ N(0, 1) is the standard normal distribution.
• P(X ≤ x): normalcdf(−9999, x, μ, σ)
• P(a ≤ X ≤ b): normalcdf(a, b, μ, σ)
• Inverse normal (find x given probability): invNorm(p, μ, σ)
✅ Worked Example
Heights of students follow X ~ N(165, 64). Find P(160 ≤ X ≤ 175).
σ = √64 = 8. GDC: normalcdf(160, 175, 165, 8) ≈ 0.628
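normalcdf can be reproduced without a GDC from the error function, which is a useful cross-check on worked examples (here X ~ N(165, 8²)):

```python
# Sketch: a normalcdf equivalent built from math.erf.
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2); note sigma, not sigma^2, goes in."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

p = normal_cdf(175, 165, 8) - normal_cdf(160, 165, 8)   # P(160 <= X <= 175)
```

The z-scores here are −0.625 and 1.25, giving a probability of about 0.628.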
8. Spearman's Rank Correlation Coefficient 📘 SL 4.10
rₛ = 1 − (6Σd²) / [n(n² − 1)]
- d = difference between the ranks of each matched pair
- n = number of data pairs
- Range: −1 ≤ rₛ ≤ 1 (same interpretation as Pearson's r)
- Used when data is ordinal, non-normal, or has outliers — more robust than Pearson's r
| Step | Action |
|---|---|
| 1 | Rank each variable separately from 1 (lowest) to n (highest) |
| 2 | For tied values, assign the average of the tied ranks |
| 3 | Calculate d = Rank(x) − Rank(y) for each pair |
| 4 | Calculate d² for each pair, then find Σd² |
| 5 | Substitute into formula: rₛ = 1 − (6Σd²) / [n(n²−1)] |
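The five steps in the table can be sketched in Python, including average ranks for ties. Data is hypothetical; note that with ties present the 6Σd² formula is the standard IB approximation rather than an exact rank correlation.

```python
# Sketch of Spearman's r_s: rank (with average ranks for ties), then
# substitute sum(d^2) into the formula (hypothetical data).
def avg_ranks(values):
    """Rank from 1 (lowest); tied values get the mean of their tied ranks."""
    xs = sorted(values)
    # first tied rank is index+1, last is index+count; average the two
    return [(xs.index(v) + 1 + xs.index(v) + xs.count(v)) / 2 for v in values]

def spearman(xs, ys):
    rx, ry = avg_ranks(xs), avg_ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

rs = spearman([10, 20, 30, 40], [1, 3, 2, 4])
```

For these four pairs Σd² = 2, so rₛ = 1 − 12/60 = 0.8; the GDC shortcut is to run Pearson's r on the two rank lists.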
9. Chi-Squared Tests 📘 SL 4.11
9.1 Chi-Squared Test for Independence
Expected frequency formula: fe = (row total × column total) / grand total
Degrees of freedom: ν = (number of rows − 1) × (number of columns − 1)
GDC: Use χ² Test function — input observed data as a matrix, GDC computes χ², p-value, and expected frequencies automatically.
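What the GDC does internally can be sketched for a hypothetical 2×2 table: build expected frequencies from row/column totals, then sum (fo − fe)²/fe.

```python
# Sketch: expected frequencies and the chi-squared statistic for a
# contingency table (hypothetical observed counts).
def chi_squared(observed):
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    total = sum(rows)
    # fe = row total * column total / grand total
    expected = [[rt * ct / total for ct in cols] for rt in rows]
    stat = sum((o - e) ** 2 / e
               for orow, erow in zip(observed, expected)
               for o, e in zip(orow, erow))
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df, expected

stat, df, expected = chi_squared([[30, 20], [20, 30]])
```

Here every expected frequency is 25 (so the fe ≥ 5 condition holds), χ² = 4.0 and ν = 1; the p-value still comes from the GDC.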
9.2 Chi-Squared Goodness of Fit Test
Same χ² statistic formula as above, but used to test whether observed data fits a specified theoretical distribution (e.g., binomial, Poisson, uniform, or normal).
HL extension: if p was estimated from the data when testing a binomial fit, subtract one extra degree of freedom; if μ and σ were both estimated when testing a normal fit, subtract two extra.
10. t-Test 📘 SL 4.11 / 📗 HL 4.17
When to use each test:
| Test | Use When | GDC Function |
|---|---|---|
| One-sample t-test | Testing whether a single sample mean equals a claimed value | T-Test (1 sample) |
| Two-sample t-test | Comparing means of two independent groups (unequal variances) | 2-SampTTest |
| Paired t-test | Comparing two related measurements (before/after, same subjects) | T-Test on differences |
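The one-sample t statistic behind the GDC's T-Test can be sketched by hand, using the sample standard deviation sx as section 1.2 requires. Data and the claimed mean μ₀ = 50 are hypothetical; the p-value still comes from the GDC, since the t distribution's CDF is not in the Python standard library.

```python
# Sketch: the one-sample t statistic (hypothetical sample, claimed mean 50).
import statistics
from math import sqrt

def t_statistic(sample, mu0):
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)       # the (n - 1) sample version, s_x
    return (xbar - mu0) / (s / sqrt(n))

t = t_statistic([52, 49, 53, 51, 50], mu0=50)   # compare against t with n-1 df
```

Here x̄ = 51 and s = √2.5, giving t = √2 ≈ 1.414 with ν = 4 degrees of freedom.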
11. Hypothesis Testing Framework 📘 SL 4.11 / 📗 HL 4.17
| Error Type | Definition | Probability | Reduce by... |
|---|---|---|---|
| Type I Error (α) | Rejecting H₀ when it is actually true (false positive) | = α (significance level) | Lowering α (e.g., 5% → 1%) |
| Type II Error (β) | Failing to reject H₀ when it is actually false (false negative) | = β | Increasing sample size n |
Power of a test = 1 − β = probability of correctly rejecting a false H₀. Increasing n increases power and reduces Type II errors.
12. Poisson Distribution 📗 HL 4.16
Typical Poisson contexts: Number of phone calls per hour, car accidents per week, defects per metre of fabric, emails per day.
If X ~ Po(λ), then P(X = k) = e⁻λ · λᵏ / k! for k = 0, 1, 2, …
Mean: E(X) = λ
Variance: Var(X) = λ
Note: mean and variance are both equal to λ; this equality is a quick check on whether a Poisson model is plausible for given data.
GDC: poissonpdf(λ, k) for P(X = k); poissoncdf(λ, k) for P(X ≤ k).
✅ Worked Example
Emails arrive at a rate of 4 per hour. Find P(X ≥ 3 in 30 minutes).
30 minutes → λ = 4 × 0.5 = 2. P(X ≥ 3) = 1 − P(X ≤ 2) = 1 − poissoncdf(2, 2) ≈ 1 − 0.677 = 0.323
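The worked example, including the λ rescaling step, can be checked with a GDC-free sketch of poissonpdf/poissoncdf:

```python
# Sketch: poissonpdf / poissoncdf equivalents for X ~ Po(lam).
from math import exp, factorial

def poisson_pdf(lam, k):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam ** k / factorial(k)

def poisson_cdf(lam, k):
    """P(X <= k)."""
    return sum(poisson_pdf(lam, i) for i in range(k + 1))

lam = 4 * 0.5                  # rescale: 4 per hour -> 2 per half hour
p = 1 - poisson_cdf(lam, 2)    # P(X >= 3) = 1 - P(X <= 2), about 0.323
```
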
13. Random Variable Transformations 📗 HL 4.14 – 4.15
E(aX + b) = a·E(X) + b and Var(aX + b) = a²·Var(X). Key: adding a constant b shifts the mean but does NOT change variance. Multiplying by a scales the mean (×a) and the variance (×a²).
Var(X ± Y) = Var(X) + Var(Y). Critical: variance always adds (never subtracts) for sums AND differences, provided X and Y are independent.
X + Y ~ N(μ₁ + μ₂, σ₁² + σ₂²)
Linear combinations of independent normal random variables are themselves normally distributed. This is a fundamental result used extensively in HL probability and statistics.
14. Central Limit Theorem 📗 HL 4.15
The Central Limit Theorem (CLT)
If X is any random variable with mean μ and variance σ², then for a large enough sample size n, the distribution of the sample mean X̄ is approximately normal: X̄ ≈ N(μ, σ²/n)
- Applies regardless of the shape of the original distribution of X — as long as n is sufficiently large (generally n ≥ 30)
- The mean of X̄ equals the population mean: E(X̄) = μ
- The standard deviation of X̄ (standard error) = σ / √n — it gets smaller as n increases
- Larger samples produce a sample mean distribution that is more tightly clustered around μ
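The standard error σ/√n from the list above can be tabulated for a hypothetical population with σ = 12, making the "larger n, tighter clustering" claim concrete:

```python
# Sketch: the standard error sigma / sqrt(n) shrinking as n grows
# (hypothetical population standard deviation sigma = 12).
from math import sqrt

def standard_error(sigma, n):
    """Standard deviation of the sample mean under the CLT."""
    return sigma / sqrt(n)

ses = {n: standard_error(12, n) for n in (9, 36, 144)}
```

Quadrupling n halves the standard error: 4 at n = 9, 2 at n = 36, 1 at n = 144.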
15. Markov Chains & Transition Matrices 📗 HL 4.19
A Markov chain models a system that moves between states with fixed transition probabilities. The transition matrix T contains the probabilities of moving from each state to every other state.
- s₀ = initial state vector (column vector of initial probabilities)
- T = transition matrix — columns (or rows, depending on convention) sum to 1
- Steady state vector π: satisfies T · π = π. Solve (T − I)π = 0 with Σπᵢ = 1
- The steady state represents the long-run proportion of time in each state
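The long-run behaviour can be sketched by applying T repeatedly to an initial state vector (sₙ = Tⁿ·s₀), using the column convention T·π = π from the list above. The 2-state matrix is hypothetical.

```python
# Sketch: approximating the steady state of a Markov chain by iterating
# s_n = T^n . s0 (column-stochastic convention; hypothetical matrix).
def apply(T, v):
    """Matrix-vector product T . v for a state column vector v."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]

T = [[0.9, 0.2],
     [0.1, 0.8]]        # each COLUMN sums to 1 in this convention
pi = [1.0, 0.0]         # s0: start in state 1 with certainty

for _ in range(200):    # repeated transitions converge to the steady state
    pi = apply(T, pi)
```

Solving (T − I)π = 0 with π₁ + π₂ = 1 gives the exact answer π = (2/3, 1/3), which the iteration approaches.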
16. Exam Tips & Common Mistakes
✅ High-Scoring Exam Habits
- Always state distribution notation clearly: Write X ~ B(n, p), X ~ N(μ, σ²), or X ~ Po(λ) before calculating. IB awards method marks for correct notation.
- For hypothesis tests, always write H₀ and H₁ explicitly using correct parameter notation (e.g., H₀: μ = 50), state α, and write a conclusion in context — these are all separate mark allocations.
- Never accept H₀ — say "there is insufficient evidence to reject H₀" or "fail to reject H₀." Saying "accept H₀" is technically wrong and loses marks.
- For χ² tests: Check all fe ≥ 5 before proceeding. State the degrees of freedom. Report p-value from GDC, not just the critical value.
- For normal distribution: Always check whether you're given σ or σ² — write the correct value into the GDC. Check whether you need a tail probability or a central interval.
- In Poisson problems: Rescale λ to match the time/space interval in the question before any calculation.
❌ Most Common Mistakes in Exams
- Using σ instead of σ² in N(μ, σ²) notation — IB always writes σ² as the second parameter. If X ~ N(50, 16), then σ = 4, not 16.
- Misidentifying the correct distribution — memorise the FITS conditions for binomial and the 4 Poisson conditions. Applying the wrong distribution formula wastes all method marks.
- Forgetting P(X ≥ r) = 1 − P(X ≤ r−1) — not 1 − P(X ≤ r). Off-by-one errors are extremely common in cumulative binomial and Poisson questions.
- Variance adds for both sums AND differences — Var(X − Y) = Var(X) + Var(Y) when independent. Students frequently subtract variances for differences, which is incorrect.
- Rounding intermediate values — carry full GDC precision throughout a multi-part calculation and only round at the final answer (3 significant figures unless told otherwise).
- Forgetting the expected frequency condition in χ² tests — if any fe < 5, you must combine categories before the test is valid. Not stating this loses marks.
17. Key Terms Glossary
Precise definitions for every core term in IB Math AI Topic 4 — essential for full marks in definition-type questions.
- Random Variable (X)
- A variable whose value is determined by the outcome of a random experiment. Discrete RVs take countable values; continuous RVs take any value in an interval.
- Probability Distribution
- A complete description of all possible values of a random variable and their associated probabilities. Must satisfy: Σ P(X = x) = 1 and 0 ≤ P(X = x) ≤ 1.
- Expected Value E(X)
- The long-run average value of a random variable over an infinite number of trials. Equal to Σ x·P(X = x) for discrete distributions. Also called the mean (μ).
- Null Hypothesis (H₀)
- The default assumption in a hypothesis test — typically stating no effect, no difference, or no association. It is the hypothesis that is tested and potentially rejected.
- p-value
- The probability of obtaining a test statistic at least as extreme as the one observed, given that H₀ is true. A small p-value (≤ α) provides evidence against H₀.
- Significance Level (α)
- The threshold probability below which the null hypothesis is rejected. Represents the maximum acceptable probability of making a Type I error. Commonly α = 0.05 or α = 0.01.
- Standard Error
- The standard deviation of the sampling distribution of the sample mean: SE = σ/√n. Measures how much the sample mean varies from sample to sample.
- Degrees of Freedom (ν)
- The number of independent values that can vary in a statistical calculation. For a one-sample t-test: ν = n − 1. For a χ² independence test: ν = (rows−1)(cols−1).
- Pearson's Correlation Coefficient (r)
- A measure of the strength and direction of the linear relationship between two quantitative variables. Range: −1 ≤ r ≤ 1. Measures linear correlation only — not causation.
- Spearman's Rank Correlation (rₛ)
- A non-parametric measure of the strength of the monotonic relationship between two ranked variables. More robust than Pearson's r for non-normal data or data with outliers.
- Steady State (Markov)
- The long-run probability distribution of a Markov chain that remains unchanged after further transitions. Found by solving π = Tπ with the constraint that all probabilities in π sum to 1.




