
IB Mathematics: Applications & Interpretation · Topic 4 · SL & HL

Statistics & Probability Formulae

Every formula, definition, condition, and worked example you need for IB Math AI Topic 4 — from descriptive statistics and probability rules to binomial, normal, Poisson, chi-squared, regression, and hypothesis testing. Clear SL / HL labels throughout.

📘 SL: Topics 4.1 – 4.11 📗 HL: Topics 4.12 – 4.19 (Additional) 🧮 GDC Required

⚠️ Formula Booklet vs. Memorisation

The IB provides a formula booklet in all exams. However, knowing when to apply each formula, understanding the notation and conditions, and being able to interpret results is entirely on you. This guide teaches you how to use the booklet rather than simply listing its contents.

1. Descriptive Statistics 📘 SL 4.1 – 4.3

1.1 Measures of Central Tendency

Mean (Arithmetic)
x̄ = (Σ x) / n     or     x̄ = (Σ f·x) / (Σ f)
x̄ = sample mean
Σx = sum of all data values
n = number of data values
f = frequency of each value x

Use the second form (Σ f·x / Σ f) for grouped or frequency table data. The GDC calculates this automatically — enter data into lists and use 1-Var Stats.
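The frequency-table form of the mean can be sketched in a few lines of Python (the data values and frequencies below are invented purely for illustration — in an exam you would use the GDC's 1-Var Stats):

```python
# Mean from a frequency table: x̄ = Σ(f·x) / Σ(f)
values = [2, 3, 4, 5]   # data values x
freqs  = [5, 8, 4, 3]   # frequency f of each value

mean = sum(f * x for x, f in zip(values, freqs)) / sum(freqs)
print(mean)  # 3.25
```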

1.2 Measures of Spread

Interquartile Range (IQR)
IQR = Q₃ − Q₁

Q₁ = lower quartile (25th percentile), Q₃ = upper quartile (75th percentile). Measures the spread of the middle 50% of data. Robust to outliers.

Standard Deviation (σ or s)
σ = √[ Σ(x − x̄)² / n ]   (population)
s = √[ Σ(x − x̄)² / (n−1) ]   (sample)

Always use GDC. IB uses σx (population) in most SL contexts. Use sx (sample) for t-tests. Never calculate by hand in an exam.
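The σ-versus-s distinction can be checked with Python's standard `statistics` module, which implements both divisors (the sample data here is made up):

```python
import statistics

data = [4, 8, 6, 5, 3, 7]

sigma = statistics.pstdev(data)  # population σ: divide by n
s = statistics.stdev(data)       # sample s: divide by n − 1

# s is always slightly larger than σ for the same data,
# because dividing by n − 1 inflates the estimate.
print(round(sigma, 3), round(s, 3))  # 1.708 1.871
```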

Variance
Var(X) = σ² = E(X²) − [E(X)]²

Variance is the square of standard deviation. The formula E(X²) − [E(X)]² is the computational shortcut used in the formula booklet and is particularly useful for discrete random variable calculations.

1.3 Outliers

IB Outlier Rule: A data point is an outlier if it lies more than 1.5 × IQR below Q₁ or above Q₃:

Outlier if: x < Q₁ − 1.5×IQR    or    x > Q₃ + 1.5×IQR

Outliers are shown as individual points (marked ×) beyond the whiskers of a box-and-whisker plot.
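The 1.5 × IQR fence rule is easy to express in code. Because quartile conventions differ between calculators and software, the sketch below takes Q₁ and Q₃ as given (e.g. read off a GDC) rather than recomputing them; the data set is hypothetical:

```python
def find_outliers(data, q1, q3):
    """Return the values lying beyond the 1.5 × IQR fences."""
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

# Hypothetical data with Q1 = 12, Q3 = 16 (quartiles from a GDC)
data = [1, 12, 13, 14, 15, 16, 30]
print(find_outliers(data, q1=12, q3=16))  # [1, 30]
```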

2. Correlation & Regression 📘 SL 4.4

2.1 Pearson's Correlation Coefficient (r)

Pearson's r — Measures Linear Correlation
r = Sxy / √(Sxx · Syy)

Where: Sxy = Σxy − n·x̄·ȳ   |   Sxx = Σx² − n·x̄²   |   Syy = Σy² − n·ȳ²

r = 1: Perfect positive linear correlation
r = −1: Perfect negative linear correlation
r = 0: No linear correlation
IB interpretation guide: |r| ≥ 0.85 → strong; 0.60 ≤ |r| < 0.85 → moderate; |r| < 0.60 → weak. Always use GDC to calculate r. State that correlation does not imply causation.
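As a cross-check on what the GDC does internally, the Sxy / Sxx / Syy form of Pearson's r can be computed directly (the paired data below is invented):

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

xbar, ybar = sum(xs) / n, sum(ys) / n
sxy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar
sxx = sum(x * x for x in xs) - n * xbar ** 2
syy = sum(y * y for y in ys) - n * ybar ** 2

r = sxy / math.sqrt(sxx * syy)
print(round(r, 3))  # 0.775
```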

2.2 Regression Line (y on x)

Least-Squares Regression Line
ŷ = ax + b     where     a = Sxy / Sxx     and     b = ȳ − a·x̄
  • a = gradient (slope) — for each 1-unit increase in x, y increases by a units
  • b = y-intercept — value of y when x = 0
  • The line always passes through the mean point (x̄, ȳ)
  • Use for interpolation (within data range) — not extrapolation (outside range)
⚠️ IB Exam point: The regression line of y on x is used to predict y from x. Do not use it to predict x from y — that requires the regression line of x on y (a separate line, only needed in HL Non-linear Regression context).
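A minimal sketch of the least-squares y-on-x line, using the same Sxy / Sxx quantities as the booklet formula (illustrative data only). Note the final check: the fitted line always passes through the mean point (x̄, ȳ).

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

xbar, ybar = sum(xs) / n, sum(ys) / n
sxy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar
sxx = sum(x * x for x in xs) - n * xbar ** 2

a = sxy / sxx        # gradient: a = Sxy / Sxx
b = ybar - a * xbar  # y-intercept: b = ȳ − a·x̄

predict = lambda x: a * x + b  # ŷ = ax + b
print(round(a, 3), round(b, 3))  # 0.6 2.2
print(predict(xbar) == ybar or abs(predict(xbar) - ybar) < 1e-9)  # passes through (x̄, ȳ)
```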

3. Probability Rules 📘 SL 4.5 – 4.6

Addition Rule (Union)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

For mutually exclusive events: P(A ∩ B) = 0, so P(A ∪ B) = P(A) + P(B)

Complement Rule
P(A') = 1 − P(A)

The probability that event A does not occur. Extremely useful — often faster to compute 1 minus P(at least one event).

Multiplication Rule (Independent Events)
P(A ∩ B) = P(A) × P(B)

Only valid when A and B are independent. Two events are independent if P(A|B) = P(A) — the occurrence of B does not affect the probability of A.

Venn Diagram Reference

Region | Meaning | Formula
A only | In A but not B | P(A) − P(A ∩ B)
A ∩ B | In both A and B | P(A ∩ B) (centre overlap region)
A ∪ B | In A or B or both | P(A) + P(B) − P(A ∩ B)
(A ∪ B)' | In neither A nor B | 1 − P(A ∪ B)

4. Conditional Probability 📘 SL 4.6

Conditional Probability Formula
P(A | B) = P(A ∩ B) / P(B)

Read as: "The probability of A given that B has already occurred." B becomes the new sample space.

Worked Example: In a class, P(passes Maths) = 0.7, P(passes both Maths and Physics) = 0.5. Find P(passes Physics | passes Maths).
Solution: P(Physics | Maths) = 0.5 / 0.7 = 5/7 ≈ 0.714

Tree Diagrams

Tree diagrams are the primary tool for solving multi-stage probability problems in IB. Two rules apply:

Multiply Along Branches

P(A ∩ B) = P(A) × P(B | A)

Multiply probabilities along a complete path through the tree to get the probability of that sequence of outcomes.

Add Across Branches

P(event) = sum of relevant path probabilities

To find the probability of any outcome, add together the probabilities of all branches that lead to it.

5. Discrete Random Variables 📘 SL 4.7

Expected Value (Mean)
E(X) = Σ x · P(X = x)

Also written as μ. The long-run average value of a random variable over many trials. Not necessarily a value the variable can actually take.

Variance of a Discrete RV
Var(X) = E(X²) − [E(X)]²

Where E(X²) = Σ x² · P(X = x). Also Var(X) = σ². Standard deviation = √Var(X).

Essential Condition for any Discrete Probability Distribution:

Σ P(X = x) = 1    and    0 ≤ P(X = x) ≤ 1 for all x

If asked to find an unknown probability in a distribution table, set up the equation Σ P = 1 and solve.
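Both formulas for a discrete random variable can be verified on a small example. The distribution table below is hypothetical; the first line checks the Σ P = 1 condition before anything else:

```python
# Hypothetical distribution table for a discrete random variable X
dist = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

total = sum(dist.values())                     # must equal 1
ex = sum(x * p for x, p in dist.items())       # E(X) = Σ x·P(X = x)
ex2 = sum(x * x * p for x, p in dist.items())  # E(X²) = Σ x²·P(X = x)
var = ex2 - ex ** 2                            # Var(X) = E(X²) − [E(X)]²
sd = var ** 0.5

print(round(ex, 2), round(var, 2), round(sd, 2))  # 1.7 0.81 0.9
```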

6. Binomial Distribution 📘 SL 4.8

Conditions for a Binomial Distribution B(n, p)
F — Fixed number of trials (n)
I — Independent trials
T — Two outcomes only (success / failure)
S — Same probability p for each trial

If X ~ B(n, p), the probability of exactly r successes is:

P(X = r) = ⁿCᵣ · pʳ · (1−p)ⁿ⁻ʳ

Where ⁿCᵣ = n! / [r!(n−r)!] is the binomial coefficient (combinations). Use GDC: binomialpdf(n, p, r).

Mean of Binomial

E(X) = np

Variance of Binomial

Var(X) = np(1−p)

Standard Deviation

σ = √[np(1−p)]
GDC Commands:
• P(X = r): binomialpdf(n, p, r)
• P(X ≤ r): binomialcdf(n, p, r)
• P(X ≥ r) = 1 − binomialcdf(n, p, r−1)
• P(a ≤ X ≤ b) = binomialcdf(n, p, b) − binomialcdf(n, p, a−1)

✅ Worked Example

A biased coin has P(heads) = 0.3. It is tossed 8 times. Find P(exactly 3 heads).

X ~ B(8, 0.3)  →  P(X = 3) = ⁸C₃ × (0.3)³ × (0.7)⁵ = 56 × 0.027 × 0.16807 ≈ 0.254
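The worked example can be reproduced with `math.comb`; the second line also demonstrates the "1 − P(X ≤ r−1)" pattern for P(X ≥ 3) from the GDC commands above:

```python
from math import comb

n, p = 8, 0.3  # X ~ B(8, 0.3)

def binom_pmf(r):
    # P(X = r) = nCr · p^r · (1−p)^(n−r)
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

exact = binom_pmf(3)                                  # P(X = 3)
at_least_3 = 1 - sum(binom_pmf(r) for r in range(3))  # P(X ≥ 3) = 1 − P(X ≤ 2)

print(round(exact, 3))  # 0.254
```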

7. Normal Distribution 📘 SL 4.9

X ~ N(μ, σ²) — Normal Distribution Notation
X ~ N(μ, σ²)
μ = mean (centre of the bell curve)
σ² = variance (Note: IB notation uses σ², not σ — a very common mistake)
σ = standard deviation (width of curve)
⚠️ Critical Notation Warning: If X ~ N(10, 4), then μ = 10 and σ² = 4, so σ = 2. Many students mistakenly use 4 as the standard deviation — always check whether σ or σ² is given.

Key Properties of the Normal Distribution:

  • Bell-shaped, perfectly symmetric about the mean
  • Mean = Median = Mode (all equal to μ)
  • Total area under the curve = 1
  • Approximately 68% of data lies within ±1σ of μ
  • Approximately 95% of data lies within ±2σ of μ
  • Approximately 99.7% of data lies within ±3σ of μ (68-95-99.7 rule)

Standardisation (Z-Score)

Z-Score Formula — Standardising to N(0,1)
z = (x − μ) / σ

The z-score tells you how many standard deviations x is from the mean. Z ~ N(0, 1) is the standard normal distribution.

GDC Commands:
• P(X ≤ x): normalcdf(−9999, x, μ, σ)
• P(a ≤ X ≤ b): normalcdf(a, b, μ, σ)
• Inverse normal (find x given probability): invNorm(p, μ, σ)

✅ Worked Example

Heights of students follow X ~ N(165, 64). Find P(160 ≤ X ≤ 175).

σ = √64 = 8.   GDC: normalcdf(160, 175, 165, 8) ≈ 0.628
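The same probability can be cross-checked without a GDC using the error function from Python's standard library (Φ(z) = ½[1 + erf(z/√2)]); it evaluates to about 0.628. Note the σ² → σ conversion on the marked line:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    # P(X ≤ x) for X ~ N(mu, sigma²), via the error function
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 165, sqrt(64)  # X ~ N(165, 64) → σ = 8, not 64!
p = normal_cdf(175, mu, sigma) - normal_cdf(160, mu, sigma)
print(round(p, 3))  # 0.628
```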

8. Spearman's Rank Correlation Coefficient 📘 SL 4.10

Spearman's Rank Formula (rₛ)
rₛ = 1 − (6 Σd²) / [n(n²−1)]
  • d = difference between the ranks of each matched pair
  • n = number of data pairs
  • Range: −1 ≤ rₛ ≤ 1 (same interpretation as Pearson's r)
  • Used when data is ordinal, non-normal, or has outliers — more robust than Pearson's r
Step | Action
1 | Rank each variable separately from 1 (lowest) to n (highest)
2 | For tied values, assign the average of the tied ranks
3 | Calculate d = Rank(x) − Rank(y) for each pair
4 | Calculate d² for each pair, then find Σd²
5 | Substitute into the formula: rₛ = 1 − (6Σd²) / [n(n²−1)]
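The five steps above can be sketched in pure Python, including the average-rank rule for ties (the paired data is invented for illustration):

```python
def avg_ranks(data):
    # Rank from 1 (lowest) to n; tied values share the average rank
    s = sorted(data)
    return [s.index(x) + (s.count(x) + 1) / 2 for x in data]

xs = [10, 20, 30, 40, 50]  # ranks: 1, 2, 3, 4, 5
ys = [3, 1, 6, 5, 8]       # ranks: 2, 1, 4, 3, 5

rx, ry = avg_ranks(xs), avg_ranks(ys)
n = len(xs)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # Σd²
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(rs)  # 0.8
```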

9. Chi-Squared Tests 📘 SL 4.11

9.1 Chi-Squared Test for Independence

χ² Test Statistic
χ² = Σ [ (fo − fe)² / fe ]
fo = observed frequency (from data)
fe = expected frequency (if H₀ true)

Expected frequency formula:

fe = (Row total × Column total) / Grand total

Degrees of freedom:

ν = (rows − 1)(columns − 1)
Condition: All expected frequencies must be ≥ 5. If any fe < 5, merge categories before conducting the test.
GDC: Use χ² Test function — input observed data as a matrix, GDC computes χ², p-value, and expected frequencies automatically.
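What the GDC's χ² Test does behind the scenes can be reproduced by hand for a small table. The 2 × 2 observed frequencies below are hypothetical; the expected-frequency and degrees-of-freedom formulas are exactly the ones above:

```python
# Hypothetical 2 × 2 contingency table of observed frequencies
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# fe = (row total × column total) / grand total
expected = [[rt * ct / grand for ct in col_totals] for rt in row_totals]

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
dof = (len(observed) - 1) * (len(observed[0]) - 1)

print(chi2, dof)  # 4.0 1
```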

9.2 Chi-Squared Goodness of Fit Test (HL)

📗 AHL Extension — Goodness of Fit

Same χ² statistic formula as above, but used to test whether observed data fits a specified theoretical distribution (e.g., binomial, Poisson, uniform, or normal).

ν = number of categories − 1 − (number of estimated parameters)

If testing fit to a binomial with estimated p: subtract one extra df. If testing normal with estimated μ and σ: subtract 2 extra df.

10. t-Test 📘 SL 4.11 / 📗 HL 4.17

One-Sample t-Test Statistic
t = (x̄ − μ₀) / (s / √n)
x̄ = sample mean
μ₀ = hypothesised population mean
s = sample standard deviation (use sx on GDC)
n = sample size
Degrees of freedom: ν = n − 1 (one-sample t-test)   |   ν = n₁ + n₂ − 2 (two-sample t-test)

When to use each test:

Test | Use When | GDC Function
One-sample t-test | Testing whether a single sample mean equals a claimed value | T-Test (1 sample)
Two-sample t-test | Comparing means of two independent groups (unequal variances) | 2-SampTTest
Paired t-test | Comparing two related measurements (before/after, same subjects) | T-Test on differences
Key assumption: The t-test requires the data to come from a normally distributed population, or n to be large enough to apply the Central Limit Theorem. State this assumption in any IB exam answer.
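The one-sample t statistic itself is a one-line calculation once x̄ and s are known. A sketch with made-up sample data and H₀: μ = 50 (the GDC additionally returns the p-value, which this sketch does not compute):

```python
import math
import statistics

# Hypothetical sample; H₀: μ = 50
data = [51, 49, 52, 53, 50, 48, 54, 51]
mu0 = 50

n = len(data)
xbar = statistics.mean(data)  # sample mean x̄
s = statistics.stdev(data)    # sample sd (divide by n − 1; sx on a GDC)

t = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1

print(round(t, 3), df)  # 1.414 7
```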

11. Hypothesis Testing Framework 📘 SL 4.11 / 📗 HL 4.17

Step 1
State H₀ and H₁: H₀ is the null hypothesis (no effect / no difference). H₁ is the alternative hypothesis (the effect you are testing for). Be precise — use parameter notation (μ, p, etc.).
Step 2
State the significance level (α): Common values are α = 0.05 (5%) and α = 0.01 (1%). This is the probability of rejecting H₀ when it is actually true (Type I error).
Step 3
Calculate the test statistic and p-value using GDC. The p-value is the probability of obtaining a result at least as extreme as observed, assuming H₀ is true.
Step 4
Decision rule: If p-value ≤ α → Reject H₀. If p-value > α → Fail to reject H₀. Never say "accept H₀" in IB — always "fail to reject."
Step 5
State conclusion in context: Always refer back to the original problem. State whether there is or is not sufficient evidence, at the given significance level, to support the alternative hypothesis — in the context of the question.
📗 HL — Type I and Type II Errors
Error Type | Definition | Probability | Reduce by...
Type I Error (α) | Rejecting H₀ when it is actually true (false positive) | = α (significance level) | Lowering α (e.g., 5% → 1%)
Type II Error (β) | Failing to reject H₀ when it is actually false (false negative) | = β | Increasing sample size n

Power of a test = 1 − β = probability of correctly rejecting a false H₀. Increasing n increases power and reduces Type II errors.

12. Poisson Distribution 📗 HL 4.16

Conditions for Poisson Distribution Po(λ)
Events occur randomly and independently
Events occur at a constant average rate (λ) per interval
Two events cannot occur at exactly the same time
Events in non-overlapping intervals are independent

Typical Poisson contexts: Number of phone calls per hour, car accidents per week, defects per metre of fabric, emails per day.

P(X = k) = (λᵏ · e^(−λ)) / k!     X ~ Po(λ)

Mean

E(X) = λ

Variance

Var(X) = λ

Note

Mean = Variance (unique to Poisson — use to verify or identify)
Rescaling λ: If events occur at rate λ per hour, then over t hours, X ~ Po(λt). Over half an hour, X ~ Po(λ/2). Always match λ to the time interval asked.
GDC: poissonpdf(λ, k) for P(X = k); poissoncdf(λ, k) for P(X ≤ k).

✅ Worked Example

Emails arrive at a rate of 4 per hour. Find P(X ≥ 3 in 30 minutes).

30 minutes → λ = 4 × 0.5 = 2.   P(X ≥ 3) = 1 − P(X ≤ 2) = 1 − poissoncdf(2, 2) ≈ 1 − 0.677 = 0.323
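The worked example can be verified directly from the Poisson formula, including the rescaling of λ and the "1 − P(X ≤ 2)" complement step:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    # P(X = k) = λ^k · e^(−λ) / k!
    return lam ** k * exp(-lam) / factorial(k)

lam = 4 * 0.5  # rescale: 4 per hour → 2 per 30 minutes
p_at_most_2 = sum(poisson_pmf(k, lam) for k in range(3))  # P(X ≤ 2)
p_at_least_3 = 1 - p_at_most_2                            # P(X ≥ 3)

print(round(p_at_least_3, 3))  # 0.323
```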

13. Random Variable Transformations 📗 HL 4.14 – 4.15

Linear Transformation: Y = aX + b
E(aX + b) = aE(X) + b
Var(aX + b) = a²Var(X)

Key: Adding a constant b shifts the mean but does NOT change variance. Multiplying by a scales both mean (×a) and variance (×a²).

Linear Combinations: aX ± bY
E(aX ± bY) = aE(X) ± bE(Y)
Var(aX ± bY) = a²Var(X) + b²Var(Y)   (if X, Y independent)

Critical: Variance always adds (never subtracts) for sums AND differences, provided X and Y are independent.

Sum of Independent Normal Variables
If X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²), independently:

X + Y ~ N(μ₁ + μ₂, σ₁² + σ₂²)

Linear combinations of independent normal random variables are themselves normally distributed. This is a fundamental result used extensively in HL probability and statistics.

14. Central Limit Theorem 📗 HL 4.15

The Central Limit Theorem (CLT)

If X is any random variable with mean μ and variance σ², then for a large enough sample size n, the distribution of the sample mean X̄ is approximately normally distributed:

X̄ ~ N( μ, σ²/n )   approximately, for large n
  • Applies regardless of the shape of the original distribution of X — as long as n is sufficiently large (generally n ≥ 30)
  • The mean of X̄ equals the population mean: E(X̄) = μ
  • The standard deviation of X̄ (standard error) = σ / √n — it gets smaller as n increases
  • Larger samples produce a sample mean distribution that is more tightly clustered around μ
IB Exam Application: The CLT justifies using normal distribution methods (z-tests, t-tests) with large samples drawn from non-normal populations, and is the theoretical basis for statistical inference.
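The CLT can be seen in a short simulation: draw many samples from a distinctly non-normal population (uniform on [0, 1)) and look at the spread of their means. This is an illustrative sketch, not an exam technique; the sample size and trial count are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is reproducible

# Population: uniform on [0, 1) — not normal; μ = 0.5, σ² = 1/12
n = 36          # sample size
trials = 5000   # number of sample means to draw

means = [statistics.mean(random.random() for _ in range(n))
         for _ in range(trials)]

# CLT prediction: X̄ ≈ N(0.5, (1/12)/36), so SE = sqrt(1/12)/6 ≈ 0.0481
print(round(statistics.mean(means), 2))   # ≈ 0.5
print(round(statistics.stdev(means), 3))  # ≈ 0.048
```

The observed standard deviation of the sample means sits close to σ/√n, and it shrinks if you increase n — exactly the clustering effect described above.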

15. Markov Chains & Transition Matrices 📗 HL 4.19

Transition Matrix T and State Vectors

A Markov chain models a system that moves between states with fixed transition probabilities. The transition matrix T contains the probabilities of moving from each state to every other state.

sₙ = Tⁿ · s₀    (state after n steps)
  • s₀ = initial state vector (column vector of initial probabilities)
  • T = transition matrix — columns (or rows, depending on convention) sum to 1
  • Steady state vector π: satisfies T · π = π. Solve (T − I)π = 0 with Σπᵢ = 1
  • The steady state represents the long-run proportion of time in each state
GDC: Use matrix multiplication: Tⁿ × s₀. For large n, multiply T by itself n times — use the ANS matrix feature on the GDC to iterate efficiently.
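The iteration the GDC performs can be sketched for a hypothetical two-state chain (column convention: each column of T sums to 1). Repeatedly applying T drives any starting vector toward the steady state π:

```python
# Column convention: T[i][j] = P(next state is i | current state is j),
# so each COLUMN of T sums to 1. Hypothetical two-state chain.
T = [[0.9, 0.2],
     [0.1, 0.8]]

def step(T, s):
    # One transition: s_{n+1} = T · s_n (2 × 2 matrix–vector product)
    return [T[0][0] * s[0] + T[0][1] * s[1],
            T[1][0] * s[0] + T[1][1] * s[1]]

s = [1.0, 0.0]        # initial state vector s₀
for _ in range(100):  # s₁₀₀ = T¹⁰⁰ · s₀
    s = step(T, s)

# Steady state by hand: T·π = π with π₀ + π₁ = 1 gives π = (2/3, 1/3)
print(round(s[0], 4), round(s[1], 4))  # 0.6667 0.3333
```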

16. Exam Tips & Common Mistakes

✅ High-Scoring Exam Habits

  • Always state distribution notation clearly: Write X ~ B(n, p), X ~ N(μ, σ²), or X ~ Po(λ) before calculating. IB awards method marks for correct notation.
  • For hypothesis tests, always write H₀ and H₁ explicitly using correct parameter notation (e.g., H₀: μ = 50), state α, and write a conclusion in context — these are all separate mark allocations.
  • Never accept H₀ — say "there is insufficient evidence to reject H₀" or "fail to reject H₀." Saying "accept H₀" is technically wrong and loses marks.
  • For χ² tests: Check all fe ≥ 5 before proceeding. State the degrees of freedom. Report p-value from GDC, not just the critical value.
  • For normal distribution: Always check whether you're given σ or σ² — write the correct value into the GDC. Check whether you need a tail probability or a central interval.
  • In Poisson problems: Rescale λ to match the time/space interval in the question before any calculation.

❌ Most Common Mistakes in Exams

  • Using σ instead of σ² in N(μ, σ²) notation — IB always writes σ² as the second parameter. If X ~ N(50, 16), then σ = 4, not 16.
  • Misidentifying the correct distribution — memorise the FITS conditions for binomial and the 4 Poisson conditions. Applying the wrong distribution formula wastes all method marks.
  • Forgetting P(X ≥ r) = 1 − P(X ≤ r−1) — not 1 − P(X ≤ r). Off-by-one errors are extremely common in cumulative binomial and Poisson questions.
  • Variance adds for both sums AND differences — Var(X − Y) = Var(X) + Var(Y) when independent. Students frequently subtract variances for differences, which is incorrect.
  • Rounding intermediate values — carry full GDC precision throughout a multi-part calculation and only round at the final answer (3 significant figures unless told otherwise).
  • Forgetting the expected frequency condition in χ² tests — if any fe < 5, you must combine categories before the test is valid. Not stating this loses marks.

17. Key Terms Glossary

Precise definitions for every core term in IB Math AI Topic 4 — essential for full marks in definition-type questions.

Random Variable (X)
A variable whose value is determined by the outcome of a random experiment. Discrete RVs take countable values; continuous RVs take any value in an interval.
Probability Distribution
A complete description of all possible values of a random variable and their associated probabilities. Must satisfy: Σ P(X = x) = 1 and 0 ≤ P(X = x) ≤ 1.
Expected Value E(X)
The long-run average value of a random variable over an infinite number of trials. Equal to Σ x·P(X = x) for discrete distributions. Also called the mean (μ).
Null Hypothesis (H₀)
The default assumption in a hypothesis test — typically stating no effect, no difference, or no association. It is the hypothesis that is tested and potentially rejected.
p-value
The probability of obtaining a test statistic at least as extreme as the one observed, given that H₀ is true. A small p-value (≤ α) provides evidence against H₀.
Significance Level (α)
The threshold probability below which the null hypothesis is rejected. Represents the maximum acceptable probability of making a Type I error. Commonly α = 0.05 or α = 0.01.
Standard Error
The standard deviation of the sampling distribution of the sample mean: SE = σ/√n. Measures how much the sample mean varies from sample to sample.
Degrees of Freedom (ν)
The number of independent values that can vary in a statistical calculation. For a one-sample t-test: ν = n − 1. For a χ² independence test: ν = (rows−1)(cols−1).
Pearson's Correlation Coefficient (r)
A measure of the strength and direction of the linear relationship between two quantitative variables. Range: −1 ≤ r ≤ 1. Measures linear correlation only — not causation.
Spearman's Rank Correlation (rₛ)
A non-parametric measure of the strength of the monotonic relationship between two ranked variables. More robust than Pearson's r for non-normal data or data with outliers.
Steady State (Markov)
The long-run probability distribution of a Markov chain that remains unchanged after further transitions. Found by solving π = Tπ with the constraint that all probabilities in π sum to 1.