Complete Statistics & Probability Formulas Guide

Master essential statistical concepts with comprehensive formulas, step-by-step explanations, and practical examples for academic success in IB, AP, GCSE, and university-level mathematics.

1 Measures of Central Tendency & Variability

Mean (Average)

Population Mean:

\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]

Sample Mean:

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

Where: μ = population mean, x̄ = sample mean, N = population size, n = sample size

Variance

Population Variance:

\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]

Sample Variance:

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \]

Alternative Formula: \( s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} \)

Standard Deviation

Population Standard Deviation:

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]

Sample Standard Deviation:

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \]

Key Point: Standard deviation is the square root of variance and measures spread in original units.
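
Worked example: the variance and standard deviation formulas above can be checked with a short Python sketch. The function names and the dataset are illustrative assumptions, not part of the original formulas.

```python
import math

def population_variance(data):
    """sigma^2: squared deviations from the mean, divided by N."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

def sample_variance(data):
    """s^2: squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_variance_shortcut(data):
    """Alternative formula: (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(data)
    return (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical dataset, mean = 5

pop_var = population_variance(data)      # 32 / 8
samp_var = sample_variance(data)         # 32 / 7
pop_sd = math.sqrt(pop_var)              # square root of the variance
samp_sd = math.sqrt(samp_var)
print(pop_var, pop_sd, samp_var, samp_sd)
```

Both sample variance functions give the same value; the shortcut form only avoids computing the mean first.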

Median & Mode

Median (data sorted in ascending order, n is odd):

\[ M = \left(\frac{n+1}{2}\right)^{th} \text{ term} \]

Median (data sorted in ascending order, n is even):

\[ M = \frac{\left(\frac{n}{2}\right)^{th} \text{ term} + \left(\frac{n}{2}+1\right)^{th} \text{ term}}{2} \]

Mode:

The value (or values) that appear most frequently in the dataset. A dataset can have more than one mode.
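
Worked example: a minimal Python sketch of the median and mode rules. The data must be sorted before a position is picked; the helper names are illustrative assumptions.

```python
from collections import Counter

def median(data):
    """Middle term for odd n, mean of the two middle terms for even n."""
    ordered = sorted(data)          # positions only make sense on sorted data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]         # the ((n + 1) / 2)-th term, 1-based
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(data):
    """All values tied for the highest frequency."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

print(median([7, 1, 3]))         # odd n
print(median([4, 1, 3, 2]))      # even n
print(modes([1, 2, 2, 3, 3]))    # two modes
```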

2 Probability Formulas

Basic Probability

\[ P(A) = \frac{n(A)}{n(S)} \]

Where: P(A) = probability of event A, n(A) = number of favorable outcomes, n(S) = total number of possible outcomes (all assumed equally likely)

Probability Range:

\[ 0 \leq P(A) \leq 1 \]

Conditional Probability

\[ P(A|B) = \frac{P(A \cap B)}{P(B)} \]

Probability of A given that B has occurred, defined only when P(B) > 0

Bayes' Theorem:

\[ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \]
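
Worked example: a hedged Python sketch of Bayes' theorem for a screening test. The prevalence, sensitivity, and false-positive rate are hypothetical numbers, and P(B) is expanded with the law of total probability, which is an added assumption not listed among the formulas above.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(A|B) = P(B|A) * P(A) / P(B).

    prior               = P(A)
    likelihood          = P(B|A)
    false_positive_rate = P(B|A')
    P(B) is computed as P(B|A) * P(A) + P(B|A') * P(A').
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical values: 1% prevalence, 95% sensitivity, 5% false positives.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 4))
```

Even with an accurate test, the posterior stays small here because the prior P(A) is small.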

Addition & Multiplication Rules

Addition Rule:

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]

Multiplication Rule:

\[ P(A \cap B) = P(A) \cdot P(B|A) \]

Independent Events:

\[ P(A \cap B) = P(A) \cdot P(B) \]

Complement & Other Rules

Complement Rule:

\[ P(A') = 1 - P(A) \]

Mutually Exclusive Events:

\[ P(A \cap B) = 0 \]

\[ P(A \cup B) = P(A) + P(B) \]
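
Worked example: the addition, multiplication, and complement rules can be verified by enumerating a small sample space. This is a sketch assuming two fair dice; the events and helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space S: all 36 ordered outcomes of rolling two fair dice.
SAMPLE_SPACE = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = n(event) / n(S) for equally likely outcomes."""
    favorable = sum(1 for outcome in SAMPLE_SPACE if event(outcome))
    return Fraction(favorable, len(SAMPLE_SPACE))

def first_is_six(o):        # event A
    return o[0] == 6

def sum_at_least_ten(o):    # event B
    return o[0] + o[1] >= 10

p_a = prob(first_is_six)
p_b = prob(sum_at_least_ten)
p_a_and_b = prob(lambda o: first_is_six(o) and sum_at_least_ten(o))
p_a_or_b = prob(lambda o: first_is_six(o) or sum_at_least_ten(o))
p_b_given_a = p_a_and_b / p_a            # conditional probability

addition_holds = p_a_or_b == p_a + p_b - p_a_and_b
multiplication_holds = p_a_and_b == p_a * p_b_given_a
complement_holds = prob(lambda o: not first_is_six(o)) == 1 - p_a
independent = p_a_and_b == p_a * p_b     # A and B are dependent here
print(addition_holds, multiplication_holds, complement_holds, independent)
```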

3 Linear Regression Analysis

Simple Linear Regression

Regression Equation:

\[ y = a + bx \]

Slope (b):

\[ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} \]

Y-Intercept (a):

\[ a = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - (\sum x)^2} \]

Where: y = dependent variable, x = independent variable, a = y-intercept, b = slope
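
Worked example: a minimal Python sketch of the slope and intercept formulas, computed directly from the sums. The function name and the data points are illustrative assumptions.

```python
def linear_regression(xs, ys):
    """Return (a, b) for y = a + b*x using the summation formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    return a, b

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = linear_regression(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")
```

For this data the sums are Σx = 15, Σy = 20, Σxy = 66, Σx² = 55, so the shared denominator is 5·55 − 15² = 50.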

Correlation Coefficient

Pearson's r:

\[ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} \]

Coefficient of Determination:

\[ r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} \]

Range: -1 ≤ r ≤ 1, where |r| closer to 1 indicates a stronger linear relationship and the sign of r gives its direction
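
Worked example: a short Python sketch of Pearson's r from the summation formula, with r² obtained by squaring. The function name and the data are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
r_squared = r ** 2     # share of the variation in y explained by the line
print(round(r, 4), round(r_squared, 4))
```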

4 Hypothesis Testing & Test Statistics

Standard Scores

Z-Score:

\[ z = \frac{x - \mu}{\sigma} \]

One-Sample t-Test:

\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]

Degrees of Freedom:

\[ df = n - 1 \]
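
Worked example: a hedged Python sketch of the z-score and the one-sample t statistic. The sample values and the hypothesized mean are made up for illustration.

```python
import math

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the population mean."""
    return (x - mu) / sigma

def one_sample_t(sample, mu0):
    """Return (t, df) for testing the sample mean against mu0."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1

print(z_score(85, mu=70, sigma=10))

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample, mean = 5
t, df = one_sample_t(sample, mu0=4)
print(round(t, 4), df)
```

The resulting t is compared with a critical value from the t distribution with df degrees of freedom.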

Two-Sample t-Test

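Test Statistic (unequal variances, Welch):

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{SE} \]

Standard Error of the Difference (unpooled):

\[ SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]

Pooled Standard Deviation (equal variances assumed):

\[ s_{pooled} = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} \]

With the pooled estimate, the test statistic becomes \( t = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \) with \( df = n_1 + n_2 - 2 \).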
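
Worked example: a minimal Python sketch of the two-sample t statistic, using the standard error \( \sqrt{s_1^2/n_1 + s_2^2/n_2} \) in the denominator, alongside the pooled standard deviation. The two groups are hypothetical and the function names are illustrative.

```python
import math
from statistics import mean, variance  # variance() uses the n - 1 denominator

def two_sample_t(x1, x2):
    """t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2)."""
    se = math.sqrt(variance(x1) / len(x1) + variance(x2) / len(x2))
    return (mean(x1) - mean(x2)) / se

def pooled_sd(x1, x2):
    """Weighted combination of the two sample variances."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)

group1 = [5.0, 7.0, 9.0]   # hypothetical data: mean 7, s^2 = 4
group2 = [2.0, 4.0, 6.0]   # hypothetical data: mean 4, s^2 = 4
print(round(two_sample_t(group1, group2), 4))
print(pooled_sd(group1, group2))
```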

Confidence Intervals

Mean (σ known):

\[ \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \]

Mean (σ unknown, t with n − 1 degrees of freedom):

\[ \bar{x} \pm t_{\alpha/2} \cdot \frac{s}{\sqrt{n}} \]

Proportion:

\[ \hat{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
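
Worked example: a hedged Python sketch of the three intervals. The z critical value comes from the standard library's `statistics.NormalDist`; the t critical value is passed in as a parameter because the standard library has no t distribution, so it is assumed to be read from a t-table. All inputs are hypothetical.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """z_{alpha/2} for a two-sided interval, e.g. about 1.96 for 95%."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_ci_known_sigma(xbar, sigma, n, confidence=0.95):
    margin = z_critical(confidence) * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

def mean_ci_unknown_sigma(xbar, s, n, t_crit):
    # t_crit is t_{alpha/2} with n - 1 degrees of freedom, from a t-table.
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

def proportion_ci(p_hat, n, confidence=0.95):
    margin = z_critical(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

print(mean_ci_known_sigma(xbar=50, sigma=10, n=100))
print(mean_ci_unknown_sigma(xbar=50, s=10, n=25, t_crit=2.064))  # df = 24
print(proportion_ci(p_hat=0.6, n=400))
```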

5 Sample Size Determination

Cochran's Formula

Sample Size for Proportions:

\[ n = \frac{z^2 \cdot p \cdot q}{e^2} \]

Finite Population Correction:

\[ n = \frac{N \cdot z^2 \cdot p \cdot q}{e^2(N-1) + z^2 \cdot p \cdot q} \]

Where: z = z-score, p = expected proportion, q = 1-p, e = margin of error, N = population size
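
Worked example: a minimal Python sketch of Cochran's formula and the finite population version. Rounding up to the next whole respondent is a common convention assumed here, and the inputs are hypothetical.

```python
import math

def cochran_n(z, p, e):
    """n = z^2 * p * q / e^2, rounded up."""
    q = 1 - p
    return math.ceil(z ** 2 * p * q / e ** 2)

def cochran_n_finite(z, p, e, population):
    """n = N z^2 p q / (e^2 (N - 1) + z^2 p q), rounded up."""
    q = 1 - p
    zpq = z ** 2 * p * q
    return math.ceil(population * zpq / (e ** 2 * (population - 1) + zpq))

# 95% confidence (z = 1.96), p = 0.5, 5% margin of error.
print(cochran_n(z=1.96, p=0.5, e=0.05))
print(cochran_n_finite(z=1.96, p=0.5, e=0.05, population=1000))
```

Using p = 0.5 maximizes p·q, so it gives the most conservative (largest) sample size when the true proportion is unknown.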

Sample Size for Means

Known Standard Deviation:

\[ n = \left(\frac{z \cdot \sigma}{E}\right)^2 \]

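Two-Group Comparison (sample size per group):

\[ n = \frac{2(z_{\alpha/2} + z_{\beta})^2 \sigma^2}{(\mu_1 - \mu_2)^2} \]

Where: E = margin of error, σ = standard deviation, z_{α/2} = z-score for the significance level, z_β = z-score for the desired power 1 − β (about 0.84 for 80% power), μ₁ − μ₂ = difference to detect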
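
Worked example: a hedged Python sketch of both sample size formulas for means. The z-scores are computed with `statistics.NormalDist`, results are rounded up by convention, and every input value is hypothetical.

```python
import math
from statistics import NormalDist

def n_for_mean(sigma, margin, confidence=0.95):
    """n = (z * sigma / E)^2 for estimating a single mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)

def n_per_group(sigma, difference, alpha=0.05, power=0.80):
    """n per group = 2 (z_{alpha/2} + z_beta)^2 sigma^2 / (mu1 - mu2)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / difference ** 2)

print(n_for_mean(sigma=15, margin=3))
print(n_per_group(sigma=10, difference=5))
```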

6 Additional Statistical Measures

Relative Frequency

\[ \text{Relative Frequency} = \frac{\text{Frequency of Event}}{\text{Total Number of Observations}} \]

Used to estimate the probability of an event from observed data (empirical probability)
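
Worked example: a short Python sketch that tabulates relative frequencies. The coin-toss record is hypothetical.

```python
from collections import Counter

def relative_frequencies(observations):
    """Map each observed value to frequency / total observations."""
    counts = Counter(observations)
    total = len(observations)
    return {value: count / total for value, count in counts.items()}

tosses = ["H", "T", "H", "H", "T", "H", "T", "H"]  # hypothetical record
freqs = relative_frequencies(tosses)
print(freqs)
```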

Chi-Square Test

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

Where: O = observed frequency, E = expected frequency
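
Worked example: a minimal Python sketch of the chi-square statistic for a goodness-of-fit check. The observed counts are hypothetical results of 120 rolls of a die, with 20 expected per face if the die is fair.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [18, 22, 20, 25, 15, 20]   # hypothetical counts for faces 1..6
expected = [20] * 6                   # fair die: 120 rolls / 6 faces
statistic = chi_square(observed, expected)
print(round(statistic, 2))
```

The statistic is then compared with a chi-square critical value; for a goodness-of-fit test with 6 categories the degrees of freedom are 6 − 1 = 5.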

Standardized Test Statistic

\[ \text{Test Statistic} = \frac{\text{Statistic} - \text{Parameter}}{\text{Standard Error}} \]

General formula for calculating test statistics in hypothesis testing

Effect Size (Cohen's d)

\[ d = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}} \]

Measures the magnitude of difference between two groups
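
Worked example: a hedged Python sketch of Cohen's d using the pooled standard deviation. The groups are hypothetical and the function name is illustrative.

```python
import math
from statistics import mean, variance  # variance() uses the n - 1 denominator

def cohens_d(x1, x2):
    """Difference in means divided by the pooled standard deviation."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    return (mean(x1) - mean(x2)) / math.sqrt(pooled_var)

group1 = [5.0, 7.0, 9.0]   # hypothetical data: mean 7, s^2 = 4
group2 = [2.0, 4.0, 6.0]   # hypothetical data: mean 4, s^2 = 4
d = cohens_d(group1, group2)
print(d)
```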

💡 Key Study Tips & Important Notes

📊 Data Analysis Steps

Identify the type of data (qualitative/quantitative)
Choose appropriate measures of central tendency
Calculate variability measures
Interpret results in context

🔍 Hypothesis Testing

State null and alternative hypotheses
Choose significance level (α)
Calculate test statistic
Make a decision: reject the null hypothesis if the p-value ≤ α, otherwise fail to reject it
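
The steps above can be sketched for a two-sided one-sample z-test. This is a minimal Python illustration assuming σ is known; the hypotheses, sample figures, and function name are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided test of H0: mu = mu0 against H1: mu != mu0."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))        # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-sided p-value
    reject_h0 = p_value <= alpha                     # decision rule
    return z, p_value, reject_h0

# Hypothetical study: H0 says mu = 100; a sample of 100 has mean 103.
z, p_value, reject_h0 = one_sample_z_test(xbar=103, mu0=100, sigma=15, n=100)
print(round(z, 2), round(p_value, 4), reject_h0)
```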

📈 Regression Analysis

Check for linear relationship
Calculate correlation coefficient
Find regression equation
Interpret slope and intercept

About the Author

Adam Kumar

Co-Founder @RevisionTown
Mathematics Expert specializing in various curricula including IB, AP, GCSE, IGCSE, and more. Dedicated to creating comprehensive educational resources for students worldwide.

info@revisiontown.com
