📊 Statistics Formulas: Complete AP Study Guide
Essential Mathematical Formulas for AP Statistics Success
📈 Descriptive Statistics Formulas
📊 Measures of Central Tendency
📐 Mean (Arithmetic Mean)
Sample Mean:
\[ \bar{x} = \frac{\sum x}{n} \]
Population Mean:
\[ \mu = \frac{\sum X}{N} \]
Where:
• \(\bar{x}\) = sample mean
• \(\mu\) = population mean
• \(\sum x\) = sum of all sample values
• \(n\) = sample size
• \(N\) = population size
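As a quick sanity check, the sample-mean formula can be verified with Python's standard library (the scores below are made-up values):

```python
import statistics

# Hypothetical sample of n = 5 exam scores
scores = [82, 90, 75, 88, 95]

# Sample mean: x-bar = (sum of all values) / n
mean_manual = sum(scores) / len(scores)

# statistics.mean applies the same formula
mean_lib = statistics.mean(scores)

print(mean_manual)  # 86.0
```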
📊 Median
For Odd n:
\[ M = x_{\frac{n+1}{2}} \]
For Even n:
\[ M = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} \]
For Grouped Data:
\[ \text{Median} = L + \left[\frac{\frac{n}{2} - CF}{f}\right] \times h \]
Where:
• \(L\) = lower boundary of median class
• \(CF\) = cumulative frequency before median class
• \(f\) = frequency of median class
• \(h\) = class width
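For ungrouped data, the odd-n and even-n cases can be checked directly (illustrative values):

```python
import statistics

odd_data = [3, 1, 4, 1, 5]       # n = 5 (odd)
even_data = [3, 1, 4, 1, 5, 9]   # n = 6 (even)

# Odd n: the single middle value of the sorted data
print(statistics.median(odd_data))   # sorted [1, 1, 3, 4, 5] -> 3

# Even n: the mean of the two middle values
print(statistics.median(even_data))  # sorted [1, 1, 3, 4, 5, 9] -> (3 + 4) / 2 = 3.5
```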
📈 Geometric Mean
\[ GM = \sqrt[n]{x_1 \times x_2 \times x_3 \times \cdots \times x_n} \]
Or equivalently:
\[ GM = (x_1 \times x_2 \times x_3 \times \cdots \times x_n)^{\frac{1}{n}} \]
Used for:
• Growth rates
• Ratios and percentages
• Averaging rates of change
• Investment returns
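Applied to growth rates, the geometric mean averages multiplicative factors; the annual growth factors here are hypothetical:

```python
import math
import statistics

# Hypothetical annual growth factors: +10%, +20%, +5%
factors = [1.10, 1.20, 1.05]

# GM = (x1 * x2 * ... * xn)^(1/n)
gm_manual = math.prod(factors) ** (1 / len(factors))

# statistics.geometric_mean (Python 3.8+) computes the same quantity
gm_lib = statistics.geometric_mean(factors)

print(round(gm_manual, 4))  # ≈ 1.1149, i.e., about 11.5% average growth per year
```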
📏 Measures of Variability
📊 Variance
Population Variance:
\[ \sigma^2 = \frac{\sum(X - \mu)^2}{N} \]
Sample Variance:
\[ s^2 = \frac{\sum(x - \bar{x})^2}{n-1} \]
Alternative Formula:
\[ s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} \]
📐 Standard Deviation
Population SD:
\[ \sigma = \sqrt{\frac{\sum(X - \mu)^2}{N}} \]
Sample SD:
\[ s = \sqrt{\frac{\sum(x - \bar{x})^2}{n-1}} \]
Relationship: the standard deviation is the square root of the variance
\[ s = \sqrt{s^2} \quad \text{and} \quad \sigma = \sqrt{\sigma^2} \]
📊 Interquartile Range
\[ IQR = Q_3 - Q_1 \]
where \(Q_1\) = 25th percentile, \(Q_3\) = 75th percentile
Outlier Detection:
\[ \text{Lower fence} = Q_1 - 1.5(IQR) \]
\[ \text{Upper fence} = Q_3 + 1.5(IQR) \]
📈 Coefficient of Variation
\[ CV = \frac{s}{\bar{x}} \times 100\% \]
Or for population:
\[ CV = \frac{\sigma}{\mu} \times 100\% \]
Used for:
• Comparing variability between datasets
• Relative measure of dispersion
• When means differ substantially
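The spread measures above (sample variance with its \(n-1\) divisor, SD, IQR fences, CV) can be sketched together. The data values are invented, and note that `statistics.quantiles` uses its default "exclusive" method, which can give slightly different quartiles than some calculators:

```python
import statistics

data = [4, 8, 6, 5, 3, 7, 9, 5]   # hypothetical sample, n = 8

xbar = statistics.mean(data)    # 5.875
s2 = statistics.variance(data)  # sample variance, divides by n - 1 -> 4.125
s = statistics.stdev(data)      # sample SD = sqrt(4.125) ≈ 2.031

# Coefficient of variation (sample form): CV = s / x-bar * 100%
cv = s / xbar * 100             # ≈ 34.6%

# Quartiles, IQR, and the 1.5 * IQR outlier fences
q1, _, q3 = statistics.quantiles(data, n=4)   # Q1 = 4.25, Q3 = 7.75 here
iqr = q3 - q1                                 # 3.5
lower_fence = q1 - 1.5 * iqr                  # -1.0
upper_fence = q3 + 1.5 * iqr                  # 13.0
```

Any observation below the lower fence or above the upper fence would be flagged as a potential outlier; none of these values are.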
📊 Percentiles and Z-Scores
📈 Percentile Formula
For position: \(P = (n+1) \times \frac{\text{percentile}}{100}\)
📊 Z-Score (Standard Score)
\[ z = \frac{x - \mu}{\sigma} \]
Converts a raw score to the number of standard deviations it lies from the mean
📐 Mean Deviation
\[ MD = \frac{\sum |x - \bar{x}|}{n} \]
Average absolute deviation from the mean
🎲 Probability Formulas
🎯 Basic Probability Rules
📊 Basic Probability
\[ P(A) = \frac{\text{number of favorable outcomes}}{\text{total number of outcomes}} \]
Properties:
\[ 0 \leq P(A) \leq 1 \]
\[ P(A) + P(A') = 1 \]
• \(P(\text{impossible event}) = 0\)
• \(P(\text{certain event}) = 1\)
• \(P(A') = 1 - P(A)\) (complement rule)
🔗 Addition Rule
General Rule:
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
Mutually Exclusive:
\[ P(A \cup B) = P(A) + P(B) \]
When to use:
• Finding probability of "A or B"
• Events may or may not overlap
✖️ Multiplication Rule
General Rule:
\[ P(A \cap B) = P(A) \times P(B|A) \]
Independent Events:
\[ P(A \cap B) = P(A) \times P(B) \]
When to use:
• Finding probability of "A and B"
• Sequential events
🔄 Conditional Probability
\[ P(A|B) = \frac{P(A \cap B)}{P(B)} \]
and
\[ P(B|A) = \frac{P(A \cap B)}{P(A)} \]
Read as:
• \(P(A|B)\) = "Probability of A given B"
• Requires \(P(B) > 0\)
🎲 Expected Value and Variance
📊 Expected Value
\[ E(X) = \mu_X = \sum x_i \, P(x_i) \]
Weighted average of all possible values
📈 Variance of Random Variable
\[ \text{Var}(X) = \sigma_X^2 = \sum (x_i - \mu_X)^2 \, P(x_i) \]
Measure of spread for a probability distribution
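Applying both definitions to a small made-up payout distribution:

```python
# Hypothetical discrete distribution: raffle payout in dollars
values = [0, 10, 50]
probs = [0.90, 0.08, 0.02]

# E(X) = sum of x * P(x) -- the weighted average
ev = sum(x * p for x, p in zip(values, probs))              # ≈ 1.80

# Var(X) = sum of (x - mu)^2 * P(x)
var = sum((x - ev) ** 2 * p for x, p in zip(values, probs)) # ≈ 54.76

sd = var ** 0.5   # ≈ 7.40
```

So a ticket pays $1.80 on average, but with a large spread relative to that mean.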
📊 Probability Distributions
📈 Normal Distribution
📊 Normal Distribution Formula
Probability Density Function:
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \]
Standard Form:
\[ X \sim N(\mu, \sigma^2) \]
Parameters:
• \(\mu\) = mean
• \(\sigma\) = standard deviation
• \(\sigma^2\) = variance
📐 Standard Normal (Z-Distribution)
Standardization Formula:
\[ Z = \frac{X - \mu}{\sigma} \]
Properties:
• Mean = 0
• Standard deviation = 1
• Used with Z-tables
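Python's `statistics.NormalDist` can stand in for a Z-table; the raw-score setup below is illustrative:

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, SD 1

# Area to the left of z = 1.96, as a Z-table would report
print(round(z.cdf(1.96), 4))   # 0.975

# Standardize a raw score: X ~ N(100, 15^2), observed x = 130
x, mu, sigma = 130, 100, 15
z_score = (x - mu) / sigma     # 2.0
print(round(z.cdf(z_score), 4))   # 0.9772 -- x sits at about the 98th percentile
```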
📊 Normal Approximation
For Binomial (if \(np \geq 10\) and \(n(1-p) \geq 10\)):
\[ X \sim N\big(np,\; np(1-p)\big) \]
With Continuity Correction:
\[ P(X = k) \approx P(k-0.5 < Y < k+0.5) \]
\[ P(X \leq k) \approx P(Y < k+0.5) \]
🎲 Binomial Distribution
📊 Binomial Probability Formula
\[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \]
where \(\binom{n}{k} = \frac{n!}{k!(n-k)!}\)
Notation:
\[ X \sim \text{Binomial}(n, p) \]
Conditions:
• Fixed number of trials \((n)\)
• Each trial has two outcomes
• Constant probability \((p)\)
• Independent trials
📈 Binomial Mean & Variance
Mean:
\[ \mu = np \]
Variance:
\[ \sigma^2 = np(1-p) \]
Standard Deviation:
\[ \sigma = \sqrt{np(1-p)} \]
Where:
• \(n\) = number of trials
• \(p\) = probability of success
• \((1-p)\) = probability of failure
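A sketch tying the pmf, the mean and SD, and the normal approximation together; n, p, and k are chosen so the \(np \geq 10\) and \(n(1-p) \geq 10\) conditions hold:

```python
import math
from statistics import NormalDist

n, p, k = 100, 0.5, 50   # np = 50 and n(1-p) = 50, so the approximation is valid

# Exact binomial: P(X = k) = C(n, k) * p^k * (1-p)^(n-k)
pmf = math.comb(n, k) * p ** k * (1 - p) ** (n - k)

mu = n * p                          # mean = np = 50.0
sigma = math.sqrt(n * p * (1 - p))  # SD = sqrt(np(1-p)) = 5.0

# Normal approximation with continuity correction
Y = NormalDist(mu, sigma)
approx = Y.cdf(k + 0.5) - Y.cdf(k - 0.5)

print(round(pmf, 4))     # 0.0796
print(round(approx, 4))  # 0.0797
```

The exact and approximate probabilities agree to about three decimal places here, which is what the continuity correction is for.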
📈 Correlation & Regression Formulas
🔗 Correlation Coefficients
📊 Pearson Correlation Coefficient
\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} \]
Alternative Formula:
\[ r = \frac{n\sum xy - \sum x\sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} \]
Properties:
• \(-1 \leq r \leq 1\)
• Measures linear relationship
• \(r = 0\) means no linear correlation
📈 Spearman Rank Correlation
\[ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \]
where \(d\) = difference in ranks
Used for:
• Ranked/ordinal data
• Non-linear monotonic relationships
• When data has outliers
📊 Linear Regression
📈 Linear Regression Equation
\[ \hat{y} = a + bx \]
Slope:
\[ b = r\left(\frac{s_y}{s_x}\right) \]
y-intercept:
\[ a = \bar{y} - b\bar{x} \]
Where:
• \(\hat{y}\) = predicted y value
• \(r\) = correlation coefficient
• \(s_x, s_y\) = standard deviations
📊 Coefficient of Determination
\[ r^2 = \frac{\text{explained variation}}{\text{total variation}} \]
Interpretation:
• % of variance in y explained by x
• \(0 \leq r^2 \leq 1\)
• Higher \(r^2\) indicates better fit
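Putting \(r\), the slope, the intercept, and \(r^2\) together on a small invented dataset (hours studied vs. quiz score):

```python
import math

# Hypothetical paired data
x = [1, 2, 3, 4, 5]   # hours studied
y = [2, 4, 5, 4, 5]   # quiz score

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sx = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
sy = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))

# Pearson r from the definition
r = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ((n - 1) * sx * sy)

# Least-squares line: y-hat = a + b*x
b = r * (sy / sx)     # slope
a = ybar - b * xbar   # intercept

print(round(r, 3))                # 0.775
print(round(b, 1), round(a, 1))   # 0.6 2.2
print(round(r ** 2, 2))           # 0.6 -> 60% of the variance in y is explained by x
```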
🧪 Hypothesis Testing Formulas
📊 Test Statistics
📊 t-Test Formulas
One Sample:
\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]
Two Sample (equal variances):
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]
Paired t-test:
\[ t = \frac{\bar{d}}{s_d/\sqrt{n}} \]
Degrees of freedom:
• One sample: \(df = n - 1\)
• Two sample: \(df = n_1 + n_2 - 2\)
• Paired: \(df = n - 1\)
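The one-sample case, worked on hypothetical data with \(H_0\!: \mu = 50\):

```python
import math
import statistics

# Hypothetical sample; null hypothesis: mu_0 = 50
sample = [52, 48, 55, 51, 49, 53, 54, 50]
mu0 = 50

n = len(sample)
xbar = statistics.mean(sample)   # 51.5
s = statistics.stdev(sample)     # sqrt(6) ≈ 2.449

# t = (x-bar - mu_0) / (s / sqrt(n)), with df = n - 1
t = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1

print(round(t, 3), df)   # 1.732 7
```

The resulting t would then be compared against a t-distribution with 7 degrees of freedom to get a p-value.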
📈 Chi-Square Test
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
where:
\[ O = \text{Observed frequency} \]
\[ E = \text{Expected frequency} \]
Degrees of freedom:
• Goodness of fit: \(df = k - 1\)
• Independence: \(df = (r-1)(c-1)\)
• Where \(k\) = categories, \(r\) = rows, \(c\) = columns
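A goodness-of-fit statistic for a hypothetical fairness check on 60 die rolls:

```python
# Hypothetical counts from 60 rolls of a six-sided die
observed = [8, 12, 9, 11, 6, 14]
expected = [60 / 6] * 6   # fair die: 10 expected in each category

# chi^2 = sum of (O - E)^2 / E
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1    # goodness of fit: k - 1

print(round(chi2, 1), df)  # 4.2 5
```

Since 4.2 is well below the critical value of about 11.07 for \(df = 5\) at \(\alpha = 0.05\), these counts alone would not suggest an unfair die.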
📊 Test Statistic General Form
📈 General Formula
\[ \text{test statistic} = \frac{\text{statistic} - \text{parameter}}{\text{standard error of the statistic}} \]
Standardized measure of how far the sample result falls from the null hypothesis value
📊 P-Value
Probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true
🎯 Confidence Intervals
📊 Confidence Interval Formulas
📈 General Form
\[ \text{CI} = \text{point estimate} \pm (\text{critical value}) \times (\text{standard error}) \]
Components:
• Point estimate = sample statistic
• Critical value = from t or z distribution
• Standard error = std dev of sampling distribution
📊 Mean (σ known)
\[ \bar{x} \pm z^* \times \frac{\sigma}{\sqrt{n}} \]
Margin of Error:
\[ MOE = z^* \times \frac{\sigma}{\sqrt{n}} \]
When to use:
• Population \(\sigma\) is known
• Large sample size \((n \geq 30)\)
• Population is normally distributed
📈 Mean (σ unknown)
\[ \bar{x} \pm t^* \times \frac{s}{\sqrt{n}} \]
with \(df = n - 1\)
When to use:
• Population \(\sigma\) is unknown
• Sample size is small
• Use t-distribution
📊 Proportion
\[ \hat{p} \pm z^* \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
Margin of Error:
\[ MOE = z^* \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
Conditions:
• \(n\hat{p} \geq 10\) and \(n(1-\hat{p}) \geq 10\)
• Random sample
• Independence
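The proportion interval, end to end, on an invented poll result:

```python
import math
from statistics import NormalDist

# Hypothetical poll: 540 of 1000 respondents in favor
x, n = 540, 1000
p_hat = x / n   # 0.54

conf = 0.95
z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # ≈ 1.96

se = math.sqrt(p_hat * (1 - p_hat) / n)
moe = z_star * se
ci = (p_hat - moe, p_hat + moe)

# Success/failure condition: n*p_hat = 540 >= 10 and n*(1 - p_hat) = 460 >= 10
print(round(moe, 3))                      # 0.031
print(round(ci[0], 3), round(ci[1], 3))   # 0.509 0.571
```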
📊 Margin of Error Calculations
📈 Width of Confidence Interval
\[ \text{Width} = 2 \times MOE \]
The margin of error is the distance from the point estimate to either endpoint
📊 Factors Affecting MOE
Increasing the confidence level or decreasing the sample size increases the MOE
📏 Sample Size Calculations
📊 Sample Size Formulas
📈 Sample Size for Mean
\[ n = \left(\frac{z^* \sigma}{E}\right)^2 \]
Where:
\[ E = \text{desired margin of error} \]
\[ \sigma = \text{population standard deviation} \]
\[ z^* = \text{critical value} \]
Round up to the next integer
If \(\sigma\) unknown, use \(s\) from pilot study or conservative estimate
📊 Sample Size for Proportion
\[ n = \left(\frac{z^*}{E}\right)^2 \hat{p}(1-\hat{p}) \]
Conservative (worst case):
\[ n = \left(\frac{z^*}{E}\right)^2 \times 0.25 \]
When \(\hat{p}\) unknown:
• Use \(\hat{p} = 0.5\) (most conservative)
• Gives largest possible sample size
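Both sample-size formulas with the round-up step; the values of \(\sigma\) and the margins of error are assumptions chosen for illustration:

```python
import math
from statistics import NormalDist

z_star = NormalDist().inv_cdf(0.975)   # 95% confidence, z* ≈ 1.96

# Mean: n = (z* * sigma / E)^2, assuming sigma = 15 and desired E = 2
n_mean = math.ceil((z_star * 15 / 2) ** 2)

# Proportion, conservative p-hat = 0.5, with E = 0.03
n_prop = math.ceil((z_star / 0.03) ** 2 * 0.25)

print(n_mean)  # 217
print(n_prop)  # 1068
```

`math.ceil` implements the "round up to the next integer" rule, since rounding down would give a margin of error larger than desired.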
⚡ Power Analysis
📊 Power Components
\[ \text{Power} = 1 - \beta \]
where \(\beta = P(\text{Type II Error})\)
\[ \alpha = P(\text{Type I Error}) \]
Typically \(\alpha = 0.05\) and Power \(= 0.80\)
Factors affecting power:
• Effect size (larger = more power)
• Sample size (larger = more power)
• Significance level \(\alpha\)
• Population variability
📈 Sample Size for Power
For a two-sample comparison of means (normal approximation, per group):
\[ n = \frac{2(z_{\alpha/2} + z_\beta)^2\sigma^2}{\delta^2} \]
where \(\delta = |\mu_1 - \mu_2|\) (effect size)
For two-sample tests:
• \(\delta\) = difference in means
• \(\sigma\) = common standard deviation
• Use appropriate z-values for \(\alpha\) and \(\beta\)
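Plugging \(\alpha = 0.05\) and power \(= 0.80\) into the per-group formula; the values of \(\sigma\) and \(\delta\) are assumptions:

```python
import math
from statistics import NormalDist

alpha, power = 0.05, 0.80
z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ≈ 1.960
z_beta = NormalDist().inv_cdf(power)            # ≈ 0.842

sigma = 10   # assumed common standard deviation
delta = 5    # smallest difference in means worth detecting

# Per-group n = 2 * (z_alpha + z_beta)^2 * sigma^2 / delta^2
n = math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)
print(n)   # 63
```

Detecting a difference of half a standard deviation at 80% power thus takes roughly 63 subjects per group.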
📋 AP Statistics Formula Reference
🎯 Quick Reference for AP Exam
📊 Calculator Functions
- 1-Var Stats: Mean, SD, Q1, Q3, etc.
- 2-Var Stats: Correlation, regression
- normalpdf: Normal probability density
- normalcdf: Normal probability (area)
- invNorm: Inverse normal \((z^*)\)
- tpdf, tcdf, invT: t-distribution functions
- binomialpdf/cdf: Binomial probabilities
📈 Key Relationships
- Variance to SD: \(s = \sqrt{s^2}\)
- Z-score: \(z = \frac{x - \mu}{\sigma}\)
- Standard Error: \(SE = \frac{s}{\sqrt{n}}\)
- \(r^2\) interpretation: % variance explained
- Degrees of freedom: Usually \(n - 1\)
- Critical values: Use t-table or calculator
🎲 Common Values
- 90% CI: \(z^* = 1.645\)
- 95% CI: \(z^* = 1.96\)
- 99% CI: \(z^* = 2.576\)
- 68-95-99.7 Rule: Normal distribution
- Binomial normal approx: \(np \geq 10, n(1-p) \geq 10\)
- Conservative \(\hat{p}\): Use 0.5 when unknown
🧪 Test Selection Guide
- One sample mean: t-test (\(\sigma\) unknown)
- Two sample means: Two-sample t-test
- Paired data: Paired t-test
- One proportion: z-test for proportion
- Two proportions: Two-sample z-test
- Categorical data: Chi-square test
📊 Conditions Checklist
📈 t-Procedures
- Random sample
- Independence (10% condition)
- Normal/Large sample (\(n \geq 30\) or normal population)
📊 z-Procedures (Proportions)
- Random sample
- Independence (10% condition)
- Success/Failure: \(np \geq 10, n(1-p) \geq 10\)
🎲 Chi-Square
- Random sample
- Independence
- Expected counts \(\geq 5\) in all cells
📈 Linear Regression
- Linear relationship (scatterplot)
- Independence
- Normal residuals
- Equal variance (residual plot)
💡 AP Exam Tips
✅ Always Do
- State conditions clearly
- Show calculator work
- Interpret results in context
- Include units in final answer
- Draw conclusions from p-values
❌ Common Mistakes
- Forgetting to check conditions
- Using wrong distribution (z vs t)
- Misinterpreting confidence level
- Causation from correlation
- Incorrect degrees of freedom
🎯 Free Response Strategy
- Show all work clearly
- Use proper statistical vocabulary
- Connect to real-world context
- Check reasonableness of answers
- Practice with released exams