📊 Statistics Formulas: Complete AP Study Guide
Essential Mathematical Formulas for AP Statistics Success
📈 Descriptive Statistics Formulas
📊 Measures of Central Tendency
📐 Mean (Arithmetic Mean)
Sample Mean:
\[ \bar{x} = \frac{\sum x}{n} \]
Population Mean:
\[ \mu = \frac{\sum X}{N} \]
Where:
• \(\bar{x}\) = sample mean
• \(\mu\) = population mean
• \(\sum x\) = sum of all sample values
• \(n\) = sample size
• \(N\) = population size
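As a quick sanity check, the sample-mean formula can be verified with Python's standard library (the scores below are made-up values):

```python
import statistics

# Hypothetical sample of n = 5 exam scores
scores = [82, 90, 75, 88, 95]

# Sample mean: x-bar = (sum of all values) / n
mean_manual = sum(scores) / len(scores)

# statistics.mean applies the same formula
mean_lib = statistics.mean(scores)

print(mean_manual)  # 86.0
```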
📊 Median
For Odd n:
\[ M = x_{\frac{n+1}{2}} \]
For Even n:
\[ M = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} \]
For Grouped Data:
\[ \text{Median} = L + \left[\frac{\frac{n}{2} - CF}{f}\right] \times h \]
Where:
• \(L\) = lower boundary of median class
• \(CF\) = cumulative frequency before median class
• \(f\) = frequency of median class
• \(h\) = class width
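For ungrouped data, the odd-n and even-n cases can be checked directly (illustrative values):

```python
import statistics

odd_data = [3, 1, 4, 1, 5]       # n = 5 (odd)
even_data = [3, 1, 4, 1, 5, 9]   # n = 6 (even)

# Odd n: the single middle value of the sorted data
print(statistics.median(odd_data))   # sorted [1, 1, 3, 4, 5] -> 3

# Even n: the mean of the two middle values
print(statistics.median(even_data))  # sorted [1, 1, 3, 4, 5, 9] -> (3 + 4) / 2 = 3.5
```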
📈 Geometric Mean
\[ GM = \sqrt[n]{x_1 \times x_2 \times x_3 \times \cdots \times x_n} \]
Or equivalently:
\[ GM = (x_1 \times x_2 \times x_3 \times \cdots \times x_n)^{\frac{1}{n}} \]
Used for:
• Growth rates
• Ratios and percentages
• Averaging rates of change
• Investment returns
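Applied to growth rates, the geometric mean averages multiplicative factors; the annual growth factors here are hypothetical:

```python
import math
import statistics

# Hypothetical annual growth factors: +10%, +20%, +5%
factors = [1.10, 1.20, 1.05]

# GM = (x1 * x2 * ... * xn)^(1/n)
gm_manual = math.prod(factors) ** (1 / len(factors))

# statistics.geometric_mean (Python 3.8+) computes the same quantity
gm_lib = statistics.geometric_mean(factors)

print(round(gm_manual, 4))  # ≈ 1.1149, i.e., about 11.5% average growth per year
```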
📏 Measures of Variability
📊 Variance
Population Variance:
\[ \sigma^2 = \frac{\sum(X - \mu)^2}{N} \]
Sample Variance:
\[ s^2 = \frac{\sum(x - \bar{x})^2}{n-1} \]
Alternative Formula:
\[ s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} \]
📐 Standard Deviation
Population SD:
\[ \sigma = \sqrt{\frac{\sum(X - \mu)^2}{N}} \]
Sample SD:
\[ s = \sqrt{\frac{\sum(x - \bar{x})^2}{n-1}} \]
Relationship: the standard deviation is the square root of the variance
\[ s = \sqrt{s^2} \quad \text{and} \quad \sigma = \sqrt{\sigma^2} \]
📊 Interquartile Range
\[ IQR = Q_3 - Q_1 \]
where \(Q_1\) = 25th percentile, \(Q_3\) = 75th percentile
Outlier Detection:
\[ \text{Lower fence} = Q_1 - 1.5(IQR) \]
\[ \text{Upper fence} = Q_3 + 1.5(IQR) \]
📈 Coefficient of Variation
\[ CV = \frac{s}{\bar{x}} \times 100\% \]
Or for population:
\[ CV = \frac{\sigma}{\mu} \times 100\% \]
Used for:
• Comparing variability between datasets
• Relative measure of dispersion
• When means differ substantially
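The spread measures above (sample variance with its \(n-1\) divisor, SD, IQR fences, CV) can be sketched together. The data values are invented, and note that `statistics.quantiles` uses its default "exclusive" method, which can give slightly different quartiles than some calculators:

```python
import statistics

data = [4, 8, 6, 5, 3, 7, 9, 5]   # hypothetical sample, n = 8

xbar = statistics.mean(data)    # 5.875
s2 = statistics.variance(data)  # sample variance, divides by n - 1 -> 4.125
s = statistics.stdev(data)      # sample SD = sqrt(4.125) ≈ 2.031

# Coefficient of variation (sample form): CV = s / x-bar * 100%
cv = s / xbar * 100             # ≈ 34.6%

# Quartiles, IQR, and the 1.5 * IQR outlier fences
q1, _, q3 = statistics.quantiles(data, n=4)   # Q1 = 4.25, Q3 = 7.75 here
iqr = q3 - q1                                 # 3.5
lower_fence = q1 - 1.5 * iqr                  # -1.0
upper_fence = q3 + 1.5 * iqr                  # 13.0
```

Any observation below the lower fence or above the upper fence would be flagged as a potential outlier; none of these values are.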
📊 Percentiles and Z-Scores
📈 Percentile Formula
For position: \(P = (n+1) \times \frac{\text{percentile}}{100}\)
📊 Z-Score (Standard Score)
\[ z = \frac{x - \mu}{\sigma} \]
Converts a raw score to the number of standard deviations it lies from the mean
📐 Mean Deviation
\[ MD = \frac{\sum |x - \bar{x}|}{n} \]
Average absolute deviation from the mean
🎲 Probability Formulas
🎯 Basic Probability Rules
📊 Basic Probability
\[ P(A) = \frac{\text{number of favorable outcomes}}{\text{total number of outcomes}} \]
Properties:
\[ 0 \leq P(A) \leq 1 \]
\[ P(A) + P(A') = 1 \]
• \(P(\text{impossible event}) = 0\)
• \(P(\text{certain event}) = 1\)
• \(P(A') = 1 - P(A)\) (complement rule)
🔗 Addition Rule
General Rule:
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
Mutually Exclusive:
\[ P(A \cup B) = P(A) + P(B) \]
When to use:
• Finding probability of "A or B"
• Events may or may not overlap
✖️ Multiplication Rule
General Rule:
\[ P(A \cap B) = P(A) \times P(B|A) \]
Independent Events:
\[ P(A \cap B) = P(A) \times P(B) \]
When to use:
• Finding probability of "A and B"
• Sequential events
🔄 Conditional Probability
\[ P(A|B) = \frac{P(A \cap B)}{P(B)} \]
and
\[ P(B|A) = \frac{P(A \cap B)}{P(A)} \]
Read as:
• \(P(A|B)\) = "Probability of A given B"
• Requires \(P(B) > 0\)
🎲 Expected Value and Variance
📊 Expected Value
\[ E(X) = \mu_X = \sum x_i \, P(x_i) \]
Weighted average of all possible values
📈 Variance of Random Variable
\[ \text{Var}(X) = \sigma_X^2 = \sum (x_i - \mu_X)^2 \, P(x_i) \]
Measure of spread for a probability distribution
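Applying both definitions to a small made-up payout distribution:

```python
# Hypothetical discrete distribution: raffle payout in dollars
values = [0, 10, 50]
probs = [0.90, 0.08, 0.02]

# E(X) = sum of x * P(x) -- the weighted average
ev = sum(x * p for x, p in zip(values, probs))              # ≈ 1.80

# Var(X) = sum of (x - mu)^2 * P(x)
var = sum((x - ev) ** 2 * p for x, p in zip(values, probs)) # ≈ 54.76

sd = var ** 0.5   # ≈ 7.40
```

So a ticket pays $1.80 on average, but with a large spread relative to that mean.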
📊 Probability Distributions
📈 Normal Distribution
📊 Normal Distribution Formula
Probability Density Function:
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \]
Standard Form:
\[ X \sim N(\mu, \sigma^2) \]
Parameters:
• \(\mu\) = mean
• \(\sigma\) = standard deviation
• \(\sigma^2\) = variance
📐 Standard Normal (Z-Distribution)
Standardization Formula:
\[ Z = \frac{X - \mu}{\sigma} \]
Properties:
• Mean = 0
• Standard deviation = 1
• Used with Z-tables
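Python's `statistics.NormalDist` can stand in for a Z-table; the raw-score setup below is illustrative:

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, SD 1

# Area to the left of z = 1.96, as a Z-table would report
print(round(z.cdf(1.96), 4))   # 0.975

# Standardize a raw score: X ~ N(100, 15^2), observed x = 130
x, mu, sigma = 130, 100, 15
z_score = (x - mu) / sigma     # 2.0
print(round(z.cdf(z_score), 4))   # 0.9772 -- x sits at about the 98th percentile
```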
📊 Normal Approximation
For Binomial (if \(np \geq 10\) and \(n(1-p) \geq 10\)):
\[ X \sim N\big(np,\; np(1-p)\big) \]
With Continuity Correction:
\[ P(X = k) \approx P(k-0.5 < Y < k+0.5) \]
\[ P(X \leq k) \approx P(Y < k+0.5) \]
🎲 Binomial Distribution
📊 Binomial Probability Formula
\[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \]
where \(\binom{n}{k} = \frac{n!}{k!(n-k)!}\)
Notation:
\[ X \sim \text{Binomial}(n, p) \]
Conditions:
• Fixed number of trials \((n)\)
• Each trial has two outcomes
• Constant probability \((p)\)
• Independent trials
📈 Binomial Mean & Variance
Mean:
\[ \mu = np \]
Variance:
\[ \sigma^2 = np(1-p) \]
Standard Deviation:
\[ \sigma = \sqrt{np(1-p)} \]
Where:
• \(n\) = number of trials
• \(p\) = probability of success
• \((1-p)\) = probability of failure
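A sketch tying the pmf, the mean and SD, and the normal approximation together; n, p, and k are chosen so the \(np \geq 10\) and \(n(1-p) \geq 10\) conditions hold:

```python
import math
from statistics import NormalDist

n, p, k = 100, 0.5, 50   # np = 50 and n(1-p) = 50, so the approximation is valid

# Exact binomial: P(X = k) = C(n, k) * p^k * (1-p)^(n-k)
pmf = math.comb(n, k) * p ** k * (1 - p) ** (n - k)

mu = n * p                          # mean = np = 50.0
sigma = math.sqrt(n * p * (1 - p))  # SD = sqrt(np(1-p)) = 5.0

# Normal approximation with continuity correction
Y = NormalDist(mu, sigma)
approx = Y.cdf(k + 0.5) - Y.cdf(k - 0.5)

print(round(pmf, 4))     # 0.0796
print(round(approx, 4))  # 0.0797
```

The exact and approximate probabilities agree to about three decimal places here, which is what the continuity correction is for.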
📈 Correlation & Regression Formulas
🔗 Correlation Coefficients
📊 Pearson Correlation Coefficient
\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} \]
Alternative Formula:
\[ r = \frac{n\sum xy - \sum x\sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} \]
Properties:
• \(-1 \leq r \leq 1\)
• Measures linear relationship
• \(r = 0\) means no linear correlation
📈 Spearman Rank Correlation
\[ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \]
where \(d\) = difference in ranks
Used for:
• Ranked/ordinal data
• Non-linear monotonic relationships
• When data has outliers
📊 Linear Regression
📈 Linear Regression Equation
\[ \hat{y} = a + bx \]
Slope:
\[ b = r\left(\frac{s_y}{s_x}\right) \]
y-intercept:
\[ a = \bar{y} - b\bar{x} \]
Where:
• \(\hat{y}\) = predicted y value
• \(r\) = correlation coefficient
• \(s_x, s_y\) = standard deviations
📊 Coefficient of Determination
\[ r^2 = \frac{\text{explained variation}}{\text{total variation}} \]
Interpretation:
• % of variance in y explained by x
• \(0 \leq r^2 \leq 1\)
• Higher \(r^2\) indicates better fit
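Putting \(r\), the slope, the intercept, and \(r^2\) together on a small invented dataset (hours studied vs. quiz score):

```python
import math

# Hypothetical paired data
x = [1, 2, 3, 4, 5]   # hours studied
y = [2, 4, 5, 4, 5]   # quiz score

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sx = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
sy = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))

# Pearson r from the definition
r = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ((n - 1) * sx * sy)

# Least-squares line: y-hat = a + b*x
b = r * (sy / sx)     # slope
a = ybar - b * xbar   # intercept

print(round(r, 3))                # 0.775
print(round(b, 1), round(a, 1))   # 0.6 2.2
print(round(r ** 2, 2))           # 0.6 -> 60% of the variance in y is explained by x
```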
🧪 Hypothesis Testing Formulas
📊 Test Statistics
📊 t-Test Formulas
One Sample:
\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]
Two Sample (equal variances):
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]
Paired t-test:
\[ t = \frac{\bar{d}}{s_d/\sqrt{n}} \]
Degrees of freedom:
• One sample: \(df = n - 1\)
• Two sample: \(df = n_1 + n_2 - 2\)
• Paired: \(df = n - 1\)
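The one-sample case, worked on hypothetical data with \(H_0\!: \mu = 50\):

```python
import math
import statistics

# Hypothetical sample; null hypothesis: mu_0 = 50
sample = [52, 48, 55, 51, 49, 53, 54, 50]
mu0 = 50

n = len(sample)
xbar = statistics.mean(sample)   # 51.5
s = statistics.stdev(sample)     # sqrt(6) ≈ 2.449

# t = (x-bar - mu_0) / (s / sqrt(n)), with df = n - 1
t = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1

print(round(t, 3), df)   # 1.732 7
```

The resulting t would then be compared against a t-distribution with 7 degrees of freedom to get a p-value.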
📈 Chi-Square Test
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
where:
\[ O = \text{Observed frequency} \]
\[ E = \text{Expected frequency} \]
Degrees of freedom:
• Goodness of fit: \(df = k - 1\)
• Independence: \(df = (r-1)(c-1)\)
• Where \(k\) = categories, \(r\) = rows, \(c\) = columns
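A goodness-of-fit statistic for a hypothetical fairness check on 60 die rolls:

```python
# Hypothetical counts from 60 rolls of a six-sided die
observed = [8, 12, 9, 11, 6, 14]
expected = [60 / 6] * 6   # fair die: 10 expected in each category

# chi^2 = sum of (O - E)^2 / E
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1    # goodness of fit: k - 1

print(round(chi2, 1), df)  # 4.2 5
```

Since 4.2 is well below the critical value of about 11.07 for \(df = 5\) at \(\alpha = 0.05\), these counts alone would not suggest an unfair die.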
📊 Test Statistic General Form
📈 General Formula
\[ \text{test statistic} = \frac{\text{statistic} - \text{parameter}}{\text{standard error of the statistic}} \]
Standardized measure of how far the sample result falls from the null hypothesis value
📊 P-Value
Probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true
🎯 Confidence Intervals
📊 Confidence Interval Formulas
📈 General Form
\[ \text{CI} = \text{point estimate} \pm (\text{critical value}) \times (\text{standard error}) \]
Components:
• Point estimate = sample statistic
• Critical value = from t or z distribution
• Standard error = std dev of sampling distribution
📊 Mean (σ known)
\[ \bar{x} \pm z^* \times \frac{\sigma}{\sqrt{n}} \]
Margin of Error:
\[ MOE = z^* \times \frac{\sigma}{\sqrt{n}} \]
When to use:
• Population \(\sigma\) is known
• Large sample size \((n \geq 30)\)
• Population is normally distributed
📈 Mean (σ unknown)
\[ \bar{x} \pm t^* \times \frac{s}{\sqrt{n}} \]
with \(df = n - 1\)
When to use:
• Population \(\sigma\) is unknown
• Sample size is small
• Use t-distribution
📊 Proportion
\[ \hat{p} \pm z^* \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
Margin of Error:
\[ MOE = z^* \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
Conditions:
• \(n\hat{p} \geq 10\) and \(n(1-\hat{p}) \geq 10\)
• Random sample
• Independence
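The proportion interval, end to end, on an invented poll result:

```python
import math
from statistics import NormalDist

# Hypothetical poll: 540 of 1000 respondents in favor
x, n = 540, 1000
p_hat = x / n   # 0.54

conf = 0.95
z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # ≈ 1.96

se = math.sqrt(p_hat * (1 - p_hat) / n)
moe = z_star * se
ci = (p_hat - moe, p_hat + moe)

# Success/failure condition: n*p_hat = 540 >= 10 and n*(1 - p_hat) = 460 >= 10
print(round(moe, 3))                      # 0.031
print(round(ci[0], 3), round(ci[1], 3))   # 0.509 0.571
```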
📊 Margin of Error Calculations
📈 Width of Confidence Interval
\[ \text{Width} = 2 \times MOE \]
The margin of error is the distance from the point estimate to either endpoint
📊 Factors Affecting MOE
Increasing the confidence level or decreasing the sample size increases the MOE
📏 Sample Size Calculations
📊 Sample Size Formulas
📈 Sample Size for Mean
\[ n = \left(\frac{z^* \sigma}{E}\right)^2 \]
Where:
\[ E = \text{desired margin of error} \]
\[ \sigma = \text{population standard deviation} \]
\[ z^* = \text{critical value} \]
Round up to the next integer
If \(\sigma\) unknown, use \(s\) from pilot study or conservative estimate
📊 Sample Size for Proportion
\[ n = \left(\frac{z^*}{E}\right)^2 \hat{p}(1-\hat{p}) \]
Conservative (worst case):
\[ n = \left(\frac{z^*}{E}\right)^2 \times 0.25 \]
When \(\hat{p}\) unknown:
• Use \(\hat{p} = 0.5\) (most conservative)
• Gives largest possible sample size
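Both sample-size formulas with the round-up step; the values of \(\sigma\) and the margins of error are assumptions chosen for illustration:

```python
import math
from statistics import NormalDist

z_star = NormalDist().inv_cdf(0.975)   # 95% confidence, z* ≈ 1.96

# Mean: n = (z* * sigma / E)^2, assuming sigma = 15 and desired E = 2
n_mean = math.ceil((z_star * 15 / 2) ** 2)

# Proportion, conservative p-hat = 0.5, with E = 0.03
n_prop = math.ceil((z_star / 0.03) ** 2 * 0.25)

print(n_mean)  # 217
print(n_prop)  # 1068
```

`math.ceil` implements the "round up to the next integer" rule, since rounding down would give a margin of error larger than desired.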
⚡ Power Analysis
📊 Power Components
\[ \text{Power} = 1 - \beta \]
where \(\beta = P(\text{Type II Error})\)
\[ \alpha = P(\text{Type I Error}) \]
Typically \(\alpha = 0.05\) and Power \(= 0.80\)
Factors affecting power:
• Effect size (larger = more power)
• Sample size (larger = more power)
• Significance level \(\alpha\)
• Population variability
📈 Sample Size for Power
For a two-sample comparison of means (normal approximation, per group):
\[ n = \frac{2(z_{\alpha/2} + z_\beta)^2\sigma^2}{\delta^2} \]
where \(\delta = |\mu_1 - \mu_2|\) (effect size)
For two-sample tests:
• \(\delta\) = difference in means
• \(\sigma\) = common standard deviation
• Use appropriate z-values for \(\alpha\) and \(\beta\)
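Plugging \(\alpha = 0.05\) and power \(= 0.80\) into the per-group formula; the values of \(\sigma\) and \(\delta\) are assumptions:

```python
import math
from statistics import NormalDist

alpha, power = 0.05, 0.80
z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ≈ 1.960
z_beta = NormalDist().inv_cdf(power)            # ≈ 0.842

sigma = 10   # assumed common standard deviation
delta = 5    # smallest difference in means worth detecting

# Per-group n = 2 * (z_alpha + z_beta)^2 * sigma^2 / delta^2
n = math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)
print(n)   # 63
```

Detecting a difference of half a standard deviation at 80% power thus takes roughly 63 subjects per group.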
📋 AP Statistics Formula Reference
🎯 Quick Reference for AP Exam
📊 Calculator Functions
- 1-Var Stats: Mean, SD, Q1, Q3, etc.
- 2-Var Stats: Correlation, regression
- normalpdf: Normal probability density
- normalcdf: Normal probability (area)
- invNorm: Inverse normal \((z^*)\)
- tpdf, tcdf, invT: t-distribution functions
- binomialpdf/cdf: Binomial probabilities
📈 Key Relationships
- Variance to SD: \(s = \sqrt{s^2}\)
- Z-score: \(z = \frac{x - \mu}{\sigma}\)
- Standard Error: \(SE = \frac{s}{\sqrt{n}}\)
- \(r^2\) interpretation: % variance explained
- Degrees of freedom: Usually \(n - 1\)
- Critical values: Use t-table or calculator
🎲 Common Values
- 90% CI: \(z^* = 1.645\)
- 95% CI: \(z^* = 1.96\)
- 99% CI: \(z^* = 2.576\)
- 68-95-99.7 Rule: Normal distribution
- Binomial normal approx: \(np \geq 10, n(1-p) \geq 10\)
- Conservative \(\hat{p}\): Use 0.5 when unknown
🧪 Test Selection Guide
- One sample mean: t-test (\(\sigma\) unknown)
- Two sample means: Two-sample t-test
- Paired data: Paired t-test
- One proportion: z-test for proportion
- Two proportions: Two-sample z-test
- Categorical data: Chi-square test
📊 Conditions Checklist
📈 t-Procedures
- Random sample
- Independence (10% condition)
- Normal/Large sample (\(n \geq 30\) or normal population)
📊 z-Procedures (Proportions)
- Random sample
- Independence (10% condition)
- Success/Failure: \(np \geq 10, n(1-p) \geq 10\)
🎲 Chi-Square
- Random sample
- Independence
- Expected counts \(\geq 5\) in all cells
📈 Linear Regression
- Linear relationship (scatterplot)
- Independence
- Normal residuals
- Equal variance (residual plot)
💡 AP Exam Tips
✅ Always Do
- State conditions clearly
- Show calculator work
- Interpret results in context
- Include units in final answer
- Draw conclusions from p-values
❌ Common Mistakes
- Forgetting to check conditions
- Using wrong distribution (z vs t)
- Misinterpreting confidence level
- Causation from correlation
- Incorrect degrees of freedom
🎯 Free Response Strategy
- Show all work clearly
- Use proper statistical vocabulary
- Connect to real-world context
- Check reasonableness of answers
- Practice with released exams