📊 Statistics Formulas: Complete AP Study Guide

Essential Mathematical Formulas for AP Statistics Success

📈 Descriptive Statistics Formulas

📊 Measures of Central Tendency

📐 Mean (Arithmetic Mean)

Sample Mean:

\[ \bar{x} = \frac{\sum x}{n} \]

Population Mean:

\[ \mu = \frac{\sum X}{N} \]

Where:
• \(\bar{x}\) = sample mean
• \(\mu\) = population mean
• \(\sum x\) = sum of all sample values
• \(n\) = sample size
• \(N\) = population size

📊 Median

For Odd n:

\[ M = x_{\frac{n+1}{2}} \]

For Even n:

\[ M = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} \]

For Grouped Data:

\[ \text{Median} = L + \left[\frac{\frac{n}{2} - CF}{f}\right] \times h \]

Where:
• \(L\) = lower boundary of median class
• \(CF\) = cumulative frequency before median class
• \(f\) = frequency of median class
• \(h\) = class width
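Both median cases, plus the grouped-data formula above, can be sketched in a few lines of Python (function names are illustrative, not from any library):

```python
def median(data):
    """Middle value for odd n; mean of the two middle values for even n."""
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def grouped_median(L, CF, f, h, n):
    """Median = L + [(n/2 - CF) / f] * h for binned (grouped) data."""
    return L + ((n / 2 - CF) / f) * h

print(median([3, 1, 4, 1, 5]))       # odd n -> 3
print(median([3, 1, 4, 1, 5, 9]))    # even n -> 3.5
```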

📈 Geometric Mean

\[ GM = \sqrt[n]{x_1 \times x_2 \times x_3 \times \cdots \times x_n} \]

Or equivalently:

\[ GM = (x_1 \times x_2 \times x_3 \times \cdots \times x_n)^{\frac{1}{n}} \]

Used for:
• Growth rates
• Ratios and percentages
• Averaging rates of change
• Investment returns
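A quick Python sketch of the geometric mean, using logs to avoid overflow in the product (the growth figures are hypothetical):

```python
import math

def geometric_mean(values):
    """GM = (x1 * x2 * ... * xn)^(1/n), computed via logs for stability."""
    return math.exp(sum(math.log(x) for x in values) / len(values))

# Average annual growth factor for returns of +10%, +20%, -5%:
factors = [1.10, 1.20, 0.95]
print(round(geometric_mean(factors), 4))
```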

📏 Measures of Variability

📊 Variance

Population Variance:

\[ \sigma^2 = \frac{\sum(X - \mu)^2}{N} \]

Sample Variance:

\[ s^2 = \frac{\sum(x - \bar{x})^2}{n-1} \]

Alternative Formula:

\[ s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} \]

📐 Standard Deviation

Population SD:

\[ \sigma = \sqrt{\frac{\sum(X - \mu)^2}{N}} \]

Sample SD:

\[ s = \sqrt{\frac{\sum(x - \bar{x})^2}{n-1}} \]

Relationship:

\[ s = \sqrt{s^2} \quad \text{and} \quad \sigma = \sqrt{\sigma^2} \]
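The key difference between the two variance formulas is the divisor: n − 1 for a sample, N for a population. A minimal Python sketch:

```python
import math

def sample_variance(xs):
    """s^2 = sum((x - xbar)^2) / (n - 1): divide by n - 1 for a sample."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def population_variance(xs):
    """sigma^2 = sum((x - mu)^2) / N: divide by N for a full population."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

xs = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(xs))             # 4.0
print(math.sqrt(population_variance(xs)))  # population SD = 2.0
```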

📊 Interquartile Range

\[ IQR = Q_3 - Q_1 \]

where \(Q_1\) = 25th percentile, \(Q_3\) = 75th percentile

Outlier Detection:

\[ \text{Lower fence} = Q_1 - 1.5(IQR) \] \[ \text{Upper fence} = Q_3 + 1.5(IQR) \]
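The 1.5 × IQR fence rule is easy to apply in Python. This sketch computes Q1 and Q3 as medians of the lower and upper halves, the convention most AP texts use (other quartile conventions give slightly different values):

```python
def quartiles(data):
    """Q1/Q3 as medians of the lower and upper halves of the sorted data."""
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower, upper = s[:half], s[n - half:]
    def med(v):
        m = len(v) // 2
        return v[m] if len(v) % 2 else (v[m - 1] + v[m]) / 2
    return med(lower), med(upper)

data = [1, 3, 5, 7, 9, 11, 13, 100]
q1, q3 = quartiles(data)
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(q1, q3, iqr)
print([x for x in data if x < lower_fence or x > upper_fence])  # outliers
```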

📈 Coefficient of Variation

\[ CV = \frac{s}{\bar{x}} \times 100\% \]

Or for population:

\[ CV = \frac{\sigma}{\mu} \times 100\% \]

Used for:
• Comparing variability between datasets
• Relative measure of dispersion
• When means differ substantially

📊 Percentiles and Z-Scores

📈 Percentile Formula

\[ \text{Percentile} = \frac{\text{Number of values below } x}{\text{Total number of values}} \times 100 \]

For position: \(P = (n+1) \times \frac{\text{percentile}}{100}\)

📊 Z-Score (Standard Score)

\[ z = \frac{x - \mu}{\sigma} \quad \text{or} \quad z = \frac{x - \bar{x}}{s} \]

Converts raw scores to standard deviations from mean
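As a quick check of the standardization formula (the test scores here are made up for illustration):

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# A score of 650 on a test with mean 500 and SD 100:
print(z_score(650, 500, 100))  # 1.5
```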

📐 Mean Deviation

\[ MD = \frac{\sum|x - \bar{x}|}{n} \]

Average absolute deviation from the mean

🎲 Probability Formulas

🎯 Basic Probability Rules

📊 Basic Probability

\[ P(A) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}} \]

Properties:

\[ 0 \leq P(A) \leq 1 \] \[ P(A) + P(A') = 1 \]

In particular:
• \(P(\text{impossible event}) = 0\)
• \(P(\text{certain event}) = 1\)
• \(P(A') = 1 - P(A)\) (complement rule)

🔗 Addition Rule

General Rule:

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]

Mutually Exclusive:

\[ P(A \cup B) = P(A) + P(B) \]

When to use:
• Finding probability of "A or B"
• Events may or may not overlap

✖️ Multiplication Rule

General Rule:

\[ P(A \cap B) = P(A) \times P(B|A) \]

Independent Events:

\[ P(A \cap B) = P(A) \times P(B) \]

When to use:
• Finding probability of "A and B"
• Sequential events

🔄 Conditional Probability

\[ P(A|B) = \frac{P(A \cap B)}{P(B)} \]

and

\[ P(B|A) = \frac{P(A \cap B)}{P(A)} \]

Read as:
• \(P(A|B)\) = "Probability of A given B"
• Requires \(P(B) > 0\)

🎲 Expected Value and Variance

📊 Expected Value

\[ E(X) = \sum [x \times P(x)] \]

Weighted average of all possible values

📈 Variance of Random Variable

\[ Var(X) = E(X^2) - [E(X)]^2 \] \[ Var(X) = \sum[(x - \mu)^2 \times P(x)] \]

Measure of spread for probability distribution
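Both formulas can be checked on a small discrete distribution. This sketch represents the distribution as a dict of {value: probability} (an illustrative choice, not a standard API):

```python
def expected_value(dist):
    """E(X) = sum of x * P(x) over a discrete distribution {x: P(x)}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E(X^2) - [E(X)]^2."""
    ex = expected_value(dist)
    ex2 = sum(x * x * p for x, p in dist.items())
    return ex2 - ex ** 2

# Fair six-sided die: E(X) = 3.5, Var(X) = 35/12
die = {k: 1 / 6 for k in range(1, 7)}
print(round(expected_value(die), 4))
print(round(variance(die), 4))
```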

📊 Probability Distributions

📈 Normal Distribution

📊 Normal Distribution Formula

Probability Density Function:

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \]

Standard Form:

\[ X \sim N(\mu, \sigma^2) \]

Parameters:
• \(\mu\) = mean
• \(\sigma\) = standard deviation
• \(\sigma^2\) = variance

📐 Standard Normal (Z-Distribution)

\[ Z \sim N(0, 1) \]

Standardization Formula:

\[ Z = \frac{X - \mu}{\sigma} \]

Properties:
• Mean = 0
• Standard deviation = 1
• Used with Z-tables

📊 Normal Approximation

For Binomial (if \(np \geq 10\) and \(n(1-p) \geq 10\)):

\[ X \approx N\big(np,\; np(1-p)\big) \]

i.e., mean \(np\) and standard deviation \(\sqrt{np(1-p)}\)

With Continuity Correction:

\[ P(X = k) \approx P(k-0.5 < Y < k+0.5) \] \[ P(X \leq k) \approx P(Y < k+0.5) \]
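A Python sketch comparing the exact binomial probability with the normal approximation (using the error function for the normal CDF; the n, p, k values are just an example):

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p, k = 100, 0.5, 55
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
approx = norm_cdf((k + 0.5 - mu) / sigma)   # continuity correction: P(Y < k + 0.5)
exact = binom_cdf(k, n, p)
print(round(exact, 4), round(approx, 4))    # the two values agree closely
```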

🎲 Binomial Distribution

📊 Binomial Probability Formula

\[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \]

where \(\binom{n}{k} = \frac{n!}{k!(n-k)!}\)

Notation:

\[ X \sim \text{Binomial}(n, p) \]

Conditions:
• Fixed number of trials \((n)\)
• Each trial has two outcomes
• Constant probability \((p)\)
• Independent trials

📈 Binomial Mean & Variance

Mean:

\[ \mu = np \]

Variance:

\[ \sigma^2 = np(1-p) \]

Standard Deviation:

\[ \sigma = \sqrt{np(1-p)} \]

Where:
• \(n\) = number of trials
• \(p\) = probability of success
• \((1-p)\) = probability of failure
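The binomial formulas above, sketched with Python's built-in `math.comb` (the n and p values are illustrative):

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * (1-p)^(n-k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
print(round(binom_pmf(3, n, p), 4))                  # P(X = 3)
print(n * p, round(math.sqrt(n * p * (1 - p)), 4))   # mean np and SD sqrt(np(1-p))
```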

📈 Correlation & Regression Formulas

🔗 Correlation Coefficients

📊 Pearson Correlation Coefficient

\[ r = \frac{\sum[(x-\bar{x})(y-\bar{y})]}{\sqrt{\sum(x-\bar{x})^2 \sum(y-\bar{y})^2}} \]

Alternative Formula:

\[ r = \frac{n\sum xy - \sum x\sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} \]

Properties:
• \(-1 \leq r \leq 1\)
• Measures linear relationship
• \(r = 0\) means no linear correlation
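The definitional formula for r translates directly to Python (the data are illustrative):

```python
import math

def pearson_r(xs, ys):
    """r = sum((x - xbar)(y - ybar)) / sqrt(sum((x - xbar)^2) * sum((y - ybar)^2))."""
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly linear -> 1.0
```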

📈 Spearman Rank Correlation

\[ r_s = 1 - \frac{6\sum d^2}{n(n^2-1)} \]

where \(d\) = difference in ranks

Used for:
• Ranked/ordinal data
• Non-linear monotonic relationships
• When data has outliers

📊 Linear Regression

📈 Linear Regression Equation

\[ \hat{y} = a + bx \]

Slope:

\[ b = r\left(\frac{s_y}{s_x}\right) \]

y-intercept:

\[ a = \bar{y} - b\bar{x} \]

Where:
• \(\hat{y}\) = predicted y value
• \(r\) = correlation coefficient
• \(s_x, s_y\) = standard deviations
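A least-squares fit from scratch, using the equivalent computational form of the slope (the data are made up, chosen to lie exactly on y = 1 + 2x):

```python
def regression_line(xs, ys):
    """Least-squares fit yhat = a + b*x; b = sxy/sxx equals r*(sy/sx) algebraically."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

a, b = regression_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)   # intercept 1.0, slope 2.0
```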

📊 Coefficient of Determination

\[ r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} \] \[ r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \]

Interpretation:
• % of variance in y explained by x
• \(0 \leq r^2 \leq 1\)
• Higher \(r^2\) indicates better fit

🧪 Hypothesis Testing Formulas

📊 Test Statistics

📊 t-Test Formulas

One Sample:

\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]

Two Sample (equal variances):

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]

Paired t-test:

\[ t = \frac{\bar{d}}{s_d/\sqrt{n}} \]

Degrees of freedom:
• One sample: \(df = n - 1\)
• Two sample: \(df = n_1 + n_2 - 2\)
• Paired: \(df = n - 1\)
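The one-sample t statistic, sketched in Python (the data and null value μ₀ = 5 are hypothetical):

```python
import math

def one_sample_t(xs, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)), with df = n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return (xbar - xbar + (xbar - mu0)) / (s / math.sqrt(n)), n - 1

t, df = one_sample_t([5, 7, 6, 9, 8], mu0=5)
print(round(t, 3), df)   # compare t to the t-distribution with df = 4
```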

📈 Chi-Square Test

\[ \chi^2 = \sum\frac{(O - E)^2}{E} \]

where:

\[ O = \text{Observed frequency} \] \[ E = \text{Expected frequency} \]

Degrees of freedom:
• Goodness of fit: \(df = k - 1\)
• Independence: \(df = (r-1)(c-1)\)
• Where \(k\) = categories, \(r\) = rows, \(c\) = columns
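A goodness-of-fit sketch in Python (the die counts are hypothetical):

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# A die rolled 60 times; under a fair-die null, E = 10 per face:
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6
print(round(chi_square(observed, expected), 4))   # df = 6 - 1 = 5
```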

📊 Test Statistic General Form

📈 General Formula

\[ \text{Test Statistic} = \frac{\text{Sample Statistic} - \text{Parameter}}{\text{Standard Error}} \]

Standardized measure of difference from null hypothesis

📊 P-Value

\[ P\text{-value} = P(\text{test statistic at least as extreme as the observed value} \mid H_0 \text{ is true}) \]

Probability of observing result if null hypothesis is true

🎯 Confidence Intervals

📊 Confidence Interval Formulas

📈 General Form

\[ CI = \text{Point Estimate} \pm \text{Margin of Error} \] \[ CI = \text{Point Estimate} \pm (\text{Critical Value})(\text{Standard Error}) \]

Components:
• Point estimate = sample statistic
• Critical value = from t or z distribution
• Standard error = std dev of sampling distribution

📊 Mean (σ known)

\[ CI = \bar{x} \pm z^* \times \frac{\sigma}{\sqrt{n}} \]

Margin of Error:

\[ MOE = z^* \times \frac{\sigma}{\sqrt{n}} \]

When to use:
• Population \(\sigma\) is known
• Large sample size \((n \geq 30)\)
• Population is normally distributed

📈 Mean (σ unknown)

\[ CI = \bar{x} \pm t^* \times \frac{s}{\sqrt{n}} \]

with \(df = n - 1\)

When to use:
• Population \(\sigma\) is unknown
• Sample size is small
• Use t-distribution

📊 Proportion

\[ CI = \hat{p} \pm z^* \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

Margin of Error:

\[ MOE = z^* \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

Conditions:
• \(n\hat{p} \geq 10\) and \(n(1-\hat{p}) \geq 10\)
• Random sample
• Independence
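Putting the proportion interval together in Python (the survey counts are hypothetical; z* = 1.96 assumes 95% confidence):

```python
import math

def proportion_ci(successes, n, z_star=1.96):
    """CI = phat +/- z* * sqrt(phat * (1 - phat) / n)."""
    phat = successes / n
    moe = z_star * math.sqrt(phat * (1 - phat) / n)
    return phat - moe, phat + moe

# 520 successes in 1000 trials:
lo, hi = proportion_ci(520, 1000)
print(round(lo, 4), round(hi, 4))
```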

📊 Margin of Error Calculations

📈 Width of Confidence Interval

\[ \text{Width} = 2 \times \text{Margin of Error} \] \[ \text{Margin of Error} = \frac{\text{Width}}{2} \]

Distance from point estimate to either endpoint

📊 Factors Affecting MOE

\[ MOE \propto \text{Critical Value} \] \[ MOE \propto \frac{1}{\sqrt{n}} \] \[ MOE \propto \text{Standard Deviation} \]

Increasing the confidence level or decreasing the sample size increases the MOE

📏 Sample Size Calculations

📊 Sample Size Formulas

📈 Sample Size for Mean

\[ n = \left(\frac{z^* \times \sigma}{E}\right)^2 \]

Where:

\[ E = \text{desired margin of error} \] \[ \sigma = \text{population standard deviation} \] \[ z^* = \text{critical value} \]

Round up to next integer
If \(\sigma\) unknown, use \(s\) from pilot study or conservative estimate

📊 Sample Size for Proportion

\[ n = \left(\frac{z^*}{E}\right)^2 \times \hat{p}(1-\hat{p}) \]

Conservative (worst case):

\[ n = \left(\frac{z^*}{E}\right)^2 \times 0.25 \]

When \(\hat{p}\) unknown:
• Use \(\hat{p} = 0.5\) (most conservative)
• Gives largest possible sample size
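Applying the conservative formula in Python, with the classic "±3 points at 95% confidence" polling example:

```python
import math

def sample_size_proportion(z_star, E, p_hat=0.5):
    """n = (z*/E)^2 * p(1-p), rounded UP; p = 0.5 is the conservative default."""
    return math.ceil((z_star / E) ** 2 * p_hat * (1 - p_hat))

# 95% confidence, margin of error 0.03, conservative p = 0.5:
print(sample_size_proportion(1.96, 0.03))   # the familiar ~1068-person survey
```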

⚡ Power Analysis

📊 Power Components

\[ \text{Power} = 1 - \beta \]

where \(\beta = P(\text{Type II Error})\)

\[ \alpha = P(\text{Type I Error}) \]

Typically \(\alpha = 0.05\), Power \(= 0.80\)

Factors affecting power:
• Effect size (larger = more power)
• Sample size (larger = more power)
• Significance level \(\alpha\)
• Population variability

📈 Sample Size for Power

For a two-sample t-test (sample size per group):

\[ n = \frac{2(z_{\alpha/2} + z_\beta)^2\sigma^2}{\delta^2} \]

where \(\delta = |\mu_1 - \mu_2|\) (effect size)

For two-sample tests:
• \(\delta\) = difference in means
• \(\sigma\) = common standard deviation
• Use appropriate z-values for \(\alpha\) and \(\beta\)

📋 AP Statistics Formula Reference

🎯 Quick Reference for AP Exam

📊 Calculator Functions

  • 1-Var Stats: Mean, SD, Q1, Q3, etc.
  • 2-Var Stats: Correlation, regression
  • normalpdf: Normal probability density
  • normalcdf: Normal probability (area)
  • invNorm: Inverse normal \((z^*)\)
  • tpdf, tcdf, invT: t-distribution functions
  • binompdf/binomcdf: Binomial probabilities

📈 Key Relationships

  • Variance to SD: \(s = \sqrt{s^2}\)
  • Z-score: \(z = \frac{x - \mu}{\sigma}\)
  • Standard Error: \(SE = \frac{s}{\sqrt{n}}\)
  • \(r^2\) interpretation: % variance explained
  • Degrees of freedom: Usually \(n - 1\)
  • Critical values: Use t-table or calculator

🎲 Common Values

  • 90% CI: \(z^* = 1.645\)
  • 95% CI: \(z^* = 1.96\)
  • 99% CI: \(z^* = 2.576\)
  • 68-95-99.7 Rule: Normal distribution
  • Binomial normal approx: \(np \geq 10, n(1-p) \geq 10\)
  • Conservative \(\hat{p}\): Use 0.5 when unknown

🧪 Test Selection Guide

  • One sample mean: t-test (\(\sigma\) unknown)
  • Two sample means: Two-sample t-test
  • Paired data: Paired t-test
  • One proportion: z-test for proportion
  • Two proportions: Two-sample z-test
  • Categorical data: Chi-square test

📊 Conditions Checklist

📈 t-Procedures

  • Random sample
  • Independence (10% condition)
  • Normal/Large sample (\(n \geq 30\) or normal population)

📊 z-Procedures (Proportions)

  • Random sample
  • Independence (10% condition)
  • Success/Failure: \(np \geq 10, n(1-p) \geq 10\)

🎲 Chi-Square

  • Random sample
  • Independence
  • Expected counts \(\geq 5\) in all cells

📈 Linear Regression

  • Linear relationship (scatterplot)
  • Independence
  • Normal residuals
  • Equal variance (residual plot)

💡 AP Exam Tips

✅ Always Do

  • State conditions clearly
  • Show calculator work
  • Interpret results in context
  • Include units in final answer
  • Draw conclusions from p-values

❌ Common Mistakes

  • Forgetting to check conditions
  • Using wrong distribution (z vs t)
  • Misinterpreting confidence level
  • Causation from correlation
  • Incorrect degrees of freedom

🎯 Free Response Strategy

  • Show all work clearly
  • Use proper statistical vocabulary
  • Connect to real-world context
  • Check reasonableness of answers
  • Practice with released exams

🎓 Master Statistics Formulas for AP Success!

Study Strategy

  • Practice with real data
  • Understand when to use each formula
  • Master calculator functions
  • Check conditions systematically

Key Concepts

  • Connect formulas to concepts
  • Interpret results in context
  • Understand sampling distributions
  • Practice hypothesis testing steps

Exam Success

  • Show all work clearly
  • Use statistical language
  • Check conditions first
  • Practice, practice, practice!

🌟 Success in AP Statistics comes from understanding when and how to apply these formulas. Focus on the concepts behind each formula and practice with real data to build confidence for the exam! 🌟
