Single-Variable Statistics

Complete Notes & Formulae for Twelfth Grade (Precalculus)

1. Biased Samples

Definition:

A biased sample does not accurately represent the population it's meant to study

Types of Bias:

Selection Bias: Sample not randomly chosen

Convenience Sampling: Choosing the participants who are easiest to reach

Voluntary Response Bias: Only those who feel strongly respond

Undercoverage: Some groups excluded from sample

Nonresponse Bias: Selected participants don't respond

Examples:

Biased: Surveying only students in the library about study habits

Unbiased: Random sample of all students in the school
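The library example can be illustrated with a small simulation. This is only a sketch: the population sizes, the study-hour figures, and the assumption that library regulars study more than other students are all hypothetical, chosen to make the effect visible.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1,000 students with weekly study hours.
# Assumption: the 200 "library regulars" study more than everyone else.
library = [random.gauss(15, 3) for _ in range(200)]
others = [random.gauss(8, 3) for _ in range(800)]
population = library + others

true_mean = statistics.mean(population)

# Biased: convenience sample drawn only from students in the library.
biased_sample = random.sample(library, 50)

# Unbiased: simple random sample from the whole school.
random_sample = random.sample(population, 50)

print(f"Population mean:     {true_mean:.1f}")
print(f"Library-only sample: {statistics.mean(biased_sample):.1f}")
print(f"Random sample:       {statistics.mean(random_sample):.1f}")
```

Under these assumptions the library-only sample overestimates the population mean, while the random sample lands close to it.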

2. Variance

Definition:

Variance measures how spread out the data values are from the mean

Population Variance:

\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]

Sample Variance:

\[ s^2 = \frac{\sum (x - \bar{x})^2}{n-1} \]

where:

• \( x \) = individual data value

• \( \mu \) = population mean; \( \bar{x} \) = sample mean

• \( N \) = population size; \( n \) = sample size

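• Note: Sample variance divides by \( n-1 \) instead of \( n \) (Bessel's correction), because deviations measured from \( \bar{x} \) tend to understate the spread around \( \mu \)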
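The two formulas differ only in the divisor. A minimal Python sketch of both follows; the function names and the four data values are illustrative assumptions, not part of the notes.

```python
def population_variance(values):
    """sigma^2: average squared deviation from the mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """s^2: squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 6]  # hypothetical values chosen only for illustration
print(population_variance(data))  # 2.0
print(sample_variance(data))      # 8/3, slightly larger than the population value
```

Dividing by the smaller number \( n-1 \) always gives a sample variance at least as large as the population formula applied to the same values.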

3. Standard Deviation

Definition:

Standard deviation is the square root of variance (measures typical distance from mean)

Population Standard Deviation:

\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \]

Sample Standard Deviation:

\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} \]

Example:

Data: 5, 7, 9, 11. Find variance and standard deviation.

Mean: \( \bar{x} = \frac{5+7+9+11}{4} = 8 \)

Deviations: -3, -1, 1, 3

Squared: 9, 1, 1, 9

Variance: \( s^2 = \frac{9+1+1+9}{3} = \frac{20}{3} \approx 6.67 \)

Standard Deviation: \( s = \sqrt{6.67} \approx 2.58 \)
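The worked example can be checked with Python's standard `statistics` module, where `variance` and `stdev` use the sample formulas (dividing by \( n-1 \)).

```python
import statistics

data = [5, 7, 9, 11]

s2 = statistics.variance(data)  # sample variance (divides by n - 1)
s = statistics.stdev(data)      # sample standard deviation

print(round(s2, 2))  # 6.67
print(round(s, 2))   # 2.58
```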

4. Identify Outliers

Definition:

An outlier is a data value that is significantly different from other values in the dataset

IQR Method (Most Common):

Steps:

1. Find Q1 (25th percentile) and Q3 (75th percentile)

2. Calculate IQR = Q3 - Q1

3. Find lower fence = Q1 - 1.5(IQR)

4. Find upper fence = Q3 + 1.5(IQR)

5. Values below lower fence or above upper fence are outliers

\[ \text{Lower Fence} = Q1 - 1.5(IQR) \] \[ \text{Upper Fence} = Q3 + 1.5(IQR) \]

Example:

Data: 22, 24, 26, 28, 29, 31, 35, 37, 41, 53, 64

Q1 = 26, Q3 = 41, IQR = 15

Lower fence: 26 - 1.5(15) = 3.5

Upper fence: 41 + 1.5(15) = 63.5

Outlier: 64 (above upper fence)
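The five steps can be sketched in Python as follows. The function names are hypothetical. The sketch assumes quartiles are found as medians of the lower and upper halves of the sorted data, which is the convention that gives Q1 = 26 and Q3 = 41 in the example above. Other quartile conventions can give slightly different fences.

```python
import statistics


def quartiles_median_of_halves(values):
    """Q1 and Q3 as medians of the lower and upper halves.

    When the count is odd, the overall median is left out of both halves.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return statistics.median(lower), statistics.median(upper)


def iqr_outliers(values, k=1.5):
    """Return (lower fence, upper fence, outliers) using the k * IQR rule."""
    q1, q3 = quartiles_median_of_halves(values)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers


data = [22, 24, 26, 28, 29, 31, 35, 37, 41, 53, 64]
print(iqr_outliers(data))  # (3.5, 63.5, [64])
```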

5. Effect of Removing Outliers

General Effects:

Mean: Most affected (moves toward the center of remaining data)

Median: Less affected (resistant to outliers)

Standard Deviation: Decreases (data becomes less spread out)

Range: Decreases significantly

Example:

Data: 10, 12, 13, 14, 15, 50 (outlier)

With outlier:

Mean = 19, SD ≈ 15.3

Without outlier:

Mean = 12.8, SD ≈ 1.9

Removing the outlier decreased both the mean and the SD significantly
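A short Python sketch makes the comparison concrete for all four measures. It assumes the SD in the example is the sample standard deviation (dividing by \( n-1 \)), which is what `statistics.stdev` computes.

```python
import statistics

data = [10, 12, 13, 14, 15, 50]
trimmed = [x for x in data if x != 50]  # drop the outlier

for label, values in (("With outlier", data), ("Without outlier", trimmed)):
    print(
        f"{label}: mean={statistics.mean(values):.1f}, "
        f"median={statistics.median(values)}, "
        f"SD={statistics.stdev(values):.1f}, "
        f"range={max(values) - min(values)}"
    )
```

The median moves only from 13.5 to 13, while the mean, SD, and range all change sharply.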

6. Confidence Intervals for Population Means

Definition:

A confidence interval gives a range of values likely to contain the true population parameter

Formula (σ known or large n):

\[ \bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} \]

where:

• \( \bar{x} \) = sample mean

• \( z^* \) = critical z-value (1.96 for 95% confidence)

• \( \sigma \) = population standard deviation

• \( n \) = sample size

Common z* Values:

Confidence Level    z* Value
90%                 1.645
95%                 1.96
99%                 2.576
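The z* values and the interval formula can be reproduced with Python's standard library. This is a sketch under stated assumptions: the function names are hypothetical, and the inputs in the last lines (sample mean 50, σ = 10, n = 100) are made-up numbers used only to show the arithmetic.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z* for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def mean_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """z-interval for a population mean: x_bar +/- z* * sigma / sqrt(n)."""
    margin = z_critical(confidence) * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))  # 1.645, 1.96, 2.576

# Hypothetical inputs: sample mean 50, known sigma 10, n = 100.
low, high = mean_confidence_interval(50, 10, 100)
print(round(low, 2), round(high, 2))  # 48.04 51.96
```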

7. Confidence Intervals for Proportions

Formula:

\[ \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

where:

• \( \hat{p} \) = sample proportion

• \( z^* \) = critical z-value

• \( n \) = sample size

Example:

In a sample of 200 voters, 120 support a candidate. Find a 95% CI for the population proportion.

\( \hat{p} = \frac{120}{200} = 0.6 \)

Margin of error: \( 1.96\sqrt{\frac{0.6(0.4)}{200}} = 1.96\sqrt{0.0012} \approx 0.068 \)

95% CI: (0.532, 0.668) or 53.2% to 66.8%
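The same calculation as a short Python sketch. The function name is a hypothetical helper, and z* = 1.96 is taken from the table of common values.

```python
import math


def proportion_confidence_interval(successes, n, z_star=1.96):
    """Return (p_hat, margin of error, (low, high)) for a sample proportion."""
    p_hat = successes / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)


p_hat, margin, (low, high) = proportion_confidence_interval(120, 200)
print(round(p_hat, 3))                # 0.6
print(round(margin, 3))               # 0.068
print(round(low, 3), round(high, 3))  # 0.532 0.668
```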

8. Interpret Confidence Intervals

Correct Interpretation:

A 95% confidence interval means:

✓ Correct:

"We are 95% confident that the true population parameter lies within this interval"

"If we repeated this sampling process many times, about 95% of the resulting intervals would contain the true parameter"

✗ Incorrect:

"There is a 95% probability the parameter is in this interval" (parameter is fixed, not random)

"95% of the data falls in this interval" (CI is about parameter, not data)
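The repeated-sampling interpretation can be checked with a simulation. This is a sketch only: the population mean and SD, the sample size, and the number of trials are hypothetical choices, and the population is assumed to be normal with a known σ.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

MU, SIGMA = 100, 15   # hypothetical "true" population mean and SD
N, TRIALS = 40, 2000  # sample size and number of repeated samples
Z_STAR = 1.96         # critical value for 95% confidence

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = statistics.mean(sample)
    margin = Z_STAR * SIGMA / math.sqrt(N)
    # The parameter MU never changes; the interval moves from sample to sample.
    if x_bar - margin <= MU <= x_bar + margin:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.1%} of intervals contained the true mean")
```

The printed share should land near 95%, which is what the confidence level describes: the long-run success rate of the method, not a probability statement about one particular interval.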

9. Experiment Design

Key Components:

Control Group: Receives no treatment or standard treatment

Treatment Group: Receives the experimental treatment

Randomization: Randomly assign subjects to groups

Replication: Use enough subjects for reliable results

Blinding: Subjects don't know which treatment they receive

Double-Blind: Neither subjects nor researchers know assignments
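Randomization is straightforward to carry out in code. A minimal sketch, assuming two equal-sized groups; the function name and the subject IDs are hypothetical.

```python
import random


def randomly_assign(subjects, seed=None):
    """Shuffle the subjects, then split them into treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the original order is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


subjects = [f"S{i:02d}" for i in range(1, 21)]  # hypothetical subject IDs
groups = randomly_assign(subjects, seed=42)
print(groups["treatment"])
print(groups["control"])
```

Because chance alone decides each assignment, the two groups should be similar on average in every respect except the treatment.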

Types of Studies:

Observational Study:

Observe without intervention; can show correlation but NOT causation

Experiment:

Impose treatments on groups; CAN establish causation

10. Quick Reference Summary

Key Formulas:

Sample Variance: \( s^2 = \frac{\sum(x-\bar{x})^2}{n-1} \)

Sample Standard Deviation: \( s = \sqrt{s^2} \)

Outlier Fences: \( Q1 - 1.5(IQR) \) and \( Q3 + 1.5(IQR) \)

CI for Mean: \( \bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} \)

CI for Proportion: \( \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \)

📚 Study Tips

✓ Biased samples don't represent the population accurately

✓ Standard deviation is square root of variance (same units as data)

✓ Use IQR method (1.5×IQR) to identify outliers

✓ Removing an outlier typically decreases the standard deviation and shifts the mean away from the outlier (removing a high outlier lowers the mean)

✓ Confidence intervals estimate population parameters, not individual data points
