Single-Variable Statistics
Complete Notes & Formulae for Twelfth Grade (Precalculus)
1. Biased Samples
Definition:
A biased sample does not accurately represent the population it's meant to study
Types of Bias:
• Selection Bias: Sample not randomly chosen
• Convenience Sampling: Choosing the easiest-to-reach participants
• Voluntary Response Bias: Only those who feel strongly respond
• Undercoverage: Some groups excluded from sample
• Nonresponse Bias: Selected participants don't respond
Examples:
Biased: Surveying only students in the library about study habits
Unbiased: Random sample of all students in the school
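A small simulation can make the library example concrete. This is only a sketch: the population is synthetic (the study-hour numbers are made up for illustration), and it assumes library users study more on average than other students.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic population (invented for illustration): weekly study hours for
# 1000 students, 200 of whom are regular library users who study more.
library_users = [rng.gauss(15, 3) for _ in range(200)]
other_students = [rng.gauss(8, 3) for _ in range(800)]
population = library_users + other_students

# Convenience sample: only students found in the library.
convenience_sample = rng.sample(library_users, 50)
# Simple random sample: every student is equally likely to be chosen.
random_sample = rng.sample(population, 50)

population_mean = mean(population)
convenience_mean = mean(convenience_sample)
random_mean = mean(random_sample)

print(f"population mean:         {population_mean:.1f}")
print(f"convenience sample mean: {convenience_mean:.1f}")
print(f"random sample mean:      {random_mean:.1f}")
```

Under these assumptions the library-only sample overestimates the average study time, while the random sample lands close to the population mean.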
2. Variance
Definition:
Variance measures how spread out the data values are from the mean
Population Variance:
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]
Sample Variance:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n-1} \]
where:
• \( x \) = individual data value
• \( \mu \) = population mean; \( \bar{x} \) = sample mean
• \( N \) = population size; \( n \) = sample size
• Note: Sample variance uses \( n-1 \) (Bessel's correction)
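The two formulas differ only in the divisor, which a short sketch can show side by side. The function names and the data set below are illustrative choices, not part of the notes.

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 6]  # hypothetical data: mean 4, squared deviations 4, 0, 0, 4

print(population_variance(data))  # 8 / 4
print(sample_variance(data))      # 8 / 3
```

For the same data, the sample variance is always the larger of the two because it divides by the smaller number.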
3. Standard Deviation
Definition:
Standard deviation is the square root of variance (measures typical distance from mean)
Population Standard Deviation:
\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \]
Sample Standard Deviation:
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} \]
Example:
Data: 5, 7, 9, 11. Find variance and standard deviation.
Mean: \( \bar{x} = \frac{5+7+9+11}{4} = 8 \)
Deviations: -3, -1, 1, 3
Squared: 9, 1, 1, 9
Variance: \( s^2 = \frac{9+1+1+9}{3} = \frac{20}{3} \approx 6.67 \)
Standard Deviation: \( s = \sqrt{6.67} \approx 2.58 \)
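The worked example can be checked with Python's standard `statistics` module, where `variance` and `stdev` divide by n - 1 and `pvariance` and `pstdev` divide by N. This is a quick check, not a required method.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

data = [5, 7, 9, 11]

x_bar = mean(data)    # sample mean
s2 = variance(data)   # sample variance (divides by n - 1)
s = stdev(data)       # sample standard deviation

print(x_bar)          # 8
print(f"{s2:.2f}")    # 6.67
print(f"{s:.2f}")     # 2.58

# Treating the same four values as a whole population gives smaller results,
# because the divisor is N = 4 instead of n - 1 = 3.
print(pvariance(data))        # 5
print(f"{pstdev(data):.2f}")  # 2.24
```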
4. Identify Outliers
Definition:
An outlier is a data value that is significantly different from other values in the dataset
IQR Method (Most Common):
Steps:
1. Find Q1 (25th percentile) and Q3 (75th percentile)
2. Calculate IQR = Q3 - Q1
3. Find lower fence = Q1 - 1.5(IQR)
4. Find upper fence = Q3 + 1.5(IQR)
5. Values below lower fence or above upper fence are outliers
\[ \text{Lower Fence} = Q1 - 1.5(IQR) \]
\[ \text{Upper Fence} = Q3 + 1.5(IQR) \]
Example:
Data: 22, 24, 26, 28, 29, 31, 35, 37, 41, 53, 64
Q1 = 26, Q3 = 41, IQR = 15
Lower fence: 26 - 1.5(15) = 3.5
Upper fence: 41 + 1.5(15) = 63.5
Outlier: 64 (above upper fence)
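The steps above can be sketched in Python. Quartile conventions vary between textbooks and software; this sketch assumes Q1 and Q3 are the medians of the lower and upper halves, leaving out the overall median when n is odd, which matches the example. The function names are illustrative.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    When the number of values is odd, the overall median is excluded
    from both halves.
    """
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + (len(s) % 2):]
    return median(lower), median(upper)


def iqr_outliers(values):
    """Return (lower fence, upper fence, list of outliers)."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers


data = [22, 24, 26, 28, 29, 31, 35, 37, 41, 53, 64]
print(quartiles(data))     # (26, 41)
print(iqr_outliers(data))  # (3.5, 63.5, [64])
```

Software that uses a different quartile rule may report slightly different fences for the same data.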
5. Effect of Removing Outliers
General Effects:
• Mean: Most affected (moves toward the center of remaining data)
• Median: Less affected (resistant to outliers)
• Standard Deviation: Decreases (data becomes less spread out)
• Range: Decreases significantly
Example:
Data: 10, 12, 13, 14, 15, 50 (outlier)
With outlier:
Mean = 19, SD ≈ 15.3
Without outlier:
Mean = 12.8, SD ≈ 1.9
Removing the outlier decreased both the mean and the SD substantially, because the outlier was a high value
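The comparison can be reproduced with the `statistics` module. This sketch uses the sample standard deviation (n - 1 divisor); the `summarize` helper is an illustrative name.

```python
from statistics import mean, median, stdev

data = [10, 12, 13, 14, 15, 50]
trimmed = [x for x in data if x != 50]  # same data with the outlier removed


def summarize(values):
    """Collect the four statistics discussed above."""
    return {
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample SD
        "range": max(values) - min(values),
    }


with_outlier = summarize(data)
without_outlier = summarize(trimmed)

for name in ("mean", "median", "sd", "range"):
    print(f"{name:>6}: {with_outlier[name]:6.2f} -> {without_outlier[name]:6.2f}")
```

The median moves only from 13.5 to 13, while the mean, standard deviation, and range all change sharply.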
6. Confidence Intervals for Population Means
Definition:
A confidence interval gives a range of values likely to contain the true population parameter
Formula (σ known, or large n with s substituted for σ):
\[ \bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} \]
where:
• \( \bar{x} \) = sample mean
• \( z^* \) = critical z-value (1.96 for 95% confidence)
• \( \sigma \) = population standard deviation
• \( n \) = sample size
Common z* Values:
| Confidence Level | z* Value |
|---|---|
| 90% | 1.645 |
| 95% | 1.96 |
| 99% | 2.576 |
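The table values and the formula can be sketched in Python using `statistics.NormalDist`. The function names are illustrative, and the numbers in the usage example (x̄ = 50, σ = 10, n = 100) are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """z* such that the middle `confidence` share of the standard normal
    distribution lies between -z* and +z*."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def mean_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Interval x_bar ± z* · sigma / sqrt(n), assuming sigma is known."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))  # 1.645, 1.96, 2.576

# Hypothetical inputs: sample mean 50, known sigma 10, sample size 100.
low, high = mean_confidence_interval(x_bar=50, sigma=10, n=100)
print(round(low, 2), round(high, 2))  # 48.04 51.96
```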
7. Confidence Intervals for Proportions
Formula:
\[ \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
where:
• \( \hat{p} \) = sample proportion
• \( z^* \) = critical z-value
• \( n \) = sample size
Example:
In a sample of 200 voters, 120 support a candidate. Find 95% CI for population proportion.
\( \hat{p} = \frac{120}{200} = 0.6 \)
Margin of error: \( 1.96\sqrt{\frac{0.6(0.4)}{200}} = 1.96\sqrt{0.0012} \approx 0.068 \)
95% CI: (0.532, 0.668) or 53.2% to 66.8%
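The voter example can be checked with a few lines of Python. The function name is an illustrative choice, and z* defaults to the 95% value.

```python
from math import sqrt


def proportion_confidence_interval(successes, n, z_star=1.96):
    """Return (p_hat, margin of error, (low, high)) for a sample proportion."""
    p_hat = successes / n
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)


p_hat, margin, (low, high) = proportion_confidence_interval(120, 200)

print(p_hat)                          # 0.6
print(round(margin, 3))               # 0.068
print(round(low, 3), round(high, 3))  # 0.532 0.668
```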
8. Interpret Confidence Intervals
Correct Interpretation:
A 95% confidence interval means:
✓ Correct:
"We are 95% confident that the true population parameter lies within this interval"
"If we repeated this process many times, 95% of intervals would contain the true parameter"
✗ Incorrect:
"There is a 95% probability the parameter is in this interval" (parameter is fixed, not random)
"95% of the data falls in this interval" (CI is about parameter, not data)
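The "repeated many times" interpretation can be illustrated with a simulation. This is a sketch under stated assumptions: the population is normal, and μ = 100, σ = 15, and n = 30 are invented values. Each trial draws a fresh sample, builds a 95% interval, and checks whether it captures the true mean.

```python
import random
from math import sqrt


def coverage_rate(trials=2000, n=30, mu=100.0, sigma=15.0, z_star=1.96, seed=1):
    """Share of simulated 95% intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    margin = z_star * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials


rate = coverage_rate()
print(f"{rate:.1%} of the simulated intervals contain the true mean")
```

The printed share should be close to 95%. The true mean never changes in this simulation; the intervals are what vary from sample to sample.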
9. Experiment Design
Key Components:
• Control Group: Receives no treatment or standard treatment
• Treatment Group: Receives the experimental treatment
• Randomization: Randomly assign subjects to groups
• Replication: Use enough subjects for reliable results
• Blinding: Subjects don't know which treatment they receive
• Double-Blind: Neither subjects nor researchers know assignments
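Randomization can be as simple as shuffling the list of subjects and splitting it in half. The sketch below assumes two equal-sized groups; the subject IDs and the `assign_groups` name are hypothetical.

```python
import random


def assign_groups(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the original order is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


subjects = [f"S{i:02d}" for i in range(1, 21)]  # hypothetical IDs S01 to S20
groups = assign_groups(subjects, seed=42)

print("treatment:", sorted(groups["treatment"]))
print("control:  ", sorted(groups["control"]))
```

Because chance alone decides who gets the treatment, other differences between subjects tend to balance out across the two groups.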
Types of Studies:
Observational Study:
Observe without intervention; can show correlation but NOT causation
Experiment:
Impose treatments on groups; with random assignment, CAN establish causation
10. Quick Reference Summary
Key Formulas:
Sample Variance: \( s^2 = \frac{\sum(x-\bar{x})^2}{n-1} \)
Sample Standard Deviation: \( s = \sqrt{s^2} \)
Outlier Fences: \( Q1 - 1.5(IQR) \) and \( Q3 + 1.5(IQR) \)
CI for Mean: \( \bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} \)
CI for Proportion: \( \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \)
📚 Study Tips
✓ Biased samples don't represent the population accurately
✓ Standard deviation is square root of variance (same units as data)
✓ Use IQR method (1.5×IQR) to identify outliers
✓ Removing outliers typically decreases the standard deviation and shifts the mean away from the outlier (down for a high outlier, up for a low one)
✓ Confidence intervals estimate population parameters, not individual data points
