
Single-Variable Statistics

Complete Notes & Formulae for Eleventh Grade (Algebra 2)

1. Identify Biased Samples

What is Sampling Bias?

Sampling bias occurs when a sample does not accurately represent the population, causing some members to be overrepresented or underrepresented

Result:

• Skewed or invalid results

• Cannot generalize findings to the population

Types of Sampling Bias:

1. Self-Selection Bias (Voluntary Response Bias)

People with strong opinions volunteer to participate

Example: Online polls where only motivated people respond

2. Nonresponse Bias

People who refuse to participate differ systematically from those who do

Example: Busy people less likely to complete long surveys

3. Undercoverage Bias

Some groups in the population are inadequately represented

Example: Online surveys miss people without internet access

4. Convenience Sampling Bias

Sampling only easily accessible individuals

Example: Surveying only students in one classroom

5. Survivorship Bias

Only studying "survivors" or successful cases

Example: Only interviewing successful entrepreneurs

2. Variance and Standard Deviation

Variance (σ² or s²):

Measures the average squared deviation from the mean (spread of data)

Population Variance:

\[ \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \]

Sample Variance:

\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} \]

Note: Divide by \( n-1 \) for sample (Bessel's correction)

Standard Deviation (σ or s):

Square root of variance; measures typical distance from the mean

\[ \sigma = \sqrt{\sigma^2} \quad \text{or} \quad s = \sqrt{s^2} \]

Key Points:

• Larger SD = more spread out data

• Smaller SD = data clustered near mean

• SD has same units as original data
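As a quick check on these formulas, here is a minimal Python sketch (standard library only) that computes both the population and sample versions for a small illustrative dataset; the dataset and variable names are just for illustration.

```python
import math

data = [10, 12, 14, 15, 16, 18]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean
ss = sum((x - mean) ** 2 for x in data)

pop_variance = ss / n            # population variance: divide by N
sample_variance = ss / (n - 1)   # sample variance: divide by n - 1 (Bessel's correction)

pop_sd = math.sqrt(pop_variance)
sample_sd = math.sqrt(sample_variance)

print(f"mean = {mean:.2f}")
print(f"population variance = {pop_variance:.2f}, SD = {pop_sd:.2f}")
print(f"sample variance = {sample_variance:.2f}, SD = {sample_sd:.2f}")
```

Python's built-in statistics module gives the same results via statistics.pvariance, statistics.variance, statistics.pstdev, and statistics.stdev.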

3. Identify an Outlier

What is an Outlier?

A data value that is significantly different (much larger or smaller) from other values in the dataset

Method 1: Using Standard Deviation (3-Sigma Rule)

Rule:

Any data value more than 3 standard deviations from the mean is an outlier

\[ \text{Outlier if: } x < \mu - 3\sigma \text{ or } x > \mu + 3\sigma \]

Method 2: Using IQR (Interquartile Range)

Steps:

1. Find Q1 (25th percentile) and Q3 (75th percentile)

2. Calculate IQR = Q3 - Q1

3. Check for outliers:

\[ \text{Lower outliers: } x < Q_1 - 1.5 \times \text{IQR} \]

\[ \text{Upper outliers: } x > Q_3 + 1.5 \times \text{IQR} \]

Example:

Data: 10, 12, 14, 15, 16, 18, 50. Mean ≈ 19.3, sample SD ≈ 13.8

Upper bound: 19.3 + 3(13.8) = 60.7

Lower bound: 19.3 - 3(13.8) = -22.1

All values fall within bounds

No outliers using the 3-sigma rule, even though 50 looks unusual. The IQR method is more sensitive here: Q1 = 12, Q3 = 18, IQR = 6, so the upper fence is 18 + 1.5(6) = 27, and 50 is flagged as an outlier.
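A short Python sketch of both outlier checks on the example data above; the quartile convention here (medians of the lower and upper halves, excluding the overall median) matches the hand calculation, though other software may use a slightly different quartile rule.

```python
import statistics

data = sorted([10, 12, 14, 15, 16, 18, 50])
mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation

# Method 1: 3-sigma rule
sigma_outliers = [x for x in data if abs(x - mean) > 3 * sd]

# Method 2: 1.5 * IQR rule (quartiles from the halves, excluding the median)
half = len(data) // 2
q1 = statistics.median(data[:half])
q3 = statistics.median(data[-half:])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = [x for x in data if x < low or x > high]

print(f"3-sigma outliers: {sigma_outliers}")                     # [] -- none flagged
print(f"IQR fences: ({low}, {high}), outliers: {iqr_outliers}")  # 50 is flagged
```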

4. Effect of Removing Outliers

Impact on Statistics:

Removing outliers typically affects:

Mean:

• Most affected by outliers

• Will move toward center of remaining data

Median:

• Resistant to outliers (less affected)

• May change slightly or not at all

Standard Deviation:

• Usually decreases (data less spread out)

• Indicates more consistent data

Range:

• Always decreases

Example:

Data: 10, 12, 14, 15, 16, 18, 100

• Mean: 26.4 (with outlier) → 14.2 (without outlier)

• Median: 15 → 14.5

• SD: 32.5 → 2.9

Effect: Mean decreased significantly, SD decreased dramatically, median barely changed
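A minimal Python sketch reproducing this comparison with the standard library's statistics module (sample standard deviation is used, matching the values above).

```python
import statistics

def summarize(values):
    """Return (mean, median, sample SD) for a list of numbers."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.stdev(values))

with_outlier = [10, 12, 14, 15, 16, 18, 100]
without_outlier = [x for x in with_outlier if x != 100]  # drop the outlier

for label, values in [("with outlier", with_outlier),
                      ("without outlier", without_outlier)]:
    mean, median, sd = summarize(values)
    print(f"{label}: mean = {mean:.1f}, median = {median}, SD = {sd:.1f}")
```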

5. Find Confidence Intervals for Population Means

Confidence Interval:

A range of values that likely contains the true population parameter with a specified level of confidence

\[ \text{CI} = \bar{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}} \]

where:

• \( \bar{x} \) = sample mean

• \( z^* \) = critical value (z-score for confidence level)

• \( \sigma \) = population standard deviation

• \( n \) = sample size

• \( \frac{\sigma}{\sqrt{n}} \) = standard error

Common Critical Values:

• 90% → z* = 1.645

• 95% → z* = 1.96

• 99% → z* = 2.576

Example:

Sample: n = 100, \( \bar{x} = 75 \), σ = 10. Find 95% CI.

Standard error: \( \frac{10}{\sqrt{100}} = 1 \)

Margin of error: \( 1.96 \times 1 = 1.96 \)

CI: \( 75 \pm 1.96 = (73.04, 76.96) \)

95% CI: (73.04, 76.96)
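A minimal sketch of the same calculation in Python; the critical value 1.96 is hard-coded here, but it could also be read from a normal-distribution table or a statistics library.

```python
import math

n = 100          # sample size
x_bar = 75       # sample mean
sigma = 10       # population standard deviation (assumed known)
z_star = 1.96    # critical value for 95% confidence

standard_error = sigma / math.sqrt(n)          # 10 / 10 = 1
margin_of_error = z_star * standard_error      # 1.96 * 1 = 1.96

lower = x_bar - margin_of_error
upper = x_bar + margin_of_error
print(f"95% CI: ({lower:.2f}, {upper:.2f})")   # (73.04, 76.96)
```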

6. Find Confidence Intervals for Population Proportions

Formula:

\[ \text{CI} = \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

where:

• \( \hat{p} = \frac{x}{n} \) = sample proportion

• \( x \) = number of successes

• \( n \) = sample size

• \( z^* \) = critical value

Conditions:

• Random sample

• \( n\hat{p} \geq 10 \) and \( n(1-\hat{p}) \geq 10 \)

• Sample size < 10% of population

Example:

Survey: 200 people, 120 support a policy. Find 95% CI for proportion.

\( \hat{p} = \frac{120}{200} = 0.6 \)

Standard error: \( \sqrt{\frac{0.6(0.4)}{200}} = \sqrt{\frac{0.24}{200}} = 0.0346 \)

Margin of error: \( 1.96 \times 0.0346 = 0.0678 \)

CI: \( 0.6 \pm 0.0678 = (0.532, 0.668) \)

95% CI: (53.2%, 66.8%)
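The same computation as a short Python sketch; the printed interval matches the hand calculation up to rounding.

```python
import math

x = 120          # number of successes (people who support the policy)
n = 200          # sample size
z_star = 1.96    # critical value for 95% confidence

p_hat = x / n                                          # 0.6
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)    # ~0.0346
margin_of_error = z_star * standard_error              # ~0.068

lower, upper = p_hat - margin_of_error, p_hat + margin_of_error
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # approximately (0.532, 0.668)
```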

7. Interpret Confidence Intervals for Population Means

Correct Interpretation:

For a 95% confidence interval (a, b):

✓ CORRECT:

"We are 95% confident that the true population mean lies between a and b"

"If we repeated this process many times, about 95% of intervals would contain the true mean"

✗ INCORRECT:

"There is a 95% probability the true mean is between a and b" (probability is wrong - the interval either contains μ or it doesn't)

"95% of the data falls in this interval" (CI is about parameter, not data)

Confidence Level Meaning:

• Higher confidence level → Wider interval (more certainty, less precision)

• Lower confidence level → Narrower interval (less certainty, more precision)

• Larger sample size → Narrower interval (more precision)
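The "repeated sampling" interpretation above can be demonstrated with a simulation: draw many samples from a known population, build a 95% interval from each, and count how often the intervals capture the true mean. This is a minimal sketch assuming a normally distributed population with known σ; the parameter values and seed are made up for illustration.

```python
import math
import random

random.seed(1)
mu, sigma = 50, 10      # "true" population parameters (illustrative values)
n = 40                  # sample size per trial
z_star = 1.96           # 95% critical value
trials = 10_000

hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    margin = z_star * sigma / math.sqrt(n)
    if x_bar - margin <= mu <= x_bar + margin:
        hits += 1   # this interval captured the true mean

print(f"Coverage: {hits / trials:.3f}")  # typically close to 0.95
```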

8. Experiment Design

Key Components:

1. Control Group

Receives no treatment or standard treatment (baseline for comparison)

2. Treatment Group

Receives the experimental treatment

3. Random Assignment

Randomly assign subjects to groups to eliminate bias (see the sketch after this list)

4. Replication

Use enough subjects to detect effects (larger sample = more reliable)

5. Blinding

Single-blind: Subjects don't know which group they're in

Double-blind: Neither subjects nor researchers know (reduces bias)
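A minimal Python sketch of random assignment (component 3 above): shuffle the subject list, then split it into two equal groups. The subject names and seed are placeholders for illustration.

```python
import random

random.seed(7)  # fixed seed so the example is reproducible
subjects = [f"subject_{i}" for i in range(1, 21)]  # 20 hypothetical subjects

random.shuffle(subjects)                 # randomize the order
half = len(subjects) // 2
treatment_group = subjects[:half]        # first half receives the treatment
control_group = subjects[half:]          # second half serves as the control

print("Treatment:", treatment_group)
print("Control:  ", control_group)
```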

Types of Studies:

Observational Study:

Observe subjects without intervention; can show association but NOT causation

Experiment:

Researcher imposes treatment; CAN establish causation with proper design

9. Analyze Results Using Simulations

Purpose of Simulation:

Simulations help determine if observed results could have occurred by chance

Steps:

1. State null hypothesis (no effect/difference)

2. Run many simulations assuming null hypothesis is true

3. Compare observed result to simulation distribution

4. Calculate the p-value (proportion of simulations at least as extreme as the observed result)

5. Make conclusion

Interpretation:

P-value:

Probability of getting results at least as extreme as those observed, assuming no real effect

• Small p-value (< 0.05): Results unlikely due to chance → Statistically significant

• Large p-value (≥ 0.05): Results could easily occur by chance → Not significant

Example:

A coin is flipped 100 times, getting 60 heads. Is the coin fair?

Simulate 1000 trials of flipping a fair coin 100 times

Count how many simulations give ≥60 heads

If only 30 out of 1000 simulations have ≥60 heads:

P-value = 30/1000 = 0.03

Conclusion: Result is statistically significant (p < 0.05); coin may be biased
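A minimal Python sketch of this simulation; the simulated p-value will vary slightly from run to run, but for a fair coin the proportion of trials reaching 60 or more heads should land near 0.03.

```python
import random

random.seed(42)
flips_per_trial = 100
trials = 1000
observed_heads = 60

# Simulate many trials of flipping a fair coin (the null hypothesis)
extreme = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(flips_per_trial))
    if heads >= observed_heads:
        extreme += 1   # count trials at least as extreme as the observed result

p_value = extreme / trials
print(f"Simulated p-value: {p_value:.3f}")  # around 0.03 for a fair coin
```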

10. Quick Reference Summary

Key Formulas:

Sample Variance: \( s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1} \)

Standard Deviation: \( s = \sqrt{s^2} \)

Outlier (3-sigma): \( x < \mu - 3\sigma \) or \( x > \mu + 3\sigma \)

CI for Mean: \( \bar{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}} \)

CI for Proportion: \( \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \)

📚 Study Tips

✓ Always check for biased sampling methods in survey design

✓ Outliers strongly affect the mean and standard deviation but have little effect on the median

✓ Higher confidence level = wider confidence interval

✓ Random assignment in experiments helps establish causation

✓ P-value < 0.05 typically indicates statistical significance
