Single-Variable Statistics - Ninth Grade Math

Introduction to Statistics

Statistics: The science of collecting, organizing, analyzing, and interpreting data
Single-Variable Data: Data involving one characteristic or measurement
Population: The entire group being studied
Sample: A subset of the population used to make inferences
Parameter: A numerical description of a population
Statistic: A numerical description of a sample

1. Identify Biased Samples

Biased Sample: A sample that does not fairly represent the population
Random Sample: Each member of the population has an equal chance of being selected
Representative Sample: Reflects the characteristics of the population
Sampling Bias: Systematic error in how a sample is collected

Types of Biased Samples

Common Types of Sampling Bias:

1. Convenience Sampling:
• Choosing samples that are easy to reach
• Example: Surveying only your friends about school lunch
• Problem: Not representative of all students

2. Voluntary Response Bias:
• Only people who choose to respond are included
• Example: Online polls where people opt in
• Problem: People with strong opinions more likely to respond

3. Undercoverage:
• Some groups in population are excluded
• Example: Phone survey excludes people without phones
• Problem: Missing perspectives from excluded groups

4. Nonresponse Bias:
• Selected individuals don't participate
• Example: Mail survey with low response rate
• Problem: Responders may differ from non-responders

5. Question Wording Bias:
• Questions are leading or confusing
• Example: "Don't you agree that...?"
• Problem: Influences responses

Example 1: Identify if sample is biased

Scenario: A principal wants to know students' opinions on school uniforms. She surveys students in the chess club.

Analysis:
• This is convenience sampling
• Chess club members may not represent all students
• Different clubs/groups may have different opinions

Conclusion: This is a BIASED sample
Better method: Randomly select students from all grades and activities

Example 2: Identify bias

Scenario: A store wants to know customer satisfaction. They ask every 10th customer who makes a purchase.

Analysis:
• This is systematic sampling of customers who made a purchase
• Potential bias: shoppers who left without buying (and who may be dissatisfied) are never asked
• This is a form of undercoverage

Conclusion: BIASED if the store wants the views of everyone who visits, because non-purchasers are excluded

Example 3: Unbiased sample

Scenario: A researcher assigns a number to each student in the school and uses a random number generator to select 50 students for a survey.

Analysis:
• Each student has equal chance of selection
• Random selection process
• No systematic exclusion

Conclusion: This is an UNBIASED sample
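
The selection procedure in Example 3 can be sketched in Python. This is a minimal illustration, not the researcher's actual tool: the enrollment of 600 students and the function name `draw_simple_random_sample` are assumptions, since the scenario only fixes the sample size of 50.

```python
import random

def draw_simple_random_sample(population_size, sample_size, seed=None):
    """Pick sample_size distinct student numbers from 1..population_size."""
    rng = random.Random(seed)  # an optional seed makes the draw repeatable
    student_numbers = range(1, population_size + 1)
    # sample() draws without replacement, so no student is picked twice
    # and every student is equally likely to be chosen
    return rng.sample(student_numbers, sample_size)

# Hypothetical enrollment of 600 students (not given in the scenario)
selected = draw_simple_random_sample(population_size=600, sample_size=50, seed=1)
print(sorted(selected))
```

Because the draw depends only on the random number generator, no club, grade, or friend group is favored.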

2. Mean, Median, Mode, and Range

Measures of Center: Values that describe the center or typical value of data
Measures of Spread: Values that describe how spread out the data is
Central Tendency: The tendency of data to cluster around a central value

Mean (Average)

Mean Formula:

$$\text{Mean} = \bar{x} = \frac{\sum x}{n}$$

where:
• $\sum x$ = sum of all data values
• $n$ = number of data values
• $\bar{x}$ (x-bar) = mean

In words: Add all values, divide by how many values

Example 1: Find the mean of 5, 8, 12, 15, 20

$$\text{Mean} = \frac{5 + 8 + 12 + 15 + 20}{5} = \frac{60}{5} = 12$$

Answer: Mean = 12
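
As a quick check, the mean formula can be written as a short Python function. The function name `mean` is chosen here for illustration, and the sketch assumes a non-empty list of numbers.

```python
def mean(values):
    """Add all values, then divide by how many values there are."""
    return sum(values) / len(values)

print(mean([5, 8, 12, 15, 20]))  # 12.0
```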

Median (Middle Value)

Median Steps:

Step 1: Order data from least to greatest
Step 2: Find middle value

If odd number of values:
Median is the middle value

If even number of values:
$$\text{Median} = \frac{\text{Sum of the two middle values}}{2}$$

Example 2: Find median of 3, 7, 9, 15, 20

Already ordered: 3, 7, 9, 15, 20
Middle value: 9

Answer: Median = 9

Example 3: Find median of 4, 8, 10, 12, 16, 20

Already ordered: 4, 8, 10, 12, 16, 20
Two middle values: 10 and 12
$$\text{Median} = \frac{10 + 12}{2} = \frac{22}{2} = 11$$

Answer: Median = 11
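
The two cases (odd and even number of values) can be sketched in Python as follows. The function name `median` is illustrative, and the sketch assumes a non-empty list.

```python
def median(values):
    ordered = sorted(values)      # Step 1: order from least to greatest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                # odd count: a single middle value
        return ordered[middle]
    # even count: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([3, 7, 9, 15, 20]))       # 9
print(median([4, 8, 10, 12, 16, 20]))  # 11.0
```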

Mode (Most Frequent)

Mode Definition:

The value that appears most frequently in the dataset

Special Cases:
No mode: No value repeats (every value appears once)
Bimodal: Two values appear most frequently
Multimodal: More than two values tied for most frequent

Example 4: Find mode of 2, 3, 3, 5, 7, 7, 7, 9

Frequency:
2: once, 3: twice, 5: once, 7: three times, 9: once

Answer: Mode = 7

Example 5: Find mode of 1, 2, 3, 4, 5

All values appear once

Answer: No mode
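
A frequency count like the one in Example 4 can be sketched with Python's `collections.Counter`. The function name `modes` is illustrative; it returns a list so that bimodal and multimodal data are handled, and it follows the convention above that data in which every value appears once has no mode.

```python
from collections import Counter

def modes(values):
    """Return the most frequent values; an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:              # every value appears once: no mode
        return []
    return [value for value, count in counts.items() if count == highest]

print(modes([2, 3, 3, 5, 7, 7, 7, 9]))  # [7]
print(modes([1, 2, 3, 4, 5]))           # []
```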

Range (Spread)

Range Formula:

$$\text{Range} = \text{Maximum} - \text{Minimum}$$

Interpretation: Shows how spread out the data is

Example 6: Find range of 12, 18, 25, 30, 42

$$\text{Range} = 42 - 12 = 30$$

Answer: Range = 30
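
In Python the range formula is a one-line function. The name `data_range` is chosen here to avoid clashing with Python's built-in `range`.

```python
def data_range(values):
    """Maximum minus minimum."""
    return max(values) - min(values)

print(data_range([12, 18, 25, 30, 42]))  # 30
```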

3. Calculate Quartiles and Interquartile Range

Quartiles: Values that divide ordered data into four equal parts
Q1 (First Quartile): 25th percentile - median of lower half
Q2 (Second Quartile): 50th percentile - median of entire dataset
Q3 (Third Quartile): 75th percentile - median of upper half
IQR: Interquartile Range - range of middle 50% of data

Quartile Formulas:

Step 1: Order data from least to greatest
Step 2: Find median (Q2)
Step 3: Find median of lower half (Q1)
Step 4: Find median of upper half (Q3)

Interquartile Range:
$$\text{IQR} = Q3 - Q1$$

Five-Number Summary:
Minimum, Q1, Median (Q2), Q3, Maximum

Example 1: Find quartiles for: 2, 5, 7, 9, 11, 13, 15, 18, 20

Step 1: Already ordered
n = 9 values

Step 2: Find Q2 (median)
2, 5, 7, 9, 11, 13, 15, 18, 20
Q2 = 11

Step 3: Find Q1 (median of lower half)
Lower half (values below the median): 2, 5, 7, 9
$$Q1 = \frac{5 + 7}{2} = 6$$

Step 4: Find Q3 (median of upper half)
Upper half (values above the median): 13, 15, 18, 20
$$Q3 = \frac{15 + 18}{2} = 16.5$$

Step 5: Calculate IQR
$$\text{IQR} = 16.5 - 6 = 10.5$$

Answer: Q1 = 6, Q2 = 11, Q3 = 16.5, IQR = 10.5

Example 2: Find five-number summary for: 3, 6, 8, 10, 12, 15, 18, 22

Minimum: 3
Q1: Median of (3, 6, 8, 10) = $\frac{6+8}{2} = 7$
Q2 (Median): $\frac{10+12}{2} = 11$
Q3: Median of (12, 15, 18, 22) = $\frac{15+18}{2} = 16.5$
Maximum: 22

IQR: $16.5 - 7 = 9.5$

Answer: Min = 3, Q1 = 7, Med = 11, Q3 = 16.5, Max = 22, IQR = 9.5
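
The median-of-halves procedure can be sketched in Python as below. The function name `five_number_summary` is illustrative, and the sketch assumes at least two values. Note that spreadsheet and library quartile functions may use other conventions and can return slightly different Q1 and Q3 values.

```python
from statistics import median

def five_number_summary(values):
    ordered = sorted(values)          # Step 1: order the data
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]       # values below the median position
    upper_half = ordered[n - half:]   # values above the median position
    # when n is odd, the middle value belongs to neither half
    q1 = median(lower_half)
    q2 = median(ordered)
    q3 = median(upper_half)
    return ordered[0], q1, q2, q3, ordered[-1]

minimum, q1, q2, q3, maximum = five_number_summary([3, 6, 8, 10, 12, 15, 18, 22])
print(minimum, q1, q2, q3, maximum)  # 3 7.0 11.0 16.5 22
print("IQR =", q3 - q1)              # IQR = 9.5
```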

4-5. Identify Outliers and Their Effects

Outlier: A data value significantly different from other values
Effect: Can greatly affect the mean, but has little or no effect on the median
Why identify: May indicate errors, special cases, or important information

Method 1: Using IQR (Most Common)

IQR Method for Outliers:

Step 1: Calculate Q1, Q3, and IQR

Step 2: Calculate boundaries
$$\text{Lower Boundary} = Q1 - 1.5 \times \text{IQR}$$
$$\text{Upper Boundary} = Q3 + 1.5 \times \text{IQR}$$

Step 3: Any value outside boundaries is an outlier
• Value < Lower Boundary → Low outlier
• Value > Upper Boundary → High outlier

Example 1: Identify outliers in: 5, 8, 10, 12, 15, 18, 20, 45

Find Q1 and Q3:
Q1 = 9 (median of 5, 8, 10, 12)
Q3 = 19 (median of 15, 18, 20, 45)

Calculate IQR:
$\text{IQR} = 19 - 9 = 10$

Calculate boundaries:
Lower: $9 - 1.5(10) = 9 - 15 = -6$
Upper: $19 + 1.5(10) = 19 + 15 = 34$

Check data:
All values except 45 are between -6 and 34
45 > 34

Answer: 45 is an outlier
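
The three steps of the IQR method can be sketched in Python. The function name `iqr_outliers` is illustrative; quartiles are computed as medians of the lower and upper halves, matching the method used in these notes, and at least two values are assumed.

```python
from statistics import median

def iqr_outliers(values):
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    # Step 1: Q1, Q3, and IQR (medians of the lower and upper halves)
    q1 = median(ordered[:half])
    q3 = median(ordered[n - half:])
    iqr = q3 - q1
    # Step 2: boundaries
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    # Step 3: anything outside the boundaries is an outlier
    outliers = [x for x in ordered if x < lower or x > upper]
    return lower, upper, outliers

print(iqr_outliers([5, 8, 10, 12, 15, 18, 20, 45]))  # (-6.0, 34.0, [45])
```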

Effects of Removing Outliers

How Outliers Affect Statistics:

Mean: GREATLY affected
• High outlier increases mean
• Low outlier decreases mean

Median: SLIGHTLY or NOT affected
• Position of middle value usually stays similar

Mode: Usually NOT affected
• Outliers typically appear only once

Range: GREATLY affected
• Outliers are often min or max values

Standard Deviation: GREATLY affected
• Measures spread from mean

Example 2: Describe effect of removing outlier

Original data: 10, 12, 13, 14, 15, 15, 16, 50

With outlier (50):
Mean: $\frac{10+12+13+14+15+15+16+50}{8} = \frac{145}{8} = 18.125$
Median: $\frac{14+15}{2} = 14.5$
Range: $50 - 10 = 40$

Without outlier:
Mean: $\frac{10+12+13+14+15+15+16}{7} = \frac{95}{7} \approx 13.57$
Median: $14$ (middle value)
Range: $16 - 10 = 6$

Analysis:
• Mean decreased from 18.125 to 13.57 (significant change)
• Median changed slightly from 14.5 to 14
• Range decreased dramatically from 40 to 6

Conclusion: Removing the outlier brings the mean (13.57) close to the median (14), so the mean now reflects a typical value
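
The comparison in this example can be reproduced with Python's standard `statistics` module. The helper name `summarize` is illustrative, and the mean is rounded to three decimal places for display.

```python
from statistics import mean, median

def summarize(values):
    return {
        "mean": round(mean(values), 3),
        "median": median(values),
        "range": max(values) - min(values),
    }

data = [10, 12, 13, 14, 15, 15, 16, 50]
without_outlier = [x for x in data if x != 50]  # drop the outlier 50

print(summarize(data))             # {'mean': 18.125, 'median': 14.5, 'range': 40}
print(summarize(without_outlier))  # {'mean': 13.571, 'median': 14, 'range': 6}
```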

6. Variance and Standard Deviation

Variance: Average of squared deviations from the mean
Standard Deviation: Square root of variance - measures typical distance from mean
Symbol for variance: $\sigma^2$ (population) or $s^2$ (sample)
Symbol for standard deviation: $\sigma$ (population) or $s$ (sample)

Population vs Sample

Key Difference:

Population: Entire group
• Divide by $n$
• Use $\sigma$ (sigma)

Sample: Part of group
• Divide by $n - 1$ (Bessel's correction)
• Use $s$

In this course, we typically use population formulas
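
To see the difference between the two divisors, Python's standard `statistics` module provides both versions: `pvariance` divides by $n$ and `variance` divides by $n - 1$. The data values below are only illustrative.

```python
from statistics import pvariance, variance

data = [2, 4, 6, 8, 10]  # illustrative values; sum of squared deviations is 40

population_result = pvariance(data)  # divides by n:     40 / 5 = 8
sample_result = variance(data)       # divides by n - 1: 40 / 4 = 10

print(population_result, sample_result)
```

The sample version is always a little larger, and the gap shrinks as $n$ grows.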

Variance

Population Variance Formula:

$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$$

where:
• $x$ = each data value
• $\bar{x}$ = mean
• $n$ = number of values
• $(x - \bar{x})$ = deviation from mean

Steps:
1. Find the mean
2. Find each deviation: $(x - \bar{x})$
3. Square each deviation: $(x - \bar{x})^2$
4. Find average of squared deviations

Example 1: Find variance of 2, 4, 6, 8, 10

Step 1: Find mean
$\bar{x} = \frac{2+4+6+8+10}{5} = \frac{30}{5} = 6$

Steps 2-3: Find deviations and square them

| $x$ | $(x - \bar{x})$ | $(x - \bar{x})^2$ |
|---|---|---|
| 2 | 2 - 6 = -4 | 16 |
| 4 | 4 - 6 = -2 | 4 |
| 6 | 6 - 6 = 0 | 0 |
| 8 | 8 - 6 = 2 | 4 |
| 10 | 10 - 6 = 4 | 16 |

Step 4: Calculate variance
$$\sigma^2 = \frac{16+4+0+4+16}{5} = \frac{40}{5} = 8$$

Answer: Variance = 8
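
The four steps map directly onto a few lines of Python. This is a sketch of the population formula (dividing by $n$); the function name `population_variance` is illustrative.

```python
def population_variance(values):
    n = len(values)
    mean = sum(values) / n                     # Step 1: find the mean
    deviations = [x - mean for x in values]    # Step 2: deviation of each value
    squared = [d ** 2 for d in deviations]     # Step 3: square each deviation
    return sum(squared) / n                    # Step 4: average the squares

print(population_variance([2, 4, 6, 8, 10]))  # 8.0
```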

Standard Deviation

Standard Deviation Formula:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

In words: Square root of variance

Interpretation:
• Small standard deviation: data clustered near mean
• Large standard deviation: data spread out from mean
• Units are same as original data (unlike variance)

Example 2: Find standard deviation using variance from Example 1

Variance: $\sigma^2 = 8$

Standard deviation:
$$\sigma = \sqrt{8} = 2\sqrt{2} \approx 2.83$$

Answer: Standard deviation ≈ 2.83

Interpretation: Values typically vary about 2.83 units from mean of 6
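
A short Python sketch of the population standard deviation follows, taking the square root of the population variance. The function name `population_std_dev` is illustrative.

```python
import math

def population_std_dev(values):
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n  # population variance
    return math.sqrt(variance)                           # square root of variance

print(round(population_std_dev([2, 4, 6, 8, 10]), 2))  # 2.83
```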

Using Standard Deviation to Find Outliers

Standard Deviation Method:

A common rule treats any value more than 3 standard deviations from the mean as an outlier

$$\text{Lower Boundary} = \bar{x} - 3\sigma$$
$$\text{Upper Boundary} = \bar{x} + 3\sigma$$

Values outside this range are outliers

Example 3: A dataset has mean = 50 and standard deviation = 5. Is 72 an outlier?

Calculate boundaries:
Lower: $50 - 3(5) = 50 - 15 = 35$
Upper: $50 + 3(5) = 50 + 15 = 65$

Check 72:
72 > 65 (upper boundary)

Answer: Yes, 72 is an outlier
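
The boundary check can be sketched as a small Python function. The name `is_outlier` and the parameter `k` (the number of standard deviations, 3 by default) are illustrative.

```python
def is_outlier(value, mean, std_dev, k=3):
    """True if value lies more than k standard deviations from the mean."""
    lower = mean - k * std_dev
    upper = mean + k * std_dev
    return value < lower or value > upper

print(is_outlier(72, mean=50, std_dev=5))  # True  (72 > 65)
print(is_outlier(60, mean=50, std_dev=5))  # False (35 <= 60 <= 65)
```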

7. Choose Appropriate Measures of Center and Variation

Choosing Wisely: Different situations call for different measures
Key Question: Are there outliers or is data skewed?

Decision Guide:

Use MEAN and STANDARD DEVIATION when:
• Data is roughly symmetric with no outliers
• Normal distribution (bell-shaped)
• Want to use all data values
• Doing further calculations

Use MEDIAN and IQR when:
• Data has outliers
• Data is skewed (not symmetric)
• Want measure resistant to extreme values
• Dealing with ordinal data (rankings)

Use MODE when:
• Data is categorical
• Want most common value
• Multiple values tied for highest frequency

Example 1: Choose appropriate measures

Scenario: Home prices in a neighborhood: $150K, $160K, $170K, $180K, $190K, $2M

Analysis:
• $2M is an outlier (much higher than others)
• Mean would be heavily influenced by $2M
• Median better represents typical home

Mean: $\frac{2,850,000}{6} = \$475,000$ (misleading!)
Median: $\frac{170,000 + 180,000}{2} = \$175,000$ (more typical)

Best choice: Median and IQR
Reason: Outlier present, better represents typical home
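
The home-price comparison can be checked with Python's standard `statistics` module. Prices are written out in dollars; the variable names are illustrative.

```python
from statistics import mean, median

prices = [150_000, 160_000, 170_000, 180_000, 190_000, 2_000_000]

average_price = mean(prices)    # 475,000: pulled upward by the $2M home
typical_price = median(prices)  # 175,000: midway between the two middle prices

print(average_price, typical_price)
```

Five of the six homes cost less than half of the mean, which is why the median is the better summary here.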

Example 2: Choose measures

Scenario: Test scores: 72, 75, 78, 80, 82, 85, 88, 90

Analysis:
• No outliers
• Fairly symmetric distribution
• All values close together

Best choice: Mean and Standard Deviation
Reason: Symmetric data, no outliers, uses all information

Example 3: Favorite colors survey

Data: Red (5), Blue (12), Green (3), Yellow (2)

Analysis:
• Categorical data (not numerical)
• Can't calculate mean or median

Best choice: Mode
Answer: Blue is most popular (mode)
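
For categorical data that is already tallied, the mode is simply the category with the largest count. A minimal Python sketch, with the survey counts stored in a dictionary named `votes` (an illustrative name):

```python
votes = {"Red": 5, "Blue": 12, "Green": 3, "Yellow": 2}

# pick the category whose count is largest
most_popular = max(votes, key=votes.get)

print(most_popular)  # Blue
```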

Measures of Center Comparison

| Measure | Formula/Method | Best Used When | Affected by Outliers? |
|---|---|---|---|
| Mean | $\bar{x} = \frac{\sum x}{n}$ | Symmetric data, no outliers | YES - heavily affected |
| Median | Middle value when ordered | Skewed data, outliers present | NO - resistant to outliers |
| Mode | Most frequent value | Categorical data | NO - not affected |

Measures of Spread Comparison

| Measure | Formula | What It Shows | Affected by Outliers? |
|---|---|---|---|
| Range | Max - Min | Total spread | YES - very sensitive |
| IQR | Q3 - Q1 | Spread of middle 50% | NO - resistant |
| Variance | $\sigma^2 = \frac{\sum (x-\bar{x})^2}{n}$ | Average squared deviation | YES - very sensitive |
| Standard Deviation | $\sigma = \sqrt{\sigma^2}$ | Typical distance from mean | YES - very sensitive |

Outlier Detection Methods

| Method | Formula | When to Use |
|---|---|---|
| IQR Method (Most Common) | Lower: $Q1 - 1.5 \times IQR$; Upper: $Q3 + 1.5 \times IQR$ | General purpose, box plots |
| Standard Deviation Method | Lower: $\bar{x} - 3\sigma$; Upper: $\bar{x} + 3\sigma$ | Normal distributions |

Types of Sampling Bias

| Type | Description | Example | Problem |
|---|---|---|---|
| Convenience | Easy to reach samples | Survey friends only | Not representative |
| Voluntary Response | Self-selected participants | Online poll | Strong opinions overrepresented |
| Undercoverage | Excludes part of population | Phone survey only | Missing perspectives |
| Nonresponse | Selected don't respond | Low response rate | Responders may differ |

Success Tips for Single-Variable Statistics:
✓ Mean uses all values; median uses position
✓ Always order data before finding median or quartiles
✓ IQR measures spread of middle 50% - resistant to outliers
✓ Use IQR method (1.5 × IQR) to identify outliers
✓ Outliers greatly affect mean, range, and standard deviation
✓ Outliers barely affect median and IQR
✓ Variance is in squared units; standard deviation is in original units
✓ Choose median & IQR when outliers present
✓ Choose mean & standard deviation for symmetric data
✓ Random sampling guards against selection bias because every member has an equal chance of being chosen; it does not fix nonresponse or poorly worded questions