One-Variable Statistics - Grade 8
1. Mean, Median, Mode, and Range
These are measures of central tendency and spread that help us understand and describe data.
Mean (Average):
\( \text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}} = \frac{\sum x}{n} \)
Add all values and divide by how many there are
Median (Middle Value):
- Step 1: Arrange data in order from least to greatest
- Step 2: Find the middle value:
- Odd number of values: Middle value is the median
- Even number of values: Average the two middle values
Mode (Most Frequent):
The value that appears most often in the data set
- Can have one mode (unimodal)
- Can have multiple modes (bimodal, multimodal)
- Can have no mode if every value appears the same number of times
Range (Spread):
\( \text{Range} = \text{Maximum value} - \text{Minimum value} \)
Example:
Data set: 12, 15, 18, 15, 22, 30, 15
Mean: \( \frac{12 + 15 + 18 + 15 + 22 + 30 + 15}{7} = \frac{127}{7} \approx 18.14 \)
Median: Order: 12, 15, 15, 15, 18, 22, 30 → Middle value = 15
Mode: 15 (appears 3 times)
Range: 30 - 12 = 18
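Each of the four definitions above maps directly to a few lines of code. The Python sketch below is illustrative only. The helper names `mean`, `median`, `modes`, and `value_range` are hypothetical. `modes` follows this page's convention that a data set in which every value appears equally often has no mode, and it assumes that a data set with only one distinct value still has that value as its mode.

```python
from collections import Counter

def mean(values):
    # Sum of all values divided by the number of values
    return sum(values) / len(values)

def median(values):
    # Step 1: order the data; Step 2: take the middle value(s)
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

def modes(values):
    # Count how often each value appears
    counts = Counter(values)
    top = max(counts.values())
    # No mode when every value appears equally often
    # (assumption: a single distinct value is still its own mode)
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

def value_range(values):
    # Maximum value minus minimum value
    return max(values) - min(values)

data = [12, 15, 18, 15, 22, 30, 15]
print(round(mean(data), 2))  # 18.14
print(median(data))          # 15
print(modes(data))           # [15]
print(value_range(data))     # 18
```

The name `value_range` avoids shadowing Python's built-in `range`.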
2. Interpret Charts and Graphs
Common Chart Types:
Frequency Tables: Show how often each value appears. To find the mean:
- Multiply each value by its frequency, then sum
- Divide by total frequency to get mean
Dot Plots: Each dot represents one data point
- Count dots at each value
- Most dots = mode
Bar Graphs: Height of bars shows frequency
- Read frequency from y-axis
- Tallest bar = mode
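To read a dot plot or bar graph, you count how often each value occurs. The rough sketch below uses Python's `collections.Counter` to do the counting and draws a text "dot plot" in which an `x` stands in for each dot. The helpers `dot_plot_rows` and `mode_from_counts` are hypothetical names used only for illustration.

```python
from collections import Counter

def dot_plot_rows(values):
    # One row per distinct value, with one "x" for each data point at that value
    counts = Counter(values)
    return [f"{v:>3} | " + " ".join("x" * counts[v]) for v in sorted(counts)]

def mode_from_counts(values):
    # The value(s) with the most dots (or the tallest bar) are the mode
    counts = Counter(values)
    most = max(counts.values())
    return [v for v in sorted(counts) if counts[v] == most]

data = [12, 15, 18, 15, 22, 30, 15]
for row in dot_plot_rows(data):
    print(row)
print("Mode:", mode_from_counts(data))
# Prints:
#  12 | x
#  15 | x x x
#  18 | x
#  22 | x
#  30 | x
# Mode: [15]
```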
Example: Frequency Table
Score | Frequency |
---|---|
5 | 3 |
7 | 5 |
9 | 2 |
Mean: \( \frac{(5×3) + (7×5) + (9×2)}{3+5+2} = \frac{15+35+18}{10} = \frac{68}{10} = 6.8 \)
Mode: 7 (highest frequency of 5)
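The frequency-table calculation above is a weighted sum. In this minimal Python sketch, the table is stored as a dictionary that maps each score to its frequency. That representation and the helper names are assumptions made for illustration.

```python
# Frequency table from the example: score -> frequency
table = {5: 3, 7: 5, 9: 2}

def frequency_table_mean(freq):
    # Multiply each value by its frequency and add the products
    total = sum(value * count for value, count in freq.items())
    # Divide by the total frequency (the number of data points)
    n = sum(freq.values())
    return total / n

def frequency_table_mode(freq):
    # The mode is the value (or values) with the highest frequency
    top = max(freq.values())
    return [value for value, count in freq.items() if count == top]

print(frequency_table_mean(table))  # 6.8
print(frequency_table_mode(table))  # [7]
```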
3. Find the Missing Number
Strategy for Finding Missing Value:
When given the mean:
- Multiply the mean by the number of values (including the missing one) to get the total sum
- Subtract the sum of the known values from the total sum
- Result is the missing value
When given the median:
- Arrange known values in order
- Determine where missing value must be placed
- Use median position to find missing value
Example 1: Finding Missing Value (Mean)
Problem: The mean of 15, 20, x, 25 is 22. Find x.
Step 1: Total sum = Mean × Number of values = 22 × 4 = 88
Step 2: Sum of known values = 15 + 20 + 25 = 60
Step 3: x = 88 - 60 = 28
Answer: x = 28
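The "given the mean" strategy takes only two lines of arithmetic. This Python sketch uses a hypothetical helper named `missing_value_for_mean`. It assumes that exactly one value is missing and that `n` counts the missing value.

```python
def missing_value_for_mean(known, mean, n):
    # Step 1: total sum = mean × number of values (including the missing one)
    total = mean * n
    # Steps 2 and 3: subtract the sum of the known values
    return total - sum(known)

x = missing_value_for_mean([15, 20, 25], mean=22, n=4)
print(x)                       # 28
# Check: the completed data set should have a mean of 22
print((15 + 20 + x + 25) / 4)  # 22.0
```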
Example 2: Finding Missing Value (Median)
Problem: The median of 8, 12, x, 20, 24 is 15. Find x.
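Step 1: With 5 values, the median is the 3rd value in order
Step 2: The known values in order are 8, 12, 20, 24. Since the median 15 is not one of them, x must be the 3rd value, between 12 and 20: 8, 12, x, 20, 24
Step 3: The middle value must equal the median, so x = 15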
Answer: x = 15
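The reasoning in Example 2 can be checked in code. This is a narrow sketch that covers only the situation in the example: the data set has an odd number of values, one value is unknown, and the target median is not among the known values. In that case, x must be the middle value. The helper `missing_value_for_median` is hypothetical. `statistics.median` from Python's standard library confirms the result.

```python
from statistics import median

def missing_value_for_median(known, target):
    # x can be the middle value only if exactly half of the known values
    # lie below the target and half lie above it (none equal to it)
    below = sum(1 for v in known if v < target)
    above = sum(1 for v in known if v > target)
    if below == above and below + above == len(known):
        return target
    return None  # the simple case does not apply

x = missing_value_for_median([8, 12, 20, 24], target=15)
print(x)                           # 15
print(median([8, 12, x, 20, 24]))  # 15
```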
4. Changes in Mean, Median, Mode, and Range
Effects of Adding/Removing a Value:
Change | Effect on Mean | Effect on Median | Effect on Mode | Effect on Range |
---|---|---|---|---|
Add value above the maximum | Increases | May increase | May change | Increases |
Add value below the minimum | Decreases | May decrease | May change | Increases |
Remove outlier | Changes significantly | Little change | Usually no change | Decreases |
Example:
Original data: 10, 12, 15, 15, 18
Mean = 14, Median = 15, Mode = 15, Range = 8
Add 30 to the data: 10, 12, 15, 15, 18, 30
Mean ≈ 16.67 (increased), Median = 15 (same), Mode = 15 (same), Range = 20 (increased)
Key Observations:
- Of the measures of center, the mean is most affected by outliers
- Median is resistant to outliers
- Mode only changes if frequencies change
- Range changes when min or max changes
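You can check these observations by recomputing the statistics before and after you add a value. This sketch uses Python's standard `statistics` module. The helper name `summary` is hypothetical. Note that `statistics.multimode` lists every value when all values appear equally often, which differs from the "no mode" convention on this page. That case does not arise in this example.

```python
from statistics import mean, median, multimode

def summary(values):
    # Measures of center and spread for one data set (mean rounded to 2 decimals)
    return {
        "mean": round(float(mean(values)), 2),
        "median": float(median(values)),
        "mode": multimode(values),
        "range": max(values) - min(values),
    }

original = [10, 12, 15, 15, 18]
print(summary(original))
# {'mean': 14.0, 'median': 15.0, 'mode': [15], 'range': 8}

print(summary(original + [30]))  # add a value above the maximum
# {'mean': 16.67, 'median': 15.0, 'mode': [15], 'range': 20}
```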
5. Mean Absolute Deviation (MAD)
Definition: The average distance of each data point from the mean. It measures how spread out the data is.
Formula:
\( \text{MAD} = \frac{\sum |x_i - \bar{x}|}{n} \)
where \( x_i \) = each data value, \( \bar{x} \) = mean, \( n \) = number of values
Steps to Calculate MAD:
- Find the mean of the data set
- Find the absolute deviation of each value from the mean: \( |x_i - \bar{x}| \)
- Add all the absolute deviations
- Divide by the number of values
Example:
Data: 4, 8, 6, 10, 12
Step 1: Mean = \( \frac{4+8+6+10+12}{5} = \frac{40}{5} = 8 \)
Step 2: Find absolute deviations:
Value | Deviation from Mean | Absolute Deviation |
---|---|---|
4 | 4 - 8 = -4 | 4 |
8 | 8 - 8 = 0 | 0 |
6 | 6 - 8 = -2 | 2 |
10 | 10 - 8 = 2 | 2 |
12 | 12 - 8 = 4 | 4 |
Step 3: Sum of absolute deviations = 4 + 0 + 2 + 2 + 4 = 12
Step 4: MAD = \( \frac{12}{5} = 2.4 \)
Answer: MAD = 2.4 (on average, values are 2.4 units from the mean)
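The four MAD steps can be written as a short function. This is a minimal Python sketch. The name `mean_absolute_deviation` is a hypothetical helper, not a standard-library function.

```python
def mean_absolute_deviation(values):
    n = len(values)
    # Step 1: find the mean
    m = sum(values) / n
    # Step 2: absolute deviation of each value from the mean
    deviations = [abs(x - m) for x in values]
    # Steps 3 and 4: add the absolute deviations and divide by the number of values
    return sum(deviations) / n

print(mean_absolute_deviation([4, 8, 6, 10, 12]))  # 2.4
```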
6. Quartiles and Interquartile Range (IQR)
Quartiles:
Quartiles divide ordered data into four equal parts.
- Q₁ (First Quartile): Median of lower half (25th percentile)
- Q₂ (Second Quartile): Median of entire data (50th percentile)
- Q₃ (Third Quartile): Median of upper half (75th percentile)
Interquartile Range (IQR):
\( \text{IQR} = Q_3 - Q_1 \)
IQR measures the spread of the middle 50% of the data
Steps to Find Quartiles:
- Arrange data in order from least to greatest
- Find Q₂ (median of all data)
- Find Q₁ (median of the lower half; if the number of values is odd, leave the median out)
- Find Q₃ (median of the upper half; if the number of values is odd, leave the median out)
- Calculate IQR = Q₃ - Q₁
Example:
Data: 3, 7, 8, 12, 13, 14, 18, 21, 22
Step 1: Already in order (9 values)
Step 2: Q₂ (median) = 13 (5th value)
Step 3: Lower half: 3, 7, 8, 12 → Q₁ = \( \frac{7+8}{2} = 7.5 \)
Step 4: Upper half: 14, 18, 21, 22 → Q₃ = \( \frac{18+21}{2} = 19.5 \)
Step 5: IQR = 19.5 - 7.5 = 12
Five-Number Summary: Min = 3, Q₁ = 7.5, Q₂ = 13, Q₃ = 19.5, Max = 22
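The Python sketch below follows the quartile steps above. It splits the ordered data into halves and leaves out the median when the count is odd. Spreadsheets and statistics libraries often use other quartile conventions, so their results can differ slightly. The helper names are hypothetical, and the sketch assumes at least two values.

```python
def median_of_sorted(ordered):
    # Median of a list that is already in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    ordered = sorted(values)            # Step 1: order the data
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # lower half (median left out if n is odd)
    upper = ordered[half + n % 2:]      # upper half (median left out if n is odd)
    q1 = median_of_sorted(lower)
    q2 = median_of_sorted(ordered)
    q3 = median_of_sorted(upper)
    return ordered[0], q1, q2, q3, ordered[-1]

data = [3, 7, 8, 12, 13, 14, 18, 21, 22]
mn, q1, q2, q3, mx = five_number_summary(data)
print(mn, q1, q2, q3, mx)  # 3 7.5 13 19.5 22
print("IQR =", q3 - q1)    # IQR = 12.0
```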
7. Box Plots (Box-and-Whisker Plots)
Definition: A visual display of the five-number summary showing the distribution and spread of data.
Five-Number Summary:
- Minimum: Smallest value (left whisker)
- Q₁: First quartile (left edge of box)
- Median (Q₂): Middle value (line inside box)
- Q₃: Third quartile (right edge of box)
- Maximum: Largest value (right whisker)
Parts of a Box Plot:
- Box: Contains middle 50% of data (from Q₁ to Q₃)
- Whiskers: Lines extending to minimum and maximum
- Median line: Vertical line inside the box
Reading a Box Plot:
- Width of box = IQR: Shows spread of middle 50%
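- Distance from the end of one whisker to the end of the other: Shows the overall range
- Position of the median in the box: Shows whether the data is skewed
- Outliers: Some box plots mark outliers as individual points beyond the whiskers; the whiskers then stop at the most extreme values that are not outliers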
Interpreting Box Plots:
Feature | Meaning |
---|---|
Long box | Data is spread out in the middle |
Short box | Data is clustered in the middle |
Median near Q₁ | Right-skewed data |
Median near Q₃ | Left-skewed data |
Median in center | Symmetric data |
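The skew rules in the table compare the two parts of the box on either side of the median line. The Python sketch below states that comparison as code. It is a simplified rule of thumb, not a formal test. The function name `describe_box` is hypothetical. Real data rarely splits exactly evenly, so you might treat a small difference as roughly symmetric.

```python
def describe_box(q1, median, q3):
    left = median - q1    # part of the box from Q1 to the median line
    right = q3 - median   # part of the box from the median line to Q3
    if left < right:
        return "median near Q1: right-skewed"
    if left > right:
        return "median near Q3: left-skewed"
    return "median in center: symmetric"

print(describe_box(10, 12, 20))  # median near Q1: right-skewed
print(describe_box(10, 18, 20))  # median near Q3: left-skewed
print(describe_box(10, 15, 20))  # median in center: symmetric
```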
8. Identify an Outlier
Definition: An outlier is a data value that is much higher or much lower than most of the other values in a data set.
Methods to Identify Outliers:
Method 1: Visual Inspection
Look for values that are far from the rest of the data
Method 2: 1.5 × IQR Rule
Lower boundary: \( Q_1 - 1.5 \times \text{IQR} \)
Upper boundary: \( Q_3 + 1.5 \times \text{IQR} \)
Any value below lower boundary or above upper boundary is an outlier
Example:
Data: 12, 15, 18, 20, 22, 25, 28, 75
Find quartiles:
Q₁ = 16.5, Q₂ = 21, Q₃ = 26.5
Calculate IQR: IQR = 26.5 - 16.5 = 10
Calculate boundaries:
Lower: 16.5 - 1.5(10) = 16.5 - 15 = 1.5
Upper: 26.5 + 1.5(10) = 26.5 + 15 = 41.5
75 > 41.5, so 75 is an outlier
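The 1.5 × IQR rule can be sketched in Python with the same halves method for quartiles as section 6. The function name `find_outliers` is hypothetical. It returns the two boundaries along with any values that fall strictly outside them.

```python
def median_of_sorted(ordered):
    # Median of a list that is already in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def find_outliers(values):
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    # Quartiles as medians of the lower and upper halves (median left out if n is odd)
    q1 = median_of_sorted(ordered[:half])
    q3 = median_of_sorted(ordered[half + n % 2:])
    iqr = q3 - q1
    lower_boundary = q1 - 1.5 * iqr
    upper_boundary = q3 + 1.5 * iqr
    # A value is an outlier if it is below the lower or above the upper boundary
    outliers = [x for x in ordered if x < lower_boundary or x > upper_boundary]
    return lower_boundary, upper_boundary, outliers

print(find_outliers([12, 15, 18, 20, 22, 25, 28, 75]))  # (1.5, 41.5, [75])
```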
9. Effect of Removing an Outlier
Impact on Statistics:
Statistic | Effect of Removing Outlier | Sensitivity |
---|---|---|
Mean | Changes significantly (moves toward the rest of the data) | Very sensitive |
Median | Little to no change | Not sensitive (resistant) |
Mode | Usually no change (unless outlier is the mode) | Not sensitive |
Range | Decreases significantly | Very sensitive |
MAD | Decreases (data more clustered) | Sensitive |
IQR | Little to no change | Not sensitive (resistant) |
Example:
With outlier: 5, 8, 10, 12, 15, 50
Mean ≈ 16.67, Median = 11, Mode = none, Range = 45
Without outlier: 5, 8, 10, 12, 15
Mean = 10, Median = 10, Mode = none, Range = 10
Observations:
- Mean decreased by 6.67 (40% change)
- Median decreased by only 1 (9% change)
- Range decreased by 35 (78% change)
Conclusion: When outliers are present, the median (for center) and the IQR (for spread) describe the data better than the mean and the range.
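To see the difference in sensitivity, the sketch below recomputes the mean, median, and range with and without the outlier from the example and reports the percent decrease for each. It uses Python's standard `statistics` module. The helper name `center_and_spread` is hypothetical.

```python
from statistics import mean, median

def center_and_spread(values):
    return {
        "mean": float(mean(values)),
        "median": float(median(values)),
        "range": max(values) - min(values),
    }

with_outlier = [5, 8, 10, 12, 15, 50]
without_outlier = [x for x in with_outlier if x != 50]  # drop the outlier

before = center_and_spread(with_outlier)
after = center_and_spread(without_outlier)
for name in before:
    change = before[name] - after[name]
    percent = 100 * change / before[name]
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f} (down {change:.2f}, {percent:.0f}%)")
# mean: 16.67 -> 10.00 (down 6.67, 40%)
# median: 11.00 -> 10.00 (down 1.00, 9%)
# range: 45.00 -> 10.00 (down 35.00, 78%)
```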
Quick Reference: One-Variable Statistics
Key Formulas:
Mean: \( \bar{x} = \frac{\sum x}{n} \)
Range: \( \text{Max} - \text{Min} \)
MAD: \( \frac{\sum |x_i - \bar{x}|}{n} \)
IQR: \( Q_3 - Q_1 \)
Outlier boundaries: \( Q_1 - 1.5 \times \text{IQR} \) and \( Q_3 + 1.5 \times \text{IQR} \)
Five-Number Summary:
Minimum, Q₁, Median (Q₂), Q₃, Maximum
Resistant vs Sensitive Measures:
- Resistant (little affected by outliers): Median, IQR
- Sensitive (strongly affected by outliers): Mean, Range, MAD
💡 Key Tips for One-Variable Statistics
- ✓ Mean = average (add all, divide by count)
- ✓ Median = middle value (order data first!)
- ✓ Mode = most frequent value
- ✓ Range = max - min (spread of all data)
- ✓ Always order data before finding median and quartiles
- ✓ MAD shows average distance from mean
- ✓ IQR shows spread of middle 50% of data
- ✓ Box plot shows five-number summary visually
- ✓ Outlier = value more than 1.5 × IQR below Q₁ or above Q₃
- ✓ Mean is sensitive to outliers; median is resistant
- ✓ Use median for skewed data or data with outliers
- ✓ Removing outliers: mean and range change most