Two-Variable Statistics - Grade 8
1. Line Graphs
Definition: A line graph displays data as points connected by lines, showing how one variable changes in relation to another (usually over time).
Key Components:
- X-axis (horizontal): Independent variable (usually time)
- Y-axis (vertical): Dependent variable (what's being measured)
- Data points: Individual values plotted as dots
- Connecting lines: Show trend or change over time
Interpreting Line Graphs:
- Rising line: Value is increasing
- Falling line: Value is decreasing
- Horizontal line: Value stays constant
- Steep slope: Rapid change
- Gentle slope: Gradual change
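The reading rules above can be sketched in code. This is a minimal illustration, not a standard routine: the function name `describe_segments` and the hourly temperature data are made up for the example.

```python
def describe_segments(times, values):
    """Label each segment of a line graph as rising, falling, or constant."""
    segments = []
    for i in range(1, len(values)):
        run = times[i] - times[i - 1]      # horizontal change
        rise = values[i] - values[i - 1]   # vertical change
        slope = rise / run                 # change in y per unit of x
        if slope > 0:
            trend = "rising"
        elif slope < 0:
            trend = "falling"
        else:
            trend = "constant"
        segments.append((times[i - 1], times[i], trend, slope))
    return segments


# Hypothetical data: temperature (°C) recorded each hour
hours = [0, 1, 2, 3, 4]
temps = [10, 14, 14, 8, 9]

for start, end, trend, slope in describe_segments(hours, temps):
    print(f"hour {start} to {end}: {trend} ({slope:+.1f} °C per hour)")
```

The segment with the largest slope in absolute value (here, hour 2 to 3) is the steepest part of the graph, which means the most rapid change.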
Creating Line Graphs:
- Draw and label axes with appropriate scales
- Plot each data point at the correct coordinates
- Connect the points with straight lines
- Add a title describing what the graph shows
2. Scatter Plots
Definition: A scatter plot shows the relationship between two numerical variables using dots plotted on a coordinate plane. Each dot represents one data pair (x, y).
Purpose:
- Show relationships between two variables
- Identify patterns and trends
- Detect outliers
- Make predictions
Interpreting Scatter Plots:
Look at the pattern of points to determine:
- Direction: Positive, negative, or no correlation
- Form: Linear or nonlinear pattern
- Strength: How closely points cluster around a line
- Outliers: Points far from the general pattern
Creating Scatter Plots:
- Set up coordinate axes with appropriate scales
- Label axes with variable names
- Plot each data pair as a point (x, y)
- Do NOT connect the points (unlike line graphs)
- Add a descriptive title
3. Identify Trends with Scatter Plots (Correlation)
Types of Correlation:
Positive Correlation (Positive Association)
- As x increases, y increases
- Points trend upward from left to right
- Example: Hours studied vs. test score
Negative Correlation (Negative Association)
- As x increases, y decreases
- Points trend downward from left to right
- Example: Age of car vs. value
No Correlation (No Association)
- No clear pattern
- Points are randomly scattered
- Variables show no apparent relationship
- Example: Shoe size vs. test score
Strength of Correlation:
Strength | Description |
---|---|
Strong | Points cluster tightly around a line |
Moderate | Points show a pattern but with some scatter |
Weak | Points are loosely scattered with little pattern |
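
One common way to put a number on direction and strength is the Pearson correlation coefficient r, which ranges from -1 to 1. This guide does not otherwise use r, so treat the sketch below as an optional extension. The data sets are hypothetical, and the 0.8 and 0.5 cutoffs are illustrative choices, not official rules.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data.

    Assumes both lists have the same length and neither is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe(r):
    """Turn r into words using illustrative (not official) cutoffs."""
    if r == 0:
        return "no correlation"
    size = abs(r)
    if size >= 0.8:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


# Hypothetical data sets
hours_studied = [1, 2, 3, 4, 5]
test_scores = [62, 71, 74, 81, 85]
car_age = [1, 2, 3, 4, 5]
car_value = [20, 17, 15, 12, 10]  # thousands of dollars

print(describe(pearson_r(hours_studied, test_scores)))  # strong positive
print(describe(pearson_r(car_age, car_value)))          # strong negative
```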
4. Make Predictions with Scatter Plots
Using Trends to Predict: If a clear pattern exists, you can predict unknown values by following the trend.
Types of Predictions:
Interpolation: Predicting a value WITHIN the range of data
- More reliable because it's within observed data
- Example: If data ranges from x=0 to x=10, predict at x=5
Extrapolation: Predicting a value OUTSIDE the range of data
- Less reliable because you assume the trend continues
- Example: If data ranges from x=0 to x=10, predict at x=15
Steps to Make Predictions:
- Identify the pattern/trend in the scatter plot
- Draw or imagine a line of best fit
- Locate the given x-value on the x-axis
- Move straight up or down from that value to the line of best fit
- Read across to the y-axis to find the corresponding y-value
Example:
A scatter plot shows hours studied (x) vs. test score (y) with a positive correlation. If a student studies 4 hours and the trend suggests scores increase by 5 points per hour of study starting from a base of 60, predict the score:
Prediction: 60 + (4 × 5) = 80 points
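The same prediction can be sketched in code, together with a check for whether it is an interpolation or an extrapolation. The function names and the list of observed study hours are hypothetical. Only the base of 60 and the rate of 5 points per hour come from the example.

```python
def predict_score(hours, base=60, points_per_hour=5):
    """Follow the trend: start at the base score and add points for each hour."""
    return base + points_per_hour * hours


def prediction_type(observed_x, x_new):
    """Classify a prediction by whether x_new lies inside the observed range."""
    if min(observed_x) <= x_new <= max(observed_x):
        return "interpolation"
    return "extrapolation"


# Hypothetical x-values that appear in the scatter plot
observed_hours = [1, 2, 3, 5, 6]

for h in (4, 10):
    print(h, "hours:", predict_score(h), "points,", prediction_type(observed_hours, h))
```

For 10 hours the trend gives 110 points. If the test is scored out of 100, that result is impossible, which shows why extrapolation needs caution.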
5. Outliers in Scatter Plots
Definition: An outlier is a data point that does not fit the general pattern of the scatter plot. It lies far away from most other points.
Identifying Outliers:
- Look for points that are far from the cluster of other points
- Points that don't follow the trend/pattern
- Usually isolated from the main group
Causes of Outliers:
- Measurement error: Data recorded incorrectly
- Data entry error: Typo or mistake in recording
- Genuine unusual case: Real exception to the pattern
- Different subgroup: Belongs to a different category
Effect of Outliers:
Impact | Description |
---|---|
Correlation strength | Can weaken (or occasionally exaggerate) the apparent correlation |
Line of best fit | Can pull the line away from most data |
Predictions | Can make predictions less accurate |
Example:
In a scatter plot of age vs. salary, most points show increasing salary with age. One point shows a 25-year-old earning $500,000 while others at that age earn $30,000-$50,000. This is an outlier (possibly a professional athlete or CEO).
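One simple way to flag such a point is to measure how far each point sits above or below a trend line. The sketch below assumes a hypothetical trend line (salary = 1500 × age) and a hypothetical cutoff of $20,000. Both numbers are made up for illustration, and other outlier rules exist.

```python
def find_outliers(points, slope, intercept, threshold):
    """Return points whose vertical distance from the trend line exceeds threshold."""
    outliers = []
    for x, y in points:
        predicted = slope * x + intercept
        if abs(y - predicted) > threshold:
            outliers.append((x, y))
    return outliers


# Hypothetical (age, salary) pairs, including the 25-year-old earning $500,000
people = [(22, 32000), (25, 38000), (25, 500000), (30, 45000), (35, 52000), (40, 60000)]

# Assumed trend line: salary = 1500 * age + 0; assumed cutoff: $20,000
print(find_outliers(people, slope=1500, intercept=0, threshold=20000))
# [(25, 500000)]
```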
6. Line of Best Fit (Trend Line)
Definition: A straight line drawn through the center of a scatter plot that best represents the relationship between the variables. It keeps the overall vertical distance between the line and the data points as small as possible.
Identifying a Good Line of Best Fit:
- Passes through or near most of the data points
- Roughly equal numbers of points above and below the line
- Points are spread evenly on both sides along the whole length of the line
- Follows the general direction/trend of the data
- Minimizes the vertical distances from points to the line
Drawing a Line of Best Fit:
- Look at the overall pattern of points
- Do not let outliers pull the line away from the main cluster of points
- Use a ruler to draw a straight line
- Balance points above and below the line
- Extend the line across the graph
Properties:
- Also called "trend line" or "regression line"
- Can be used for predictions (interpolation and extrapolation)
- Represents the general trend, not every individual point
- Should only be used when there's a linear pattern
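Drawing by eye is what this guide expects, but software usually computes the line with the least-squares method, which minimizes the sum of squared vertical distances. The sketch below assumes that method and uses hypothetical data. It also counts points above and below the line, to connect back to the "balance" check.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for paired data.

    Assumes the x-values are not all identical.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept


# Hypothetical data: hours studied vs. test score
hours = [1, 2, 3, 4, 5]
scores = [62, 71, 74, 81, 85]

m, b = least_squares_line(hours, scores)
print(f"y = {m:.1f}x + {b:.1f}")

# Count how many points fall above and below the fitted line
above = sum(1 for x, y in zip(hours, scores) if y > m * x + b)
below = sum(1 for x, y in zip(hours, scores) if y < m * x + b)
print(above, "above,", below, "below")
```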
7. Write Equations for Lines of Best Fit
Goal: Find the equation of the line in slope-intercept form: \( y = mx + b \)
Formula:
\( y = mx + b \)
where \( m \) = slope (rate of change), \( b \) = y-intercept (starting value)
Steps to Write the Equation:
- Draw the line of best fit through the scatter plot
- Find the slope (m):
  - Choose two points ON the line (preferably at grid intersections)
  - Use the formula: \( m = \frac{y_2 - y_1}{x_2 - x_1} = \frac{\text{rise}}{\text{run}} \)
- Find the y-intercept (b):
  - Identify where the line crosses the y-axis
  - OR substitute a point and slope into \( y = mx + b \) and solve for b
- Write the equation: \( y = mx + b \)
Example:
A line of best fit passes through points (2, 5) and (6, 13). Find the equation.
Step 1: Find slope:
\( m = \frac{13 - 5}{6 - 2} = \frac{8}{4} = 2 \)
Step 2: Find y-intercept using point (2, 5):
\( 5 = 2(2) + b \) → \( 5 = 4 + b \) → \( b = 1 \)
Step 3: Write equation:
\( y = 2x + 1 \)
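The three steps above can be checked with a short function. The name `line_through` is hypothetical. It applies the slope formula and then solves \( y = mx + b \) for b using the first point.

```python
def line_through(p1, p2):
    """Return (m, b) for the line y = mx + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("A vertical line has no slope-intercept form")
    m = (y2 - y1) / (x2 - x1)  # rise over run
    b = y1 - m * x1            # substitute (x1, y1) and solve for b
    return m, b


m, b = line_through((2, 5), (6, 13))
print(f"y = {m}x + {b}")  # y = 2.0x + 1.0
```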
8. Interpret Lines of Best Fit: Word Problems
Interpreting the Equation \( y = mx + b \):
Slope (m): Rate of change; how much y changes per unit of x
- Units: (y units) per (x units)
- Positive slope: y increases as x increases
- Negative slope: y decreases as x increases
Y-intercept (b): Starting value; value of y when x = 0
- Initial amount or base value
- Where the line crosses the y-axis
Example 1:
Equation: \( C = 3h + 20 \) where C is total cost ($) and h is hours of work
Slope = 3: The cost increases by $3 per hour
Y-intercept = 20: There's a $20 initial fee (starting cost)
Predict cost for 10 hours: C = 3(10) + 20 = $50
Example 2:
Equation: \( T = -5d + 70 \) where T is temperature (°F) and d is depth (feet) underground
Slope = -5: Temperature decreases by 5°F per foot of depth
Y-intercept = 70: Surface temperature is 70°F
Predict temperature at 8 feet: T = -5(8) + 70 = 30°F
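Both examples follow the same pattern, which can be sketched as two small helpers. The function names are hypothetical. The slopes, intercepts, and units come from Examples 1 and 2.

```python
def predict(slope, intercept, x):
    """Evaluate y = mx + b at a given x."""
    return slope * x + intercept


def describe_slope(slope, y_name, y_unit, x_unit):
    """Put the slope into words as a rate of change with units."""
    if slope > 0:
        verb = "increases"
    elif slope < 0:
        verb = "decreases"
    else:
        return f"{y_name} stays constant"
    return f"{y_name} {verb} by {abs(slope)} {y_unit} per {x_unit}"


# Example 1: C = 3h + 20
print(describe_slope(3, "Cost", "dollars", "hour"))
print(predict(3, 20, 10))   # 50

# Example 2: T = -5d + 70
print(describe_slope(-5, "Temperature", "°F", "foot"))
print(predict(-5, 70, 8))   # 30
```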
9. Identify Representative, Random, and Biased Samples
Sampling: Selecting a subset of a population to make inferences about the entire population.
Population vs. Sample:
- Population: The entire group you want information about
- Sample: A subset of the population used to represent the whole
Types of Samples:
1. Random Sample
- Definition: Every member of the population has an equal chance of being selected
- Goal: Reduce bias and get fair representation
- Example: Drawing names from a hat containing all students' names
- Example: Using a random number generator to select participants
2. Representative Sample
- Definition: Accurately reflects the characteristics of the entire population
- Goal: Match population proportions
- Example: If the school is 60% girls and 40% boys, the sample should have similar proportions
- Note: A sufficiently large random sample is usually representative
3. Biased Sample
- Definition: Does NOT fairly represent the population; certain groups are over/under-represented
- Problem: Leads to inaccurate conclusions
- Example: Surveying only students in the library about study habits (excludes non-library users)
- Example: Calling only landlines (excludes cell phone-only users)
Common Sources of Bias:
Type of Bias | Description | Example |
---|---|---|
Convenience sampling | Choosing people who are easy to reach | Surveying only your friends |
Voluntary response | People choose to participate | Online polls (only motivated people respond) |
Undercoverage | Some groups not included | Phone survey (misses people without phones) |
Evaluation Questions:
To determine if a sample is good, ask:
- Does everyone in the population have an equal chance of being selected?
- Does the sample match the characteristics of the population?
- Is any group excluded or overrepresented?
- Is the sample size large enough?
Examples:
Example 1: To find the favorite lunch at school, survey every 10th student entering the cafeteria
✓ Close to random and likely representative, provided all students use the cafeteria (systematic selection)
Example 2: To find average income in a city, survey people at a luxury mall
✗ Biased (overrepresents wealthy people, excludes lower-income residents)
Example 3: To find students' opinions on homework, randomly select 50 students from the entire school roster
✓ Random and likely representative (all students have equal chance)
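The difference between a random sample and a biased one can be sketched with Python's standard `random` module. The roster below is hypothetical: 500 students, 60% girls and 40% boys, listed with the girls first. The seed is fixed only so that the example is repeatable.

```python
import random

# Hypothetical roster: 300 girls followed by 200 boys (60% / 40%)
roster = [("girl", i) for i in range(300)] + [("boy", i) for i in range(200)]


def share_of_girls(sample):
    """Fraction of the sample that is girls."""
    return sum(1 for group, _ in sample if group == "girl") / len(sample)


rng = random.Random(2024)  # fixed seed, an arbitrary choice

# Random sample: every student has an equal chance of selection
random_sample = rng.sample(roster, 50)

# Convenience sample: just the first 50 names on the list (biased)
convenience_sample = roster[:50]

print("Population:", share_of_girls(roster))                  # 0.6
print("Random sample:", share_of_girls(random_sample))
print("Convenience sample:", share_of_girls(convenience_sample))  # 1.0
```

The convenience sample contains only girls, so it overrepresents one group. The random sample's share will vary from draw to draw but tends to land near the population's 60%.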
Quick Reference: Two-Variable Statistics
Line Graphs vs. Scatter Plots:
Feature | Line Graph | Scatter Plot |
---|---|---|
Points connected? | Yes | No |
Purpose | Show change over time | Show relationship between variables |
Use when | Data is continuous | Looking for correlation |
Correlation Types:
- Positive: Both variables increase together (↗)
- Negative: One increases, other decreases (↘)
- None: No pattern or relationship
Line of Best Fit Equation:
\( y = mx + b \)
- m (slope): Rate of change
- b (y-intercept): Starting value
Good Samples:
- Random: Everyone has equal chance
- Representative: Matches population characteristics
- Unbiased: No systematic errors or exclusions
💡 Key Tips for Two-Variable Statistics
- ✓ Line graphs: connect points; Scatter plots: don't connect
- ✓ Positive correlation: both increase together (upward trend)
- ✓ Negative correlation: one up, one down (downward trend)
- ✓ No correlation: random scatter, no pattern
- ✓ Outliers: points far from the pattern
- ✓ Line of best fit: balance points above and below
- ✓ Slope tells you rate of change; y-intercept tells starting value
- ✓ Interpolation (within data) more reliable than extrapolation (outside data)
- ✓ Random sample: everyone has equal chance of selection
- ✓ Biased sample: some groups over/underrepresented
- ✓ Larger samples usually more reliable
- ✓ Use line of best fit equation to make predictions