Two-Variable Statistics - Grade 8

1. Line Graphs

Definition: A line graph displays data as points connected by lines, showing how one variable changes in relation to another (usually over time).

Key Components:

X-axis (horizontal): Independent variable (usually time)
Y-axis (vertical): Dependent variable (what's being measured)
Data points: Individual values plotted as dots
Connecting lines: Show trend or change over time

Interpreting Line Graphs:

Rising line: Value is increasing
Falling line: Value is decreasing
Horizontal line: Value stays constant
Steep slope: Rapid change
Gentle slope: Gradual change
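The interpretation rules above can be checked with a little arithmetic. The Python sketch below assumes the graph's data is a list of (x, y) points already sorted by x. It labels each connecting segment by its rate of change (rise over run), and a larger absolute rate means a steeper segment. The function name `describe_segments` and the temperature readings are hypothetical.

```python
def describe_segments(points):
    """Label each segment of a line graph as rising, falling, or horizontal.

    points: list of (x, y) pairs, already sorted by x.
    Returns a list of (label, rate) tuples, where rate is the segment's
    slope (change in y per unit of x).
    """
    results = []
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        rate = (y2 - y1) / (x2 - x1)  # rise over run for this segment
        if rate > 0:
            label = "rising"
        elif rate < 0:
            label = "falling"
        else:
            label = "horizontal"
        results.append((label, rate))
    return results


# Hypothetical data: temperature (°F) recorded each hour
readings = [(1, 60), (2, 62), (3, 70), (4, 70), (5, 65)]
for label, rate in describe_segments(readings):
    print(f"{label:10s} {rate:+.1f} per hour")
```

Here the segment from hour 2 to hour 3 (+8 per hour) is the steepest, so it shows the most rapid change.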

Creating Line Graphs:

Draw and label axes with appropriate scales
Plot each data point at the correct coordinates
Connect consecutive points, in order from left to right, with straight line segments
Add a title describing what the graph shows

2. Scatter Plots

Definition: A scatter plot shows the relationship between two numerical variables using dots plotted on a coordinate plane. Each dot represents one data pair (x, y).

Purpose:

Show relationships between two variables
Identify patterns and trends
Detect outliers
Make predictions

Interpreting Scatter Plots:

Look at the pattern of points to determine:

Direction: Positive, negative, or no correlation
Form: Linear or nonlinear pattern
Strength: How closely points cluster around a line
Outliers: Points far from the general pattern

Creating Scatter Plots:

Set up coordinate axes with appropriate scales
Label axes with variable names
Plot each data pair as a point (x, y)
Do NOT connect the points (unlike line graphs)
Add a descriptive title

3. Identify Trends with Scatter Plots (Correlation)

Types of Correlation:

Positive Correlation (Positive Association)

As x increases, y increases
Points trend upward from left to right
Example: Hours studied vs. test score

Negative Correlation (Negative Association)

As x increases, y decreases
Points trend downward from left to right
Example: Age of car vs. value

No Correlation (No Association)

No clear pattern
Points are randomly scattered
No apparent relationship between the variables
Example: Shoe size vs. test score

Strength of Correlation:

Strength	Description
Strong	Points cluster tightly around a line
Moderate	Points show a pattern but with some scatter
Weak	Points are loosely scattered with little pattern
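At this grade level, strength is judged by eye. A common numerical measure used in later courses is Pearson's correlation coefficient r, which ranges from -1 to 1: its sign gives the direction and its size gives the strength. The sketch below computes r in Python. The cutoffs 0.7, 0.4, and 0.2 used to turn r into words are an assumed rule of thumb, not a fixed standard, and the data is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r (between -1 and 1) for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe(r, strong=0.7, moderate=0.4, weak=0.2):
    """Put r into words; the cutoffs are an assumed rule of thumb."""
    if abs(r) < weak:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    if abs(r) >= strong:
        strength = "strong"
    elif abs(r) >= moderate:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"


# Hypothetical data
hours = [1, 2, 3, 4, 5, 6]
scores = [62, 68, 71, 79, 83, 90]      # test scores
car_age = [1, 2, 3, 4, 5]
car_value = [20, 17, 15, 12, 10]       # value in $1000s

print(describe(pearson_r(hours, scores)))       # strong positive
print(describe(pearson_r(car_age, car_value)))  # strong negative
```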

4. Make Predictions with Scatter Plots

Using Trends to Predict: If a clear pattern exists, you can predict unknown values by following the trend.

Types of Predictions:

Interpolation: Predicting a value WITHIN the range of data

More reliable because it's within observed data
Example: If data ranges from x=0 to x=10, predict at x=5

Extrapolation: Predicting a value OUTSIDE the range of data

Less reliable because you assume the trend continues
Example: If data ranges from x=0 to x=10, predict at x=15
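Whether a prediction is interpolation or extrapolation depends only on where the x-value falls relative to the observed data. A minimal Python sketch; `prediction_type` is a hypothetical helper name.

```python
def prediction_type(x, xs):
    """Return 'interpolation' if x lies within the observed x-range, else 'extrapolation'."""
    low, high = min(xs), max(xs)
    return "interpolation" if low <= x <= high else "extrapolation"


observed_x = [0, 2, 4, 6, 8, 10]  # data ranges from x = 0 to x = 10
print(prediction_type(5, observed_x))   # interpolation
print(prediction_type(15, observed_x))  # extrapolation
```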

Steps to Make Predictions:

Identify the pattern/trend in the scatter plot
Draw or imagine a line of best fit
Locate the given x-value on the axis
Move straight up or down from that x-value to the line of best fit
Move across to the y-axis and read the corresponding y-value

Example:

A scatter plot shows hours studied (x) vs. test score (y) with a positive correlation. If a student studies 4 hours and the trend suggests scores increase by 5 points per hour of study starting from a base of 60, predict the score:

Prediction: 60 + (4 × 5) = 80 points
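The trend in this example is the linear rule score = 5 × hours + 60, that is, a slope of 5 and a y-intercept of 60. A short Python sketch of the same arithmetic; the function name and default values simply restate the example's assumptions.

```python
def predict_score(hours, base=60, points_per_hour=5):
    """Linear trend from the example: start at `base`, add `points_per_hour` per hour."""
    return base + points_per_hour * hours


print(predict_score(4))  # 80
```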

5. Outliers in Scatter Plots

Definition: An outlier is a data point that does not fit the general pattern of the scatter plot. It lies far away from most other points.

Identifying Outliers:

Look for points that are far from the cluster of other points
Points that don't follow the trend/pattern
Usually isolated from the main group

Causes of Outliers:

Measurement error: Data recorded incorrectly
Data entry error: Typo or mistake in recording
Genuine unusual case: Real exception to the pattern
Different subgroup: Belongs to a different category

Effect of Outliers:

Impact	Description
Correlation strength	Can weaken (or sometimes exaggerate) the apparent correlation
Line of best fit	Can pull the line away from most data
Predictions	Can make predictions less accurate
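The effect of a single outlier can be seen numerically. The Python sketch below fits a least-squares line (the usual calculated line of best fit) and computes Pearson's r for hypothetical data lying exactly on y = 2x, then repeats both with one extra point far above the pattern. Both the slope and r drop noticeably.

```python
import math


def fit_stats(xs, ys):
    """Return (least-squares slope, Pearson r) for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 12]           # perfectly linear: y = 2x
slope_clean, r_clean = fit_stats(xs, ys)

# Add one hypothetical outlier, (2, 20), far above the pattern
slope_out, r_out = fit_stats(xs + [2], ys + [20])

print(f"without outlier: slope={slope_clean:.2f}, r={r_clean:.2f}")
print(f"with outlier:    slope={slope_out:.2f}, r={r_out:.2f}")
```

A least-squares line is especially sensitive to outliers, which is one reason to give them little weight when drawing a line by hand.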

Example:

In a scatter plot of age vs. salary, most points show increasing salary with age. One point shows a 25-year-old earning $500,000 while others at that age earn $30,000-$50,000. This is an outlier (possibly a professional athlete or CEO).
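One way to make "far from the pattern" concrete is to measure each point's vertical distance from a trend line and flag points whose distance exceeds a chosen tolerance. In the Python sketch below, the trend line salary ≈ 2 × age - 10 (in $1000s), the tolerance of 20, and the data points are all hypothetical, chosen to mirror the example above.

```python
def find_outliers(points, slope, intercept, tolerance):
    """Return points whose vertical distance from y = slope*x + intercept exceeds tolerance."""
    outliers = []
    for x, y in points:
        predicted = slope * x + intercept
        residual = y - predicted  # how far above (+) or below (-) the line the point sits
        if abs(residual) > tolerance:
            outliers.append((x, y))
    return outliers


# Hypothetical (age, salary in $1000s) data; most points follow salary ≈ 2 × age - 10
data = [(22, 32), (25, 40), (25, 45), (30, 50), (35, 60), (40, 68), (45, 78), (25, 500)]
print(find_outliers(data, slope=2, intercept=-10, tolerance=20))  # [(25, 500)]
```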

6. Line of Best Fit (Trend Line)

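Definition: A straight line drawn through the middle of the data in a scatter plot that best represents the relationship between the variables. A line calculated by the least-squares method makes the sum of the squared vertical distances from the points to the line as small as possible; a line drawn by eye only approximates this.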

Identifying a Good Line of Best Fit:

Passes through or near most of the data points
About an equal number of points above and below the line, spread along its whole length
Follows the general direction/trend of the data
Keeps the vertical distances from the points to the line small overall
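A calculated line of best fit is usually found by the least-squares method, which chooses the slope and intercept that make the sum of the squared vertical distances as small as possible. A minimal Python sketch with hypothetical hours-studied data: the residuals (vertical distances, positive above the line and negative below) always sum to zero for this line, which is the precise sense in which it balances points above and below. The number of points above and below is often close but need not be equal.

```python
def least_squares_line(points):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    m = sxy / sxx                 # slope
    b = mean_y - m * mean_x       # the line passes through (mean_x, mean_y)
    return m, b


# Hypothetical data: (hours studied, test score)
scores = [(1, 62), (2, 68), (3, 71), (4, 79), (5, 83), (6, 90)]
m, b = least_squares_line(scores)
residuals = [y - (m * x + b) for x, y in scores]

print(f"y = {m:.2f}x + {b:.2f}")
print("residuals sum to ~0:", abs(sum(residuals)) < 1e-9)
```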

Drawing a Line of Best Fit:

Look at the overall pattern of points
Ignore outliers when drawing the line
Use a ruler to draw a straight line
Balance points above and below the line
Extend the line across the graph

Properties:

Also called a "trend line"; a line of best fit calculated by the least-squares method is often called a "regression line"
Can be used for predictions (interpolation and extrapolation)
Represents the general trend, not every individual point
Should only be used when there's a linear pattern

7. Write Equations for Lines of Best Fit

Goal: Find the equation of the line in slope-intercept form: $ y = mx + b $

Formula:

$ y = mx + b $

where $ m $ = slope (rate of change), $ b $ = y-intercept (starting value)

Steps to Write the Equation:

Draw the line of best fit through the scatter plot
Find the slope (m):
- Choose two points ON the line (preferably at grid intersections)
- Use the formula: $ m = \frac{y_2 - y_1}{x_2 - x_1} = \frac{\text{rise}}{\text{run}} $
Find the y-intercept (b):
- Identify where the line crosses the y-axis
- OR substitute a point and slope into $ y = mx + b $ and solve for b
Write the equation: $ y = mx + b $

Example:

A line of best fit passes through points (2, 5) and (6, 13). Find the equation.

Step 1: Find slope:

$ m = \frac{13 - 5}{6 - 2} = \frac{8}{4} = 2 $

Step 2: Find y-intercept using point (2, 5):

$ 5 = 2(2) + b $ → $ 5 = 4 + b $ → $ b = 1 $

Step 3: Write equation:

$ y = 2x + 1 $
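The two-point method can be written as a short Python function. This sketch assumes integer coordinates so it can use `fractions.Fraction` for an exact slope with no rounding; `line_through` is a hypothetical helper name.

```python
from fractions import Fraction


def line_through(p1, p2):
    """Return (m, b) for the line through two integer points, as exact fractions."""
    (x1, y1), (x2, y2) = p1, p2
    m = Fraction(y2 - y1, x2 - x1)   # slope = rise / run
    b = y1 - m * x1                  # solve y1 = m*x1 + b for b
    return m, b


m, b = line_through((2, 5), (6, 13))
print(f"y = {m}x + {b}")  # y = 2x + 1
```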

8. Interpret Lines of Best Fit: Word Problems

Interpreting the Equation $ y = mx + b $:

Slope (m): Rate of change; how much y changes per unit of x

Units: (y units) per (x units)
Positive slope: y increases as x increases
Negative slope: y decreases as x increases

Y-intercept (b): Starting value; value of y when x = 0

Initial amount or base value
Meaningful only when x = 0 makes sense in the situation

Example 1:

Equation: $ C = 3h + 20 $ where C is total cost ($) and h is hours of work

Slope = 3: The cost increases by $3 per hour

Y-intercept = 20: There's a $20 initial fee (starting cost)

Predict cost for 10 hours: C = 3(10) + 20 = $50

Example 2:

Equation: $ T = -5d + 70 $ where T is temperature (°F) and d is depth (feet) underground

Slope = -5: Temperature decreases by 5°F per foot of depth

Y-intercept = 70: Surface temperature is 70°F

Predict temperature at 8 feet: T = -5(8) + 70 = 30°F
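Both examples can be checked by evaluating the equations directly. In this Python sketch, `linear` is a hypothetical helper that turns a slope m and y-intercept b into a function of x. The last two lines confirm that the y-intercept is the value at x = 0 and the slope is the change for each extra unit of x.

```python
def linear(m, b):
    """Build a function for y = m*x + b."""
    return lambda x: m * x + b


cost = linear(3, 20)          # C = 3h + 20: $3 per hour plus a $20 fee
temperature = linear(-5, 70)  # T = -5d + 70: drops 5°F per foot, 70°F at the surface

print(cost(10))               # 50
print(temperature(8))         # 30
print(cost(0))                # 20 -> the y-intercept is the value at x = 0
print(cost(11) - cost(10))    # 3 -> the slope is the change per extra unit of x
```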

9. Identify Representative, Random, and Biased Samples

Sampling: Selecting a subset of a population to make inferences about the entire population.

Population vs. Sample:

Population: The entire group you want information about
Sample: A subset of the population used to represent the whole

Types of Samples:

1. Random Sample

Definition: Every member of the population has an equal chance of being selected
Goal: Reduce bias and get fair representation
Example: Drawing names from a hat containing all students' names
Example: Using a random number generator to select participants
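A simple random sample is easy to draw in code. The Python sketch below uses the standard library's `random.Random.sample`, which picks names without repeats so that every student on a hypothetical 500-name roster has the same chance of being chosen; the fixed seed only makes the run repeatable.

```python
import random

roster = [f"Student {i}" for i in range(1, 501)]  # hypothetical school of 500 students

rng = random.Random(8)           # fixed seed only so the example is repeatable
sample = rng.sample(roster, 50)  # 50 names, no repeats, every name equally likely

print(sample[:5])
```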

2. Representative Sample

Definition: Accurately reflects the characteristics of the entire population
Goal: Match population proportions
Example: If school is 60% girls and 40% boys, sample should be similar
Note: Random samples tend to be representative, especially when the sample is large
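To check whether a sample is representative, compare its proportions with the population's. A Python sketch assuming a hypothetical school of 500 students that is 60% girls; a random sample of 100 usually lands close to 60% girls, though rarely exactly on it.

```python
import random

# Hypothetical school of 500 students: 60% girls, 40% boys
population = ["girl"] * 300 + ["boy"] * 200


def share(group, label):
    """Fraction of `group` equal to `label`."""
    return group.count(label) / len(group)


rng = random.Random(1)               # fixed seed so the run is repeatable
sample = rng.sample(population, 100)

print(f"population girls: {share(population, 'girl'):.0%}")  # 60%
print(f"sample girls:     {share(sample, 'girl'):.0%}")
```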

3. Biased Sample

Definition: Does NOT fairly represent the population; certain groups are over/under-represented
Problem: Leads to inaccurate conclusions
Example: Surveying only students in the library about study habits (excludes non-library users)
Example: Calling only landlines (excludes cell phone-only users)

Common Sources of Bias:

Type of Bias	Description	Example
Convenience sampling	Choosing people who are easy to reach	Surveying only your friends
Voluntary response	People choose to participate	Online polls (only motivated people respond)
Undercoverage	Some groups not included	Phone survey (misses people without phones)
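A small simulation shows how a convenience sample can mislead. The numbers below are hypothetical and deliberately simple: 100 students who study 0 to 9 hours per week, with the assumption that only students who study 7 or more hours are found in the library. Surveying only the library overstates average study time.

```python
# Hypothetical population: 100 students; student i studies (i % 10) hours per week.
# Assumption for illustration: only students who study 7+ hours are in the library.
hours = [i % 10 for i in range(100)]
library_users = [h for h in hours if h >= 7]


def mean(values):
    return sum(values) / len(values)


print(f"whole population:    {mean(hours):.1f} hours")          # 4.5 hours
print(f"library-only sample: {mean(library_users):.1f} hours")  # 8.0 hours
```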

Evaluation Questions:

To determine if a sample is good, ask:

Does everyone in the population have an equal chance of being selected?
Does the sample match the characteristics of the population?
Is any group excluded or overrepresented?
Is the sample size large enough?

Examples:

Example 1: To find favorite lunch at school, survey every 10th student entering cafeteria

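✓ Likely representative, assuming nearly all students pass through the cafeteria (this is a systematic sample, which works much like a random sample if the starting student is chosen at random)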

Example 2: To find average income in a city, survey people at a luxury mall

✗ Biased (overrepresents wealthy people, excludes lower-income residents)

Example 3: To find students' opinions on homework, randomly select 50 students from entire school roster

✓ Random and likely representative (all students have equal chance)
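Example 1's "every 10th student" is a systematic sample. A Python sketch, assuming students are listed in the order they arrive; choosing the starting position at random gives every student the same 1-in-10 chance of selection. `systematic_sample` and the 300-student arrival list are hypothetical.

```python
import random


def systematic_sample(people, step, rng=random):
    """Pick every `step`-th person, starting at a random position among the first `step`."""
    start = rng.randrange(step)  # random start: 0 .. step-1
    return people[start::step]


arrivals = [f"Student {i}" for i in range(1, 301)]  # hypothetical order of arrival
chosen = systematic_sample(arrivals, 10, random.Random(3))
print(len(chosen), chosen[:3])
```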

Quick Reference: Two-Variable Statistics

Line Graphs vs. Scatter Plots:

Feature	Line Graph	Scatter Plot
Points connected?	Yes	No
Purpose	Show change over time	Show relationship between variables
Use when	Tracking one quantity over time	Looking for a relationship (correlation) between two variables

Correlation Types:

Positive: Both variables increase together (↗)
Negative: As one increases, the other decreases (↘)
None: No pattern or relationship

Line of Best Fit Equation:

$ y = mx + b $

m (slope): Rate of change
b (y-intercept): Starting value

Good Samples:

Random: Everyone has equal chance
Representative: Matches population characteristics
Unbiased: No systematic errors or exclusions

💡 Key Tips for Two-Variable Statistics

✓ Line graphs: connect points; Scatter plots: don't connect
✓ Positive correlation: both increase together (upward trend)
✓ Negative correlation: one up, one down (downward trend)
✓ No correlation: random scatter, no pattern
✓ Outliers: points far from the pattern
✓ Line of best fit: balance points above and below
✓ Slope tells you rate of change; y-intercept tells starting value
✓ Interpolation (within data) more reliable than extrapolation (outside data)
✓ Random sample: everyone has equal chance of selection
✓ Biased sample: some groups over/underrepresented
✓ Larger samples are usually more reliable, but only if they are not biased
✓ Use line of best fit equation to make predictions