Bivariate Statistics - Ninth Grade Math
Introduction to Bivariate Data
Bivariate Data: Data involving two variables
Purpose: To examine relationships between two variables
Key Questions:
• Is there a relationship between the variables?
• How strong is the relationship?
• Can we predict one variable from the other?
Variables:
• Independent Variable (x): The input or predictor variable
• Dependent Variable (y): The output or response variable
1. Interpret a Scatter Plot
Scatter Plot: A graph showing relationship between two quantitative variables
Each Point: Represents one data pair (x, y)
x-axis: Independent variable
y-axis: Dependent variable
Types of Associations
Direction of Association:
1. Positive Association (Positive Correlation):
• As x increases, y increases
• Points trend upward from left to right
• Example: Hours studied vs. test score
2. Negative Association (Negative Correlation):
• As x increases, y decreases
• Points trend downward from left to right
• Example: Hours watching TV vs. test score
3. No Association (No Correlation):
• No clear pattern
• Points scattered randomly
• Example: Shoe size vs. test score
Strength of Association
How Close to a Line:
Strong Association:
• Points cluster tightly around a line
• Clear pattern visible
Moderate Association:
• Points generally follow a pattern but with scatter
• Trend visible but not tight
Weak Association:
• Points loosely follow a pattern
• Much scatter, unclear trend
Form of Association
Shape of Pattern:
Linear: Points follow a straight line pattern
Nonlinear: Points follow a curved pattern (quadratic, exponential, etc.)
No Form: No discernible pattern
Example 1: Interpret scatter plot
Data: Hours studying (x) vs. Test score (y)
Pattern: Points trend upward from left to right, fairly tight to a line
Interpretation:
• Direction: Positive association
• Strength: Strong
• Form: Linear
Conclusion: There is a strong, positive, linear association between hours studying and test scores. As study time increases, test scores tend to increase.
2. Outliers in Scatter Plots
Outlier in Scatter Plot: A point that doesn't fit the general pattern
Characteristics:
• Far from other points
• Doesn't follow the trend
• May indicate error or special case
Identifying Outliers:
Visual Method:
• Look for points far from the main cluster
• Points that don't fit the linear pattern
Types of Outliers:
• Vertical outlier: Unusual y-value for its x-value
• Horizontal outlier: Unusual x-value
• Influential outlier: Point that significantly affects correlation/regression line
Example 1: Identify outlier
Data points: (1, 3), (2, 5), (3, 7), (4, 9), (5, 11), (6, 2)
Analysis:
Most points follow pattern: y ≈ 2x + 1
Point (6, 2) doesn't fit: should be around (6, 13)
Conclusion: (6, 2) is an outlier
Effect: Would weaken the correlation and flatten the regression line (the least-squares slope drops from 2 to about 0.43 when (6, 2) is included)
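A short Python sketch (an illustration, not part of the original notes) that checks this example numerically: it prints each point's residual from the pattern y = 2x + 1, then compares r with and without (6, 2). The helper name `pearson_r` is our own; it uses the sums formula for r given in Section 4.

```python
import math

def pearson_r(points):
    """Pearson r via the sums formula from Section 4."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sx2 = sum(x * x for x, _ in points)
    sy2 = sum(y * y for _, y in points)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sx2 - sx**2) * (n * sy2 - sy**2))
    return num / den

data = [(1, 3), (2, 5), (3, 7), (4, 9), (5, 11), (6, 2)]

# Distance of each point from the pattern y = 2x + 1 (0 means on the line)
for x, y in data:
    print((x, y), "residual:", y - (2 * x + 1))   # only (6, 2) is off: -11

r_with = pearson_r(data)
r_without = pearson_r(data[:-1])   # drop the last point, (6, 2)
print(f"r with (6, 2):    {r_with:.2f}")     # 0.23
print(f"r without (6, 2): {r_without:.2f}")  # 1.00
```

One point out of six is enough to take r from a perfect 1 down to a weak 0.23.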
Effect of Outliers:
• Can weaken correlation
• Affects slope of regression line
• May dramatically change predictions
• Should investigate: error, unusual case, or valid extreme?
3. Match Correlation Coefficients to Scatter Plots
Correlation Coefficient (r): A number measuring strength and direction of linear relationship
Symbol: $r$
Range: $-1 \leq r \leq 1$
Also called: Pearson correlation coefficient
Correlation Coefficient Values:
Perfect Positive: $r = +1$
• All points on line with positive slope
• Perfect positive linear relationship
Strong Positive: $0.7 < r < 1$
• Points cluster tightly around upward line
• Strong positive association
Moderate Positive: $0.3 < r < 0.7$
• Points loosely follow upward trend
• Moderate positive association
Weak Positive: $0 < r < 0.3$
• Slight upward trend, much scatter
• Weak positive association
No Correlation: $r = 0$
• No straight-line trend
• No linear relationship (a curved pattern is still possible)
Weak Negative: $-0.3 < r < 0$
• Slight downward trend
Moderate Negative: $-0.7 < r < -0.3$
• Clear downward trend with scatter
Strong Negative: $-1 < r < -0.7$
• Points cluster tightly around downward line
Perfect Negative: $r = -1$
• All points on line with negative slope
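The ranges above can be encoded as a small lookup. This is a sketch of our own: the notes do not say which category the exact boundary values (±0.3, ±0.7) belong to, so here they are counted in the weaker category (an assumption).

```python
def describe_r(r: float) -> str:
    """Label r using the ranges listed above.

    Assumption: boundary values (+/-0.3, +/-0.7) go to the weaker category.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size > 0.7:
        strength = "strong"
    elif size > 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"

for r in (-0.95, -0.4, 0.1, 0.85):
    print(r, "->", describe_r(r))
```

Run on the r-values -0.95, -0.4, 0.1 and 0.85, it prints strong negative, moderate negative, weak positive and strong positive.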
Example 1: Match r-values to scatter plots
Given r-values: -0.95, -0.4, 0.1, 0.85
Plot A: Points tightly clustered downward
→ $r = -0.95$ (strong negative)
Plot B: Points loosely trending downward
→ $r = -0.4$ (moderate negative)
Plot C: Random scatter, slight upward
→ $r = 0.1$ (very weak positive)
Plot D: Points tightly clustered upward
→ $r = 0.85$ (strong positive)
4. Calculate Correlation Coefficients
Correlation Coefficient Formula:
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$
where:
• $n$ = number of data pairs
• $x$ = values of independent variable
• $y$ = values of dependent variable
• $\sum xy$ = sum of products of paired values
• $\sum x$ = sum of x-values
• $\sum y$ = sum of y-values
• $\sum x^2$ = sum of squared x-values
• $\sum y^2$ = sum of squared y-values
Steps to Calculate r:
Step 1: Create table with columns: x, y, xy, x², y²
Step 2: Calculate each column
Step 3: Find sum of each column
Step 4: Substitute into formula
Step 5: Simplify to find r
Step 6: Interpret the value
Example 1: Calculate r for data: (1, 2), (2, 3), (3, 5), (4, 6)
Create table:
x | y | xy | x² | y² |
---|---|---|---|---|
1 | 2 | 2 | 1 | 4 |
2 | 3 | 6 | 4 | 9 |
3 | 5 | 15 | 9 | 25 |
4 | 6 | 24 | 16 | 36 |
Σ = 10 | Σ = 16 | Σ = 47 | Σ = 30 | Σ = 74 |
Apply formula: $n = 4$
$$r = \frac{4(47) - (10)(16)}{\sqrt{[4(30) - 10^2][4(74) - 16^2]}}$$
$$r = \frac{188 - 160}{\sqrt{[120 - 100][296 - 256]}}$$
$$r = \frac{28}{\sqrt{20 \times 40}} = \frac{28}{\sqrt{800}} = \frac{28}{28.28} \approx 0.99$$
Answer: $r \approx 0.99$ (very strong positive correlation)
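Steps 1 to 5 can be followed in Python as a checking tool. This is a sketch; the function names `correlation_table` and `correlation_from_sums` are our own.

```python
import math

def correlation_table(points):
    """Steps 1-3: build the x, y, xy, x^2, y^2 columns and sum each one."""
    rows = [(x, y, x * y, x * x, y * y) for x, y in points]
    sums = [sum(col) for col in zip(*rows)]   # zip(*rows) turns rows into columns
    return rows, sums

def correlation_from_sums(n, sx, sy, sxy, sx2, sy2):
    """Steps 4-5: substitute the column sums into the formula for r."""
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sx2 - sx**2) * (n * sy2 - sy**2))
    return numerator / denominator

points = [(1, 2), (2, 3), (3, 5), (4, 6)]
rows, (sx, sy, sxy, sx2, sy2) = correlation_table(points)
for row in rows:
    print(row)
print("sums:", sx, sy, sxy, sx2, sy2)   # sums: 10 16 47 30 74
r = correlation_from_sums(len(points), sx, sy, sxy, sx2, sy2)
print(f"r = {r:.2f}")                  # r = 0.99
```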
5-6. Write and Interpret Lines of Best Fit
Line of Best Fit: A line that best represents the data in a scatter plot
Also called: Trend line or regression line
Purpose: To model relationship and make predictions
Form: $y = mx + b$ (slope-intercept form)
Line of Best Fit Equation:
$$y = mx + b$$
where:
• $m$ = slope (rate of change)
• $b$ = y-intercept (value when x = 0)
Slope Formula (using two points on line):
$$m = \frac{y_2 - y_1}{x_2 - x_1}$$
Finding b: Use a point on the line
$$b = y - mx$$
Example 1: Write equation of line of best fit
Given: Line passes through (1, 3) and (5, 11)
Find slope:
$$m = \frac{11 - 3}{5 - 1} = \frac{8}{4} = 2$$
Find y-intercept using (1, 3):
$3 = 2(1) + b$
$3 = 2 + b$
$b = 1$
Equation: $y = 2x + 1$
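The same two steps (slope, then b = y - mx) as a small Python sketch. It assumes integer coordinates so that `Fraction` can keep the arithmetic exact; the name `line_through` is our own.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Slope-intercept form of the line through two points.

    Assumes integer coordinates; Fraction keeps m and b exact (no rounding).
    """
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    m = Fraction(y2 - y1, x2 - x1)   # m = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1                  # b = y - mx, using the first point
    return m, b

m, b = line_through((1, 3), (5, 11))
print(f"y = {m}x + {b}")   # y = 2x + 1
```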
Interpreting Lines of Best Fit
Interpretation Guide:
Slope (m):
• Represents rate of change
• "For every 1 unit increase in x, y changes by m units"
• Positive m: y increases as x increases
• Negative m: y decreases as x increases
Y-intercept (b):
• Value of y when x = 0
• Starting value or initial amount
• May or may not be meaningful in context
Making Predictions:
• Substitute x-value into equation
• Solve for y
• Interpolation: Predicting within data range (generally more reliable)
• Extrapolation: Predicting outside data range (less reliable)
Example 2: Interpret line of best fit
Equation: $y = 5x + 60$
Context: x = hours studied, y = test score
Slope interpretation:
For every additional hour studied, test score increases by 5 points on average.
Y-intercept interpretation:
A student who studies 0 hours would be expected to score 60 points.
Prediction: If a student studies 8 hours:
$y = 5(8) + 60 = 40 + 60 = 100$ points
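A Python sketch of the prediction step, with a check for interpolation vs. extrapolation. The example does not list its data, so the observed study times below are hypothetical, chosen only to show how the check works.

```python
def predict(m, b, x):
    """Substitute x into y = mx + b."""
    return m * x + b

def is_extrapolation(x, observed_xs):
    """True when x lies outside the range of the observed x-values."""
    return x < min(observed_xs) or x > max(observed_xs)

# y = 5x + 60 from Example 2 (x = hours studied, y = test score)
m, b = 5, 60
# HYPOTHETICAL study times; the example gives no data
hours_observed = [1, 2, 3, 4, 5, 6]

for hours in (3.5, 8):
    score = predict(m, b, hours)
    kind = "extrapolation" if is_extrapolation(hours, hours_observed) else "interpolation"
    print(f"{hours} h -> {score} points ({kind})")
```

With this made-up range, 3.5 hours (77.5 points) is interpolation and 8 hours (100 points) is extrapolation, so the second prediction deserves less trust.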
7-9. Find, Interpret, and Analyze Regression Lines
Regression Line: The line of best fit calculated using least squares method
Least Squares: Minimizes sum of squared vertical distances from points to line
Also called: Linear regression, least squares regression line (LSRL)
Linear Regression Formulas:
Slope:
$$m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
Or using r and the standard deviations:
$$m = r \cdot \frac{s_y}{s_x}$$
Y-intercept:
$$b = \bar{y} - m\bar{x}$$
where:
• $\bar{x}$ = mean of x-values
• $\bar{y}$ = mean of y-values
• $s_x$ = standard deviation of x
• $s_y$ = standard deviation of y
• $r$ = correlation coefficient
Regression Equation:
$$\hat{y} = mx + b$$
(hat symbol indicates predicted value)
Example 1: Find regression equation
Given data: (2, 3), (4, 5), (6, 7), (8, 10)
Calculate means:
$\bar{x} = \frac{2+4+6+8}{4} = 5$
$\bar{y} = \frac{3+5+7+10}{4} = 6.25$
Column sums:
$\sum x = 20$, $\sum y = 25$, $\sum xy = 6 + 20 + 42 + 80 = 148$, $\sum x^2 = 4 + 16 + 36 + 64 = 120$, $n = 4$
Calculate slope:
$$m = \frac{4(148) - (20)(25)}{4(120) - 20^2} = \frac{592 - 500}{480 - 400} = \frac{92}{80} = 1.15$$
Calculate y-intercept:
$$b = 6.25 - 1.15(5) = 6.25 - 5.75 = 0.5$$
Regression equation: $\hat{y} = 1.15x + 0.5$
Analyzing Regression Lines
Key Concepts:
Residual: Difference between actual and predicted value
$$\text{Residual} = y - \hat{y}$$
Positive residual: Actual value above prediction
Negative residual: Actual value below prediction
Coefficient of Determination ($r^2$):
• Square of correlation coefficient
• Represents proportion of variation in y explained by x
• Range: 0 to 1
• Example: $r^2 = 0.81$ means 81% of variation in y is explained by x
Example 2: Calculate and interpret residual
Regression equation: $\hat{y} = 2x + 3$
Actual data point: (5, 15)
Predicted value:
$\hat{y} = 2(5) + 3 = 13$
Residual:
$15 - 13 = 2$
Interpretation: The actual y-value is 2 units higher than predicted by the regression line.
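A Python sketch of both key concepts: it repeats this residual calculation, then fits a least-squares line to the data from Example 1 ((2, 3), (4, 5), (6, 7), (8, 10)) and computes $r^2$ as $1 - \frac{\text{sum of squared residuals}}{\text{total variation in } y}$. For a least-squares line this equals the square of r. The helper names are our own.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept (deviation form)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar

def residual(y, y_hat):
    """Residual = actual - predicted."""
    return y - y_hat

# Example 2: y-hat = 2x + 3 and the data point (5, 15)
print(residual(15, 2 * 5 + 3))   # 2

# r^2 for the data from Example 1
xs, ys = [2, 4, 6, 8], [3, 5, 7, 10]
m, b = fit_line(xs, ys)
res = [residual(y, m * x + b) for x, y in zip(xs, ys)]
y_bar = sum(ys) / len(ys)
ss_res = sum(e ** 2 for e in res)            # variation the line leaves unexplained
ss_tot = sum((y - y_bar) ** 2 for y in ys)   # total variation in y
r_squared = 1 - ss_res / ss_tot
print("residuals:", [round(e, 2) for e in res])   # residuals: [0.2, -0.1, -0.4, 0.3]
print(f"r^2 = {r_squared:.3f}")                   # r^2 = 0.989
```

The residuals add up to 0, as they always do for a least-squares line, and about 98.9% of the variation in y is explained by x.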
10. Exponential Regression
Exponential Regression: Finding best-fit exponential curve for data
Used when: Data shows exponential growth or decay pattern
Form: $y = ab^x$ or $y = ae^{kx}$
Exponential Model:
$$y = ab^x$$
where:
• $a$ = initial value (y-intercept, when x = 0)
• $b$ = growth/decay factor
• If $b > 1$: exponential growth
• If $0 < b < 1$: exponential decay
Alternative form:
$$y = ae^{kx}$$
where:
• $k > 0$: growth
• $k < 0$: decay
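The two forms describe the same curves. Writing $e^{kx}$ as $(e^k)^x$ shows how their constants are linked:
$$ae^{kx} = a\left(e^{k}\right)^{x} \quad\Longrightarrow\quad b = e^{k}, \qquad k = \ln b$$
So $k > 0$ exactly when $b > 1$ (growth), and $k < 0$ exactly when $0 < b < 1$ (decay). For example, doubling ($b = 2$) corresponds to $k = \ln 2 \approx 0.693$.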
When to Use Exponential vs Linear:
Use Linear when:
• Constant rate of change
• Points follow straight line
• Add/subtract same amount each time
Use Exponential when:
• Rate of change increases/decreases
• Points follow curved pattern
• Multiply by same factor each time
• Data doubles, triples, or halves at regular intervals
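The "add the same amount" vs. "multiply by the same factor" test can be written as a short Python check. This is a sketch under stated assumptions: the y-values are taken at equally spaced x-values, there are at least two of them, and they follow an exact pattern like a textbook table (real data with scatter needs a regression fit instead). `classify_pattern` is our own name.

```python
def classify_pattern(ys, tol=1e-9):
    """Classify y-values taken at equally spaced x-values.

    Assumes at least two values; the ratio test also needs nonzero values.
    """
    diffs = [b - a for a, b in zip(ys, ys[1:])]
    if all(abs(d - diffs[0]) <= tol for d in diffs):
        return "linear"           # same amount added each step
    if all(y != 0 for y in ys):
        ratios = [b / a for a, b in zip(ys, ys[1:])]
        if all(abs(q - ratios[0]) <= tol for q in ratios):
            return "exponential"  # same factor multiplied each step
    return "neither"

print(classify_pattern([2, 5, 8, 11, 14]))  # linear
print(classify_pattern([2, 6, 18, 54]))     # exponential
print(classify_pattern([1, 4, 9, 16]))      # neither
```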
Example 1: Identify exponential pattern
Data: (0, 5), (1, 10), (2, 20), (3, 40)
Check for pattern:
$\frac{10}{5} = 2$, $\frac{20}{10} = 2$, $\frac{40}{20} = 2$
Each y-value is double the previous → exponential!
Model: $y = 5(2)^x$
• Initial value: $a = 5$
• Growth factor: $b = 2$ (doubles each time)
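When the ratios are only roughly constant, one common way to fit $y = ab^x$ is to take logarithms: $\ln y = \ln a + x \ln b$ is linear in x, so an ordinary least-squares line through the points $(x, \ln y)$ gives $\ln a$ and $\ln b$. A sketch, assuming every y-value is positive; note that this minimizes squared errors in $\ln y$, not in y, so it can differ slightly from other exponential-fitting methods. `exponential_fit` is our own name.

```python
import math

def exponential_fit(xs, ys):
    """Fit y = a * b**x by a least-squares line through (x, ln y).

    Assumes every y-value is positive (ln is undefined otherwise).
    """
    n = len(xs)
    ls = [math.log(y) for y in ys]      # ln y = ln a + x ln b is linear in x
    x_bar, l_bar = sum(xs) / n, sum(ls) / n
    slope = (sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, ls))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = l_bar - slope * x_bar
    return math.exp(intercept), math.exp(slope)   # a, b

a, b = exponential_fit([0, 1, 2, 3], [5, 10, 20, 40])
print(f"y = {a:.2f}({b:.2f})^x")   # y = 5.00(2.00)^x
```

Because this data doubles exactly, the fit recovers the model $y = 5(2)^x$.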
11. Correlation and Causation
Correlation: A statistical relationship between two variables
Causation: One variable directly causes changes in another
Key Principle: Correlation does NOT imply causation!
Important Distinctions:
Correlation means:
• Two variables are associated
• They change together
• You can predict one from the other
• Does NOT mean one causes the other
Causation means:
• One variable directly influences another
• Change in one causes change in the other
• There is a cause-and-effect relationship
• Much harder to prove than correlation
Why Correlation ≠ Causation
Three Main Reasons:
1. Third Variable (Confounding Variable):
• A hidden variable affects both
• Example: Ice cream sales and drowning deaths
→ Both caused by hot weather (third variable)
2. Reverse Causation (Directionality Problem):
• Don't know which variable causes which
• Example: Depression and low vitamin D
→ Does depression cause low vitamin D, or vice versa?
3. Coincidence:
• Pure chance
• Example: Number of Nicolas Cage movies and swimming pool drownings
→ No real connection, just coincidence
Example 1: Correlation without causation
Observation: There is a strong positive correlation between shoe size and reading ability in children.
Do large feet cause better reading? NO!
Explanation:
• Third variable: AGE
• Older children have bigger feet
• Older children read better
• Age causes both variables to increase
Conclusion: Correlation exists, but no causal relationship between shoe size and reading ability.
Example 2: Identify causation
Scenario A: Hours of exercise and calories burned
Analysis: Exercise directly causes calorie burning
Conclusion: Causation ✓
Scenario B: Coffee consumption and heart disease
Analysis: Many confounding variables (stress, sleep, diet)
Conclusion: Correlation, but causation unclear
Scenario C: Hours studied and test scores
Analysis: Studying directly improves knowledge
Conclusion: Strong evidence for causation ✓
Establishing Causation
Requirements for Causation:
1. Correlation exists: Variables must be related
2. Temporal precedence: Cause must come before effect
3. No alternative explanation: Rule out confounding variables
Gold Standard: Controlled experiment
• Random assignment
• Control group vs. experimental group
• Manipulate one variable, measure effect on other
• Control for confounding variables
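Random assignment means chance alone decides who goes into which group, so confounding variables tend to balance out between the groups. A minimal Python sketch using hypothetical participant IDs; `randomly_assign` is our own name, and the even split into two groups is an assumption.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle, then split into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)   # copy so the caller's list is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]   # control, experimental

# HYPOTHETICAL participant IDs S01 ... S20
ids = [f"S{i:02d}" for i in range(1, 21)]
control, experimental = randomly_assign(ids, seed=42)
print("control:     ", sorted(control))
print("experimental:", sorted(experimental))
```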
Correlation Coefficient Guide
r Value | Strength | Direction | Description |
---|---|---|---|
$r = 1$ | Perfect | Positive | All points on line, upward slope |
$0.7 < r < 1$ | Strong | Positive | Points tightly clustered, upward trend |
$0.3 < r < 0.7$ | Moderate | Positive | Clear upward trend with scatter |
$0 < r < 0.3$ | Weak | Positive | Slight upward trend, much scatter |
$r = 0$ | None | None | No linear relationship |
$-0.3 < r < 0$ | Weak | Negative | Slight downward trend |
$-0.7 < r < -0.3$ | Moderate | Negative | Clear downward trend with scatter |
$-1 < r < -0.7$ | Strong | Negative | Points tightly clustered, downward trend |
$r = -1$ | Perfect | Negative | All points on line, downward slope |
Linear vs Exponential Models
Feature | Linear Model | Exponential Model |
---|---|---|
Equation | $y = mx + b$ | $y = ab^x$ |
Shape | Straight line | Curved (J-shape or decay) |
Rate of Change | Constant (same each time) | Increasing or decreasing |
Pattern | Add/subtract same amount | Multiply/divide by same factor |
Example | 2, 5, 8, 11, 14 (+3 each) | 2, 6, 18, 54 (×3 each) |
Real-world | Constant speed, hourly wage | Population growth, compound interest |
Correlation vs Causation
Aspect | Correlation | Causation |
---|---|---|
Definition | Variables are related | One variable causes another |
How to Find | Calculate r, observe pattern | Controlled experiment required |
Implication | Can predict, not explain | Changing one affects the other |
Example | Shoe size and reading ability | Exercise and calories burned |
Caution | May have third variable | Hard to prove definitively |
Success Tips for Bivariate Statistics:
✓ Scatter plots show relationships between two quantitative variables
✓ Correlation coefficient r measures strength: closer to ±1 = stronger
✓ Positive r: both variables increase together; Negative r: one decreases as other increases
✓ Line of best fit equation: y = mx + b (slope + y-intercept)
✓ Slope tells rate of change; y-intercept is starting value
✓ Outliers don't fit the pattern and can affect correlation/regression
✓ r² shows percentage of variation explained (coefficient of determination)
✓ Residual = actual - predicted value
✓ Use exponential model when data multiplies by constant factor
✓ CORRELATION ≠ CAUSATION! Always consider third variables!