Bivariate Statistics

Complete Notes & Formulae for Twelfth Grade (Precalculus)

1. Scatter Plots

Definition:

A scatter plot is a graph that shows the relationship between two quantitative variables

Independent variable (x): Plotted on horizontal axis

Dependent variable (y): Plotted on vertical axis

• Each point represents one observation (x, y)

Outliers in Scatter Plots:

An outlier is a point that lies far away from the general pattern of the data

• Outliers can significantly affect correlation and regression

• Always check for outliers before analysis

• Outliers may indicate measurement errors or special cases
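The effect of a single outlier on correlation can be seen in a small sketch. The data are made up for illustration, and `pearson_r` is a hypothetical helper name that applies the standard correlation formula (given in Section 2); neither comes from these notes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data with a roughly increasing trend.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_clean = pearson_r(xs, ys)

# Same data plus one made-up point far below the trend.
r_outlier = pearson_r(xs + [6], ys + [0])

print(round(r_clean, 2))    # 0.77
print(round(r_outlier, 2))  # -0.22
```

In this example, one extra point is enough to turn a fairly strong positive correlation into a weak negative one.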

2. Correlation Coefficient (r)

Definition:

The correlation coefficient (r) measures the strength and direction of a linear relationship between two variables

\[ r = \frac{\sum(x - \bar{x})(y - \bar{y})}{\sqrt{\sum(x - \bar{x})^2 \sum(y - \bar{y})^2}} \]

\[ \text{or} \quad r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} \]
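As a check that the two formulas agree, here is a minimal sketch that computes r both ways on a small made-up data set. The function names and the data are illustrative assumptions, not part of these notes.

```python
import math

def correlation_deviation(xs, ys):
    """r from deviations about the means (first formula)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_sums(xs, ys):
    """r from raw sums (second, computational formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Made-up illustration data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_dev = correlation_deviation(xs, ys)
r_sum = correlation_sums(xs, ys)
print(round(r_dev, 4), round(r_sum, 4))  # 0.7746 0.7746
```

The second formula is usually easier by hand because it needs only the five sums \( \sum x \), \( \sum y \), \( \sum xy \), \( \sum x^2 \), \( \sum y^2 \).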

Properties of r:

• Range: \( -1 \leq r \leq 1 \)

• \( r = 1 \): Perfect positive linear correlation

• \( r = -1 \): Perfect negative linear correlation

• \( r = 0 \): No linear correlation (a nonlinear relationship may still exist)

• \( 0 < r < 1 \): Positive correlation (both increase together)

• \( -1 < r < 0 \): Negative correlation (one increases, other decreases)

Strength Interpretation:

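| Value of \( \lvert r \rvert \) | Strength |
|---|---|
| \( 0.0 \leq \lvert r \rvert < 0.3 \) | Weak |
| \( 0.3 \leq \lvert r \rvert < 0.7 \) | Moderate |
| \( 0.7 \leq \lvert r \rvert \leq 1.0 \) | Strong |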
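The sign and size of r can be read off mechanically, as in the sketch below. It uses the cutoffs 0.3 and 0.7 from the table above. Treating a value exactly on a cutoff as the stronger category is an assumption made here, and `describe_correlation` is a hypothetical name.

```python
def describe_correlation(r):
    """Return (direction, strength) for a correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    # Assumption: values exactly at 0.3 or 0.7 go to the stronger category.
    if size < 0.3:
        strength = "weak"
    elif size < 0.7:
        strength = "moderate"
    else:
        strength = "strong"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return direction, strength

print(describe_correlation(0.85))   # ('positive', 'strong')
print(describe_correlation(-0.45))  # ('negative', 'moderate')
print(describe_correlation(0.1))    # ('positive', 'weak')
```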

3. Linear Regression Line

Equation:

The regression line (line of best fit) has the equation:

\[ \hat{y} = a + bx \]

where:

• \( \hat{y} \) = predicted value of y

• \( a \) = y-intercept (value of y when x = 0)

• \( b \) = slope (change in y per unit change in x)

• \( x \) = independent variable

Finding Slope and Intercept:

Slope:

\[ b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \]

Or using correlation:

\[ b = r \frac{s_y}{s_x} \]

Y-Intercept:

\[ a = \bar{y} - b\bar{x} \]

where:

• \( s_x, s_y \) = standard deviations of x and y

• \( \bar{x}, \bar{y} \) = means of x and y
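The following sketch works through both slope formulas and the intercept formula on a small made-up data set. It assumes sample standard deviations (dividing by n − 1); the data and variable names are illustrative only.

```python
import math

# Made-up illustration data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# Slope from the raw-sum formula.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Slope from b = r * s_y / s_x.
x_bar, y_bar = sum_x / n, sum_y / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_x = math.sqrt(sxx / (n - 1))  # sample standard deviation of x
s_y = math.sqrt(syy / (n - 1))  # sample standard deviation of y
r = sxy / math.sqrt(sxx * syy)
b_from_r = r * s_y / s_x

# Intercept from the means.
a = y_bar - b * x_bar

print(round(b, 4), round(b_from_r, 4))  # 0.6 0.6
print(round(a, 4))                      # 2.2
```

For this data the regression line is \( \hat{y} = 2.2 + 0.6x \).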

4. Interpreting Regression Lines

Slope Interpretation:

The slope tells us how much y changes for each 1-unit increase in x

Example:

If \( \hat{y} = 50 + 3x \) where y = test score, x = study hours

Interpretation: For each additional hour of study, the predicted test score increases by 3 points

Y-intercept: With no studying (x = 0), the predicted score is 50

Making Predictions:

Substitute the x-value into the regression equation to predict y

⚠️ Caution: Only predict within the range of x-values in your data (interpolation)

Predicting outside the data range (extrapolation) can be unreliable
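A prediction routine can flag extrapolation automatically. The sketch below uses the example line \( \hat{y} = 50 + 3x \) from above. The observed range of 0 to 10 study hours is a hypothetical assumption, as are the function names.

```python
def predict(x, a=50.0, b=3.0):
    """Predicted test score for x study hours, using y-hat = a + b*x."""
    return a + b * x

def predict_checked(x, x_min, x_max):
    """Return (prediction, is_interpolation) for the given observed x-range."""
    return predict(x), x_min <= x <= x_max

# Hypothetical: the study hours in the data ranged from 0 to 10.
print(predict_checked(4, 0, 10))   # (62.0, True)
print(predict_checked(25, 0, 10))  # (125.0, False)
```

The second call is an extrapolation. If the test were scored out of 100 (an assumption), a predicted score of 125 would be impossible, which shows why predictions far outside the data range should not be trusted.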

5. Coefficient of Determination (r²)

Definition:

r² measures the proportion of the variance in y that is explained by its linear relationship with x

\[ r^2 = (\text{correlation coefficient})^2 \]

• Range: \( 0 \leq r^2 \leq 1 \)

• \( r^2 = 0.75 \) means 75% of variation in y is explained by x

• Higher r² indicates better fit of regression line to data
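To see why r² is read as "proportion of variation explained", the sketch below compares it with 1 − SS_res/SS_tot for a least-squares line, where SS_res is the sum of squared residuals and SS_tot is the total sum of squared deviations of y from its mean. These two names are standard notation but do not appear in these notes, and the data are made up.

```python
import math

# Made-up illustration data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least-squares line y-hat = a + b*x.
b = sxy / sxx
a = y_bar - b * x_bar

# Variation left unexplained by the line, and total variation in y.
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
ss_tot = syy

r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2
explained_fraction = 1 - ss_res / ss_tot

print(round(r_squared, 4), round(explained_fraction, 4))  # 0.6 0.6
```

Here 60% of the variation in y is accounted for by the line, and the remaining 40% is left in the residuals.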

6. Residuals

Definition:

A residual is the difference between observed and predicted values

\[ \text{Residual} = y - \hat{y} \]

• Positive residual: Actual value is above the regression line

• Negative residual: Actual value is below the regression line

• The residuals always sum to zero for a least-squares line that includes an intercept

Residual Plots:

Plot residuals vs. x-values to check if linear model is appropriate

• Random scatter: Linear model is appropriate

• Pattern present: Linear model may not be appropriate
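The residual calculation can be sketched as follows, on made-up data with a least-squares line fitted in the same block. The variable names are illustrative assumptions.

```python
# Made-up illustration data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar, y_bar = sum(xs) / n, sum(ys) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

# Residual = observed y minus predicted y-hat.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print([round(e, 2) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(abs(sum(residuals)) < 1e-9)        # True
```

Plotting each residual against its x-value gives the residual plot. With only five points no firm conclusion about pattern is possible, but the signs here alternate without an obvious curve.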

7. Exponential Regression

Model:

When data shows exponential growth or decay, use exponential regression

\[ y = ab^x \]

where:

• \( a \) = initial value (when x = 0)

• \( b \) = growth/decay factor

• If \( b > 1 \): exponential growth

• If \( 0 < b < 1 \): exponential decay

Finding the Model:

Method: Transform data using logarithms, then perform linear regression

Step 1: Take ln of both sides (this requires every y-value to be positive)

\[ \ln(y) = \ln(a) + x \cdot \ln(b) \]

Step 2: This is linear in form: Y = C + mX

where Y = ln(y), C = ln(a), m = ln(b)

Step 3: Find a and b

\[ a = e^C, \quad b = e^m \]
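The three steps above can be sketched as a short function. The name `fit_exponential` is hypothetical, and the data are made up to lie exactly on \( y = 3 \cdot 2^x \) so the result can be checked by eye.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a * b**x by linear regression of ln(y) on x."""
    if any(y <= 0 for y in ys):
        raise ValueError("all y-values must be positive to take logarithms")
    # Step 1: transform y to Y = ln(y).
    log_ys = [math.log(y) for y in ys]
    # Step 2: least-squares line Y = C + m*x.
    n = len(xs)
    x_bar = sum(xs) / n
    log_y_bar = sum(log_ys) / n
    m = sum((x - x_bar) * (Y - log_y_bar) for x, Y in zip(xs, log_ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    C = log_y_bar - m * x_bar
    # Step 3: convert back with a = e^C and b = e^m.
    return math.exp(C), math.exp(m)

# Made-up data lying exactly on y = 3 * 2**x.
xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]

a, b = fit_exponential(xs, ys)
print(round(a, 4), round(b, 4))  # 3.0 2.0
```

Note that this method minimizes squared errors in ln(y) rather than in y itself, so on noisy data it can give slightly different values of a and b than a direct nonlinear fit would.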

8. Choosing the Right Model

Model Selection:

| Pattern in Scatter Plot | Model to Use |
|---|---|
| Straight line pattern | Linear: \( y = a + bx \) |
| J-shaped curve (rapid growth) | Exponential: \( y = ab^x \) |
| U-shaped curve | Quadratic: \( y = ax^2 + bx + c \) |
| No clear pattern | No relationship |

9. Quick Reference Summary

Key Formulas:

Correlation: \( r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} \)

Linear Regression: \( \hat{y} = a + bx \)

Slope: \( b = r\frac{s_y}{s_x} \)

Intercept: \( a = \bar{y} - b\bar{x} \)

Residual: \( y - \hat{y} \)

Exponential: \( y = ab^x \)

📚 Study Tips

✓ Correlation measures strength and direction of linear relationship

✓ Correlation does NOT imply causation

✓ Regression line always passes through (\(\bar{x}\), \(\bar{y}\))

✓ Check residual plots to verify linear model is appropriate

✓ Use exponential regression when data shows rapid growth or decay
