Comprehensive Guide to Scatter Plots
Introduction to Scatter Plots
A scatter plot (also called a scatter diagram or scattergram) is a type of plot that shows the relationship between two numerical variables. Each point represents an individual data item with its position determined by the values of the two variables.
Key Characteristics:
- Shows relationships between two quantitative variables
- Each point represents a single observation
- Reveals correlation, but cannot by itself establish causation
- Helps identify patterns, trends, and outliers
- X-axis typically represents the independent variable
- Y-axis typically represents the dependent variable
[Figure: Basic scatter plot showing positive correlation]
Types of Scatter Plots
1. Positive Correlation
When one variable increases as the other variable increases, forming an upward trend. Examples include:
- Height vs. Weight
- Study Time vs. Test Scores
- Income vs. Spending
2. Negative Correlation
When one variable increases as the other variable decreases, forming a downward trend. Examples include:
- Price vs. Demand
- Age vs. Physical Reaction Speed
- Distance from City Center vs. Property Price
3. No Correlation
When there is no apparent relationship between the variables. Examples include:
- Shoe Size vs. Intelligence
- Hair Color vs. Mathematical Ability
- Month of Birth vs. Career Success
4. Non-Linear Relationship
When variables show a pattern that is not a straight line. Examples include:
- Age vs. Physical Performance (inverted U-shape)
- Dosage vs. Drug Effect (quadratic)
- Learning Time vs. Skill Level (logarithmic)
5. Clustered Data
When data points form distinct groups or clusters. Examples include:
- Customer Segments
- Species Characteristics
- Regional Economic Data
Creating and Reading Scatter Plots
How to Create a Scatter Plot
- Collect paired data - Each point requires values for both variables
- Set up coordinate axes - X-axis (horizontal) for independent variable, Y-axis (vertical) for dependent variable
- Scale the axes - Choose appropriate scales to capture the full range of data
- Plot the points - Place each data point at its (x,y) coordinate
- Label the chart - Add title, axis labels, units, and legend if needed
Example Data: Ice Cream Sales vs. Temperature
Temperature (°C) | Ice Cream Sales ($) |
---|---|
14 | 215 |
16 | 325 |
19 | 332 |
22 | 406 |
25 | 522 |
28 | 612 |
31 | 644 |
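The scaling and plotting steps can be illustrated with a minimal text-mode sketch that uses only the Python standard library. The function name `text_scatter` and the grid size are assumptions made for illustration; in practice a plotting library would normally handle axes, ticks, and labels.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Render paired data as a small text-mode scatter plot (illustrative helper)."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    if x_min == x_max or y_min == y_max:
        raise ValueError("each variable needs at least two distinct values")

    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value from its data range onto the character grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot

    body = "\n".join("|" + "".join(line).rstrip() for line in grid)
    return body + "\n+" + "-" * width  # simple x-axis underneath


# Paired data from the table above.
temperature_c = [14, 16, 19, 22, 25, 28, 31]
sales_usd = [215, 325, 332, 406, 522, 612, 644]

plot = text_scatter(temperature_c, sales_usd)
print("Ice cream sales (y) vs. temperature (x)")
print(plot)
```

The points rise from the lower left to the upper right, which is the upward trend expected for a positive correlation.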
How to Read and Interpret Scatter Plots
Key Elements to Analyze:
- Direction - Positive, negative, or no relationship
- Form - Linear, curved, or clustered
- Strength - How closely points follow a pattern
- Outliers - Points that deviate from the pattern
Pattern Interpretation:
- Tight cluster around a line = Strong correlation
- Scattered points = Weak correlation
- S-shaped = Non-linear relationship
- Separate clusters = Different groups in data
Important Note:
Correlation does not imply causation! Two variables may be related without one causing the other. Always consider external factors and potential confounding variables.
[Figure panels: Strong Positive Correlation, Weak Positive Correlation, Strong Negative Correlation, No Correlation]
Correlation and Regression
Correlation Coefficient (r)
The correlation coefficient (r) is a numerical measure of the strength and direction of the linear relationship between two variables.
- r = +1: Perfect positive correlation
- r = 0: No linear correlation (a non-linear relationship may still exist)
- r = -1: Perfect negative correlation
- 0 < |r| < 0.3: Weak correlation
- 0.3 ≤ |r| < 0.7: Moderate correlation
- 0.7 ≤ |r| < 1: Strong correlation
Pearson Correlation Formula:
r = Σ[(xi - x̄)(yi - ȳ)] / √[Σ(xi - x̄)² · Σ(yi - ȳ)²]
Where x̄ and ȳ are the means of the x and y variables
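As a minimal sketch, the Pearson formula can be translated term by term into Python using only the standard library. The function name `pearson_r` is an assumption for illustration, and the data are the ice cream figures from the example table.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed directly from the formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)

    # Numerator: sum of products of paired deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator terms: summed squared deviations of each variable.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)

    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return s_xy / sqrt(s_xx * s_yy)


temperature_c = [14, 16, 19, 22, 25, 28, 31]
sales_usd = [215, 325, 332, 406, 522, 612, 644]

r = pearson_r(temperature_c, sales_usd)
print(f"r = {r:.3f}")  # prints r = 0.985
```

A value this close to +1 falls in the strong positive range described above.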
[Figure panels: example scatter plots for r = +1, r = +0.8, r = +0.5, r = +0.2, and r = 0]
Linear Regression
Linear regression finds the line of best fit through the data points, allowing us to predict values and describe the relationship mathematically.
The Regression Line Equation:
y = mx + b
Where: m = slope, b = y-intercept
Steps to Calculate Linear Regression:
- Calculate the means of x and y values
- Calculate the slope (m):
m = Σ[(xi - x̄)(yi - ȳ)] / Σ(xi - x̄)²
- Calculate the y-intercept (b):
b = ȳ - m·x̄
- Construct the equation and draw the line
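The steps above can be sketched in a few lines of Python. The function name `fit_line` is an assumption for illustration; the code follows the slope and intercept formulas as written and is applied here to the ice cream data from the example table.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for the line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    # Step 1: means of x and y.
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Step 2: slope = sum of paired deviation products / sum of squared x deviations.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx

    # Step 3: intercept chosen so the line passes through (x_bar, y_bar).
    b = y_bar - m * x_bar
    return m, b


temperature_c = [14, 16, 19, 22, 25, 28, 31]
sales_usd = [215, 325, 332, 406, 522, 612, 644]

m, b = fit_line(temperature_c, sales_usd)
print(f"Sales = {m:.2f} * Temperature + ({b:.2f})")
```

Because the intercept is derived from the means, the fitted line always passes through the point (x̄, ȳ).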
Example: Predicting with Regression
For the ice cream sales data above, a least-squares fit gives:
Sales = 25.26 × Temperature - 122.66
This means:
- For each 1°C increase in temperature, ice cream sales increase by about $25.26
- At 0°C, the model predicts sales of -$122.66, which is not realistic and shows the limitation of extrapolating beyond the observed range (14°C to 31°C)
- Within the observed range we can predict sales for a given temperature, e.g., at 27°C: Sales = 25.26 × 27 - 122.66 = $559.36
Interactive Examples
Outlier Effect Demonstration
This example shows how a single outlier can dramatically affect the correlation and regression line.
- Without outlier: r = 0.91, y = 1.8x + 10.2
- With outlier: r = 0.42, y = 0.7x + 37.5
Key Insights:
- Outliers can significantly change correlation coefficients
- Regression lines can be heavily influenced by extreme points
- Always check for and investigate outliers before drawing conclusions
- Consider whether outliers represent errors or meaningful anomalies
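The effect can be reproduced with a small sketch. The data points below are hypothetical, chosen only for illustration, and are not the points behind the demonstration figures above; `summarize` is likewise an illustrative helper name.

```python
from math import sqrt


def summarize(points):
    """Return (r, slope, intercept) for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    s_yy = sum((y - y_bar) ** 2 for _, y in points)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    r = s_xy / sqrt(s_xx * s_yy)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return r, slope, intercept


# Hypothetical, nearly linear data (not taken from the demonstration above).
clean = [(1, 12), (2, 14), (3, 15), (4, 18), (5, 19), (6, 22), (7, 23), (8, 26)]
# One hypothetical point far below the trend.
outlier = (9, 5)

r_clean, m_clean, b_clean = summarize(clean)
r_out, m_out, b_out = summarize(clean + [outlier])

print(f"without outlier: r = {r_clean:.2f}, y = {m_clean:.2f}x + {b_clean:.2f}")
print(f"with outlier:    r = {r_out:.2f}, y = {m_out:.2f}x + {b_out:.2f}")
```

Adding a single point is enough to pull both the correlation coefficient and the slope down sharply, which is why outliers deserve a closer look before any conclusion is drawn.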
Test Your Knowledge: Quiz
Question 1:
Which of the following scatter plots shows a strong positive correlation?
Question 2:
A correlation coefficient of r = -0.9 indicates:
Question 3:
For the scatter plot below, which of the following is the most appropriate regression line?
Question 4:
Which of the following real-world relationships would likely show a negative correlation?
Question 5:
What does the y-intercept (b) in a regression equation y = mx + b represent?