April 14, 2024

Bivariate statistics

Bivariate statistics are about relationships between two different variables. You can plot your individual pairs of measurements as (x, y) coordinates on a scatter diagram. Analysing bivariate data allows you to assess the relationship between the two measured variables; we describe this relationship as correlation.

Scatter diagrams

Perfect positive correlation

r = 1


No correlation

r = 0


Weak negative correlation

−1 < r < 0

Using statistical methods, we can find the mathematical model that best describes the relationship between the two measured variables; this is called regression. In your exam you will be expected to find linear regression models using your GDC.

7.4.1 Regression line

The regression line is a linear mathematical model describing the relationship between the two measured variables. It can be used to estimate values for points for which we have no actual data. There are two different types of regression line: y on x (equation y = ax + b), which estimates y for a given value of x, and x on y (equation x = cy + d), which estimates x for a given value of y. If the correlation between the data is perfect, then the two regression lines will be the same.
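To make the two types of regression line concrete, here is a minimal Python sketch that fits both by least squares. The helper name fit_line and the data values are assumptions made up for illustration; they are not taken from these notes, and in the exam you would use your GDC instead.

```python
def fit_line(u, v):
    """Least-squares line v = slope * u + intercept."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    slope = s_uv / s_uu
    intercept = mean_v - slope * mean_u
    return slope, intercept


# Illustrative data only (hypothetical values, not from the notes).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)  # y on x: y = a*x + b, estimates y from x
c, d = fit_line(y, x)  # x on y: x = c*y + d, estimates x from y
print(f"y on x: y = {a:.3f}x + {b:.3f}")
print(f"x on y: x = {c:.3f}y + {d:.3f}")

# With perfectly correlated data the two fits describe the same line,
# so the slopes are reciprocals of each other (a_p * c_p = 1).
y_perfect = [2 * xi + 1 for xi in x]
a_p, b_p = fit_line(x, y_perfect)
c_p, d_p = fit_line(y_perfect, x)
print(f"perfect data: a * c = {a_p * c_p:.3f}")
```

For imperfect data the product of the two slopes equals r squared, which is less than 1, so the two lines differ.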

However, one has to be careful when extrapolating (estimating beyond the range of the actual data points), as this carries greater uncertainty. In general, you should not use your regression line to estimate values outside the range of the data set it was based on.
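One way to picture this rule is a small Python sketch that refuses to extrapolate. The function name estimate_y, the coefficients, and the data range are all hypothetical and only serve to illustrate the idea.

```python
def estimate_y(x_value, a, b, x_data):
    """Estimate y = a*x + b, but only inside the range of the x data."""
    lowest, highest = min(x_data), max(x_data)
    if not lowest <= x_value <= highest:
        # Outside the observed range the estimate would be an extrapolation.
        raise ValueError(
            f"x = {x_value} is outside the data range [{lowest}, {highest}]"
        )
    return a * x_value + b


# Hypothetical regression line y = 2x + 1 fitted on x values from 1 to 5.
x_data = [1, 2, 3, 4, 5]
print(estimate_y(3.5, 2.0, 1.0, x_data))  # interpolation: allowed

try:
    estimate_y(10, 2.0, 1.0, x_data)      # extrapolation: rejected
except ValueError as error:
    print(error)
```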

7.4.2 Pearson’s correlation coefficient  (−1 ≤ r ≤ 1)

Besides simply estimating the correlation between two variables from a scatter diagram, you can calculate a value that will describe it in a standardised way. This value is referred to as Pearson’s correlation coefficient (r).

r = 0 means no correlation.

r = ±1 means a perfect positive/negative correlation.
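Your GDC calculates r for you, but a short Python sketch of the standard formula r = Sxy / sqrt(Sxx · Syy) can help show where the values 1, 0 and −1 come from. The function name pearson_r and the three small data sets are assumptions chosen for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data sets, one for each special case.
x = [1, 2, 3, 4]
r_positive = pearson_r(x, [3, 5, 7, 9])  # points on a rising straight line
r_negative = pearson_r(x, [9, 7, 5, 3])  # points on a falling straight line
r_none = pearson_r(x, [1, 2, 2, 1])      # no linear trend

print(r_positive, r_negative, r_none)
```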

Interpretation of r-values:

[Figure: interpretation of r-values]

Note: Remember that correlation ≠ causation.

Calculate r while finding the regression equation on your GDC. Make sure that STAT DIAGNOSTICS is turned ON (found in the MODE settings); otherwise the r-value will not appear.

When asked to “comment on” an r-value, make sure to state both whether the correlation is:

  1. positive / negative and
  2. strong / moderate / weak / very weak
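The two-part comment can be sketched as a small Python function. The cut-off values 0.75, 0.5 and 0.25 are assumptions chosen purely for illustration, and the name comment_on_r is hypothetical; use the boundaries given in your own course material when answering exam questions.

```python
def comment_on_r(r):
    """Describe r by direction and strength (cut-offs are assumed, not official)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no correlation"

    # Part 1: positive or negative.
    direction = "positive" if r > 0 else "negative"

    # Part 2: strength, using assumed illustrative boundaries.
    size = abs(r)
    if size >= 0.75:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    elif size >= 0.25:
        strength = "weak"
    else:
        strength = "very weak"

    return f"{strength} {direction} correlation"


print(comment_on_r(0.986))
print(comment_on_r(-0.3))
```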

Bivariate-statistics type questions

The height of a plant was measured over the first 8 weeks.

1. Plot a scatter diagram

The line of best fit should pass through the mean point.

2. Use the mean point to draw a best fit line


3. Find the equation of the regression line using your GDC

y = 1.83x + 22.7

4. Comment on the result.

Pearson’s correlation coefficient is r = 0.986, which indicates a strong positive correlation.
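The fact used in the steps above, that the line of best fit passes through the mean point, can be checked with a short Python sketch. The data values below are made up for illustration and are not the plant measurements from the question.

```python
# Hypothetical data, not the measurements from the question.
x = [1, 2, 3, 4, 5, 6]
y = [3.0, 4.5, 4.9, 6.8, 7.1, 9.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares regression line of y on x: y = a*x + b.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
a = s_xy / s_xx
b = mean_y - a * mean_x  # this choice of b puts the mean point on the line

# Evaluating the line at the mean of x gives back the mean of y.
y_at_mean_x = a * mean_x + b
print(f"mean point: ({mean_x:.2f}, {mean_y:.2f})")
print(f"line at mean x: {y_at_mean_x:.2f}")
```

This is why, when drawing a line of best fit by hand, you plot the mean point first and pivot your ruler about it.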