### Definitions

**Population** the entire group from which statistical data is drawn (and which the statistics obtained represent).

**Sample** the observations actually selected from the population for a statistical test.

**Random Sample** a sample selected from the population without bias, so that every member has an equal chance of being chosen.

**Discrete** finite or countable number of possible values (e.g. money, number of people)

**Continuous** can take any value within a range, with infinitely many possible values (e.g. time, weight)

*Note: continuous data can be presented as discrete data, e.g. if you round time to the nearest minute or weight to the nearest kilogram.*

### 7.1. Descriptive statistics

For one-variable data with frequencies, use 1-Var Stats on the GDC.

**Mean** the average value: the sum of all data values divided by the number of values

**Mode** the value that occurs most often

**Median** when the data set is ordered low to high and the number of data points is:

- odd, then the median is the middle value;
- even, then the median is the average of the two middle values.
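
The three averages above can be checked with a short sketch using Python's standard `statistics` module. The data set is made up purely for illustration.

```python
from statistics import mean, median, mode

# Hypothetical data set, chosen only for illustration.
data = [2, 3, 3, 5, 7, 8, 9]

data_mean = mean(data)      # sum of the values divided by the number of values
data_mode = mode(data)      # the value that occurs most often
data_median = median(data)  # odd number of points: the middle value

# Even number of points: the average of the two middle values (3 and 5).
even_median = median([2, 3, 5, 7])

print(data_mean, data_mode, data_median, even_median)
```

On a GDC the mean and median appear in the 1-Var Stats output; the mode usually has to be read from the data or a frequency table.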

**Range** largest x-value − smallest x-value

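**Variance** the mean of the squared differences between each data value and the mean ($\sigma^2$)

**Standard deviation** the square root of the variance ($\sigma$); measures how spread out the data is around the mean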
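
A minimal sketch of both measures, assuming the population forms (dividing by the number of data points n rather than n − 1). The data set is hypothetical.

```python
from statistics import pstdev, pvariance

# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: mean of the squared differences from the mean.
variance = pvariance(data)

# Population standard deviation: square root of the variance.
std_dev = pstdev(data)

print(variance, std_dev)
```

Here the squared differences sum to 32, so the variance is 32 / 8 = 4 and the standard deviation is 2.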

**Grouped data** data presented in intervals (classes). Use the midpoint of each interval as the x-value in all calculations.
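
As a sketch of the midpoint rule, the estimated mean of grouped data can be computed as below. The intervals and frequencies are invented for illustration.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
groups = [(0, 10, 3), (10, 20, 5), (20, 30, 2)]

total_frequency = sum(f for _, _, f in groups)

# The midpoint of each interval stands in for the x-value.
weighted_sum = sum((low + high) / 2 * f for low, high, f in groups)

grouped_mean = weighted_sum / total_frequency
print(grouped_mean)  # (5*3 + 15*5 + 25*2) / 10 = 14.0
```

Because the midpoints replace the real values, the result is an estimate of the mean, not the exact value.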

$Q_1$ first quartile = 25th percentile

$Q_2$ median = 50th percentile

$Q_3$ third quartile = 75th percentile

$Q_3 - Q_1$ interquartile range (IQR) = middle 50 percent
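
A sketch of the quartiles and IQR, assuming the median-of-halves convention (the median itself is left out of both halves when the number of points is odd). Other conventions exist, so software may give slightly different quartiles. The function name `quartiles` and the data are illustrative only.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values below the median position
    upper = ordered[half + n % 2:]  # values above the median position
    return median(lower), median(ordered), median(upper)

# Hypothetical data set.
data = [1, 3, 4, 6, 7, 9, 12]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 3 6 9 6
```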

### 7.3. Statistical graphs

**Frequency** the number of times an event occurs in an experiment

**Cumulative frequency** the sum of the frequency for a particular class and the frequencies for all the classes below it
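
A short sketch of how cumulative frequencies build up from a frequency table, using a running total. The classes and frequencies are made up.

```python
from itertools import accumulate

# Hypothetical frequency table.
classes = ["0-10", "10-20", "20-30", "30-40"]
frequencies = [4, 7, 6, 3]

# Running total: each entry adds the frequencies of all classes up to it.
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(classes, frequencies, cumulative):
    print(label, f, cf)
# cumulative is [4, 11, 17, 20]
```

The last cumulative frequency always equals the total number of observations.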

**Histogram**

**Cumulative frequency graph**

**Box and whisker plot**

### 7.4. Bivariate statistics

For analysis of data with two variables.

On the GDC, use LinReg(ax+b).

**Scatter diagrams**

**Perfect positive**

**No correlation**

**Weak negative**

**Regression Line** (y = ax + b) the line of best fit through the data points. Can be used to interpolate unknown values within the range of the data.
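
A minimal sketch of how the coefficients a and b can be found, assuming the usual least-squares method for the regression of y on x. The function name `linear_regression` and the data are illustrative, and this is not a description of how any particular calculator computes them.

```python
def linear_regression(xs, ys):
    """Return (a, b) for the least-squares line y = ax + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx          # gradient
    b = y_bar - a * x_bar  # the line passes through (x_bar, y_bar)
    return a, b

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = linear_regression(xs, ys)

# Interpolate at x = 3.5, which lies inside the range of the data.
predicted = a * 3.5 + b
print(round(a, 2), round(b, 2), round(predicted, 3))
```

For this data the line is approximately y = 1.99x + 0.05.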

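**Interpretation of r-values**

The correlation coefficient r measures the strength and direction of the linear correlation between the two sets of data. It ranges from −1 (perfect negative) to 1 (perfect positive); values close to 0 indicate no linear correlation.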
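
As a sketch, assuming r is Pearson's product-moment correlation coefficient, it can be computed as below. The function name `pearson_r` and the data sets are illustrative only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Points lying exactly on a rising line: perfect positive correlation.
r_positive = pearson_r([1, 2, 3], [2, 4, 6])

# Points lying exactly on a falling line: perfect negative correlation.
r_negative = pearson_r([1, 2, 3], [6, 4, 2])

print(r_positive, r_negative)
```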