Unit 6: Business Management Toolkit

BMT 7 - Descriptive Statistics

Understanding and Analyzing Numerical Data in Business

1. What are Descriptive Statistics?

Descriptive statistics are numerical measures and visual methods used to summarize, organize, and describe characteristics of a dataset. They help transform raw data into meaningful information that can inform business decisions.

Purpose:

Summarize large amounts of data into understandable measures
Identify patterns, trends, and relationships
Compare different datasets
Support evidence-based decision-making
Communicate findings clearly to stakeholders

Two main categories of descriptive statistics:

Measures of Central Tendency: Values representing the center or typical value (mean, median, mode)
Measures of Dispersion/Spread: Values showing how data is distributed (range, quartiles, interquartile range, standard deviation)

2. Measures of Central Tendency

Measures of central tendency identify the center point or typical value in a dataset. They answer the question: "What is the average or most common value?"

Mean (Arithmetic Average)

The mean is the sum of all values divided by the number of values. It's the most commonly used measure of central tendency.

Formula: Mean ($\bar{x}$)

\[ \bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + x_3 + ... + x_n}{n} \]

Where:

$\bar{x}$ = mean (read as "x-bar")
$\sum x$ = sum of all values
$n$ = number of values

Example: Calculating Mean

Scenario: A company tracks daily sales for one week:

Daily sales: $500, $600, $450, $700, $550, $650, $600

Calculate mean daily sales:

\[ \bar{x} = \frac{500 + 600 + 450 + 700 + 550 + 650 + 600}{7} = \frac{4{,}050}{7} \approx \$578.57 \]

Interpretation: Average daily sales for the week were approximately $578.57
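The same arithmetic can be checked with a short Python sketch. The variable names are illustrative; `statistics.mean` is part of Python's standard library and should agree with the manual "sum ÷ count" calculation.

```python
from statistics import mean

# Daily sales for one week, taken from the example above (in $)
daily_sales = [500, 600, 450, 700, 550, 650, 600]

# Mean = sum of all values divided by the number of values
total = sum(daily_sales)   # 4050
n = len(daily_sales)       # 7
mean_sales = total / n

print(f"Sum = {total}, n = {n}, mean = ${mean_sales:.2f}")

# Cross-check against the standard library
assert abs(mean(daily_sales) - mean_sales) < 1e-9
```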

Advantages of Mean

Uses all data: Takes every value into account
Widely understood: Most familiar measure to general audience
Mathematical properties: Can be used in further calculations
Unique value: Only one mean for any dataset
Useful for comparison: Easy to compare means across datasets

Disadvantages of Mean

Affected by outliers: Extreme values distort the mean significantly
May not be an actual value: Can be a decimal even when every data value is a whole number
Can be misleading: Doesn't show data distribution or spread
Not suitable for skewed data: Pulled toward extreme values

Median

The median is the middle value when data is arranged in order. It divides the dataset into two equal halves—50% of values are below it, 50% are above it.

How to Find the Median

Step 1: Arrange data in ascending order (smallest to largest)

Step 2: Determine if $n$ (number of values) is odd or even

If $n$ is odd:

\[ \text{Median} = \text{Value at position } \frac{n+1}{2} \]

If $n$ is even:

\[ \text{Median} = \frac{\text{Value at position } \frac{n}{2} + \text{Value at position } \left(\frac{n}{2}+1\right)}{2} \]

Example 1: Median with Odd Number of Values

Employee ages: 25, 30, 22, 35, 28

Step 1: Order the data: 22, 25, 28, 30, 35

Step 2: $n = 5$ (odd), so median is at position $\frac{5+1}{2} = 3$

Median: 28 years (the 3rd value)

Example 2: Median with Even Number of Values

Test scores: 85, 90, 78, 92, 88, 75

Step 1: Order the data: 75, 78, 85, 88, 90, 92

Step 2: $n = 6$ (even), so median is average of positions 3 and 4

\[ \text{Median} = \frac{85 + 88}{2} = \frac{173}{2} = 86.5 \]

Median: 86.5 points
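The two-step procedure for finding the median can be expressed as a small Python function. This is a minimal sketch; the function name `median` is our own (Python's standard library also provides `statistics.median`, which should give the same results). Note that the positions in the formulas start at 1, while Python lists count from 0, so each position is reduced by 1.

```python
def median(values):
    """Return the median by following Steps 1 and 2 above."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)            # Step 1: arrange in ascending order
    n = len(ordered)
    if n % 2 == 1:
        # Step 2 (odd n): value at position (n + 1) / 2, minus 1 for 0-based indexing
        return ordered[(n + 1) // 2 - 1]
    # Step 2 (even n): average of the values at positions n/2 and n/2 + 1
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

ages = [25, 30, 22, 35, 28]             # Example 1
scores = [85, 90, 78, 92, 88, 75]       # Example 2
print(median(ages))    # 28
print(median(scores))  # 86.5
```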

Advantages of Median

Not affected by outliers: Extreme values don't distort the median
Better for skewed data: More representative when data is not symmetrical
Easy to understand: Simple concept of "middle value"
Represents actual value: Often corresponds to an actual data point

Disadvantages of Median

Ignores extreme values: Doesn't account for very high or low values
Requires ordering: Data must be sorted first
Less mathematical: Cannot be used in some statistical calculations
May not be an actual value: With an even number of values, the median may not appear in the dataset

Mode

The mode is the value that appears most frequently in the dataset. A dataset can have one mode (unimodal), two modes (bimodal), many modes (multimodal), or no mode (all values appear once).

Example: Finding the Mode

Shoe sizes sold in one day: 7, 8, 9, 8, 10, 8, 7, 9, 8, 11

Count frequencies:

• Size 7: appears 2 times
• Size 8: appears 4 times
• Size 9: appears 2 times
• Size 10: appears 1 time
• Size 11: appears 1 time

Mode: Size 8 (most frequent)
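Counting frequencies is exactly what Python's `collections.Counter` does, so the mode can be found with a short sketch. The helper name `modes` is hypothetical. It returns a list so that bimodal and multimodal data are handled, and it follows the rule in this section that there is no mode when all distinct values appear equally often.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value (several if multimodal, none if no mode)."""
    counts = Counter(values)            # frequency of each value
    if not counts:
        return []
    highest = max(counts.values())
    # If several distinct values all occur equally often, no value stands out: no mode
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)

shoe_sizes = [7, 8, 9, 8, 10, 8, 7, 9, 8, 11]
print(Counter(shoe_sizes))   # frequency of each size
print(modes(shoe_sizes))     # [8]
```

Because `Counter` works with any hashable value, the same helper also works for categorical data such as product colours.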

Advantages of Mode

Easy to identify: Simple to find by counting
Always an actual value: Represents real data point
Useful for categorical data: Can be used with non-numerical data (e.g., most popular color)
Shows most common occurrence: Useful for inventory and planning

Disadvantages of Mode

May not exist: No mode if all values appear equally
May have multiple modes: Can be ambiguous with bimodal or multimodal data
Not always central: May be at extreme of dataset
Limited usefulness: Less informative for some business decisions

Comparison: Mean, Median, and Mode

Aspect	Mean	Median	Mode
Definition	Average of all values	Middle value	Most frequent value
Calculation	Sum ÷ count	Position-based	Frequency count
Uses all data	Yes	No (only position)	No (only frequency)
Affected by outliers	Yes, significantly	No	No
Best for	Symmetrical data	Skewed data	Categorical data
Uniqueness	Always unique	Always unique	May have multiple or none
Business use	Average sales, revenue	Typical salary, prices	Popular product sizes
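To see the "Affected by outliers" row in action, the Python sketch below reuses the employee ages from Example 1 of the median section and compares them with a hypothetical variant in which the 35-year-old is replaced by an extreme value of 90 (an assumed figure for illustration, not data from the source). The mean rises from 28 to 39, while the median stays at 28.

```python
from statistics import mean, median

# Employee ages from Example 1 of the median section
ages = [25, 30, 22, 35, 28]
# Hypothetical variant: the value 35 replaced by an extreme value of 90
ages_with_outlier = [25, 30, 22, 90, 28]

for label, data in [("original", ages), ("with outlier", ages_with_outlier)]:
    print(f"{label:>12}: mean = {mean(data)}, median = {median(data)}")
```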

3. Measures of Dispersion (Spread)

Measures of dispersion describe how spread out or varied the data is. Two datasets can have the same mean but very different spreads. These measures answer: "How consistent or varied is the data?"

Range

The range is the difference between the highest and lowest values in the dataset. It shows the total spread of data.

Formula: Range

\[ \text{Range} = \text{Maximum Value} - \text{Minimum Value} \]

Example: Calculating Range

Monthly profits ($000): 45, 52, 38, 61, 48, 55, 42

Maximum value: $61,000

Minimum value: $38,000

Calculate range:

\[ \text{Range} = 61 - 38 = 23 \text{ (\$000)} = \$23{,}000 \]

Interpretation: Monthly profit varied by $23,000 between the lowest and highest month in the period
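A short Python sketch of the same calculation is shown below. It keeps the data in $000, as in the example, and converts to dollars only when printing, which avoids the common slip of mixing units. The variable names are illustrative.

```python
# Monthly profits in $000, as in the example above
monthly_profits = [45, 52, 38, 61, 48, 55, 42]

# Range = maximum value - minimum value (same units as the data: $000)
profit_range = max(monthly_profits) - min(monthly_profits)

print(f"Max = {max(monthly_profits)}, Min = {min(monthly_profits)}")
print(f"Range = {profit_range} ($000) = ${profit_range * 1000:,}")
```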

Advantages of Range

Simple to calculate: Quick and easy measure
Easy to understand: Clear meaning (total spread)
Shows data span: Indicates overall variability

Disadvantages of Range

Affected by outliers: One extreme value distorts range significantly
Ignores distribution: Doesn't show how data is spread between extremes
Uses only two values: Doesn't consider all data points
Can be misleading: Large range doesn't necessarily mean inconsistent data

Quartiles and Interquartile Range (IQR)

Quartiles divide ordered data into four equal parts. The interquartile range (IQR) measures the spread of the middle 50% of data, making it resistant to outliers.

Quartile Positions

Q1 (First Quartile/Lower Quartile): 25% of data below this point
Q2 (Second Quartile): Same as median, 50% of data below
Q3 (Third Quartile/Upper Quartile): 75% of data below this point

Interquartile Range Formula:

\[ \text{IQR} = Q_3 - Q_1 \]

What it shows: The range containing the middle 50% of data

Example: Calculating Quartiles and IQR

Employee ages: 22, 25, 27, 30, 32, 35, 38, 40, 42, 45, 48

Already ordered, $n = 11$

Find Q2 (Median):

Position: $\frac{11+1}{2} = 6$ → Q2 = 35

Find Q1 (median of the lower half; with odd $n$, the median itself, 35, is excluded from both halves):

Lower half: 22, 25, 27, 30, 32

Position 3 → Q1 = 27

Find Q3 (Median of upper half):

Upper half: 38, 40, 42, 45, 48

Position 3 → Q3 = 42

Calculate IQR:

\[ \text{IQR} = 42 - 27 = 15 \text{ years} \]

Interpretation: The middle 50% of employees have ages spanning 15 years
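The "median of each half" method used in this example can be sketched in Python as follows. The function names are hypothetical, and this is only one of several quartile conventions: spreadsheet functions, or the `"inclusive"` method of Python's `statistics.quantiles`, may give slightly different quartiles for some datasets.

```python
def median(values):
    """Middle value of the data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3 using the 'median of each half' method from the example.
    For odd n, the median itself is left out of both halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values to find quartiles")
    half = n // 2
    lower = ordered[:half]              # values below the median
    upper = ordered[half + (n % 2):]    # values above the median
    return median(lower), median(ordered), median(upper)

ages = [22, 25, 27, 30, 32, 35, 38, 40, 42, 45, 48]
q1, q2, q3 = quartiles(ages)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```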

Advantages of IQR

Not affected by outliers: Focuses on middle 50% of data
Better than range: More representative of typical spread
Useful for comparisons: Compare variability across datasets
Identifies outliers: Values more than 1.5 × IQR below Q1 or more than 1.5 × IQR above Q3 are commonly treated as outliers
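The 1.5 × IQR rule can be applied to the employee-age example with a few lines of Python. This sketch takes Q1 = 27 and Q3 = 42 from the worked example; the "fence" variable names are illustrative. With an IQR of 15 years, the fences are 4.5 and 64.5 years, so none of the eleven ages is flagged, while a hypothetical employee aged 70 would be.

```python
# Quartiles from the employee-age example above
q1, q3 = 27, 42
iqr = q3 - q1                    # 15 years

# Rule of thumb: outliers lie more than 1.5 x IQR below Q1 or above Q3
lower_fence = q1 - 1.5 * iqr     # 27 - 22.5 = 4.5
upper_fence = q3 + 1.5 * iqr     # 42 + 22.5 = 64.5

ages = [22, 25, 27, 30, 32, 35, 38, 40, 42, 45, 48]
outliers = [a for a in ages if a < lower_fence or a > upper_fence]
print(f"Fences: {lower_fence} to {upper_fence}; outliers: {outliers}")

# A hypothetical employee aged 70 would be flagged as an outlier
print(70 > upper_fence)
```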

Standard Deviation

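Standard deviation measures how far data points typically lie from the mean. Strictly, it is the square root of the average squared deviation from the mean (for a sample, the squared deviations are divided by $n-1$ rather than $n$). It is the most commonly used measure of dispersion in business statistics.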

Low standard deviation: Data points are close to the mean (consistent)

High standard deviation: Data points are spread out from the mean (variable)

Formula: Standard Deviation ($\sigma$ or $s$)

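For a population (every member of the group, with population mean $\mu$ and population size $N$):

\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \]

For a sample (a subset of the population, with sample mean $\bar{x}$ and sample size $n$):

\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} \]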

Where:

$x$ = individual value
$\bar{x}$ = mean
$n$ = number of values
$(x - \bar{x})$ = deviation from mean

Example: Calculating Standard Deviation (Simplified)

Daily customers: 50, 55, 52, 58, 60

Step 1: Calculate mean

\[ \bar{x} = \frac{50 + 55 + 52 + 58 + 60}{5} = \frac{275}{5} = 55 \]

Step 2: Calculate deviations from mean

Value ($x$)	Deviation ($x - \bar{x}$)	Squared ($(x - \bar{x})^2$)
50	-5	25
55	0	0
52	-3	9
58	3	9
60	5	25
Sum:		68

Step 3: Calculate standard deviation (treating the five days as a sample, so dividing by $n - 1$)

\[ s = \sqrt{\frac{68}{5-1}} = \sqrt{\frac{68}{4}} = \sqrt{17} \approx 4.12 \text{ customers} \]

Interpretation: Daily customer numbers typically vary by about 4 customers from the average
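The three steps above can be followed in Python, and the result cross-checked against the standard library, where `statistics.stdev` uses the sample formula ($n-1$) and `statistics.pstdev` uses the population formula ($n$). The function name `std_dev` and its `sample` parameter are our own, for illustration only. For these data the sample value is $\sqrt{17} \approx 4.12$ and the population value is $\sqrt{13.6} \approx 3.69$.

```python
import math
import statistics

customers = [50, 55, 52, 58, 60]

def std_dev(values, sample=True):
    """Standard deviation following Steps 1-3 above.
    sample=True divides by n - 1 (s); sample=False divides by n (sigma)."""
    n = len(values)
    minimum = 2 if sample else 1
    if n < minimum:
        raise ValueError(f"need at least {minimum} value(s)")
    x_bar = sum(values) / n                          # Step 1: mean
    squared = [(x - x_bar) ** 2 for x in values]     # Step 2: squared deviations
    divisor = n - 1 if sample else n
    return math.sqrt(sum(squared) / divisor)         # Step 3: square root

s = std_dev(customers)                    # sqrt(68 / 4) = sqrt(17)
sigma = std_dev(customers, sample=False)  # sqrt(68 / 5) = sqrt(13.6)
print(f"s = {s:.2f}, sigma = {sigma:.2f}")

# Cross-check against the standard library
assert math.isclose(s, statistics.stdev(customers))
assert math.isclose(sigma, statistics.pstdev(customers))
```

Dividing by $n-1$ gives a slightly larger value than dividing by $n$; this is why a sample standard deviation is always a little larger than the population formula applied to the same numbers.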

Advantages of Standard Deviation

Uses all data: Every value contributes to calculation
Mathematically sound: Can be used in advanced statistical analysis
Widely recognized: Standard measure in business and research
Shows typical deviation: Indicates how much data typically varies

Disadvantages of Standard Deviation

Complex calculation: More difficult to compute manually
Affected by outliers: Extreme values increase standard deviation
Harder to interpret: Less intuitive than the range because it is built from squared deviations (although the final result is in the same units as the data)
Requires mean: Mean must be calculated first

4. Business Applications of Descriptive Statistics

Descriptive statistics are used across all business functions:

Finance and Accounting

Average revenue: Mean monthly sales
Median salary: Typical employee compensation
Range of costs: Variability in expenses
Standard deviation of returns: Investment risk measurement

Marketing

Average customer age: Target demographic
Mode of purchase method: Most popular payment type
Median order value: Typical transaction size
Range of customer ratings: Satisfaction variability

Operations

Mean production time: Average efficiency
Standard deviation of delivery times: Reliability measurement
Mode of defect types: Most common quality issue
IQR of inventory levels: Typical stock range

Human Resources

Mean employee tenure: Average length of employment
Median performance score: Typical employee rating
Range of salaries: Pay scale spread
Standard deviation of absenteeism: Attendance consistency

5. Choosing the Right Measure

Decision Guide: Central Tendency

Use Mean when:

Data is symmetrically distributed (no extreme outliers)
Need to use value in further calculations
Want to account for all values

Use Median when:

Data contains outliers or extreme values
Data is skewed (not symmetrical)
Want true "middle" value (like median salary)

Use Mode when:

Data is categorical (non-numerical)
Want to know most common occurrence
Useful for inventory (most popular size/color)

Decision Guide: Dispersion

Use Range when:

Need quick, simple measure
Want to show total spread
Data has no extreme outliers

Use IQR when:

Data contains outliers
Want measure of typical spread
Comparing variability across datasets

Use Standard Deviation when:

Need precise measure of variability
Data is normally distributed
Want to compare consistency across groups

6. Limitations of Descriptive Statistics

Loss of detail: Summary measures hide individual data points
No causation: Describe patterns but don't explain causes
Context needed: Numbers alone may be meaningless without context
Can be misleading: Same mean/median can represent very different distributions
Outlier sensitivity: Some measures distorted by extreme values
No prediction: Descriptive statistics describe existing data and do not forecast the future
Oversimplification: Complex reality reduced to single numbers

7. IB Business Management Exam Tips

Key Formulas to Remember

Mean: $\bar{x} = \frac{\sum x}{n}$
Median: Middle value (or average of two middle values)
Mode: Most frequent value
Range: Maximum - Minimum
IQR: $Q_3 - Q_1$
Standard Deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}}$

Common Exam Questions

"Calculate the mean of the following data" (2-3 marks)
"Find the median and explain what it shows" (4 marks)
"Calculate the range and interquartile range" (4 marks)
"Explain why median is more appropriate than mean for this data" (4 marks)
"Analyse the usefulness of standard deviation for comparing two datasets" (6 marks)
"Evaluate the importance of descriptive statistics for business decision-making" (10 marks)

Calculation Tips

Show all working: Write out formulas and steps clearly
Use correct symbols: $\bar{x}$ for mean, $\sigma$ or $s$ for standard deviation
Include units: Don't forget currency, time units, etc.
Round appropriately: Usually 2 decimal places unless specified
Check reasonableness: Does your answer make sense in context?
For median: Remember to order data first!
Interpret results: Explain what the number means in business context

Common Mistakes to Avoid

Forgetting to order data: Must order before finding median/quartiles
Using mean with outliers: Median is better for skewed data
Confusing formulas: Know which formula for population vs. sample
No interpretation: Always explain what the statistic means
Ignoring context: Connect statistics to business situation
Calculation errors: Double-check arithmetic

✓ BMT 7 Summary: Descriptive Statistics

You should now understand that descriptive statistics summarize and organize data through measures of central tendency and dispersion. Measures of central tendency identify typical values: mean (arithmetic average = sum ÷ count, uses all data but affected by outliers), median (middle value when ordered, resistant to outliers, better for skewed data), and mode (most frequent value, useful for categorical data). Measures of dispersion show data spread: range (maximum - minimum, simple but affected by outliers), interquartile range (Q3 - Q1, spread of the middle 50%, resistant to outliers), and standard deviation (typical distance from the mean, uses all data but complex to calculate). The choice of measure depends on the characteristics of the data: use the mean for symmetrical data, the median for skewed data or data with outliers, the mode for categorical data, and standard deviation for precise measurement of variability. Descriptive statistics are essential across business functions (finance, marketing, operations, HR) for understanding patterns, making comparisons, and supporting decisions. However, they have limitations, including loss of detail, no explanation of causation, and the potential for misleading interpretations without proper context. For IB exams, show all calculations clearly, interpret results in business context, choose measures appropriate to the data's characteristics, and explain your reasoning.
