Measures of Central Tendency in Summary Statistics (Python Examples)

In this guide, we introduce a core group of summary statistics: the measures of central tendency. We also provide Python examples that show how to apply each one.

What are Measures of Central Tendency?

In statistics and data science, measures of central tendency summarize a dataset by locating its center. The three measures of center are the mean, the median, and the mode.

• Mean: Average value of a dataset.
• Median: Middle value in a dataset.
• Mode: Most frequently occurring value in a dataset.

What is the Mean in Statistics?

In statistics, the mean, also known as the average, is a measure of central tendency.

The mean is calculated by summing up all the values in a dataset and then dividing that sum by the total number of values. The formula for calculating the mean (μ) of a dataset is:

``Mean (μ) =  Sum of all values / Number of values (n) ``

For example, if we have 3 people aged 5, 7 and 8 years old, then the mean is:

``Mean (μ) = (5 + 7 + 8) / 3 ≈ 6.67``

The mean can be calculated in Python using `np.mean` from the numpy library.

```python
import numpy as np

# Mean of the three ages: (5 + 7 + 8) / 3
np.mean([5, 7, 8])
# 6.666666666666667
```

What is the Median in Statistics?

In statistics, the median is a measure of central tendency that splits the data in half: about 50% of the values are lower than it and about 50% are higher.

The median is calculated by sorting all the values in a dataset and then selecting the middle one. If the dataset has an even number of values, the median is the average of the two middle values.

In Python, the median can be calculated using `np.median` from the numpy library.

```python
import numpy as np

# Seven values, so the fourth (middle) value is the median
np.median([1, 2, 3, 4, 5, 6, 7])
# 4.0
```
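As a quick check of the sort-then-pick-the-middle rule, the sketch below uses two small made-up lists (the variable names are illustrative only). It assumes nothing beyond `np.median` itself: one list is unsorted with an odd number of values, the other has an even number of values, so there is no single middle value.

```python
import numpy as np

# Illustrative data, not taken from a real dataset
odd_values = [7, 1, 5, 3, 9]       # unsorted; sorted order is [1, 3, 5, 7, 9]
even_values = [1, 2, 3, 4, 5, 6]   # even count: middle values are 3 and 4

odd_median = np.median(odd_values)    # middle value of the sorted list
even_median = np.median(even_values)  # average of the two middle values

print(odd_median)   # 5.0
print(even_median)  # 3.5
```

Note that the input does not need to be sorted beforehand, since `np.median` works on a sorted copy of the data.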

What is the Mode in Statistics?

In statistics, the mode is a measure of central tendency defined as the most frequently occurring value in a dataset.

When we use the value_counts() method on a Pandas Series (for example, a single DataFrame column), we see how often each value occurs, sorted from most to least frequent. The top value is the mode.
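To make the value_counts() idea concrete, here is a minimal sketch. The Series name `ages` and its values are made up for illustration; the sketch only assumes that pandas is installed.

```python
import pandas as pd

# Illustrative data, not taken from a real dataset
ages = pd.Series([1, 2, 2, 3, 4, 5, 5, 5], name="age")

# Count occurrences of each value, most frequent first
counts = ages.value_counts()
print(counts)

# The first index label is the most frequent value, i.e. the mode
top_value = counts.index[0]
print(top_value)             # 5

# pandas also offers Series.mode(), which returns every tied mode
print(ages.mode().tolist())  # [5]
```

Reading the mode off value_counts() also shows how dominant it is, since the count for each value is visible next to it.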

The mode can be calculated in Python using the `scipy.stats.mode()` or the `statistics.mode()` functions.

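```python
from scipy import stats
import statistics

data = [1, 2, 2, 3, 4, 5, 5, 5]

# 5 appears three times, more often than any other value
print(stats.mode(data).mode)  # prints 5 or [5], depending on the SciPy version
print(statistics.mode(data))  # 5
```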

The mode is often used for categorical variables, since they are often unordered and generally don’t have a numeric representation, which rules out the mean and the median.
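As a small illustration of the mode on categorical data, the sketch below uses the standard library's `statistics` module on made-up lists of strings. The variable names and values are hypothetical, and `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

# Illustrative categorical data, not taken from a real dataset
colors = ["red", "blue", "green", "blue", "red", "blue"]

# "blue" appears three times, more than any other category
most_common_color = statistics.mode(colors)
print(most_common_color)  # blue

# When several categories tie, multimode returns all of them
pets = ["cat", "dog", "cat", "dog", "bird"]
tied_modes = statistics.multimode(pets)
print(tied_modes)  # ['cat', 'dog']
```

Neither list could be averaged or sorted into a meaningful middle value, which is why the mode is the natural choice here.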

Mean, Mode or Median? How to Choose the Right Measure of Central Tendency

Choosing the right measure of central tendency (mean, mode, or median) depends on your data and the information you want to draw from it. The mean is the most commonly used measure, but it is not always the best one. Follow these quick guidelines to decide which measure to choose:

• Mean: More sensitive to outliers. Better for symmetrical data (for example, normally distributed data).
• Median: Less sensitive to outliers. Better for non-symmetrical (skewed) data.
• Mode: More appropriate for categorical data.
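The difference in outlier sensitivity between the mean and the median can be seen with a small sketch. The salary figures below are invented for illustration (in thousands), and the variable names are hypothetical.

```python
import numpy as np

# Illustrative salaries in thousands, not taken from a real dataset
salaries = [30, 32, 35, 38, 40]
salaries_with_outlier = salaries + [500]  # add one extreme value

mean_before = np.mean(salaries)
median_before = np.median(salaries)
mean_after = np.mean(salaries_with_outlier)
median_after = np.median(salaries_with_outlier)

print(mean_before, median_before)  # 35.0 35.0
print(mean_after, median_after)    # 112.5 36.5
```

A single extreme value more than triples the mean, while the median barely moves. This is why the median is usually preferred for skewed data.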