Measures of Central Tendency in Summary Statistics

In this guide, we introduce one family of summary statistics: the measures of central tendency. We also provide Python examples that show how to compute each of them.

What are Measures of Central Tendency?

In statistics and data science, measures of central tendency summarize a dataset by identifying where its center lies. The three most common measures of central tendency are the mean, the median, and the mode.

  • Mean: Average value of a dataset.
  • Median: Middle value in a dataset.
  • Mode: Most frequently occurring value in a dataset.
Figure: Measure of central tendency
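
As a quick preview, Python's built-in statistics module can compute all three measures on the same dataset. The dataset below is made up for illustration only; note that with an even number of values the median is the average of the two middle values.

```python
import statistics

# A small, made-up dataset used only for illustration
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum of the values divided by their count
median = statistics.median(data)  # average of the two middle values (3 and 5) here
mode = statistics.mode(data)      # most frequently occurring value

print(mean, median, mode)  # 5 4.0 3
```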

What is the Mean in Statistics?

In statistics, the mean, also known as the average, is a measure of central tendency.

The mean is calculated by summing up all the values in a dataset and then dividing that sum by the total number of values. The formula for calculating the mean (μ) of a dataset is:

Mean (μ) = Sum of all values / Number of values (n)

For example, if we have three people aged 5, 7, and 8 years old, their mean age is:

Mean (μ) = (5 + 7 + 8) / 3 = 20 / 3 ≈ 6.67

The mean can be calculated in Python using np.mean() from the NumPy library:

import numpy as np

print(np.mean([5, 7, 8]))
# 6.666666666666667
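
To connect the formula with the code, here is a minimal sketch that computes the mean directly as the sum of all values divided by the number of values, using only built-in Python. The helper name mean_from_formula is hypothetical, chosen for this example.

```python
def mean_from_formula(values):
    """Return the arithmetic mean: sum of all values / number of values (n)."""
    if not values:
        # Dividing by n = 0 is undefined, so reject empty input explicitly
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


ages = [5, 7, 8]
print(mean_from_formula(ages))  # 6.666666666666667
```

The result matches np.mean() for the same ages, since both apply the same formula.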

What is the Median in Statistics?

In statistics, the median is a measure of central tendency that splits the data in half: 50% of the values are lower than it and 50% are higher.

The median is calculated by sorting all the values in a dataset and then selecting the middle one. If the dataset has an even number of values, the median is the average of the two middle values.

In Python, the median can be calculated using np.median() from the NumPy library:

import numpy as np

print(np.median([1, 2, 3, 4, 5, 6, 7]))
# 4.0
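
The sort-then-pick procedure can be sketched in plain Python as below, including the even-length case where the two middle values are averaged. The helper name median_from_sorting is hypothetical; in practice np.median() or statistics.median() does this for you.

```python
def median_from_sorting(values):
    """Return the median by sorting the values and picking the middle.

    For an even number of values, the two middle values are averaged.
    """
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)  # sorting does not modify the caller's list
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: exactly one middle value
        return ordered[middle]
    # Even count: average the two values around the middle
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_from_sorting([1, 2, 3, 4, 5, 6, 7]))  # 4
print(median_from_sorting([7, 1, 3, 5]))           # 4.0
```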

What is the Mode in Statistics?

In statistics, the mode is the measure of central tendency that identifies the most frequently occurring value in a dataset.

When we call the value_counts() method on a pandas Series, we get the number of occurrences of each unique value, sorted from most to least frequent. The top value is therefore the mode (or one of the modes, if several values tie).
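
The value_counts() approach can be sketched as follows, assuming pandas is installed. The colors Series is made-up example data. Series.mode() is also shown because it returns every mode when several values tie.

```python
import pandas as pd

# Made-up categorical data for illustration
colors = pd.Series(["red", "blue", "red", "green", "red", "blue"])

counts = colors.value_counts()  # occurrences per unique value, most frequent first
print(counts)

# The first index label of the sorted counts is the mode
print(counts.index[0])  # red

# Series.mode() returns all modes as a Series (useful when there are ties)
print(colors.mode().tolist())  # ['red']
```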

The mode can be calculated in Python using the scipy.stats.mode() or the statistics.mode() functions.

from scipy import stats
import statistics

data = [1, 2, 2, 3, 4, 5, 5, 5]

print(stats.mode(data).mode)  # 5 (older SciPy versions may print [5])
print(statistics.mode(data))  # 5

The mode is often used for categorical variables, because categories are usually unordered and generally have no numeric representation, so a mean or median is not meaningful for them.
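
Because the mode only counts occurrences, it also works on non-numeric labels. The sketch below applies the standard-library statistics module to made-up survey answers; statistics.multimode() (available since Python 3.8) returns every most-frequent value, which helps when there is a tie.

```python
import statistics

# Made-up categorical survey answers: a numeric mean or median does not apply
answers = ["yes", "no", "yes", "maybe", "no", "yes"]
print(statistics.mode(answers))  # yes

# With a tie, multimode() returns all of the most frequent values,
# in the order they first appear
tied = ["cat", "dog", "cat", "dog", "bird"]
print(statistics.multimode(tied))  # ['cat', 'dog']
```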

Mean, Mode or Median? How to Choose the Right Measure of Central Tendency

Choosing the right measure of central tendency (mean, median, or mode) depends on your data and on the information you want to draw from it. The mean is the most commonly used measure, but it is not always the best choice. Follow these quick guidelines to decide which measure to use:

  • Mean: Sensitive to outliers. Best suited to symmetrical data, such as normally distributed data.
  • Median: Robust to outliers. Better for non-symmetrical (skewed) data.
  • Mode: Most appropriate for categorical data.
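
To see why the mean is more sensitive to outliers than the median, compare both on a made-up list of salaries before and after adding a single extreme value. This sketch uses NumPy, as in the earlier examples.

```python
import numpy as np

# Made-up annual salaries (in thousands) for illustration
salaries = [40, 42, 45, 47, 50]
with_outlier = salaries + [500]  # add one extreme value

print(np.mean(salaries), np.median(salaries))  # 44.8 45.0

# The mean jumps to about 120.67, while the median only moves to 46.0
print(np.mean(with_outlier), np.median(with_outlier))
```

A single extreme value pulls the mean far away from where most of the data lies, while the median barely changes, which is why the median is preferred for skewed data.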