Measures of Statistical Dependence

In this guide, we introduce one family of summary statistics: measures of correlation (statistical dependence). We also provide Python examples that show how to apply each one.

What Are the Measures of Statistical Dependence (Correlation)?

The measures of statistical dependence, also known as the measures of correlation, are the summary statistics used to evaluate the relationships between variables.

Eight common measures of statistical dependence used to evaluate the relationship between two variables are:


    1. Covariance: How much two random variables change together
    2. Pearson’s Correlation Coefficient: Linear relationship between two continuous variables
    3. Spearman’s Rank Correlation: Strength and direction of the monotonic relationship between two variables
    4. Kendall’s Tau (τ): Strength and direction of the ordinal association between two variables
    5. Point-Biserial Correlation: Relationship between a continuous variable and a binary variable
    6. Phi Coefficient (φ): Association between two binary variables
    7. Contingency Tables / Chi-Square Tests: Association between two categorical variables
    8. Cramér’s V: Strength of association between two categorical variables, based on the chi-square statistic

    In this tutorial, we will focus on the most common measures: Covariance, Pearson’s Correlation Coefficient (linear correlation), Spearman’s Rank Correlation (monotonic correlation), and Kendall’s Tau (ordinal correlation).

    Covariance

    The covariance measures how much two variables change together.

    It shows how much a variation in one variable is associated with a variation in another.

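    The downside of using the covariance to gauge correlation is that it depends on the scale (units) of the variables. Its sign shows the direction of the relationship, but its magnitude cannot be compared across variables measured in different units.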
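To see the scale problem concretely, here is a minimal sketch using made-up height and weight values (illustrative data, not from any real dataset). It relies on NumPy's np.cov and np.corrcoef: converting heights from metres to centimetres multiplies the covariance by 100, while the correlation stays the same.

```python
import numpy as np

# Illustrative (made-up) heights in metres and weights in kilograms
height_m = np.array([1.60, 1.70, 1.75, 1.80, 1.90])
weight_kg = np.array([55.0, 65.0, 70.0, 72.0, 85.0])

# The same heights expressed in centimetres
height_cm = height_m * 100

# Covariance scales with the units: multiplying x by 100 multiplies Cov(x, y) by 100
cov_m = np.cov(height_m, weight_kg)[0, 1]
cov_cm = np.cov(height_cm, weight_kg)[0, 1]

# Pearson's r is unit-free, so the change of units does not affect it
r_m = np.corrcoef(height_m, weight_kg)[0, 1]
r_cm = np.corrcoef(height_cm, weight_kg)[0, 1]

print(f"Cov (m):  {cov_m:.4f}   Cov (cm): {cov_cm:.4f}")
print(f"r (m):    {r_m:.4f}   r (cm):   {r_cm:.4f}")
```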

    Calculate the Covariance

    To calculate the covariance, compute the mean of each variable, subtract that mean from each value to get its deviation, multiply the paired deviations, and add the products together. Finally, divide the sum by the number of pairs, n (or by n - 1 for the sample covariance).

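    The formula for the (population) covariance of n paired observations is

    $$\mathrm{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_X)(y_i - \mu_Y)$$

    where \mu_X and \mu_Y are the means of X and Y. The sample covariance divides by n - 1 instead of n.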
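The steps above can be sketched by hand with NumPy and checked against np.cov. One detail worth knowing: np.cov divides by n - 1 by default (sample covariance), and passing bias=True makes it divide by n, matching the population formula.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])

# Step 1: deviation of each value from its variable's mean
dx = x - x.mean()   # [-2, -1, 0, 1, 2]
dy = y - y.mean()   # [-2, -1, 0, 1, 2]

# Step 2: multiply the paired deviations and add them up
sum_products = np.sum(dx * dy)  # 4 + 1 + 0 + 1 + 4 = 10

# Step 3: divide by n (population) or by n - 1 (sample)
n = len(x)
cov_population = sum_products / n      # 2.0
cov_sample = sum_products / (n - 1)    # 2.5

# np.cov uses n - 1 by default; bias=True switches the divisor to n
print(cov_population, np.cov(x, y, bias=True)[0, 1])
print(cov_sample, np.cov(x, y)[0, 1])
```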

    Calculate the Covariance in Python

    To calculate the covariance between two array variables in Python, use the cov() function from the numpy library.

    import numpy as np
    
    # Sample data
    x = np.array([1, 2, 3, 4, 5])
    y = np.array([2, 3, 4, 5, 6])
    
    # Calculate the covariance matrix (np.cov divides by n - 1 by default)
    cov_matrix = np.cov(x, y)
    print(cov_matrix)
    

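    The function returns a 2×2 covariance matrix. The diagonal entries are the variances of x and y, and the two off-diagonal entries (which are equal) are the covariance between x and y. The sign of the covariance gives the direction of the relationship:

    • Positive: the variables tend to increase together.
    • Negative: one variable tends to increase when the other decreases.
    • Near zero: no linear relationship.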
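A short sketch of how to read individual entries from the matrix returned by np.cov. The diagonal should match np.var with ddof=1 (the same n - 1 divisor), and the sign of the off-diagonal entry gives the direction; the direction label below is just an illustrative helper, not part of NumPy.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])

cov_matrix = np.cov(x, y)

var_x = cov_matrix[0, 0]    # variance of x
var_y = cov_matrix[1, 1]    # variance of y
cov_xy = cov_matrix[0, 1]   # covariance of x and y (same as cov_matrix[1, 0])

# The diagonal matches the sample variance computed with ddof=1
print(var_x, np.var(x, ddof=1))
print(var_y, np.var(y, ddof=1))

# Illustrative label based on the sign of the covariance
direction = "positive" if cov_xy > 0 else "negative" if cov_xy < 0 else "zero"
print(f"Cov(x, y) = {cov_xy} ({direction})")
```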

    Here are examples of different kinds of covariance matrices.

    import numpy as np
    
    # Sample data
    x = np.array([1, 2, 3, 4, 5])
    x2 = np.array([6, 5, 4, 3, 2])
    y = np.array([2, 3, 4, 5, 6])
    x3 = np.array([1, 2, 3, 4, 5])
    y3 = np.array([1, 2, 3, 2, 1])  # Example with no covariance
    
    # Calculate the covariance matrices
    cov_matrix = np.cov(x, y)
    cov_matrix2 = np.cov(x2, y)
    cov_matrix3 = np.cov(x3, y3)
    print('Positive variation:\n', cov_matrix)
    print('Negative variation:\n', cov_matrix2)
    print('No covariance:\n', cov_matrix3)
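A caution about the "no covariance" case: zero covariance only rules out a linear relationship. In the example data, y3 is completely determined by x3 (it rises to a peak at x3 = 3 and then falls), yet the covariance is zero because the rising and falling halves cancel out. The expression 3 - |x3 - 3| is simply one way to write that pattern, chosen for illustration.

```python
import numpy as np

x3 = np.array([1, 2, 3, 4, 5])
y3 = np.array([1, 2, 3, 2, 1])

# The covariance itself (off-diagonal entry); approximately 0
print(np.cov(x3, y3)[0, 1])

# Yet y3 follows x3 exactly through a non-linear "tent" shape
print(np.array_equal(y3, 3 - np.abs(x3 - 3)))
```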
    
    


    Pearson’s Correlation Coefficient

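    Pearson’s correlation coefficient, r, quantifies the strength and direction of the linear relationship between two continuous variables. It is the covariance divided by the product of the two standard deviations, which removes the scale dependence of covariance.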

    The result ranges from -1 to 1:

    • Perfect negative correlation: -1
    • Perfect positive correlation: 1
    • No linear correlation: 0
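Pearson's r can be computed directly from its definition, r = Cov(X, Y) / (s_X s_Y), and compared with NumPy's np.corrcoef. The data here is illustrative; the key detail is that the covariance and both standard deviations must use the same divisor (n - 1 here, via ddof=1).

```python
import numpy as np

# Illustrative data with a positive but imperfect linear trend
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 1, 4, 3, 6])

# Sample covariance (divides by n - 1)
cov_xy = np.cov(x, y)[0, 1]

# Sample standard deviations; ddof=1 uses the same n - 1 divisor
sx = np.std(x, ddof=1)
sy = np.std(y, ddof=1)

# Pearson's r: covariance rescaled by both standard deviations, so it lies in [-1, 1]
r = cov_xy / (sx * sy)

print(f"r from the formula: {r:.4f}")
print(f"r from np.corrcoef: {np.corrcoef(x, y)[0, 1]:.4f}")
```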

    Calculate Pearson’s r in Python

    To calculate Pearson’s correlation coefficient, use the pearsonr function from the scipy.stats module. It returns the coefficient together with a p-value.

    import numpy as np
    from scipy.stats import pearsonr
    
    # Sample data
    x = np.array([1, 2, 3, 4, 5])
    y = np.array([2, 3, 4, 5, 6])
    
    # Calculate Pearson's correlation coefficient
    correlation_coefficient, _ = pearsonr(x, y)
    print("Pearson's Correlation Coefficient:", correlation_coefficient)
    
    

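    The output shows a perfect positive correlation: every point lies exactly on the straight line y = x + 1. Note that r measures how closely the points follow a straight line, not how steep the line is, so any increasing straight line gives r = 1.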

    Pearson's Correlation Coefficient: 1.0
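To check that the slope does not affect r, this sketch applies scipy.stats.pearsonr to three exact straight lines with arbitrarily chosen slopes. With the 4-decimal formatting used here, it should print r = 1.0000 for both rising lines and r = -1.0000 for the falling one.

```python
import numpy as np
from scipy.stats import pearsonr

x = np.array([1, 2, 3, 4, 5])

# Three exact straight lines with different (arbitrary) slopes
y_shallow = 0.1 * x + 7    # rises slowly
y_steep = 10 * x - 3       # rises quickly
y_down = -3 * x + 20       # falls

for label, y in [("slope 0.1", y_shallow), ("slope 10", y_steep), ("slope -3", y_down)]:
    r, _ = pearsonr(x, y)
    print(f"{label}: r = {r:.4f}")
```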

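    Plot Pearson’s r Correlation in Python

    The following code simulates negatively correlated, positively correlated, and uncorrelated data, then draws each set as a scatter plot with a fitted regression line (seaborn’s regplot) and its r value in the title.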

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import pearsonr
    import seaborn as sns
    
    # Create data for scenarios
    np.random.seed(0)
    
    # Negative correlation
    x_neg = np.linspace(0, 10, 50)
    y_neg = -2 * x_neg + 10 + np.random.normal(0, 2, 50)
    
    # Positive correlation
    x_pos = np.linspace(0, 10, 50)
    y_pos = 2 * x_pos + np.random.normal(0, 2, 50)
    
    # No correlation
    x_no_corr = np.linspace(0, 10, 50)
    y_no_corr = np.random.normal(0, 2, 50)
    
    # Calculate Pearson correlation coefficients
    corr_coeff_neg, _ = pearsonr(x_neg, y_neg)
    corr_coeff_pos, _ = pearsonr(x_pos, y_pos)
    corr_coeff_no_corr, _ = pearsonr(x_no_corr, y_no_corr)
    
    # Create subplots
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    
    # Scatter plot 1 (Negative Correlation)
    sns.regplot(x=x_neg, y=y_neg, ax=axes[0], color='red', scatter_kws={'s': 15}, line_kws={'color': 'blue'}, ci=95)
    axes[0].set_xlabel('X')
    axes[0].set_ylabel('Y')
    axes[0].set_title(f"Negative Correlation (r = {corr_coeff_neg:.2f})")
    
    # Scatter plot 2 (Positive Correlation)
    sns.regplot(x=x_pos, y=y_pos, ax=axes[1], color='green', scatter_kws={'s': 15}, line_kws={'color': 'blue'}, ci=95)
    axes[1].set_xlabel('X')
    axes[1].set_ylabel('Y')
    axes[1].set_title(f"Positive Correlation (r = {corr_coeff_pos:.2f})")
    
    # Scatter plot 3 (No Correlation)
    sns.regplot(x=x_no_corr, y=y_no_corr, ax=axes[2], color='blue', scatter_kws={'s': 15}, line_kws={'color': 'blue'}, ci=95)
    axes[2].set_xlabel('X')
    axes[2].set_ylabel('Y')
    axes[2].set_title(f"No Correlation (r = {corr_coeff_no_corr:.2f})")
    
    # Adjust layout
    plt.tight_layout()
    
    # Show all plots
    plt.show()
    

    Spearman’s Rank Correlation (rho)

    The Spearman’s rank correlation, also known as Spearman’s rho, evaluates the strength and direction of the monotonic relationship between two variables.

    A relationship is monotonic when one variable consistently moves in a single direction (always up or always down) as the other variable increases, though not necessarily at a constant rate. Every linear relationship is monotonic, but not every monotonic relationship is linear.

    Spearman’s rho works on the ranks of the data instead of their actual values. This makes it less sensitive to outliers and suitable for ordinal data.
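One way to see what "working on ranks" means: Spearman's rho equals Pearson's r computed on the ranks of the data (scipy.stats.rankdata assigns average ranks to ties). In this sketch, y = exp(x) always increases but is strongly curved, so Pearson on the raw values falls below 1 while Spearman's rho is exactly 1.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

x = np.array([1, 2, 3, 4, 5, 6])
y = np.exp(x)   # always increasing, but far from a straight line

# Pearson on raw values: below 1 because the relationship is curved
r_raw, _ = pearsonr(x, y)

# Pearson on ranks: the ranks of y are 1..6, the same as the ranks of x
r_ranks, _ = pearsonr(rankdata(x), rankdata(y))

# Spearman's rho computed directly
rho, _ = spearmanr(x, y)

print(f"Pearson on values: {r_raw:.3f}")
print(f"Pearson on ranks:  {r_ranks:.3f}")
print(f"Spearman's rho:    {rho:.3f}")
```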

    Calculate Spearman’s Rank Correlation in Python

    To calculate Spearman’s rank correlation, use the spearmanr function from the scipy.stats module.

    from scipy.stats import spearmanr
    # Example data
    x = [10, 20, 30, 40, 50]
    y = [5, 15, 25, 35, 45]
    
    # Calculate Spearman's rank correlation
    rho, p_value = spearmanr(x, y)
    
    # Print the result
    print(f"Spearman's Rank Correlation Coefficient: {rho}")
    print(f"P-value: {p_value}")
    

    Interpret the Spearman’s Rank Correlation (rho) Result

    Spearman’s rho ranges from -1 to 1. When interpreting it, use these general guidelines:

    • Positive rho: as one variable increases, the other tends to increase.
    • Negative rho: as one variable increases, the other tends to decrease.
    • Rho = 0: no monotonic relationship.
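Because rho depends only on ranks, an extreme value that does not change the ordering of the data leaves it untouched, while Pearson's r can shift a lot. The data and the outlier value (500) below are illustrative. Note that an outlier which does reorder the data would still change rho, just by its change in rank rather than its size.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2, 4, 6, 8, 10, 12, 14, 16])

# Same data with one extreme value replacing the last y (the rank order is unchanged)
y_outlier = y.copy()
y_outlier[-1] = 500

print(f"Pearson  without / with outlier: {pearsonr(x, y)[0]:.3f} / {pearsonr(x, y_outlier)[0]:.3f}")
print(f"Spearman without / with outlier: {spearmanr(x, y)[0]:.3f} / {spearmanr(x, y_outlier)[0]:.3f}")
```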

    Kendall’s Tau (τ)

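    In statistics, Kendall’s Tau (τ) measures the strength and direction of the ordinal association between two variables. It compares every pair of observations: a pair is concordant if both variables put the two observations in the same order, and discordant if they put them in opposite orders.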
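The pair-counting idea can be sketched directly: when there are no ties, τ = (concordant - discordant) / total pairs. This sketch uses the same sample data as the kendalltau example in this section and checks the result against scipy.stats.kendalltau; with ties, the simple formula would need adjusting, which the manual version does not attempt.

```python
from itertools import combinations

import numpy as np
from scipy.stats import kendalltau

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 1, 5, 4])

concordant = 0
discordant = 0
# Compare every pair of observations (i, j)
for i, j in combinations(range(len(x)), 2):
    s = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    if s > 0:
        concordant += 1      # both variables order the pair the same way
    elif s < 0:
        discordant += 1      # the variables disagree on the order
    # s == 0 would indicate a tie; this data has none

n_pairs = len(x) * (len(x) - 1) // 2
tau_manual = (concordant - discordant) / n_pairs   # valid here because there are no ties

tau_scipy, _ = kendalltau(x, y)
print(concordant, discordant, tau_manual, tau_scipy)
```

Here 7 of the 10 pairs are concordant and 3 are discordant, so τ = (7 - 3) / 10 = 0.4.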

    Calculate Kendall’s Tau (τ) in Python

    To calculate Kendall’s Tau, use the kendalltau function from the scipy.stats module. By default it computes the tau-b variant, which adjusts for tied ranks.

    import numpy as np
    from scipy.stats import kendalltau
    
    # Sample data
    x = np.array([1, 2, 3, 4, 5])
    y = np.array([2, 3, 1, 5, 4])
    
    # Calculate Kendall's Tau
    tau, p_value = kendalltau(x, y)
    
    print(f"Kendall's Tau (τ): {tau:.2f}")
    print(f"P-value: {p_value:.4f}")
    
    

    Interpret the Kendall’s Tau (τ) Result

    Kendall’s Tau ranges from -1 to 1. When interpreting it, use these general guidelines:

    • τ close to 1: strong positive association
    • τ close to -1: strong negative association
    • τ close to 0: little or no ordinal association
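Spearman's rho and Kendall's tau usually agree on direction, but on the same data tau is often smaller in magnitude, so "close to 1" means something slightly different for each. A quick comparison on the sample data from the Kendall example above:

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 1, 5, 4])

rho, _ = spearmanr(x, y)
tau, _ = kendalltau(x, y)

print(f"Spearman's rho: {rho:.2f}")   # 0.60
print(f"Kendall's tau:  {tau:.2f}")   # 0.40
```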

    Choose the Right Correlation Measure (Cov, r, Rho, or Tau)

    Use this table to decide which measure fits your data and the kind of relationship you want to evaluate.

    | Correlation Measure | Best for Data Type | Robust to Outliers | Type of Relationship |
    |---|---|---|---|
    | Covariance | Interval data, ratio data | No | Linear |
    | Pearson’s Correlation Coefficient (r) | Interval data, ratio data | No | Linear |
    | Spearman’s Rank Correlation (ρ) | Ordinal data, interval data | Yes | Monotonic |
    | Kendall’s Tau (τ) | Ordinal data, data with tied ranks | Yes | Concordance or discordance |