18 Pandas Functions to Replace Excel with Python (and be happy forever)

Share this post

This tutorial is a simple introduction to Pandas Python library.

What is Pandas in Python?

Pandas is one of the most commonPython libraries used in data science, data analysis and SEO. It is an open-source package built on top of NumPy which allow the manipulation of multi-dimensional arrays.

Pandas is Excel on steroids

Hamlet Batista

How Pandas Can Replace Excel?

Excel is the worst. If you want to live a long and healthy life, drop it. Pandas is a much better alternative to Excel, especially if you work with large sets of data.


Subscribe to my Newsletter


Excel does nothing that Python can’t do better. Let’s learn a little bit about Python Pandas.

If you drop Excel for Python (and I mean, never use Excel EVER), I guarantee you that you will never regret it.

It can be used for:

  • Data validation
  • Statistical analysis
  • Data visualization
  • Report
  • Importing and saving data to multiple file formats (csv, sql, pkl, …)

What is a Pandas DataFrame?

A Pandas DataFrame is a data structure that combines 1-dimensional arrays into two-dimensional structures with rows and columns that can contain different data types

Basic Structure of a pandas DataFrame

A Pandas Dataframe contains columns, also called Series, rows, indexes, and also store the data types of the values.

Excel to Python

These 18 Pandas functions will help you replace Excel with Pandas.

import pandas as pd
df = pd.read_excel('filename.xlsx') # Read Excel File
df.to_excel('filename.xlsx', index=False) # Save Excel File
pd.read_csv('filename.csv') # Read CSV File
df.to_csv('filename.csv') # Save CSV File
df.head() # Preview dataframe
df.count() # Count Rows
df.describe() # Get general statistics (min,max,mean,std,...)
df['col_name'] # Select one column
df['col_name'].value_counts() # Get unique value count
df.fillna(0) # Replace Null values
df.dropna() # Remove Null values
df[df['col_name'] == 0] # Filter DataFrame
df.drop_duplicates() # Remove duplicates
df.merge(df2,on='col_name') # vlookup
df.replace() # Find and replace
df.groupby(['col1','col2']).sum() # Pivot/Groupby
df.groupby(['col1','col2']).agg({'col1':'sum','col2','count'})
df.plot() # Plot your DataFrame
df = pd.util.testing.makeDataFrame() # or # pd.get_dummies(pd.Series(list('abcaa'))) # Create dummy dataframe

Import Package

The first step is to import the pandas package into your code.

import pandas as pd

Work With Excel Files

To read and save to Excel files, use pandas.read_excel or pandas.to_excel.

import pandas as pd

df = pd.read_excel('filename.xlsx') # Read Excel File
df.to_excel('filename.xlsx', index=Fales) # Save Excel File

Work With CSV Files

import pandas as pd

pd.read_csv('filename.csv') # Read CSV File
df.to_csv('filename.csv') # Save CSV File

Create a DataFrame

With Arrays

import numpy as np 
import pandas as pd 

arr = np.array([
    [1,4], [2,5], [3,6]
    ])
    
pd.DataFrame(arr)

With Lists

import pandas as pd

ls = [
    ['a','d'], ['b','e'], ['c','f']
    ]
    
pd.DataFrame(ls)

With Dictionary

import pandas as pd

# Create lists for the data
# Each list will become a pd.Series
pages = ['/python', '/pandas', '/seo', '/excel']
clicks = [400, 300, 200, 100]

# Store lists into a dictionary
dictionary = {
    'pages':pages,
    'clicks':clicks
}

# Create the DataFrame
pd.DataFrame(dictionary)

Preview DataFrame

To preview your Dataframe, use the head() method.

df.head(2) # Preview first 2 rows of dataframe

Get Statistics

df.count() # Count Rows
df.describe() # Get general statistics (min,max,mean,std,...)
df['col_name'].value_counts() # Get unique value count

Work with DataFrame

df['col_name'] # Select one column
df.fillna(0) # Replace Null values
df.dropna() # Remove Null values
df[df['col_name'] == 0] # Filter DataFrame
df.drop_duplicates() # Remove duplicates

Replace Vlookup With Pandas merge

To replace vlookup using pandas, use the pandas.DataFrame.merge method.

df.merge(df2,  on='col_name') # vlookup

Both Data frames will need to have a common “key”. In the example above, both df and df2 should have a column named “col_name“.

You Might Also Like  What GMB Categories are the Competition Using?

Pandas .replace() vs Excel Find and Replace

Pandas .replace() method is far more efficient than Excel’s search and replace or substitute functionalities.

df.replace('to_replace', 'new_value) # Find and Replace
df.replace(regex=r'^ba.$', value='new') # Allows regex

Replace Pivot Tables with GroupBy

df.groupby(['col1','col2']).sum() # Pivot/Groupby
df.groupby(['col1','col2']).agg({'col1':'sum','col2','count'})

Plot Your DataFrame

import pandas as pd
from pandas.util.testing import makeDataFrame
import matplotlib.pyplot as plt 

# Make Dummy dataframe
df = makeDataFrame()

# Plot histogram
df['A'].hist()
plt.show()

# Plot a bar plot
g_df = df.groupby('A')['B'].mean()
g_df.plot(kind='bar')

# Scatter plot in Pandas
df.plot(x='A', y='B', kind='scatter')
plt.show()

# Customize plot
df.plot(
    x='A', 
    y='B', 
    kind='scatter', 
    color='red', 
    marker='D',
    title='Scatter Plot',
    xlabel='X-axis',
    ylabel='Y-axis'
    )
plt.show()

Interesting Applications of Python’s Pandas

If you are interested in going further, you can read about the other important Python libraries for Data Science, or learn by doing projects with Pandas:

Conclusion

This is it for the introduction to Pandas in Python.

You can now replace Excel with Python and never look back. If you are hooked, try 30 days of Pandas.