This post is part of the complete guide on the Slack API with Python.
In this post, we will implement a smoke test to check whether the robots.txt file of a site has changed. When the robots.txt file changes, the script saves a dated backup and sends an alert through the Slack API.

To set up a robots.txt smoke test in Python:
- Create hygiene functions to manipulate data
- Read and use the credentials
- Fetch the robots.txt
- Compare it to the previous version of the robots.txt
- Send a message to Slack
Requirements
- Have Python installed
- Create a Slack account
- Get your Slack API webhook
- Add the Slack webhook to a credentials.json file
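For reference, the credentials.json file only needs to hold the webhook URL under the slack_webhook key, which is the key the code reads later. The URL below is a placeholder; use the one from your own Slack app.

{
    "slack_webhook": "https://hooks.slack.com/services/XXXXXXXXX/XXXXXXXXX/XXXXXXXXXXXXXXXXXXXXXXXX"
}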

Hygiene Functions
A lot of my projects start with these functions that I use over and over.
- create_project(): if the project folder does not exist, create it.
- get_domain_name(): converts the URL into a usable folder name.
- fetch_page(): uses the requests library to fetch a page.
- get_date(): gets the current date and time, used to timestamp the backups.
from datetime import datetime
import os
import requests
from urllib.parse import urlparse

def create_project(directory):
    ''' Create a project folder if it does not exist '''
    if not os.path.exists(directory):
        print('Create project: ' + directory)
        os.makedirs(directory)
    else:
        print(f'{directory} project exists')

def get_domain_name(start_url):
    '''
    Get the domain name in the www_domain_com format.
    1. Parse the URL
    2. Get the domain (network location)
    3. Replace dots to make a usable folder path
    '''
    url = urlparse(start_url)                    # Parse URL into components
    domain_name = url.netloc                     # Get domain (or network location)
    domain_name = domain_name.replace('.', '_')  # Replace . with _ to create a usable folder
    domain_name = domain_name.split(':')[0]      # Strip the port, if any
    return domain_name

def fetch_page(url):
    ''' Use requests to fetch a URL '''
    try:
        r = requests.get(url)
    except Exception as e:
        print(e)
        raise  # re-raise so the caller does not get an undefined response
    return r

def get_date():
    ''' Get the current date and time as YYYY-MM-DD_HH-MM-SS (safe in filenames) '''
    return datetime.today().strftime('%Y-%m-%d_%H-%M-%S')
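As a quick sanity check, here is what these helpers return for this site. The output values in the comments are illustrative.

get_domain_name('https://www.jcchouinard.com/')  # 'www_jcchouinard_com'
get_date()                                       # e.g. '2021-03-15_08-30-00'
create_project('output/www_jcchouinard_com/')    # creates the folder if missing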
Slack Functions
The get_credentials() function uses the json library to read the webhook URL from the credentials file.
The post_to_slack() function sends the message as a JSON payload to the Slack webhook using requests.
import json

def get_credentials(credentials):
    ''' Read the Slack webhook URL from a JSON credentials file. '''
    with open(credentials, 'r') as f:
        creds = json.load(f)
    return creds['slack_webhook']

def post_to_slack(message, credentials):
    ''' Post the message as JSON to the Slack webhook. '''
    data = {'text': message}
    url = get_credentials(credentials)
    requests.post(url, json=data)
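Before wiring everything together, you can check that the webhook works with a one-off test message. This assumes credentials.json sits in your working directory.

post_to_slack('Test: robots.txt monitor is up and running', 'credentials.json')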
Robots.txt Functions
Next, we will set up the functions that deal with the robots.txt.
- get_robots_files(): lists all the previously saved robots.txt files, sorted by date.
- get_robotstxt_name(): gets the latest saved robots.txt.
- read_robotstxt(): reads a saved robots.txt file and returns its contents.
- write_robotstxt(): backs up the new robots.txt file.
def get_robots_files(directory):
    ''' List the robots.txt files in the output folder, sorted by date '''
    output_files = os.listdir(directory)
    output_files.sort()
    return output_files

def get_robotstxt_name(output_files):
    ''' Get the last saved robots.txt file. '''
    robots_txt = 'robots.txt'
    files = []
    for filename in output_files:
        if robots_txt in filename:
            files.append(filename)
    return files[-1]

def read_robotstxt(filename):
    ''' Read a saved robots.txt file '''
    if os.path.isfile(filename):
        with open(filename, 'r') as f:
            txt = f.read()
        return txt

def write_robotstxt(file, output, filename='robots.txt'):
    ''' Write the robots.txt to a file using the date as an identifier. '''
    filename = get_date() + '-' + filename
    filename = output + filename
    print(f'filename: {filename}')
    with open(filename, 'w') as f:
        f.write(file)
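Because each backup is prefixed with a sortable timestamp, a plain alphabetical sort puts the newest file last, which is why get_robotstxt_name() can simply return files[-1]. For example (filenames are illustrative):

output_files = [
    '2021-03-01_08-30-00-robots.txt',
    '2021-03-15_08-30-00-robots.txt',
]
get_robotstxt_name(output_files)  # '2021-03-15_08-30-00-robots.txt'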
Compare Current vs New Robots.txt
The compare_robots() function reads the latest saved robots.txt file and compares it to the robots.txt fetched from your site.
If the file is new or has changed, it writes the robots.txt currently on your site to a file whose name contains the date. Then, it uses the post_to_slack() function to alert you of the change to the robots.txt.
If the file has not changed, nothing is saved, and a simple print statement tells you that there was no change.
def compare_robots(credentials, output, output_files, r):
    '''
    Compare the previous robots.txt to the live one.
    If they differ, save the new file and post to Slack.
    '''
    new_robotstxt = r.text.replace('\r', '')  # normalize line endings
    if output_files:
        robots_filename = get_robotstxt_name(output_files)
        filename = output + robots_filename
        last_robotstxt = read_robotstxt(filename)
        last_robotstxt = last_robotstxt.replace('\r', '')
        if new_robotstxt != last_robotstxt:
            print('Robots.txt was modified')
            write_robotstxt(new_robotstxt, output)
            message = f'The robots.txt was changed for {filename}'
            post_to_slack(message, credentials)
        else:
            print('No change to robots.txt')
    else:
        print('No existing robots.txt. Saving one.')
        write_robotstxt(new_robotstxt, output)
Run the Code
Let’s combine the entire code into a main()
function.
site = 'https://www.jcchouinard.com/'

def main(credentials, site, filename='robots.txt'):
    ''' Combine all functions '''
    url = site + filename
    domain = get_domain_name(site)
    path = os.getcwd()  # use the working directory as the project root
    output = path + '/output/' + domain + '/'
    create_project(output)
    r = fetch_page(url)
    output_files = get_robots_files(output)
    compare_robots(credentials, output, output_files, r)
To run the script, point it at the credentials file and then call the main() function we created above.
if __name__ == '__main__':
    path = os.getcwd()
    credentials = path + '/credentials.json'
    main(credentials, site)
The if __name__ == '__main__' check tells Python whether you are running the file as a script or importing it as a module. If you are importing it, main() will not run.
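This means you can reuse the functions from another script without triggering a run. Assuming the file is saved as check_robotstxt.py (the name used in the cron example below):

from check_robotstxt import fetch_page, get_domain_name  # nothing runs on import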
Full Code
#!/usr/bin/env python
from datetime import datetime
import json
import os
import requests
from urllib.parse import urlparse

site = 'https://www.jcchouinard.com/'

def main(credentials, site, filename='robots.txt'):
    ''' Combine all functions '''
    url = site + filename
    domain = get_domain_name(site)
    path = os.getcwd()
    output = path + '/output/' + domain + '/'
    create_project(output)
    r = fetch_page(url)
    output_files = get_robots_files(output)
    compare_robots(credentials, output, output_files, r)

def get_credentials(credentials):
    ''' Read the Slack webhook URL from a JSON credentials file. '''
    with open(credentials, 'r') as f:
        creds = json.load(f)
    return creds['slack_webhook']

def post_to_slack(message, credentials):
    ''' Post the message as JSON to the Slack webhook. '''
    data = {'text': message}
    url = get_credentials(credentials)
    requests.post(url, json=data)

def create_project(directory):
    ''' Create a project folder if it does not exist '''
    if not os.path.exists(directory):
        print('Create project: ' + directory)
        os.makedirs(directory)
    else:
        print(f'{directory} project exists')

def get_domain_name(start_url):
    '''
    Get the domain name in the www_domain_com format.
    1. Parse the URL
    2. Get the domain (network location)
    3. Replace dots to make a usable folder path
    '''
    url = urlparse(start_url)
    domain_name = url.netloc
    domain_name = domain_name.replace('.', '_')
    domain_name = domain_name.split(':')[0]
    return domain_name

def fetch_page(url):
    ''' Use requests to fetch a URL '''
    try:
        r = requests.get(url)
    except Exception as e:
        print(e)
        raise
    return r

def get_robots_files(directory):
    ''' List the robots.txt files in the output folder, sorted by date '''
    output_files = os.listdir(directory)
    output_files.sort()
    return output_files

def get_robotstxt_name(output_files):
    ''' Get the last saved robots.txt file. '''
    robots_txt = 'robots.txt'
    files = []
    for filename in output_files:
        if robots_txt in filename:
            files.append(filename)
    return files[-1]

def read_robotstxt(filename):
    ''' Read a saved robots.txt file '''
    if os.path.isfile(filename):
        with open(filename, 'r') as f:
            txt = f.read()
        return txt

def get_date():
    ''' Get the current date and time as YYYY-MM-DD_HH-MM-SS '''
    return datetime.today().strftime('%Y-%m-%d_%H-%M-%S')

def write_robotstxt(file, output, filename='robots.txt'):
    ''' Write the robots.txt to a file using the date as an identifier. '''
    filename = get_date() + '-' + filename
    filename = output + filename
    print(f'filename: {filename}')
    with open(filename, 'w') as f:
        f.write(file)

def compare_robots(credentials, output, output_files, r):
    '''
    Compare the previous robots.txt to the live one.
    If they differ, save the new file and post to Slack.
    '''
    new_robotstxt = r.text.replace('\r', '')
    if output_files:
        robots_filename = get_robotstxt_name(output_files)
        filename = output + robots_filename
        last_robotstxt = read_robotstxt(filename)
        last_robotstxt = last_robotstxt.replace('\r', '')
        if new_robotstxt != last_robotstxt:
            print('Robots.txt was modified')
            write_robotstxt(new_robotstxt, output)
            message = f'The robots.txt was changed for {filename}'
            post_to_slack(message, credentials)
        else:
            print('No change to robots.txt')
    else:
        print('No existing robots.txt. Saving one.')
        write_robotstxt(new_robotstxt, output)

if __name__ == '__main__':
    path = os.getcwd()
    credentials = path + '/credentials.json'
    main(credentials, site)
Automate The Robots.txt Backups
To automate the robots.txt smoke tests and backups, you can learn how to schedule your Python script with the Windows Task Scheduler, or automate your Python script using CRON on Mac.
You can run the script every day at 8:30 AM by adding a line like this to your crontab:
30 8 * * * /path/to/python /path/to/file/check_robotstxt.py > /path/to/logger/robotstxtchecker.log 2>&1
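On Windows, the rough equivalent is a scheduled task. This one-liner is a sketch: the task name and the paths are placeholders to adjust to your setup.

schtasks /create /tn "RobotsTxtCheck" /tr "python C:\path\to\check_robotstxt.py" /sc daily /st 08:30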
That's it: you have now implemented a smoke test that checks whether the robots.txt file of your site has changed, and that alerts you with Python and the Slack API.