Reddit API JSON Documentation

This post is part of the Complete Guide on the Reddit API

The documentation on the Reddit API JSON is very confusing to non-developers.

I wrote this guide to help you make sense of Reddit’s API JSON response.

We will also parse the response to extract interesting data, like the post title, score, and number of comments.


Getting Started

The guide is in Python, so if you don’t know how to use Python, you can read my complete guide on Python for SEO, or just follow the steps with your favourite tool.

An API, or application programming interface, gives you access to a service’s data in a structured format, in this case JSON.

You can even open the URL below in your browser to view the JSON response directly.

https://www.reddit.com/r/python/top.json?limit=100&t=month

Make a Simple Request to get the JSON

Here, using the requests library, we will extract the top 100 posts of the month in the r/python subreddit.

If you don’t know how to do that, just read my post on using Reddit API without credentials.

The limit per request is 100; if you want more posts, you will have to make multiple requests.

import requests
 
subreddit = 'python'
limit = 100
timeframe = 'month'  # hour, day, week, month, year, all
listing = 'top'  # controversial, best, hot, new, random, rising, top
 
def get_reddit(subreddit, listing, limit, timeframe):
    try:
        base_url = f'https://www.reddit.com/r/{subreddit}/{listing}.json?limit={limit}&t={timeframe}'
        request = requests.get(base_url, headers={'User-agent': 'yourbot'})
    except requests.exceptions.RequestException as e:
        print(f'An error occurred: {e}')
        return None
    return request.json()
 
r = get_reddit(subreddit, listing, limit, timeframe)

Overview of the JSON

By looking at the response r, you get a JSON object with this kind of structure.

{
    "kind": "string", 
    "data": {
        "modhash": "string", 
        "dist": int, 
        "children": [{
            "kind": "string", 
            "data": {
                "approved_at_utc": "string", 
                "subreddit": "string", 
                "selftext": "string", 
                ...,
                "is_video": "boolean"
            }
        }],
        "after": "",
        "before": ""
    }
}

Basically, as we will see in detail later, all the post data lives under r['data']['children'][i]['data'], where i is the position of the post that you want to select (from 0 to 99 in our case).
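To see this access pattern in action without making a request, here is a minimal mock of the listing structure shown above (the titles and scores are made up for illustration):

```python
# Minimal mock of the Reddit listing structure (offline example).
r = {
    'kind': 'Listing',
    'data': {
        'children': [
            {'kind': 't3', 'data': {'title': 'First post', 'score': 10}},
            {'kind': 't3', 'data': {'title': 'Second post', 'score': 7}},
        ],
        'after': 't3_abc123',
        'before': None,
    },
}

# Each post's fields live under ['data'] of each child.
for i, child in enumerate(r['data']['children']):
    post = child['data']
    print(i, post['title'], post['score'])
```

With a real response, the same loop works, except `children` contains up to 100 items.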


You can dig inside the response object by object by looking at the keys of the dictionary.

print(r.keys())
# dict_keys(['kind', 'data'])

Where is the data of the JSON file?

All the data of the Reddit API JSON response is nested inside the ‘children’ object of the ‘data’.

r['data']['children']

As you can see, we have 100 posts in the JSON object.

len(r['data']['children'])
# 100

What can you Extract From a Post on Reddit?

Now, we need to select the first post of the 100 posts using r['data']['children'][0].

print(r['data']['children'][0].keys())
# dict_keys(['kind', 'data'])

List of the Reddit’s JSON Keys

To access a JSON object within a JSON array in Python, you need to select the ‘data’ key of the ‘children’ item: r['data']['children'][0]['data'].

Then, we can look at the keys of that dictionary to see what we can extract.

for k in r['data']['children'][0]['data'].keys():
    print(k)
approved_at_utc
subreddit
selftext
author_fullname
saved
mod_reason_title
gilded
clicked
title
link_flair_richtext
subreddit_name_prefixed
hidden
pwls
link_flair_css_class
downs
thumbnail_height
top_awarded_type
hide_score
name
quarantine
link_flair_text_color
upvote_ratio
author_flair_background_color
subreddit_type
ups
total_awards_received
media_embed
thumbnail_width
author_flair_template_id
is_original_content
user_reports
secure_media
is_reddit_media_domain
is_meta
category
secure_media_embed
link_flair_text
can_mod_post
score
approved_by
author_premium
thumbnail
edited
author_flair_css_class
author_flair_richtext
gildings
content_categories
is_self
mod_note
created
link_flair_type
wls
removed_by_category
banned_by
author_flair_type
domain
allow_live_comments
selftext_html
likes
suggested_sort
banned_at_utc
view_count
archived
no_follow
is_crosspostable
pinned
over_18
all_awardings
awarders
media_only
link_flair_template_id
can_gild
spoiler
locked
author_flair_text
treatment_tags
visited
removed_by
num_reports
distinguished
subreddit_id
mod_reason_by
removal_reason
link_flair_background_color
id
is_robot_indexable
report_reasons
author
discussion_type
num_comments
send_replies
whitelist_status
contest_mode
mod_reports
author_patreon_flair
author_flair_text_color
permalink
parent_whitelist_status
stickied
url
subreddit_subscribers
created_utc
num_crossposts
media
is_video

Extract Interesting Data from Reddit

Last, all you have to do is select what you want from the list.

to_extract = ['title','url','score','num_comments','view_count','ups','downs','selftext']

for e in to_extract:
    print(f"{e}: {r['data']['children'][0]['data'][e]}")
title: Spent 9hrs finding a bug yesterday, took 15mins to figure it out today
url: https://www.reddit.com/r/Python/comments/koat5n/spent_9hrs_finding_a_bug_yesterday_took_15mins_to/
score: 2204
num_comments: 180
view_count: None
ups: 2204
downs: 0
selftext: I spent the whole day finding a bug yesterday, couldn't find it at the end of the day and got a headache due to stress. Woke up today and found the bug 15 mins after.

Worrying about the delay in the project fogged my mind and I couldn't think logically, blind to different possibilities.

Taking a break and having a clear mind is very important. This has happened to me a couple of times so decided to post this here today to remember not to repeat this ever lol.


Edit: Thanks for the award kind stranger. I thought this was more of a personal problem, reading all the comments I'm happy to know I'm not alone. I feel more normal now 🙂.
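If you want to collect those fields for every post instead of printing them one by one, you can build a list of dictionaries. A minimal sketch, using a mock of two posts so it runs offline (with a real response you would iterate over all 100 children; `dict.get()` avoids a KeyError if Reddit omits a field):

```python
to_extract = ['title', 'url', 'score', 'num_comments']

# Mock of two posts in the Reddit listing structure (offline example).
r = {'data': {'children': [
    {'data': {'title': 'Post A', 'url': 'https://example.com/a',
              'score': 42, 'num_comments': 5}},
    {'data': {'title': 'Post B', 'url': 'https://example.com/b',
              'score': 7, 'num_comments': 1}},
]}}

# One dictionary per post, keeping only the fields we care about.
posts = [
    {field: child['data'].get(field) for field in to_extract}
    for child in r['data']['children']
]
print(posts)
```

From there, the list of dictionaries is easy to load into a spreadsheet or a pandas DataFrame.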

Understand the Other Objects of the Reddit’s JSON

We now have covered the most important aspect of the Reddit JSON.


Let’s look at the other objects.

  • Response’s Kinds
  • Other objects inside the data

Reddit Response’s Kind

Kind returns a string that tells the type of the object. You will not find any “data” in that key.

print(r['kind'])
# Listing

In that case, ‘Listing’ represents a list of things. ‘Listing’ is used to paginate results when they are too long to display all at once.

If we look at the actual post ‘kind’, you will see a different string identifier.

r['data']['children'][0]['kind']
# 't3'

Here are the meanings of the kind identifiers.

  • t1: Comment
  • t2: Account
  • t3: Link
  • t4: Message
  • t5: Subreddit
  • t6: Award
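The mapping above is easy to encode as a small helper, handy when you parse mixed listings (a minimal sketch; the function name is my own, not part of the Reddit API):

```python
# Reddit "thing" kind prefixes and what they represent.
KIND_NAMES = {
    't1': 'Comment',
    't2': 'Account',
    't3': 'Link',
    't4': 'Message',
    't5': 'Subreddit',
    't6': 'Award',
}

def describe_kind(kind):
    """Return a human-readable name for a Reddit kind identifier."""
    return KIND_NAMES.get(kind, 'Unknown')

print(describe_kind('t3'))  # Link
```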

Other Objects Inside the Data

Inside the data, we had 5 elements: modhash, dist, children, after, before.

r['data'].keys()
# dict_keys(['modhash', 'dist', 'children', 'after', 'before'])

We have already covered the ‘children‘ element, let’s look at the others.

print(f"Modhash: {r['data']['modhash']}")
print(f"dist: {r['data']['dist']}")
print(f"after: {r['data']['after']}")
print(f"before: {r['data']['before']}")
# Modhash: 
# dist: 100
# after: t3_kkwabd
# before: None
  • Modhash: A token used to prevent CSRF (cross-site request forgery). Since we did not log in to make the request, it is empty.
  • Dist: The number of items returned.
  • After: The name of the listing that follows this page. None if there is no next page (i.e. you extracted the last result).
  • Before: The name of the listing that precedes this page. None if this is the first page.
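The ‘after’ token is what lets you go beyond the 100-post limit: pass one page’s ‘after’ value as the `after` query parameter of the next request. A minimal sketch of the pagination logic, with the fetch function injected so it can be demonstrated offline (with the live API you would plug in a `requests.get` call that appends `&after={token}` to the URL):

```python
def paginate(fetch_page, max_pages):
    """Collect children across pages by chaining the 'after' token."""
    posts, after = [], None
    for _ in range(max_pages):
        data = fetch_page(after)['data']
        posts.extend(data['children'])
        after = data['after']
        if after is None:  # no next page: we reached the last result
            break
    return posts

# Offline stand-in for the Reddit endpoint: two pages, then the end.
def fake_fetch(after):
    if after is None:
        return {'data': {'children': [{'data': {'id': 'a'}}],
                         'after': 't3_a'}}
    return {'data': {'children': [{'data': {'id': 'b'}}], 'after': None}}

print(len(paginate(fake_fetch, 5)))  # 2
```

Note that Reddit rate-limits unauthenticated requests, so pause between pages when fetching for real.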

Conclusion

There it is. Hopefully, Reddit’s API JSON response makes a little more sense for you. If you want to dive deeper, you can always go to the dev documentation or to this outdated repository on the subject.