I was trying to get a database of recent jokes and quotes; unfortunately, there aren’t many good downloadable sources with new and relevant material.
Reddit is a social media site that has a lot of potential for building databases or lists of recent jokes and quotes.
In this blog post I document how I created a list of recent jokes and quotes using the Python Reddit library (PRAW). I also included some filtering to help remove bad items.
Reddit is free to use, but to pull information from it you will need to create an account and get an API client ID and secret. To do this, go to https://www.reddit.com/prefs/apps/ and select edit in the developed applications area.
The Python Reddit library is called PRAW (Python Reddit API Wrapper), and it is installed with:
pip install praw
To use the Python Reddit library you will need:
- your username
- your password
- your client ID and client secret
- a user agent string that identifies your script
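Rather than typing your password and client secret directly into a script, one option is to read them from environment variables. The sketch below is only one way to do this: the variable names REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, REDDIT_USERNAME and REDDIT_PASSWORD are my own hypothetical choice (neither Reddit nor PRAW requires them), and the values are simply passed to praw.Reddit as keyword arguments.

```python
import os


def reddit_settings(user_agent="myreddit"):
    """Gather the values praw.Reddit needs from environment variables.

    The variable names are hypothetical; pick whatever suits your setup.
    """
    return {
        "client_id": os.environ["REDDIT_CLIENT_ID"],
        "client_secret": os.environ["REDDIT_CLIENT_SECRET"],
        "username": os.environ["REDDIT_USERNAME"],
        "password": os.environ["REDDIT_PASSWORD"],
        "user_agent": user_agent,
    }


def make_reddit():
    """Create an authenticated PRAW client (requires `pip install praw`)."""
    import praw  # imported here so reddit_settings() works without PRAW installed

    return praw.Reddit(**reddit_settings())
```

With the variables set in your shell, `reddit = make_reddit()` gives the same kind of client object as calling praw.Reddit with the values typed in.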
Reddit has a wide range of categories (called subreddits), and once a category is selected you can sort its posts by new, hot, top or rising.
Python Joke Example
An example of using Python to get the 10 “hot” dad jokes from Reddit is below. For this example I included a bad_keywords list; some trial and error will be required to tune it so that unwanted items are removed.
```python
# Python Reddit Example: get the 10 "hot" dad jokes
# Some filtering is added to remove bad items
import praw
import re  # used to filter out bad items

# remove items with these words in them (could include a large list of swear words)
bad_keywords = ['sex', 'prostitute', 'shit', 'edit', 'remove', 'delete', 'repost', 'this sub']
# compile the pattern once, outside the loop; re.escape guards against regex special characters
bad_pattern = re.compile('|'.join(re.escape(word) for word in bad_keywords), re.IGNORECASE)

reddit = praw.Reddit(client_id='xQsMfaHxxxxxxx',
                     client_secret='X8r62koQgVxxxxxxxx',
                     user_agent='myreddit',
                     username='yourusername',
                     password='xxxxxxx')

i = 0
for submission in reddit.subreddit('dadjokes').hot(limit=10):
    # check both the title (setup) and the body text (punchline)
    thestring = submission.title + " " + submission.selftext
    if not bad_pattern.search(thestring):
        i += 1
        print(i, submission.title, "...", submission.selftext)
```
The output will look something like:
1 What’s Beethoven doing in his grave ... De-composing
2 Why can’t Swiss cheese be part of a fat-free diet? ... It’s made with hole milk.
3 People always wonder how I come up with flaccid penis jokes so easily and I just respond back with... ... It's not that hard.
4 2020 is going to be a great year. ... I can see it so clearly.
5 I got kicked out of karaoke after singing “Danger Zone” nine times in a row. ... Too many Loggins attempts.
6 What did one snowman say to the other snowman? ... "Do you smell carrots?"
7 The store near me is having a sale on batteries. ... If you buy two packs, they'll throw in a pack of dead ones, free of charge.
8 Mr Ed just moved next door to me a few days ago. ... We’re neighbors now.
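The keyword test in the dad joke example matches anywhere inside a word, so 'edit' also rejects a joke containing "credit" and 'sex' rejects one mentioning "Essex". A possible refinement, sketched below, is to require a word boundary before each keyword. This is my own variation, not part of the original approach: it still catches "EDIT:", "[removed]" and "deleted", because only the start of the word is anchored. The function name make_keyword_filter is hypothetical.

```python
import re


def make_keyword_filter(keywords):
    r"""Return a function that is True when text contains a keyword at the start of a word.

    The leading \b stops 'edit' from matching inside 'credit' or 'sex' inside
    'Essex', while 'remove' still matches 'removed'.
    """
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(word) for word in keywords) + r")",
        re.IGNORECASE,
    )

    def is_bad(text):
        return pattern.search(text) is not None

    return is_bad


bad_keywords = ['sex', 'prostitute', 'shit', 'edit', 'remove', 'delete', 'repost', 'this sub']
is_bad = make_keyword_filter(bad_keywords)

print(is_bad("I got a tax credit in Essex"))  # False
print(is_bad("EDIT: thanks for the gold!"))   # True
print(is_bad("[removed]"))                    # True
```

In the loop, the keyword check on thestring can then be written as `if not is_bad(thestring):`.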
There are a number of other joke categories, such as: yomamajokes, jokes, cleanjokes, greyjokes…
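To build a larger list, the same loop can be run over several categories. The sketch below is one way to do it under a few assumptions of mine: it uses .new() because the goal is recent material, it takes any filter function you pass in as is_bad, and it drops reposts by comparing the lower-cased text. The name collect_posts and the dictionary layout are hypothetical.

```python
def collect_posts(reddit, subreddit_names, is_bad, limit=25):
    """Gather clean, de-duplicated posts from several subreddits.

    `reddit` is a praw.Reddit instance; `is_bad` is any function that takes
    the post text and returns True for items to skip.
    """
    seen = set()  # lower-cased texts already kept, used to skip reposts
    posts = []
    for name in subreddit_names:
        for submission in reddit.subreddit(name).new(limit=limit):
            text = submission.title + " " + submission.selftext
            key = text.strip().lower()
            if is_bad(text) or key in seen:
                continue
            seen.add(key)
            posts.append({
                "subreddit": name,
                "id": submission.id,
                "title": submission.title,
                "selftext": submission.selftext,
            })
    return posts
```

For example, `collect_posts(reddit, ['dadjokes', 'jokes', 'cleanjokes'], lambda text: 'repost' in text.lower())` returns a list of dictionaries that can be printed or saved.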
The object returned by reddit.subreddit() has .hot(), .new() and .top() methods (as well as .rising() and .controversial()).
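If you want to switch between these sort orders without editing the loop, a small helper like the one below may help. It is a sketch, not part of PRAW: the name get_listing is hypothetical, and it relies on .top() accepting a time_filter argument ("hour", "day", "week", "month", "year" or "all"), which is handy for limiting results to recent posts.

```python
VALID_SORTS = ("hot", "new", "top", "rising")
VALID_TIME_FILTERS = ("hour", "day", "week", "month", "year", "all")


def get_listing(subreddit, sort="hot", limit=10, time_filter="week"):
    """Return the chosen listing from a PRAW Subreddit object."""
    if sort not in VALID_SORTS:
        raise ValueError(f"sort must be one of {VALID_SORTS}, got {sort!r}")
    if sort == "top":
        if time_filter not in VALID_TIME_FILTERS:
            raise ValueError(f"time_filter must be one of {VALID_TIME_FILTERS}")
        # only .top() takes a time window
        return subreddit.top(time_filter=time_filter, limit=limit)
    # hot, new and rising all take the same limit argument
    return getattr(subreddit, sort)(limit=limit)
```

For example, `get_listing(reddit.subreddit('dadjokes'), sort='top', time_filter='week')` can replace the `.hot(limit=10)` call in the for loop.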
For quotes, see categories such as: quotes, showerthoughts…
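Quote-style posts (showerthoughts in particular) typically put the whole text in the title and leave the body empty, so the "title ... selftext" format above would leave a dangling "...". The sketch below handles that and stores the results in an SQLite database, which is one way to turn the printed list into the kind of database mentioned at the start. The table layout and the names post_text and save_posts are my own assumptions.

```python
import sqlite3


def post_text(title, selftext):
    """Join title and body; title-only posts (common for quotes) keep just the title."""
    selftext = selftext.strip()
    return f"{title} ... {selftext}" if selftext else title


def save_posts(conn, rows):
    """Store (post_id, subreddit, title, selftext) tuples, ignoring IDs already saved."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id TEXT PRIMARY KEY, subreddit TEXT, text TEXT)"
    )
    # INSERT OR IGNORE skips rows whose id is already in the table
    conn.executemany(
        "INSERT OR IGNORE INTO posts (id, subreddit, text) VALUES (?, ?, ?)",
        [(pid, sub, post_text(title, body)) for pid, sub, title, body in rows],
    )
    conn.commit()
```

With PRAW, the rows could come from something like `[(s.id, 'showerthoughts', s.title, s.selftext) for s in reddit.subreddit('showerthoughts').new(limit=25)]`, saved with `save_posts(sqlite3.connect('quotes.db'), rows)`. Running it again later only adds posts that are not already stored.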