Python Friday #116: Search Twitter from Tweepy

Searching Twitter from Tweepy is a lot less troublesome than working with lists. Let’s explore the search API of Twitter and the V2 client of Tweepy.

This post is part of my journey to learn Python. You can find the other parts of this series here. The code for this post is in my PythonFriday repository on GitHub.

3 things you must know about Twitter search

Twitter allows you to search recent tweets only (posted in the last 7 days). If you need to go back further, you need an academic research account, and Twitter does not hand those out easily. Expect a much longer approval process than for your developer account, and a much higher decline rate.

The search API is built to handle a huge number of requests and is heavily optimized to reduce the amount of data it needs to transfer. Therefore, by default you get back only a minimal set of data about each tweet.

For most use cases you need additional data. Luckily, we can tell the search API exactly what data we want. This flexibility comes at a price: you need to learn which options you can use. Otherwise, you may not be able to do anything useful with the search results.

The V2 client

For search, the V2 API client works very well. We only need the bearer token to search Twitter:
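
A minimal sketch of creating such a client, assuming Tweepy 4.x is installed and the bearer token sits in an environment variable. The variable name TWITTER_BEARER_TOKEN and the helper create_client are names picked for this illustration; Tweepy does not prescribe them:

```python
import os


def create_client(bearer_token=None):
    """Return a Tweepy V2 client that authenticates with a bearer token only."""
    # TWITTER_BEARER_TOKEN is an assumed variable name, pick whatever suits you.
    token = bearer_token or os.environ.get("TWITTER_BEARER_TOKEN")
    if not token:
        raise ValueError("Pass a bearer token or set TWITTER_BEARER_TOKEN")

    # Imported late so a missing token is reported before Tweepy is loaded.
    import tweepy

    return tweepy.Client(bearer_token=token)
```

Calling create_client() without arguments reads the token from the environment, which keeps the secret out of the source code.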

The rate limit for the search endpoint is between 500’000 and 1’000’000 requests per month. If you reach that limit, you had better stop your application; waiting for the reset may be a colossal waste of energy.

Search Twitter (basic)

The basic search without any additional fields works with this code snippet:
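
As a rough sketch, a basic search could look like the following. It assumes client is a tweepy.Client created with a bearer token; the helper names search_basic, format_tweet and print_tweets are made up for this example:

```python
def search_basic(client, query, max_results=10):
    """Run a recent search without extra fields and return the tweets as a list."""
    response = client.search_recent_tweets(query=query, max_results=max_results)
    # response.data is None when nothing matches, so fall back to an empty list.
    return response.data or []


def format_tweet(tweet):
    """Render id, creation time, author id and text of one tweet."""
    return f"{tweet.id} ({tweet.created_at}) - {tweet.author_id}:\n{tweet.text}"


def print_tweets(tweets):
    """Print every tweet below a separator line."""
    for tweet in tweets:
        print("-" * 50)
        print(format_tweet(tweet))
        print()
```

A call such as print_tweets(search_basic(client, "#Tweepy")) would run it, where "#Tweepy" is only a placeholder query. The recent search endpoint accepts values from 10 to 100 for max_results.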

When I ran this code, I got these results back:

--------------------------------------------------
1499834381367746561 (None) - None:
RT @j_graber: #Python Friday #112: How to Use #Tweepy in #Flask https://t.co/fNNOXPB9GI

--------------------------------------------------
1499834217802387456 (None) - None:
RT @j_graber: #Python Friday #112: How to Use #Tweepy in #Flask https://t.co/fNNOXPB9GI

--------------------------------------------------
1499834195820036098 (None) - None:
RT @j_graber: #Python Friday #112: How to Use #Tweepy in #Flask https://t.co/fNNOXPB9GI

--------------------------------------------------
1499822223338655756 (None) - None:
#Python Friday #112: How to Use #Tweepy in #Flask https://t.co/fNNOXPB9GI

The basic result set contains only the Tweet ID and the text. Everything else, from the author to the time the Tweet was posted, is empty.

Tailor the result

In the endpoint documentation you find all the Enum values you can use to tailor the representation of the search results. There is also a good explanation of how annotations work and how you can use fields and expansions.

Here are a few combinations I find most useful:

Parameter      Value                    Description
tweet_fields   author_id                The numerical Id of the tweet author
tweet_fields   created_at               When the Tweet was posted
tweet_fields   public_metrics           Counter for retweets, likes, quotes and replies
tweet_fields   attachments              References for media files (images, videos)
user_fields    username                 The screen name (@…)
user_fields    name                     The name of the user
user_fields    profile_image_url        The URL of the profile picture
media_fields   public_metrics           Metrics on the attachment
media_fields   url                      URL of the media
media_fields   height                   Height of the media
media_fields   width                    Width of the media
media_fields   alt_text                 The alternative text
expansions     author_id                Must be set to get user objects
expansions     attachments.media_keys   Must be set to get media fields
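
Tweepy spells these parameters with an underscore (tweet_fields), while the endpoint documentation uses a dot (tweet.fields) and expects the values as a comma-separated list. The sketch below only illustrates that mapping by building the request URL by hand; Tweepy does this work for you. The helper build_search_url is hypothetical, and the URL is the recent search endpoint as I understand it was documented at the time of writing:

```python
from urllib.parse import urlencode

# Assumed address of the V2 recent search endpoint.
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_url(query, **fields):
    """Translate Tweepy-style keyword arguments into endpoint query parameters."""
    params = {"query": query}
    for name, values in fields.items():
        # tweet_fields -> tweet.fields, user_fields -> user.fields, ...
        api_name = name.replace("_fields", ".fields")
        # The endpoint expects one comma-separated string per parameter.
        params[api_name] = ",".join(values)
    return f"{SEARCH_URL}?{urlencode(params)}"


print(build_search_url(
    "#VisitOslo",
    tweet_fields=["author_id", "created_at"],
    expansions=["author_id"],
))
```

Knowing the dotted names makes it easier to match the keyword arguments you pass to Tweepy with the values listed in the endpoint documentation.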

Structure of the search result

It is important to know that the extra data you request will not be part of the Tweet object. Instead, users and media objects will be in the includes dictionary of the response:
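
One way to make that extra data easy to reach is to index it once, sketched here under the assumption that response is the object returned by search_recent_tweets. The helper build_lookups is a name invented for this example:

```python
def build_lookups(response):
    """Index the included users by id and the included media by media_key."""
    includes = response.includes or {}
    # A key is missing from includes when the search returned no such objects,
    # so .get() with an empty list avoids a KeyError.
    users = {user.id: user for user in includes.get("users", [])}
    media = {item.media_key: item for item in includes.get("media", [])}
    return users, media
```

With these two dictionaries, users[tweet.author_id] gives you the author of a tweet, and the media keys listed in tweet.attachments lead to the matching media objects.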

Be aware that the dictionaries may be empty if your search result does not contain media objects.

Search Twitter (expanded)

If we put everything from above together, we can search for the hashtag #VisitOslo and try to find some pictures:
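
A sketch of how such an expanded search might be put together, assuming client is a tweepy.Client with a bearer token. The function names search_with_details and describe are chosen for this illustration, and the output format only approximates the results shown in this post:

```python
def search_with_details(client, query, max_results=10):
    """Search recent tweets and ask for author and media details as well."""
    return client.search_recent_tweets(
        query=query,
        max_results=max_results,
        tweet_fields=["author_id", "created_at", "public_metrics", "attachments"],
        user_fields=["username", "name", "profile_image_url"],
        media_fields=["public_metrics", "url", "height", "width", "alt_text"],
        expansions=["author_id", "attachments.media_keys"],
    )


def describe(response):
    """Return printable lines for every tweet, joined with its author and media."""
    includes = response.includes or {}
    users = {user.id: user for user in includes.get("users", [])}
    media = {item.media_key: item for item in includes.get("media", [])}

    lines = []
    for tweet in response.data or []:
        author = users.get(tweet.author_id)
        who = f"{author.name} (@{author.username})" if author else "unknown author"
        lines.append("-" * 50)
        lines.append(f"{tweet.id} ({tweet.created_at}) - {who}:")
        lines.append(tweet.text)

        metrics = tweet.public_metrics or {}
        retweets = metrics.get("retweet_count")
        likes = metrics.get("like_count")
        lines.append(f"retweets: {retweets} | likes: {likes}")

        # attachments is None for tweets without media.
        for key in (tweet.attachments or {}).get("media_keys", []):
            item = media.get(key)
            if item:
                size = f"{item.width}x{item.height}"
                lines.append(f"Media attachment: {item.url} - {size} - Alt: {item.alt_text}")
    return lines
```

Something like print("\n".join(describe(search_with_details(client, "#VisitOslo")))) would then print the tweets together with their authors and media.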

When I ran the search, I got an interesting set of results (cut down to the two important tweets):

#ThePhotoHour (@ThePhotoHour) [https://pbs.twimg.com/profile_images/969225604351578113/LS5Yis-q_normal.jpg]
Ragnhild Aarseth (@Riamolde) [https://pbs.twimg.com/profile_images/1344351622/RAA_normal.jpg]
https://pbs.twimg.com/media/FNGXbMvX0AYi0uq.jpg - 960x720 - Alt: None
--------------------------------------------------
1500159252580777989 (2022-03-05 17:20:12+00:00) - #ThePhotoHour (@ThePhotoHour) [https://pbs.twimg.com/profile_images/969225604351578113/LS5Yis-q_normal.jpg:
RT @Riamolde: Winter street in #Oslo #norway #visitnorway #winterwonderland #winterinnorway #visitoslo #StormHour #ThePhotoHour https://t.c…

retweets: 4 | likes: 0
--------------------------------------------------
1500146647099219969 (2022-03-05 16:30:06+00:00) - Ragnhild Aarseth (@Riamolde) [https://pbs.twimg.com/profile_images/1344351622/RAA_normal.jpg:
Winter street in #Oslo #norway #visitnorway #winterwonderland #winterinnorway #visitoslo #StormHour #ThePhotoHour https://t.co/F27Ao2Yh3j

retweets: 4 | likes: 21
Media attachment: https://pbs.twimg.com/media/FNGXbMvX0AYi0uq.jpg - 960x720 - Alt: None

The includes part of the response contained two authors and one media file. If you take a closer look at the tweets, you see that the first tweet is a retweet of the second, older tweet. However, only the second tweet contains a media attachment. The reason is that I did not ask the search to expand the retweeted tweets, so I did not get that data back.
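
If you want to follow a retweet back to its original, a possible approach is sketched below. It assumes the search was run with "referenced_tweets.id" added to expansions and "referenced_tweets" added to tweet_fields, so that the originals show up under the "tweets" key of includes. The helper find_original is hypothetical, and whether the media objects of the original tweet come along as well is something to verify against the endpoint documentation:

```python
def find_original(tweet, response):
    """Return the retweeted tweet from response.includes, or None."""
    included = {t.id: t for t in (response.includes or {}).get("tweets", [])}
    # referenced_tweets is None for tweets that do not refer to another tweet.
    for reference in tweet.referenced_tweets or []:
        if reference.type == "retweeted":
            return included.get(reference.id)
    return None
```

For a regular tweet the function returns None, so you can fall back to the tweet itself with find_original(tweet, response) or tweet.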

Next

This post shows how we can tailor the search results to exactly match our requirements. Unfortunately, that means we have to learn a whole lot to get what we need. Next week we will try to stream search results in real time as the tweets get posted.

1 thought on “Python Friday #116: Search Twitter from Tweepy”

  1. Great work! Thanks.

    What if we want to get exactly the same result as above (tweets and associated user data) but for more than 100 tweets? I know pagination and flattening can be used, but this doesn’t include user data. How can we include user data, especially since search_recent_tweets doesn’t work with the next token?
