Python Friday #116: Search Twitter from Tweepy

Searching Twitter from Tweepy is a lot less troublesome than working with lists. Let’s explore the search API of Twitter and the V2 client of Tweepy.

This post is part of my journey to learn Python. You can find the other parts of this series here. You can find the code for this post in my PythonFriday repository on GitHub.

3 things you must know about Twitter search

Twitter allows you to search recent tweets only (posted in the last 7 days). To go back further, you need an academic research account, and Twitter does not hand those out easily. Expect a much longer approval process than for your developer account, and a much higher decline rate.

The search API is built to handle a huge number of requests, and it is heavily optimized to reduce the amount of data it needs to transfer. Therefore, by default you get back only a minimal set of data about a tweet.

For most use cases you need additional data. Luckily, we can tell the search API exactly what data we want. This flexibility comes at a price: you need to learn which options you can use. Otherwise, you may not be able to do anything useful with the search results.

The V2 client

For search, the V2 API client works very well. We only need the bearer token to search Twitter:

import tweepy
import os
from dotenv import load_dotenv
load_dotenv()


bearer_token = os.getenv('bearer-token')

client = tweepy.Client(bearer_token)
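
os.getenv returns None when the variable is not set, and a missing token then tends to surface only later as an authorization error on the first request. A minimal sketch of a guard that fails early instead; require_env is a hypothetical helper name, not part of Tweepy or python-dotenv:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        # Covers both an unset variable (None) and an empty string.
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file"
        )
    return value
```

After load_dotenv() you could then write bearer_token = require_env('bearer-token') and pass the result to tweepy.Client as before.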

The rate limit for the search endpoint is between 500’000 and 1’000’000 requests per month. If you reach that limit, you had better stop your application; waiting for the reset may be a colossal waste of energy.

Search Twitter (basic)

The basic search without any additional fields works with this code snippet:

# Search recent tweets (last 7 days)
response = client.search_recent_tweets("#Python Friday 112", max_results=15)
tweets = response.data

for tweet in tweets:
    print('-' * 50)
    print(f"{tweet.id} ({tweet.created_at}) - {tweet.author_id}:\n{tweet.text}\n")

When I ran this code, I got these results back:

--------------------------------------------------
1499834381367746561 (None) - None:
RT @j_graber: #Python Friday #112: How to Use #Tweepy in #Flask https://t.co/fNNOXPB9GI

--------------------------------------------------
1499834217802387456 (None) - None:
RT @j_graber: #Python Friday #112: How to Use #Tweepy in #Flask https://t.co/fNNOXPB9GI

--------------------------------------------------
1499834195820036098 (None) - None:
RT @j_graber: #Python Friday #112: How to Use #Tweepy in #Flask https://t.co/fNNOXPB9GI

--------------------------------------------------
1499822223338655756 (None) - None:
#Python Friday #112: How to Use #Tweepy in #Flask https://t.co/fNNOXPB9GI

The basic result set has the Tweet id and the text. Everything else, from the author to the time the Tweet was posted, is empty.

Tailor the result

In the endpoint documentation you find all the Enum values you can use to tailor the representation of the search results. There is a good explanation on how annotations work and how you can use fields and expand them.

Here are a few combinations I find most useful:

Parameter	Value	Description
tweet_fields	author_id	The numerical id of the tweet author
tweet_fields	created_at	When the Tweet was posted
tweet_fields	public_metrics	Counters for retweets, likes, quotes and replies
tweet_fields	attachments	References for media files (images, videos)
user_fields	username	The screen name (@…)
user_fields	name	The name of the user
user_fields	profile_image_url	The URL of the profile picture
media_fields	public_metrics	Metrics on the attachment
media_fields	url	URL of the media
media_fields	height	Height of the media
media_fields	width	Width of the media
media_fields	alt_text	The alternative text
expansions	author_id	Must be set to get user objects
expansions	attachments.media_keys	Must be set to get media fields
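
The combinations above can be kept in one place and turned into keyword arguments. This is only a sketch: SEARCH_OPTIONS and as_search_kwargs are hypothetical names chosen for illustration, and the comma-separated strings it produces are the same form used in the expanded search later in this post.

```python
# Field selections kept as lists, so single entries are easy to add or remove.
SEARCH_OPTIONS = {
    "expansions": ["author_id", "attachments.media_keys"],
    "tweet_fields": ["created_at", "public_metrics", "attachments"],
    "user_fields": ["username", "name", "profile_image_url"],
    "media_fields": ["public_metrics", "url", "height", "width", "alt_text"],
}


def as_search_kwargs(options):
    """Join each list into one comma-separated string per parameter."""
    return {name: ",".join(values) for name, values in options.items()}


search_kwargs = as_search_kwargs(SEARCH_OPTIONS)
print(search_kwargs["expansions"])  # author_id,attachments.media_keys
```

The result can be passed along with the query, for example client.search_recent_tweets("#VisitOslo", max_results=10, **search_kwargs).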

Structure of the search result

It is important to know that the extra data you request will not be part of the Tweet object. Instead, users and media objects will be in the includes dictionary of the response:

Response(
  data=[
    <Tweet id=... text='...'>,
    <Tweet id=... text='...'>,
    <Tweet id=... text='...'>,
    <Tweet id=... text='...'>,
    <Tweet id=... text='...'>
  ],
  includes={
    'users': [
      <User id=1... name=... username=...>,
      <User id=2... name=... username=...>,
      <User id=3... name=... username=...>,
      <User id=4... name=... username=...>,
    ],
    'media': [
      <Media media_key=3_1500146... type=photo>,
      <Media media_key=3_1500069... type=photo>
    ]
  },
  errors=[],
  meta={
    'newest_id': '...',
    'oldest_id': '...',
    'result_count': 10,
    'next_token': '...'
  }
)
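
The meta part matters once one page of results is not enough: next_token marks where the next page starts, and every page brings its own includes. The following sketch merges tweets and users across pages. collect_pages is a hypothetical helper, and the stand-in objects only mimic the data and includes attributes shown above; with Tweepy, the pages would typically come from tweepy.Paginator, as hinted in the comment.

```python
from types import SimpleNamespace


def collect_pages(pages):
    """Merge tweets and user objects from several response pages."""
    tweets = []
    users = {}
    for page in pages:
        # data is None for a page without results
        tweets.extend(page.data or [])
        # each page has its own includes; keys are absent when unused
        for user in page.includes.get("users", []):
            users[user.id] = user
    return tweets, users


# With Tweepy the pages could be produced like this (not executed here):
# pages = tweepy.Paginator(client.search_recent_tweets, "#VisitOslo",
#                          expansions="author_id", user_fields="username,name",
#                          max_results=100, limit=3)

# Stand-ins for two response pages
page_1 = SimpleNamespace(
    data=[SimpleNamespace(id=1, author_id=10)],
    includes={"users": [SimpleNamespace(id=10, username="first")]},
)
page_2 = SimpleNamespace(
    data=[SimpleNamespace(id=2, author_id=10), SimpleNamespace(id=3, author_id=11)],
    includes={"users": [SimpleNamespace(id=10, username="first"),
                        SimpleNamespace(id=11, username="second")]},
)

tweets, users = collect_pages([page_1, page_2])
print(len(tweets), sorted(users))  # 3 [10, 11]
```

Iterating over whole pages instead of a flattened stream of tweets is what keeps the user objects available for each tweet.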

Be aware that the includes dictionary may be missing entries. For example, there is no 'media' entry if no tweet in your search result has a media attachment.
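
A defensive way to read the includes dictionary is to fall back to an empty list for every entry. A minimal sketch, where index_includes is a hypothetical helper and the stand-in object only mimics the id and username attributes of a user:

```python
from types import SimpleNamespace


def index_includes(includes):
    """Build lookup tables from the includes dictionary of a response."""
    users = {user.id: user for user in includes.get("users", [])}
    media = {item.media_key: item for item in includes.get("media", [])}
    return users, media


# Stand-in for response.includes of a result without any media attachment
includes = {"users": [SimpleNamespace(id=1, username="j_graber")]}
users, media = index_includes(includes)
print(users[1].username)  # j_graber
print(media)              # {}
```

With the lookup tables in place, tweet.author_id and the media keys of a tweet can be resolved without worrying about a missing entry.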

Search Twitter (expanded)

If we put everything from above together, we can search for the hashtag #VisitOslo and try finding some pictures:

response = client.search_recent_tweets(
    "#VisitOslo",
    max_results=10,
    expansions="author_id,attachments.media_keys",
    tweet_fields="created_at,public_metrics,attachments",
    user_fields="username,name,profile_image_url",
    media_fields="public_metrics,url,height,width,alt_text")

# process users: map the user id to a printable description
users = {}
for user in response.includes.get('users', []):
    users[user.id] = f"{user.name} (@{user.username}) [{user.profile_image_url}]"
    print(users[user.id])

# process media attachments: map the media key to a printable description
# (there is no 'media' entry when no tweet has an attachment)
media = {}
for item in response.includes.get('media', []):
    media[item.media_key] = f"{item.url} - {item.height}x{item.width} - Alt: {item.alt_text}"
    print(media[item.media_key])

# response.data is None when the search finds nothing
tweets = response.data or []

# The expanded tweet offers a lot more data
for tweet in tweets:
    print('-' * 50)
    print(f"{tweet.id} ({tweet.created_at}) - {users[tweet.author_id]}:\n {tweet.text} \n")
    metric = tweet.public_metrics
    print(f"retweets: {metric['retweet_count']} | likes: {metric['like_count']}")
    if tweet.attachments is not None:
        # look up each attachment by its media key
        for media_key in tweet.attachments['media_keys']:
            print(f"Media attachment: {media[media_key]}")

When I ran the search, I got an interesting set of results (cut to the two important tweets):

#ThePhotoHour (@ThePhotoHour) [https://pbs.twimg.com/profile_images/969225604351578113/LS5Yis-q_normal.jpg]
Ragnhild Aarseth (@Riamolde) [https://pbs.twimg.com/profile_images/1344351622/RAA_normal.jpg]
https://pbs.twimg.com/media/FNGXbMvX0AYi0uq.jpg - 960x720 - Alt: None
--------------------------------------------------
1500159252580777989 (2022-03-05 17:20:12+00:00) - #ThePhotoHour (@ThePhotoHour) [https://pbs.twimg.com/profile_images/969225604351578113/LS5Yis-q_normal.jpg]:
RT @Riamolde: Winter street in #Oslo #norway #visitnorway #winterwonderland #winterinnorway #visitoslo #StormHour #ThePhotoHour https://t.c…

retweets: 4 | likes: 0
--------------------------------------------------
1500146647099219969 (2022-03-05 16:30:06+00:00) - Ragnhild Aarseth (@Riamolde) [https://pbs.twimg.com/profile_images/1344351622/RAA_normal.jpg]:
Winter street in #Oslo #norway #visitnorway #winterwonderland #winterinnorway #visitoslo #StormHour #ThePhotoHour https://t.co/F27Ao2Yh3j

retweets: 4 | likes: 21
Media attachment: https://pbs.twimg.com/media/FNGXbMvX0AYi0uq.jpg - 960x720 - Alt: None

The includes part of the response contains two authors and one media file. If you take a closer look at the tweets, you see that the first tweet is a retweet of the second, older tweet. However, only the second tweet contains a media attachment. This is because I did not ask the search to expand the retweeted tweets, so I did not get that data back.
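
If the originals of retweets are needed, one option is to add referenced_tweets.id to the expansions (and referenced_tweets to the tweet fields), so that the original tweets show up under 'tweets' in the includes dictionary. The sketch below assumes such a response and uses stand-in objects in place of real tweets. resolve_retweet is a hypothetical helper, and whether the media objects of the original are delivered as well is something to check against the endpoint documentation.

```python
from types import SimpleNamespace


def resolve_retweet(tweet, included_tweets):
    """Return the original tweet for a retweet, or the tweet itself otherwise."""
    originals = {t.id: t for t in included_tweets}
    # referenced_tweets is None when the tweet references nothing
    for ref in tweet.referenced_tweets or []:
        if ref.type == "retweeted":
            # fall back to the retweet when the original was not included
            return originals.get(ref.id, tweet)
    return tweet


# Stand-ins for a retweet and its original
original = SimpleNamespace(id=2, text="Winter street in #Oslo", referenced_tweets=None)
retweet = SimpleNamespace(
    id=1,
    text="RT @Riamolde: Winter street in #Oslo",
    referenced_tweets=[SimpleNamespace(type="retweeted", id=2)],
)

print(resolve_retweet(retweet, [original]).id)   # 2
print(resolve_retweet(original, [original]).id)  # 2
```

In a real run, the second argument would be response.includes.get('tweets', []).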

This post shows how we can tailor the search results to exactly match our requirements. Unfortunately, that means we have to learn a whole lot to get what we need. Next week we will try to stream search results in real time as those tweets get posted.

1 thought on “Python Friday #116: Search Twitter from Tweepy”

Mona

2022-09-23 at 21:04

Great work! Thanks.

What if we want to get exactly the same result as above (tweets and associated user data) but for more than 100 tweets? I know pagination and flattening can be used, but this doesn’t include user data. How can we include user data, especially given that search_recent_tweets doesn’t work with next token?
