Python Friday #117: Streaming Search Results With Tweepy

Twitter users write a massive number of tweets every second. With the streaming clients we can get a glimpse of what is going on in near real-time. Let’s find out what Tweepy offers and how the V1 and V2 endpoints differ.

This post is part of my journey to learn Python. You can find the other parts of this series here. You can find the code for this post in my PythonFriday repository on GitHub.

Streaming with the V1 endpoint

The V1 endpoint for streaming is straightforward. You need to create a subclass of tweepy.Stream, override the method on_status(), initialise it with your keys and filter the stream by the keywords you are interested in:

When you run the code, you get a near real-time output of the tweets containing the hashtag #Python:

————————————————–
1500537305265754115 glowee (@gloryokafor6) at 2022-03-06 18:22:26+00:00:
RT @KanezaDiane: #Infographic
Facts you need to know about AI
#ArtificialIntelligence #100DaysOfCode #AI #DataScience @Fisher85M #Python #B…
————————————————–
1500537361561792523 TensorFlow Bot (@bot_tensorflow) at 2022-03-06 18:22:40+00:00:
RT @byLilyV: #FEATURED #COURSES

Machine Learning, Data Science and Deep Learning with Python

Complete hands-on #machine #learning tutoria…
————————————————–

Streaming with the V2 endpoint

The V2 endpoint has some significant differences that you need to know. Otherwise, you will spend hours debugging and end up with nothing to show for it.

We need to subclass tweepy.StreamingClient and instantiate it with our bearer token. That is a welcome simplification compared to V1. Here we must override the method on_tweet() to print the tweets we receive.

We can filter the stream with a rule (of type StreamRule) in which we set our search terms in the value argument, and register that rule with add_rules().

With everything in place, we can call filter() to start the filtered stream:

When we run the code, the results pop up immediately:

————————————————–
1500538764749381636 None (None): RT @ClubSignage: Open Data and Why it is Necessary https://t.co/… #100Daysofcode #programming #CodeNewbie #python #reactjs #bugb…
————————————————–
1500538764174802950 None (None): RT @byLilyV: #FEATURED #COURSES

Machine Learning, Data Science and Deep Learning with Python

Complete hands-on #machine #learning tutoria…
————————————————–
1500538768427782149 None (None): RT @ClubSignage: Let’s Find Out How to Make Engaging Videos? https://t.co/xWinlyQZpG #100Daysofcode #programming #CodeNewbie #python #rea…
————————————————–

If you take a close look at the output, you will notice that the created_at and author_id fields are empty. Remember, this is a V2 endpoint: as with search, we need to tell Twitter which fields we want in addition to the defaults.

The additional fields are requested with the tweet_fields parameter of the filter() method, which works the same way as in the V2 search endpoint:

With this change we now get additional data with the tweets:

————————————————–
1500540125603930117 2022-03-06 18:33:39+00:00 (1250164632968298497): RT @byLilyV: #FEATURED #COURSES
Complete #Python #Developer in 2021: Zero to Mastery
How to become a #Python3 Developer and get hired
Build…
————————————————–
1500540126438604800 2022-03-06 18:33:39+00:00 (1295715136141963267): RT @thisisgulshan: Introduction to #Python Ensembles

#BigData #Analytics #DataScience #AI #MachineLearning #IoT #IIoT #PyTorch #RStats #Te…
————————————————–
1500540149935058952 2022-03-06 18:33:45+00:00 (1260015280048222208): RT @gp_pulipaka: Best Python Books to Read for the Weekend. #BigData #Analytics #DataScience #IoT #IIoT #PyTorch #Python #RStats #TensorFlo…

Rules are stateful in V2

As a next step, you will probably try to search for a different term. You will notice that you still get results for #Python, even though your code no longer contains such a rule. The reason is that Twitter stores the rules for your application between connections, and all active rules are combined with OR.

Therefore, before you can search for a different term, you need to remove the current rules. For reasons unknown to me, I had to recreate the client after cleaning up the old rules:

We can now run the code and get results for #NumPy:

1500541598735732738 2022-03-06 18:39:30+00:00 (1492915170166943745): RT @j_graber: #Python Friday #109: #Set Operations on Lists With #NumPy https://t.co/rkcVzQUMQt
————————————————–
1500541807234592769 2022-03-06 18:40:20+00:00 (1191676114369941505): I got 24 out of 25 points on the NumPy Quiz from W3Schools https://t.co/b7PcYu8QkR

#Pyton #NumPy #programming #SoftwareEngineer #pythondeveloper
————————————————–

Other problems with streaming search results in V2

I could not get the user's name from the stream before the on_tweet() method was called. You can override the method on_includes() and, in combination with expansions="author_id", get the users back, but too late: at this point the tweet has already been printed. While it is possible to add a queue and some more logic, it all feels like a hack just to match the V1 functionality.

But the biggest obstacle at the moment is the lack of documentation. I hope this will change soon. Until then, I suggest you use the V1 streaming client.

Next

Next week we take a look at how we can improve the experience on Twitter by blocking and muting troll accounts.
