The number of tweets written on Twitter every second is massive. With the streaming clients we can get a glimpse of what is going on in near real-time. Let’s find out what Tweepy offers and how the V1 and V2 endpoints differ.
This post is part of my journey to learn Python. You can find the other parts of this series here. You can find the code for this post in my PythonFriday repository on GitHub.
Streaming with the V1 endpoint
The V1 endpoint for streaming is straightforward. You need to create a subclass of tweepy.Stream, override the method on_status(), initialise it with your keys and filter the stream by the keywords you are interested in:
```python
import tweepy
import os
from dotenv import load_dotenv

load_dotenv()

consumer_key = os.getenv('api-key')
consumer_secret = os.getenv('api-key-secret')
access_token = os.getenv('access-token')
access_token_secret = os.getenv('access-token-secret')


# Subclass tweepy.Stream to print the tweets
class TweetPrinter(tweepy.Stream):

    def on_status(self, status):
        print('-' * 50)
        print("{id} {name} (@{screen}) at {time}:".format(
            id=status.id,
            name=status.user.name,
            screen=status.user.screen_name,
            time=status.created_at))
        print(status.text)


# Initialize instance of the subclass
printer = TweetPrinter(
    consumer_key, consumer_secret,
    access_token, access_token_secret
)

# Filter real-time Tweets by keyword
printer.filter(track=["#Python"])
```
When you run the code, you get a near real-time output of the tweets containing the hashtag #Python:
```
--------------------------------------------------
1500537305265754115 glowee (@gloryokafor6) at 2022-03-06 18:22:26+00:00:
RT @KanezaDiane: #Infographic
Facts you need to know about AI
#ArtificialIntelligence #100DaysOfCode #AI #DataScience @Fisher85M #Python #B…
--------------------------------------------------
1500537361561792523 TensorFlow Bot (@bot_tensorflow) at 2022-03-06 18:22:40+00:00:
RT @byLilyV: #FEATURED #COURSESMachine Learning, Data Science and Deep Learning with Python
Complete hands-on #machine #learning tutoria…
--------------------------------------------------
```
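The on_status() callback above only reads a handful of attributes from the Status object. If you want to check the formatting without credentials or a network connection, one option is to pull the print logic into a plain function and feed it a stand-in object. This is a minimal sketch, not something Tweepy requires: format_status() is a hypothetical helper, and the types.SimpleNamespace object only mimics the attributes used in TweetPrinter, while a real Status object has many more.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def format_status(status):
    """Build the same text that TweetPrinter.on_status() prints (hypothetical helper)."""
    header = "{id} {name} (@{screen}) at {time}:".format(
        id=status.id,
        name=status.user.name,
        screen=status.user.screen_name,
        time=status.created_at)
    return "\n".join(["-" * 50, header, status.text])


# Hypothetical stand-in that mimics only the attributes used above
fake_status = SimpleNamespace(
    id=1,
    user=SimpleNamespace(name="Jane Doe", screen_name="jane"),
    created_at=datetime(2022, 3, 6, 18, 22, 26, tzinfo=timezone.utc),
    text="Learning #Python streams",
)

print(format_status(fake_status))
```

Inside TweetPrinter, on_status() could then shrink to print(format_status(status)), and the formatting can be checked without ever opening a stream.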
Streaming with the V2 endpoint
The V2 endpoint has some significant differences that you need to know about. Otherwise, you may spend hours debugging and end up with nothing to show for it.
We need to subclass tweepy.StreamingClient and instantiate it with our bearer token. That is a welcome simplification compared to V1. Here we must override the method on_tweet() to print the tweets we receive.
We can filter the stream with a rule (of type StreamRule) in which we set our search terms in the value argument.
With everything in place, we can call filter() to start streaming the matching tweets:
```python
import tweepy
from tweepy import StreamingClient, StreamRule
import os
from dotenv import load_dotenv

load_dotenv()

bearer_token = os.getenv('bearer-token')


class TweetPrinterV2(tweepy.StreamingClient):

    def on_tweet(self, tweet):
        print(f"{tweet.id} {tweet.created_at} ({tweet.author_id}): {tweet.text}")
        print("-"*50)


printer = TweetPrinterV2(bearer_token)

# add new rules
rule = StreamRule(value="Python")
printer.add_rules(rule)

printer.filter()
```
When we run the code, the results will pop out immediately:
```
--------------------------------------------------
1500538764749381636 None (None): RT @ClubSignage: Open Data and Why it is Necessary https://t.co/… #100Daysofcode #programming #CodeNewbie #python #reactjs #bugb…
--------------------------------------------------
1500538764174802950 None (None): RT @byLilyV: #FEATURED #COURSESMachine Learning, Data Science and Deep Learning with Python
Complete hands-on #machine #learning tutoria…
--------------------------------------------------
1500538768427782149 None (None): RT @ClubSignage: Let’s Find Out How to Make Engaging Videos? https://t.co/xWinlyQZpG #100Daysofcode #programming #CodeNewbie #python #rea…
--------------------------------------------------
```
If you take a close look at the output, you notice that the created_at and author_id fields are None. Remember, this is a V2 endpoint and, as with search, we need to explicitly request the fields we want to get back.
The field expansion happens through parameters of the filter() method and works the same way you would expand tweets in the V2 search endpoint:
```python
printer.filter(expansions="author_id", tweet_fields="created_at")
```
With this change we now get additional data with the tweets:
```
--------------------------------------------------
1500540125603930117 2022-03-06 18:33:39+00:00 (1250164632968298497): RT @byLilyV: #FEATURED #COURSES
Complete #Python #Developer in 2021: Zero to Mastery
How to become a #Python3 Developer and get hired
Build…
--------------------------------------------------
1500540126438604800 2022-03-06 18:33:39+00:00 (1295715136141963267): RT @thisisgulshan: Introduction to #Python Ensembles#BigData #Analytics #DataScience #AI #MachineLearning #IoT #IIoT #PyTorch #RStats #Te…
--------------------------------------------------
1500540149935058952 2022-03-06 18:33:45+00:00 (1260015280048222208): RT @gp_pulipaka: Best Python Books to Read for the Weekend. #BigData #Analytics #DataScience #IoT #IIoT #PyTorch #Python #RStats #TensorFlo…
```
Rules are stateful in V2
As a next step you will probably try to search for a different term. You will notice that you still get results back for Python, even though your code no longer adds such a rule. The reason for this is that rules have a state that Twitter maintains for your application, and all active rules are combined with OR.
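To make the stateful OR behaviour easier to picture, here is a toy, in-memory model. It is not Tweepy’s API and it does not talk to Twitter: ToyRuleStore is a hypothetical class, and a plain case-insensitive substring test stands in for Twitter’s real rule matching. It only shows why a rule added in an earlier run keeps delivering tweets after you add a new one.

```python
class ToyRuleStore:
    """Toy model of server-side stream rules (hypothetical, not Tweepy's API).

    Rules survive between script runs because they live on the server, and a
    tweet is delivered if it matches ANY active rule (OR semantics). A
    case-insensitive substring test stands in for Twitter's real matching.
    """

    def __init__(self):
        self.rules = {}      # rule id -> rule value
        self._next_id = 1

    def add_rule(self, value):
        self.rules[self._next_id] = value
        self._next_id += 1

    def delete_rules(self, ids):
        for rule_id in ids:
            self.rules.pop(rule_id, None)

    def matches(self, text):
        """Return the values of all rules that match the text."""
        text = text.lower()
        return [value for value in self.rules.values() if value.lower() in text]


store = ToyRuleStore()
store.add_rule("Python")   # first run of the script
store.add_rule("NumPy")    # second run: the old rule is still there

print(store.matches("Set operations with #NumPy in #Python"))  # ['Python', 'NumPy']
print(store.matches("Only #Python"))  # ['Python'], delivered because of the old rule

store.delete_rules(list(store.rules))  # clean up all rules first
store.add_rule("NumPy")
print(store.matches("Only #Python"))  # [], no longer delivered
```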
Therefore, before you can search for a different term, you need to remove the current rules. For reasons unknown to me, I had to recreate the client after cleaning up the old rules:
```python
# TweetPrinterV2, StreamRule and bearer_token come from the previous example
printer = TweetPrinterV2(bearer_token)

# clean-up pre-existing rules
rule_ids = []
result = printer.get_rules()
# result.data is None when no rules exist, so fall back to an empty list
for rule in result.data or []:
    print(f"rule marked to delete: {rule.id} - {rule.value}")
    rule_ids.append(rule.id)

if len(rule_ids) > 0:
    printer.delete_rules(rule_ids)
    printer = TweetPrinterV2(bearer_token)
else:
    print("no rules to delete")

# add new rules
# rule = StreamRule(value="Python")
rule = StreamRule(value="NumPy")
printer.add_rules(rule)

printer.filter(expansions="author_id", tweet_fields="created_at")
```
Now we can run the code and get results for NumPy back:
```
1500541598735732738 2022-03-06 18:39:30+00:00 (1492915170166943745): RT @j_graber: #Python Friday #109: #Set Operations on Lists With #NumPy https://t.co/rkcVzQUMQt
--------------------------------------------------
1500541807234592769 2022-03-06 18:40:20+00:00 (1191676114369941505): I got 24 out of 25 points on the NumPy Quiz from W3Schools https://t.co/b7PcYu8QkR#Pyton #NumPy #programming #SoftwareEngineer #pythondeveloper
--------------------------------------------------
```
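Deleting every rule works, but it throws away rules you might still want. A gentler approach is to compare the rules on the server with the rules you want and only change the difference. The sketch below shows just that comparison; plan_rule_changes() and the Rule tuple are hypothetical names of mine, with Rule standing in for the id and value that a stream rule carries.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Rule(NamedTuple):
    """Hypothetical stand-in for the id/value pair of a stream rule."""
    id: str
    value: str


def plan_rule_changes(existing: Iterable[Rule],
                      wanted: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (ids_to_delete, values_to_add) so that exactly `wanted` stays active."""
    existing = list(existing)
    wanted = list(dict.fromkeys(wanted))  # drop duplicates, keep order
    wanted_set = set(wanted)
    existing_values = {rule.value for rule in existing}

    # Rules on the server that we no longer want
    ids_to_delete = [rule.id for rule in existing if rule.value not in wanted_set]
    # Wanted rules that are not on the server yet
    values_to_add = [value for value in wanted if value not in existing_values]
    return ids_to_delete, values_to_add


current = [Rule("111", "Python"), Rule("222", "NumPy")]
print(plan_rule_changes(current, ["NumPy", "pandas"]))
# (['111'], ['pandas'])
```

With Tweepy you could pass result.data or [] as the existing rules, since StreamRule objects also have id and value, and then call printer.delete_rules(ids_to_delete) and printer.add_rules([StreamRule(value=v) for v in values_to_add]) only when the respective list is not empty. I have not checked whether this avoids the need to recreate the client.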
Other problems with streaming search results in V2
I could not get the name of the user from the stream before the on_tweet() method was called. You can override the method on_includes() and, in combination with expansions="author_id", you get the users back, but too late: at this point the tweet is already printed out. While it is possible to add a queue and some more logic, it all feels like a hack just to match the V1 functionality.
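One way around the ordering problem is to work with the raw message, which carries the tweet and its expansions together. This is a minimal sketch under an assumption: that each stream message with expansions="author_id" looks like {"data": {...}, "includes": {"users": [...]}}, as in the V2 API responses. describe_tweet() is a hypothetical helper and the sample payload is made up.

```python
import json


def describe_tweet(raw_data):
    """Join the tweet in a V2 stream message with its expanded author.

    Assumes a payload shaped like
    {"data": {...tweet...}, "includes": {"users": [{...}, ...]}}.
    Missing fields are shown as "?".
    """
    payload = json.loads(raw_data)  # accepts str or bytes
    tweet = payload["data"]
    users = {user["id"]: user
             for user in payload.get("includes", {}).get("users", [])}
    author = users.get(tweet.get("author_id"), {})
    return "{id} {name} (@{username}) at {time}: {text}".format(
        id=tweet["id"],
        name=author.get("name", "?"),
        username=author.get("username", "?"),
        time=tweet.get("created_at", "?"),
        text=tweet["text"],
    )


# Made-up sample message for illustration only
sample = json.dumps({
    "data": {"id": "1", "text": "Hello #Python", "author_id": "42",
             "created_at": "2022-03-06T18:33:39.000Z"},
    "includes": {"users": [{"id": "42", "name": "Jane Doe", "username": "jane"}]},
})
print(describe_tweet(sample))
# 1 Jane Doe (@jane) at 2022-03-06T18:33:39.000Z: Hello #Python
```

In a StreamingClient subclass you could call this from an overridden on_data(self, raw_data). As far as I can tell, on_data() is where Tweepy dispatches to on_tweet() and on_includes(), so overriding it replaces that dispatch unless you also call super().on_data(raw_data). Depending on your Tweepy version, an on_response() callback that receives the tweet and its includes together may be worth a look as well.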
But the biggest obstacle at the moment is the lack of documentation. I hope that this will change soon. Until then I suggest you use the V1 streaming client.
Next
Next week we take a look at how we can improve the experience on Twitter by blocking and muting troll accounts.