Python Friday #117: Streaming Search Results With Tweepy

The number of tweets written on Twitter every second is massive. With the streaming clients we can get a glimpse of what is going on in near real-time. Let’s find out what Tweepy offers and how the V1 and V2 endpoints differ.

This post is part of my journey to learn Python. You can find the other parts of this series here. You can find the code for this post in my PythonFriday repository on GitHub.

Streaming with the V1 endpoint

The V1 endpoint for streaming is straightforward. You need to create a subclass of tweepy.Stream, override the method on_status(), initialise it with your keys and filter the stream by the keywords you are interested in:

import tweepy
import os
from dotenv import load_dotenv
load_dotenv()

consumer_key = os.getenv('api-key')
consumer_secret = os.getenv('api-key-secret')
access_token = os.getenv('access-token')
access_token_secret = os.getenv('access-token-secret')

# Subclass tweepy.Stream to print the tweets
class TweetPrinter(tweepy.Stream):

    def on_status(self, status):
        print('-' * 50)
        print("{id} {name} (@{screen}) at {time}:"
            .format(
                id=status.id,
                name=status.user.name,
                screen=status.user.screen_name,
                time=status.created_at))
        print(status.text)

# Initialize instance of the subclass
printer = TweetPrinter(
  consumer_key, consumer_secret,
  access_token, access_token_secret
)

# Filter real-time Tweets by keyword
printer.filter(track=["#Python"])

When you run the code, you get a near real-time output of the tweets containing the hashtag #Python:

--------------------------------------------------
1500537305265754115 glowee (@gloryokafor6) at 2022-03-06 18:22:26+00:00:
RT @KanezaDiane: #Infographic
Facts you need to know about AI
#ArtificialIntelligence #100DaysOfCode #AI #DataScience @Fisher85M #Python #B…
--------------------------------------------------
1500537361561792523 TensorFlow Bot (@bot_tensorflow) at 2022-03-06 18:22:40+00:00:
RT @byLilyV: #FEATURED #COURSES

Machine Learning, Data Science and Deep Learning with Python

Complete hands-on #machine #learning tutoria…
--------------------------------------------------
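
One thing the V1 script takes on trust is that all four keys were actually loaded. os.getenv() returns None for a variable that is not set, and the problem then only shows up later as an authentication error. The following is a minimal sketch of a check that fails early; the helper name require_env is my own (hypothetical) and not part of Tweepy or python-dotenv:

```python
import os


def require_env(*names):
    """Return the values of the given environment variables, in order.

    Raises RuntimeError naming every variable that is missing or empty,
    instead of letting None reach the Twitter client.
    """
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return [os.environ[name] for name in names]


# Usage with the variable names from the V1 example (after load_dotenv()):
# consumer_key, consumer_secret, access_token, access_token_secret = require_env(
#     'api-key', 'api-key-secret', 'access-token', 'access-token-secret')
```

The usage lines are commented out so the sketch runs without a .env file; with the file in place they can replace the four os.getenv() calls.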

Streaming with the V2 endpoint

The V2 endpoint has some significant differences that you need to know. Otherwise, you will spend hours debugging and end up with nothing to show for it.

We need to subclass tweepy.StreamingClient and instantiate it with our bearer token. That is a welcome simplification compared to V1. Here we must override the method on_tweet() to print the tweets we receive.

We can filter the stream with a rule (of type StreamRule) in which we set our search terms in the value argument.
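
The value of a rule is a query string, so it can hold more than a single keyword. As a rough sketch, the helper below combines several terms with OR and appends the lang: and -is:retweet operators of Twitter's filtered-stream rule syntax. The function name build_rule_value and its parameters are hypothetical and only serve as an illustration:

```python
def build_rule_value(terms, lang=None, exclude_retweets=False):
    """Build a filtered-stream rule string from a list of search terms.

    Terms are joined with OR inside parentheses; optional operators are
    appended with a space, which the rule syntax treats as AND.
    """
    if not terms:
        raise ValueError("at least one search term is required")
    parts = ["(" + " OR ".join(terms) + ")"]
    if lang:
        parts.append(f"lang:{lang}")
    if exclude_retweets:
        parts.append("-is:retweet")
    return " ".join(parts)


print(build_rule_value(["Python", "NumPy"], lang="en", exclude_retweets=True))
# prints: (Python OR NumPy) lang:en -is:retweet
```

The resulting string would go into StreamRule(value=...). The sketch does not quote terms that contain spaces, so multi-word phrases need to be passed in already wrapped in double quotes.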

With everything in place, we can call filter() to start streaming:

import tweepy
from tweepy import StreamingClient, StreamRule
import os
from dotenv import load_dotenv
load_dotenv()

bearer_token = os.getenv('bearer-token')

class TweetPrinterV2(tweepy.StreamingClient):
    
    def on_tweet(self, tweet):
        print(f"{tweet.id} {tweet.created_at} ({tweet.author_id}): {tweet.text}")
        print("-"*50)

printer = TweetPrinterV2(bearer_token)

# add new rules    
rule = StreamRule(value="Python")
printer.add_rules(rule)

printer.filter()

When we run the code, the results will pop out immediately:

--------------------------------------------------
1500538764749381636 None (None): RT @ClubSignage: Open Data and Why it is Necessary https://t.co/… #100Daysofcode #programming #CodeNewbie #python #reactjs #bugb…
--------------------------------------------------
1500538764174802950 None (None): RT @byLilyV: #FEATURED #COURSES

Machine Learning, Data Science and Deep Learning with Python

Complete hands-on #machine #learning tutoria…
--------------------------------------------------
1500538768427782149 None (None): RT @ClubSignage: Let’s Find Out How to Make Engaging Videos? https://t.co/xWinlyQZpG #100Daysofcode #programming #CodeNewbie #python #rea…
--------------------------------------------------

If you take a close look at the output, you will notice that the created_at and author_id fields are empty. Remember, this is a V2 endpoint and, as with search, we need to specify which additional fields we want.

The additional fields are requested through parameters of the filter() method, which work the same way as for the V2 search endpoint:

printer.filter(expansions="author_id",tweet_fields="created_at")

With this change we now get additional data with the tweets:

--------------------------------------------------
1500540125603930117 2022-03-06 18:33:39+00:00 (1250164632968298497): RT @byLilyV: #FEATURED #COURSES
Complete #Python #Developer in 2021: Zero to Mastery
How to become a #Python3 Developer and get hired
Build…
--------------------------------------------------
1500540126438604800 2022-03-06 18:33:39+00:00 (1295715136141963267): RT @thisisgulshan: Introduction to #Python Ensembles

#BigData #Analytics #DataScience #AI #MachineLearning #IoT #IIoT #PyTorch #RStats #Te…
--------------------------------------------------
1500540149935058952 2022-03-06 18:33:45+00:00 (1260015280048222208): RT @gp_pulipaka: Best Python Books to Read for the Weekend. #BigData #Analytics #DataScience #IoT #IIoT #PyTorch #Python #RStats #TensorFlo…
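
The output now shows the numeric author_id, but not the name of the account. As far as I understand it, the expansion also makes Twitter send the matching user objects in a separate includes part of the response, which Tweepy does not pass to on_tweet(). My assumption is that you can reach it by overriding on_includes() or on_response() in the StreamingClient subclass. The sketch below only shows the lookup itself and uses stand-in objects instead of real Tweepy models; the helper name usernames_by_id is hypothetical:

```python
from types import SimpleNamespace


def usernames_by_id(includes):
    """Map user id -> username from the 'includes' part of a V2 response.

    `includes` is expected to be a dict such as {"users": [user, ...]},
    where each user has `id` and `username` attributes. A missing or
    empty "users" entry yields an empty dict.
    """
    users = (includes or {}).get("users", [])
    return {user.id: user.username for user in users}


# Stand-in data for illustration only (not real accounts):
includes = {"users": [SimpleNamespace(id=1, username="alice"),
                      SimpleNamespace(id=2, username="bob")]}
tweet = SimpleNamespace(id=100, author_id=2, text="Hello #Python")

names = usernames_by_id(includes)
# Fall back to the numeric id when the author is not in the lookup
print(f"{tweet.id} (@{names.get(tweet.author_id, tweet.author_id)}): {tweet.text}")
# prints: 100 (@bob): Hello #Python
```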

Rules are stateful in V2

As a next step, you will probably try to search for a different term. You will notice that you still get results for #Python, even though your application no longer contains such a rule. The reason is that Twitter stores the rules for your application on its side, and all stored rules are combined with OR.

Therefore, before you can search for a different term, you need to remove the current rules. For reasons unknown to me, I had to recreate the client after cleaning up the old rules:

printer = TweetPrinterV2(bearer_token)

# clean-up pre-existing rules
rule_ids = []
result = printer.get_rules()
# result.data is None when no rules are stored, so fall back to an empty list
for rule in result.data or []:
    print(f"rule marked to delete: {rule.id} - {rule.value}")
    rule_ids.append(rule.id)

if rule_ids:
    printer.delete_rules(rule_ids)
    # recreate the client after deleting the old rules
    printer = TweetPrinterV2(bearer_token)
else:
    print("no rules to delete")

# add new rules
# rule = StreamRule(value="Python")
rule = StreamRule(value="NumPy")
printer.add_rules(rule)

printer.filter(expansions="author_id", tweet_fields="created_at")

We can now run the code and get results for #NumPy back:

1500541598735732738 2022-03-06 18:39:30+00:00 (1492915170166943745): RT @j_graber: #Python Friday #109: #Set Operations on Lists With #NumPy https://t.co/rkcVzQUMQt
--------------------------------------------------
1500541807234592769 2022-03-06 18:40:20+00:00 (1191676114369941505): I got 24 out of 25 points on the NumPy Quiz from W3Schools https://t.co/b7PcYu8QkR

#Pyton #NumPy #programming #SoftwareEngineer #pythondeveloper
--------------------------------------------------
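
Since the rule clean-up is needed every time the search term changes, it may be worth moving it into a small function. The following is a minimal sketch under a few assumptions: the name replace_rules is hypothetical, the client is expected to offer get_rules(), delete_rules() and add_rules() like the TweetPrinterV2 instance in this post, and the data attribute of the get_rules() result is treated as None when no rules exist:

```python
def replace_rules(client, new_rules):
    """Delete every rule stored for the app, then add `new_rules`.

    `client` is any object with get_rules(), delete_rules(ids) and
    add_rules(rules), e.g. the TweetPrinterV2 instance from this post.
    Returns the list of rule ids that were deleted.
    """
    existing = client.get_rules().data or []  # data is None when no rules exist
    old_ids = [rule.id for rule in existing]
    if old_ids:
        client.delete_rules(old_ids)
    client.add_rules(new_rules)
    return old_ids


# Possible usage (needs Tweepy and a valid bearer token, so it is commented out):
# replace_rules(TweetPrinterV2(bearer_token), StreamRule(value="NumPy"))
# printer = TweetPrinterV2(bearer_token)  # fresh client for streaming
# printer.filter(expansions="author_id", tweet_fields="created_at")
```

The function does not recreate the client itself. The commented usage therefore runs the clean-up on a throwaway client and streams with a fresh one, which mirrors the workaround described above.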
