Python Friday #146: Download Jetpack Statistics With Selenium

The Jetpack extension of WordPress.com relies heavily on JavaScript. With Selenium we do not have to worry about that; we simply let our web browser deal with it. This allows us to grab the statistics Jetpack collects for this site.

This post is part of my journey to learn Python. You can find the other parts of this series here. You can find the code for this post in my PythonFriday repository on GitHub.

 

Preparation

Make sure that you have Selenium installed on your machine and that you use the download manager to get the right drivers.

If you want to run the examples, you must have your own WordPress blog and the access credentials for your user.

 

Keeping your password secret

I use a .env file to keep my password out of my code. I strongly suggest that you do the same. Make sure that you use the username and password of your WordPress.com user. That is especially important for self-hosted blogs. It is much simpler to access Jetpack if you have the right credentials.

In the .env file you need these three entries:

login_page=XYZ
user=ABC
password=PASSWORD

For login_page you can use the URL of the statistics page you want to access. That way the login is routed to the right authentication provider.
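To illustrate how these entries reach the script, here is a minimal sketch that reads them with the standard library only. In practice a package such as python-dotenv (with its `load_dotenv` function) handles quoting and other edge cases better. The function names `parse_env_text` and `read_env_file` are hypothetical and not taken from the original code.

```python
from pathlib import Path


def parse_env_text(text):
    """Turn simple KEY=VALUE lines into a dict (no quoting or escaping)."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without a key/value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split at the first "=" only, so passwords may contain "=".
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def read_env_file(path=".env"):
    """Read a .env file (relative to the working directory by default)."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))
```

Calling `read_env_file()` would then give a dictionary with the keys `login_page`, `user`, and `password`.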

 

Set up Selenium

We need a bit of setup code to get Selenium ready:
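The following is a minimal sketch of such a setup, assuming Chrome as the browser and Selenium 4; the original script may use a different browser or other options. The function names and the `downloads` folder are hypothetical.

```python
from pathlib import Path


def download_preferences(download_dir):
    """Chrome preferences that save downloads to download_dir without asking."""
    return {
        # Chrome expects an absolute path for the download folder.
        "download.default_directory": str(Path(download_dir).absolute()),
        "download.prompt_for_download": False,
    }


def create_browser(download_dir="downloads"):
    """Start Chrome and return the driver used by the rest of the script."""
    # Selenium is imported lazily, so importing this module does not require it.
    from selenium import webdriver

    Path(download_dir).mkdir(parents=True, exist_ok=True)
    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", download_preferences(download_dir))
    return webdriver.Chrome(options=options)
```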

This method returns a browser instance that we will use throughout this script.

 

Log in to WordPress.com

Before we can access our site statistics, we need to log in to WordPress.com. For that we need to open the correct login page and fill out the login form:
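A minimal sketch of such a login function follows. The field IDs `usernameOrEmail` and `password` are assumptions about the WordPress.com login form and may differ on your page; the function name, the fixed pause of two seconds, and submitting each step with the Enter key are illustrative choices as well.

```python
import time

# Assumed IDs of the input fields on the WordPress.com login form.
USERNAME_FIELD_ID = "usernameOrEmail"
PASSWORD_FIELD_ID = "password"


def login(browser, login_page, user, password, pause_seconds=2):
    """Fill out the two-part login form in an already started browser."""
    # Selenium is imported lazily, so importing this module does not require it.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    browser.get(login_page)
    time.sleep(pause_seconds)  # let the login form render

    username_field = browser.find_element(By.ID, USERNAME_FIELD_ID)
    username_field.send_keys(user)
    username_field.send_keys(Keys.RETURN)
    time.sleep(pause_seconds)  # the password field appears only now

    password_field = browser.find_element(By.ID, PASSWORD_FIELD_ID)
    password_field.send_keys(password)
    password_field.send_keys(Keys.RETURN)
    time.sleep(pause_seconds)  # give the redirect after the login time to finish
```

An explicit wait with Selenium's `WebDriverWait` would be more robust than a fixed pause, at the cost of a little more code.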

The wait time is necessary because WordPress.com uses a two-part login form in which the password field only appears after you have entered the username. If you do not wait a bit, Selenium is too fast and will not find the password field.

If this succeeds, our browser has an authenticated session, and we can proceed. If it fails, set a breakpoint and inspect the page to check whether the input fields have different IDs.

 

Iterate through the days

If you want detailed statistics, it is best to use the daily statistics and iterate through the days. Be aware that Jetpack returns today's statistics if we ask for a date in the future. To avoid creating rubbish data, we need to check that we stay within a valid date range:
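A minimal sketch of such a check, written as a generator. The name `days_to_download` and the optional `today` parameter (handy for testing) are hypothetical and not taken from the original code.

```python
from datetime import date, timedelta


def days_to_download(first_day, today=None):
    """Yield every day from first_day up to and including yesterday."""
    today = today or date.today()
    day = first_day
    # The strict comparison stops before today and never reaches the future.
    while day < today:
        yield day
        day += timedelta(days=1)
```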

Today’s statistics can change until 23:59:59; therefore, we exclude the current day.

 

Download the statistics

We can find the URL for our blog statistics by clicking through WordPress.com. At the bottom of the statistics page is a link with the text “Download data as CSV” that turns the displayed table into a CSV file. That is the download link we are looking for.

In our script we then tell our browser to open that statistics page for a specific day, let the page load, scroll to the end of the page, and then click on the download link:
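These steps can be sketched as follows. The URL pattern in `STATS_URL_TEMPLATE` is an assumption, so copy the real one from your browser's address bar; the function names and pause lengths are hypothetical too. Only the link text “Download data as CSV” comes from the page described above.

```python
import time
from datetime import date

# Assumed URL pattern; replace it with the one your browser shows.
STATS_URL_TEMPLATE = "https://wordpress.com/stats/day/posts/{site}?startDate={day}"


def build_stats_url(site: str, day: date) -> str:
    """Return the statistics URL for one day of the given site."""
    return STATS_URL_TEMPLATE.format(site=site, day=day.isoformat())


def download_day(browser, site, day, pause_seconds=5):
    """Open the statistics page for one day and click the CSV download link."""
    # Selenium is imported lazily, so importing this module does not require it.
    from selenium.webdriver.common.by import By

    browser.get(build_stats_url(site, day))
    time.sleep(pause_seconds)  # let the JavaScript render the table

    # Scroll to the end of the page so the link is in view.
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(1)

    browser.find_element(By.LINK_TEXT, "Download data as CSV").click()
    time.sleep(pause_seconds)  # give the download time to start
```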

 

Glue everything together

With all the parts in place, all we need is this glue code to call them in the correct order:
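A minimal sketch of such glue code. Because the function names in the original script are not known, the individual steps are passed in as parameters here; `run`, `FIRST_DAY`, and the parameter names are hypothetical.

```python
from datetime import date, timedelta

FIRST_DAY = date(2022, 10, 1)  # first day to download


def run(settings, create_browser, login, download_day, today=None):
    """Log in once, then download the statistics for every finished day."""
    today = today or date.today()
    browser = create_browser()
    try:
        login(browser, settings["login_page"], settings["user"], settings["password"])
        day = FIRST_DAY
        while day < today:  # stop before today, whose numbers can still change
            download_day(browser, day)
            day += timedelta(days=1)
    finally:
        browser.quit()  # close the browser even if one of the steps fails
```

Here `settings` stands for the values from the .env file, and `create_browser`, `login`, and `download_day` stand for the setup, login, and download steps from the previous sections.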

If we run the script, it will download all daily statistics from 1 October 2022 until yesterday.

 

Next

As demonstrated with this script, we can use Selenium to automate a browser and do repetitive things with ease. It does not matter whether a page requires authentication or is full of JavaScript: Selenium allows us to access it the same way we would with a web browser. Next week we look at the missing parts to turn our browser automation into end-to-end tests in pytest.
