The Jetpack extension of WordPress.com relies heavily on JavaScript. With Selenium we do not have to worry about that, because we let our web browser deal with it. This allows us to grab the statistics Jetpack collects for this site.
This post is part of my journey to learn Python. You can find the other parts of this series here. You can find the code for this post in my PythonFriday repository on GitHub.
Preparation
Make sure that you have Selenium installed on your machine and that you use the WebDriver Manager (the webdriver_manager package) to get the right drivers.
If you want to run the examples, you must have your own WordPress blog and the access credentials for your user.
Keeping your password secret
I use a .env file to keep my password out of my code, and I strongly suggest that you do the same. Make sure that you use the username and password of your WordPress.com user. This is especially important for self-hosted blogs, because accessing Jetpack is much simpler with the right credentials.
In the .env file you need these three entries:
login_page=XYZ
user=ABC
password=PASSWORD
For login_page you can use the URL of the statistics page you want to access. That way the login redirects you to the right authentication provider.
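If one of these entries is missing, the script only fails later with a confusing Selenium error. A minimal sketch of checking the settings up front, assuming the three names from above and that `load_dotenv()` has already copied the .env entries into the environment. The helper `read_settings()` is hypothetical and not part of the original script.

```python
import os
from typing import Mapping

# The three entries the script expects in the .env file
REQUIRED_KEYS = ("login_page", "user", "password")


def read_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: collect the .env entries and fail early.

    Assumes load_dotenv() already copied the .env file into os.environ.
    Missing or empty entries are both treated as missing.
    """
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"missing entries in .env: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED_KEYS}
```

Called right after `load_dotenv()`, it stops the script before a browser window opens if the .env file is incomplete.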
Setup Selenium
We need a bit of setup code to get Selenium ready:
```python
from selenium.webdriver.firefox.service import Service
from selenium import webdriver
from webdriver_manager.firefox import GeckoDriverManager
from selenium.webdriver.common.by import By
from datetime import date, timedelta
import time
import os
from dotenv import load_dotenv
import logging


def prepare_browser():
    # change the log level of webdriver_manager's logger ('WDM')
    logging.getLogger('WDM').setLevel(logging.NOTSET)
    # download (if needed) the matching geckodriver and start Firefox
    driver = webdriver.Firefox(
        service=Service(GeckoDriverManager().install()))
    driver.implicitly_wait(2)  # wait up to 2 seconds when looking up elements
    return driver
```
This function returns a browser instance that we use throughout this script.
Login to WordPress.com
Before we can access our site statistics, we need to log in to WordPress.com. For that, we open the correct login page and fill out the login form:
```python
def login(driver):
    # go to the statistics page to get the correct redirect to the login form
    driver.get(os.getenv('login_page'))

    # fill username field
    username = driver.find_element(
        by=By.ID, value="usernameOrEmail")
    username.send_keys(os.getenv('user'))
    cont = driver.find_element(
        by=By.CLASS_NAME, value="form-button")
    cont.click()

    # give the two-step form time to show the password field
    time.sleep(2)

    # fill password field
    password = driver.find_element(
        by=By.ID, value="password")
    password.send_keys(os.getenv('password'))
    submit = driver.find_element(
        by=By.CLASS_NAME, value="form-button")
    submit.click()

    # wait a moment to finish login
    time.sleep(2)

    # select correct site
    select_site = driver.find_element(
        by=By.LINK_TEXT, value="Improve & Repeat")
    select_site.click()
```
The wait time is necessary because WordPress.com uses a two-part login form in which the password field only appears after you have entered the username. If you do not wait a bit, Selenium is too fast and will not find the password field.
If this succeeds, our browser has an authenticated session, and we can proceed. If it fails, set a breakpoint and inspect the page to check whether the input fields have different IDs.
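A fixed `time.sleep(2)` either wastes time or is too short on a slow connection. Selenium offers explicit waits for this (`WebDriverWait` from `selenium.webdriver.support.ui` together with `expected_conditions`), which poll until a condition holds or a timeout expires. The idea behind such a wait fits in a few lines of plain Python. The helper `wait_until()` below is hypothetical and only illustrates the polling technique; it is not Selenium's API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0,
               interval: float = 0.5) -> T:
    """Call condition() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # condition met, hand back its value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)  # poll again after a short pause
```

In the login function, `wait_until(lambda: driver.find_elements(By.ID, "password"))` would replace the first sleep: `find_elements()` returns an empty list until the field exists, so the call returns as soon as the password field is there. In a real script, `WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "password")))` does the same with Selenium's own tools.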
Iterate through the days
If you want detailed statistics, it is best to use the daily statistics and iterate through the days. Be aware that Jetpack returns today's statistics if we ask for a date in the future. To avoid creating rubbish data, we need to make sure that we stay within a valid date range:
```python
def download_statistics(driver, start):
    end = date.today()
    stats_day = min(start, end)
    while stats_day < end:
        download_statistics_for_day(driver, stats_day)
        stats_day += timedelta(days=1)
```
Today's statistics can still change until 23:59:59; therefore, we exclude the current day.
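The date logic of `download_statistics()` can also be pulled out into a generator that yields every day from the start date up to, but not including, today. This is a sketch: `days_to_download()` is a hypothetical name, and passing `today` as an optional parameter is an assumption made so the range can be checked without depending on the current date.

```python
from datetime import date, timedelta
from typing import Iterator, Optional


def days_to_download(start: date, today: Optional[date] = None) -> Iterator[date]:
    """Yield each day from start up to yesterday; a future start yields nothing."""
    end = today if today is not None else date.today()
    day = min(start, end)  # never start in the future
    while day < end:       # exclude today, its numbers can still change
        yield day
        day += timedelta(days=1)
```

The loop in `download_statistics()` would then shrink to `for day in days_to_download(start): download_statistics_for_day(driver, day)`, and the date handling can be tested without a browser.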
Download the statistics
We can find the URL for our blog statistics by clicking through WordPress.com. At the bottom of the page is a link with the text “Download data as CSV” that turns the displayed table into a CSV file.
In our script, we tell our browser to open the statistics page for a specific day, let the page load, scroll to the end of the page, and then click on the download link:
```python
def download_statistics_for_day(driver, day):
    # a date object turns into YYYY-MM-DD inside the f-string
    posts = f"https://wordpress.com/stats/day/posts/improveandrepeat.com?startDate={day}"
    driver.get(posts)
    # give the JavaScript time to render the statistics
    time.sleep(2)

    # scroll to the bottom where the download link is
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    download = driver.find_element(
        by=By.CLASS_NAME, value="stats-download-csv")
    download.click()
```
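The URL in `download_statistics_for_day()` hard-codes the blog address. To reuse the function for another site, you could build the URL from the site's domain and the day. A minimal sketch: `stats_url()` is a hypothetical helper, and it assumes that the URL pattern used above is the same for other WordPress.com sites.

```python
from datetime import date
from urllib.parse import urlencode

# URL pattern taken from the script above; assumed to be the same for other sites
STATS_BASE = "https://wordpress.com/stats/day/posts/"


def stats_url(site: str, day: date) -> str:
    """Build the daily post statistics URL for a site and a given day."""
    query = urlencode({"startDate": day.isoformat()})  # YYYY-MM-DD
    return f"{STATS_BASE}{site}?{query}"
```

`stats_url("improveandrepeat.com", date(2022, 10, 1))` produces the same address the original f-string builds for that day, so the site could become a parameter or another .env entry.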
Glue everything together
With all the parts in place, all we need is this glue code to call them in the correct order:
```python
if __name__ == '__main__':
    load_dotenv()
    driver = prepare_browser()
    login(driver)
    download_statistics(driver, date(2022, 10, 1))
    driver.quit()
```
If we run the script, it downloads all daily statistics from 1 October 2022 until yesterday.
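One weakness of the glue code is that `driver.quit()` never runs if the login or a download raises an exception, which leaves a Firefox window behind. A context manager closes the browser in every case. This is a sketch, not part of the original script: `browser_session()` is a hypothetical helper that takes any factory function, such as `prepare_browser`, and calls `quit()` on whatever it returns.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def browser_session(factory: Callable[[], Any]) -> Iterator[Any]:
    """Create a driver with factory() and always quit it, even after an error."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()  # runs on success and when an exception propagates
```

In the main block, `with browser_session(prepare_browser) as driver:` followed by the `login()` and `download_statistics()` calls would replace both `prepare_browser()` and the explicit `driver.quit()`.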
Next
As this script demonstrates, we can use Selenium to automate a browser and do repetitive tasks with ease. It does not matter whether a page requires authentication or is full of JavaScript: Selenium lets us access it the same way we would with a web browser. Next week we look at the missing parts to turn our browser automation into end-to-end tests with pytest.