Python Friday #146: Download Jetpack Statistics With Selenium

The Jetpack plugin for WordPress relies heavily on JavaScript. With Selenium, we do not have to bother with that and can let our web browser deal with it. This allows us to grab the statistics that Jetpack collects for this site.

This post is part of my journey to learn Python. You can find the other parts of this series here. You can find the code for this post in my PythonFriday repository on GitHub.

Preparation

Make sure that you have Selenium installed on your machine and that you use webdriver-manager to get the right drivers.

If you want to run the examples, you must have your own WordPress blog and the access credentials for your user.

Keeping your password secret

I use a .env file to keep my password out of my code. I strongly suggest that you do the same. Make sure that you use the username and password of your WordPress.com user. That is especially important for self-hosted blogs. It is much simpler to access Jetpack if you have the right credentials.

In the .env file you need these three entries:

login_page=XYZ
user=ABC
password=PASSWORD

For login_page, use the URL of the statistics page you want to access. That way, the login goes through the right authentication provider.
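
The script reads these entries with os.getenv(), which silently returns None when an entry is missing, so a typo in the .env file only shows up later as a confusing Selenium error. Here is a minimal sketch of a fail-fast check, assuming you call it right after load_dotenv(); the helper name read_settings is hypothetical and not part of the original script:

```python
import os

# The three entries the script expects in the .env file.
REQUIRED_KEYS = ("login_page", "user", "password")


def read_settings(keys=REQUIRED_KEYS):
    """Return the required settings as a dict, or raise if any is missing or empty.

    Hypothetical helper: it only reads environment variables, so it works
    after python-dotenv's load_dotenv() has copied the .env entries there.
    """
    settings = {key: os.getenv(key) for key in keys}
    missing = [key for key, value in settings.items() if not value]
    if missing:
        raise RuntimeError(f"Missing entries in .env: {', '.join(missing)}")
    return settings
```

You would then use settings["user"] and settings["password"] instead of calling os.getenv() inside login().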

Setup Selenium

We need a bit of setup code to get Selenium ready:

from selenium.webdriver.firefox.service import Service
from selenium import webdriver
from webdriver_manager.firefox import GeckoDriverManager
from selenium.webdriver.common.by import By
from datetime import date, timedelta
import time
import os
from dotenv import load_dotenv
import logging


def prepare_browser():
    logging.getLogger('WDM').setLevel(logging.NOTSET)
    driver = webdriver.Firefox(
        service=Service(GeckoDriverManager().install()))
    driver.implicitly_wait(2)  # wait up to 2 seconds for elements to appear
    return driver

This function returns a browser instance that we use throughout the script.

Login to WordPress.com

Before we can access our site statistics, we need to log in to WordPress.com. For that, we open the correct login page and fill out the login form:

def login(driver):
    # go to the statistics page to get the correct redirect to the login form
    driver.get(os.getenv('login_page'))

    # fill username field
    username = driver.find_element(
        by=By.ID, 
        value="usernameOrEmail")
    username.send_keys(os.getenv('user'))
    cont = driver.find_element(
        by=By.CLASS_NAME, 
        value="form-button")
    cont.click()

    # wait until the form shows the password field
    time.sleep(2)

    # fill password field
    password = driver.find_element(
        by=By.ID, 
        value="password")
    password.send_keys(os.getenv('password'))
    submit = driver.find_element(
        by=By.CLASS_NAME, 
        value="form-button")
    submit.click()

    # wait a moment to finish login
    time.sleep(2)

    # select the correct site (replace the link text with the name of your blog)
    select_site = driver.find_element(
        by=By.LINK_TEXT,
        value="Improve & Repeat")
    select_site.click()

The wait time is necessary because WordPress.com uses a two-step login form in which the password field only appears after you have entered the username. If you do not wait a bit, Selenium is too fast and does not find the password field.

If this succeeds, our browser has an authenticated session and we can proceed. If it fails, set a breakpoint and inspect the page to check whether the input fields have different IDs.
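
A fixed time.sleep(2) either wastes time or is too short on a slow connection. Selenium ships explicit waits for this case, for example WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "password"))) from selenium.webdriver.support. The idea behind them is a polling loop. Here is a minimal standard-library sketch of that idea; wait_until is a hypothetical helper, not Selenium's implementation:

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() until it returns a truthy value or timeout seconds pass.

    Returns the truthy value; raises TimeoutError when the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # pause briefly before checking again
        time.sleep(interval)
```

With the script's driver, you could replace the sleep before the password field with wait_until(lambda: driver.find_elements(by=By.ID, value="password")), because find_elements() returns an empty (and therefore falsy) list while the field is missing.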

Iterate through the days

Depending on how detailed you want your statistics to be, it is best to use the daily statistics and iterate through the days. Be aware that Jetpack returns today's statistics if we ask for a date in the future. To avoid creating rubbish data, we need to make sure that we stay within a valid date range:

def download_statistics(driver, start):
    end = date.today()
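    # a future start date would only return today's statistics again
    stats_day = min(start, end)

    # stop before today, because today's numbers can still change
    while stats_day < end: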
        download_statistics_for_day(driver, stats_day)
        stats_day += timedelta(days=1)

Today’s statistics can change until 23:59:59; therefore, we exclude the current day.
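
The date logic is easier to test if you separate it from the browser. Here is a minimal sketch with a hypothetical generator days_to_download that visits the same days as the while loop above: it clamps a future start date to today and stops before today.

```python
from datetime import date, timedelta


def days_to_download(start, end=None):
    """Yield every day from start up to, but not including, end (default: today)."""
    if end is None:
        end = date.today()
    day = min(start, end)  # a future start date would only repeat today's stats
    while day < end:
        yield day
        day += timedelta(days=1)
```

The loop in download_statistics() would then shrink to for day in days_to_download(start): download_statistics_for_day(driver, day).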

Download the statistics

We can find the URL for our blog statistics by clicking through WordPress.com. At the bottom of the page is a link with the text “Download data as CSV” that turns the displayed table into a CSV file:

[Screenshot: At the bottom of the page is the download link we are looking for.]

In our script, we then tell our browser to open that statistics page for a specific day, let the page load, scroll to the end of the page, and click on the download link:

def download_statistics_for_day(driver, day):
    posts = f"https://wordpress.com/stats/day/posts/improveandrepeat.com?startDate={day}"
    driver.get(posts)

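    # give the JavaScript time to render the statistics
    time.sleep(2)

    # scroll to the bottom, where the download link is
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
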
    download = driver.find_element(
        by=By.CLASS_NAME, 
        value="stats-download-csv")
    download.click()
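
The site name improveandrepeat.com is hard-coded in the URL. If you want to reuse the script for your own blog, you could build the URL from a parameter instead. Here is a minimal sketch with a hypothetical helper stats_url; the URL pattern is the one used in the function above:

```python
from datetime import date
from urllib.parse import urlencode


def stats_url(site, day):
    """Return the URL of the daily post statistics for site on the given day."""
    # date.isoformat() gives the YYYY-MM-DD form that the f-string above also produces
    query = urlencode({"startDate": day.isoformat()})
    return f"https://wordpress.com/stats/day/posts/{site}?{query}"
```

In download_statistics_for_day() you would then call driver.get(stats_url(site, day)), for example with the site name taken from an additional .env entry (an assumption, not one of the three entries above).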

Glue everything together

With all the parts in place, all we need is this glue code to call them in the correct order:

if __name__ == '__main__':
    load_dotenv()
    driver = prepare_browser()
    login(driver)
    download_statistics(driver, date(2022, 10, 1))
    driver.quit()

If we run the script, it downloads all daily statistics from 1 October 2022 until yesterday.
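
If the login or one of the downloads raises an exception, the script never reaches driver.quit() and the Firefox window stays open. Here is a minimal sketch of a context manager that always quits the browser; browser_session is a hypothetical name, and it expects a factory function such as prepare_browser:

```python
from contextlib import contextmanager


@contextmanager
def browser_session(factory):
    """Create a driver with factory() and always call quit() on it afterwards."""
    driver = factory()
    try:
        yield driver
    finally:
        # runs on a normal exit and when an exception leaves the with block
        driver.quit()
```

In the __main__ block, you would wrap the calls to login() and download_statistics() in with browser_session(prepare_browser) as driver: and drop the explicit driver.quit().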

As this script demonstrates, we can use Selenium to automate a browser and do repetitive tasks with ease. It does not matter whether a page requires authentication or is full of JavaScript: Selenium allows us to access it the same way we would with a web browser. Next week, we look at the missing parts to turn our browser automation into end-to-end tests in pytest.