Python Friday #155: Download Jetpack Statistics With Playwright

With our newly gained knowledge about Playwright, we have everything we need to automate a browser and let it do repetitive tasks. Let’s figure out how Playwright differs from Selenium when it comes to downloading Jetpack statistics.

This post is part of my journey to learn Python. You can find the other parts of this series here. You can find the code for this post in my PythonFriday repository on GitHub.

Back to Python Friday post #146

This post revisits the topic I covered in Python Friday #146: Download Jetpack Statistics with Selenium. However, this time we use Playwright instead of Selenium.

Check the older post for the prerequisites and how to create the .env file with your credentials. If you followed along with the Selenium solution, you can use the same .env file.
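
The code in this post reads three values with os.getenv(): login_page, user and password. If one of them is missing, os.getenv() returns None and the script only fails later in the browser. Here is a minimal sketch of an early check, assuming you want the script to stop before the browser starts. The helper read_settings() is hypothetical and not part of the original solution:

```python
import os
from typing import Mapping, Optional

# Keys the script reads with os.getenv(); the names come from the post's code.
REQUIRED_KEYS = ("login_page", "user", "password")


def read_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the required settings or raise if one is missing or empty.

    Hypothetical helper for illustration. Call load_dotenv() first so the
    values from the .env file are available in os.environ.
    """
    source = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not source.get(key)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return {key: source[key] for key in REQUIRED_KEYS}
```

Calling read_settings() right after load_dotenv() would report all missing keys at once.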

Start with the codegen template

For this post we use the starting template of the test recorder codegen. If you get stuck anywhere, start codegen, click through the application and then copy the generated code into your application.

I only changed the run() method to accept a start_day parameter and pass that date in from the sync_playwright() block:

from playwright.sync_api import Playwright, sync_playwright, expect, Page
from datetime import date, timedelta
import time
import os
from dotenv import load_dotenv

def run(playwright: Playwright, start_day: date) -> None:
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()

    login(page)
    download_statistics(page, start_day)

    # ---------------------
    context.close()
    browser.close()


with sync_playwright() as playwright:
    load_dotenv()
    run(playwright, date(2022, 12, 1))

Log in to WordPress.com

As with Selenium, we need to log in to WordPress.com with Playwright. The login page uses a two-step form where the password field only appears after we have entered the username. This transition can take time, which is why we add a few time.sleep() calls between the different steps:

def login(page: Page):
    # go to the statistics page to get the correct redirect to the login form
    page.goto(os.getenv('login_page'))

    # fill username field
    username = page.get_by_label("Email Address or Username")
    username.fill(os.getenv('user'))

    page.get_by_role("button", name="Continue").first.click()
    time.sleep(2)

    # fill password field
    password = page.get_by_label("Password")
    password.fill(os.getenv('password'))

    page.get_by_role("button", name="Log In").click()
    time.sleep(2)

    # select the correct site
    page.get_by_role("link", name="Improve & Repeat").click()

The site selection at the end is something I have needed to do for the last few weeks. If you only have one blog, you can remove this part.
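
A fixed time.sleep() either waits too long or not long enough. Playwright locators can wait for an element instead. Here is a minimal sketch of that idea, assuming the password field becomes visible once the form has switched to its second step. The helper name wait_and_fill() and the timeout of 10 seconds are my assumptions and not part of the original script:

```python
def wait_and_fill(page, label: str, value: str, timeout_ms: float = 10_000) -> None:
    """Wait until the field with the given label is visible, then fill it.

    `page` is expected to be a playwright.sync_api.Page. The helper name and
    the timeout value are assumptions made for this sketch.
    """
    field = page.get_by_label(label)
    # Locator.wait_for() blocks until the element reaches the requested state
    # or raises a timeout error once timeout_ms milliseconds have passed.
    field.wait_for(state="visible", timeout=timeout_ms)
    field.fill(value)
```

In login() a call like wait_and_fill(page, "Password", os.getenv('password')) could then take the place of the first time.sleep(2) and the two lines that fill the password field.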

Iterate through the days

If you want detailed statistics, it is best to use the daily statistics and iterate through the days. Be aware that Jetpack returns the statistics of today if we ask for a date in the future. To prevent us from creating rubbish data, we need to check that we stay within a valid date range:

def download_statistics(page: Page, start: date):
    end = date.today()
    stats_day = min(start, end)
        
    while stats_day < end:
        download_statistics_for_day(page, stats_day)
        stats_day += timedelta(days=1)

Today’s statistics can change until 23:59:59; therefore, we exclude the current day.
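
To see which days the loop visits without opening a browser, the date logic can be pulled into a small generator. This is only a sketch: days_to_download() is a hypothetical helper, and it takes today as a parameter so that we can try it with fixed dates:

```python
from datetime import date, timedelta
from typing import Iterator


def days_to_download(start: date, today: date) -> Iterator[date]:
    """Yield every day from start up to, but not including, today."""
    day = start
    while day < today:
        yield day
        day += timedelta(days=1)


# Three days: 2022-12-29, 2022-12-30 and 2022-12-31. The current day is left out.
print(list(days_to_download(date(2022, 12, 29), date(2023, 1, 1))))

# A start date in the future gives an empty list, so nothing gets downloaded.
print(list(days_to_download(date(2023, 1, 5), date(2023, 1, 1))))
```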

Download the statistics

We can find the URL for our blog statistics by clicking through WordPress.com. At the bottom of the page is a link with the text “Download data as CSV” that turns the displayed table into a CSV file:

At the bottom of the page is the download link we are looking for.

In our script we then tell our browser to open the statistics page for a specific day, let the page load, scroll to the end of the page, and then click on the download link:

def download_statistics_for_day(page: Page, day: date):
    posts = f"https://wordpress.com/stats/day/posts/improveandrepeat.com?startDate={day}"
    print(posts)
    page.goto(posts)
    
    time.sleep(2)
     
    download = page.get_by_role("button", name="Download data as CSV")
    download.scroll_into_view_if_needed() 
    download.click()
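
The function above only clicks the link. As far as I know, Playwright keeps downloaded files in a temporary location and removes them when the browser context closes, so you may want to store each file yourself. Here is a minimal sketch with page.expect_download(). The function names, the target_dir parameter and the file name pattern are my assumptions and not part of the original script:

```python
from datetime import date
from pathlib import Path


def stats_url(site: str, day: date) -> str:
    """Build the URL of the daily post statistics for one day."""
    # day.isoformat() gives YYYY-MM-DD, the same text the f-string above produces.
    return f"https://wordpress.com/stats/day/posts/{site}?startDate={day.isoformat()}"


def save_statistics_for_day(page, site: str, day: date, target_dir: Path) -> Path:
    """Open the statistics page for one day and save the CSV in target_dir.

    `page` is expected to be a playwright.sync_api.Page.
    """
    page.goto(stats_url(site, day))

    # expect_download() waits for the download that the click starts.
    with page.expect_download() as download_info:
        page.get_by_role("button", name="Download data as CSV").click()
    download = download_info.value

    # Prefix the suggested name with the day so the files do not overwrite each other.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{day.isoformat()}_{download.suggested_filename}"
    download.save_as(target)
    return target
```

A call like save_statistics_for_day(page, "improveandrepeat.com", stats_day, Path("stats")) would then put one CSV file per day into the stats folder.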

Comparison of Playwright and Selenium

Solving the same problem with two different frameworks gives us a good idea of how they differ and what is the same. The solution for Playwright needs around 20 fewer lines of code, mostly because it needs fewer imports.

I used labels to select the elements in Playwright, while I preferred the ID selectors in Selenium. You can use either way to select elements in both frameworks.
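
Here are both styles side by side in Playwright, as a small sketch. The label text comes from the login() function in this post, while the element ID is a made-up placeholder; check the login form for the real one:

```python
def username_by_label(page):
    """Find the username field through its visible label."""
    return page.get_by_label("Email Address or Username")


def username_by_id(page, element_id: str = "username-field"):
    """Find the username field through a CSS ID selector.

    `page` is expected to be a playwright.sync_api.Page.
    "username-field" is a made-up ID for this sketch.
    """
    return page.locator(f"#{element_id}")
```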

For me, codegen made the biggest difference between the two frameworks. When I got stuck writing my test, I could use codegen and record my interactions. Copying the result to the application saved me a lot of time. Therefore, I think Playwright has an advantage over Selenium when it comes to automating tasks in a browser.

Next week we look at how we can reuse Selenium Grid with Playwright and what we need to do to connect to BrowserStack.
