Last week we covered the basics of automating our browser with Selenium. If you try that on real web pages, you quickly notice that the basics are often not enough. In this post we look at a few helpful tricks to make your Selenium automation more robust.
This post is part of my journey to learn Python. You can find the other parts of this series here. You find the code for this post in my PythonFriday repository on GitHub.
Read out values inside elements
In the last post we called the find_elements() method directly on the browser. However, we can also fetch an element first and then call find_elements() on that specific element:
```python
results_div = driver.find_element(
    by=By.CLASS_NAME, value="results")

results = results_div.find_elements(
    by=By.TAG_NAME, value="h2")

for result in results:
    print(f"* {result.text}")
```
This code sample reads the results div from DuckDuckGo and then selects the H2 elements inside that div. When we run the code, we get something like this:
* Selenium
* WebDriver | Selenium
* Selenium WebDriver Tutorial – javatpoint
* Web elements | Selenium
* What is Selenium WebDriver Architecture? How Does it works? – TOOLSQA
* How to Use Selenium? | Complete Guide to Selenium WebDriver – EDUCBA
* Introduction to Web Scraping using Selenium – Medium
* Web Scraping with Selenium and Python – ScrapFly Blog
* Selenium – Webdriver – tutorialspoint.com
* Complete Selenium WebDriver Tutorial with Examples – LambdaTest
Especially on complex pages, it is an immense help to work in multiple steps and refine the selectors as you go until you reach the desired result. This requires a bit more typing, but it makes your code more maintainable.
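This scoping pattern is not specific to Selenium. As a browser-free illustration, here is the same idea with the standard library's ElementTree and a made-up, well-formed snippet of markup: narrow down to a container first, then query only inside it.

```python
import xml.etree.ElementTree as ET

# Made-up snippet for illustration; a real page would come from the browser.
page = """
<body>
  <h2>Outside the results</h2>
  <div class="results">
    <h2>Selenium</h2>
    <h2>WebDriver | Selenium</h2>
  </div>
</body>
"""

root = ET.fromstring(page)
# First narrow the scope to the container element ...
results_div = root.find(".//div[@class='results']")
# ... then query only inside that container.
titles = [h2.text for h2 in results_div.findall("h2")]
print(titles)  # the h2 outside the div is not included
```

The h2 before the div never shows up in the result, because the second query only looks inside results_div.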
Scroll into position
As long as we only read elements, it does not matter whether they are in the visible part of the browser window. That changes when we try to click them: if the element is not visible, the click will often fail with an error. Should you run into this problem, you can tell the browser to scroll to that element:
```python
more = driver.find_element(
    by=By.CLASS_NAME, value="result--more")

driver.execute_script("arguments[0].scrollIntoView();", more)
more.click()
```
If our element is at the bottom of the page, we can use this approach to scroll all the way down:
```python
driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
```
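On pages with infinite scroll, one scroll to the bottom is not enough: new content loads and the page grows. A common approach is to repeat the scroll until the document height stops changing. The helper below is my own sketch of that loop, not part of Selenium; it only assumes the driver has the execute_script() method shown above.

```python
import time

def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    # Hypothetical helper: scroll repeatedly until the document height
    # stops growing, which suggests no more content is being loaded.
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load more content
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, we reached the real bottom
        last_height = new_height
    return last_height
```

The max_rounds guard keeps the loop from running forever on pages that never stop loading new content.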
Wait on elements
The implicit wait time is a good general way to tell Selenium to wait a bit whenever we ask it for an element. We can set it directly on the browser instance and specify how many seconds it should wait:
```python
driver.implicitly_wait(1)  # 1 second
```
However, there are elements that we know will take a bit longer to render. To spend the extra time only on those elements and not on everything, we can use an explicit wait:
```python
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "mySlowElement"))
    )
except TimeoutException as error:
    print(f"{type(error)}: {error}")
```
Our element mySlowElement now has 10 seconds to show up before Selenium throws a TimeoutException:
<class 'selenium.common.exceptions.TimeoutException'>: Message:
Stacktrace:
RemoteError@chrome://remote/content/shared/RemoteError.jsm:12:1
WebDriverError@chrome://remote/content/shared/webdriver/Errors.jsm:192:5
NoSuchElementError@chrome://remote/content/shared/webdriver/Errors.jsm:404:5
element.find/
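Conceptually, an explicit wait is just a polling loop: check a condition, sleep briefly, try again until a timeout expires. Here is a minimal plain-Python sketch of that idea (the names are my own, not Selenium's), which can help when you need a similar wait outside the browser:

```python
import time

def wait_until(condition, timeout=10.0, poll=0.5):
    # Minimal sketch of an explicit wait: call `condition` repeatedly
    # until it returns a truthy value or `timeout` seconds have passed.
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within timeout")
        time.sleep(poll)
```

WebDriverWait works the same way, with the extra twist that it also swallows NoSuchElementException between attempts so a not-yet-rendered element does not abort the wait early.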
Next
These little tricks can help you create much more robust browser automation. Well-structured web pages may not need the extra steps, but more often than not the best data sits on pages that are a mess. Next week we download website statistics from WordPress.com and work around all the JavaScript that the Jetpack extension uses.
Have you tried Playwright? That seems to be the project that will replace Puppeteer.
Hi Nono,
Yes, Playwright is interesting and next week I start a series of blog posts on it.
Regards,
Johnny