Python Friday #145: Automate Browsers With Selenium (Part 2)

Last week we covered the basics of automating our browser with Selenium. If you try that with real web pages, you quickly notice that the basics are often not enough. In this post we look at a few helpful tricks to make your Selenium automation more robust.

This post is part of my journey to learn Python. You can find the other parts of this series here. You can find the code for this post in my PythonFriday repository on GitHub.

 

Read out values inside elements

In the last post we called the find_elements() method directly on the browser. However, we can also fetch an element first and then call find_elements() on that specific element:

This code sample reads the results div from DuckDuckGo and then selects the H2 elements inside that div. When we run the code, we get something like this:

* Selenium
* WebDriver | Selenium
* Selenium WebDriver Tutorial – javatpoint
* Web elements | Selenium
* What is Selenium WebDriver Architecture? How Does it works? – TOOLSQA
* How to Use Selenium? | Complete Guide to Selenium WebDriver – EDUCBA
* Introduction to Web Scraping using Selenium – Medium
* Web Scraping with Selenium and Python – ScrapFly Blog
* Selenium – Webdriver – tutorialspoint.com
* Complete Selenium WebDriver Tutorial with Examples – LambdaTest

Especially in complex pages it is an immense help to use multiple steps and refine the selectors as you go to get to the desired result. This requires a bit more typing, but it makes your code more maintainable.

 

Scroll into position

As long as we only read elements, it does not matter whether they are in the visible part of the browser window. That changes when we try to click them: if the element is outside the viewport, Selenium often raises an error. Should you run into this problem, you can use this code to tell the browser to scroll to that element:
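One way to do this is to run a small piece of JavaScript through execute_script(). This is a sketch and the helper name `scroll_to` is made up for this example:

```python
def scroll_to(browser, element):
    """Ask the browser to scroll until the element is inside the viewport."""
    # arguments[0] refers to the WebElement we pass as the second parameter.
    browser.execute_script("arguments[0].scrollIntoView();", element)
    return element


# Usage (assumes `button` was found earlier with find_element()):
# scroll_to(browser, button).click()
```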

If our element is at the bottom of the page, we can use this approach to scroll to the end:
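A possible version of that scroll, again as a sketch with a hypothetical helper name, uses the height of the document body as the target position:

```python
def scroll_to_bottom(browser):
    """Scroll the current page all the way down."""
    # document.body.scrollHeight is the full height of the rendered page.
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
```

Pages that load more content while you scroll may need this call more than once.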

 

Wait on elements

The implicit wait time is a good general way to tell Selenium to wait a bit when we ask it for an element. We can do that directly on the browser instance and specify how many seconds it should wait:
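As a small sketch, the implicit wait is a single call on the browser. The helper name `set_default_wait` and the 5 seconds are assumptions for this example:

```python
def set_default_wait(browser, seconds=5):
    """Let every find_element() call retry for up to `seconds` seconds."""
    # The setting stays active for the whole lifetime of this browser instance.
    browser.implicitly_wait(seconds)
    return browser


# Usage (needs a real browser):
# from selenium import webdriver
# browser = set_default_wait(webdriver.Firefox())
```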

However, sometimes we know that certain elements take a bit longer to render. To give only those elements the extra time, and not slow down everything else, we can use this code to wait:

Our element mySlowElement now has 10 seconds to show up before Selenium throws a TimeoutException:

<class 'selenium.common.exceptions.TimeoutException'>: Message:
Stacktrace:
RemoteError@chrome://remote/content/shared/RemoteError.jsm:12:1
WebDriverError@chrome://remote/content/shared/webdriver/Errors.jsm:192:5
NoSuchElementError@chrome://remote/content/shared/webdriver/Errors.jsm:404:5
element.find/

 

Next

These little tricks can help you create much more robust browser automation. Well-structured web pages may not need the extra steps, but the best data is often on pages that are a mess. Next week we download website statistics from WordPress.com and circumvent all the JavaScript that the Jetpack extension uses.
