Python Friday #85: Eager Loading vs Lazy Loading in SQLAlchemy ORM

SQLAlchemy ORM does a lot of magic in the background when we ask it to fetch data from our database. It is time to take a closer look how SQLAlchemy loads our data and what can go wrong if we are not careful.

This post is part of my journey to learn Python. You can find the other parts of this series here. You find the code for this post in my PythonFriday repository on GitHub.

Pre-Requisites

The examples in this post build on top of the bookstore example from the post Relationship Patterns in SQLAlchemy ORM.

To have enough data to see the difference between eager and lazy loading, I use this method to create the test data:

def create_publisher_with_3_books():
    session = db_session.factory()
    manning = create_publisher("Manning Publications")
    session.add(manning)
    
    bookA = create_book("The Mikado Method", "9781617291210", 245, manning)
    session.add(bookA)
    
    bookB = create_book("Specification by Example", "9781617290084", 249, manning)
    session.add(bookB)
    
    bookC = create_book("Python Workout", "9781617295508", 248, manning)
    session.add(bookC)
    
    authorA = create_author("Ola", "Ellnestam")
    session.add(authorA)
    
    authorB = create_author("Daniel", "Brolund")
    session.add(authorB)
    
    authorC = create_author("Gojko", "Adzic")
    session.add(authorC)
    
    authorD = create_author("Reuven M.", "Lerner")
    session.add(authorD)
    
    bookA.authors.append(authorA)
    bookA.authors.append(authorB)
    bookB.authors.append(authorC)
    bookC.authors.append(authorD)
    
    session.commit()
    return manning.id

def create_publisher_with_3_books():

session = db_session.factory()

manning = create_publisher("Manning Publications")

session.add(manning)

bookA = create_book("The Mikado Method", "9781617291210", 245, manning)

session.add(bookA)

bookB = create_book("Specification by Example", "9781617290084", 249, manning)

session.add(bookB)

bookC = create_book("Python Workout", "9781617295508", 248, manning)

session.add(bookC)

authorA = create_author("Ola", "Ellnestam")

session.add(authorA)

authorB = create_author("Daniel", "Brolund")

session.add(authorB)

authorC = create_author("Gojko", "Adzic")

session.add(authorC)

authorD = create_author("Reuven M.", "Lerner")

session.add(authorD)

bookA.authors.append(authorA)

bookA.authors.append(authorB)

bookB.authors.append(authorC)

bookC.authors.append(authorD)

session.commit()

return manning.id

The setup_db() and the create_**() methods are the same as in the last post.

Lazy loading by default

If we do not tell otherwise, SQLAlchemy uses lazy loading and makes a request to the database whenever we access a property it not already has loaded from the database. As long as we keep the session open until we finish our work with the data, everything works:

def lazy_loading_works_while_session_open(publisher_id):
    session = db_session.factory()
    publisher = session.query(Publisher).\
        filter(Publisher.id == publisher_id).\
        first()
        
    print(f"{publisher}:")
    for book in publisher.books:
        print(f"\t {book}:")
        for author in book.authors:
            print(f"\t\t {author}")

    session.close()

def lazy_loading_works_while_session_open(publisher_id):

session = db_session.factory()

publisher = session.query(Publisher).\

filter(Publisher.id == publisher_id).\

first()

print(f"{publisher}:")

for book in publisher.books:

print(f"\t {book}:")

for author in book.authors:

print(f"\t\t {author}")

session.close()

<Publisher 42 (Manning Publications)>:
<Book 84 (The Mikado Method 9781617291210) 245>:
<Author 85 (Ola Ellnestam)>
<Author 86 (Daniel Brolund)>
<Book 85 (Specification by Example 9781617290084) 249>:
<Author 87 (Gojko Adzic)>
<Book 86 (Python Workout 9781617295508) 248>:
<Author 88 (Reuven M. Lerner)>

SELECT n+1

The code example above looks innocent, but that changes when you set echo=True in the db_session.py. To show us the output, SQLAlchemy ORM creates these queries in the background:

1x SELECT for the publisher
1x SELECT for the books published by the publisher
3x SELECT for the authors of each book

Without noticing it, we end up in the infamous SELECT n+1 problem. Each row we have asks the database for more data. The devious thing about this problem is that we do not notice it during development – we do not have enough data. As soon as we test this with real data, our application becomes unbearable slow. And then it is too late.

We can force the issue and get an error when we close the session as soon as we get our publisher back. In this case the lazy loading is no longer possible and SQLAlchemy throws an exception:

def lazy_loading_throws_error(publisher_id):
    session = db_session.factory()
    publisher = session.query(Publisher).\
        filter(Publisher.id == publisher_id).\
        first()
    session.close()
    
    print(f"{publisher}:")
    for book in publisher.books:
        print(f"\t {book}:")
        for author in book.authors:
            print(f"\t\t {author}")

def lazy_loading_throws_error(publisher_id):

session = db_session.factory()

publisher = session.query(Publisher).\

filter(Publisher.id == publisher_id).\

first()

session.close()

print(f"{publisher}:")

for book in publisher.books:

print(f"\t {book}:")

for author in book.authors:

print(f"\t\t {author}")

<Publisher 42 (Manning Publications)>:
: Parent instance <Publisher at 0x2b7fa82eca0> is not bound to a Session; lazy load operation of attribute ‘books’ cannot proceed (Background on this error at: http://sqlalche.me/e/14/bhk3)

Eager loading by request

We can fix the SELECT n+1 problem by loading all the data we need in one go. Even better, we can do that when we create the query and do not need to change our mapping. This allows us to get exactly the data back that we need.

def eager_loading_joinedload(publisher_id):
    session = db_session.factory()
    publisher = session.query(Publisher).\
        options(joinedload("books").\
        joinedload("authors")).\
        filter(Publisher.id == publisher_id).\
        first()
    session.close()
    
    print(f"{publisher}:")
    for book in publisher.books:
        print(f"\t {book}:")
        for author in book.authors:
            print(f"\t\t {author}")

def eager_loading_joinedload(publisher_id):

session = db_session.factory()

publisher = session.query(Publisher).\

options(joinedload("books").\

joinedload("authors")).\

filter(Publisher.id == publisher_id).\

first()

session.close()

print(f"{publisher}:")

for book in publisher.books:

print(f"\t {book}:")

for author in book.authors:

print(f"\t\t {author}")

The joinedload() option uses JOIN in the generated SQL to fetch the data in a single SELECT statement:

BEGIN (implicit)
SELECT anon_1."Publisher_Id" AS "anon_1_Publisher_Id", anon_1."Publisher_Name" AS "anon_1_Publisher_Name", "Book_1"."Id" AS "Book_1_Id", "Book_1"."Title" AS "Book_1_Title", "Book_1"."ISBN" AS "Book_1_ISBN", "Book_1"."Pages" AS "Book_1_Pages", "Book_1"."PublishedBy" AS "Book_1_PublishedBy", "Author_1"."Id" AS "Author_1_Id", "Author_1"."FirstName" AS "Author_1_FirstName", "Author_1"."LastName" AS "Author_1_LastName"
FROM (SELECT "Publisher"."Id" AS "Publisher_Id", "Publisher"."Name" AS "Publisher_Name"
FROM "Publisher"
WHERE "Publisher"."Id" = ?
 LIMIT ? OFFSET ?) AS anon_1 LEFT OUTER JOIN "Book" AS "Book_1" ON anon_1."Publisher_Id" = "Book_1"."PublishedBy" LEFT OUTER JOIN ("BookAuthor" AS "BookAuthor_1" JOIN "Author" AS "Author_1" ON "Author_1"."Id" = "BookAuthor_1"."AuthorId") ON "Book_1"."Id" = "BookAuthor_1"."BookId"
[generated in 0.00162s] (42, 1, 0)
ROLLBACK

BEGIN (implicit)

SELECT anon_1."Publisher_Id" AS "anon_1_Publisher_Id", anon_1."Publisher_Name" AS "anon_1_Publisher_Name", "Book_1"."Id" AS "Book_1_Id", "Book_1"."Title" AS "Book_1_Title", "Book_1"."ISBN" AS "Book_1_ISBN", "Book_1"."Pages" AS "Book_1_Pages", "Book_1"."PublishedBy" AS "Book_1_PublishedBy", "Author_1"."Id" AS "Author_1_Id", "Author_1"."FirstName" AS "Author_1_FirstName", "Author_1"."LastName" AS "Author_1_LastName"

FROM (SELECT "Publisher"."Id" AS "Publisher_Id", "Publisher"."Name" AS "Publisher_Name"

FROM "Publisher"

WHERE "Publisher"."Id" = ?

LIMIT ? OFFSET ?) AS anon_1 LEFT OUTER JOIN "Book" AS "Book_1" ON anon_1."Publisher_Id" = "Book_1"."PublishedBy" LEFT OUTER JOIN ("BookAuthor" AS "BookAuthor_1" JOIN "Author" AS "Author_1" ON "Author_1"."Id" = "BookAuthor_1"."AuthorId") ON "Book_1"."Id" = "BookAuthor_1"."BookId"

[generated in 0.00162s] (42, 1, 0)

ROLLBACK

<Publisher 42 (Manning Publications)>:
<Book 84 (The Mikado Method 9781617291210) 245>:
<Author 85 (Ola Ellnestam)>
<Author 86 (Daniel Brolund)>
<Book 85 (Specification by Example 9781617290084) 249>:
<Author 87 (Gojko Adzic)>
<Book 86 (Python Workout 9781617295508) 248>:
<Author 88 (Reuven M. Lerner)>

There is a second option called selectinload() that we can use to optimize the database access. This options creates an IN statement in SQL and fetches the data with one query per table:

def eager_loading_selectinload(publisher_id):
    session = db_session.factory()
    publisher = session.query(Publisher).\
        options(selectinload("books").\
        selectinload("authors")).\
        filter(Publisher.id == publisher_id).\
        first()
    session.close()
    
    print(f"{publisher}:")
    for book in publisher.books:
        print(f"\t {book}:")
        for author in book.authors:
            print(f"\t\t {author}")

def eager_loading_selectinload(publisher_id):

session = db_session.factory()

publisher = session.query(Publisher).\

options(selectinload("books").\

selectinload("authors")).\

filter(Publisher.id == publisher_id).\

first()

session.close()

print(f"{publisher}:")

for book in publisher.books:

print(f"\t {book}:")

for author in book.authors:

print(f"\t\t {author}")

BEGIN (implicit)
SELECT "Publisher"."Id" AS "Publisher_Id", "Publisher"."Name" AS "Publisher_Name"
FROM "Publisher"
WHERE "Publisher"."Id" = ?
 LIMIT ? OFFSET ?
[generated in 0.00074s] (42, 1, 0)
SELECT "Book"."PublishedBy" AS "Book_PublishedBy", "Book"."Id" AS "Book_Id", "Book"."Title" AS "Book_Title", "Book"."ISBN" AS "Book_ISBN", "Book"."Pages" AS "Book_Pages" 
FROM "Book"
WHERE "Book"."PublishedBy" IN (?)
[generated in 0.00106s] (42,)
SELECT "Book_1"."Id" AS "Book_1_Id", "Author"."Id" AS "Author_Id", "Author"."FirstName" AS "Author_FirstName", "Author"."LastName" AS "Author_LastName"
FROM "Book" AS "Book_1" JOIN "BookAuthor" AS "BookAuthor_1" ON "Book_1"."Id" = "BookAuthor_1"."BookId" JOIN "Author" ON "Author"."Id" = "BookAuthor_1"."AuthorId"
WHERE "Book_1"."Id" IN (?, ?, ?)
[generated in 0.00073s] (84, 85, 86)
ROLLBACK

BEGIN (implicit)

SELECT "Publisher"."Id" AS "Publisher_Id", "Publisher"."Name" AS "Publisher_Name"

FROM "Publisher"

WHERE "Publisher"."Id" = ?

LIMIT ? OFFSET ?

[generated in 0.00074s] (42, 1, 0)

SELECT "Book"."PublishedBy" AS "Book_PublishedBy", "Book"."Id" AS "Book_Id", "Book"."Title" AS "Book_Title", "Book"."ISBN" AS "Book_ISBN", "Book"."Pages" AS "Book_Pages"

FROM "Book"

WHERE "Book"."PublishedBy" IN (?)

[generated in 0.00106s] (42,)

SELECT "Book_1"."Id" AS "Book_1_Id", "Author"."Id" AS "Author_Id", "Author"."FirstName" AS "Author_FirstName", "Author"."LastName" AS "Author_LastName"

FROM "Book" AS "Book_1" JOIN "BookAuthor" AS "BookAuthor_1" ON "Book_1"."Id" = "BookAuthor_1"."BookId" JOIN "Author" ON "Author"."Id" = "BookAuthor_1"."AuthorId"

WHERE "Book_1"."Id" IN (?, ?, ?)

[generated in 0.00073s] (84, 85, 86)

ROLLBACK

<Publisher 42 (Manning Publications)>:
<Book 84 (The Mikado Method 9781617291210) 245>:
<Author 85 (Ola Ellnestam)>
<Author 86 (Daniel Brolund)>
<Book 85 (Specification by Example 9781617290084) 249>:
<Author 87 (Gojko Adzic)>
<Book 86 (Python Workout 9781617295508) 248>:
<Author 88 (Reuven M. Lerner)>

Which method should you use? In the SQLAlchemy documentation we find this advice:

selectinload() tends to be more appropriate for loading related collections while joinedload() tends to be better suited for many-to-one relationships, due to the fact that only one row is loaded for both the lead and the related object.

Take this as a starting point and then check what queries your code produces. It is a small change to switch between selectinload() and joinedload() if you do not like the generated SQL.

We can now write our queries in a way that will not exhaust the database. The next step is to change the database to keep it in sync with our changing application.

Python Friday #85: Eager Loading vs Lazy Loading in SQLAlchemy ORM

Pre-Requisites

Lazy loading by default

SELECT n+1

Eager loading by request

Next

Like this:

Related

Leave a Comment Cancel reply

Pre-Requisites

Lazy loading by default

SELECT n+1

Eager loading by request

Next

Share this:

Like this:

Related

Leave a Comment Cancel reply