Improving Performance via Eager Fetching in Hibernate

My previous article started discussing Hibernate relationships, focusing on lazy versus non-lazy relationships. This article continues the theme by discussing how to improve performance when dealing with relationships in Hibernate through a feature called eager fetching.

Hibernate hides database access behind getter and setter methods on domain objects, which also hides potentially inefficient database access. Using Hibernate without considering the database operations being performed can lead to significant performance hits. One of the most significant is the N+1 query problem: iterating over a collection of entities and accessing a related entity for each one results in one query to return the collection, plus a separate query per entity to retrieve its related entity. For N entities in the collection, that is a total of N+1 queries. This is quite inefficient, especially for larger values of N, considering that a single database query can be written to return the related entities for every entity in the collection. I touched on this issue previously in the context of non-lazy relationships, but it still occurs with lazy relationships. To provide a concrete example, I will refer to the following object model (the same as in my previous article):
[Figure: class diagram]

The corresponding classes are summarized as follows:

class Customer {
  public Collection<Order> getOrders();
  public void addOrder(Order order);
}
class Order {
  public Customer getCustomer();
  public PaymentMethod getPaymentMethod();
  public void setPaymentMethod(PaymentMethod m);
}
class PaymentMethod {
  public Order getOrder();
  public boolean isCash();
}

Let us say I want to give a bonus to a customer who has only paid by cash. The naive implementation for a given customer would be the following:

public boolean qualifiesForBonus(Customer customer) {
  for (Order order : customer.getOrders() ) {
    PaymentMethod paymentMethod = order.getPaymentMethod();
    if (!paymentMethod.isCash()) {
      return false;
    }
  }
  return true;
}

This implementation results in N+1 queries, which may not be so bad if the customer has only a few orders. But what if we want to calculate this for all customers? Now it is M(N+1) + 1 queries (where M is the number of customers), and the cost becomes much higher. This question can be answered with a single SQL query:

-- "Order" is quoted because ORDER is a reserved word in SQL
select c.* from Customer c
where not exists (
  select 1 from "Order" o
  join PaymentMethod m on m.order_FK = o.id
  where o.customer_FK = c.id and m.type <> 'CASH'
)
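
To make the M(N+1) + 1 figure concrete, here is a minimal sketch in plain Java that only counts round trips. It does not use Hibernate: the LazyLoadSimulation class and its load methods are hypothetical stand-ins for lazy loading, and it assumes every customer has exactly N orders, all paid in cash, so the bonus check never exits early.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a session with lazy loading: each load method
// represents one SQL round trip and increments the counter.
final class LazyLoadSimulation {
    private int queryCount = 0;

    // One query returns the ids of all M customers.
    List<Integer> loadCustomers(int m) {
        queryCount++;
        return range(m);
    }

    // One query per customer initializes that customer's lazy orders collection.
    List<Integer> loadOrders(int customerId, int n) {
        queryCount++;
        return range(n);
    }

    // One query per order loads that order's lazy payment method.
    boolean loadPaymentMethodIsCash(int orderId) {
        queryCount++;
        return true; // every order is paid in cash, so the loop never exits early
    }

    private static List<Integer> range(int size) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            ids.add(i);
        }
        return ids;
    }

    // Runs the bonus check for M customers with N orders each and returns the query count.
    static int countQueries(int m, int n) {
        LazyLoadSimulation session = new LazyLoadSimulation();
        for (int customerId : session.loadCustomers(m)) {
            for (int orderId : session.loadOrders(customerId, n)) {
                session.loadPaymentMethodIsCash(orderId);
            }
        }
        return session.queryCount;
    }

    public static void main(String[] args) {
        // M = 100 customers, N = 5 orders each: 100 * (5 + 1) + 1 = 601
        System.out.println(countQueries(100, 5));
    }
}
```

With 100 customers and 5 orders each, the naive approach issues 601 queries where a single query would do.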

It would be nice to be able to calculate this logic across relationships using our domain objects while efficiently retrieving them via a single query. Hibernate's eager fetching feature lets you accomplish this. Eager fetching allows you to load related entities at the same time using a single query. You simply tell Hibernate in your query by criteria or HQL to fetch the associated entity. Behind the scenes, Hibernate generates an SQL statement that joins to the associated entity table. The result set includes the columns from both the base entity and the associated entity, which Hibernate converts into the appropriate domain objects.

As an example, we will use HQL to retrieve all customers in order to calculate whether they qualify for a bonus. We want to eagerly fetch the orders and payment methods used in the calculation ahead of time, so that no further SQL queries are issued during the calculation. The following code does this:

List list = session.createQuery(
  "select c from Customer c left join fetch c.orders o " +
  "left join fetch o.paymentMethod p").list();

The list of customers returned already has the Order and PaymentMethod objects loaded in a single SQL query, so now when you iterate over the list and call the qualifiesForBonus() method, the processing will be done entirely in memory with no additional SQL queries.
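
As a rough illustration of that in-memory step, the sketch below runs the bonus check over a list of already-loaded objects. The domain classes here are simplified plain-Java stand-ins rather than mapped Hibernate entities, and the getName() accessor, the bonusCustomerNames() helper, and the null check on the payment method are my own assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class BonusCalculation {
    // Simplified stand-ins for the article's domain classes (not Hibernate-mapped).
    static final class PaymentMethod {
        private final boolean cash;
        PaymentMethod(boolean cash) { this.cash = cash; }
        boolean isCash() { return cash; }
    }

    static final class Order {
        private final PaymentMethod paymentMethod;
        Order(PaymentMethod paymentMethod) { this.paymentMethod = paymentMethod; }
        PaymentMethod getPaymentMethod() { return paymentMethod; }
    }

    static final class Customer {
        private final String name; // hypothetical field, used only to label results
        private final List<Order> orders = new ArrayList<>();
        Customer(String name) { this.name = name; }
        String getName() { return name; }
        List<Order> getOrders() { return orders; }
        void addOrder(Order order) { orders.add(order); }
    }

    // Same logic as the article's method, plus a null guard (an assumption:
    // an order without a payment method does not count as paid by cash).
    static boolean qualifiesForBonus(Customer customer) {
        for (Order order : customer.getOrders()) {
            PaymentMethod paymentMethod = order.getPaymentMethod();
            if (paymentMethod == null || !paymentMethod.isCash()) {
                return false;
            }
        }
        return true;
    }

    // Walks the fetched list purely in memory and collects the qualifying names.
    static List<String> bonusCustomerNames(List<Customer> fetchedCustomers) {
        List<String> names = new ArrayList<>();
        for (Customer customer : fetchedCustomers) {
            if (qualifiesForBonus(customer)) {
                names.add(customer.getName());
            }
        }
        return names;
    }
}
```

Note that, like the original method, this treats a customer with no orders as qualifying, since the loop body never runs.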

As wonderful as eager fetching is, it comes with several surprises. Understanding the different types of SQL joins is important. The above HQL specifying "left join" is shorthand for "left outer join", which means that Customers without Orders will still be returned. If you just specify "join", which is shorthand for "inner join", then such Customers will not be returned. Let us say that the database holds three customers with two orders each and one customer with no orders. The resulting SQL using "left join" will return seven rows (3 * 2 + 1), whereas using "join" it will return only six rows.

For the "left join" query, how many elements will the resulting list created by Hibernate contain? You might expect the answer to be four, for the four customers in the database, but the answer is seven. Hibernate does return only four instances of Customer, as per its guarantee of having a single Java instance of a given domain object per session, but three of these instances are referenced twice in the list. Given the strong object-oriented nature of HQL and Hibernate, I found it surprising that such an object-based query language would return duplicate references to match what the SQL query returns. The apparent explanation from the Hibernate website is that there are situations where this is the desired behavior (although I cannot think of any myself). Fortunately, there is a simple solution. Adding the keyword distinct to the query (i.e. "select distinct c from ...") will not reduce the number of rows the SQL query returns (the joined rows are already distinct), but it will cause Hibernate to eliminate the duplicate references and return four elements in the list as expected.
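
The seven-versus-four behavior can be reproduced without a database. The following is a minimal simulation in plain Java, not Hibernate's actual implementation: materialize() mimics producing one list entry per joined row while reusing a single instance per customer id, and distinct() mimics removing duplicate references. All class and method names are hypothetical, and the row data matches the example of three customers with two orders each plus one customer with none.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

final class DuplicateReferenceDemo {
    // Minimal stand-in for a Customer entity; equality is by reference.
    static final class Customer {
        final int id;
        Customer(int id) { this.id = id; }
    }

    // Rows of the outer join as {customerId, orderId}; -1 stands for a NULL order.
    static int[][] sampleRows() {
        return new int[][] {
            {1, 101}, {1, 102},
            {2, 103}, {2, 104},
            {3, 105}, {3, 106},
            {4, -1} // customer 4 has no orders but still appears once
        };
    }

    // One list entry per row, but only one Customer instance per id,
    // imitating the single-instance-per-session guarantee.
    static List<Customer> materialize(int[][] rows) {
        Map<Integer, Customer> sessionCache = new HashMap<>();
        List<Customer> result = new ArrayList<>();
        for (int[] row : rows) {
            result.add(sessionCache.computeIfAbsent(row[0], id -> new Customer(id)));
        }
        return result;
    }

    // Drops repeated references while keeping first-seen order. Customer does not
    // override equals/hashCode, so the set compares by reference.
    static List<Customer> distinct(List<Customer> withDuplicates) {
        return new ArrayList<>(new LinkedHashSet<>(withDuplicates));
    }

    public static void main(String[] args) {
        List<Customer> raw = materialize(sampleRows());
        System.out.println(raw.size());           // 7 list entries, one per row
        System.out.println(distinct(raw).size()); // 4 unique customers
    }
}
```

If you cannot change the query itself, the same reference-based de-duplication can be applied to the returned list in application code.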

To use eager fetching when doing query by criteria, use the setFetchMode() method. For example:

List list = session.createCriteria(Customer.class)
  .setFetchMode("orders", FetchMode.JOIN).list();

Hibernate will determine whether to do an inner or outer join: if the relationship is optional, an outer join will be used, potentially returning duplicate references as described above.

The entire point of eager fetching is to achieve better performance. If you try to eagerly fetch too many relationships, however, especially optional relationships resulting in outer joins, you may find your performance becomes worse, not better. Each additional outer join adds another dimension to the Cartesian product representing the result of the query, and too large a result set will cause performance to be worse than without the eager fetching. Hibernate 2 actually limited eager fetching to a single to-many relationship to help prevent this from occurring. Hibernate 3 drops this restriction, leaving you responsible for the consequences.
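
A quick back-of-the-envelope calculation shows how fast the result set grows. The sketch below assumes several to-many collections hanging directly off the same root entity (for example orders plus other, hypothetical collections not in the object model above), all fetched with outer joins; rowsPerRoot() is an illustrative helper, not a Hibernate API.

```java
final class JoinRowEstimate {
    // Number of joined rows produced for ONE root entity when several of its
    // to-many collections are fetched with outer joins in the same query.
    // An empty collection still contributes one row because of the outer join.
    static long rowsPerRoot(int... collectionSizes) {
        long rows = 1;
        for (int size : collectionSizes) {
            rows *= Math.max(size, 1);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Three collections of 20, 10 and 5 elements hold 35 related rows in
        // total, but joining them together yields 20 * 10 * 5 rows per root.
        System.out.println(rowsPerRoot(20, 10, 5)); // 1000
    }
}
```

In that example the single joined query transfers 1000 rows per root entity to deliver 35 related objects, which is the point where separate queries may well be cheaper.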

There are other surprises lurking in Hibernate's implementation of eager fetching. I recommend referring to Hibernate's online documentation for more information, particularly the FAQs.

This article is one of a series on Hibernate Tips & Tricks.

4 Comments on “Improving Performance via Eager Fetching in Hibernate”

  1. Carles Barrobés says:

    Very useful article. I had run into the problem of duplicated rows, which also surprised me as default behavior.

    I found that if you want distinct to be applied to a Criteria query, you have to use…

    criteria.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);

    …before calling list().

  2. Mike says:

    a better solution is to just write your own SQL, lot simpler and will perform a lot better!

  3. Lucas says:

    Hi, i found this article when searching a way to improve performance in a nullable one-to-one mapping. I understand hibernate needs to query the database to know its nullability, but it does it with subsequent selects. So, if i want to retrieve a list of entities that have 4 one-to-one associations, hibernate triggers 4 selects for each entity retrieved. I tried everything to make hibernate retrieve the associations in a single query (fetch=’join’ and outer-join=’true’) but i see no difference. Do you have an idea of how can i avoid the separate selects?
    Lucas

  4. yogesh chavan vaijapur says:

    I found this article very useful to understand the eager fetching and performance benefits.

    Thanks a lot.
