6 months ago
Fetching data with ORM is easy! Is it?

Introduction

Almost any system works with external data stores in some way. In most cases it is a relational database, and very often data fetching is delegated to some ORM implementation. An ORM takes care of a lot of routine work and brings along a few new abstractions in return.

Martin Fowler wrote an interesting article about ORM and one of the key thoughts there is “ORMs help us deal with a very real problem for most enterprise applications. ... They aren't pretty tools, but then the problem they tackle isn't exactly cuddly either. I think they deserve a little more respect and a lot more understanding”.

In the CUBA framework we use ORM very heavily and know a lot about its limitations, since we have various kinds of projects all over the world. There are a lot of things that can be discussed, but we will focus on one of them: lazy vs eager data fetching. We’ll talk about different approaches to data fetching (mostly within the JPA API and Spring), how we deal with it in CUBA, and what R&D work we do to improve the ORM layer in CUBA. We will look at the essentials that might help developers avoid terrible performance when using ORMs.

Fetching Data: Lazy way or Eager way?

If your data model contains only one entity, there will be no issues with using ORM. Let’s have a look at an example. We have a user who has an ID and a name:

@Entity
public class User {
   @Id
   @GeneratedValue
   private int id;
   private String name;

   //Getters and Setters here
}

To fetch it we just need to ask EntityManager nicely:

EntityManager em = entityManagerFactory.createEntityManager();
User user = em.find(User.class, id);

Things get interesting when we have a one-to-many relation between entities:

@Entity
public class User {
   @Id
   @GeneratedValue
   private int id;
   private String name;
   @OneToMany
   private List<Address> addresses;

   //Getters and Setters here
}

If we want to fetch a user record from the database, a question arises: “Should we fetch the addresses too?” And the “right” answer will be: “It depends”. In some use cases we need the addresses, in others we don’t. Usually, an ORM provides two options for fetching data: lazy and eager. Most of them use the lazy fetch mode for collections by default. And when we write the following code:

EntityManager em = entityManagerFactory.createEntityManager();
User user = em.find(User.class, 1);
em.close();
System.out.println(user.getAddresses().get(0));

We get the so-called LazyInitializationException (that is Hibernate’s name for it), which confuses ORM rookies very much. And here we need to explain the concepts of “attached” and “detached” objects, as well as database sessions and transactions.

OK then, an entity instance should stay attached to a session so that we are able to fetch its detail attributes. In this case we get another problem: transactions become longer, and therefore the risk of a deadlock increases. And splitting our code into a chain of short transactions may cause “death by a million mosquitoes” for the database due to the increased number of very short separate queries.

As mentioned above, you may or may not need the addresses attribute fetched, so you need to “touch” the collection only in some use cases, which adds more conditions to the code. Hmmmm… Looks like it’s getting complex.
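The failure mode described above can be reproduced without any ORM at all. The following is a minimal sketch in plain Java, not the actual Hibernate or EclipseLink implementation: FakeSession, LazyCollection and LazyLoadingDemo are hypothetical names invented for this illustration. It shows why a lazy collection works if it was “touched” while the entity was attached, and fails if the first access happens after the session was closed.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a persistence session; not a real JPA class.
class FakeSession {
    private boolean open = true;

    boolean isOpen() { return open; }

    void close() { open = false; }
}

// Hypothetical lazy holder: loads its data on first access,
// and only while the owning session is still open.
class LazyCollection<T> {
    private final FakeSession session;
    private final Supplier<List<T>> loader;
    private List<T> loaded; // stays null until the first successful access

    LazyCollection(FakeSession session, Supplier<List<T>> loader) {
        this.session = session;
        this.loader = loader;
    }

    List<T> get() {
        if (loaded == null) {
            if (!session.isOpen()) {
                throw new IllegalStateException("session is closed, cannot load lazily");
            }
            loaded = loader.get();
        }
        return loaded;
    }
}

class LazyLoadingDemo {
    // Returns true if reading the collection after closing the session fails.
    static boolean failsWhenDetached(boolean touchBeforeClose) {
        FakeSession session = new FakeSession();
        LazyCollection<String> addresses =
                new LazyCollection<>(session, () -> List.of("Baker Street 221b"));
        if (touchBeforeClose) {
            addresses.get(); // "touch" the collection while still attached
        }
        session.close();
        try {
            addresses.get();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsWhenDetached(false)); // true
        System.out.println(failsWhenDetached(true));  // false
    }
}
```

The touchBeforeClose flag is exactly the kind of use-case-specific condition that ends up scattered across the code when lazy loading is combined with short transactions.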

OK, will another fetch type help?

@Entity
public class User {
   @Id
   @GeneratedValue
   private int id;
   private String name;
   @OneToMany(fetch = FetchType.EAGER)
   private List<Address> addresses;

   //Getters and Setters here
}

Well, not exactly. We get rid of the annoying lazy initialization exception and no longer need to check whether an instance is attached or detached. But now we have a performance problem because, again, we don’t need the addresses in all cases, yet we always select them. Any other ideas?

Spring JDBC

Some developers become so annoyed with ORM that they switch to “semi-automatic” mappings using Spring JDBC. In this case, we create unique queries for unique use cases and return objects that contain attributes valid for a particular use case only.

It gives us great flexibility. We can get only one attribute:

String name = this.jdbcTemplate.queryForObject(
       "select name from t_user where id = ?",
       new Object[]{1L}, String.class);

Or the whole object:

User user = this.jdbcTemplate.queryForObject(
       "select id, name from t_user where id = ?",
       new Object[]{1L},
       new RowMapper<User>() {
           public User mapRow(ResultSet rs, int rowNum) throws SQLException {
               User user = new User();
               user.setName(rs.getString("name"));
               user.setId(rs.getInt("id"));
               return user;
           }
       });

You can fetch addresses too, using ResultSetExtractor, but it involves writing some extra code, and you should know how to write SQL joins to avoid the N+1 select problem.
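To give an idea of the “extra code” involved, here is a sketch of the grouping step such an extractor would have to perform. It is plain Java without Spring, so the rows are passed in as a list instead of being read from a ResultSet. The class names (UserAddressRow, UserWithAddresses, JoinedRowGrouper), the t_address table and its user_id column are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One row of a hypothetical single-query join such as:
//   select u.id, u.name, a.street
//   from t_user u left join t_address a on a.user_id = u.id
class UserAddressRow {
    final int userId;
    final String userName;
    final String street; // null when the user has no addresses

    UserAddressRow(int userId, String userName, String street) {
        this.userId = userId;
        this.userName = userName;
        this.street = street;
    }
}

// Hypothetical result object: a user together with the streets of its addresses.
class UserWithAddresses {
    final int id;
    final String name;
    final List<String> streets = new ArrayList<>();

    UserWithAddresses(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

class JoinedRowGrouper {
    // Collapses the flat join result into one object per user,
    // keeping users in the order in which they first appear.
    static List<UserWithAddresses> group(List<UserAddressRow> rows) {
        Map<Integer, UserWithAddresses> byId = new LinkedHashMap<>();
        for (UserAddressRow row : rows) {
            UserWithAddresses user = byId.computeIfAbsent(
                    row.userId, id -> new UserWithAddresses(id, row.userName));
            if (row.street != null) {
                user.streets.add(row.street);
            }
        }
        return new ArrayList<>(byId.values());
    }
}
```

One joined query plus this grouping replaces one query for the users and one more per user for the addresses, which is the pattern behind the N+1 problem.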

Well, it’s getting complex again. You control all the queries and you control the mapping, but you have to write more code, learn SQL and know how database queries are executed. Though I think knowing SQL basics is a necessary skill for almost every developer, some of them do not think so, and I’m not going to argue with them. Knowing x86 assembler is not a vital skill for everyone nowadays either. Let’s just think about how we can simplify development.

JPA EntityGraph

Let’s take a step back and try to understand what we’re trying to achieve. It seems like all we need to do is to tell the ORM exactly which attributes we’re going to fetch in different use cases. Let’s do it then! JPA 2.1 introduced a new API - Entity Graph. The idea behind this API is simple: you write several annotations that describe what should be fetched. Let’s have a look at an example:

@Entity
@NamedEntityGraphs({
       @NamedEntityGraph(name = "user-only-entity-graph"),
       @NamedEntityGraph(name = "user-addresses-entity-graph",
               attributeNodes = {@NamedAttributeNode("addresses")})
       })
public class User {
   @Id
   @GeneratedValue
   private int id;
   private String name;
   @OneToMany(fetch = FetchType.LAZY)
   private Set<Address> addresses;

   //Getters and Setters here

}

For this entity we’ve described two entity graphs - the user-only-entity-graph does not fetch the Addresses attribute (which is marked as lazy), whilst the second graph instructs the ORM to select addresses. If we mark an attribute as eager, entity graph settings will be ignored and the attribute will be fetched.

So, starting from JPA 2.1 you can select entities in the following way:

EntityManager em = entityManagerFactory.createEntityManager();
EntityGraph graph = em.getEntityGraph("user-addresses-entity-graph");
Map<String, Object> properties = Map.of("javax.persistence.fetchgraph", graph);
User user = em.find(User.class, 1, properties);
em.close();

This approach greatly simplifies a developer’s work: there is no need to “touch” lazy attributes and create long transactions. The great thing is that the entity graph can be applied at the SQL generation level, so no extra data is fetched from the database into the Java application. But there is still a problem. We cannot tell from the entity itself which attributes were fetched and which weren’t. There is an API for this: you can check attributes using the PersistenceUnitUtil class:

PersistenceUnitUtil pu = entityManagerFactory.getPersistenceUnitUtil();
System.out.println("User.addresses loaded: " + pu.isLoaded(user, "addresses"));

But it is pretty boring. Can we simplify it and simply not expose unfetched attributes?

Spring Projections

Spring Framework provides a fantastic facility called Projections (and it’s different from Hibernate’s Projections). If we want to fetch only some properties of an entity, we can specify an interface and Spring will select interface “instances” from a database. Let’s have a look at the example. If we define the following interface:

interface NamesOnly {
   String getName();
}

And then define a Spring JPA repository to fetch our User entities:

interface UserRepository extends CrudRepository<User, Integer> {
   Collection<NamesOnly> findByName(String name);
}

In this case, after the invocation of the findByName method, we simply won’t be able to access unfetched attributes! The same principle applies to detail entity classes too, so you can fetch both master and detail records this way. Moreover, in most cases Spring generates “proper” SQL and fetches only the attributes specified in the projection, i.e. projections work like entity graph descriptions.

It is a very powerful concept: you can use SpEL expressions, use classes instead of interfaces, etc. There is more information in the documentation; check it out if you’re interested.

The only problem with Projections is that under the hood they are implemented as maps and hence are read-only. Therefore, though you can define a setter method for a projection, you won’t be able to save changes using either CRUD repositories or EntityManager. You can treat projections as DTOs, and you have to write your own DTO-to-entity conversion code.
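To illustrate what “implemented as maps” can mean in practice, here is a sketch of a map-backed interface instance built with the JDK’s dynamic proxies. This is not Spring Data’s actual code; NamesOnlyProjection and MapBackedProjections are hypothetical names, and the getter-to-key naming rule is an assumption chosen for the example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Map;

// Hypothetical projection interface, similar in spirit to NamesOnly.
interface NamesOnlyProjection {
    String getName();
}

class MapBackedProjections {
    // Creates an instance of the given interface whose getters read from the map.
    // Any other method (setters, and in this sketch even toString/equals/hashCode)
    // is rejected, which is what makes the result read-only.
    static <T> T create(Class<T> type, Map<String, ?> values) {
        InvocationHandler handler = (proxy, method, args) -> {
            String methodName = method.getName();
            boolean isGetter = methodName.startsWith("get")
                    && methodName.length() > 3
                    && method.getParameterCount() == 0;
            if (isGetter) {
                // getName -> name
                String property = Character.toLowerCase(methodName.charAt(3))
                        + methodName.substring(4);
                return values.get(property);
            }
            throw new UnsupportedOperationException(
                    methodName + " is not supported: the projection is read-only");
        };
        return type.cast(Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[]{type}, handler));
    }
}
```

Since the object behind the interface is only a map of selected values and not a managed entity, there is nothing a repository or EntityManager could save, hence the need for manual conversion back to an entity.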

CUBA Implementation

From the beginning of CUBA framework development, we tried to optimize the code that works with a database. In the framework, we use EclipseLink to implement the data access layer API. The good thing about EclipseLink is that it supported partial entity loading from the beginning, which is why we chose it over Hibernate in the first place. In this ORM, you could specify exactly which attributes should be loaded before JPA 2.1 became a standard. Therefore we added our own “Entity Graph”-like concept to the framework - CUBA Views. Views are pretty powerful: you can extend them, combine them, etc. The second reason behind creating CUBA Views is that we wanted to use short transactions and work mostly with detached objects; otherwise, we could not make a rich web UI fast and responsive.

In CUBA, view descriptions are stored in an XML file and look like this:

<view class="com.sample.User"
     extends="_local"
     name="user-minimal-view">
   <property name="name"/>
   <property name="addresses"
             view="address-street-only-view"/>
</view>

This view instructs the CUBA DataManager to fetch the User entity with its local name attribute and to fetch addresses applying address-street-only-view, and (important!) to do so at the query level. Once a view is defined, you can apply it to get entities using the DataManager class:

List<User> users = dataManager.load(User.class).view("user-minimal-view").list();

It works like a charm and saves a lot of network traffic by not loading unused attributes, but, like with JPA Entity Graph, there is a small issue: we cannot tell which attributes of the User entity were loaded. And in CUBA we have the annoying “IllegalStateException: Cannot get unfetched attribute [...] from detached object”. Like in JPA, you can check whether an attribute was fetched, but writing these checks for every entity being fetched is a boring job, and developers are not happy with it.

CUBA View Interfaces PoC

And what if we could take the best of both worlds? We decided to implement so-called entity interfaces that follow Spring’s approach, but those interfaces are translated into CUBA views during application startup and can then be used in DataManager. The idea is pretty simple: you define an interface (or a set of interfaces) that specifies an entity graph. It looks like Spring Projections and works like Entity Graph:

interface UserMinimalView extends BaseEntityView<User, Integer> {
   String getName();
   void setName(String val);
   List<AddressStreetOnly> getAddresses();

   interface AddressStreetOnly extends BaseEntityView<Address, Integer> {
      String getStreet();
      void setStreet(String street);
   }
}

Note that AddressStreetOnly interface can be nested if it is used only in one case.

During CUBA Application startup (in fact, it is mostly Spring Context Initialization), we create a programmatic representation for CUBA views and store them in an internal repository bean in Spring context.
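How an interface could be turned into a view description at startup can be sketched with plain reflection. This is an illustration of the idea only, not the PoC’s code: ViewMarker stands in for BaseEntityView, and ViewInterfaceScanner with its “dotted path” output format is a hypothetical helper invented here.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical marker playing the role of BaseEntityView in this sketch.
interface ViewMarker {
}

interface AddressStreetView extends ViewMarker {
    String getStreet();
}

interface UserNameAndAddressesView extends ViewMarker {
    String getName();

    List<AddressStreetView> getAddresses();
}

class ViewInterfaceScanner {
    // Returns the property paths to fetch, e.g. "name" and "addresses.street".
    // Cyclic view definitions are not handled in this sketch.
    static SortedSet<String> propertyPaths(Class<?> viewInterface) {
        SortedSet<String> paths = new TreeSet<>();
        collect(viewInterface, "", paths);
        return paths;
    }

    private static void collect(Class<?> viewInterface, String prefix, SortedSet<String> paths) {
        for (Method method : viewInterface.getMethods()) {
            String name = method.getName();
            if (!name.startsWith("get") || name.length() <= 3 || method.getParameterCount() != 0) {
                continue; // setters and other methods do not add properties
            }
            String property = prefix + Character.toLowerCase(name.charAt(3)) + name.substring(4);
            paths.add(property);
            Class<?> nested = nestedView(method);
            if (nested != null) {
                collect(nested, property + ".", paths);
            }
        }
    }

    // Finds a nested view type: either the return type itself
    // or the element type of a returned collection.
    private static Class<?> nestedView(Method getter) {
        Class<?> candidate = getter.getReturnType();
        Type generic = getter.getGenericReturnType();
        if (Collection.class.isAssignableFrom(candidate) && generic instanceof ParameterizedType) {
            Type argument = ((ParameterizedType) generic).getActualTypeArguments()[0];
            candidate = argument instanceof Class ? (Class<?>) argument : null;
        }
        return candidate != null && ViewMarker.class.isAssignableFrom(candidate) ? candidate : null;
    }
}
```

For UserNameAndAddressesView the scanner yields the paths addresses, addresses.street and name, which is the same information the XML view definition carries.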

After that we need to tweak the DataManager so that it can accept class names in addition to CUBA View string names, and then we simply pass the interface class:

List<User> users = dataManager.loadWithView(UserMinimalView.class).list();

We generate a proxy implementing the entity view for each instance fetched from the database, as Hibernate does. When you try to get an attribute’s value, the proxy forwards the invocation to the real entity.
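The forwarding mechanism can be sketched with a JDK dynamic proxy. The code below is an assumption about how such a wrapper might look, not the PoC’s implementation: PersonEntity, PersonNameView and EntityViewProxies are hypothetical names, and the proxy simply calls the entity method that has the same name and parameter types.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical entity with more attributes than the view exposes.
class PersonEntity {
    private String name;
    private String email;

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }

    public String getEmail() { return email; }

    public void setEmail(String email) { this.email = email; }
}

// Hypothetical view interface: only the name is visible and editable.
interface PersonNameView {
    String getName();

    void setName(String name);
}

class EntityViewProxies {
    // Wraps the entity into an instance of the view interface.
    // Every call on the view is forwarded to the entity method with the same signature.
    static <V> V wrap(Object entity, Class<V> viewInterface) {
        InvocationHandler handler = (proxy, method, args) -> {
            Method target = entity.getClass()
                    .getMethod(method.getName(), method.getParameterTypes());
            return target.invoke(entity, args);
        };
        return viewInterface.cast(Proxy.newProxyInstance(
                viewInterface.getClassLoader(), new Class<?>[]{viewInterface}, handler));
    }
}
```

Because the proxy keeps a reference to the real entity, a setter call on the view changes the entity itself, while attributes missing from the interface (email here) are simply not reachable through it.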

With this implementation we’re trying to kill two birds with one stone:

  • The data that is not stated in the interface is not loaded to the Java application code, thus saving server resources
  • A developer uses only properties that were fetched, therefore no more “UnfetchedAttribute” errors (a.k.a. LazyInitializationException in Hibernate).

In contrast to Spring Projections, Entity Views wrap entities and implement CUBA’s Entity interface, therefore they can be treated as entities: you can update a property and save changes to the database.

The “third bird” here: you can define a “read-only” interface that contains only getters, completely preventing entity modification at the API level.

Also, we can implement some operations on the detached entity, like this conversion of the user’s name to lowercase:

@MetaProperty
default String getNameLowercase() {
   return getName().toLowerCase();
}

In this case, all calculated attributes can be moved from the entity model, so you don’t mix data fetch logic with use case-specific business logic.

Another interesting opportunity: you can inherit interfaces. This lets you prepare several views with different sets of attributes and then mix them if needed. For example, you can have one interface that contains the user’s name and email and another one that contains the name and addresses. And if you need a third view interface that contains the name, email, and addresses, you can get it just by combining both, thanks to multiple inheritance of interfaces in Java. Please note that you can pass this third interface to methods that consume either the first or the second interface; OOP principles work here as usual.
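The combination described above relies only on standard Java interface inheritance, so it can be shown without any framework. All interface and class names below are hypothetical, and addresses are simplified to strings to keep the sketch short.

```java
import java.util.List;

interface NameAndEmailView {
    String getName();

    String getEmail();
}

interface NameAndAddressesView {
    String getName();

    List<String> getAddresses();
}

// The third view: name, email and addresses.
// getName() is declared in both parents and merges into a single method.
interface NameEmailAddressesView extends NameAndEmailView, NameAndAddressesView {
}

class ViewInheritanceDemo {
    // Consumes the first view.
    static String contactLine(NameAndEmailView view) {
        return view.getName() + " <" + view.getEmail() + ">";
    }

    // Consumes the second view.
    static int addressCount(NameAndAddressesView view) {
        return view.getAddresses().size();
    }

    // A hand-written instance standing in for a loaded entity view.
    static NameEmailAddressesView sample() {
        return new NameEmailAddressesView() {
            public String getName() { return "Alice"; }

            public String getEmail() { return "alice@example.com"; }

            public List<String> getAddresses() { return List.of("Main St 1", "Oak Ave 2"); }
        };
    }
}
```

The same sample() instance can be passed to both contactLine and addressCount, because the combined interface is a subtype of each of its parents.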

We’ve also implemented entity conversion between views - each entity view has reload() method that accepts another view class as a parameter:

UserFullView userFull = userMinimal.reload(UserFullView.class);

UserFullView may contain additional attributes, so the entity will be reloaded from the database. Entity reload is a lazy process: it is performed only when you try to get an entity attribute value. We did this on purpose because in CUBA we have a “web” module that renders the rich UI and may contain custom REST controllers. In this module we use the same entities, and it can be deployed on a separate server. Therefore, each entity reload causes an additional request to the database via the core module (a.k.a. middleware). So, by introducing lazy entity reload we save some network traffic and database queries.
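The deferred reload can be pictured as a value that is loaded at most once and only on first use. The class below is a generic sketch of that idea under the assumption that the actual database call is hidden behind a Supplier; LazyReload is a hypothetical name, the class is not part of CUBA, and it is not thread-safe.

```java
import java.util.function.Supplier;

// Holds a loader and runs it only when the value is first requested.
class LazyReload<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;
    private int loadCount; // how many times the loader was actually executed

    LazyReload(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        if (!loaded) {
            value = loader.get(); // the only place where a remote call would happen
            loaded = true;
            loadCount++;
        }
        return value;
    }

    int getLoadCount() {
        return loadCount;
    }
}
```

If a reloaded view is created but none of its attributes is ever read, the loader never runs, so no request is sent to the middleware at all.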

The PoC can be downloaded from GitHub - feel free to play with it.

Conclusion

ORMs are going to remain massively used in enterprise applications in the near future. We just have to provide something that converts database rows into Java objects. Of course, in complex, high-load applications we’ll continue to see unique solutions, but ORM will live as long as RDBMSes do.

In the CUBA framework we’re trying to simplify ORM use to make it as painless for developers as possible. In the next versions we’re going to introduce more changes. I’m not sure whether those will be view interfaces or something else, but I’m pretty sure about one thing: working with ORM in the next version of CUBA will be simpler.

Andrey Belyaev