
Code Generation in CUBA: What makes the magic

Introduction

Code generation is common in modern frameworks. The reasons for introducing it vary: getting rid of boilerplate code, replacing reflection, or building complex solutions based on domain-specific languages.

Like every technology, code generation has its application areas and limitations. In this article, we look at how the CUBA framework uses code generation today and discuss how this technique may develop in the future.

What is generated in CUBA?

The CUBA framework is built on top of the well-known Spring framework. Basically, every CUBA application can be treated as a Spring application with some additional APIs that simplify the development of common business functionality.

CUBA provides a Vaadin-based library for quick UI development, and this library uses a declarative data binding approach. Therefore, it is possible to display different object property values in the same UI widget by switching the binding at runtime.

It means that every object should be able to provide a property value by its string name. Having the Spring framework in the core means that reflection can be used easily to fetch property values.
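As a minimal sketch of what fetching a property value by its string name can look like with plain reflection (the Employee class and the ReflectivePropertyAccess helper are hypothetical names used only for this illustration, not CUBA APIs):

```java
import java.lang.reflect.Method;

// Hypothetical entity, used only for this illustration.
class Employee {
    private final String name;
    private final int age;

    Employee(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public String getName() { return name; }
    public int getAge() { return age; }
}

class ReflectivePropertyAccess {

    // Resolves the JavaBeans-style getter for the property and invokes it reflectively.
    static Object getValue(Object target, String property) {
        String getterName = "get" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
        try {
            Method getter = target.getClass().getMethod(getterName);
            return getter.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot read property: " + property, e);
        }
    }

    public static void main(String[] args) {
        Employee employee = new Employee("Alice", 30);
        // The same call site can read any property; only the string changes.
        System.out.println(getValue(employee, "name"));
        System.out.println(getValue(employee, "age"));
    }
}
```

Every call pays for a method lookup and a reflective invocation, which is the cost that matters once a data grid starts reading many properties.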

Entities Enhancement

Reflection is a powerful thing, but it is still slow despite all the optimization efforts of the JVM team. And when we talk about UI, especially displaying big data grids, we come to the conclusion that reflective methods will be invoked pretty frequently. E.g. displaying 20 rows with 10 properties each easily leads to 200 invocations. Multiply that by the number of users and take into account that all these invocations happen on the app server (that is the way Vaadin works), and we get a considerable workload for the server.

So, for every data object (entity) class we need to define a simple method that will invoke a property getter (or setter) based on the property name. The simple switch statement will do.
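A handwritten sketch of such a method could look like the following. The Customer class and the getValue/setValue method names are assumptions made for this example; the methods that CUBA actually generates may be named and shaped differently.

```java
// Hypothetical entity; in CUBA the equivalent methods are generated, not handwritten.
class Customer {
    private String name;
    private Integer age;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public Integer getAge() { return age; }
    public void setAge(Integer age) { this.age = age; }

    // Dispatches to the right getter by property name, without reflection.
    public Object getValue(String property) {
        switch (property) {
            case "name": return getName();
            case "age": return getAge();
            default: throw new IllegalArgumentException("Unknown property: " + property);
        }
    }

    // Dispatches to the right setter by property name, without reflection.
    public void setValue(String property, Object value) {
        switch (property) {
            case "name": setName((String) value); break;
            case "age": setAge((Integer) value); break;
            default: throw new IllegalArgumentException("Unknown property: " + property);
        }
    }
}
```

The switch has to be kept in sync with the entity's fields, which is exactly the kind of mechanical update that is easy to forget when done by hand.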

Also, the UI needs to “know” if an object was changed, so that a user could confirm data changes when the screen is closed. So, we need to inject a state listener into every object’s setter to check if the property’s value has changed.

So, in addition to the method that gets or sets property values by name, every setter must be updated to invoke a state change listener that marks the entity object as changed.

This change is not complex either, basically a one-liner. But it would be unfair to ask a developer to do the boring job of adding and updating a bunch of very simple methods for every entity property. And that’s exactly the case where code generation shines.
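To make the setter requirement concrete, here is a small handwritten sketch of a setter that notifies a listener only when the value really changes. TrackedProduct and its nested ChangeListener interface are hypothetical names for this example and are not CUBA classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical entity with manual change tracking.
class TrackedProduct {

    // Hypothetical callback; a UI screen could use it to mark itself as "dirty".
    interface ChangeListener {
        void propertyChanged(String property, Object prev, Object value);
    }

    private final List<ChangeListener> listeners = new ArrayList<>();
    private String title;

    void addListener(ChangeListener listener) {
        listeners.add(listener);
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        String prev = this.title;
        this.title = title;
        // Notify only when the value actually changed, so that re-setting
        // the same value does not mark the entity as modified.
        if (!Objects.equals(prev, title)) {
            for (ChangeListener listener : listeners) {
                listener.propertyChanged("title", prev, title);
            }
        }
    }
}
```

A screen could register a listener that flips a "has unsaved changes" flag and then ask the user for confirmation on close.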

CUBA uses the EclipseLink ORM under the hood, and this framework already solves some of the tasks stated earlier. As its documentation says: “The EclipseLink JPA persistence provider uses weaving to enhance both JPA entities and Plain Old Java Object (POJO) classes for such things as lazy loading, change tracking, fetch groups, and internal optimizations.”

In CUBA, EclipseLink’s static weaving is invoked at build time by the CUBA build plugin (by default, weaving is executed at runtime).

Invoking change listeners, however, is still a task that CUBA has to solve itself, and the build plugin does this at build time as well. So, if you open an entity’s .class file, you can see a number of methods that you don’t see in your source code. And you may be surprised to see how your setters changed. For example, instead of:

public void setName(String name) {
   this.name = name;
}

In the decompiled code you will see:

public void setName(String name) {
   String __prev = this.getName();
   this._persistence_set_name(name);
   Object var5 = null;
   String __new = this.getName();
   if (!InstanceUtils.propertyValueEquals(__prev, __new)) {
       this.propertyChanged("name", __prev, __new);
   }
}

This is a mix of code generated by the EclipseLink weaver and by the CUBA build plugin. So, in CUBA, compiled entity classes are different from what you actually write in the IDE.

Bean Validation Messages

CUBA Platform supports internationalization of bean validation messages. It means that in bean validation annotations you can refer to an entry in a .properties file instead of writing the message string directly into the annotation value.

In the code, it looks like this:

@NotNull(message = "{msg://hr_Person.name.validation.NotNull}")
@Column(name = "NAME", nullable = false, unique = true)
private String name;

Translation resource files for entities should be in the same package as the entities themselves. So, to simplify loading the messages, the package name has to be specified in the message reference. The change is simple and the update algorithm is clear, so it was decided to use code generation.

The CUBA Platform build plugin transforms the message reference above into the following format:

@NotNull(message = "{msg://com.company.hr/hr_Person.name.validation.NotNull}")
@Column(name = "NAME", nullable = false, unique = true)
private String name;

Now that the package name is part of the reference, fetching the message from the resource file with the getResourceAsStream() method is much simpler.
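The two steps, qualifying the reference at build time and locating the resource at runtime, can be sketched roughly as follows. MessageReference is a hypothetical helper written for this article, and the messages.properties file name is an assumption; the real plugin and message lookup in CUBA may work differently.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper; not a CUBA class.
class MessageReference {
    private static final String PREFIX = "msg://";

    final String packageName;
    final String key;

    private MessageReference(String packageName, String key) {
        this.packageName = packageName;
        this.key = key;
    }

    // Strips the surrounding "{msg://" and "}" from a reference.
    private static String body(String ref) {
        if (!ref.startsWith("{" + PREFIX) || !ref.endsWith("}")) {
            throw new IllegalArgumentException("Not a message reference: " + ref);
        }
        return ref.substring(PREFIX.length() + 1, ref.length() - 1);
    }

    // Build-time step: inserts the entity package into a short reference.
    static String qualify(String shortRef, String entityPackage) {
        return "{" + PREFIX + entityPackage + "/" + body(shortRef) + "}";
    }

    // Runtime step: splits a qualified reference into package and key.
    static MessageReference parse(String qualifiedRef) {
        String body = body(qualifiedRef);
        int slash = body.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("No package in: " + qualifiedRef);
        }
        return new MessageReference(body.substring(0, slash), body.substring(slash + 1));
    }

    // Assumed file name; the package maps directly to a classpath folder.
    String resourcePath() {
        return "/" + packageName.replace('.', '/') + "/messages.properties";
    }

    // Returns null when the resource file or the key is missing.
    String resolve() {
        try (InputStream in = MessageReference.class.getResourceAsStream(resourcePath())) {
            if (in == null) {
                return null;
            }
            Properties props = new Properties();
            props.load(in);
            return props.getProperty(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the package embedded in the reference, the lookup needs no extra context about which entity the annotation belongs to.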

What is the Future?

Not much code is generated at the moment, but the framework is evolving, and the development team is considering code generation for other cases.

Common Entity Methods

At the moment, in CUBA the entity structure is flexible, but it is based on interfaces, so you need to implement methods defined in those interfaces. For example, if you want your entity to support soft delete, you need to implement the following interface:

public interface SoftDelete {
   Date getDeleteTs();
   String getDeletedBy();
   //More methods here
}

Of course, there are default implementations of those interfaces like com.haulmont.cuba.core.entity.StandardEntity, so you can extend this entity in order to use the implementation.

But it would be much simpler to use method names that are not hardcoded and just mark the properties that store the delete date and the name of the user who performed the delete. In this case, we could generate the methods shown above and map invocations to the proper getters and setters. Let’s have a look at an example of an entity:

@Entity
public class Account {

   //Other fields
   @DeletedDate
   private Date disposedAt;

   @DeletedBy
   private String disposedBy;

   public Date getDisposedAt() {
       return disposedAt;
   }

   public String getDisposedBy() {
       return disposedBy;
   }

}

In this entity, you can see special fields defined to store the data about the delete process. So, what will we see if we apply some enhancement to this entity?

@Entity
public class Account implements SoftDelete {

   //Other fields
   @DeletedDate
   private Date disposedAt;

   @DeletedBy
   private String disposedBy;

   public Date getDisposedAt() {
       return disposedAt;
   }

   public String getDisposedBy() {
       return disposedBy;
   }

   //Generated
   @Override
   public Date getDeleteTs() {
       return getDisposedAt();
   }

   //Generated
   @Override
   public String getDeletedBy() {
       return getDisposedBy();
   }
}

Now we can check whether an instance supports soft delete by applying the instanceof operator. This gives the framework a generic approach to soft delete operations that relies only on its own interfaces and methods instead of detecting annotations at runtime.
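A rough sketch of such a generic check is shown below. The SoftDelete interface is a trimmed copy of the one shown earlier; Contract, LogEntry and SoftDeleteSupport are hypothetical names introduced only for this example.

```java
import java.util.Date;

// Trimmed copy of the SoftDelete interface shown earlier.
interface SoftDelete {
    Date getDeleteTs();
    String getDeletedBy();
}

// Hypothetical entity that supports soft delete.
class Contract implements SoftDelete {
    private Date deleteTs;
    private String deletedBy;

    void markDeleted(String user) {
        this.deleteTs = new Date();
        this.deletedBy = user;
    }

    @Override
    public Date getDeleteTs() { return deleteTs; }

    @Override
    public String getDeletedBy() { return deletedBy; }
}

// Hypothetical entity without soft delete support.
class LogEntry { }

class SoftDeleteSupport {

    // Framework-side code needs no annotation scanning, only a type check.
    static boolean supportsSoftDelete(Object entity) {
        return entity instanceof SoftDelete;
    }

    // An entity counts as deleted once its delete timestamp is set.
    static boolean isDeleted(Object entity) {
        return entity instanceof SoftDelete && ((SoftDelete) entity).getDeleteTs() != null;
    }
}
```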

This approach will add more flexibility to entity definitions, especially in database reverse engineering.

So, in future CUBA versions, we plan to add more bits and pieces of code generation here and there to make a developer’s life easier.

Build-time generation vs runtime generation

As you may have noticed, in CUBA code generation happens at build time. This approach has pros and cons; let’s go through them.

Build-time generation allows you to catch problems at earlier stages. When you generate code, there are many “variables” that you should take into account. For example, if the EclipseLink API changes, the invocations generated by CUBA during the enhancement process will become invalid. JDK API changes may cause issues, too. By generating code at build time, we rely on the Java compiler to find such issues at early stages. And compile-time errors are usually easier to find than runtime ones, because source code is a static thing, even if it is generated.

But build-time generation requires a separate tool that is not a part of the project codebase - the build plugin. Introducing one more tool means introducing one more point of failure. A developer now depends on both a compiler and a code generation tool. And if either of them contains a bug, there is a problem, because the developer cannot fix it and has to wait for an update.

With runtime generation, there is no separate tool: the code generator is part of the framework. But since generation happens at runtime, developers depend on the program state and the VM state. Sometimes dynamic code generation may fail suddenly due to memory consumption or other issues, because it is quite hard to control the VM state completely.

So, for CUBA we've chosen code generation at build time. The amount of generated code is not that large and the set of classes is limited to entities only, so for this particular case the code generator is pretty simple and there have been no blocking issues with it so far.

Generation tools

In Java, annotations arrived in Java 5 together with the apt tool, and annotation processing became a standardized part of the compiler in Java 6 (JSR 269). The idea is simple - you create a processor that can generate new code based on the annotations in the existing code. The generated code may itself contain annotations that trigger another processing cycle.
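For illustration only, a very small processor built on the standard javax.annotation.processing API might look like this. The @Described annotation, the DescribedProcessor class and the shape of the generated class are all made up for this sketch; CUBA does not use this processor.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;

// Hypothetical marker annotation; not part of CUBA or the JDK.
@interface Described { }

// Generates a new class "<Name>Description" for every element annotated with @Described.
@SupportedAnnotationTypes("Described")
class DescribedProcessor extends AbstractProcessor {

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latestSupported();
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        for (TypeElement annotation : annotations) {
            for (Element element : roundEnv.getElementsAnnotatedWith(annotation)) {
                // For brevity, the generated class goes into the default package.
                String generatedName = element.getSimpleName() + "Description";
                try (Writer writer = processingEnv.getFiler()
                        .createSourceFile(generatedName, element).openWriter()) {
                    // A processor can only write new files; it cannot modify 'element' itself.
                    writer.write("public final class " + generatedName + " {\n"
                            + "    public static final String NAME = \""
                            + element.getSimpleName() + "\";\n"
                            + "}\n");
                } catch (IOException e) {
                    processingEnv.getMessager().printMessage(
                            Diagnostic.Kind.ERROR, "Cannot write " + generatedName + ": " + e, element);
                }
            }
        }
        return true; // the annotation is claimed by this processor
    }
}
```

Such a processor is typically passed to javac with the -processor option or registered through a META-INF/services/javax.annotation.processing.Processor file.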

A standard annotation processor has a limitation - it cannot update existing code, it can only generate new code. So, for CUBA, the Javassist library was used.

This library can update existing code, and the code to insert can be passed as plain strings. For example, this code stores the previous value of the property before a setter invocation:

ctMethod.insertBefore(
       "__prev = this." + getterName + "();"
);

Javassist contains its own limited Java compiler to verify code correctness. Using strings for code generation doesn’t provide type safety, so a typo can introduce a bug. But it is much simpler than using a library with a typed model for code generation, like Byte Buddy. You can literally see the code that will be added to your classes.

Conclusion

Code generation is a very powerful tool that helps developers to:

  1. Avoid doing boring jobs like writing simple repetitive code
  2. Automate updating methods when the code changes

On the other hand, the program that runs is not exactly the program you wrote. Extensive code generation can change your sources completely, so you’ll have to debug not your code, but someone else’s.

In addition to that, you become dependent on the framework’s code generators, and in case of bugs, you have to wait for the plugin update.

In CUBA, code generation areas are limited to entities, and we plan to extend this area slowly to simplify developers' work and add more flexibility to the framework.

So, if you’re planning to create your own framework or introduce a code generator for an existing one, consider this technique as very powerful, but fragile. Try to generate simple code and document all the generation steps and conditions, because any change in any API may easily break the generation.