Rapid Development with Hibernate in CQRS Read Models

In this post I’m going to share a few tricks for using Hibernate tooling in CQRS read models for rapid development.

Why Hibernate?

Hibernate is extremely popular. It’s also deceptively easy on the outside and fairly complex on the inside. It makes it very easy to get started without in-depth understanding, to misuse it, and to discover problems only when it’s already too late. For all these reasons it’s rather infamous these days.

However, it is still a solid and mature piece of technology. Battle-tested, robust, well-documented and with solutions to many common problems in the box. It can make you *very* productive, even more so if you include the tooling and libraries around it. Finally, it is safe as long as you know what you’re doing.

Automatic Schema Generation

Keeping the SQL schema in sync with Java class definitions is a bit of a struggle. In the best case it’s a tedious and time-consuming activity, with numerous opportunities for mistakes.

Hibernate comes with a schema generator (hbm2ddl), but in its “native” form it is of limited use in production: it can only validate the schema, attempt an update, or export it when the SessionFactory is created. Fortunately, the same utility is available for custom programmatic use.

We went one step further and integrated it with CQRS projections. Here’s how it works:

  • When the projection process thread starts, validate whether the DB schema matches the Java class definitions.
  • If it does not, drop the schema and re-export it (using hbm2ddl), then restart the projection so that it reprocesses the event store from the very beginning.
  • If it does match, just continue updating the model from the current state.

Thanks to this, we almost never have to type SQL table definitions by hand, which makes development a lot faster. It’s similar to working with hbm2ddl.auto = create-drop, except that in a view model it does not actually lose any data (the data is safe in the event store). It’s also smart enough to only recreate the schema when it has actually changed – unlike the create-drop strategy.

Preserving data and avoiding needless restarts does not only improve the development cycle. It may also make this approach usable in production, at least under certain conditions (see below).

There is one caveat: Not all changes in the schema make the Hibernate validation fail. One example is changing field length – as long as it’s varchar or text, validation passes regardless of limit. Another undetected change is nullability.

These issues can be solved by restarting the projection by hand (see below). Another possibility is having a dummy entity that doesn’t store data, but is modified to trigger the automatic restart. It could have a single field called schemaVersion, with the @Column(name = "v_4") annotation updated (by the developer) every time the schema changes.
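A minimal sketch of such a dummy entity could look like this (the class name and column are just an illustration, not code from the project):

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;

// Hypothetical marker entity: it stores no real data, it only exists so that
// renaming the column (v_4 -> v_5, ...) makes schema validation fail and
// triggers the automatic drop, re-export and projection restart.
@Entity
public class SchemaVersionMarker {

    @Id
    private long id;

    // Bump the column name whenever the read model changes in a way
    // the validator would not catch on its own (e.g. length or nullability).
    @Column(name = "v_4")
    private int schemaVersion;
}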

Implementation

Here’s how it can be implemented:

public class HibernateSchemaExporter {
    // Assumes an SLF4J logger; the original snippet uses `log` without declaring it
    private static final Logger log = LoggerFactory.getLogger(HibernateSchemaExporter.class);

    private final EntityManager entityManager;

    public HibernateSchemaExporter(EntityManager entityManager) {
        this.entityManager = entityManager;
    }

    public void validateAndExportIfNeeded(List<Class> entityClasses) {
        Configuration config = getConfiguration(entityClasses);
        if (!isSchemaValid(config)) {
            export(config);
        }
    }

    private Configuration getConfiguration(List<Class> entityClasses) {
        SessionFactoryImplementor sessionFactory = (SessionFactoryImplementor) getSessionFactory();
        Configuration cfg = new Configuration();
        cfg.setProperty("hibernate.dialect", sessionFactory.getDialect().toString());

        // Do this when using a custom naming strategy, e.g. with Spring Boot:
        
        Object namingStrategy = sessionFactory.getProperties().get("hibernate.ejb.naming_strategy");
        if (namingStrategy instanceof NamingStrategy) {
            cfg.setNamingStrategy((NamingStrategy) namingStrategy);
        } else if (namingStrategy instanceof String) {
            try {
                log.debug("Instantiating naming strategy: " + namingStrategy);
                cfg.setNamingStrategy((NamingStrategy) Class.forName((String) namingStrategy).newInstance());
            } catch (ReflectiveOperationException ex) {
                log.warn("Problem setting naming strategy", ex);
            }
        } else {
            log.warn("Using default naming strategy");
        }
        entityClasses.forEach(cfg::addAnnotatedClass);
        return cfg;
    }

    private boolean isSchemaValid(Configuration cfg) {
        try {
            new SchemaValidator(getServiceRegistry(), cfg).validate();
            return true;
        } catch (HibernateException e) {
            // Yay, exception-driven flow!
            return false;
        }
    }

    private void export(Configuration cfg) {
        new SchemaExport(getServiceRegistry(), cfg).create(false, true);
        clearCaches(cfg);
    }

    private ServiceRegistry getServiceRegistry() {
        return getSessionFactory().getSessionFactoryOptions().getServiceRegistry();
    }

    private void clearCaches(Configuration cfg) {
        SessionFactory sf = getSessionFactory();
        Cache cache = sf.getCache();
        // stream(...) is assumed to be a small helper turning the returned Iterator into a Stream
        stream(cfg.getClassMappings()).forEach(pc -> {
            if (pc instanceof RootClass) {
                cache.evictEntityRegion(((RootClass) pc).getCacheRegionName());
            }
        });
        stream(cfg.getCollectionMappings()).forEach(coll -> {
            cache.evictCollectionRegion(((Collection) coll).getCacheRegionName());
        });
    }

    private SessionFactory getSessionFactory() {
        return entityManager.unwrap(Session.class).getSessionFactory();
    }
}

The API looks pretty dated and cumbersome. There does not seem to be a way to extract the Configuration from an existing SessionFactory – it’s only used to create the factory and then thrown away, so we have to recreate it from scratch. The above is all we needed to make it work well with Spring Boot and the L2 cache.
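For completeness, here is roughly how the exporter fits into projection startup. The checkpoint store and event replay below are hypothetical names, and the sketch assumes validateAndExportIfNeeded is tweaked to return whether it actually re-exported the schema:

// Hypothetical projection startup hook (illustrative names, not from the class above)
public void startProjection(List<Class> readModelEntities) {
    boolean schemaRecreated = schemaExporter.validateAndExportIfNeeded(readModelEntities);
    if (schemaRecreated) {
        // The tables were dropped and re-created, so reprocess the event store from the start
        checkpointStore.reset(PROJECTION_ID);
    }
    eventStore.replayFrom(checkpointStore.lastProcessedEvent(PROJECTION_ID), this::apply);
}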

Restarting Projections

We’ve also implemented a way to perform such a reinitialization manually, exposed as a button in the admin console. It comes in handy when something about the projection changes but does not involve modifying the schema. For example, if a value is calculated/formatted differently, but it’s still a text field, this mechanism can be used to manually have the history reprocessed. Another use case is fixing a bug.

Production Use?

We’ve been using this mechanism with great success during development. It let us freely modify the schema by only changing the Java classes and never worrying about table definitions. Thanks to combination with CQRS, we could even maintain long-running demo or pilot customer instances. Data has always been safe in the event store. We could develop the read model schema incrementally and have the changes automatically deployed to a running instance, without data loss or manually writing SQL migration scripts.

Obviously this approach has its limits. Reprocessing the entire event store at a random point in time is only feasible on very small instances or if the events can be processed fast enough.

Otherwise the migration might be solved using an SQL migration script, but it has its limits. It’s often risky and difficult. It may be slow. Most importantly, if the changes are bigger and involve data that was not previously included in the read model (but is available in the events), using an SQL script simply is not an option.

A much better solution is to point the projection (with new code) to a new database. Let it reprocess the event log. When it catches up, test the view model, redirect traffic and discard the old instance. The presented solution works perfectly with this approach as well.

This post also appeared on the Oasis Digital blog.

Careful With Native SQL in Hibernate

I really like Hibernate, but I also don’t know a tool that would be nearly as powerful and deceptive at the same time. I could write a book on surprises in production and cargo cult programming related to Hibernate alone. It’s more of an issue with the users than with the tool, but let’s not get too ranty.

So, here’s a recent example.

Problem

We need a background job that lists all files in a directory and inserts an entry for each of them to a table.

Naive Solution

The job used to be written in Bash and there is some direct SQL reading from the table. So, blinders on and let’s write some direct SQL!

for (String fileName : folder.list()) {
    SQLQuery sql = session.getDelegate().createSQLQuery(
        "insert into dir_contents values (?)");
    sql.setString(0, fileName);
    sql.executeUpdate();
}

Does it work? Sure it does.

Now, what happens if there are 10,000 files in the folder? What if you also have a not so elegant domain model, with way too many entity classes, thousands of instances and two levels of cache all in one context?

All of a sudden this trivial job takes 10 minutes to execute, all that time keeping 2 or 3 CPUs busy at 100%.

What, for just a bunch of inserts?

Easy Fix

The problem is that it’s Hibernate. It’s not just a dumb JDBC wrapper; it has a lot more going on, trying to keep caches and session state up to date. If you run a bare SQL update, it has no idea what table(s) you are updating, what they depend on and what they affect, so just in case it pretty much flushes everything.

If you do this 10,000 times in such a crowded environment, it adds up.

Here’s one way to fix it – rather than running 10,000 updates with flushes, execute everything in one block and flush once.

session.doWork(new Work() {
    @Override
    public void execute(Connection connection) throws SQLException {
        // Plain JDBC through doWork() bypasses Hibernate's per-query session flushing,
        // so the 10,000 inserts run without flushing caches and dirty state every time.
        try (PreparedStatement ps = connection
                .prepareStatement("insert into dir_contents values (?)")) {
            for (String fileName : folder.list()) {
                ps.setString(1, fileName);
                ps.executeUpdate();
            }
        }
    }
});

Other Solutions

Surprise, surprise:

  • Do use Hibernate. Create a real entity to represent DirContents and just use it like everything else (see the sketch after this list). Then Hibernate knows what caches to flush and when, how to batch updates and so on.
  • Don’t use Hibernate. Use plain old JDBC, MyBatis, or whatever else suits your stack or is there already.
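For the first option, a minimal sketch; the class is made up for illustration and maps the same single-column table used above:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

// Made-up mapping for the dir_contents table from the example above
@Entity
@Table(name = "dir_contents")
public class DirContents {

    @Id
    private String fileName;

    protected DirContents() {
        // required by Hibernate
    }

    public DirContents(String fileName) {
        this.fileName = fileName;
    }
}

The job then becomes a plain loop over session.save(new DirContents(fileName)), and since Hibernate knows exactly which table and cache region the entity belongs to, it no longer has to flush everything defensively.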

Takeaway

Native SQL has its place, even if this example is not the best use case. Anyway, the point is: If you are using native SQL with Hibernate, mind the session state and caches.
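One more detail worth knowing if you do stick with native SQL: the legacy SQLQuery API lets you declare which tables a statement touches, so the auto-flush (and cache invalidation) can be limited to the relevant state instead of everything. A sketch, assuming a plain Hibernate Session like in the naive example:

// Same naive loop, but telling Hibernate which table the statement affects
SQLQuery sql = session.createSQLQuery("insert into dir_contents values (?)");
sql.addSynchronizedQuerySpace("dir_contents"); // only state related to this table needs flushing
sql.setString(0, fileName);
sql.executeUpdate();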

Version-Based Optimistic Concurrency Control in JPA/Hibernate

This article is an introduction to version-based optimistic concurrency control in Hibernate and JPA. The concept is fairly old and much has been written on it, but anyway I have seen it reinvented, misunderstood and misused. I’m writing it just to spread knowledge and hopefully spark interest in the subject of concurrency control and locking.

Use Cases

Let’s say we have a system used by multiple users, where each entity can be modified by more than one user. We want to prevent situations where two persons load some information, make some decision based on what they see, and update the state at the same time. We don’t want to lose changes made by the user who first clicked “save” by overwriting them in the following transaction.

It can also happen in a server environment – multiple transactions can modify a shared entity, and we want to prevent scenarios like this:

  1. Transaction 1 loads data
  2. Transaction 2 updates that data and commits
  3. Using state loaded in step 1 (which is no longer current), transaction 1 performs some calculations and updates the state

In some ways it’s comparable to non-repeatable reads.

Solution: Versioning

Hibernate and JPA implement the concept of version-based concurrency control for this reason. Here’s how it works.

You can mark a simple property with @Version or <version> (numeric or timestamp). It’s going to be a special column in the database. Our mapping can look like:

@Entity
@Table(name = "orders")
public class Order {
	@Id
	private long id;

	@Version
	private int version;

	private String description;

	private String status;

	// ... mutators
}

When such an entity is persisted, the version property is set to a starting value.

Whenever it’s updated, Hibernate executes a query like:

update orders
set description=?, status=?, version=? 
where id=? and version=?

Note that in the last line the WHERE clause now includes version. The version there is always the “old” value, so the statement will only update the row if it still has the expected version.

Let’s say two users load an order at version 1 and take a while looking at it in the GUI.

Anne decides to approve the order and executes that action. The status is updated in the database, and everything works as expected. The versions passed to the update statement look like this:

update orders
set description=?, status=?, version=2
where id=? and version=1

As you can see, while persisting that update the persistence layer increments the version counter to 2.

In her GUI, Betty still has the old version (number 1). When she decides to perform an update on the order, the statement looks like:

update orders
set description=?, status=?, version=2
where id=? and version=1

At this point, after Anne’s update, the row’s version in the database is 2. So this second update affects 0 rows (nothing matches the WHERE clause). Hibernate detects that and throws an org.hibernate.StaleObjectStateException (wrapped in a javax.persistence.OptimisticLockException when JPA is used).

As a result, the second user cannot perform any updates unless she refreshes the view. For a proper user experience we need some clean exception handling, but I’ll leave that out here.
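Just to show the shape of such handling, a rough sketch in the spirit of the Spring demo below (the service and view names are made up):

try {
    orderService.updateOrder(form);        // merges the detached, possibly stale entity
} catch (OptimisticLockException e) {
    // Someone else saved this order first: reload the fresh state and ask the user to retry
    model.addAttribute("order", orderService.findOrder(form.getId()));
    model.addAttribute("message", "This order was modified by another user. Please review and retry.");
}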

Configuration

There is little to customize here. The @Version property can be a number or a timestamp. A number is artificial, but typically occupies fewer bytes in memory and in the database. A timestamp is larger, but it is always updated to the current time, so you can actually use it to determine when the entity was last updated.
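For completeness, the timestamp flavor is just as simple, for example:

// Timestamp-based alternative to the int field in the mapping above
@Version
private java.sql.Timestamp lastUpdate;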

Why?

So why would we use it?

  • It provides a convenient and automated way to maintain consistency in scenarios like those described above. It means that each action can only be performed once, and it guarantees that the user or server process saw up-to-date state while making a business decision.
  • It takes very little work to set up.
  • Thanks to its optimistic nature, it’s fast. There is no locking anywhere, only one more field added to the same queries.
  • In a way it guarantees repeatable reads even with read committed transaction isolation level. It would end with an exception, but at least it’s not possible to create inconsistent state.
  • It works well with very long conversations, including those that span multiple transactions.
  • It’s perfectly consistent in all possible scenarios and race conditions on ACID databases: updates to the same row are sequential (each update takes a row lock), so the “second” one will always affect 0 rows and fail.

Demo

To demonstrate this, I created a very simple web application. It wires together Spring and Hibernate (behind JPA API), but it would work in other settings as well: Pure Hibernate (no JPA), JPA with different implementation, non-webapp, non-Spring etc.

The application keeps one Order with schema similar to above and shows it in a web form where you can update description and status. To experiment with concurrency control, open the page in two tabs, do different modifications and save. Try the same thing without @Version.

It uses an embedded database, so it needs minimal setup (only a web container) and only takes a restart to start with a fresh database.

It’s pretty simplistic – accesses EntityManager in a @Transactional @Controller and backs the form directly with JPA-mapped entity. May not be the best way to do things for less trivial projects, but at least it gathers all code in one place and is very easy to grasp.

Full source code as Eclipse project can be found at my GitHub repository.

Domain Modeling: Naive OO Hurts

I’ve read a post recently on two ways to model data of business domain. My memory is telling me it was Ayende Rahien, but I can’t find it on his blog.

One way is full-blown object-relational mapping. Entities reference each other directly, and the O/R mapper automatically loads data for you as you traverse the object graph. To obtain Product for an OrderLine, you just call line.getProduct() and are good to go. Convenient and deceptively transparent, but can easily hurt performance if you aren’t careful enough.

The other way is what that post may have called a document-oriented mapping. Each entity has its ID and its own data. It may have some nested entities if it’s an aggregate root (in domain-driven design terminology). In this case, OrderLine only has productId, and if you want to get the product you have to call ProductRepository.getProduct(line.getProductId()). It’s a bit less convenient and requires more ceremony, but thanks to its explicitness it also is much easier to optimize or avoid performance pitfalls.
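In code, the difference between the two styles boils down to something like this (class names invented for illustration, details omitted):

// Full-blown ORM style: a direct reference that Hibernate loads when you traverse it
@Entity
public class MappedOrderLine {
    @Id
    private long id;

    @ManyToOne
    private Product product;      // line.getProduct() just works (and may hit the database)
}

// Document/aggregate style: only the identifier, the lookup is explicit
@Entity
public class DocumentOrderLine {
    @Id
    private long id;

    private long productId;       // productRepository.getProduct(line.getProductId())
}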

So much for the aforementioned post. I recently had an opportunity to reflect more on this matter with a real-world example.

The Case

The light dawned when I set out to create a side project for a fairly large system that has some 200+ Hibernate mappings and about 300 tables. I knew I only needed some 5 core tables, but for the sake of consistency and avoiding duplication I wanted to reuse mappings from the big system.

I knew there could be more dependencies on things I don’t need, and I did not have a tool to generate a dependency graph. I just included the first mapping, watched Hibernate errors for unmapped entities, added mappings, checked error log again… And so on, until Hibernate was happy to know all the referenced classes.

When I finished, the absolutely minimal and necessary “core” in my side project had 110 mappings.

As I was adding them, I saw that most of them are pretty far from the core and from my needs. They corresponded to little subsystems somewhere on the rim.

It felt like running a strong magnet over a messy workplace full of all kinds of metal things when all I needed was two nails.

Pain Points

It turns out that such object orientation is more pain than good. Having unnecessary dependencies in a spin-off reusing the core is just one pain point, but there are more.

It also makes my side project slower and more resource-hungry – I have to map 100+ entities and support them in my 2nd level cache. When I’m loading some of the core entities, I also pull in many things I don’t need: numerous fields used only in narrow contexts, even entire eagerly-loaded entities. At all times I have too much data floating around.

Such a model also makes development much slower. Builds and tests take longer, because there are many more tables to generate, mappings to scan, etc.

It’s also slower for another reason: if a domain class references 20 other classes, how does a developer know which are important and which are not? It leads to very long and somewhat unpleasant classes. What should be the core becomes a gigantic black hole sucking in the entire universe. When an unaware newbie goes near, most of the time they will either sink trying to understand everything, or simply break something because they cannot see all the links reaching into their context. Actually, even seniors can be deceived into making such mistakes.

The list is probably much longer.

Solution?

There are two issues here.

How did that happen?

I’m writing a piece of code that’s pretty distant from the core, but could really use those two new attributes on this core entity. What is the fastest way? Obvious: Add two new fields to the entity. Done.

I need to add a bunch of new entities for a new use case that are strongly related to a core entity. The shortest path? Easy, just reference a few entities from the core. When I need those new objects and I already have the old core entity, Hibernate will do the job of loading the new entities for me as I call the getters. Done.

Sounds natural, and I can see how I could have made such mistakes a few years ago, but the trend could have been stopped or even reversed. With proper code reviews and retrospectives, the team might have found a better way earlier. Given some slack and good will, it might even have refactored the existing code.

Is there a better way to do it?

Let’s go back to the opening section on two ways to map domain classes: “Full-blown ORM” vs. document/aggregate style.

Today I believe full-blown ORM may be a good thing for a fairly small project with a few closely related use cases. As soon as we branch out new, bigger chunks of functionality and introduce more objects, they should become their own aggregates. They should never be referenced from the core, even though they themselves may orbit around and hold a direct link to the core. The same is true for the attributes of core entities: if something is needed only in a faraway use case, don’t spoil the core mapping with a new field. Introduce a new entity if necessary.
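As a rough illustration of that rule (all names invented here), the core aggregate stays small and the faraway use case holds only the core’s identifier:

// Core aggregate: small, stable, referenced by many use cases
@Entity
public class Customer {
    @Id
    private long id;

    private String name;
    // no references out to marketing, reporting or other rim subsystems
}

// New use case: its own aggregate, linked to the core only by ID
@Entity
public class MarketingProfile {
    @Id
    private long id;

    private long customerId;          // points to the core; the core never points back
    private String preferredChannel;  // attribute that would otherwise pollute Customer
}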

In other words, learn from domain-driven design. If you haven’t read the book by Eric Evans yet, go do it now. It’s likely the most worthwhile and influential software book I’ve read to date.

Testing Spring & Hibernate Without XML

I’m very keen on the improvements in Spring 3 that eventually let you move away from XML into plain Java configuration, with proper support from the IDE and compiler. It doesn’t change the fact that Spring is a huge suite, and sometimes finding the thing you need can take a while.

XML-free unit tests around Hibernate are one such thing. I knew it was possible, but it took me more than 5 minutes to find all the pieces, so here I am writing it down.

I am going to initialize all my beans in a @Configuration class like this:

@Configuration
@EnableTransactionManagement
public class TestRepositoryConfig {
	@Bean
	public DataSource dataSource() {
		return new EmbeddedDatabaseBuilder().setType(EmbeddedDatabaseType.H2)
				.setName("Nuts").build();
	}

	@Bean
	public LocalSessionFactoryBean sessionFactoryBean() {
		LocalSessionFactoryBean result = new LocalSessionFactoryBean();
		result.setDataSource(dataSource());
		result.setPackagesToScan(new String[] { "pl.squirrel.testnoxml.entity" });

		Properties properties = new Properties();
		properties.setProperty("hibernate.hbm2ddl.auto", "create-drop");
		result.setHibernateProperties(properties);
		return result;
	}

	@Bean
	public SessionFactory sessionFactory() {
		return sessionFactoryBean().getObject();
	}

	@Bean
	public HibernateTransactionManager transactionManager() {
		HibernateTransactionManager man = new HibernateTransactionManager();
		man.setSessionFactory(sessionFactory());
		return man;
	}

	@Bean
	public OrderRepository orderRepo() {
		return new OrderRepository();
	}
}

… and my test can look like this:

@RunWith(SpringJUnit4ClassRunner.class)
@TransactionConfiguration(defaultRollback = true)
@ContextConfiguration(classes = { TestRepositoryConfig.class })
@Transactional
public class OrderRepositoryTest {
	@Autowired
	private OrderRepository repo;

	@Autowired
	private SessionFactory sessionFactory;

	@Test
	public void testPersistOrderWithItems() {
		Session s = sessionFactory.getCurrentSession();

		Product chestnut = new Product("Chestnut", "2.50");
		s.save(chestnut);
		Product hazelnut = new Product("Hazelnut", "5.59");
		s.save(hazelnut);

		Order order = new Order();
		order.addLine(chestnut, 20);
		order.addLine(hazelnut, 150);

		repo.saveOrder(order);
		s.flush();

		Order persistent = (Order) s.createCriteria(Order.class).uniqueResult();
		Assert.assertTrue(persistent.getId() != 0); // assertNotSame would compare boxed references and always pass
		Assert.assertEquals(new OrderLine(chestnut, 20), persistent
				.getOrderLines().get(0));
		Assert.assertEquals(new OrderLine(hazelnut, 150), persistent
				.getOrderLines().get(1));
	}
}

There are a few details worth noting here, though:

  1. I marked the test @Transactional, so that I can access Session directly. In this scenario, @EnableTransactionManagement on @Configuration seems to have no effect as the test is wrapped in transaction anyway.
  2. If the test is not marked as @Transactional (sensible when it only uses @Transactional components), the transaction seems to always be committed regardless of @TransactionConfiguration settings.
  3. If the test is marked as @Transactional, @TransactionConfiguration seems to be applied by default. Even if it’s omitted the transaction will be rolled back at the end of the test, and if you want it committed you need @TransactionConfiguration(defaultRollback=false).
  4. This probably goes without saying, but the @Configuration for tests is probably different from production. Here it uses embedded H2 database, for real application I would use a test database on the same engine as production.

That’s it, just those two Java classes. No XML or twisted dependencies. Take a look at my GitHub repository for the complete code.

Hibernate Cache Is Fundamentally Broken

It’s as simple as the title says: I dare say the Hibernate second level (L2) cache is fundamentally broken. At least with a clustered cache, which the official docs appear to support.

There are two strong reasons for clustering, that is spreading your load over multiple servers: Scaling out and availability.

What does availability mean? Among other things, zero downtime on updates. You take one server down, update it, start it up and then continue with the next. With a reasonable load balancer, proxy, middleware or what have you, the clients won’t notice a thing and will be seamlessly redirected to whichever server is up at the moment.

Now, what if during an update you change the definition of an entity? Add or remove a field? You have old servers with the old definition and updated servers with the new definition, all sharing the same clustered cache. Plain Java serialization has serialVersionUID for this: if you serialize an object and then deserialize it on another node with a different class version, it fails fast with an exception.

Since Hibernate openly advertises clustered caching, one would expect it to work just fine with this case. Unfortunately, this is not so.

What Hibernate puts in the cache is a plain old array of the values of individual fields. That array is submitted to the clustered cache. Then a node with a different version loads it from the cache and tries to populate an entity with a different definition from it, simply copying the fields by their numeric index.

When that happens, you’re screwed.

Suppose you have this entity definition:

class User {
  int id;
  String password;
  String email;
  String login;
}

… and you’re updating to this:

class User {
  int id;
  String password;
  Timestamp passwordExpires;
  String email;
  String login;
}

The best thing that can happen is an outage. For instance, the 3rd field used to be a String. You added a Timestamp field before it, so in the new definition the 3rd field is a Timestamp. Hibernate on the new nodes fails on load() with a ClassCastException from String to Timestamp, because the cache still has the old layout.
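To make the index-shift concrete, here is a rough illustration; it is not Hibernate’s actual cache format, just the idea of state being copied by position:

// Old nodes cache the entity state roughly as an array indexed by field position:
// [ password, email, login ]
Object[] cachedState = { "s3cret", "anne@example.com", "anne" };

// New nodes expect [ password, passwordExpires, email, login ].
// Copying by index tries to put "anne@example.com" where a Timestamp is expected
// (the ClassCastException above), or, if email happens to be null, the values
// silently shift and login ends up null, which is the corruption case described next.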

The much worse case is data corruption. Suppose you need a User for the following transaction:

User user = session.load(User.class, 4);
user.setPasswordExpires(...);
session.update(user);

Let’s say that email was null. load() does not yield a ClassCastException, because null is a perfectly valid Timestamp. But when Hibernate loads such an entity from the cache, the cached entry only has 4 fields. Login is not restored and remains null. When you update(), you’re doomed. In this made-up example it would hopefully fail on a DB not-null constraint. In real life, though, it can silently save corrupted data to the database and guarantee hours of very interesting debugging and restoring from backups, if not physical damage caused by your application’s misbehavior.

There’s this old piece of music called “Careful with that Axe, Eugene”. If you don’t know it, don’t bother googling it, and don’t even get me started about YouTube; it needs a proper sound setup and dynamics. So, here’s how it goes. An apparently boring, monotonous bass softly plays “bing, bang, bing, bang” (or D, D, D, D as Wikipedia says). Nothing happens for a few minutes, except for equally soft, ambient-ish keyboard tones. And so on for one minute, another, then another. Then, out of the blue, an air-shattering scream. The first time I heard it live was at a concert by the Australians, and it literally made me jump with a shot of adrenaline and panic.

That’s the experience with clustered cache and Hibernate. It’s very robust and stable. Boring. Unnoticeable. Until one day it makes you scream hard and tear your hair out.

Handle with care. Beware. Be prepared.

A little disclaimer: I don’t know if this exact example reproduces the problem; it’s merely a made-up illustration. The fields might be ordered by name, and Hibernate may refuse to restore from an array that has fewer fields than the current definition. But I have witnessed both issues in real life, and they caused much pain and cost both time and money.