I’ve read a post recently on two ways to model data of business domain. My memory is telling me it was Ayende Rahien, but I can’t find it on his blog.
One way is full-blown object-relational mapping. Entities reference each other directly, and the O/R mapper automatically loads data for you as you traverse the object graph. To obtain Product
for an OrderLine
, you just call line.getProduct()
and are good to go. Convenient and deceptively transparent, but can easily hurt performance if you aren’t careful enough.
The other way is what that post may have called a document-oriented mapping. Each entity has its ID and its own data. It may have some nested entities if it’s an aggregate root (in domain-driven design terminology). In this case, OrderLine
only has productId
, and if you want to get the product you have to call ProductRepository.getProduct(line.getProductId())
. It’s a bit less convenient and requires more ceremony, but thanks to its explicitness it also is much easier to optimize or avoid performance pitfalls.
So much for the beforementioned post. I recently had an opportunity to reflect more on this matter on a real world example.
The Case
The light dawned when I set out to create a side project for a fairly large system that has some 200+ Hibernate mappings and about 300 tables. I knew I only needed some 5 core tables, but for the sake of consistency and avoiding duplication I wanted to reuse mappings from the big system.
I knew there could be more dependencies on things I don’t need, and I did not have a tool to generate a dependency graph. I just included the first mapping, watched Hibernate errors for unmapped entities, added mappings, checked error log again… And so on, until Hibernate was happy to know all the referenced classes.
When I finished, the absolutely minimal and necessary “core” in my side project had 110 mappings.
As I was adding them, I saw that most of them are pretty far from the core and from my needs. They corresponded to little subsystems somewhere on the rim.
It felt like running a strong magnet over a messy workplace full of all kinds of metal things when all I needed was two nails.
Pain Points
It turns out that such object orientation is more pain than good. Having unnecessary dependencies in a spin-off reusing the core is just one pain point, but there are more.
It also is making my side project slower and using too many resources – I have to map 100+ entities and have them supported in my 2nd level cache. When I’m loading some of the core entities, I also pull many things I don’t need: numerous fields used in narrow contexts, even entire eagerly-loaded entities. At all times I have too much data floating around.
Such a model also is making development much slower. Build and tests take longer, because there are many more tables to generate, mappings to scan etc.
It’s also slower for another reason: If a domain class references 20 other classes, how does a developer know which are important and which are not? In any case it may lead to very long and somewhat unpleasant classes. What should be core becomes a gigantic black hole sucking in the entire universe. When an unaware newbie goes near, most of the time he will either sink trying to understand everything, or simply break something – unaware of all the links in his context, unable to understand all links present in the class. Actually, even seniors can be deceived to make such mistakes.
The list is probably much longer.
Solution?
There are two issues here.
How did that happen?
I’m writing a piece of code that’s pretty distant from the core, but could really use those two new attributes on this core entity. What is the fastest way? Obvious: Add two new fields to the entity. Done.
I need to add a bunch of new entities for a new use case that are strongly related to a core entity. The shortest path? Easy, just reference a few entites from the core. When I need those new objects and I already have the old core entity, Hibernate will do the job of loading the new entities for me as I call the getters. Done.
Sounds natural and I can see how I could make such mistakes a few years ago, but the trend could have been stopped or even reversed. With proper code reviews and retrospectives, the team may have found a better way earlier. Having some slack and good will it may have even refactored the existing code.
Is there a better way to do it?
Let’s go back to the opening section on two ways to map domain classes: “Full-blown ORM” vs. document/aggregate style.
Today I believe full-blown ORM may be a good thing for a fairly small project with a few closely related use cases. As soon as we branch out new bigger chunks of functionality and introduce more objects, they should become their own aggregates. They should never be referenced from the core, even though they themselves may orbit around and have a direct link to the core. The same is true for the attributes of core entites: If something is needed in a faraway use case, don’t spoil the core mapping with a new field. Even introduce a new entity if necessary.
In other words, learn from domain-driven design. If you haven’t read the book by Eric Evans yet, go do it now. It’s likely the most worthwhile and influential software book I’ve read to date.