Monthly Archives: September 2012

Careful with def in Clojure

Let’s start with a puzzle. Let’s create a little Leiningen project called careful. Let’s set :main careful.core in project.clj and put this in careful/core.clj:

(ns careful.core
  (:gen-class))

(defn get-my-value []
  (println "Sleeping...")
  (Thread/sleep 5000)
  (println "Woke up")
  "Done")

(def my-def (get-my-value))

(defn -main [& args]
  (println "Hello, World!"))

Here’s the question: What happens when you compile this project?

And the answer is…

$ time lein2 compile
Compiling careful.core
Sleeping...
Woke up
Compilation succeeded.

real	0m7.098s
user	0m5.276s
sys	0m0.212s

Hey, my program actually printed something during compilation! And it took way too much time. I didn’t expect that.

Such an apparently innocuous def can get you in a lot of trouble. Sure, no-one puts a sleep like that, but what about:

  • Code that computes something, perhaps something taking time or space?
  • Code that loads something from the network?

I don’t yet understand why it is resolved at compile time.

But I can understand why using def in that way is not a good idea. It’s an old imperative habit. This may be a perfectly valid imperative program:

public static void main(String[] args) {
	Data data = loadDataFromInternet();
	ProcessedData proc = process(data);
	generateReport(proc);
}

You may be tempted to do it this way in Clojure:

(def data (load-data-from-internet))
(def proc (process data))
(defn -main [& args]
  (generate-report proc))

… but that’s still an imperative style and it feels wrong.

It also is real pain to test.

How about one of these equivalents?

(defn -main [& args]
  (let [data (load-data-from-internet)]
     (generate-report (process proc))))
(defn -main [& args]
   (generate-report (process (load-data-from-internet))))

In the end, I arrived at the following conclusion. You should only use def for constants, some global parameters, dynamic variables, definitions of higher order functions – that kind of static stuff. All logic and behavior belongs in functions.

Update

As djork pointed out at Reddit, it’s because def creates a var in the current namespace with specific value.

It makes some sense when you think of what defn looks like – it’s really a macro wrapping def (also pointed out by djork). And we do expect functions introduced by defn to be compiled, right. Even docs clearly state that defn is the “same as (def name (fn [params* ] exprs*))“.

I still find it very confusing, though. I wonder if I’m just abusing the language.

Second Update

I came back to it later, and I may have finally understood.

This is a perfectly valid statement:

(def my-def (get-my-value))

But what about these?

; Unexpected argument:
(def my-def (get-my-value))

; Type cast exception (vector to number)
(def my-def-2 (+ 2 []))

; Whatever invalid statement
(def my-def-2 (+ 2 +))

Should they throw a compile-time error? It makes sense, right?

Now, when you type this:

(def my-def (get-current-date))

At runtime, do you expect it to have state as of compile time, or as of run time? In other words, should it be date of compilation, or “now” at the time of execution? The latter, right?

I can see why both evaluations (at compile and run time) are needed. Depending on point of view, it’s either some sort of language fragility or developer abusing the language. Either way, the conclusion stays the same: Careful with that def, Eugene.

Discussion

Aside from this blog, there is an interesting discussion with more detail at Reddit. Thanks guys!

Domain Modeling: Naive OO Hurts

I’ve read a post recently on two ways to model data of business domain. My memory is telling me it was Ayende Rahien, but I can’t find it on his blog.

One way is full-blown object-relational mapping. Entities reference each other directly, and the O/R mapper automatically loads data for you as you traverse the object graph. To obtain Product for an OrderLine, you just call line.getProduct() and are good to go. Convenient and deceptively transparent, but can easily hurt performance if you aren’t careful enough.

The other way is what that post may have called a document-oriented mapping. Each entity has its ID and its own data. It may have some nested entities if it’s an aggregate root (in domain-driven design terminology). In this case, OrderLine only has productId, and if you want to get the product you have to call ProductRepository.getProduct(line.getProductId()). It’s a bit less convenient and requires more ceremony, but thanks to its explicitness it also is much easier to optimize or avoid performance pitfalls.

So much for the beforementioned post. I recently had an opportunity to reflect more on this matter on a real world example.

The Case

The light dawned when I set out to create a side project for a fairly large system that has some 200+ Hibernate mappings and about 300 tables. I knew I only needed some 5 core tables, but for the sake of consistency and avoiding duplication I wanted to reuse mappings from the big system.

I knew there could be more dependencies on things I don’t need, and I did not have a tool to generate a dependency graph. I just included the first mapping, watched Hibernate errors for unmapped entities, added mappings, checked error log again… And so on, until Hibernate was happy to know all the referenced classes.

When I finished, the absolutely minimal and necessary “core” in my side project had 110 mappings.

As I was adding them, I saw that most of them are pretty far from the core and from my needs. They corresponded to little subsystems somewhere on the rim.

It felt like running a strong magnet over a messy workplace full of all kinds of metal things when all I needed was two nails.

Pain Points

It turns out that such object orientation is more pain than good. Having unnecessary dependencies in a spin-off reusing the core is just one pain point, but there are more.

It also is making my side project slower and using too many resources – I have to map 100+ entities and have them supported in my 2nd level cache. When I’m loading some of the core entities, I also pull many things I don’t need: numerous fields used in narrow contexts, even entire eagerly-loaded entities. At all times I have too much data floating around.

Such a model also is making development much slower. Build and tests take longer, because there are many more tables to generate, mappings to scan etc.

It’s also slower for another reason: If a domain class references 20 other classes, how does a developer know which are important and which are not? In any case it may lead to very long and somewhat unpleasant classes. What should be core becomes a gigantic black hole sucking in the entire universe. When an unaware newbie goes near, most of the time he will either sink trying to understand everything, or simply break something – unaware of all the links in his context, unable to understand all links present in the class. Actually, even seniors can be deceived to make such mistakes.

The list is probably much longer.

Solution?

There are two issues here.

How did that happen?

I’m writing a piece of code that’s pretty distant from the core, but could really use those two new attributes on this core entity. What is the fastest way? Obvious: Add two new fields to the entity. Done.

I need to add a bunch of new entities for a new use case that are strongly related to a core entity. The shortest path? Easy, just reference a few entites from the core. When I need those new objects and I already have the old core entity, Hibernate will do the job of loading the new entities for me as I call the getters. Done.

Sounds natural and I can see how I could make such mistakes a few years ago, but the trend could have been stopped or even reversed. With proper code reviews and retrospectives, the team may have found a better way earlier. Having some slack and good will it may have even refactored the existing code.

Is there a better way to do it?

Let’s go back to the opening section on two ways to map domain classes: “Full-blown ORM” vs. document/aggregate style.

Today I believe full-blown ORM may be a good thing for a fairly small project with a few closely related use cases. As soon as we branch out new bigger chunks of functionality and introduce more objects, they should become their own aggregates. They should never be referenced from the core, even though they themselves may orbit around and have a direct link to the core. The same is true for the attributes of core entites: If something is needed in a faraway use case, don’t spoil the core mapping with a new field. Even introduce a new entity if necessary.

In other words, learn from domain-driven design. If you haven’t read the book by Eric Evans yet, go do it now. It’s likely the most worthwhile and influential software book I’ve read to date.