“Programming Concurrency on the JVM”

A few years ago, when I took concurrency classes, pretty much all I was told was that in Java synchronized is key. That’s the way to go: whenever you have multithreading, you have to do it this way, period. I also spent quite a while solving many classic and less classic concurrency problems using only this construct, reimplementing fancier locks with it, preventing deadlocks, starvation and everything.

Later in my career I learned that it is not the only way to go, or at least that there are those fancy java.util.concurrent classes that take care of some stuff for you. That was nice, but apparently I never took enough time to actually stop and think about how those things work, what they solve and why.

The light dawned when I started reading Programming Concurrency on the JVM: Mastering Synchronization, STM, and Actors by Venkat Subramaniam.

The book starts with a brief introduction on why concurrency is important today, with its powers and perils. It quickly moves on to a few examples of different problems: an IO-intensive task like calculating the size of a large directory, and a computationally intensive task of calculating prime numbers. Once the ground is set, it introduces three approaches to concurrent programming.

The first way to do it is what I summed up in the first paragraph, and what Venkat calls the “synchronize and suffer” model. Been there, done that, we know how bad it can get. This approach is called shared mutability, where different threads mutate shared state concurrently. It may be tamed (and a few ways to do it are shown in the book), but is a lot harder than it seems.

Another approach is isolated mutability, where each mutable part of state is only accessed by one thread. Usually this is the actor based concurrency model. The third way is pure immutability where there simply is no mutable state. That is typical for functional programming.

In the following chapters the book explores each of those areas in depth. It briefly explains the Java memory model and shows what options for dealing with shared mutability and coordinating threads exist in core Java. It clearly states why the features from Java 5 are superior to the classic “synchronize and suffer” and describes locks, concurrent collections, executors, atomic references etc. in more detail. This is what most of us typically deal with in our daily Java programming, and the book is a great introduction to those modern (if old, in a way) APIs.

That’s about one third of the book. The rest is devoted to much more interesting, intriguing and powerful tools: software transactional memory and actors.

Sometimes we have to deal with shared mutability, and very often we need to coordinate many threads accessing many variables. The classic synchronization tools don’t have proper support for it: rolling back changes and preventing one thread from seeing uncommitted changes of another is difficult, and most likely they lead to coarse-grained locks which basically lock everything while a thread is mutating something.

We know how relational databases deal with it with their ACID transactional model. Software transactional memory is just that but applied to memory, with proper atomicity, consistency and isolation of transactions. If one thread mutates a transactional reference in a transaction, another will not see it until that transaction is committed. There is no need for any explicit locks, as the libraries (like Akka or Clojure) monitor what variables you access and mutate in a transaction and apply locking automatically. They can even roll back and retry the transaction for you.
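To make the idea concrete, here is a minimal sketch using Clojure’s built-in STM (refs and dosync). The account names and the transfer! function are made up for illustration; they are not taken from the book.

```clojure
;; Two pieces of shared state held in transactional references (refs).
(def checking (ref 100))
(def savings  (ref 0))

(defn transfer!
  "Moves amount from one ref to another inside a single transaction."
  [from to amount]
  (dosync
    (when (< @from amount)
      ;; Throwing inside dosync aborts the transaction: nothing is committed.
      (throw (IllegalStateException. "Insufficient funds")))
    (alter from - amount)
    (alter to + amount)))

(transfer! checking savings 30)
[@checking @savings] ; => [70 30]
```

Other threads see either both changes or neither of them, and there is not a single explicit lock in sight.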

Another approach is isolated mutability, a.k.a. actors, best demonstrated on Akka. Each actor runs in a single thread and all it can do is receive or pass messages. This is probably closest to the original concept of object-oriented programming (recommended reading by Michael Feathers). You have isolated cells that pass messages to each other, and that’s it. When you have a task to execute, you spawn actors and dispatch it to them as immutable messages. When they’re done, they can call you back by passing another message (if the coordinator is also an actor), or if you’re not that pure you can wait for the result. Either way, everything is neatly isolated in the scope of a single thread.
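Clojure’s agents are not actors in the Akka sense, but they give a quick feel for isolated mutability: the state is owned by the agent and changed only by functions sent to it, which run one at a time. A small sketch, with hypothetical names:

```clojure
;; The agent owns its state; nobody else mutates it directly.
(def word-count (agent 0))

(defn count-words
  "Action run by the agent: takes the current state and a message,
  returns the new state."
  [total line]
  (+ total (count (re-seq #"\S+" line))))

;; Messages are immutable values dispatched asynchronously.
(send word-count count-words "isolated mutability")
(send word-count count-words "one message at a time")

;; Not being that pure, we block until the agent has processed everything.
(await word-count)
@word-count ; => 7
```

In a standalone script, call (shutdown-agents) at the end so the JVM can exit.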

Lengthy as this summary/review is, it really does not do justice to the book. The book itself is dense with valuable information and practical examples, which are as close to perfection as possible: There are a few recurring problems which are fairly simple and easy to grasp, solved over and over again with different techniques and different languages. There are many examples in Java, Scala, Groovy, Clojure and JRuby, dealing with libraries such as the core Java API, Clojure, Akka, GPars… In a few words, a ton of useful stuff.

Last but not least, it’s excellently written. If anyone has seen Venkat in real life, this book is all like him – entertaining, but also thought-provoking, challenging and inspiring. It reads like a novel (if not better than some of them) and is very hard to put down until you’re done.

Highly recommended.

Ring Handlers – Functional Decorator Pattern

During our last pairing session with Jacek Laskowski on Librarian there was a brief moment of confusion over Ring handlers. We struggled for a short while trying to figure out what order to put them in and what it really means to have code like:

(def app
  (-> routes
    ; sandbar
    (auth/with-security security-policy log-in)
    ; compojure helper that includes a few Ring handlers
    site
    ; sandbar again
    session/wrap-stateful-session))

It didn’t take us long to figure it out and the solution turns out to be a very elegant functional flavor of decorator.

It’s easy to dive too deep without proper understanding (and that’s what I admittedly did). Let’s start from the very beginning and see what these bits really mean. For starters, here’s a very basic app in plain Ring that simply returns the entire request:

(defn my-handler [request]
    {:body (str request)})

(def app my-handler)

When I hit http://localhost:3000/?my_param=54 in my browser, in return I get:

{:remote-addr "0:0:0:0:0:0:0:1",
 :scheme :http,
 :request-method :get,
 :query-string "my_param=54",
 :content-type nil,
 :uri "/",
 :server-name "localhost",
 :headers {"cookie" "__utma=111872281.60059650.1328613066.1328726201.1328785442.5; __utmz=111872281.1328613066.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)", "connection" "keep-alive", "accept-encoding" "gzip, deflate", "accept-language" "pl,en-us;q=0.7,en;q=0.3", "accept" "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "user-agent" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20100101 Firefox/11.0", "host" "localhost:3000"},
 :content-length nil,
 :server-port 3000,
 :character-encoding nil,
 :body #<Input org.mortbay.jetty.HttpParser$Input@7c04703c>}

Note that my_param made it to :query-string, but obviously it’s quite inconvenient at this point and that’s not what we really want to deal with.

What is app at this point? No magic here, it’s just a very simple and boring function.

Let’s move on and add one of the seemingly magic Ring handlers – ring.middleware.params/wrap-params:

(def app
  (wrap-params my-handler))

This time for the same URL I get the same map, with a few new entries:

{:remote-addr "0:0:0:0:0:0:0:1",
 ; Trimmed for brevity
 :query-params {"my_param" "54"},
 :form-params {},
 :query-string "my_param=54",
 :params {"my_param" "54"}}

I can see that the wrapper added a few new entries: :query-params, :form-params and :params. Great, just like it was supposed to.

Now, what is app at this point? Just like before, it’s a regular function of request. So what does wrap-params really do? Let’s take a peek at (parts of) its source:

(defn wrap-params
  [handler & [opts]]
  (fn [request]
    (let [request  (if (:query-params request)
                     request
                     (assoc-query-params request))]
      (handler request))))

assoc-query-params is no magic, it simply parses the query params and merges them into the request map.

Now let’s take a close look at the last line and back at wrap-params signature. Here’s what’s really going on:

  1. wrap-params takes handler (which is a function of request) as argument. In our case, it’s the trivial function that returns request in body.
  2. It then performs some work, in this case rebinding request to a map with a few more entries.
  3. Eventually it calls the handler that it got as parameter with the augmented request map.

In other words, wrap-params takes a handler function, and returns a function that performs some extra work and invokes the original handler.

Does it look familiar? Yup, it’s the good old decorator pattern. Do some work, then pass control on to the next handler (which can also be a decorator and delegate it further). In this case, though, it’s astonishingly simple (compared to what it takes in Java).
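The same shape can be reproduced without Ring at all. Here is a minimal sketch; wrap-request-tag and echo-handler are hypothetical names of mine, not part of Ring:

```clojure
;; A made-up "before" decorator: it augments the request, then delegates.
(defn wrap-request-tag [handler tag]
  (fn [request]
    (handler (assoc request :tag tag))))

;; A plain handler: a function of the request.
(defn echo-handler [request]
  {:body (str (:tag request))})

;; Decorating is just a function call that returns another handler.
(def tagged-app (wrap-request-tag echo-handler "hello"))

(tagged-app {:uri "/"}) ; => {:body "hello"}
```

The decorated handler is still an ordinary function of the request, so it can be wrapped again by any other decorator.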

Now let’s say I want to chain one more handler that relies on the previous one. Let’s say I dislike strings and want to map params by Clojure keywords. There’s a handler for it: ring.middleware.keyword-params/wrap-keyword-params.

No need to think too long, let’s jump in and use it:

(def app
  (wrap-keyword-params (wrap-params my-handler)))

… and I get:

{; Trimmed for brevity
 :params {"my_param" "54"}}

Whoops, that’s not what I expected. wrap-keyword-params was supposed to create a map with keys as keywords, not strings. Why didn’t it work?

Naive intuition tells me to treat wrappers as function calls. I wrap my-handler in wrap-params and pass the result of this invocation to wrap-keyword-params, right? Wrong!

Take a look at a sample wrapper above (wrap-params) and think what we were trying to do. What I really created here is a reversed chain like:

  1. Given a request, map its :params into keywords (wrap-keyword-params).
  2. Then pass control to the next function in chain, wrap-params. It parses query string and adds :params map to request.
  3. Then pass control to my-handler, which prints the whole thing to the browser.

Nothing happens in the first step, because :params does not exist at this point – it’s only created by wrap-params in the second step.

If we reverse it, it works as expected:

(def app
  (wrap-params (wrap-keyword-params my-handler)))

{; Trimmed for brevity
 :params {:my_param "54"}}
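The ordering effect is easy to reproduce in isolation. The two wrappers below are simplified stand-ins that I made up for this sketch (they are not the real Ring middleware): one adds :params with string keys, the other turns the keys of :params into keywords if :params is present.

```clojure
;; Stand-in for wrap-params: adds a hard-coded :params map with string keys.
(defn wrap-fake-params [handler]
  (fn [request]
    (handler (assoc request :params {"my_param" "54"}))))

;; Stand-in for wrap-keyword-params: keywordizes keys of :params, if any.
(defn wrap-fake-keywords [handler]
  (fn [request]
    (handler
      (if-let [params (:params request)]
        (assoc request :params
               (into {} (map (fn [[k v]] [(keyword k) v]) params)))
        request))))

;; identity plays the role of a handler that returns the request it got.
(def wrong-order (wrap-fake-keywords (wrap-fake-params identity)))
(def right-order (wrap-fake-params (wrap-fake-keywords identity)))

(:params (wrong-order {})) ; => {"my_param" "54"}
(:params (right-order {})) ; => {:my_param "54"}
```

In wrong-order the keyword step runs first and finds nothing to convert; in right-order it runs after the params have been added.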

To recap, a few things to remember from this lesson:

  • In functional programming, the decorator pattern is elegantly represented as a higher order function. I find it much easier to grasp than the OO flavor – in Java I would need an interface and 3 implementing classes for the job, greatly limiting (re)usability and readability.
  • In case of Ring wrappers, typically we have “before” decorators that perform some preparations before calling the “real” business function. Since they are higher order functions and not direct function calls, they are applied in reversed order. If one depends on the other, the dependent one needs to be on the “inside”.
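This also explains the threaded form from the beginning of this post: (-> h wrap-a wrap-b) expands to (wrap-b (wrap-a h)), so the wrapper listed last ends up on the outside and sees the request first. A small sketch with a made-up logging wrapper shows the order of execution:

```clojure
;; Hypothetical wrapper: records its label when a request passes through.
(defn wrap-log [handler label call-log]
  (fn [request]
    (swap! call-log conj label)
    (handler request)))

(def call-log (atom []))

(def traced-app
  (-> identity
      (wrap-log :first-listed call-log)
      (wrap-log :last-listed call-log)))

(traced-app {})
@call-log ; => [:last-listed :first-listed]
```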

Sequential ID Generation in Congomongo

By default MongoDB uses a 12-byte BSON id for objects. For some reason I wanted to use an increasing sequence of integers.

The samples in the documentation are in JavaScript and I was not sure how they translate to the Java driver. The JavaScript sample looks like:

function counter(name) {
    var ret = db.counters.findAndModify({query:{_id:name}, update:{$inc : {next:1}}, "new":true, upsert:true});
    // ret == { "_id" : "users", "next" : 1 }
    return ret.next;
}

db.users.insert({_id:counter("users"), name:"Sarah C."}) // _id : 1
db.users.insert({_id:counter("users"), name:"Bob D."}) // _id : 2

After some googling I found an implementation in Java. Just as I expected, it’s much longer and completely different.

public static String getNextId(DB db, String seq_name) {
    String sequence_collection = "seq"; // the name of the sequence collection
    String sequence_field = "seq"; // the name of the field which holds the sequence

    DBCollection seq = db.getCollection(sequence_collection); // get the collection (this will create it if needed)

    // this object represents your "query", its analogous to a WHERE clause in SQL
    DBObject query = new BasicDBObject();
    query.put("_id", seq_name); // where _id = the input sequence name

    // this object represents the "update" or the SET blah=blah in SQL
    DBObject change = new BasicDBObject(sequence_field, 1);
    DBObject update = new BasicDBObject("$inc", change); // the $inc here is a mongodb command for increment

    // Atomically updates the sequence field and returns the value for you
    DBObject res = seq.findAndModify(query, new BasicDBObject(), new BasicDBObject(), false, update, true, true);
    return res.get(sequence_field).toString();
}

Not much later I checked docs and source code for Congomongo and discovered fetch-and-modify. I rewrote the Java sample above to Clojure and later polished it using code from this commit by Krzysztof Magiera. In the end my sequence generator looks like this:

(defn next-seq [coll]
  (with-mongo db
    (:seq
      (fetch-and-modify :sequences {:_id coll} {:$inc {:seq 1}} :return-new? true :upsert? true))))

(with-mongo db
  (insert! :books {:author "Adam Mickiewicz" :title "Dziady" :_id (next-seq :books)}))
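If the combination of $inc, upsert and “return new” looks cryptic, the semantics are roughly those of the in-memory analogy below. This is only a sketch with an atom standing in for the sequences collection; it does not talk to MongoDB and the names are mine.

```clojure
;; Plays the role of the :sequences collection, e.g. {:books 2, :users 1}.
(def counters (atom {}))

(defn next-seq-in-memory
  "Atomically increments the counter for coll, creating it on first use
  (the 'upsert'), and returns the new value (the 'return new')."
  [coll]
  (get (swap! counters update-in [coll] (fnil inc 0)) coll))

(next-seq-in-memory :books) ; => 1
(next-seq-in-memory :books) ; => 2
(next-seq-in-memory :users) ; => 1
```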

The raw call to insert! could be wrapped in a function or macro to save some boilerplate if there are more collections. For instance:

(defn insert-with-id [coll el]
  (insert! coll (assoc el :_id (next-seq coll))))

(with-mongo db
  (insert-with-id :books {:author "Adam Mickiewicz" :title "Dziady"}))

In some circles this probably is common knowledge, but it took me a while to figure it all out.

Connection Management in MongoDB and CongoMongo

I decided to take the opportunity offered by Jacek Laskowski (in Polish) and take a closer look at interaction with MongoDB in Clojure. It has a nice, challenging learning curve as I haven’t done much practical work in Clojure and I’ve never actually dealt with Mongo before. Double win – learning two interesting things at a time.

The obvious choice for the integration is CongoMongo. It’s really easy to get it all set up and working. The official docs encourage you to simply do this:

(def conn (make-connection "mydb"))

(set-connection! conn)

(insert! :robots {:name "robby"})

(fetch-one :robots)

; ... and so on

Easy. Too easy and comfortable. Coming from the good old and heavy JDBC/SQL I felt uneasy with the connection management. How does it work? Does it just open a connection and leave it dangling in the air the whole time? Might be good for a quick spike in REPL, but not for a real application which needs concurrency, is supposed to be running for days and weeks, and so on. How do you maintain it properly?

clojure.contrib.sql has with-connection. That opens the connection, runs something with it and then eventually closes it. CongoMongo has with-mongo, but all it does is bind the argument to *mongo-config* and execute body. Nothing is ever opened or closed.

That seemed insane and broken, until I took a step back and compared the source of CongoMongo to the documentation of the underlying Java driver for MongoDB. The light dawned.

What make-connection really does is create an instance of Mongo and DB (if a database name was provided). The result of this function is a plain map: {:mongo #<Mongo>, :db #<DB>}.

The Javadoc for Mongo says it’s “a database connection with internal pooling. For most applications, you should have 1 Mongo instance for the entire JVM.” A page dedicated to Java Driver Concurrency explains it in more detail: “The Mongo object maintains an internal pool of connections to the database. For every request to the DB (find, insert, etc) the java thread will obtain a connection from the pool, execute the operation, and release the connection.”

At first I thought CongoMongo docs were misleading. The truth is, it’s just a wrapper for the Java driver. It’s fair for it to assume you know the basic principles of the underlying driver.

So what is called a “connection” here (the Mongo class) is in fact a much more sophisticated object. It maintains a connection pool and creates nice little DB objects for all the data handling, which in turn are smart enough to maintain the actual low-level connections for you. No ceremony, just gets out of the way as quickly as possible and lets you get the job done.

This is amazingly simple and elegant compared to JDBC / JEE / SQL. I guess I soon will be scratching my head over ACID, but at the moment I’m pleasantly surprised with the look of things.

Fill and Print an Array in Clojure

In a post in the “Extreme OO in Practice” series (in Polish), Koziołek used the following example: Fill a 3x3x3 matrix with subsequent natural numbers and print it to stdout.

The example in Java is 27 lines of code (if you remove comments). It is later refactored into 55 lines that are supposed to be more readable, but personally I can’t help but gnash my teeth over it.

Anyway, shortly after reading that example, I thought of what it would look like in Clojure. Here it is:

(def array (take 3 (partition 3 (partition 3 (iterate inc 1)))))
(doall (map #(doall (map println %)) array))

That’s it. I know it’s unfair to compare this to Java and say that it’s shorter than your class and main() declaration. But it is, and it makes the full-time Java programmer in me weep. Anyway, I have two observations.

First, as I like to point out, Java is awfully verbose and unexpressive. Part of the story is strong typing and the nature of imperative programming. But then again, sometimes imperative programming is very clumsy.

Secondly, this solution in Clojure presents a completely different approach to solving the problem. In Java you would declare an array and then iterate over it in nested loops, incrementing the “current value” in every innermost iteration. In Clojure, you start with a lazy, infinite sequence of numbers, partition (group) it in blocks (that’s one row), then group those blocks into a 2D array, and take 3 of those 2D blocks as the final 3D matrix. Same thing with printing the array. You don’t iterate over individual cells, but naturally invoke a function on a sequence.
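Broken down step by step, with intermediate names that I added only for this illustration, the same pipeline looks like this:

```clojure
(def rows   (partition 3 (iterate inc 1))) ; infinite: (1 2 3) (4 5 6) ...
(def layers (partition 3 rows))            ; infinite: 3 rows per 2D block
(def matrix (take 3 layers))               ; finite: the 3x3x3 result

(first matrix)       ; => ((1 2 3) (4 5 6) (7 8 9))
(last (last matrix)) ; => (25 26 27)

;; doseq exists for side effects, so printing needs no doall around map.
(doseq [layer matrix
        row   layer]
  (println row))
```

Nothing is computed until take and the printing actually ask for values, which is why starting from an infinite sequence is safe.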

The difference is subtle, but very important. Note how the Java code needed 4 variables for the filling (“current value” and 3 indexes) and 3 indexes for printing. There was a very clear distinction between data and what was happening to it. In Clojure you don’t need to bother with such low-level details. Code is data. Higher-order functions naturally “serve as” data in the first line, and are used to iterate over this data in the second.

Java Does Not Need Closures. Not at All.

I hope that most Java developers are aware of the Swing threading policy. It makes sense, except that it’s so insanely hard to follow. And if you do follow it, I dare say there is no place with more ridiculous boilerplate.

Here are two examples that recently got me scratching my head, gnashing my teeth, and eventually weeping in disdain.

Case One: Display Download Progress

class Downloader {
    ProgressDisplay progress;

    void download() {
        progress.startDownload();
        while (hasMoreChunks()) {
            int n = downloadChunk(); // number of bytes in this chunk
            progress.downloadedBytes(n);
        }
        progress.finishDownload();
    }
}

class ProgressDisplay extends JPanel {
    JLabel label;
    void startDownload() {
        SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                label.setText("Download started");
            }
        });
    }
    void downloadedBytes(final int n) {
        SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                label.setText("Downloaded bytes: " + n);
            }
        });
    }
    void finishDownload() {
        SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                label.setText("Download finished");
            }
        });
    }
}

Solution? Easy! Just write your own JLabel wrapper to hide this mess. Then a lengthy JTable utility. Then a JProgressBar wrapper. Then…

Case Two: Persist Component State

Task: Save JTable state to disk. You’ve got to read the table state on the EDT, but writing to disk from the EDT is not the best idea.

Solution:

ExecutorService executor = Executors.newSingleThreadExecutor();

void saveState(final Table table) {
    SwingUtilities.invokeLater(new Runnable() {
        public void run() {
            final TableColumns state = dumpState(table);
            executor.execute(new Runnable() {
                public void run() {
                    saveToDisk(table.getTableKey(), state);
                }
            });
        }
    });
}

Two inner classes, inheritance, all those publics and voids and parentheses. And there is no way to avoid that!

In a sane language, it could look like:

(defn save-state [table]
  (do-swing
    (let [state (dump-state table)]
      (future (save-to-disk table state)))))

All semantics. Little ceremony, almost no syntax. If you counted characters in both solutions, I guess the entire second snippet would be about as long as the first one up to the invocation of invokeLater, that is, before you even get to the actual logic. No strings attached: do-swing and future are provided by the language or “official” utils.

Bottom Line

I recently saw two presentations by Venkat Subramaniam on polyglot programming where he mercilessly made fun of Java. In his (vaguely remembered) words, when you write such code for the whole day, you come back home and weep. You can’t sleep because you’re tormented by the thought of the abomination you created, and that you will have to deal with tomorrow. If your children asked you what you do for a living, would you like them to see this?

Yaclot 0.1.2: Extended Date Conversions

I released a new version of my Clojure conversion and map transformation library to Clojars. It includes two new features:

1. Added natural conversion between Long and Date.

2. You can pass a collection of formats for converting String to Date. It attempts parsing with each of the formats in turn and returns the first result that didn’t throw a ParseException.

(convert
  "2/12/11"
  (using-format ["yyyy-MM-dd" "M/dd/yy"]
    (to-type java.util.Date)))
; => #<Date Sat Feb 12 00:00:00 CET 2011>
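The idea behind the second feature can be sketched in a few lines of plain Clojure on top of java.text.SimpleDateFormat. This is an illustration of the approach only, not Yaclot’s actual source; parse-with-formats is a made-up name.

```clojure
(import '(java.text SimpleDateFormat ParseException))

(defn parse-with-formats
  "Tries each format in order and returns the first successfully parsed
  Date, or nil if none of the formats matches."
  [^String s formats]
  (some (fn [fmt]
          (try
            (.parse (SimpleDateFormat. fmt) s)
            (catch ParseException _ nil)))
        formats))

(def parsed (parse-with-formats "2/12/11" ["yyyy-MM-dd" "M/dd/yy"]))

;; Format it back to check which day was parsed.
(.format (SimpleDateFormat. "yyyy-MM-dd") parsed) ; => "2011-02-12"
```

The first format fails on the slash and throws, so the second one gets its chance.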

Enjoy!

“Processing in Clojure” Made Functional

Invited to nitpick by Jacek Laskowski in his latest post titled Processing in Clojure, I decided to write my own version of his flower-drawing application. I find it an interesting challenge and illustration for my previous post about functional thinking and the use of language and decomposition.

Full source code is below, but let’s start with what I did not like and decided to rewrite.

I find Jacek’s code rather structural and hard to comprehend. Everything happens in one big function and it takes quite a while to understand what it does and how. It’s hard to see why angle is an angle, and variables called a and value don’t make it any easier.

Thus my first two improvements are: Break it down into smaller pieces and use more informative names. For instance, value is in fact scale. I extracted drawing flower to a separate function with clear distinction to drawing petals and the central piece. I also paid more attention to variable names and hid the more complex concepts behind simple, well-named functions.

The second type of improvement is the use of functional features. After a while of careful inspection I observed that angle is a simple linear sequence, and scale is the result of a simple function on this sequence. I decided to replace it with a dedicated infinite sequence.

In the end, this code is longer and has more levels of abstraction, but I think it’s more functional and comprehensible. Functions are much shorter. Details no longer obscure the view, but are hidden behind more descriptive names. Now you can actually see that the applet’s draw draws a flower which consists of petals and the central piece. The central piece is a simple circle, while petals are several circles of random size and color around the center. Then only if you want you can delve into details of how their color and size are generated.

Like I mentioned in the first paragraph, I find it a nice illustration of what Paul Graham describes as building a language for the solution.

Yaclot 0.1.0 Released (Clojure Conversion and Record Transformation Library)

I released Yaclot 0.1.0 to Clojars. This version can be used to convert single values as well as records.

Converting individual values looks like:

(convert "2011-02-12" (to-type java.util.Date))
; => #<Date Sat Feb 12 00:00:00 CET 2011>

(convert "2/12/11"
    (using-format "M/dd/yy" (to-type java.util.Date)))
; => #<Date Sat Feb 12 00:00:00 CET 2011>

(convert 5000.42 (to-type String (using-format "%,.2f")))
; => "5,000.42"

Order of using-format and to-type doesn’t matter. using-format is optional (default for dates is yyyy-MM-dd).

You can also use it to convert records in a single operation:

(map-convert
  {:dt "2011-02-12" :int 42 :label "Label"}    ; Record
  {:dt  (to-type java.util.Date)               ; Desired types
   :int (to-type String)})
; => {:dt #<Date Sat Feb 12 00:00:00 CET 2011>, :int "42", :label "Label"}
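Conceptually, converting a record boils down to applying a per-key conversion to the matching entries and leaving the rest alone. Here is a rough sketch of that idea with plain functions as converters; it is not how Yaclot is implemented, and map-convert-sketch is a hypothetical name.

```clojure
(defn map-convert-sketch
  "Applies each converter function to the value under the same key in
  record. Keys without a converter, or missing from record, are untouched."
  [record converters]
  (reduce (fn [acc [k convert-fn]]
            (if (contains? acc k)
              (update-in acc [k] convert-fn)
              acc))
          record
          converters))

(map-convert-sketch
  {:int 42 :label "Label"}
  {:int str})
; => {:int "42", :label "Label"}
```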

The project originated from a pet web application, where I needed to deal with parsing and formatting numbers and dates between the “presentation”, “business logic” and “database” layers.

To learn more about motivation, features and syntax, take a look at the draft in my previous post.

This version supports conversions between String, Date and numeric types. Next on the road map is exception-free error handling with local bindings and validation.

Designing Yaclot: Generic Clojure Conversion Library

In real Clojure applications you often cannot avoid explicit type conversions, even though the language does not require you to explicitly specify types all the time. Another great feature of Clojure is its universal data structure: most of the time you don’t need anything more than a simple map or record. However, these records often need to be a bit different in different areas of the application.

Rationale

Examples? You may have a java.sql.Date in database and backend, but your web front end provides you with a String. Or your backend uses rational numbers, but you need to format them into nice Strings for presentation.

Sometimes you operate on whole records. From database you may pull the following:

(def db-sample
  {:date    2011-02-01 ; java.sql.Date
   :balance 2042.00    ; java.math.BigDecimal
   :credit  1000.00    ; java.math.BigDecimal
   :roi     2.13 })    ; java.math.BigDecimal

… for display you would like to use:

(def presentation-sample
  {:date    "Feb 1, 2011" ; String
   :balance "$2,042.00"   ; String
   :credit  "$1,000.00"   ; String
   :roi     "2.13%" })    ; String

… and you need to support input from a web form as:

(def form-sample
  {:date    "2011-02-01"  ; String
   :balance "2042"        ; String
   :credit  "1000"        ; String
   :roi     nil })        ; Rational calculated
                          ; by back-end from other fields

All three have a very similar structure, but implementing all these transformations can be a pain.

Planned API for generic converter

I am implementing a library that will do it for you, with an API similar to the following:

(def db-fmt
  {:date    (to-type java.sql.Date)
   :balance (to-type java.math.BigDecimal)
   :credit  (to-type java.math.BigDecimal)
   :roi     (to-type java.math.BigDecimal) })

(def presentation-fmt
  {:date    (using-format "MMM d, yyyy" (to-type String))
   :balance (using-format "$%.2f"       (to-type String))
   :credit  (using-format "$%.2f"       (to-type String))
   :roi     (using-format "%.2f%%"      (to-type String)) })

(def form-fmt
  {:date    (using-format "yyyy-M-d" (to-type java.sql.Date))
   :balance                          (to-type java.math.BigDecimal)
   :credit                           (to-type java.math.BigDecimal) })

(map-convert db-sample presentation-fmt)
; => (similar to presentation-sample)

(map-convert form-sample db-fmt)
; => (similar to db-sample)

Another feature is a generic conversion function for individual values:

(convert "2011-02-12" (to-type java.util.Date))
; => #<Date Sat Feb 12 00:00:00 CET 2011>

(convert (java.util.Date. 111 1 12) (to-type String))
; => "2011-02-12"

(convert 42 (to-type String))
; => "42"

(convert "2/12/11" (using-format "M/dd/yy" (to-type java.util.Date)))
; => #<Date Sat Feb 12 00:00:00 CET 2011>

Ideas for the Future

In the future, this library could support pre- and post-conversion validation. For instance, check that balance is not empty and date is in the correct format before conversion, and validate that balance and credit are positive once they are numbers.

Another idea is using it as a base for yet another HTML form manipulation library. I found existing libraries somewhat disappointing as they imposed too many restrictions on me. I would like to have the ability to manipulate and lay out forms as I please, and only use the bits of the library that I need right now.

Current Status

Currently much of the API and features are designed, but only the part presented above is implemented (not even supporting formats for numbers, only for dates). Once I implement conversions between all popular/basic types I will mark it 0.1 and push it to Clojars. Error handling and validation are next on the road map.

The code is available on GitHub.

Feel free to share any comments or ideas.