
Java 8 Streams API as Friendly ForkJoinPool Facade

One of the features I love the most about Java 8 is the streams API. It finally eliminates pretty much all loops from the code and lets you write code that is so much more expressive and focused.

Today I realized it can be used for something else: As a nice front-end for the ForkJoinPool.

Problem: Executors Boilerplate

Let’s say we want to run a number of tasks in parallel. Nothing fancy, let’s say each of them just prints out the name of the executing thread (so we can see it run in parallel). We want to resume execution after they’re all done.

If you want to run a bunch of tasks in parallel using an ExecutorService, you probably need to do something like the following:

ExecutorService executor = Executors.newCachedThreadPool();
for (int i = 0; i < 5; i++) {
    executor.submit(() -> System.out.println(Thread.currentThread()));
}
executor.shutdown();
try {
    executor.awaitTermination(1, TimeUnit.SECONDS);
} catch (InterruptedException ex) {
    // TODO handle...
}

Now, that is a lot of code! But we can do better.
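As an aside, even within the executor world, ExecutorService.invokeAll blocks until every task has completed, which is arguably safer than guessing a timeout for awaitTermination — though you still manage the executor's lifecycle by hand. A sketch with the same print-the-thread task:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class InvokeAllDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newCachedThreadPool();
        // Build 5 identical tasks, each printing its executing thread
        List<Callable<Void>> tasks = IntStream.range(0, 5)
                .mapToObj(i -> (Callable<Void>) () -> {
                    System.out.println(Thread.currentThread());
                    return null;
                })
                .collect(Collectors.toList());
        // invokeAll blocks until every task has completed
        executor.invokeAll(tasks);
        executor.shutdown();
    }
}
```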

Solution: Stream API

In the end I came up with this utility:

void doInParallelNTimes(int times, Runnable op) {
    IntStream.range(0, times).parallel().forEach(i -> op.run());
}

Reusable and all. Call it like:

doInParallelNTimes(5, () -> System.out.println(Thread.currentThread()));

Done.

This one prints out the following. Note that it’s actually using the main thread as well – since it’s held hostage anyway and cannot resume until execution finishes.

Thread[main,5,main]
Thread[ForkJoinPool.commonPool-worker-1,5,main]
Thread[main,5,main]
Thread[ForkJoinPool.commonPool-worker-3,5,main]
Thread[ForkJoinPool.commonPool-worker-2,5,main]
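A caveat hiding in that output: parallel streams run on the shared ForkJoinPool.commonPool(). If you need to control the parallelism level, a widely used trick — relied upon in practice, though not guaranteed by the javadoc — is to start the parallel stream from inside your own ForkJoinPool. A sketch:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class CustomPoolDemo {
    public static void main(String[] args) throws Exception {
        // A parallel stream started from inside a ForkJoinPool task runs
        // on that pool instead of the common pool.
        ForkJoinPool pool = new ForkJoinPool(2);
        int sum = pool.submit(() ->
                IntStream.range(0, 5).parallel()
                        .peek(i -> System.out.println(Thread.currentThread()))
                        .sum()
        ).get(); // blocks until the whole stream pipeline finishes
        System.out.println("sum = " + sum);
        pool.shutdown();
    }
}
```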

Another Example: Parallel Computation

Here’s another example. Instead of doing the same thing N times, we can use the stream API to process a number of different tasks in parallel. We can create (“seed”) a stream with any collection or set of values, have a function executed on them in parallel, and finally aggregate the results (collect to a collection, reduce to a single value etc.)
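For instance, a minimal sketch of that seed–map–collect pattern (the values here are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ParallelCollectDemo {

    // Transform a list of values in parallel and collect the results.
    static List<String> shout(List<String> names) {
        return names.parallelStream()
                .map(String::toUpperCase)
                .collect(Collectors.toList()); // encounter order is preserved
    }

    public static void main(String[] args) {
        System.out.println(shout(Arrays.asList("alice", "bob", "carol")));
        // [ALICE, BOB, CAROL]
    }
}
```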

Let’s see how we could calculate the sum of the first 44 Fibonacci numbers (note that IntStream.range excludes the upper bound):

import static java.util.concurrent.TimeUnit.MILLISECONDS;

import java.util.stream.IntStream;

import com.google.common.base.Stopwatch; // Guava

public class Tester {
    public static void main(String[] args) {
        Stopwatch stopwatch = Stopwatch.createStarted();
        IntStream.range(1, 45).parallel().map(Tester::fib).sum();
        System.out.println("Parallel took " + stopwatch.elapsed(MILLISECONDS) + " ms");

        stopwatch.reset();
        stopwatch.start();
        IntStream.range(1, 45).map(Tester::fib).sum();
        System.out.println("Sequential took " + stopwatch.elapsed(MILLISECONDS) + " ms");
    }

    private static int fib(int n) {
        if (n == 1 || n == 2) {
            return 1;
        } else {
            return fib(n - 1) + fib(n - 2);
        }
    }
}

Prints out:

Parallel took 3078 ms
Sequential took 7327 ms

It achieves a lot in a single line of code. First it creates a stream with descriptions of all the tasks that we want to run in parallel. Then it calls a function on all of them in parallel. Finally it returns the sum of all these results.

It’s not all that contrived. I can easily imagine creating a stream with arbitrary values (including rich Java objects) and executing a nontrivial operation on them. It doesn’t matter, orchestrating all that would still look the same.
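To make that concrete, here’s a sketch with a hypothetical Task object — the class, its fields and its process() operation are invented for illustration:

```java
import java.util.Arrays;
import java.util.List;

public class RichObjectsDemo {

    // Hypothetical rich object describing a unit of work
    static class Task {
        final String name;
        final int cost;
        Task(String name, int cost) { this.name = name; this.cost = cost; }
        int process() { return cost * 2; } // stand-in for a nontrivial computation
    }

    // Run process() on each task in parallel and reduce to a single sum
    static int processAll(List<Task> tasks) {
        return tasks.parallelStream().mapToInt(Task::process).sum();
    }

    public static void main(String[] args) {
        List<Task> tasks = Arrays.asList(
                new Task("a", 1), new Task("b", 2), new Task("c", 3));
        System.out.println(processAll(tasks)); // 12
    }
}
```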

When to do it?

I think this solution is pretty good for all the cases when you know the load upfront, and you want to fork execution to multiple threads and resume after they’re all done. I needed this for some test code, but it would probably work well in many other fork/join or divide-and-conquer scenarios.

Obviously it does not work if you want to run something in the background and resume execution immediately, or if you want a background executor running over a long period of time.

Human Error?

I’ve just watched Sidney Dekker’s “System Failure, Human Error: Who’s to Blame” talk from DevOpsDays Brisbane 2014. It’s a very nice and worthwhile talk, though there is some noise.

It covers a number of interesting stories and publications from the last 100 years of history related to failures and disasters, their causes and prevention.

Very quick summary from memory (but the video surely has more depth):

  • Shit happens. Why?
  • Due to human physical, mental or moral weaknesses – a claim from the early 20th century, repeated to this day.
  • One approach (equated to MBA): these weak and stupid people need to be told what to do by the more enlightened elites.
  • Bad apples – 20% of people are responsible for 80% of accidents. Just find them and hunt them down? No, because it’s impossible to account for the different conditions of every case. Maybe the 20% of bus drivers with the most accidents drive in the busy city center? Maybe the 20% of doctors with the most patient deaths are infant surgeons – how can we compare them to GPs?
  • Detailed step-by-step procedures and checklists are very rarely possible. When they are, though, they can be very valuable. This happens mostly in industries and cases backed by long and thorough research – think piloting airplanes and space shuttles, surgery etc.
  • Breakthrough: Maybe these humans are not to blame? Maybe the failures are really a result of bad design, conditions, routine, inconvenience?
  • Can disasters be predicted and prevented?
  • Look for deviations – “bad” things that are accepted or worked around until they become the norm.
  • Look for early signs of trouble.
  • Design so that it’s harder to do the wrong thing, and easier and more convenient to do the right thing.

A number of stories follow. Now, this is a talk from a DevOps conference, and there are many takeaways in that area. But it is clearly applicable outside DevOps, and even outside software development. It’s everywhere!

  • The most robust software is software that’s tolerant, self-healing and forgiving. Things will fail for technical reasons (because physics), and they will have bugs. Predict them when possible and put in countermeasures to isolate failures and recover from them. Don’t assume omnipotence and don’t blame others. See also the Systems that Run Forever Self-heal and Scale talk by Joe Armstrong and have a look at the awesome Release It! book by Michael Nygard.
  • Make it easy to do the right thing with your software. Don’t randomly spread config across 10 different places in 3 different engines. Don’t require anyone to stand on two toes of their left foot in the right phase of the moon to do a deployment. Make it mostly run by itself and Just Work, with installation and configuration as straightforward as possible.
  • Make it hard to do the wrong thing. If you have a “kill switch” or “drop database” anywhere, put many guards around it. Maybe it shouldn’t even be enabled in production? Maybe it should require a special piece of config, some secret key, something very hard to happen randomly? Don’t just put in a red button and blame the operator for pressing it. We’re all in the same team and ultimately our goal is to help our clients and users win.

The same principles apply to user interface design. Don’t randomly put in a bunch of forms and expect the users to do the right thing. If they have a workflow, learn it and tailor the solution that way. Make it harder for end users to make mistakes – separate opposite actions in the GUI, make the “negative” actions harder to execute.

Have a look at the Gmail compose window. Note how the “Send” button is big and prominent, and how small and far away the trash icon is. There is no way you could accidentally press one when you meant the other.

Actually, isn’t all that true for all the products that we really like to use, and the most successful ones?