Tag Archives: The Big Picture

Systems that Run Forever Self-heal and Scale

I recently saw a great presentation by Joe Armstrong called “Systems that run forever self-heal and scale” . Joe Armstrong is the inventor of Erlang and he does mention Erlang quite a lot, but the principles are very much universal and applicable with other languages and tools.

The talk is well worth watching, but here’s a few quick notes for a busy reader or my future self.

General remarks

  • If you want to run forever, you have to have more than one instance of everything. If anything is unique, then as soon as that service or machine goes down your system goes down. This may be due to unplanned outage or routine software update. Obvious but still pretty hard.

  • There are two ways to design systems: scaling up or scaling down. If you want a system for 1,000 users, you can start with design for 10 users and expand it, or start with 1,000,000 users and scale it down. You will get different design for your 1,000 users depending on where you start.

  • The hardest part is distributing data in a consistent, durable manner. Don’t even try to do it yourself, use known algorithms, libraries and products.

    Data is sacred, pay attention to it. Web services and such frameworks? Whatever, anyone can write those.

  • Distributing computations is much easier. They can be performed anywhere, resumed or retried after a failure etc. There are some more suggestions hints on how to do it.

Six rules of a reliable system

  1. Isolation – when one process crashes, it should not crash others. Naturally leads to better fault-tolerance, scalability, reliability, testability and comprehensibility. It all also means much easier code upgrades.

  2. Concurrency – pretty obvious: you need more than one computer to make a non-stop system, and that automatically means they will operate concurrently and be distributed.

  3. Failure detection – you can’t fix it if you can’t detect it. It has to work across machine and process boundaries because the entire machine and process can’t fail. You can’t heal yourself when you have a heart attack, it has to be external force.

    It implies asynchronous communication and message-driven model.

    Interesting idea: supervision trees. Supervisors on higher levels of the tree, workers in leaves.

  4. Fault identification – when it fails, you also need to know why it failed.

  5. Live code upgrade – obvioius must have for zero downtime. Once you start the system, never stop it.

  6. Stable storage – store things forever in multiple copies, distributed across many machines and places etc.

    With proper stable storage you don’t need backups. Snapshots, yes, but not backups.

Others: Fail fast, fail early, let it crash. Don’t swallow errors, don’t continue unless you really know what you’re doing. Better crash and let the higher level process decide how to deal with illegal state.

Actor model in Erlang

We’re used to two notions of running things concurrently: processes and threads. The difference? Processes are isolated, live in different places in memory and one can’t screw the other. Threads can.

Answer from Erlang: Actors. They are isolated processes, but they’re not the heavy operating system processes. They all live in the Erlang VM, rely on it for scheduling etc. They’re very light and you can easily run thousands of them on a computer.


Much of this is very natural in functional programming. Perhaps that’s what makes functional programming so popular nowadays – that in this paradigm it’s so much easier to write reliable, fault-tolerant scalable, comprehensible systems.

Culture Kills (or Wins)

This post is about one of the things that everyone is aware of to some degree. It feels familiar, but the picture becomes a lot sharper once you put it in a proper perspective.

The Project

There is an older project created a few years ago, perhaps in 2000 or 2005. At the time it was running a single service on a single server for 100 users. The architecture and tools were adequate: One project, one interface, one process. Some shortcuts – that’s fine, you need to get something out to get the money in. Now-standard tools and techniques like TDD, IoC / DI, app servers etc. were nowhere to be seen – either because they were not needed at that time, or they did not even exist in a reasonable form (Spring, Guice, JEE in 2000 or 2005?).

Five or ten years later, the load went up by a few orders of magnitude. So did the number of features. The codebase has been growing and growing, and new features have been added all the time.

Now, let’s consider two situations.

Bad Becomes Worse

What if the architecture and process remained the same from the very beginning? Single project (now many megabytes of source code). Single process with core logic driven by eight-KLOC-long abominations. Many interesting twists related to “temporary” shortcuts from the past. No IoC. No DI. No TDD. Libraries from 2000 or 2005. No agile.

It could be for different reasons. The developers are poor, careless souls who have not improved themselves for so many years and they are not aware that different ways exist (or neglect them). Or they know there are better ways, but are drowned with feature requests.

There is little rotation in the team. A few persistent core members persist, able to navigate in this sea of spaghetti. They are still able to pump out new features. Slower and slower, but anyway. That’s probably the only reason keeping the project alive. They just neglect change, because if the thing still kind of works. Why fix something that is not broken? Why spend money on improving something that’s working?

Now we have a fairly fossilized team and project. I dare say in this shape it can only get worse. Even if the product itself was somewhat interesting and stable, would you like to change your job to join the team and work on it? Dealing with tons of legacy code, in a culture that fears change, with no modern tools at your disposal? With no way to learn anything?


Very good developers usually already have a job and there is no way they would ever quit for this. You will not be able to hire them, unless you pay insane amount of money. Even then, money is not as good a motivator as genuine passion and interest.

Who would do it? Only people with poor skill, lack of experience, desperates or those who don’t give a damn. They will take forever to get up to speed in this ever growing mudball. And because they’re not top class, chances are the project won’t get any better.

We get a nasty negative feedback loop. Bad code and culture encourages more bad code. Mediocre new hires make it even worse. And so the spiral continues.

Good Gets Better

In the second scenario, the team has some caring craftsmen. They constantly read, learn, think, explore and experiment. They observe their product and process and improve them as they recognize more adequate tools and techniques. Some time along the path they broke it down into modules and instead of a monolithic mudball got an extensible service-oriented architecture. They understood inversion of control and brought it in together with a DI container, refactoring the god classes before they grew out of control. They got higher test coverage. In short, they constantly evaluate what and how they’re doing, how they can make it better, and put that to practice.

Now this team can hire pretty much anyone they like. They may decide to hire inexperienced people with the right attitude and train them. But they also are able to attract passionates who are much above average and who will make it even better.

It creates a sweet positive feedback loop. Great culture never loses the edge and it attracts people who can only make it better.

Quality Matters

That’s why quality and refactoring matter. It’s OK to take a shortcut to get something out. It’s OK to use basic tools for a basic job. But if you never improve, the project will stagnate and rot away.

Sure, fulfilling business needs is important. Having few bugs is important. Avoiding damage (to people or the business) is important. But in the long run if you just keep cranking out features and never retrospect or pay down technical debt, it will become a nasty ever-slowing grind. If you’re lucky, it will just get slower and require some babysitting in production and emergency bug fixing. If you’re less lucky, it will become inoperable and completely unmaintainable if some of the persistent spaghetti wranglers leave or are hit by a truck.

Are We Doomed?

To end this sermon with a positive accent, let’s say that while the feedback loops are strong, they are not unbreakable. Culture change is hard in either direction, but possible. If the “bad” team or its management realizes the situation in time and starts improving, they may be able to shift to the positive loop. Introduce slack or retrospectives, start discussion and slowly, but regularly improve. And if for whatever reason you abandon good practices, letting leaders go or drowning them up to the neck with work, it will start the drift towards the negative loop.

Software for Use

Here’s confession of a full time software developer: I hate most software. With passion.

Why I Hate Software

Software developers and people around the process are often very self-centered and care more about having a good time than designing a useful product. They add a ton of cool but useless and bugged features. They create their own layers of frameworks and reinvent everything every time, because writing code is so much more fun than writing, reusing or improving it.

They don’t care about edge cases, bugs, rare conditions and so on. They don’t care about performance. They don’t care about usability. They don’t care about anything but themselves.

Examples? Firefox that has to be killed with task manager because it slows to a crawl during the day on most powerful hardware. Linux which never really cared or managed to solve the issues with drivers for end user hardware. Google maps showing me tons of hotel and restaurant names instead of street names, the exact opposite of what I want when planning a trip. Eclipse or its plugins that require me to kill the IDE from task manager, waste some more time, and eventually wipe out the entire workspace, recreate it and reconfigure.

All the applications with tons of forms, popups, dialogs and whatnot. Every error message that is a page long, has a stacktrace, cryptic code and whatever internal stuff in it. All the bugs and issues in open source software, which is made in free time for fun, rarely addressing edge cases or issues happening to a few percent users because they’re not fun.

It’s common among developers to hate and misunderstand the user. It’s common even at helpdesk, support and many people who actually deal with end users. In Polish there is this wordplay “u┼╝yszkodnik”, a marriage of “u┼╝ytkownik” (user) and “szkodnik” (pest).

What Software Really Is About

Let me tell you a secret.

The only purpose of software is to serve. We don’t live in a vacuum, but are always paid by someone who has a problem to solve. We are only paid for two reasons: To save someone money, or to let them earn more money. All the stakeholders and users care about it is solving their problems.

I’ve spent quite a few years on one fairly large project that is critical for most operations of a corporation. They have a few thousand field workers and a few dozen managers above, and only a handful of people responsible for software powering all this. Important as it is, the development team is a tiny part of the entire company.

Whenever I design a form, a report, an email or whatever that the end user will ever see, the first and most important thing to do is: Get in their shoes. Understand what they really need and what problem they are trying to solve. See how we can provide it to the them so that it’s as simple, concise, self-explanatory and usable as possible. Only then we can start thinking about code and the entire backend, and even then the most important thing to keep in mind is the end user.

We’re not writing software for ourselves. Most of the time we’re not writing it for educated and exceptionally intelligent geeks either. We write it for housewives, grandmas, unqualified workers, accountants, ladies at bookshops or insurance companies, all kinds of business people.

We write it for people who don’t care about software at all and do not have a thorough understanding of it. Nor do they care care how good a time you were having while creating it. They just want to have the job done.

You’re Doing It Wrong

If someone has to ask or even think about how something works, it’s your failure. If they perform some crazy ritual like rebooting the computer or piece of software, or wipe out a work directory, that’s your fault. If they have to go through five dialogs for a job that could be done with two clicks, or are forced to switch between windows when there is a better way, it’s your failure. When they go fetch some coffee while a report that they run 5 times a day is running, it’s your fault. If there is a sequence of actions or form entries that can blow everything up, a little “don’t touch this” red button, it’s your fault. Not the end user’s.

It’s not uncommon to see a sign in Polish offices that reads (sadly, literally): “Due to introduction of a computer system, our operations are much slower. We are sorry for the inconvenience.” Now, that’s a huge, epic failure.

Better Ways

That’s quite abstract, so let me bring up a few examples.

IKEA. I know furniture does not seem as complicated as software, but it’s not that trivial either. It takes some effort to package a cabinet or a chest of drawers in a cardboard box that can be assembled by the end user. They could deliver you some wood and a picture of cabinet, and blame you for not knowing how to turn one into another. They could deliver a bunch of needlessly complicated parts without a manual, and blame the user again.

They know they need to sell and have returning customers, not just feel good themselves and blame others.

What they do is carefully design every single part and deliver a manual with large, clear pictures and not a single line of text. And it’s completely fool-proof and obvious, so that even such a carpentry ignorant as you can assemble it.

LEGO. Some sets have thousands of pieces and are pretty complex. So complex that it would be extremely difficult even for you, craftsman proficient in building stuff, to reproduce.

Again, they could deliver 5,000 pieces and a single picture to you and put the blame on you for being unable to figure it out. Again, that’s not what what they do. They want to sell and they want you to return. So they deliver a 200-page-long manual full of pictures, so detailed and fool-proof that even a child can do it.

There are good examples in software world as well. StackOverflow is nice, but only for certain kind of users. It’s great for the Internet geeks who get the concept of upvotes, gamification, focusing on tiny narrow parts and not wider discussion etc. Much less for all kinds of scientists and, you know, regular people, who seem to be the intended audience of StackExchange.

Google search and maps (for address search, intuitiveness and performance), DuckDuckGo are pretty good. Wolfram Alpha. Skyscanner and Himpunk. Much of the fool-proof Apple hardware and software.

In other words, when you know what it does and how to use it the first time you see it, and it Just Works, it’s great.


Successful startups know it. They want to sell and if they make people think or overly complicate something, people will just walk on by. I guess many startups fail because they don’t realize it. Many established brands try to do it and learn from startups, simplifying and streamlining their UIs (Amazon, MS Office, Ebay…). It’s high time we applied it to all kinds of software, including the internal corporate stuff and open source.

After all, we’re only here to serve and solve problems of real people.

That’s the way you do it.

Two Worlds: Functional Thinking after OO

It goes without saying that functional programming is very different from object-oriented. While it may take a while to grasp, it turns out that functional programming can also lead to simpler, more robust, maintainable and testable solutions.

The First Encounter

In his classic “Clean Code” Robert C. Martin describes a great (and widely adopted) way to object oriented programming. He says code should read from top to bottom like prose. Start with the most high-level concepts and break them down into lower-level pieces. Then when you work with such a source file, you may be able to more easily follow the flow and understand how it works.

It occurred to me it’s quite different from how the natural flow of functional programming in Clojure, where typically all users are below their dependencies. I was unsure if (and how) Uncle Bob’s ideas apply on this ground. When I tweeted my doubts, his answer was: “You can forward declare”. I can, but still I was not convinced.

The light dawned when I came across Paul Graham’s essay titled “Programming Bottom-Up”. The way to go in functional programming is bottom-up, not just top-down.

Language for the Problem

Lisp (and Clojure) have minimal syntax. Not much more than parenthesis. When you look at Clojure source, it’s really dense because there is no syntax obscuring the flow. What you can see at the bottom of the file, though, appears to be program in language created just for this problem.

As Paul writes, “you don’t just write your program down toward the language, you also build the language up toward your program. As you’re writing a program you may think +I wish Lisp had such-and-such an operator.+ So you go and write it. Afterward you realize that using the new operator would simplify the design of another part of the program, and so on.”

Simplicity and Abstractness

Ending up with a language tailored for the problem is not the only nice feature of functional programming. Compared to object-oriented, functional code tends to be a lot more abstract. There are no intricate nets of heavy stateful objects that very often can’t talk to each other without adaptation and need a lot of care to weave together.

Data structures are very simple (if not minimalistic), what makes them easy to use with more functions. They also are immutable, so there is no risk of unexpected side effects. On the other hand, because of simplicity of data structures functions turn out to be much more abstract and generic, and hence applicable in broader context. Add closures and higher-order functions and get a very powerful engine with unlimited applications. Think how much you can do with map alone – and why.

Another way to look at code organization is layers or levels of abstraction. In object-oriented, it usually means that a bottom layer consists of objects which provide some functionality to the higher level. Functional programming takes it one step further: “[each layer acts] as a sort of programming language for the one above.” And if there is a need for distinction to layers, it’s only because of levels of abstraction. Much rarelier because of incompatibility and never because of handcuffing encapsulation.

Maintainability and Code Organization

We all know that in object-oriented programming most of the time you should start with the purpose at hand and go for a very concrete, specialized design. “Use before reuse” is the phrase term here. Then, when you need more flexibility you decompose and make the code more abstract. Sometimes it can be a difficult, time-consuming and bug-inducing effort. It’s not the case with functional programming. “Working bottom-up is also the best way to get reusable software. The essence of writing reusable software is to separate the general from the specific, and bottom-up programming inherently creates such a separation.”

Compared to OO, refactoring of functional code is trivial. There is no state, dependencies or interactions to worry about. No way to induce unexpected side effects. Each function is a simple construct with well-defined outcome given the direct input. Thanks to that, it’s also much easier to test (and cover with automated test suites).

The Curse of Structured Programming

Object-oriented originated from structured programming and all developers inevitably have such background. It takes some training to learn how not to write 2,000-line classes and 500-line methods. It takes much more to learn how to avoid inheritance, program to an interface, compose your code of smaller units and cover them (and their interactions) with tests.

There are ways to make object-oriented programs more likely to succeed, have fewer bugs and be more maintainable. A ton of books has been written on this, including but not limited to excellent and invaluable works by Robert C. Martin, Martin Fowler, Eric Evans, and so on, and so forth. That’s a lot of prose! It turns out that object-oriented programming actually is very difficult and needs a lot craftsmanship and attention.

Functional programming is free from many of these issues. What’s more, learning it yields great return to an object-oriented programmer. It can teach you a lot about algebraic thinking, but also breaking code down into smaller pieces, intention-revealing names and interfaces, avoiding side effects and probably many other aspects.

The Heart of Software

The heart of software is solving business problems. Have you heard it? Yes, so have I. The fact that it’s so rarely recognized or practiced is just inexplicable.

It’s not about programming. Nor is it about technology. Nor is it about languages, testing, architectures, networking, whatever. Sure, these are nice tools. But they are only tools. It’s amazing how much focus is paid to tools and how little to the problem at hand. What would you say to a carpenter who delivered wobbly (but still expensive) furniture for your dream kitchen only because the tools he had were bought at a “as much as you can carry for $10” bargain? You don’t care about his tools. You want a functional, robust product.

The purpose of frameworks, libraries and architecture is to serve, not command. When a framework imposes uncomfortable restrictions or affects the way your business logic and model is shaping up, in the best case it’s unsuitable for the problem. Don’t use it. Choosing an unsuitable or unknown tool only because it looks nice in resume is a cardinal sin deserving capital professional punishment.

Whatever you do, focus on the big picture. In case of software, it should be solving the business problem of your customer.

By no means is it a novel idea, but still it is extremely important and too rarely understood. The reason why it’s here is that it’s going to be an introduction to a series of articles on Domain-Driven Design. It’s a great approach to software development itself (like the rant above), but also to object-oriented programming and design. That’s what makes all the trendy OO* techniques shine. It’s a shame how often it is overlooked by teachers. If there’s one book every OO/business developer has to read, it’s “Domain-Driven Design” by Eric Evans.

OO* without DDD (or at least its bulk concepts) is like giving a diploma in architecture to everyone who can hold a pencil and draw on a sheet of paper. Regardless of whether what she produces is a usable, solid construction.

Why Is Science Cool?

It can be observed that not only programming, but also science, technology and engineering, are losing their popularity. Americans even consider it a national security risk.

It may be so because today is more about consumption (or taking) rather than creation (or expressing). We prefer to take everything for granted and avoid whatever seems unpleasant. Science is pretty high on the list – it requires discipline and concentration, and that seems to be too much.

What can we do? Show that science is not about books and study of no apparent purpose. Instead, show how it affects our daily life and can actually be cool. Don’t start with dull books and formulas. Show the goal and applications, fascinate, then teach and explain.

“Run, rabbit, run.
Dig that hole, forget the sun.”

I was happy to discover the TV & radio campaign which promotes maths in Poland. It hits the sweet spot. It consists of several professional short adverts which describe how mathematical concepts apply to apparently maths-free fields such as music, art, architecture, biology etc.

For instance, they tell how the quality of music is affected by theoretical intervals, or how everything in natural environment seems to follow the golden ratio. They also involve some renown professionals who describe how maths helped them in their work.

Now, if only the teachers followed. I wish they told me that before all the formulas.