Tag Archives: Life

Learning to Fail

Back at university, when I dealt with much low-level problem solving and very basic libararies and constructs, I learned to pay attention to what can possibly go wrong. A lot. Implementing reliable, hang-proof communication over plain sockets? I remember it today, a trivial loop of “core logic” and a ton of guards around it.

Now I suspect I am not the only person who got used to all the convenient higher-level abstractions so much that he began to forget this approach. Thing is, real software is a little bit more complex, and the fact that our libraries kind of deal with most low-level problems for us doesn’t mean there are no reasons to fail.


As I’m reading “Release It!” by Michael T. Nygard, I keep nodding in agreement: Been there, done this, suffered that. I’ve just started, but it’s already shown quite a few interesting examples of failure and error handling.

Michael describes a spectacular outage of an airline software. Its experienced designers expected many kinds of failures and avoided many obvious issues. There was a nice layered architecture, with proper redundancy on every level from clients and terminals, through servers, through database. All was well, yet on a routine maintenance in database the entire system just hung. It did not kill anyone, but delayed flights and serious financial losses have an impact too.

The root cause turned out to be one swallowed exception on servers talking to the database, thrown by JDBC driver when the virtual IP of the database server was remapped. If you don’t have proper handling for such situations, one such leakage can lock the entire server as all of its threads wait for the connection or for each other. Since there were no proper timeouts anywhere in the server or above, eventually everything hung.

Now it’s easy to say: It’s obvious, thou shalt not swallow exceptions, you moron, and walk on. Or is it?

The thing is, an unexpected or improperly handled error can always happen. In hardware. Or a third party component. Or core library of your programming language. Or even you or your colleague can screw up and fail to predict something. It. Just. Happens.

Real Life

Let’s take a look at two examples from real life.

Everyone gets in the car thinking: I’m an awesome driver, accidents happen but not to me. Yet somehow we are grateful for having airbags, carefully designed crumple zones, and all kinds of automatic systems that prevent or mitigate effects of accidents.

If you were offered two cars at the same cost, which would you choose? One is in pimp-my-ride style with extremely comfortable seats, sat TV, bright pink wheels and whatever unessential features. But it breaks down every so often based on its mood or the moon cycle, and would certainly kill you if you hit a hedgehog. The other is just comfortable enough, completely boring, no cool features to show off at all. But it will serve you 500,000 kilometers without a single breakdown and save your life when you hit a tree. Obvious, right?

Another example. My brother-in-law happens to be a construction manager at a pretty big power plant. He recently took me on a trip and explained some basics on how it works, and one thing really struck me.

The power station consists of a dozen separate generation units and is designed to survive all kinds of failures. I was impressed, and still am, that in power plant business it’s normal to say stuff like: If this block goes dark, this and this happens, that one takes over, whatever. No big deal. Let’s put it in a perspective. A damn complicated piece of engineering that can detect any potentially dangerous conditions, alarm, shut down and fail over just like that. From small and trivial things like variations in pressure or temperature, through conditions that could blow the whole thing up. And it is so reliable that when people talk about such conditions, rare and severe as they are, they say it in the same tone as “in case of rain the picnic will be held at Ms. Johnson’s”.

Software Again

In his “After the Disaster” post, Uncle Bob asked: “How many times per day do you put your life in the hands of an ?if? statement written by some twenty-two year old at three in the morning, while strung out on vodka and redbull?”

I wish it was a rhetorical question.

We are pressed hard to focus on adding shiny new features, as fast as possible. That’s what makes our bosses and their bosses shine and what brings money to the table. But not only them, even we (the developers) naturally take most pride in all those features and find them the most exciting part of our work.

Remember that we’re here to serve. While pumping out features is fun, remember that those people simply rely on you. Even if you don’t directly cause death or injury, your outages can still affects lives. Think more like a car or power station designer, your position is really closer to theirs than to a lone hippie who’s building a little wobbly shack for himself.

When an outage happens and also causes financial loss, you will be to blame. If that reasoning does not work, do it for yourself – pay attention now to avoid pain in future, be it regular panic calls at 3 AM or your boss yelling at you.

More Stuff

Michael T. Nygard ends that airline example with a very valuable advice. Obvious as it may seem, it feels different if you realize it and engrave it deep in your mind. Expect failure everywhere, and plan for it. Even if your tools handle some failures, they can’t do everything for you. Even if you have at least two of each thing (no single point of failure), you can still suffer from bad design. Be paranoid. Place crumple zones on every integration point with other systems, and even different components of your system, in order to prevent cracks from propagating. Optimistic monoliths fail hard.

Want something more concrete? Go read “Release It!”, it’s full of great and concrete examples. There’s a reason why it fits in a book and not in a blog post.

Two Stories of Ignaz Semmelweis

Here’s two versions of the story of Ignaz Semmelweis. You may have heard it as it’s quite popular in the programming & craftsmanship community. I heard the two versions about a year apart and together they’re even more interesting.

First Story

Ignaz Semmelweis was a Hungarian physician. He believed that by washing their hands consistently and well enough, doctors can significantly reduce the infection and mortality rate among mothers and their newborn children.

Unfortunately, those findings were in conflict with what most doctors believed at the time. They were prejudiced and had their old, established point of view and would not listen to the poor man.

Semmelweis did not manage to convince his peers to his findings. He died a miserable death in an asylum.

Second Story

Ignaz Semmelweis was a Hungarian physician. He believed that by washing their hands consistently and well enough, doctors can significantly reduce the infection and mortality rate among mothers and their newborn children.

Unfortunately, in the process of achieving his superior results, he managed to alienate or offend his entire profession. Things eventually got so bad that his colleagues started to deliberately avoid washing their hands just to defy him.

Semmelweis was a genius but he was also a lunatic, and that made him a failed genius. He died a miserable death in an asylum.

Which is True?

I heard the first story for the first time on the last year’s 33rd Degree. The way my inner wannabe-craftsman understood it at that time was: poor guy ahead of his time was tormented to death by his prejudiced, petrified community. He was right, and his stubborn peers made his life miserable.

I read the second version in Apprenticeship Patterns by Dave Hoover and Adewale Oshineye. Here the lesson is completely different. There is no way you can get your point across if you alienate or offend your peers. You have to be very patient and careful, especially if your ideas do not go hand in hand with what the community believes at the point.

Even if your peers are willing to listen, and you have convincing evidence or arguments, you can never play the one and only enlightened man in the world. It goes the other way round, too. Learn to listen, even if some ideas don’t exactly go hand in hand with your point of view. See if there is any value you can take from them. And even if you disagree, have some respect and don’t play the “we know better, be gone ye stupid lunatic” role.

I suspect both versions of the story are true. People are full of judgement and prejudice. Such is human nature. It’s rarely a wall that you can take down with a ram. More often you find yourself sitting on a giant elephant and trying to convince it to go in a specific direction. The only way you can succeed is preparing and cutting the way through the jungle and pointing the elephant into the opening.

This elephant rider metaphor comes from an awesome talk by Venkat Subramaniam. Heck, I would say his presence alone makes 33rd Degree 2012 a must-see.

Why Is Science Cool?

It can be observed that not only programming, but also science, technology and engineering, are losing their popularity. Americans even consider it a national security risk.

It may be so because today is more about consumption (or taking) rather than creation (or expressing). We prefer to take everything for granted and avoid whatever seems unpleasant. Science is pretty high on the list – it requires discipline and concentration, and that seems to be too much.

What can we do? Show that science is not about books and study of no apparent purpose. Instead, show how it affects our daily life and can actually be cool. Don’t start with dull books and formulas. Show the goal and applications, fascinate, then teach and explain.

“Run, rabbit, run.
Dig that hole, forget the sun.”

I was happy to discover the TV & radio campaign which promotes maths in Poland. It hits the sweet spot. It consists of several professional short adverts which describe how mathematical concepts apply to apparently maths-free fields such as music, art, architecture, biology etc.

For instance, they tell how the quality of music is affected by theoretical intervals, or how everything in natural environment seems to follow the golden ratio. They also involve some renown professionals who describe how maths helped them in their work.

Now, if only the teachers followed. I wish they told me that before all the formulas.