Martin Kleppmann’s Designing Data-Intensive Applications is sure to have you quietly applauding on your couch, if your mind works anything like mine does (namely, by assuming that everything that can go wrong will).

Modern applications are more than just one computer in front of a database, and this book takes you on a tour of concepts and techniques that will help you answer questions like:

  • What are the various flavors of NoSQL? What are they good at?
  • How do databases provide some of the guarantees we rely on?
  • How do databases provide some of the guarantees we rely on while being highly distributed?
  • Why is it so hard for computers to reach consensus? When do they need to?
  • I’ve heard of <Hadoop, Spark, Pig, Hive, HBase, ...>, but what are they for?

The book is grouped into sections that each address a theme, such as batch processing, data interchange formats, or data replication in a distributed environment. I suppose you could skip around to the chapters that interest you, but I found that a straight-through read worked well: the chapters build on each other and help you get into the right mindset.

I also found it to be well written, and it strikes a good balance between the theoretical and the practical. You don’t need a PhD in computer science to follow it. It is also filled to the brim with links and references to other resources. For this reason, I would recommend the digital version of the book. The many links, callouts, and citations are easier to consume with the click of a button.

Paranoid Engineers, Keep This Book Under Your Pillow

Often while reading, I would think of a situation that sounded tricky to get right, such as “What happens when your single-leader database cluster ends up with two leaders?” Without fail, Kleppmann would address it in the next paragraph.

<bad data thing> can’t happen though, right?

Fire!

But it will. And now you’ll know how to prevent it next time.

Who Should Read This

You should read this book if you:

  • Write code that manipulates data
  • Operate a service that has data
  • Manage a team or product that deals with data
  • Are a raging pessimist who just wants to be told everything will be ok (with respect to your data)

In short, all computing professionals should read it.

This book will not teach you the specifics of deploying any particular database or server, and that’s ok. After reading it, you’ll feel armed with a broad knowledge of techniques and ideas that will help you tackle the problems you’ll face as your data scales and evolves. You won’t be an expert, but you’ll know which man page to read next.

And, in my experience, knowing where to look next is half the battle.