


The book Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann was recommended to me by a colleague. The author has worked at companies such as LinkedIn, where he built large distributed systems to handle data, and he’s also a researcher at the University of Cambridge, so I guess he knows what he’s talking about 🙂 It’s quite a big book (around 545 pages), but I enjoyed it. It’s quite technical, and I wouldn’t recommend it to anyone who doesn’t have a basic grasp of databases (relational or NoSQL). There’s some jargon in there, and although Martin makes a great effort to explain concepts thoroughly, some (basic) concepts are left unexplained. I learned a lot about the challenges of distributed systems (scalability, transactions, consistency, etc.), but for a book whose title starts with “designing”, it doesn’t actually talk about design that much.

At the start of the book, there’s a use case showing how a distributed system could support a platform like Twitter, where some users have millions of followers: if one of them posts a message, millions of timelines suddenly need to be updated. The author then walks through how this could be solved. I expected more design patterns like this, but unfortunately that’s not the case (the final chapter offers some general design principles, but that’s it). Most of the book is spent explaining concepts and how they affect large-scale distributed data systems. Part 1 covers the basics, such as databases, data models, query languages and storage. Part 2 dives deeper into distributed concepts such as transactions (which are obviously harder across multiple systems), replication, partitioning, consistency and consensus.
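To make the Twitter use case a bit more concrete, here is a minimal, in-memory Python sketch of a hybrid home-timeline design, roughly along the lines the book discusses. Posts by ordinary users are copied into each follower’s cached timeline when they are written (fan-out on write), while posts by accounts with very many followers are fetched and merged only when a timeline is read. Everything in it (`TimelineService`, `CELEBRITY_THRESHOLD`, the tiny threshold value, sequence numbers standing in for timestamps) is a hypothetical illustration and not code from the book. A real system would use a distributed cache and a far larger threshold.

```python
import heapq
import itertools
from collections import defaultdict, deque

# Hypothetical values chosen for illustration only.
CELEBRITY_THRESHOLD = 3   # authors with more followers than this skip fan-out on write
TIMELINE_LENGTH = 100     # keep only the newest N entries per cached timeline


class TimelineService:
    """In-memory sketch of a hybrid fan-out home timeline (all names are illustrative)."""

    def __init__(self, celebrity_threshold=CELEBRITY_THRESHOLD):
        self.celebrity_threshold = celebrity_threshold
        self.followers = defaultdict(set)   # author -> users following them
        self.following = defaultdict(set)   # user -> authors they follow
        self.posts = defaultdict(list)      # author -> [(seq, author, text)], oldest first
        # Cached home timelines, newest first; the bounded deque drops the oldest entries.
        self.timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LENGTH))
        self._seq = itertools.count()       # increasing post ids stand in for timestamps

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) > self.celebrity_threshold

    def post(self, author, text):
        entry = (next(self._seq), author, text)
        self.posts[author].append(entry)
        if not self.is_celebrity(author):
            # Fan-out on write: one post becomes one cache write per follower.
            for follower in self.followers[author]:
                self.timelines[follower].appendleft(entry)
        # Celebrity posts are not copied anywhere; readers pull them at read time.

    def home_timeline(self, user, limit=10):
        # Start from the precomputed cache filled by fan-out on write.
        candidates = list(self.timelines[user])
        # Fan-out on read: merge in recent posts from followed celebrities.
        for author in self.following[user]:
            if self.is_celebrity(author):
                candidates.extend(self.posts[author][-limit:])
        # Deduplicate by post id (an author may have crossed the threshold mid-stream).
        merged = {entry[0]: entry for entry in candidates}
        newest = heapq.nlargest(limit, merged.values())
        return [text for _, _, text in newest]


if __name__ == "__main__":
    svc = TimelineService(celebrity_threshold=2)
    for user in ["bob", "carol", "dave"]:
        svc.follow(user, "star")        # 3 followers > 2, so "star" is a celebrity
    svc.follow("bob", "alice")          # 1 follower, so alice's posts are fanned out
    svc.post("alice", "hello from alice")
    svc.post("star", "hello from star")
    print(svc.home_timeline("bob"))     # ['hello from star', 'hello from alice']
```

The trade-off this sketch illustrates: fan-out on write makes reading a timeline cheap, but a single post by someone with millions of followers would turn into millions of cache writes, so those posts are merged lazily on read instead. For brevity, the sketch does not backfill a timeline when a user follows someone new.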
