Unlike most CS subjects, distributed systems has no "one answer" that unlocks understanding. Such systems are always designed with a particular purpose in mind, and that purpose often decides the design and its tradeoffs.
What does this mean for learning distributed systems?
- Teaching or understanding the subject is much more complicated. There is no one system to use as "the example" or as "the place to start learning".
- Textbooks are often incomplete. System-specific or purpose-specific books cover only the topics relevant to them, and general books can end up too abstract or too theoretical.
- Since there is no clear beginning or end to learning, it can be difficult to tell when to start reading material covering the next level of difficulty.
Learning a technical subject is difficult enough. With the above constraints, figuring out what to read or watch can be confusing. I've curated some of the material that I think best covers this subject at each technical level.
For the Novice
Video: Distributed Systems in One Lesson by Tim Berglund
- This video is an excellent tutorial and starting point.
- Alternative for those who prefer reading: A Thorough Introduction to Distributed Systems
Book: Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann
- Big Data has become one of the big drivers of systems design. This book does a good job covering systems design from that approach.
Book: Web Scalability for Startup Engineers by Artur Ejsmont
- Disclaimer: I have not read this book yet.
Articles: Base DS
- Some good articles providing an alternative perspective on the design patterns of distributed systems.
What Next?
So, maybe you've just gotten started or read up on distributed systems. Where do you go next? There is no one right answer, but there are many resources available. Try one or all of them and see what works best for you:
- Read/watch something from the "Intermediate to Advanced Resources" section below
- Study a specific system. You can find lots of information on the more popular distributed systems, like Hadoop.
- Start contributing to an open source project
Intermediate to Advanced Resources
Hopefully, you know how you learn best, or at least how much time you have to learn. The resources below can help you get further along your path.
Textbooks
I haven't found a comprehensive textbook yet. These are the best non-specialized textbooks I've found:
- Synchronization Algorithms and Concurrent Programming, Gadi Taubenfeld
- Designing Distributed Systems, Brendan Burns
- Distributed Systems: Principles and Paradigms, Andrew Tanenbaum & Maarten Van Steen
- Principles of Distributed Database Systems, Third Edition, Tamer Ozsu, Patrick Valduriez
- Guide to Reliable Distributed Systems, Kenneth Birman
- Network Distributed Computing: Fitscapes and Fallacies, Max K. Goff
Conference Videos
Videos are nice. They are usually around an hour and cover a specific subject of interest. There are also times when a verbal explanation is much clearer than several pages of text. Below is my list of conferences and the like that I think have a decent number of videos covering distributed systems:
@Scale (Videos from 2016 and later) (YouTube videos from 2015 and earlier)
- This is currently my favorite set of conferences. There are several tracks of talks and you can almost always find something of interest.
USENIX
- USENIX hosts a lot of conferences. For issues related to distributed systems, FAST, LISA, NSDI, and SRECon have the most relevant videos.
GOTO Conferences
General Articles
Cloud Design Patterns from Microsoft Azure docs
- The "Challenges in Cloud Development" section is a good summary of the challenges that real production distributed systems face, and the patterns they list offer a very nice alternative view of distributed systems.
The "Classics"
Once you have some good experience with distributed systems, you'll start to see common patterns in both the design and in the mistakes. Some of the classic articles on the subject cover these succinctly, but hide many years of wisdom behind the truths.
Eight Fallacies of Distributed Computing
- This is the first of the classics. There are many good follow-up articles that try to explain (1, 2) and disprove (1) these fallacies in more detail.