This book's central argument is that effectively designing data-intensive applications requires understanding the fundamental principles behind various data processing and storage technologies, rather than just navigating buzzwords. It examines the pros and cons of tools like relational databases, NoSQL datastores, stream and batch processors, and message brokers. The book teaches software engineers and architects how to apply these underlying ideas in practice to make full use of data in modern applications.
Readers will learn to identify the strengths and weaknesses of different tools and to navigate the trade-offs among consistency, scalability, fault tolerance, and complexity. The book also explains the distributed systems research that underpins modern databases and offers insights into the architectures of major online services, helping readers operate systems more effectively and make informed decisions.
Key concepts
- Scalability — The ability of a system to handle a growing amount of work.
- Consistency — Ensuring that data remains accurate and uniform across different parts of a system.
- Reliability — The ability of a system to perform its intended function correctly and consistently over time.
- Fault tolerance — The ability of a system to continue operating even when one or more of its components fail.
- Relational databases — A type of database that uses tables to store data in structured ways.
- NoSQL datastores — A category of databases that do not use the traditional table-based relational model.
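
To make the last two definitions concrete, here is a minimal sketch that stores the same record in a relational form and in a document form, the latter being one common kind of NoSQL model. It is an illustration only: the table names, the `user:1` key, the sample data, and the helper functions `titles_relational` and `titles_document` are all hypothetical, and a plain Python dict stands in for a real document store.

```python
import json
import sqlite3

# Relational model: data lives in tables with fixed columns, linked by keys.
# (Hypothetical schema, using an in-memory SQLite database.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE positions (user_id INTEGER REFERENCES users(id), title TEXT)"
)
conn.execute("INSERT INTO users VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO positions VALUES (?, ?)",
    [(1, "Engineer"), (1, "Architect")],
)


def titles_relational(user_id):
    """Reassemble a user's job titles by joining the two tables."""
    rows = conn.execute(
        "SELECT p.title FROM users u "
        "JOIN positions p ON p.user_id = u.id "
        "WHERE u.id = ? ORDER BY p.rowid",
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]


# Document model: one self-contained JSON record per user, looked up by key.
# (A dict is an assumed stand-in for a document database.)
document_store = {
    "user:1": json.dumps(
        {
            "name": "Ada",
            "positions": [{"title": "Engineer"}, {"title": "Architect"}],
        }
    )
}


def titles_document(user_id):
    """Read the nested positions straight out of the stored document."""
    doc = json.loads(document_store[f"user:{user_id}"])
    return [position["title"] for position in doc["positions"]]


print(titles_relational(1))
print(titles_document(1))
```

Both functions return the same titles. The relational version needs a join to reassemble the record but keeps each fact in one place, while the document version reads a single record at the cost of nesting related data inside it. This is the kind of trade-off the book asks readers to weigh.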