Designing data-intensive applications : the big ideas behind reliable, scalable, and maintainable systems

Author / Creator
Kleppmann, Martin, author

Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords? In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.

Martin Kleppmann
  • Sebastopol, CA : O'Reilly Media, 2017
  • ©2017
Physical Details
  • 1 online resource (xix, 590 pages)
9781449373320, 1491903066, 9781491903063, 1449373321, 9781491903117, 1491903104, 9781491903100, 1491903112
ocn976434277, ocn893895983, on1290492286

  • Copyright; Table of Contents; Preface; Who Should Read This Book?; Scope of This Book; Outline of This Book; References and Further Reading; O'Reilly Safari; How to Contact Us; Acknowledgments; Part I. Foundations of Data Systems; Chapter 1. Reliable, Scalable, and Maintainable Applications; Thinking About Data Systems; Reliability; Hardware Faults; Software Errors; Human Errors; How Important Is Reliability?; Scalability; Describing Load; Describing Performance; Approaches for Coping with Load; Maintainability; Operability: Making Life Easy for Operations; Simplicity: Managing Complexity
  • Evolvability: Making Change Easy; Summary; Chapter 2. Data Models and Query Languages; Relational Model Versus Document Model; The Birth of NoSQL; The Object-Relational Mismatch; Many-to-One and Many-to-Many Relationships; Are Document Databases Repeating History?; Relational Versus Document Databases Today; Query Languages for Data; Declarative Queries on the Web; MapReduce Querying; Graph-Like Data Models; Property Graphs; The Cypher Query Language; Graph Queries in SQL; Triple-Stores and SPARQL; The Foundation: Datalog; Summary; Chapter 3. Storage and Retrieval
  • Data Structures That Power Your Database; Hash Indexes; SSTables and LSM-Trees; B-Trees; Comparing B-Trees and LSM-Trees; Other Indexing Structures; Transaction Processing or Analytics?; Data Warehousing; Stars and Snowflakes: Schemas for Analytics; Column-Oriented Storage; Column Compression; Sort Order in Column Storage; Writing to Column-Oriented Storage; Aggregation: Data Cubes and Materialized Views; Summary; Chapter 4. Encoding and Evolution; Formats for Encoding Data; Language-Specific Formats; JSON, XML, and Binary Variants; Thrift and Protocol Buffers; Avro; The Merits of Schemas
  • Modes of Dataflow; Dataflow Through Databases; Dataflow Through Services: REST and RPC; Message-Passing Dataflow; Summary; Part II. Distributed Data; Chapter 5. Replication; Leaders and Followers; Synchronous Versus Asynchronous Replication; Setting Up New Followers; Handling Node Outages; Implementation of Replication Logs; Problems with Replication Lag; Reading Your Own Writes; Monotonic Reads; Consistent Prefix Reads; Solutions for Replication Lag; Multi-Leader Replication; Use Cases for Multi-Leader Replication; Handling Write Conflicts; Multi-Leader Replication Topologies
  • Leaderless Replication; Writing to the Database When a Node Is Down; Limitations of Quorum Consistency; Sloppy Quorums and Hinted Handoff; Detecting Concurrent Writes; Summary; Chapter 6. Partitioning; Partitioning and Replication; Partitioning of Key-Value Data; Partitioning by Key Range; Partitioning by Hash of Key; Skewed Workloads and Relieving Hot Spots; Partitioning and Secondary Indexes; Partitioning Secondary Indexes by Document; Partitioning Secondary Indexes by Term; Rebalancing Partitions; Strategies for Rebalancing; Operations: Automatic or Manual Rebalancing; Request Routing