What every software engineer should know about modern databases

Michał Konarski bio photo By Michał Konarski

These are the slides I use for my talk “What every software engineer should know about modern databases”, together with sources and additional links.

Welcome screen

Four database types

SQL query

ACID in relational database

  • ACID (Wikipedia)
  • M. Kleppmann, Designing Data-Intensive Applications, chapter 7. Transactions

Reasons for scalability

  • “There are various reasons why you might want to distribute a database across multiple machines: Scalability (data volume, read load, or write load) (…), Fault tolerance/high availability, (…), Latency (…)”
    M. Kleppmann, Designing Data-Intensive Applications, Part II. Distributed Data

Horizontal scaling

Partitioning and replication

  • “Partitioning is usually combined with replication so that copies of each partition are stored on multiple nodes. This means that, even though each record belongs to exactly one partition, it may still be stored on several different nodes for fault tolerance.”
    M. Kleppmann, Designing Data-Intensive Applications, chapter 5. Replication

Relational databases summary

Wide-column stores

  • Wide-column store (Wikipedia)
  • “Traditionally production systems store their state in relational databases. For many of the more common usage patterns of state persistence, however, a relational database is a solution that is far from ideal. (…) Although many advances have been made in the recent years, it is still not easy to scale-out databases or use smart partitioning schemes for load balancing.”
    2007 Dynamo paper
  • 2006 Bigtable paper

Wide-column store schema

  • Apache Cassandra stores data in tables, with each table consisting of rows and columns. CQL (Cassandra Query Language) is used to query the data stored in tables. Apache Cassandra data model is based around and optimized for querying. Cassandra does not support relational data modeling intended for relational databases.
    Cassandra documentation - Introduction to Data Modeling
  • Storage model in Bigtable

Wide-column stores summary

Document databases - ideal?

  • Mongo hits a sweet spot between the powerful queryability of a relational database and the distributed nature of other databases, like HBase. Project founder Dwight Merriman has said that MongoDB is the database he wishes he’d had at DoubleClick, where as the CTO he had to house large-scale data while still being able to satisfy ad hoc queries.
    Luc Perkins, Seven Databases in Seven Weeks
  • Was MongoDB ever the right choice?
  • Everything You Know About MongoDB is Wrong!, Myth 5. Sharding

Document databases summary

Data warehouses summary

Things I didn't mention

db48

db49

Share this post