What every software engineer should know about modern databases

Welcome screen

Four database types

Relational databases

Relational database (Wikipedia)

Relational schema

Database normalization (Wikipedia)

SQL query

modern-sql.com - latest features in SQL

ACID in relational database

ACID (Wikipedia)
M. Kleppmann, Designing Data-Intensive Applications, chapter 7. Transactions

Scalability

Scalability (Wikipedia)

Reasons for scalability

“There are various reasons why you might want to distribute a database across multiple machines: Scalability (data volume, read load, or write load) (…), Fault tolerance/high availability, (…), Latency (…)”
M. Kleppmann, Designing Data-Intensive Applications, Part II. Distributed Data

Vertical and horizontal scaling

Scalability (Wikipedia)

Horizontal scaling

Replication (wikipedia)
Partitioning (wikipedia)
M. Kleppmann, Designing Data-Intensive Applications, chapter 5. Replication
M. Kleppmann, Designing Data-Intensive Applications, chapter 6. Partitioning

Partitioning and replication

“Partitioning is usually combined with replication so that copies of each partition are stored on multiple nodes. This means that, even though each record belongs to exactly one partition, it may still be stored on several different nodes for fault tolerance.”
M. Kleppmann, Designing Data-Intensive Applications, chapter 5. Replication

Can relational database scale vertically?

Scaling Your Amazon RDS Instance Vertically and Horizontally

Can relational database scale horizontally?

Replication:

Scaling Your Amazon RDS Instance Vertically and Horizontally
(note that the horizontal part is only about replication)
High Availability, Load Balancing, and Replication in PostgreSQL

Partitioning:

Sharding in Oracle Database
Limitation in Oracle Database sharding
Citus - an extension to PostgreSQL that adds partitioning
MySQL sharding approaches? (Stack Overflow)

Relational databases summary

Wide-column stores

Wide-column store (Wikipedia)
“Traditionally production systems store their state in relational databases. For many of the more common usage patterns of state persistence, however, a relational database is a solution that is far from ideal. (…) Although many advances have been made in the recent years, it is still not easy to scale-out databases or use smart partitioning schemes for load balancing.”
2007 Dynamo paper
2006 Bigtable paper

Wide-column store schema

Apache Cassandra stores data in tables, with each table consisting of rows and columns. CQL (Cassandra Query Language) is used to query the data stored in tables. Apache Cassandra data model is based around and optimized for querying. Cassandra does not support relational data modeling intended for relational databases.
Cassandra documentation - Introduction to Data Modeling
Storage model in Bigtable

db19

Partition key and sort key

Wide-column store schema with attributes

Know your access patterns

db23

Wide-column stores summary

Document databases

Document-oriented database (Wikipedia)

Documents

Documents in MongoDB

Collections

Aggregation framework

Replication and partitioning

ACID?

Document databases - ideal?

Mongo hits a sweet spot between the powerful queryability of a relational database and the distributed nature of other databases, like HBase. Project founder Dwight Merriman has said that MongoDB is the database he wishes he’d had at DoubleClick, where as the CTO he had to house large-scale data while still being able to satisfy ad hoc queries.
Luc Perkins, Seven Databases in Seven Weeks
Was MongoDB ever the right choice?
Everything You Know About MongoDB is Wrong!, Myth 5. Sharding

Document databases summary