Weak Isolation in Relational Databases



[ 2009-December-10 09:18 ]

The ACID model is a key feature of traditional relational database systems. The I stands for Isolation, meaning that transactions cannot see intermediate results from other transactions. The traditional definition of isolation is serializability, where the results of processing a set of transactions are equivalent to executing them one at a time in some order. Effectively, each transaction can pretend it is executing by itself, when in reality multiple transactions run concurrently. This is very useful, since applications do not need to worry about concurrency, making it much easier to write correct code. However, most database systems do not actually provide this model. Many systems, notably Postgres and Oracle, provide Snapshot Isolation, which is weaker. The primary difference is that snapshot isolation only detects write/write conflicts, while serializability also prevents read/write conflicts. Even worse, MySQL with InnoDB defaults to what it calls the REPEATABLE READ isolation level, yet it permits reads that are not repeatable. Similarly, Postgres defaults to the READ COMMITTED isolation level. The good news is that if you explicitly tell MySQL to use the SERIALIZABLE isolation level, it really is serializable. This article provides a few concrete examples of where these isolation levels differ.

Example Database

I will demonstrate the differences using both MySQL with InnoDB and Postgres. The database is composed of a single table (table00) with two integer columns (id, counter). The SQL used to create the table is: CREATE TABLE table00 (id INTEGER PRIMARY KEY, counter INTEGER);

Snapshot Isolation Write Skew

The classic difference between serializable and snapshot isolation occurs with read/write conflicts. Consider the example table with values (id = 0, counter = 1), (id = 1, counter = 1). Imagine the application has a rule that at least one counter must be non-zero. Two concurrent transactions each try to set one of the counters to zero, if the rule permits:

Connection 1                                   Connection 2
BEGIN;                                         BEGIN;
SELECT * FROM table00;                         SELECT * FROM table00;
UPDATE table00 SET counter = 0 WHERE id = 0;   UPDATE table00 SET counter = 0 WHERE id = 1;
COMMIT;                                        COMMIT;

Both connections see two non-zero counter values, so both decide they can set their own counter to zero. Under snapshot isolation, such as Postgres using the SERIALIZABLE isolation level, the above sequence of operations is permitted and results in both counters being zero, violating the application's rule. This anomaly is called write skew. Serializable isolation prevents it, either by blocking the updates because the reads hold locks, or by aborting the "incorrectly" ordered transaction. Locking, which MySQL uses at the SERIALIZABLE isolation level, causes a deadlock to be detected in this example, aborting one transaction.
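
The write skew interleaving can be modeled outside the database. The following is a toy Python sketch of snapshot isolation semantics, not real database code: each transaction reads from a private snapshot taken at BEGIN, and only write/write conflicts would cause an abort.

```python
# Toy model of snapshot isolation: each transaction reads a snapshot
# taken at BEGIN; at COMMIT, only write/write conflicts abort.
committed = {0: 1, 1: 1}  # id -> counter, both non-zero initially

def run_transaction(snapshot, my_id):
    """If the rule 'at least one counter is non-zero' holds in the
    snapshot, set this transaction's own counter to zero."""
    writes = {}
    if sum(snapshot.values()) > 0:
        writes[my_id] = 0
    return writes

# Both transactions BEGIN before either commits: they share a snapshot.
snapshot = dict(committed)
writes1 = run_transaction(snapshot, 0)
writes2 = run_transaction(snapshot, 1)

# The first committer wins; the second aborts only if it wrote the
# same rows. Here the write sets are disjoint, so both commit.
committed.update(writes1)
if not (writes1.keys() & writes2.keys()):  # no write/write conflict
    committed.update(writes2)

print(committed)  # {0: 0, 1: 0}: both counters zero, i.e. write skew
```

Because each transaction checked the rule against a snapshot that did not include the other's write, the invariant is silently violated.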

Snapshot isolation does detect write/write conflicts. If both updates above were for id = 0, the database would detect the conflict and abort one of the transactions (MySQL with REPEATABLE READ will block instead). However, write/write detection alone is not enough for applications that rely on invariants across multiple rows. As another example, consider a transaction that counts the number of rows in a table, then inserts a row with that value. With serializable semantics, the table will contain incrementing values (eg. 0, 1, 2, ...). With snapshot isolation, some values can be duplicated and some skipped (eg. 0, 1, 1, 3, 3, 5, ...). One way to resolve this problem is to use SELECT FOR UPDATE where needed, which acquires write locks on the accessed rows. A good explanation of snapshot isolation anomalies can be found in A Read-Only Transaction Anomaly Under Snapshot Isolation (PDF).
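
The COUNT-then-INSERT example can be sketched with the same kind of toy snapshot model (again a simulation, not database code): two concurrent transactions both count the rows in their snapshot and insert that count, producing a duplicate.

```python
# Toy snapshot model of a COUNT-then-INSERT transaction. Each
# transaction counts rows in its own snapshot and inserts that count
# as a new value; under snapshot isolation neither sees the other's
# uncommitted insert.
table = [0, 1]  # values produced serially so far

# Both transactions take their snapshot before either commits.
snap1 = list(table)
snap2 = list(table)

# Each computes its insert from its snapshot: both count 2 rows.
value1 = len(snap1)
value2 = len(snap2)

# The inserts create different rows, so there is no write/write
# conflict and snapshot isolation permits both commits.
table.append(value1)
table.append(value2)

print(table)  # [0, 1, 2, 2]: the value 2 is duplicated, 3 is skipped
```

Under serializable execution the second transaction would have counted three rows and inserted 3 instead.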

InnoDB Repeatable Read: No Phantoms

By default, MySQL with InnoDB runs transactions at the REPEATABLE READ isolation level (see the InnoDB manual). This is supposed to mean that a transaction only reads committed values, and once it has read a value, future reads will see the same value. However, the actual implementation seems equivalent to Postgres's default READ COMMITTED isolation level, so this discussion applies to both.

According to the SQL standard, this isolation level permits phantom reads. However, with InnoDB, phantom reads do not actually occur. For example, executing the following transactions does what you would expect:

Connection 1                            Connection 2
BEGIN;                                  BEGIN;
SELECT * FROM table00 WHERE id < 100;
                                        INSERT INTO table00 VALUES (98, 0), (99, 1);
SELECT * FROM table00 WHERE id < 100;
                                        COMMIT;
SELECT * FROM table00 WHERE id < 100;
COMMIT;
SELECT * FROM table00 WHERE id < 100;

In this case, the first connection always sees the original state of the database. After both transactions commit, the inserted rows are visible. This is stronger than required for REPEATABLE READ, which permits the third SELECT to return the newly inserted rows.
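
This behavior can be modeled the same way: plain SELECTs in InnoDB's REPEATABLE READ read from a snapshot fixed at the transaction's first read, so later inserts by other transactions never appear. A toy Python sketch (not database code):

```python
# Toy model of InnoDB's consistent read: all of a transaction's plain
# SELECTs read from a snapshot taken at its first read, so rows
# committed by other transactions afterwards never appear (no phantoms).
committed = {}  # id -> counter

def select_lt_100(data):
    return sorted(i for i in data if i < 100)

snapshot = dict(committed)        # connection 1's first SELECT fixes it
first = select_lt_100(snapshot)   # sees no rows

committed[98] = 0                 # connection 2 inserts...
committed[99] = 1                 # ...and commits

inside = select_lt_100(snapshot)          # still inside connection 1
after = select_lt_100(dict(committed))    # a new transaction afterwards

print(first, inside, after)  # [] [] [98, 99]
```

The inserted rows only become visible to a transaction that starts after the insert commits, matching the transcript above.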

Write Skew

In the default REPEATABLE READ isolation level, InnoDB does not detect read/write conflicts. This is the same problem as the snapshot isolation write skew described above. However, MySQL takes a unique approach to resolving write/write conflicts. Traditional snapshot isolation systems (eg. Postgres and Oracle) will abort the second writer once the first writer commits, as described above. InnoDB instead performs the update on the most recent version of the row. This can lead to surprising results. Consider a transaction that implements an increment by first reading the value with a SELECT, computing the new value, then writing it with an UPDATE. This works with traditional snapshot isolation, since a conflicting UPDATE will be aborted. With InnoDB, this can "miss" increments, because the UPDATE will happily overwrite the most recent value. As before, using SELECT FOR UPDATE can help avoid these problems. If MySQL is set to use SERIALIZABLE isolation (eg. by executing SET TRANSACTION ISOLATION LEVEL SERIALIZABLE), conflicting read/write accesses cause transactions to block.
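
The lost increment can be sketched as a toy model of InnoDB's behavior (a simulation only): each transaction reads the counter, computes the new value, and writes it, and the second UPDATE simply overwrites the latest committed row instead of aborting.

```python
# Toy model of a SELECT-compute-UPDATE increment under InnoDB
# REPEATABLE READ: plain SELECTs read the transaction snapshot, but
# the UPDATE is applied to the latest committed row without aborting,
# so one increment is silently lost.
latest = 0        # committed counter value

# Two increment transactions both read before either writes.
read1 = latest    # snapshot read in transaction 1
read2 = latest    # snapshot read in transaction 2

latest = read1 + 1   # transaction 1's UPDATE, then COMMIT
latest = read2 + 1   # transaction 2's UPDATE overwrites: no abort

print(latest)  # 1, not 2: one increment was lost
```

A traditional snapshot isolation system would abort the second transaction here, since its write conflicts with a write committed after its snapshot; InnoDB commits both.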

Updates See Latest Committed Data

The strangest anomaly with InnoDB is that reads performed as part of an update always see the most recently committed version. This applies when an update contains a read, such as SET counter = counter + 1, and when using a SELECT FOR UPDATE statement. This means that reads are not repeatable, which the SQL specification does not permit at this isolation level:

Connection 1                                   Connection 2
BEGIN;                                         BEGIN;
SELECT * FROM table00 WHERE id = 0;            SELECT * FROM table00 WHERE id = 0;
UPDATE table00 SET counter = 1 WHERE id = 0;
                                               SELECT * FROM table00 WHERE id = 0;
COMMIT;
                                               SELECT * FROM table00 WHERE id = 0;
                                               SELECT * FROM table00 WHERE id = 0 FOR UPDATE;
                                               SELECT * FROM table00 WHERE id = 0;
                                               COMMIT;

In this case, connection 2 sees the following sequence of counter values: 0, 0, 0, 1, 0. Hence, the reads performed as part of the SELECT FOR UPDATE violate the requirements of the repeatable read isolation level. However, once a write lock is obtained as part of an update, the transaction will always see the same value. Other transactions attempting to write the same row will block.

Postgres Read Committed

Postgres supports two isolation modes: READ COMMITTED and SERIALIZABLE. By default, it uses the read committed mode. In this mode, each statement sees a consistent snapshot of the database as of the start of that statement. This means that within a single transaction, each read sees the most recently committed value. Updates block waiting for other writers, then are evaluated against the most recent value. This works very similarly to InnoDB's REPEATABLE READ mode.
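
One consequence of re-evaluating updates against the latest committed row is that an in-place increment like UPDATE table00 SET counter = counter + 1 does not lose updates under READ COMMITTED. A toy Python sketch of this behavior (a simulation, not database code):

```python
# Toy model of READ COMMITTED: each statement sees the latest committed
# data, and an UPDATE ... SET counter = counter + 1 waits for any
# concurrent writer, then re-reads the newest committed value.
latest = 0

# Two concurrent "UPDATE table00 SET counter = counter + 1" statements.
# The second blocks until the first commits, then sees its result.
latest = latest + 1   # first UPDATE commits: counter = 1
latest = latest + 1   # second UPDATE re-reads the latest value: 2

print(latest)  # 2: in-place increments are not lost under READ COMMITTED
```

Contrast this with the SELECT-compute-UPDATE pattern above, which writes a stale constant and so can still lose increments at this level.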

Conclusions

Concurrency is hard. The serializable model of transaction isolation makes it easier, because it hides much of the complexity. Unfortunately, many database systems do not provide this level of isolation by default. MySQL defaults to REPEATABLE READ, which does not actually have repeatable reads, and Postgres defaults to READ COMMITTED. I think this is a problem. Applications should explicitly use the stronger SERIALIZABLE isolation mode by default, since it makes it easier to understand the system and to write correct code. Weaker isolation levels are useful for performance in certain circumstances, but they need to be used carefully, because they can produce incorrect results. This is a tricky issue: without explicitly testing concurrent operations, an application can seem correct, only to produce occasional errors in production.