Databases should encrypt unique IDs (evanjones.ca)

[ 2013-May-28 11:54 ]

Traditionally, SQL databases provide auto-increment unique ids or sequences. This means each table's ids follow the sequence 1, 2, 3, …. This has one big advantage: the database uses efficient "appends" to B-Tree indexes (amortized O(1) instead of O(log_b(N)), since you can avoid the search. It also keeps nodes full). However, it has a few disadvantages. First, id sequences are repeated across tables, so it is easy to use the wrong id in your code (e.g. accidentally use a user id to do a lookup in the photos table). Second, identifiers are guessable, so if you are missing an access control check, users can just change the parameter to get access to someone else's data (called insecure direct object references or authorization bypass through user-controlled key). Finally, third-parties can learn information about your system by observing the ids (e.g. you can estimate the number of accounts on Github by looking up the id of an account you just created, or learn the time a MongoDB item was created). One way to make these errors more difficult is to use random ids, which need to be much larger than sequential ids (e.g. 128-bits/16 bytes), and require slow random inserts. A second solution is to make IDs unique for each user, which requires writing application-level code. I think the best solution would be for databases to automatically encrypt ids with a unique key per table.

You could implement this at the application level: Create a table in your database to store keys. For each table in your database, generate a unique 128-bit AES key. Each time you retrieve an object from that table, encrypt the id with the key (which outputs a pseudo-random integer of the same length). When you do a query on the id column, decrypt the id value(s). This provides many of the same benefits of random ids, but allows you to use smaller values, and the underlying database still uses efficient sequential ids. However, the best part is that databases could hide this from the application, so new applications would use pseudo-random unique ids. You could even introducing this change in a backwards-compatible fashion to existing applications, although it would require some tricks with id ranges.

A security-paranoid variant would be to generate unique keys for each session. This would ensure that ids can't be accidentally re-used across sessions. However, this requires thinking about the "lifetime" or "scope" for ids that are returned to external systems, so it can't be done without an application developer thinking carefully about it. This is basically what some experts recommend, and is effectively equivalent to using what are sometimes called "indirect reference maps."