Trouble with transactions


[ 2013-May-13 07:23 ]

I spent my PhD at MIT researching high-performance distributed databases that support transactions. I am interested in this subject because transactions make it easier to write correct programs, thanks to two nice properties. First, either an entire set of related changes is applied or none of them are, which simplifies error handling. Second, they allow you to pretend that concurrent updates never happen, and the database sorts it out for you. My personal theory is that most web applications are riddled with concurrency bugs that transactions would prevent, but because concurrent conflicting updates are so rare in the real world, no one notices. (I would love to see some concrete evidence to support or disprove this theory, if anyone wants a research project.)
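To make the first property concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names in it, and the make_db, transfer and balances helpers are all made up for illustration; they are not from any real application. The idea is only that if anything fails partway through a set of related changes, the whole set is rolled back.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as one all-or-nothing change."""
    # Used as a context manager, the connection commits if the block
    # finishes normally and rolls back if the block raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            # The debit above has already run, but raising here undoes it.
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "alice", "bob", 500) on the database from make_db() raises ValueError, and the debit that had already been applied to alice is undone, so the error handler does not need to repair anything by hand. This sketch says nothing about the second property: an in-memory SQLite database with a single connection has no concurrent writers to isolate.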

However, since transitioning from building databases to using them, I've learned that transactions can actually cause problems, even if you ignore potential performance and scalability issues. I gave a lightning talk at Ricon East 2013 with my rough thoughts on this subject (PDF slides) (video, but there were technical difficulties with the slides). I would love to hear opinions about using transactions in real applications (both problems and advantages), so I can flesh this out into a full-length, intelligent article. In brief, the problems caused by transactions that we have run into are:

My rough conclusion: Transactions are useful and do simplify programs. However, they don't eliminate the need to think, and you need to be careful about how you use them. Perhaps most importantly, I'm starting to think that most database APIs could be improved to avoid these pitfalls and make transactions easier to use, or at least harder to use incorrectly. Any feedback is welcome.