Thread-Safe Random Numbers (evanjones.ca)

[ 2009-February-11 21:36 ]

The C standard library function rand is not recommended to be used because older implementations were not very random. The newer function random is typically a better choice, particularly since you can call setstate to give it a large state array, which helps to generate better random numbers. However, neither of these are guaranteed to be thread-safe. That is, if you call them simultaneously from multiple threads, your random numbers may not be very random. In particular, Mac OS X and FreeBSD exhibit this behavior, as shown here by a simple test program (FreeBSD implementation). On Linux, rand and random acquire a mutex to ensure this doesn't happen (Linux implementation). This has its own disadvantage: it means that random could be a bottleneck for a multi-threaded program.

To avoid these issues, use a random number generator that allows passing in the state. This way, you can keep one state value per thread, and generate good quality random numbers without requiring synchronization. However, the C standard library does not provide any choices. The most portable choice is to use rand_r. However, this only maintains an unsigned int worth of state, which is not sufficient for "good" random numbers. On Linux, glibc provides random_r, which is ideal as it lets you use a large per-thread state. However, Mac OS X and FreeBSD do not provide it. Another alternative is to use nrand48, which exists on all the modern Unix platforms that I have tried (Solaris, Mac OS X, FreeBSD, Linux). While it only keeps 48-bits of state, that is probably good enough for most applications.

To test the thread safety of the random number generators, I wrote a small test program which spawns a number of child threads, each of which seed and generate a set of numbers. It then compares all the numbers that are generated and counts the number of times a match is found. For the parameters I used, a perfect random number generator would expect to find 3.5 matches (p = 4.65661e-10). The table below shows the actual number of matches, and the measured probability.

Random function	Linux	Mac OS X
`random`	3 (p = 3.99138e-10)	257 (p = 3.41928e-08)
`rand`	5 (p = 6.66523e-10)	37622 (p = 5.00546e-06)
`rand_r`	0 (p = 0)	0 (p = 0)
`nrand48`	6 (p = 7.98276e-10)	2 (p = 2.66092e-10)

An interesting note is that on both Mac OS X and Linux, rand_r produces zero matches very reliably, even though it should produce an average of 3.5. I think this is due to a combination of the way I am seeding the generator and the algorithm that is used. Either way, it seems to suggest that you should not use rand_r. My conclusion: It seems like nrand48 is ideal for my application.

Unfortunately, nrand48 is generally not completely thread-safe. If one thread calls lcong48 in order to change the generator parameters, this update affects nrand48 which could be unsafe. This is easy enough to avoid in smaller applications. However, in glibc's implementation, the first call to nrand48 initializes the parameters in a unsafe way (see drand48-iter.c). I believe that despite the race between initializing and reading the generator parameters, this should be harmless. At worst, your first few numbers will not be as "random" as they should. Eventually all processors should see the initialized value, and then will function correctly. On FreeBSD and Mac OS X (it uses FreeBSD's C library), nrand48 seems safe, unless you call lcong48 simultaneously (see _rand48.c). However, the Solaris implementation acquires a lock, and thus it should not be used in multi-threaded programs (see drand48.c). I will continue to use nrand48 in small test programs, but it seems that the best solution for truly portable applications is to include your own random number generator implementation, and not rely on the system's C library.

Test program: rand_thread.c