Python Worker Queues/Thread Pools



[ 2005-June-27 12:27 ]

One of the most common designs for a parallel program is a work queue, also known as a thread pool. Work items are placed into a queue. A pool of worker threads processes items from the queue and places the results into an output queue, which allows them to be processed in a serial fashion. This structure is popular because it is simple to understand, permits the amount of parallelism to be scaled easily, and is easy to integrate into a serial program. It is also fairly easy to get the locking and synchronization implemented correctly, since the queues control the interaction between the threads. A diagram of this structure is shown below.
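The structure described above can be sketched with the standard library's queue and threading modules. This is a minimal illustration, not production code: the squaring "work", the thread count, and the sentinel-based shutdown are placeholder choices.

```python
import queue
import threading

def worker(in_q, out_q):
    """Pull items from the input queue, process them, push results."""
    while True:
        item = in_q.get()
        if item is None:  # sentinel: no more work for this thread
            break
        out_q.put(item * item)  # placeholder "work": square the number

in_q = queue.Queue()
out_q = queue.Queue()

NUM_WORKERS = 4
threads = [threading.Thread(target=worker, args=(in_q, out_q))
           for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

# Fill the input queue, then send one shutdown sentinel per worker.
for n in range(10):
    in_q.put(n)
for _ in threads:
    in_q.put(None)
for t in threads:
    t.join()

# Drain the output queue serially; completion order may differ from
# input order, so sort before comparing.
results = sorted(out_q.get() for _ in range(10))
```

Because both queues are synchronized, the worker function needs no explicit locking of its own; the queues are the only point of interaction between threads.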

Diagram of the work queue structure

Python has a synchronized Queue module which is designed for exactly this kind of parallel producer/consumer structure. Unfortunately, it does not solve the entire problem. My workqueue module provides an implementation of a work queue that handles the entire job, including spawning and shutting down Python threads. I've been using it to distribute tasks over a cluster of servers using the XML-RPC module. It is a pretty simple way to parallelize a Python program. Be warned, however, that Python's thread model is pretty strange: because of the global interpreter lock, threads can only execute concurrently while running C code that releases the lock, not Python code (aside: I can't seem to find any document that explains Python's threads, so maybe I need to write one).
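To show the kind of bookkeeping such a module has to take care of beyond what Queue provides, here is a minimal sketch of a pool class that spawns its workers, feeds them work, and shuts them down cleanly. The class and method names are hypothetical illustrations, not the actual workqueue API.

```python
import queue
import threading

class WorkQueue:
    """Minimal thread-pool sketch: spawn, feed, and shut down workers.

    Hypothetical interface for illustration; not the real workqueue module.
    """
    _SENTINEL = object()  # unique shutdown marker, never a valid work item

    def __init__(self, func, num_threads=4):
        self._in = queue.Ueue() if False else queue.Queue()
        self._out = queue.Queue()
        self._func = func
        self._threads = [threading.Thread(target=self._worker)
                         for _ in range(num_threads)]
        for t in self._threads:
            t.start()

    def _worker(self):
        # Loop until this thread receives the shutdown sentinel.
        while True:
            item = self._in.get()
            if item is self._SENTINEL:
                break
            self._out.put(self._func(item))

    def put(self, item):
        """Submit a work item."""
        self._in.put(item)

    def get(self):
        """Retrieve one result (blocks until a result is available)."""
        return self._out.get()

    def shutdown(self):
        """Send one sentinel per worker, then wait for them to exit."""
        for _ in self._threads:
            self._in.put(self._SENTINEL)
        for t in self._threads:
            t.join()

pool = WorkQueue(lambda x: x + 1, num_threads=2)
for n in range(5):
    pool.put(n)
results = sorted(pool.get() for _ in range(5))
pool.shutdown()
```

Encapsulating the thread lifecycle this way means the calling program never touches the threading module directly; it only puts work in and takes results out.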