A Quick Guide to Multithreading with Queue in Python


As described at the beginning of [1]: "The Queue module implements multi-producer, multi-consumer queues. It is especially useful in threaded programming when information must be exchanged safely between multiple threads. The Queue class in this module implements all the required locking semantics." Therefore, in my opinion, implementing multi-threading with Queue is quite handy and approachable for beginners.
I strongly recommend references [2][3] for a quick, hands-on introduction to this topic. Note that since Python 2.4 the threading module is generally recommended for multi-threaded programming over the older thread module. Don't get confused when you're searching the web for information.

Also note that there's a well-known Global Interpreter Lock (GIL) issue in CPython [4]. We'll discuss this later.

Multi-threading with Queue in Python

1. Producer-Consumer Model

Say you want to parallelize a piece of code with multiple threads. Briefly speaking, the big idea is as follows:

a) We create some workers (consumers/threads), which wait for the tasks to arrive.
b) We build a queue, through which the main thread (producer) "feeds" the tasks to the workers.
c) The workers just keep grabbing tasks one by one from the queue.
d) The main thread waits until every task in the queue has been processed.

Let's first see what an example might look like, then I'll explain how it works. The script below creates 2 threads whose task is simply to print out index numbers.
(Note: a similar example can be found in [1])
import threading, Queue

def foo(queue):
    while True:
        (argument) = queue.get()
        print argument
        queue.task_done()

numThreads = 2
myQueue = Queue.Queue() # build a queue

for i in range(numThreads): # create some workers (consumers/threads) and wait for the tasks to arrive
    worker = threading.Thread(target=foo, args=(myQueue,))
    worker.daemon = True
    worker.start()

# main thread (producer) "feeds" the tasks to the workers through the queue
for index in range(5):
    myQueue.put(index)

# main thread waits until every task in the queue has been processed
myQueue.join()
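
If you are on Python 3, the same example needs two small changes: the module is named queue (lowercase) and print is a function. Below is a minimal sketch of that port. The import fallback is an addition so that the sketch runs on either version, and the results list with its lock is a hypothetical extra so that the outcome can be inspected after join() returns; neither is part of the original script.

```python
import threading

try:
    import queue            # Python 3 module name
except ImportError:
    import Queue as queue   # Python 2 module name

results = []                      # hypothetical: records what the workers processed
results_lock = threading.Lock()   # protects the shared list

def foo(q):
    while True:
        argument = q.get()        # blocks until a task is available
        print(argument)
        with results_lock:
            results.append(argument)
        q.task_done()             # tell the queue this task is finished

num_threads = 2
my_queue = queue.Queue()          # build a queue

for _ in range(num_threads):      # create the workers
    worker = threading.Thread(target=foo, args=(my_queue,))
    worker.daemon = True
    worker.start()

for index in range(5):            # the main thread feeds the tasks
    my_queue.put(index)

my_queue.join()                   # returns once task_done() was called for every item
```

The order in which the numbers are printed may vary between runs, since the two workers run concurrently.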

2. Define the task

The difference between an ordinary Python function and a function written for multi-threading with a queue is that the latter takes just the queue as its argument, and then, inside the function, extracts the arguments it needs from that queue:
def foo(queue):
    while True:
        (argument) = queue.get()
After each task is finished, don't forget to notify the queue that the task is done, so that queue.join() can tell when everything has been processed:
        queue.task_done()
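
When a task needs more than one input, a common approach is to put a tuple on the queue and unpack it inside the worker. The sketch below assumes a hypothetical task that adds two numbers; the names add_worker, tasks, and sums are made up for illustration. Wrapping the work in try/finally is an optional safeguard so that task_done() is still called for an item even if processing it raises.

```python
import threading

try:
    import queue            # Python 3 module name
except ImportError:
    import Queue as queue   # Python 2 module name

sums = {}                       # hypothetical place to store the results
sums_lock = threading.Lock()    # protects the shared dict

def add_worker(q):
    while True:
        a, b = q.get()          # each task is a tuple of arguments
        try:
            with sums_lock:
                sums[(a, b)] = a + b
        finally:
            q.task_done()       # always mark the item as processed

tasks = queue.Queue()
worker = threading.Thread(target=add_worker, args=(tasks,))
worker.daemon = True
worker.start()

for pair in [(1, 2), (3, 4), (10, 20)]:
    tasks.put(pair)

tasks.join()
```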

3. Create workers

We create a thread with threading.Thread, passing two parameters: target and args. target is the function the thread is going to execute, and args is the tuple of arguments passed to that function. As we said previously, the input argument is now simply a queue. You can imagine a worker sitting in front of a tube, just waiting for us (the main thread) to feed the real inputs in.
    worker = threading.Thread(target=foo, args=(myQueue,))
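Now it is important to mark this worker as a daemon thread. The main behavioral difference between daemon and non-daemon threads is that the former are terminated when the program exits (that is, once only daemon threads are left), while the latter keep the program alive until they finish [5]. Since foo loops forever, a non-daemon worker would prevent the script from ever exiting.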
    worker.daemon = True
Of course, don't forget to start the thread:
    worker.start()
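
To see the effect of these settings, the sketch below starts a few workers and inspects them after all the tasks have been processed. The helper name start_workers is hypothetical. Note that the workers are still alive (blocked in get()) after join() returns; because they are daemon threads, they do not keep the interpreter from exiting when the main thread ends.

```python
import threading

try:
    import queue            # Python 3 module name
except ImportError:
    import Queue as queue   # Python 2 module name

def foo(q):
    while True:
        q.get()
        q.task_done()

def start_workers(q, count):
    """Create, daemonize, and start `count` worker threads (hypothetical helper)."""
    workers = []
    for _ in range(count):
        worker = threading.Thread(target=foo, args=(q,))
        worker.daemon = True      # must be set before start()
        worker.start()
        workers.append(worker)
    return workers

my_queue = queue.Queue()
workers = start_workers(my_queue, 3)

for index in range(10):
    my_queue.put(index)
my_queue.join()

# Every task is done, but the workers are still waiting for more input.
for worker in workers:
    print("%s daemon=%s alive=%s" % (worker.name, worker.daemon, worker.is_alive()))
```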

4. Queue and Wait

After everything's ready, now we just feed the arguments into the queue:
myQueue.put((......))
and then wait until all the tasks have been processed by the threads:
myQueue.join()
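
If the tasks produce values, one common way to get them back to the main thread is a second queue. This is a sketch rather than part of the original example: the squaring task and the names square_worker, in_queue, and out_queue are assumptions for illustration. Since join() returns only after task_done() has been called for every item, all the results are already sitting in out_queue at that point.

```python
import threading

try:
    import queue            # Python 3 module name
except ImportError:
    import Queue as queue   # Python 2 module name

def square_worker(in_queue, out_queue):
    while True:
        n = in_queue.get()
        out_queue.put((n, n * n))   # hand the result back before marking the task done
        in_queue.task_done()

in_queue = queue.Queue()
out_queue = queue.Queue()

for _ in range(2):
    worker = threading.Thread(target=square_worker, args=(in_queue, out_queue))
    worker.daemon = True
    worker.start()

for n in range(5):
    in_queue.put(n)

in_queue.join()                     # every task has been processed

squares = {}
while not out_queue.empty():        # safe here: no worker is producing results anymore
    n, n_squared = out_queue.get()
    squares[n] = n_squared
```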

Discussion

Stopping a thread

Killing or terminating a thread in Python is neither recommended nor as simple as you might expect [6]. It is still possible, though, to stop a thread by passing a threading.Event() flag to the task function in addition to the queue, and setting the flag when needed [7]:
def foo(queue, stop_event):
    while(not stop_event.is_set()):
        ......
stop_event = threading.Event()
worker = threading.Thread(target=foo, args=(myQueue, stop_event))
......
stop_event.set()
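Note, however, that the flag is only checked at the top of the loop. Assuming the loop body blocks in queue.get() as before, a worker that is waiting on an empty queue when the flag is set does not exit right away; if you later start sending inputs into that queue again, the "stopped" thread will be awakened and will execute a task before it notices the flag.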
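
A worker that is blocked in get() cannot check the flag. One way to make it notice the flag promptly is to call get() with a timeout, so that it wakes up periodically and re-checks the event. The following is a minimal sketch under that assumption; the 0.1 second timeout and the processed list are arbitrary choices for illustration.

```python
import threading

try:
    import queue            # Python 3 module name
except ImportError:
    import Queue as queue   # Python 2 module name

processed = []              # hypothetical: records the handled items

def foo(q, stop_event):
    while not stop_event.is_set():
        try:
            argument = q.get(timeout=0.1)   # wake up regularly to re-check the flag
        except queue.Empty:
            continue
        processed.append(argument)
        q.task_done()

my_queue = queue.Queue()
stop_event = threading.Event()
worker = threading.Thread(target=foo, args=(my_queue, stop_event))
worker.daemon = True
worker.start()

for index in range(3):
    my_queue.put(index)
my_queue.join()             # wait for the queued tasks to finish

stop_event.set()            # ask the worker to stop
worker.join(2)              # wait (up to 2 seconds) for it to exit
```

After the last line, worker.is_alive() can be used to check whether the thread has really exited.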

Global Interpreter Lock (GIL)

The GIL is a semaphore-like mechanism in CPython. Although it keeps the interpreter itself thread-safe, its signaling overhead can actually make programs run slower with multiple threads than with a single thread, and slower on a multi-core machine than on a single-core one, especially for CPU-bound programs [8].

So the bottom line is: if you're just working with I/O-bound programs and you're not writing C extensions, then you should be fine [4]; otherwise you might need to work around the GIL (for example by releasing it inside a C extension), or just switch to the multiprocessing module [9].
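
A rough way to observe this on your own machine is to time a CPU-bound function run serially and then in two threads. This is only a sketch: the count_down function and the loop size are arbitrary, and the timings depend on the Python version, the interpreter build, and the hardware, so no particular numbers are claimed here.

```python
import threading
import time

def count_down(n):
    # Pure-Python, CPU-bound loop (no I/O)
    while n > 0:
        n -= 1

N = 2000000   # arbitrary amount of work per call

# Serial: run the work twice in the main thread
start = time.time()
count_down(N)
count_down(N)
serial_seconds = time.time() - start

# Threaded: run the same two pieces of work in two threads
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded_seconds = time.time() - start

print("serial:   %.3f s" % serial_seconds)
print("threaded: %.3f s" % threaded_seconds)
```

On a standard CPython build, the threaded run is typically not faster than the serial one for this kind of workload [8].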

Related posts

There are lots of examples on the web in which people wrap the thread object in yet another self-defined thread class; see [10][11] for examples. Also see [12] for a more detailed explanation of multi-threading and multi-processing in Python.
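
For comparison, the subclassing style used in such posts looks roughly like the sketch below. The class name QueueWorker and its seen list are hypothetical; the only requirements from the threading module are calling the base-class __init__ and overriding run().

```python
import threading

try:
    import queue            # Python 3 module name
except ImportError:
    import Queue as queue   # Python 2 module name

class QueueWorker(threading.Thread):
    """A worker thread that keeps a reference to the task queue."""

    def __init__(self, q):
        threading.Thread.__init__(self)
        self.daemon = True
        self.q = q
        self.seen = []          # hypothetical: remembers the processed items

    def run(self):              # executed in the new thread after start()
        while True:
            item = self.q.get()
            self.seen.append(item)
            self.q.task_done()

my_queue = queue.Queue()
worker = QueueWorker(my_queue)
worker.start()

for index in range(5):
    my_queue.put(index)
my_queue.join()
```

Functionally this is the same producer-consumer setup as before; the class just bundles the queue and the loop together.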

References

[1] 8.10. Queue — A synchronized queue class
https://docs.python.org/2/library/queue.html
[2] threading – Manage concurrent threads
http://pymotw.com/2/threading/
[3] Queue – A thread-safe FIFO implementation
http://pymotw.com/2/Queue/
[4] What is a global interpreter lock (GIL)?
http://stackoverflow.com/questions/1294382/what-is-a-global-interpreter-lock-gil
[5] multithreading - Python thread daemon property
http://stackoverflow.com/questions/4330111/python-thread-daemon-property
[6] Is there any way to kill a Thread in Python?
http://stackoverflow.com/questions/323972/is-there-any-way-to-kill-a-thread-in-python
[7] python - Stopping a thread after a certain amount of time
http://stackoverflow.com/questions/6524459/stopping-a-thread-after-a-certain-amount-of-time
[8] Inside the Python GIL, by David Beazley
video: https://www.youtube.com/watch?v=ph374fJqFPE
slides: http://www.dabeaz.com/python/GIL.pdf
[9] 16.6. multiprocessing — Process-based “threading” interface
https://docs.python.org/2/library/multiprocessing.html
[10] Practical threaded programming with Python
http://www.ibm.com/developerworks/aix/library/au-threadingpython/
[11] Python Multithreaded Programming
http://www.tutorialspoint.com/python/python_multithreading.htm
[12] Concurrency in Python by Mosky
https://speakerdeck.com/mosky/concurrency-in-python
