Dictionaries and Threads

Johannes Stezenbach yawyi at gmx.de
Sat May 27 08:28:32 EDT 2000


David Bolen <db3l at fitlinxx.com> wrote:
>aahz at netcom.com (Aahz Maruch) writes:
>> In article <3926A8EF.C3C1914B at jslove.net>, Jay Love  <jsliv at jslove.net> wrote:
>> >
>> >Are dictionary lookups threadsafe?
>> >
>> >ie, can I lookup and retrieve an item in a dictionary while another
>> >thread is adding an item?
>> 
>> In general yes, if you do it in single lines of code:
>
>Actually, I've been wondering about this sort of thing myself.  If I
>understand correctly, under normal execution the interpreter will
>relinquish control and permit a context switch every 'n' bytecode
>instructions (n is configurable, but defaults to something like 10).
>
>So is this suggestion just because the average line of code can be
>encoded in less than 10 bytecodes (but what if the first 9 bytecodes
>were used up in prior instructions) or is there some underlying
>protection going on with data access?

If I analysed the implementation of eval_code2() (in Python/ceval.c)
correctly, then
- thread switches can occur after every single byte code if
  PyThreadState.ticker has run to zero (ticker then is reset
  to the value provided by sys.setcheckinterval)
- there is no thread switch protection within a single line of Python code
- thread switches can also occur after signal handlers have been
  invoked, independent of sys.setcheckinterval
- setting sys.setcheckinterval temporarily to some high value in
  an attempt to protect the following lines from thread switches
  does not work, because the new checkinterval won't be used before
  the current PyThreadState.ticker has run to zero.
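
To illustrate the second point, here is a minimal sketch. It assumes a
current Python 3 rather than the interpreter discussed above, and the
function name bump() is made up for the example. A single source line that
updates a dictionary compiles to several bytecodes, so the line is not one
indivisible step:

```python
import dis

def bump(d, k):
    # One source line: read d[k] (or 0), add 1, store the result back.
    d[k] = d.get(k, 0) + 1

# Collect the opcode names the compiler produced for bump().
# The exact names and count differ between Python versions.
ops = [ins.opname for ins in dis.get_instructions(bump)]
print(len(ops), ops)
```

The exact opcodes vary by version, but the lookup and the store are always
separate instructions, so another thread may in principle run between them.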

Now to the question whether dict.get(key, default) is thread safe:

>>> def f(d, k):
...   return d.get(k, None)
... 
>>> import dis
>>> dis.dis(f)
          0 SET_LINENO          1

          3 SET_LINENO          2
          6 LOAD_FAST           0 (d)
          9 LOAD_ATTR           1 (get)
         12 LOAD_FAST           1 (k)
         15 LOAD_GLOBAL         3 (None)
         18 CALL_FUNCTION       2
         21 RETURN_VALUE   
         22 LOAD_CONST          0 (None)
         25 RETURN_VALUE   
>>> 

The critical operation is CALL_FUNCTION, a single opcode which calls
the builtin dict_get(), which in turn calls PyObject_Compare(), which
might call back into Python code for __cmp__ or __rcmp__, which might
allow other threads to run...
But this does not necessarily mean that this operation is unsafe,
it only means that dict.get() might return something which another
thread has just removed from the dictionary, which is OK in the
presence of a race condition.
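
The callback into Python code can be made visible with a small sketch.
Note the assumption: in current Python 3, __cmp__ and __rcmp__ no longer
exist and dictionary lookups use __hash__ and __eq__ instead, so the
example uses those. The class name Key and the list calls are invented
for the illustration.

```python
calls = []  # records which special methods the dict lookup invokes

class Key:
    """Hypothetical key type whose hashing and comparison are Python code."""

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        calls.append("hash")
        return hash(self.value)

    def __eq__(self, other):
        calls.append("eq")
        return isinstance(other, Key) and self.value == other.value

d = {Key(1): "one"}
calls.clear()                  # ignore the hash call made while building d

# Lookup with an equal but distinct key object: the dict hashes the new
# key, finds a stored key with the same hash, and compares the two.
result = d.get(Key(1), None)
print(result, calls)
```

Both methods run as ordinary Python code in the middle of the single
d.get() call, which is where another thread could get a chance to run.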

Conclusion:
While all these details about Python thread safety are interesting,
they are IMHO irrelevant for Python programmers. If code like:
>>> if dict.has_key(key):
...     return dict[key]
can fail with a KeyError then you need to use semaphores to protect
dict. Or better, change your algorithm / program structure
so this can't happen.
In no case will you get core dumps from Python because of thread
safety violations.
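
As a rough sketch of the first option, assuming the standard threading
module of a current Python 3 (the class LockedDict, its method names and
the demo() function are made up for this example, and a plain Lock stands
in for a semaphore): every access takes the same lock, so the membership
test and the lookup cannot be separated by another thread's delete.

```python
import threading

class LockedDict:
    """Hypothetical wrapper: one lock guards every access to the dict."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def pop(self, key):
        with self._lock:
            return self._data.pop(key, None)

    def get_if_present(self, key, default=None):
        # The check and the lookup happen under the same lock, so no
        # other thread can remove the key in between.
        with self._lock:
            if key in self._data:
                return self._data[key]
            return default

def demo(rounds=2000):
    """Run one writer and one reader thread; return any KeyErrors seen."""
    shared = LockedDict()
    errors = []

    def writer():
        for i in range(rounds):
            shared.set("k", i)
            shared.pop("k")

    def reader():
        for _ in range(rounds):
            try:
                shared.get_if_present("k")
            except KeyError as exc:
                errors.append(exc)

    threads = [threading.Thread(target=writer),
               threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors

print(demo())
```

This only helps if every thread goes through the wrapper. Code that
touches the underlying dictionary directly bypasses the lock.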

Johannes



