threading issues with statcache

Tim Peters tim.one at home.com
Sat Jan 27 21:05:29 EST 2001


[posted & mailed]

[Randall Kern]
> Looking at the code for the statcache module from py 1.5.2, it
> looks like it isn't thread safe.

It is if you serialize all calls to it yourself <wink>.
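
One way to read "serialize all calls yourself" is to route every access through a single lock. The following is only a rough sketch under that assumption: the names _lock, locked_stat and locked_reset are hypothetical, the cache dict is a stand-in for the module's own, and the code is written for current Python rather than 1.5.2.

```python
import os
import threading

cache = {}                  # stand-in for statcache's module-level dict
_lock = threading.Lock()    # hypothetical: one lock guarding every cache access

def locked_stat(path):
    # The membership test, the read and the store all happen under the lock,
    # so no other thread can reset the cache in between them.
    with _lock:
        if path in cache:
            return cache[path]
        cache[path] = ret = os.stat(path)
        return ret

def locked_reset():
    # Takes the same lock, so it can never run in the middle of locked_stat.
    with _lock:
        cache.clear()
```

This only helps if nothing else touches the cache without taking the lock.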

> While writing my own substitute, I realized I am unclear
> on when names are looked up.

Every time they're (dynamically) referenced.  No exceptions (e.g., global,
local, builtin are all the same in this respect; call "len(x)" inside a
loop, and "len" is looked up anew on each iteration -- and so is "x", for
that matter).
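
A small illustration of that rule, written for current Python. The names shout, whisper, speak and run are made up for the example; the point is only that a global rebound in the middle of a loop is seen by the very next reference.

```python
def shout(s):
    return s.upper()

def whisper(s):
    return s.lower()

speak = shout    # a module-level name, looked up each time it is referenced

def run():
    global speak
    out = []
    for word in ["Spam", "Eggs"]:
        out.append(speak(word))   # "speak" is looked up anew on every pass
        speak = whisper           # rebinding is visible on the next iteration
    return out

result = run()   # ["SPAM", "eggs"]
```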

> In particular, given two functions like these (copied from statcache):
>
> cache = {}
> def stat(path):
>     if cache.has_key(path):
>         return cache[path]
>
>     cache[path] = ret = os.stat(path)
>     return ret
>
> def reset():
>     global cache
>     cache = {}
>
>
> If the symbol 'cache' is looked up _once_ per function,

It is not.

> then these two functions may be used across multiple threads.  If
> it is looked up for every reference, then it would be possible
> to call reset() between a TRUE has_key() and the return in stat(),
> which would result in a KeyError.

Yup!  Good eye.  Looks like statcache is full of insecurities like that.
Some of them are easy to fix; e.g.,

def stat(path):
    ret = cache.get(path, None)
    if ret is None:
        cache[path] = ret = os.stat(path)
    return ret

I'll try to make time to fix this stuff for 2.1a2 (btw, 1.5.2 is ancient --
move up to 2.0!  it's good practice for upgrading to 2.1, in which statcache
will be thread-safe <wink>).
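
The race can be reproduced without real threads by forcing the unlucky interleaving. This is a sketch for current Python, not the statcache source: the "between" parameter is a hypothetical hook standing in for "another thread runs here", and "path in cache" replaces has_key.

```python
import os

cache = {}

def reset():
    global cache
    cache = {}

def racy_stat(path, between=lambda: None):
    # Check, then act: "cache" is looked up twice, and the second lookup
    # may find a different (empty) dict.
    if path in cache:
        between()
        return cache[path]
    cache[path] = ret = os.stat(path)
    return ret

def safer_stat(path, between=lambda: None):
    # One lookup hands back the value itself, so a later reset cannot
    # take it away from this call.
    ret = cache.get(path)
    between()
    if ret is None:
        cache[path] = ret = os.stat(path)
    return ret

racy_stat(".")                        # populate the cache
try:
    racy_stat(".", between=reset)     # reset lands between the test and the read
    racy_failed = False
except KeyError:
    racy_failed = True

safer_stat(".")                       # populate again
safer_ok = safer_stat(".", between=reset) is not None
```

Here racy_failed ends up true and safer_ok ends up true: the get-based version holds on to the value it fetched, whatever happens to the cache afterwards.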

> When are global variables bound?

Sorry, don't think I understand this question.  Any vrbl, whether global or
local, is bound when and only when a binding stmt is executed in which the
vrbl appears as a binding target.  If it would make your life easier,
consider changing the body of reset to:

def reset():
    cache.clear()

Then nothing in statcache will ever rebind the name "cache" -- but you'd
still be vulnerable to all the same race conditions (i.e., the rebinding is
not the cause of the problems, it's that *content* may disappear between the
time one stmt thinks it exists and a later stmt *acts* on that belief).
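
To make that concrete, here is a sketch (current Python, hypothetical names racy_lookup and between) in which the name "cache" is never rebound, yet the check-then-act sequence still fails when the contents vanish in between.

```python
cache = {"spam": 1}
original = cache          # keep a second reference to show the binding never changes

def reset():
    cache.clear()         # empties the dict in place; "cache" is not rebound

def racy_lookup(key, between=lambda: None):
    # "between" stands in for another thread getting to run at this point.
    if key in cache:
        between()
        return cache[key]
    return None

try:
    racy_lookup("spam", between=reset)
    raised = False
except KeyError:
    raised = True

same_object = cache is original
```

Both raised and same_object come out true: the dict is the same object throughout, and the KeyError happens anyway.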

unraveling-the-thread-ly y'rs  - tim




