[issue24020] threading.local() must be run at module level (doc improvement)
New submission from Ethan Furman:

In order to work correctly, threading.local() must be run in global scope, yet that tidbit is missing from both the docs and the _threading_local.py file.

Something like:

    .. note::
       threading.local() must be run at global scope to function properly.

That would have saved me hours of time. Thank goodness for SO! ;)

----------
assignee: docs@python
messages: 241713
nosy: docs@python, ethan.furman
priority: normal
severity: normal
status: open
title: threading.local() must be run at module level (doc improvement)
versions: Python 2.7, Python 3.4, Python 3.5

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue24020>
_______________________________________
eryksun added the comment:

Could you clarify what the problem is? I have no apparent problem using threading.local in a function scope:

    import threading

    def f():
        tlocal = threading.local()
        tlocal.x = 0
        def g():
            tlocal.x = 1
            print('tlocal.x in g:', tlocal.x)
        t = threading.Thread(target=g)
        t.start()
        t.join()
        print('tlocal.x in f:', tlocal.x)

    >>> f()
    tlocal.x in g: 1
    tlocal.x in f: 0

----------
nosy: +eryksun
Raymond Hettinger added the comment:

Also, don't use a ".. note::", regular sentences work fine, especially in documentation that is already very short.

----------
nosy: +rhettinger
Ethan Furman added the comment:

Raymond, okay, thanks.

Eryksun, I've written a FUSE file system (for $DAYJOB) and when I switched over to using threads I would occasionally experience errors such as 'thread.local object does not have attribute ...'; as soon as I found the SO answer and moved the call to 'threading.local()' to the global scope, the problem vanished.

To reliably detect the problem I started approximately 10 threads, each calling os.listdir() 1,000 times on an area of the FUSE file system.
Paul Moore added the comment:

Link to the SO answer? Does it explain *why* this is a requirement?

----------
nosy: +paul.moore
Ethan Furman added the comment:

http://stackoverflow.com/q/1408171/208880

No, it just says (towards the top):
----------------------------------
    One important thing that everybody seems to neglect to mention is that writing threadLocal = threading.local() at the global level is required. Calling threading.local() within the worker function will not work.

It is now my experience that "will not work" (reliably) is accurate.
Paul Moore added the comment:

That seems to merely be saying that each threading.local() object is distinct, so if you want to share threadlocal data between workers, creating local objects won't work.

I think I see what the confusion is (although I can't quite explain it yet, I'll need to think some more about it) but "threading.local() needs to be run at global scope" is not accurate (for example, if I understand correctly, a class attribute which is a threading.local value would work fine, and it's not "global scope").

Basically, each time you call threading.local() you get a brand new object. It looks like a dictionary, but in fact it's a *different* dictionary for each thread. Within one thread, though, you can have multiple threading.local() objects, and they are independent.

The "wrong" code in the SO discussion created a new threading.local() object as a local variable in a function, and tried to use it to remember state from one function call to the next (like a C static variable). That would be just as wrong in a single-threaded program where you used dict() instead of threading.local(), and for the same reasons.

I don't know what your code was doing, so it may well be that the problem you were encountering was more subtle than the one in the wont_work() function. But "threading.local() must be run in global scope" is *not* the answer (even if doing that resulted in your problem going away).
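A minimal sketch of the point above, written for illustration and not taken from the SO thread: each call to threading.local() returns a new, independent object, so state stored on one that is created inside the function body is gone on the next call, while one created once (here inside a factory function, not at global scope) keeps the calling thread's state. The names bump, broken and make_counter are hypothetical.

```python
import threading

def bump(ns):
    # Increment a per-thread counter stored on the given namespace.
    ns.count = getattr(ns, "count", 0) + 1
    return ns.count

def broken():
    # A brand-new namespace on every call: nothing survives between calls.
    return bump(threading.local())

def make_counter():
    # The namespace is created once, in function scope, and captured by the closure.
    ns = threading.local()
    def counter():
        return bump(ns)
    return counter

broken_results = [broken(), broken()]
counter = make_counter()
counter_results = [counter(), counter()]
distinct = threading.local() is not threading.local()
```

The broken() variant behaves the same if threading.local() is swapped for dict(), which is the sense in which the problem is about ordinary scoping and not about threads.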
Paul Moore added the comment:

I should also say, I'll try to work up a doc patch for this, once I've got my head round how to explain it :-)
Eric Snow added the comment:

FYI, I've used thread-local namespaces with success in several different ways and none of them involved binding the thread-local namespace to global scope. I don't think anything needs to be fixed here.

The SO answer is misleading and perhaps even wrong. The problem it describes is about sharing the thread-local NS *between function calls*. Persisting state between function calls is not a new or mysterious problem, nor unique to thread-local namespaces. In the example they give, rather than a global they could have put it into a default arg or into a class:

    def hi(threadlocal=threading.local()):
        ...

    class Hi:
        threadlocal = threading.local()
        def __call__(self):
            ...  # change threadlocal to self.threadlocal
    hi = Hi()

This is simply a consequence of Python's normal scoping rules (should be unsurprising) and the fact that threading.local is a class (new instance per call) rather than a function (with the assumption of a singleton namespace per thread).

At most the docs could be a little more clear that threading.local() produces a new namespace each time. However, I don't think even that is necessary and suggest closing this as won't fix.

----------
nosy: +eric.snow
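As a runnable illustration of the class-attribute variant, here is a sketch in which the per-thread state is a simple call counter. The counter, the worker function and the results dict are assumptions added for the example; only the Hi class shape comes from the message above.

```python
import threading

class Hi:
    # One threading.local object, created when the class body runs.
    threadlocal = threading.local()

    def __call__(self):
        ns = self.threadlocal
        # Each thread sees its own 'calls' attribute on the shared object.
        ns.calls = getattr(ns, "calls", 0) + 1
        return ns.calls

hi = Hi()
results = {}

def worker(name, times):
    # Call hi() 'times' times and record the last per-thread count.
    last = 0
    for _ in range(times):
        last = hi()
    results[name] = last

threads = [threading.Thread(target=worker, args=(f"t{n}", n)) for n in (1, 2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

main_calls = hi()  # the main thread starts from its own empty namespace
```

State persists between calls within one thread, and no thread sees another thread's count, without anything being bound at global scope.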
Paul Moore added the comment:

You're right, the SO answer is simply wrong. I've posted a (hopefully clearer) answer. If anyone wants to check it for accuracy, that'd be great.

Agreed this can probably be closed as "not a bug". On first reading, I thought the docs could do with clarification, but now I think that was just because I had been confused by the SO posting :-)
Changes by Eric Snow <ericsnowcurrently@gmail.com>:

----------
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed
type:  -> behavior
Ethan Furman added the comment:

Here's a basic outline of what I was trying:
-------------------------------------------

    CONTEXT = None

    class MyFUSE(Fuse):
        def __call__(self, op, path, *args):
            global CONTEXT
            ...
            CONTEXT = threading.local()
            # set several CONTEXT vars
            ...
            # dispatch to correct function to handle 'op'

Under stress, I would eventually get threading.local objects that were missing attributes.

Points to consider:

- I have no control over the threads; they just arrive wanting their 'op's fulfilled
- the same thread can be a repeat customer, but with the above scenario they would/should get a new threading.local each time

Hmmm... could my problem be that even though function locals are thread-safe, the globals are not, so trying to create a threading.local via a global statement was clobbering other threading.local instances? While that would make sense, I'm still completely clueless why having a single global statement, which (apparently) creates a single threading.local object, could be distinct for all the threads... unless, of course, it can detect which thread is accessing it and react appropriately. Okay, that's really cool.

So I was doing two things wrong:

- calling threading.local() inside a function (although this would probably work if I then passed that object around, as I do not need to persist state across function calls -- wait, that would be the same as using an ordinary, function-local dict, wouldn't it?)
- attempting to assign the threading.local object to a global variable from inside a function (this won't work, period)

Many thanks for helping me figure that out.

Paul, in your SO answer you state:
---------------------------------
    Just like an ordinary object, you can create multiple threading.local instances in your code. They can be local variables, class or instance members, or global variables.

- Local variables are already thread-safe, aren't they? So there would be no point in using threading.local() there.
- Instance members (set from __init__ of some other method): wouldn't that be the same problem I was having trying to update a non-threadsafe global with a new threading.local() each time?

It seems to me the take-away here is that you only want to create a threading.local() object /once/ -- if you are creating the same threading.local() object more than once, you're doing it wrong.
Paul Moore added the comment:

On 21 April 2015 at 23:11, Ethan Furman <report@bugs.python.org> wrote:
> Hmmm... could my problem be that even though function locals are thread-safe, the globals are not, so trying to create a threading.local via a global statement was clobbering other threading.local instances? While that would make sense, I'm still completely clueless why having a single global statement, which (apparently) creates a single threading.local object, could be distinct for all the threads... unless, of course, it can detect which thread is accessing it and react appropriately. Okay, that's really cool.

You're not creating a single threading.local object. You're creating one on each __call__() and overwriting the old one.

> So I was doing two things wrong:
> - calling threading.local() inside a function (although this would probably work if I then passed that object around, as I do not need to persist state across function calls -- wait, that would be the same as using an ordinary, function-local dict, wouldn't it?)

Yes, a dict should be fine if you're only using it within the one function call.

> - attempting to assign the threading.local object to a global variable from inside a function (this won't work, period)

It does work, it's just there isn't *the* object, there's lots and you keep overwriting. The thread safety issue is that if you write over the global in one thread, before another thread has finished, you lose the second thread's values (because they were on the old, lost, namespace). So basically you'd see unpredictable, occasional losses of all your CONTEXT vars in a thread.
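The overwrite described here can be reproduced deterministically. The sketch below is an assumption-laden illustration, not the FUSE code from the report: two threading.Event objects force the unlucky interleaving that would otherwise occur only occasionally under load, and request_a, request_b and the path attribute are hypothetical names.

```python
import threading

CONTEXT = None
errors = []
a_has_set = threading.Event()
b_has_rebound = threading.Event()

def request_a():
    global CONTEXT
    CONTEXT = threading.local()
    CONTEXT.path = "/a"
    a_has_set.set()
    # Stand-in for being preempted in the middle of handling the request.
    b_has_rebound.wait(timeout=5)
    try:
        # The global now names request_b's object, which has no 'path'
        # attribute as seen from this thread.
        CONTEXT.path
    except AttributeError:
        errors.append("request_a lost its CONTEXT vars")

def request_b():
    global CONTEXT
    a_has_set.wait(timeout=5)
    CONTEXT = threading.local()  # overwrites the object request_a was using
    CONTEXT.path = "/b"
    b_has_rebound.set()

threads = [threading.Thread(target=request_a), threading.Thread(target=request_b)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Creating the threading.local object once and only setting attributes on it inside the handler removes the rebinding, and with it the lost attributes.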
> Many thanks for helping me figure that out.

(If you did :-) - hope the clarifications above helped).

> Paul, in your SO answer you state:
> ---------------------------------
> Just like an ordinary object, you can create multiple threading.local instances in your code. They can be local variables, class or instance members, or global variables.
>
> - Local variables are already thread-safe, aren't they? So there would be no point in using threading.local() there.

Not unless you're going to return them from your function, or something like that. But yes, it's unlikely they will be needed there. I only mentioned it to avoid giving any impression that "only set at global scope" was important.

> - Instance members (set from __init__ of some other method): wouldn't that be the same problem I was having trying to update a non-threadsafe global with a new threading.local() each time?

You'd set vars on the namespace held in the instance variable. It's much like the local variable case, except you are more likely to pass an instance around between threads.
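A sketch of the instance-member case, assuming a hypothetical Handler class whose single instance is shared by several threads: the threading.local object is created once in __init__, and each request only sets attributes on it. The threading.Barrier is added purely to guarantee that every thread has written before any thread reads.

```python
import threading

class Handler:
    def __init__(self):
        # Created once per instance, not once per request.
        self.ctx = threading.local()

    def handle(self, op, barrier, seen):
        self.ctx.op = op           # visible only to the calling thread
        barrier.wait(timeout=5)    # all threads have written by this point
        seen[op] = self.ctx.op     # each thread still reads back its own value

handler = Handler()
ops = ["read", "write", "listdir"]
barrier = threading.Barrier(len(ops))
seen = {}

threads = [
    threading.Thread(target=handler.handle, args=(op, barrier, seen))
    for op in ops
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```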
> It seems to me the take-away here is that you only want to create a threading.local() object /once/ -- if you are creating the same threading.local() object more than once, you're doing it wrong.

Well, sort of. You can only create *any* object once :-) There seems to be a confusion (in the SO thread and with you, maybe) that threading.local objects are somehow singletons in that you "create" them repeatedly and get the same object. That's just wrong - they are entirely normal objects, that you can set arbitrary attributes on. The only difference is that each thread sees an independent set of attributes on the object.
Eric Snow added the comment:

@Ethan, it may help you to read through the module docstring in Lib/_threading_local.py.
Eric Snow added the comment:

Think of threading.local this way: instances of threading.local are shared between all the threads, but the effective "__dict__" of each instance is per-thread. Basically, the object stores a dict for each thread. In __getattribute__, __setattr__, and __delattr__ it swaps the dict for the current thread into place and then proceeds normally.
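That mental model can be sketched as a toy class. This is an illustration only and not how the standard library implements threading.local: SketchLocal is a hypothetical name, it looks attributes up in a per-thread dict keyed by threading.get_ident() instead of swapping __dict__, and it never cleans up after threads that have exited (thread identifiers can be reused), so it should not be used in place of the real thing.

```python
import threading

class SketchLocal:
    """Toy model: one shared object holding a separate attribute dict per thread."""

    def __init__(self):
        # Bypass our own __setattr__ so these land in the real instance dict.
        object.__setattr__(self, "_dicts", {})
        object.__setattr__(self, "_lock", threading.Lock())

    def _current(self):
        # Return (creating if needed) the dict belonging to the calling thread.
        with self._lock:
            return self._dicts.setdefault(threading.get_ident(), {})

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for per-thread attributes.
        try:
            return self._current()[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self._current()[name] = value

    def __delattr__(self, name):
        try:
            del self._current()[name]
        except KeyError:
            raise AttributeError(name) from None

ns = SketchLocal()
ns.x = 0
seen = []

def g():
    seen.append(hasattr(ns, "x"))  # the main thread's x is not visible here
    ns.x = 1
    seen.append(ns.x)

t = threading.Thread(target=g)
t.start()
t.join()
main_x = ns.x  # unchanged by the worker thread
```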
participants (5)
- Eric Snow
- eryksun
- Ethan Furman
- Paul Moore
- Raymond Hettinger