Issue 14417: consequences of new dict runtime error

Some of us have expressed uneasiness about the consequences of dict raising an error on lookup if the dict has been modified, the fix Victor made to solve one of the crashers.
I don't know if I speak for the others, but (assuming that I understand the change correctly) my concern is that there is probably a significant amount of threading code out there that assumes that dict *lookup* is a thread-safe operation. Much of that code will, if moved to Python 3.3, now be subject to random runtime errors for which it will not be prepared. Further, code which appears safe can suddenly become unsafe if a refactoring of the code causes an object to be stored in the dictionary that has a Python equality method.
Would it be possible to modify the fix so that the lookup is retried a non-trivial but finite number of times, so that normal code will work and only pathological code will break?
I know that I really don't want to think about having to audit the (significantly threaded) application I'm currently working on to make sure it is "3.3 safe". Dict lookup operations are *common*, and we've never had to think about whether or not they were thread-safe before (unless there were inter-thread synchronization issues involved, of course). Nor am I sure the locking dict type suggested by Jim on the issue would help, since a number of the dicts we are using are produced by library code. So we'd have to wait for those libraries to be ported to 3.3....
--David
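[Editorial sketch] The bounded-retry idea in the message above can be illustrated at the Python level. This is only a sketch under stated assumptions: the helper `lookup_with_retry`, the limit `MAX_RETRIES`, and the test double `FlakyMapping` are hypothetical names invented here. The change under discussion lives in the C implementation of dict lookup, and this is not the actual patch.

```python
MAX_RETRIES = 100  # hypothetical limit: "non-trivial but finite"
_MISSING = object()


def lookup_with_retry(mapping, key, default=_MISSING, retries=MAX_RETRIES):
    """Look up key, retrying when the mapping reports concurrent modification.

    RuntimeError is retried up to `retries` times; KeyError is never retried.
    """
    for _ in range(retries):
        try:
            return mapping[key]
        except KeyError:
            if default is _MISSING:
                raise
            return default
        except RuntimeError:
            continue  # the dict changed under us; try again
    raise RuntimeError("dict kept changing during %d lookups" % retries)


class FlakyMapping(dict):
    """Hypothetical test double: fails the first `failures` lookups."""

    def __init__(self, *args, failures=0, **kwargs):
        super().__init__(*args, **kwargs)
        self.failures = failures

    def __getitem__(self, key):
        if self.failures > 0:
            self.failures -= 1
            raise RuntimeError("dictionary changed during lookup")
        return super().__getitem__(key)
```

With a large enough limit, only a lookup that keeps losing the race would surface the error, which is the "only pathological code will break" property the message asks for.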

On Thu, Mar 29, 2012 at 12:58 PM, R. David Murray <rdmurray@bitdance.com> wrote:
Some of us have expressed uneasiness about the consequences of dict raising an error on lookup if the dict has been modified, the fix Victor made to solve one of the crashers.
I don't know if I speak for the others, but (assuming that I understand the change correctly) my concern is that there is probably a significant amount of threading code out there that assumes that dict *lookup* is a thread-safe operation. Much of that code will, if moved to Python 3.3, now be subject to random runtime errors for which it will not be prepared. Further, code which appears safe can suddenly become unsafe if a refactoring of the code causes an object to be stored in the dictionary that has a Python equality method.
My original assessment was that this only affects dicts whose keys have a user-implemented __hash__ or __eq__ implementation, and that the number of apps that use this *and* assume the threadsafe property would be pretty small. This is just intuition, I don't have hard facts. But I do want to stress that not all dict lookups automatically become thread-unsafe, only those that need to run user code as part of the key lookup.
Would it be possible to modify the fix so that the lookup is retried a non-trivial but finite number of times, so that normal code will work and only pathological code will break?
FWIW a similar approach was rejected as a fix for the hash DoS attack.
I know that I really don't want to think about having to audit the (significantly threaded) application I'm currently working on to make sure it is "3.3 safe". Dict lookup operations are *common*, and we've never had to think about whether or not they were thread-safe before (unless there were inter-thread synchronization issues involved, of course). Nor am I sure the locking dict type suggested by Jim on the issue would help, since a number of the dicts we are using are produced by library code. So we'd have to wait for those libraries to be ported to 3.3....
Agreed that this is somewhat scary.
-- --Guido van Rossum (python.org/~guido)
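[Editorial sketch] To make concrete which lookups "need to run user code", the following counts the Python-level calls made during a lookup. The class `Key` and its counters are hypothetical, introduced only for this illustration.

```python
class Key:
    """Hypothetical key type with Python-level __hash__ and __eq__."""

    hash_calls = 0
    eq_calls = 0

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        Key.hash_calls += 1
        return hash(self.name)

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.name == other.name


table = {Key("spam"): 1}       # hashing on insert runs Key.__hash__
Key.hash_calls = Key.eq_calls = 0

value = table[Key("spam")]     # equal but distinct object: runs user code
user_calls = (Key.hash_calls, Key.eq_calls)

Key.hash_calls = Key.eq_calls = 0
plain = {"spam": 1}
plain["spam"]                  # str key: none of the Python methods above run
builtin_calls = (Key.hash_calls, Key.eq_calls)
```

The lookup with a `Key` instance runs both `__hash__` and `__eq__` as ordinary Python code, so another thread can be scheduled in the middle of it. The lookup with a `str` key runs neither.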

On Thu, 29 Mar 2012 13:09:17 -0700, Guido van Rossum <guido@python.org> wrote:
On Thu, Mar 29, 2012 at 12:58 PM, R. David Murray <rdmurray@bitdance.com> wrote:
Some of us have expressed uneasiness about the consequences of dict raising an error on lookup if the dict has been modified, the fix Victor made to solve one of the crashers.
I don't know if I speak for the others, but (assuming that I understand the change correctly) my concern is that there is probably a significant amount of threading code out there that assumes that dict *lookup* is a thread-safe operation. Much of that code will, if moved to Python 3.3, now be subject to random runtime errors for which it will not be prepared. Further, code which appears safe can suddenly become unsafe if a refactoring of the code causes an object to be stored in the dictionary that has a Python equality method.
My original assessment was that this only affects dicts whose keys have a user-implemented __hash__ or __eq__ implementation, and that the number of apps that use this *and* assume the threadsafe property would be pretty small. This is just intuition, I don't have hard facts. But I do want to stress that not all dict lookups automatically become thread-unsafe, only those that need to run user code as part of the key lookup.
You are probably correct, but the thing is that one still has to do the code audit to be sure...and then make sure that no one later introduces such an object type as a dict key. Are there any other places in Python where substituting a duck-typed Python class or a Python subclass can cause a runtime error in previously working code?
Would it be possible to modify the fix so that the lookup is retried a non-trivial but finite number of times, so that normal code will work and only pathological code will break?
FWIW a similar approach was rejected as a fix for the hash DoS attack.
Yes, but in this case the non-counting version breaks just as randomly, but more often. So arguing that counting here is analogous to counting in the DoS attack issue is an argument for removing the fix entirely :) The counting version could use a large enough count (since the count used to be infinite!) that only code that would be having pathological performance anyway would raise the runtime error, rather than any code (that uses Python __eq__ on keys) randomly raising a runtime error, which is what we have now.
--David

On Thu, 29 Mar 2012 16:31:03 -0400, "R. David Murray" <rdmurray@bitdance.com> wrote:
On Thu, 29 Mar 2012 13:09:17 -0700, Guido van Rossum <guido@python.org> wrote:
My original assessment was that this only affects dicts whose keys have a user-implemented __hash__ or __eq__ implementation, and that the number of apps that use this *and* assume the threadsafe property would be pretty small. This is just intuition, I don't have hard facts. But I do want to stress that not all dict lookups automatically become thread-unsafe, only those that need to run user code as part of the key lookup.
You are probably correct, but the thing is that one still has to do the code audit to be sure...and then make sure that no one later introduces such an object type as a dict key.
I just did a quick grep on our project. We are only defining __eq__ and __hash__ a couple places, but both are objects that could easily get used as dict keys (there is a good chance that's *why* those methods are defined) accessed by more than one thread. I haven't done the audit to find out :)
The libraries we depend on have many more definitions of __eq__ and __hash__, and we'd have to check them too. (Including SQLAlchemy, and I wouldn't want that job.)
So our intuition that this is not common may be wrong.
--David

On 03/29/2012 04:48 PM, R. David Murray wrote:
On Thu, 29 Mar 2012 16:31:03 -0400, "R. David Murray" <rdmurray@bitdance.com> wrote:
On Thu, 29 Mar 2012 13:09:17 -0700, Guido van Rossum <guido@python.org> wrote:
My original assessment was that this only affects dicts whose keys have a user-implemented __hash__ or __eq__ implementation, and that the number of apps that use this *and* assume the threadsafe property would be pretty small. This is just intuition, I don't have hard facts. But I do want to stress that not all dict lookups automatically become thread-unsafe, only those that need to run user code as part of the key lookup.
You are probably correct, but the thing is that one still has to do the code audit to be sure...and then make sure that no one later introduces such an object type as a dict key.
I just did a quick grep on our project. We are only defining __eq__ and __hash__ a couple places, but both are objects that could easily get used as dict keys (there is a good chance that's *why* those methods are defined) accessed by more than one thread. I haven't done the audit to find out :)
The libraries we depend on have many more definitions of __eq__ and __hash__, and we'd have to check them too. (Including SQLAlchemy, and I wouldn't want that job.)
So our intuition that this is not common may be wrong.
--David _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/animelovin%40gmail.com
Hm, as far as I understand, this seems like an issue for GNU Pth, not Python's job, which should transparently handle thread-safety issues based on the host/machine capabilities. Therefore I hope the fix in Python doesn't cause thread-unsafe apps to raise spurious RuntimeErrors when a dict gets modified across an SMP-aware platform... :-)

On Thu, Mar 29, 2012 at 1:48 PM, R. David Murray <rdmurray@bitdance.com> wrote:
On Thu, 29 Mar 2012 16:31:03 -0400, "R. David Murray" <rdmurray@bitdance.com> wrote:
On Thu, 29 Mar 2012 13:09:17 -0700, Guido van Rossum <guido@python.org> wrote:
My original assessment was that this only affects dicts whose keys have a user-implemented __hash__ or __eq__ implementation, and that the number of apps that use this *and* assume the threadsafe property would be pretty small. This is just intuition, I don't have hard facts. But I do want to stress that not all dict lookups automatically become thread-unsafe, only those that need to run user code as part of the key lookup.
You are probably correct, but the thing is that one still has to do the code audit to be sure...and then make sure that no one later introduces such an object type as a dict key.
I just did a quick grep on our project. We are only defining __eq__ and __hash__ a couple places, but both are objects that could easily get used as dict keys (there is a good chance that's *why* those methods are defined) accessed by more than one thread. I haven't done the audit to find out :)
Of course, that doesn't mean they're likely to be used as keys in a dict that is read and written concurrently by multiple threads.
The libraries we depend on have many more definitions of __eq__ and __hash__, and we'd have to check them too. (Including SQLAlchemy, and I wouldn't want that job.)
So our intuition that this is not common may be wrong.
But how often does one share a dictionary between threads with the understanding that multiple threads can read and write it?
Here's a different puzzle. Has anyone written a demo yet that provokes this RuntimeError, without cheating? (Cheating would be to mutate the dict from *inside* the __eq__ or __hash__ method.) If you're serious about revisiting this, I'd like to see at least one example of a program that is broken by the change. Otherwise I think the status quo in the 3.3 repo should prevail -- I don't want to be stymied by superstition.
-- --Guido van Rossum (python.org/~guido)
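[Editorial sketch] For reference, the "cheating" case described above looks roughly like this: a key whose `__eq__` grows the dict while the lookup is in progress. The class `MutatingKey` and the function `cheating_lookup` are hypothetical names. The outcome depends on the interpreter: with the behaviour under discussion the lookup raises RuntimeError, while an interpreter that restarts the lookup returns the value, so the sketch handles both.

```python
class MutatingKey:
    """Hypothetical key whose __eq__ mutates the dict it lives in (once)."""

    def __init__(self, name, table):
        self.name = name
        self.table = table
        self.mutated = False

    def __hash__(self):
        return hash(("MutatingKey", self.name))

    def __eq__(self, other):
        if not isinstance(other, MutatingKey):
            return NotImplemented
        if not self.mutated:
            self.mutated = True
            # The "cheat": grow the dict from inside the comparison.
            for i in range(100):
                self.table[i] = None
        return self.name == other.name


def cheating_lookup():
    table = {}
    table[MutatingKey("k", table)] = "found"
    probe = MutatingKey("k", table)
    probe.mutated = True          # only the stored key performs the mutation
    try:
        return table[probe]
    except RuntimeError as exc:
        return "RuntimeError: %s" % exc


outcome = cheating_lookup()
```

A demo "without cheating" would have to get the same mutation to happen from a second thread at exactly the moment the first thread is inside `__eq__`, which is what makes it hard to provoke.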

On Sun, Apr 1, 2012 at 2:09 AM, Guido van Rossum <guido@python.org> wrote:
Here's a different puzzle. Has anyone written a demo yet that provokes this RuntimeError, without cheating? (Cheating would be to mutate the dict from *inside* the __eq__ or __hash__ method.) If you're serious about revisiting this, I'd like to see at least one example of a program that is broken by the change. Otherwise I think the status quo in the 3.3 repo should prevail -- I don't want to be stymied by superstition.
I attached an attempt to *deliberately* break the new behaviour to the tracker issue. It isn't actually breaking for me, so I'd like other folks to look at it to see if I missed something in my implementation, or if it's just genuinely that hard to induce the necessary bad timing of a preemptive thread switch.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, 01 Apr 2012 03:03:13 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Sun, Apr 1, 2012 at 2:09 AM, Guido van Rossum <guido@python.org> wrote:
Here's a different puzzle. Has anyone written a demo yet that provokes this RuntimeError, without cheating? (Cheating would be to mutate the dict from *inside* the __eq__ or __hash__ method.) If you're serious about revisiting this, I'd like to see at least one example of a program that is broken by the change. Otherwise I think the status quo in the 3.3 repo should prevail -- I don't want to be stymied by superstition.
I attached an attempt to *deliberately* break the new behaviour to the tracker issue. It isn't actually breaking for me, so I'd like other folks to look at it to see if I missed something in my implementation, or if it's just genuinely that hard to induce the necessary bad timing of a preemptive thread switch.
Thanks, Nick. It looks reasonable to me, but I've only given it a quick look so far (I'll try to think about it more deeply later today).
If it is indeed hard to provoke, then I'm fine with leaving the RuntimeError as a signal that the application needs to add some locking. My concern was that we'd have working production code that would start breaking. If it takes a *lot* of threads or a *lot* of mutation to trigger it, then it is going to be a lot less likely to happen anyway, since such programs are going to be much more careful about locking anyway.
--David
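[Editorial sketch] "Add some locking" could look like the wrapper below. This is a minimal sketch, not the locking dict type proposed on the issue: the class `LockedDict` and its `snapshot` method are hypothetical names, and only a handful of mapping operations are covered.

```python
import threading


class LockedDict:
    """Hypothetical minimal wrapper: every access holds one lock."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)
        self._lock = threading.Lock()

    def __getitem__(self, key):
        with self._lock:
            return self._data[key]

    def __setitem__(self, key, value):
        with self._lock:
            self._data[key] = value

    def __delitem__(self, key):
        with self._lock:
            del self._data[key]

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def snapshot(self):
        """Return a copy that can be iterated without holding the lock."""
        with self._lock:
            return dict(self._data)
```

Because the lock is held while a key's `__eq__` or `__hash__` runs, a key method that touches the same `LockedDict` would deadlock with a plain `Lock`; a `threading.RLock` would avoid that at some cost. As noted earlier in the thread, a wrapper like this does not help with dicts created inside library code.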

Try reducing sys.setcheckinterval().
--Guido van Rossum (sent from Android phone)
On Mar 31, 2012 10:45 AM, "R. David Murray" <rdmurray@bitdance.com> wrote:
On Sun, 01 Apr 2012 03:03:13 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Sun, Apr 1, 2012 at 2:09 AM, Guido van Rossum <guido@python.org> wrote:
Here's a different puzzle. Has anyone written a demo yet that provokes this RuntimeError, without cheating? (Cheating would be to mutate the dict from *inside* the __eq__ or __hash__ method.) If you're serious about revisiting this, I'd like to see at least one example of a program that is broken by the change. Otherwise I think the status quo in the 3.3 repo should prevail -- I don't want to be stymied by superstition.
I attached an attempt to *deliberately* break the new behaviour to the tracker issue. It isn't actually breaking for me, so I'd like other folks to look at it to see if I missed something in my implementation, or if it's just genuinely that hard to induce the necessary bad timing of a preemptive thread switch.
Thanks, Nick. It looks reasonable to me, but I've only given it a quick look so far (I'll try to think about it more deeply later today).
If it is indeed hard to provoke, then I'm fine with leaving the RuntimeError as a signal that the application needs to add some locking. My concern was that we'd have working production code that would start breaking. If it takes a *lot* of threads or a *lot* of mutation to trigger it, then it is going to be a lot less likely to happen anyway, since such programs are going to be much more careful about locking anyway.
--David

On Apr 1, 2012 8:54 AM, "Benjamin Peterson" <benjamin@python.org> wrote:
2012/3/31 Guido van Rossum <guido@python.org>:
Try reducing sys.setcheckinterval().
setcheckinterval() is a no-op since the new GIL; sys.setswitchinterval() has superseded it.
Ah, that's at least one thing wrong with my initial attempt - I was still thinking in terms of "number of bytecodes executed". Old habits die hard :)
-- Sent from my phone, thus the relative brevity :)
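[Editorial sketch] A stress attempt that uses `sys.setswitchinterval()` (which takes seconds, not a bytecode count) might look like the following. This is not the script attached to the tracker issue; the names `SlowKey` and `stress` and all the constants are hypothetical. It only counts how many lookups raised RuntimeError and makes no claim about what that count will be on a given interpreter.

```python
import sys
import threading


class SlowKey:
    """Hypothetical key with a Python-level __eq__, so lookups run bytecode."""

    def __init__(self, n):
        self.n = n

    def __hash__(self):
        return hash(self.n)

    def __eq__(self, other):
        return isinstance(other, SlowKey) and self.n == other.n


def stress(iterations=2000, switch_interval=1e-6):
    shared = {SlowKey(i): i for i in range(50)}
    errors = []

    def reader():
        for i in range(iterations):
            try:
                shared.get(SlowKey(i % 50))
            except RuntimeError as exc:
                errors.append(exc)

    def writer():
        for i in range(iterations):
            shared[SlowKey(50 + i % 500)] = i     # grow, forcing resizes
            shared.pop(SlowKey(50 + (i * 7) % 500), None)

    old = sys.getswitchinterval()
    sys.setswitchinterval(switch_interval)   # seconds, not bytecodes
    try:
        threads = [threading.Thread(target=reader),
                   threading.Thread(target=writer)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        sys.setswitchinterval(old)            # restore the previous setting
    return len(errors)


runtime_errors = stress()
```

Only one thread mutates the dict here, so any RuntimeError can only come from the reader's lookups being interrupted inside `SlowKey.__eq__`.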

On Sat, Mar 31, 2012 at 7:45 PM, R. David Murray <rdmurray@bitdance.com> wrote:
On Sun, 01 Apr 2012 03:03:13 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Sun, Apr 1, 2012 at 2:09 AM, Guido van Rossum <guido@python.org> wrote:
Here's a different puzzle. Has anyone written a demo yet that provokes this RuntimeError, without cheating? (Cheating would be to mutate the dict from *inside* the __eq__ or __hash__ method.) If you're serious about revisiting this, I'd like to see at least one example of a program that is broken by the change. Otherwise I think the status quo in the 3.3 repo should prevail -- I don't want to be stymied by superstition.
I attached an attempt to *deliberately* break the new behaviour to the tracker issue. It isn't actually breaking for me, so I'd like other folks to look at it to see if I missed something in my implementation, or if it's just genuinely that hard to induce the necessary bad timing of a preemptive thread switch.
Thanks, Nick. It looks reasonable to me, but I've only given it a quick look so far (I'll try to think about it more deeply later today).
If it is indeed hard to provoke, then I'm fine with leaving the RuntimeError as a signal that the application needs to add some locking. My concern was that we'd have working production code that would start breaking. If it takes a *lot* of threads or a *lot* of mutation to trigger it, then it is going to be a lot less likely to happen anyway, since such programs are going to be much more careful about locking anyway.
--David
Hm, I might be missing something, but if you have multiple threads accessing a dict, this program (http://paste.pocoo.org/show/575776/) already raises RuntimeError. Under PyPy you'll also get RuntimeError during iteration in slightly more obscure cases than changing the size. As far as I understand, if you're mutating while iterating, you *can* get a runtime error. This does not even have a custom __eq__ or __hash__. Are you never iterating over dicts?
Cheers, fijal
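[Editorial sketch] The pasted program is not reproduced in this thread, so the following is not it. It is a single-threaded, deterministic illustration of the same failure mode (a dict changing size while it is being iterated), with hypothetical function names, plus the usual workaround of iterating over a snapshot of the keys.

```python
def mutate_while_iterating():
    """Single-threaded, deterministic version of the same failure mode."""
    d = {i: i for i in range(10)}
    try:
        for key in d:
            d[key + 100] = key     # adds an entry: the dict changes size
    except RuntimeError as exc:
        return str(exc)
    return None


def iterate_over_snapshot():
    """Iterating over a copy of the keys sidesteps the error."""
    d = {i: i for i in range(10)}
    for key in list(d):
        d[key + 100] = key
    return len(d)


message = mutate_while_iterating()
size = iterate_over_snapshot()
```

In the threaded case the mutation comes from another thread rather than from the loop body, so the error shows up only when the timing is unlucky.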

I'm confused. Are you saying that that program always raised RuntimeError, or that it started raising RuntimeError with the new behavior (3.3 alpha 2)?
On Tue, Apr 3, 2012 at 2:47 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
On Sat, Mar 31, 2012 at 7:45 PM, R. David Murray <rdmurray@bitdance.com> wrote:
On Sun, 01 Apr 2012 03:03:13 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Sun, Apr 1, 2012 at 2:09 AM, Guido van Rossum <guido@python.org> wrote:
Here's a different puzzle. Has anyone written a demo yet that provokes this RuntimeError, without cheating? (Cheating would be to mutate the dict from *inside* the __eq__ or __hash__ method.) If you're serious about revisiting this, I'd like to see at least one example of a program that is broken by the change. Otherwise I think the status quo in the 3.3 repo should prevail -- I don't want to be stymied by superstition.
I attached an attempt to *deliberately* break the new behaviour to the tracker issue. It isn't actually breaking for me, so I'd like other folks to look at it to see if I missed something in my implementation, or if it's just genuinely that hard to induce the necessary bad timing of a preemptive thread switch.
Thanks, Nick. It looks reasonable to me, but I've only given it a quick look so far (I'll try to think about it more deeply later today).
If it is indeed hard to provoke, then I'm fine with leaving the RuntimeError as a signal that the application needs to add some locking. My concern was that we'd have working production code that would start breaking. If it takes a *lot* of threads or a *lot* of mutation to trigger it, then it is going to be a lot less likely to happen anyway, since such programs are going to be much more careful about locking anyway.
--David
Hm
I might be missing something, but if you have multiple threads accessing a dict, this program (http://paste.pocoo.org/show/575776/) already raises RuntimeError. Under PyPy you'll also get RuntimeError during iteration in slightly more obscure cases than changing the size. As far as I understand, if you're mutating while iterating, you *can* get a runtime error.
This does not even have a custom __eq__ or __hash__. Are you never iterating over dicts?
Cheers, fijal
-- --Guido van Rossum (python.org/~guido)

Never mind, I got it. This always raised RuntimeError. I see that this should be considered support in favor of keeping the change, since sharing dicts between threads without locking is already fraught with RuntimeErrors.
At the same time, has anyone looked at my small patch (added to the issue) that restores the retry loop without recursion?
On Tue, Apr 3, 2012 at 3:17 PM, Guido van Rossum <guido@python.org> wrote:
I'm confused. Are you saying that that program always raised RuntimeError, or that it started raising RuntimeError with the new behavior (3.3 alpha 2)?
On Tue, Apr 3, 2012 at 2:47 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
On Sat, Mar 31, 2012 at 7:45 PM, R. David Murray <rdmurray@bitdance.com> wrote:
On Sun, 01 Apr 2012 03:03:13 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Sun, Apr 1, 2012 at 2:09 AM, Guido van Rossum <guido@python.org> wrote:
Here's a different puzzle. Has anyone written a demo yet that provokes this RuntimeError, without cheating? (Cheating would be to mutate the dict from *inside* the __eq__ or __hash__ method.) If you're serious about revisiting this, I'd like to see at least one example of a program that is broken by the change. Otherwise I think the status quo in the 3.3 repo should prevail -- I don't want to be stymied by superstition.
I attached an attempt to *deliberately* break the new behaviour to the tracker issue. It isn't actually breaking for me, so I'd like other folks to look at it to see if I missed something in my implementation, or if it's just genuinely that hard to induce the necessary bad timing of a preemptive thread switch.
Thanks, Nick. It looks reasonable to me, but I've only given it a quick look so far (I'll try to think about it more deeply later today).
If it is indeed hard to provoke, then I'm fine with leaving the RuntimeError as a signal that the application needs to add some locking. My concern was that we'd have working production code that would start breaking. If it takes a *lot* of threads or a *lot* of mutation to trigger it, then it is going to be a lot less likely to happen anyway, since such programs are going to be much more careful about locking anyway.
--David
Hm
I might be missing something, but if you have multiple threads accessing a dict, this program (http://paste.pocoo.org/show/575776/) already raises RuntimeError. Under PyPy you'll also get RuntimeError during iteration in slightly more obscure cases than changing the size. As far as I understand, if you're mutating while iterating, you *can* get a runtime error.
This does not even have a custom __eq__ or __hash__. Are you never iterating over dicts?
Cheers, fijal
-- --Guido van Rossum (python.org/~guido)

R. David Murray, 29.03.2012 22:31:
On Thu, 29 Mar 2012 13:09:17 -0700, Guido van Rossum wrote:
On Thu, Mar 29, 2012 at 12:58 PM, R. David Murray wrote:
Some of us have expressed uneasiness about the consequences of dict raising an error on lookup if the dict has been modified, the fix Victor made to solve one of the crashers.
I don't know if I speak for the others, but (assuming that I understand the change correctly) my concern is that there is probably a significant amount of threading code out there that assumes that dict *lookup* is a thread-safe operation. Much of that code will, if moved to Python 3.3, now be subject to random runtime errors for which it will not be prepared. Further, code which appears safe can suddenly become unsafe if a refactoring of the code causes an object to be stored in the dictionary that has a Python equality method.
My original assessment was that this only affects dicts whose keys have a user-implemented __hash__ or __eq__ implementation, and that the number of apps that use this *and* assume the threadsafe property would be pretty small. This is just intuition, I don't have hard facts. But I do want to stress that not all dict lookups automatically become thread-unsafe, only those that need to run user code as part of the key lookup.
You are probably correct, but the thing is that one still has to do the code audit to be sure...and then make sure that no one later introduces such an object type as a dict key.
The thing is: the assumption that arbitrary dict lookups are GIL-atomic has *always* been false. Only those that do not involve Python code execution for the hash key calculation or the object comparison are. That includes the built-in strings and numbers (and tuples of them), which are by far the most common dict keys. Looking up arbitrary user provided objects is definitely not guaranteed to be atomic.
Stefan

On Thu, 29 Mar 2012 23:00:20 +0200, Stefan Behnel <stefan_ml@behnel.de> wrote:
R. David Murray, 29.03.2012 22:31:
On Thu, 29 Mar 2012 13:09:17 -0700, Guido van Rossum wrote:
On Thu, Mar 29, 2012 at 12:58 PM, R. David Murray wrote:
Some of us have expressed uneasiness about the consequences of dict raising an error on lookup if the dict has been modified, the fix Victor made to solve one of the crashers.
I don't know if I speak for the others, but (assuming that I understand the change correctly) my concern is that there is probably a significant amount of threading code out there that assumes that dict *lookup* is a thread-safe operation. Much of that code will, if moved to Python 3.3, now be subject to random runtime errors for which it will not be prepared. Further, code which appears safe can suddenly become unsafe if a refactoring of the code causes an object to be stored in the dictionary that has a Python equality method.
My original assessment was that this only affects dicts whose keys have a user-implemented __hash__ or __eq__ implementation, and that the number of apps that use this *and* assume the threadsafe property would be pretty small. This is just intuition, I don't have hard facts. But I do want to stress that not all dict lookups automatically become thread-unsafe, only those that need to run user code as part of the key lookup.
You are probably correct, but the thing is that one still has to do the code audit to be sure...and then make sure that no one later introduces such an object type as a dict key.
The thing is: the assumption that arbitrary dict lookups are GIL-atomic has *always* been false. Only those that do not involve Python code execution for the hash key calculation or the object comparison are. That includes the built-in strings and numbers (and tuples of them), which are by far the most common dict keys. Looking up arbitrary user provided objects is definitely not guaranteed to be atomic.
Well, I'm afraid I was using the term 'thread safety' rather too loosely there. What I mean is that if you do a dict lookup, the lookup either returns a value or a KeyError, and that if you get back an object that object has internally consistent state. The problem this fix introduces is that the lookup may fail with a RuntimeError rather than a KeyError, which it has never done before.
I think that is what Guido means by code that uses objects with python eq/hash *and* assumes threadsafe lookup. If mutation of the objects or dict during the lookup is a concern, then the code would use locks and wouldn't have the problem. But there are certainly situations where it doesn't matter if the dictionary mutates during the lookup, as long as you get either an object or a KeyError, and thus no locks are (currently) needed.
Maybe I'm being paranoid about breakage here, but as with most backward compatibility concerns, there are probably more bits of code that will be affected than our intuition indicates.
--David

On 03/29/2012 06:07 PM, R. David Murray wrote:
On Thu, 29 Mar 2012 23:00:20 +0200, Stefan Behnel <stefan_ml@behnel.de> wrote:
R. David Murray, 29.03.2012 22:31:
On Thu, 29 Mar 2012 13:09:17 -0700, Guido van Rossum wrote:
On Thu, Mar 29, 2012 at 12:58 PM, R. David Murray wrote:
Some of us have expressed uneasiness about the consequences of dict raising an error on lookup if the dict has been modified, the fix Victor made to solve one of the crashers.
I don't know if I speak for the others, but (assuming that I understand the change correctly) my concern is that there is probably a significant amount of threading code out there that assumes that dict *lookup* is a thread-safe operation. Much of that code will, if moved to Python 3.3, now be subject to random runtime errors for which it will not be prepared. Further, code which appears safe can suddenly become unsafe if a refactoring of the code causes an object to be stored in the dictionary that has a Python equality method.
My original assessment was that this only affects dicts whose keys have a user-implemented __hash__ or __eq__ implementation, and that the number of apps that use this *and* assume the threadsafe property would be pretty small. This is just intuition, I don't have hard facts. But I do want to stress that not all dict lookups automatically become thread-unsafe, only those that need to run user code as part of the key lookup.
You are probably correct, but the thing is that one still has to do the code audit to be sure...and then make sure that no one later introduces such an object type as a dict key.
The thing is: the assumption that arbitrary dict lookups are GIL-atomic has *always* been false. Only those that do not involve Python code execution for the hash key calculation or the object comparison are. That includes the built-in strings and numbers (and tuples of them), which are by far the most common dict keys. Looking up arbitrary user provided objects is definitely not guaranteed to be atomic.
Well, I'm afraid I was using the term 'thread safety' rather too loosely there. What I mean is that if you do a dict lookup, the lookup either returns a value or a KeyError, and that if you get back an object that object has internally consistent state. The problem this fix introduces is that the lookup may fail with a RuntimeError rather than a KeyError, which it has never done before.
I think that is what Guido means by code that uses objects with python eq/hash *and* assumes threadsafe lookup. If mutation of the objects or dict during the lookup is a concern, then the code would use locks and wouldn't have the problem. But there are certainly situations where it doesn't matter if the dictionary mutates during the lookup, as long as you get either an object or a KeyError, and thus no locks are (currently) needed.
Maybe I'm being paranoid about breakage here, but as with most backward compatibility concerns, there are probably more bits of code that will be affected than our intuition indicates.
--David
What is this supposed to mean exactly? To "mutate" is a bit of an odd concept for a programming language, I suppose. Also, I suppose I must be missing something which makes you feel like this is an OT post, when the problem seems most likely to be exclusively in Python 3.3; another reason, I guess, not to upgrade yet all that massively using 2to3. :-)
cheers, Etienne

Etienne, I have not understood either of your messages in this thread. They just did not make sense to me. Do you actually understand the issue at hand?
--Guido
On Friday, March 30, 2012, Etienne Robillard wrote:
On 03/29/2012 06:07 PM, R. David Murray wrote:
On Thu, 29 Mar 2012 23:00:20 +0200, Stefan Behnel <stefan_ml@behnel.de> wrote:
R. David Murray, 29.03.2012 22:31:
On Thu, 29 Mar 2012 13:09:17 -0700, Guido van Rossum wrote:
On Thu, Mar 29, 2012 at 12:58 PM, R. David Murray wrote:
Some of us have expressed uneasiness about the consequences of dict raising an error on lookup if the dict has been modified, the fix Victor made to solve one of the crashers.
I don't know if I speak for the others, but (assuming that I understand the change correctly) my concern is that there is probably a significant amount of threading code out there that assumes that dict *lookup* is a thread-safe operation. Much of that code will, if moved to Python 3.3, now be subject to random runtime errors for which it will not be prepared. Further, code which appears safe can suddenly become unsafe if a refactoring of the code causes an object to be stored in the dictionary that has a Python equality method.
My original assessment was that this only affects dicts whose keys have a user-implemented __hash__ or __eq__ implementation, and that the number of apps that use this *and* assume the threadsafe property would be pretty small. This is just intuition, I don't have hard facts. But I do want to stress that not all dict lookups automatically become thread-unsafe, only those that need to run user code as part of the key lookup.
You are probably correct, but the thing is that one still has to do the code audit to be sure...and then make sure that no one later introduces such an object type as a dict key.
The thing is: the assumption that arbitrary dict lookups are GIL-atomic has *always* been false. Only those that do not involve Python code execution for the hash key calculation or the object comparison are. That includes the built-in strings and numbers (and tuples of them), which are by far the most common dict keys. Looking up arbitrary user provided objects is definitely not guaranteed to be atomic.
Well, I'm afraid I was using the term 'thread safety' rather too loosely there. What I mean is that if you do a dict lookup, the lookup either returns a value or a KeyError, and that if you get back an object that object has internally consistent state. The problem this fix introduces is that the lookup may fail with a RuntimeError rather than a KeyError, which it has never done before.
I think that is what Guido means by code that uses objects with python eq/hash *and* assumes threadsafe lookup. If mutation of the objects or dict during the lookup is a concern, then the code would use locks and wouldn't have the problem. But there are certainly situations where it doesn't matter if the dictionary mutates during the lookup, as long as you get either an object or a KeyError, and thus no locks are (currently) needed.
Maybe I'm being paranoid about breakage here, but as with most backward compatibility concerns, there are probably more bits of code that will be affected than our intuition indicates.
--David
What is this supposed to mean exactly? To "mutate" is a bit of an odd concept for a programming language, I suppose. Also, I suppose I must be missing something that makes you feel this is an OT post, when the problem seems most likely to be exclusive to Python 3.3; another reason, I guess, not to upgrade on a massive scale using 2to3 just yet. :-)
cheers, Etienne
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
-- --Guido van Rossum (python.org/~guido)

Hi Guido, I'm sorry for being unclear! I am actually just trying to learn what the consequences of these 'unattended' mutations in dictionary key lookups could be. :-) However, it seems now that I might have touched a nerve without realizing it. I would therefore appreciate more light on this "issue" if you would like to enlighten us all. :D Regards, Etienne On 03/30/2012 10:47 AM, Guido van Rossum wrote:
Etienne, I have not understood either of your messages in this thread. They just did not make sense to me. Do you actually understand the issue at hand?
--Guido

On Sat, Mar 31, 2012 at 1:27 AM, Etienne Robillard <animelovin@gmail.com> wrote:
Hi Guido,
I'm sorry for being unclear! I am actually just trying to learn what the consequences of these 'unattended' mutations in dictionary key lookups could be. :-)
However, it seems now that I might have touched a nerve without realizing it. I would therefore appreciate more light on this "issue" if you would like to enlighten us all. :D
Etienne, For those that need to understand the issue in order to further consider the consequences of the change, RDM has already explained the problem quite clearly. If you'd like a more in-depth explanation, please ask the question again over on core-mentorship@python.org. It's not an appropriate topic for the main development list. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

"Multiple threads can agree by convention not to mutate a shared dict, there's no great need for enforcement. Multiple processes can't share dicts." I'm not sure I completely get the meaning of "mutate"... And if possible, I would also like the rationale for the 2nd sentence while we're at it, as it seems a little unclear too. Sorry also if this is OT... :) Regards, Etienne http://www.python.org/dev/peps/pep-0416/ On 03/30/2012 10:47 AM, Guido van Rossum wrote:
Etienne, I have not understood either of your messages in this thread. They just did not make sense to me. Do you actually understand the issue at hand?
--Guido

you wish...are you also truth allergic or irritated by the consequences of free speech ? Please stop giving me orders. You don't even know me, and this is not at all necessary, nor is it good netiquette, if you want to raise a point to ponder. Sorry to the others who think this is not OT, as it's probably related to the rejection of PEP 416. Cheers! Etienne On 03/30/2012 11:54 AM, Stefan Behnel wrote:
Etienne Robillard, 30.03.2012 17:45:
Sorry also if this is OT... :)
Yes, it is. Please do as Nick told you.
Stefan

Etienne Robillard, 30.03.2012 18:08:
are you also truth allergic or irritated by the consequences of free speech ?
Please note that "free speech" is a concept that is different from asking beginner's computer science questions on the core developer mailing list of a software development project. This is not the right forum to do so, and you should therefore move your "free speech" to one that is more appropriate. Nick has pointed you to one such forum and you would be well advised to use it - that's all I was trying to say. I hope it's clearer now. Stefan

your reasoning is pathetic at best. i pass... Thanks for the tip :-) Cheers, Etienne On 03/30/2012 12:18 PM, Stefan Behnel wrote:
Etienne Robillard, 30.03.2012 18:08:
are you also truth allergic or irritated by the consequences of free speech ?
Please note that "free speech" is a concept that is different from asking beginner's computer science questions on the core developer mailing list of a software development project. This is not the right forum to do so, and you should therefore move your "free speech" to one that is more appropriate. Nick has pointed you to one such forum and you would be well advised to use it - that's all I was trying to say. I hope it's clearer now.
Stefan

Etienne Robillard wrote:
your reasoning is pathetic at best. i pass... Thanks for the tip :-)
The Python Developer list is for the discussion of developing Python, not for teaching basic programming.
You are being rude, and a smiley does not make you less rude.
I am adding you to my kill-file so I no longer see messages from you because you refuse to follow the advice you have been given.
~Ethan~

On 03/30/2012 02:23 PM, Ethan Furman wrote:
Etienne Robillard wrote:
your reasoning is pathetic at best. i pass... Thanks for the tip :-)
The Python Developer list is for the discussion of developing Python, not for teaching basic programming.
You are being rude, and a smiley does not make you less rude.
I am adding you to my kill-file so I no longer see messages from you because you refuse to follow the advice you have been given.
~Ethan~
Add me to whatever file you want, but I believe the consequences of the new dict runtime errors just won't be resolved this way, and systematically blocking alternative opinions as OT won't help either, because that's typically oppression, not free speech. :-) Cheers, :-) Etienne
participants (8)
- Benjamin Peterson
- Ethan Furman
- Etienne Robillard
- Guido van Rossum
- Maciej Fijalkowski
- Nick Coghlan
- R. David Murray
- Stefan Behnel