set.add could return True or False
set.add(x) could return True if x was added to the set, and False if x was already in the set. Adding an element that is already present often constitutes an error in my code. As I understand, set.add is an atomic operation. Having set.add return a boolean will also allow EAFP-style code with regard to handling duplicates, the long winded form of which is currently: if a not in b: b.add(a) <-- race condition do_c() Which can be improved to: if b.add(a): do_c() Advantages: * Very common code pattern. * More concise. * Allows interpreter atomicity to be exploited, often removing the need for additional locking. * Faster because it avoids double contain check, and can avoid locking.
You still can get race condition: if b.add(a): # some thread removes a from b before do_c call... do_c() Explicit lock is still required for your case. On Wed, Mar 14, 2012 at 10:36 AM, Matt Joiner <anacrolix@gmail.com> wrote:
set.add(x) could return True if x was added to the set, and False if x was already in the set.
Adding an element that is already present often constitutes an error in my code.
As I understand, set.add is an atomic operation. Having set.add return a boolean will also allow EAFP-style code with regard to handling duplicates, the long winded form of which is currently:
if a not in b: b.add(a) <-- race condition do_c()
Which can be improved to:
if b.add(a): do_c()
Advantages: * Very common code pattern. * More concise. * Allows interpreter atomicity to be exploited, often removing the need for additional locking. * Faster because it avoids double contain check, and can avoid locking. _______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
This is an example only. On Mar 15, 2012 1:46 AM, "Andrew Svetlov" <andrew.svetlov@gmail.com> wrote:
You still can get race condition:
if b.add(a): # some thread removes a from b before do_c call... do_c()
Explicit lock is still required for your case.
On Wed, Mar 14, 2012 at 10:36 AM, Matt Joiner <anacrolix@gmail.com> wrote:
set.add(x) could return True if x was added to the set, and False if x was already in the set.
Adding an element that is already present often constitutes an error in my code.
As I understand, set.add is an atomic operation. Having set.add return a boolean will also allow EAFP-style code with regard to handling duplicates, the long winded form of which is currently:
if a not in b: b.add(a) <-- race condition do_c()
Which can be improved to:
if b.add(a): do_c()
Advantages: * Very common code pattern. * More concise. * Allows interpreter atomicity to be exploited, often removing the need for additional locking. * Faster because it avoids double contain check, and can avoid locking. _______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
On 15Mar2012 02:26, Matt Joiner <anacrolix@gmail.com> wrote: | This is an example only. [... racy stuff ...] Then please provide a realistic nonracy example. And also, what's your use case for "Adding an element that is already present often constitutes an error in my code"? And how is it not addressed by: b.update( (a,) ) Cheers, -- Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/ Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live. - Martin Golding, DoD #0236, martin@plaza.ds.adp.com
I've submitted patches and benchmarks for this here: http://bugs.python.org/issue14320 On Fri, Mar 16, 2012 at 10:19 AM, Cameron Simpson <cs@zip.com.au> wrote:
On 15Mar2012 02:26, Matt Joiner <anacrolix@gmail.com> wrote: | This is an example only. [... racy stuff ...]
Then please provide a realistic nonracy example.
And also, what's your use case for "Adding an element that is already present often constitutes an error in my code"? And how is it not addressed by:
b.update( (a,) )
Cheers, -- Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/
Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live. - Martin Golding, DoD #0236, martin@plaza.ds.adp.com
On 03/14/2012 06:46 PM, Andrew Svetlov wrote:
You still can get race condition:
if b.add(a): # some thread removes a from b before do_c call... do_c()
Explicit lock is still required for your case.
A common use for this pattern is to do some piece of work only the first time a given value is seen. The race condition you mention is unlikely to matter there because either: - Values are never removed from b, and so there is no race, or - Values are only ever removed by do_c, so there is no race, or - Whether a is actually *in* b is irrelevant to do_c. The original race *does* matter, because do_c() may be called multiple times for the same value. Changing set.add() to return True if the value was actually added fixes this race condition without using locks. In fact, you could even use the proposed feature to *implement* a form of (non-reentrant, nonblocking) locking. If you need a lot of locks, it would even be cheaper (at least memory-wise) than using threading.Lock objects. Having to use a "real" lock to avoid the race condition makes that approach a lot less attractive. - Jacob
On Wed, Mar 14, 2012 at 10:36 AM, Matt Joiner<anacrolix@gmail.com> wrote:
set.add(x) could return True if x was added to the set, and False if x was already in the set.
Adding an element that is already present often constitutes an error in my code.
As I understand, set.add is an atomic operation. Having set.add return a boolean will also allow EAFP-style code with regard to handling duplicates, the long winded form of which is currently:
if a not in b: b.add(a)<-- race condition do_c()
Which can be improved to:
if b.add(a): do_c()
Advantages: * Very common code pattern. * More concise. * Allows interpreter atomicity to be exploited, often removing the need for additional locking. * Faster because it avoids double contain check, and can avoid locking. _______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
This, and more so removing duplication, are my reasons for suggesting it. A lot of my code uses assertions to verify assumptions about uniqueness of elements. This requires checking that an element isn't already present before adding it. Doing these checks on sets shared by threads requires locking, which is a high cost when most operations allow "cheating" with the GIL. But this is a minor boon.
On 2012-03-14, at 18:36 , Matt Joiner wrote:
set.add(x) could return True if x was added to the set, and False if x was already in the set.
That does not mesh with the usual Python semantics of methods either having a side-effect (mutation) or returning a value. Why would that happen with sets but not with e.g. dicts?
Adding an element that is already present often constitutes an error in my code.
Then thrown an error when that happens?
As I understand, set.add is an atomic operation. Having set.add return a boolean will also allow EAFP-style code with regard to handling duplicates, the long winded form of which is currently:
if a not in b: b.add(a) <-- race condition do_c()
Which can be improved to:
if b.add(a): do_c()
Advantages: * Very common code pattern. * More concise. * Allows interpreter atomicity to be exploited, often removing the need for additional locking. * Faster because it avoids double contain check, and can avoid locking.
Nope, as Andrew noted it's possible that an other thread has *removed* the element from the set before do_c (so you've got your race condition right there, assuming do_c expects `a` to be in `b`) And since you're using a set `b.add(a)` is a noop if `a` is already in `b` so there's no race condition at the point you note. The race condition is instead in the condition itself, you can have two different threads finding out the value is not in the set and ultimately executing do_c.
On Wed, Mar 14, 2012 at 10:53 AM, Masklinn <masklinn@masklinn.net> wrote:
On 2012-03-14, at 18:36 , Matt Joiner wrote:
set.add(x) could return True if x was added to the set, and False if x was already in the set.
That does not mesh with the usual Python semantics of methods either having a side-effect (mutation) or returning a value. Why would that happen with sets but not with e.g. dicts?
The rule is a bit more complicated than that (e.g. consider list.pop()). It's gets fleshed out well in: http://bugs.python.org/issue12192 set.remove() arguably "returns" the same sort of indication as that which is proposed, in that it either raises or doesn't raise KeyError depending on whether the value was present. But yeah, these boolean return values aren't of huge utility, particularly in the multithreaded case. Cheers, Chris
I should not have emphasized the atomicity here, this was not intended to be the main reason. On Mar 15, 2012 2:11 AM, "Chris Rebert" <pyideas@rebertia.com> wrote:
On Wed, Mar 14, 2012 at 10:53 AM, Masklinn <masklinn@masklinn.net> wrote:
On 2012-03-14, at 18:36 , Matt Joiner wrote:
set.add(x) could return True if x was added to the set, and False if x was already in the set.
That does not mesh with the usual Python semantics of methods either having a side-effect (mutation) or returning a value. Why would that happen with sets but not with e.g. dicts?
The rule is a bit more complicated than that (e.g. consider list.pop()). It's gets fleshed out well in: http://bugs.python.org/issue12192
set.remove() arguably "returns" the same sort of indication as that which is proposed, in that it either raises or doesn't raise KeyError depending on whether the value was present.
But yeah, these boolean return values aren't of huge utility, particularly in the multithreaded case.
Cheers, Chris
I think it would actually be reasonable to return a true/false value here (there are plenty of non-threaded uses for this), except I think it's too late to change now -- it would just encourage folks from writing code that works with Python 3.3 but is very subtly broken with earlier versions. So, -1. --Guido On Wed, Mar 14, 2012 at 11:29 AM, Matt Joiner <anacrolix@gmail.com> wrote:
I should not have emphasized the atomicity here, this was not intended to be the main reason.
On Mar 15, 2012 2:11 AM, "Chris Rebert" <pyideas@rebertia.com> wrote:
On Wed, Mar 14, 2012 at 10:53 AM, Masklinn <masklinn@masklinn.net> wrote:
On 2012-03-14, at 18:36 , Matt Joiner wrote:
set.add(x) could return True if x was added to the set, and False if x was already in the set.
That does not mesh with the usual Python semantics of methods either having a side-effect (mutation) or returning a value. Why would that happen with sets but not with e.g. dicts?
The rule is a bit more complicated than that (e.g. consider list.pop()). It's gets fleshed out well in: http://bugs.python.org/issue12192
set.remove() arguably "returns" the same sort of indication as that which is proposed, in that it either raises or doesn't raise KeyError depending on whether the value was present.
But yeah, these boolean return values aren't of huge utility, particularly in the multithreaded case.
Cheers, Chris
_______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
-- --Guido van Rossum (python.org/~guido)
There's not much chance of that now, most projects that could cause trouble with this are using six, or moving up from 2. By comparison many other things in Python 3.3 are very non backwards friendly like delegating generators, new exception hierarchy, and a crap load of new API calls. Now is the hour! :> On Mar 15, 2012 2:33 AM, "Guido van Rossum" <guido@python.org> wrote:
I think it would actually be reasonable to return a true/false value here (there are plenty of non-threaded uses for this), except I think it's too late to change now -- it would just encourage folks from writing code that works with Python 3.3 but is very subtly broken with earlier versions. So, -1.
--Guido
On Wed, Mar 14, 2012 at 11:29 AM, Matt Joiner <anacrolix@gmail.com> wrote:
I should not have emphasized the atomicity here, this was not intended to be the main reason.
On Mar 15, 2012 2:11 AM, "Chris Rebert" <pyideas@rebertia.com> wrote:
On Wed, Mar 14, 2012 at 10:53 AM, Masklinn <masklinn@masklinn.net>
wrote:
On 2012-03-14, at 18:36 , Matt Joiner wrote:
set.add(x) could return True if x was added to the set, and False if x was already in the set.
That does not mesh with the usual Python semantics of methods either having a side-effect (mutation) or returning a value. Why would that happen with sets but not with e.g. dicts?
The rule is a bit more complicated than that (e.g. consider list.pop()). It's gets fleshed out well in: http://bugs.python.org/issue12192
set.remove() arguably "returns" the same sort of indication as that which is proposed, in that it either raises or doesn't raise KeyError depending on whether the value was present.
But yeah, these boolean return values aren't of huge utility, particularly in the multithreaded case.
Cheers, Chris
_______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
-- --Guido van Rossum (python.org/~guido)
On Mar 15, 2012 1:53 AM, "Masklinn" <masklinn@masklinn.net> wrote:
On 2012-03-14, at 18:36 , Matt Joiner wrote:
set.add(x) could return True if x was added to the set, and False if x was already in the set.
That does not mesh with the usual Python semantics of methods either having a side-effect (mutation) or returning a value. Why would that happen with sets but not with e.g. dicts?
Because dict insertions are by operator?
Adding an element that is already present often constitutes an error in
my code.
Then thrown an error when that happens?
As I understand, set.add is an atomic operation. Having set.add return a boolean will also allow EAFP-style code with regard to handling duplicates, the long winded form of which is currently:
if a not in b: b.add(a) <-- race condition do_c()
Which can be improved to:
if b.add(a): do_c()
Advantages: * Very common code pattern. * More concise. * Allows interpreter atomicity to be exploited, often removing the need for additional locking. * Faster because it avoids double contain check, and can avoid locking.
Nope, as Andrew noted it's possible that an other thread has *removed* the element from the set before do_c (so you've got your race condition right there, assuming do_c expects `a` to be in `b`)
And since you're using a set `b.add(a)` is a noop if `a` is already in `b` so there's no race condition at the point you note. The race condition is instead in the condition itself, you can have two different threads finding out the value is not in the set and ultimately executing do_c.
There's still a performance cost in looking up the already present value a second time.
On Mar 14, 2012, at 11:34 AM, Matt Joiner wrote:
There's still a performance cost in looking up the already present value a second time.
The performance cost is near zero. The relevant memory accesses with be in cache (which make accessing them almost free). Besides, the price you pay for storing and testing the boolean value isn't free either. Thinking like a C programmer won't help you in the optimization game with sets. And, it's certainly not worth complexifying the API. Raymond P.S. There have been previous threads on this same subject and they've all ended with rejecting the proposal.
participants (8)
-
Andrew Svetlov
-
Cameron Simpson
-
Chris Rebert
-
Guido van Rossum
-
Jacob Holm
-
Masklinn
-
Matt Joiner
-
Raymond Hettinger