
There are already many useful builtin functions like "any", "all", etc. I'd like to propose a new builtin function, "equal". It should accept an iterable and return True if all items in the iterable are the same or the iterable is empty. That's quite a popular problem; there is a discussion of how to do it on stackoverflow ( http://stackoverflow.com/questions/3844801/check-if-all-elements-in-a-list-a...) - all suggestions are either slow or not very elegant. What do you think about it?

On 6 October 2016 at 14:45, Filipp Bakanov <filipp@bakanov.su> wrote:
It's not a problem I've needed to solve often, if at all (in real-world code). But even if we assume it is worth having as a builtin, what would you propose as the implementation? The stackoverflow discussion highlights a lot of approaches, all with their own trade-offs. One problem with a builtin is that it would have to work on all iterables, which is likely to preclude a number of the faster solutions (which rely on the argument being an actual list). It's an interesting optimisation problem, and the discussion gives some great insight into how to micro-optimise an operation like this, but I'd question whether it needs to be a language/stdlib feature. Paul

On Thu, Oct 06, 2016 at 03:01:36PM +0100, Paul Moore wrote:
I've needed it several times, but can't really remember what for anymore, which makes me think it's not really that important. A motivating reason for adding it to the builtins would be that it can be written in C instead of Python, and hence be a lot faster. The single slowest solution is actually the fastest when the difference is detected very soon (case s3), all others are `O(n)` and not `O(first-mismatch)`. Though, that means it could also be written in C and provided to PyPI, at the cost of asking others to install an extra package.
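To make the `O(n)` versus `O(first-mismatch)` distinction concrete, here is a minimal sketch that counts how many items each approach pulls from its input. The helper names (`counting`, `all_equal_short_circuit`, `all_equal_set`) and the test data are assumptions made up for this illustration; they are not taken from the stackoverflow answers.

```python
def counting(iterable, counter):
    """Yield items from iterable, recording in counter[0] how many were consumed."""
    for item in iterable:
        counter[0] += 1
        yield item


def all_equal_short_circuit(iterable):
    # Stops at the first item that differs from the first one.
    it = iter(iterable)
    sentinel = object()
    first = next(it, sentinel)
    return all(x == first for x in it)


def all_equal_set(iterable):
    # Always consumes the whole input before answering.
    return len(set(iterable)) <= 1


# Hypothetical data: the mismatch is visible at the second item.
data = [0, 1] + [1] * 998

consumed = [0]
all_equal_short_circuit(counting(data, consumed))
short_circuit_consumed = consumed[0]

consumed = [0]
all_equal_set(counting(data, consumed))
set_consumed = consumed[0]

print(short_circuit_consumed, set_consumed)
```

The short-circuiting version reads only the items up to the first mismatch, while the set-based version reads everything, which is why the ranking of the solutions depends so heavily on where the first difference occurs.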

On 06/10/2016 15:43, Sjoerd Job Postmus wrote:
This should be on the bug tracker as "release blocker" as we clearly need something that is fast that isn't that important. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence

It is a real problem. People are used to writing `seq == [1, 2, 3]`, and it passes unnoticed (even with type checkers) that if seq changes to e.g. a tuple, it will cause subtle bugs. It is inconvenient to write `len(seq) == 3 and seq == [1, 2, 3]` and people often don't notice the need to write it. (I'd like to note that it makes sense for this operation to be written as *iter1 == *lst, although that requires a significant change to the language, so a Sequence.equal function makes sense.)

Elazar

On Thu, Oct 6, 2016 at 5:02 PM Paul Moore <p.f.moore@gmail.com> wrote:
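As an illustration of the list-versus-tuple point in the message above, the following sketch shows the comparison silently becoming False when the container type changes, together with one possible element-wise comparison. `elementwise_equal` is a hypothetical helper written for this example; it is not an existing `Sequence.equal` method.

```python
from itertools import zip_longest


def elementwise_equal(a, b):
    """Compare two iterables item by item, ignoring their container types."""
    sentinel = object()  # unique filler, so a length mismatch compares unequal
    return all(x == y for x, y in zip_longest(a, b, fillvalue=sentinel))


seq = (1, 2, 3)  # was a list before some refactoring

# A tuple never compares equal to a list, even with identical items.
list_vs_tuple = seq == [1, 2, 3]

# The element-wise comparison does not care about the container type.
same_items = elementwise_equal(seq, [1, 2, 3])

print(list_vs_tuple, same_items)
```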

On Thu, Oct 06, 2016 at 04:45:01PM +0300, Filipp Bakanov wrote:
I haven't checked the link, but just off the top of my head, how's this?

    def all_equal(iterable):
        it = iter(iterable)
        sentinel = object()
        first = next(it, sentinel)
        return all(x == first for x in it)

I think that's neat, elegant, fast, and short enough that I don't mind writing it myself when I need it (although I wouldn't mind adding it to my own personal toolbox).

+0.3 to adding it to the standard library.
+0.1 to adding it to built-ins
-0.1 on adding it to built-ins under the name "equal", as that will confuse too many people.

-- 
Steve

On 6 October 2016 at 18:09, Filipp Bakanov <filipp@bakanov.su> wrote:
Interestingly, the recipe given there was not mentioned in the stackoverflow thread. Testing it against Steven's example given above:

recipe.py:

    from itertools import groupby

    def all_equal_1(iterable):
        "Returns True if all the elements are equal to each other"
        g = groupby(iterable)
        return next(g, True) and not next(g, False)

    def all_equal_2(iterable):
        it = iter(iterable)
        sentinel = object()
        first = next(it, sentinel)
        return all(x == first for x in it)

Results:
So the itertools recipe is just under twice as fast for all-different values, and over 10 times faster if all the values are the same. The moral here is probably to check the itertools recipes, they are really well coded. If you really feel that it's worth promoting this recipe to an actual itertools function, you should probably create a tracker item for it, aimed at Python 3.7, with a patch implementing it. My feeling is that Raymond (who's in charge of the itertools module) won't think it's worth including - he's typically very cautious about adding itertools unless they have proven broad value. But that's just my guess, and the only way to know for sure is to ask. BTW, given that this *is* already an itertools recipe, it seems clear to me that the only reasonable place to put it if it does go into core Python would be the itertools module. Paul
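For anyone wanting to repeat a comparison of this kind, here is a rough sketch of a benchmark harness built on `timeit`. The input sizes, the repeat count, and the `timings` dictionary are assumptions chosen for this example, not the setup behind the figures discussed above, and the numbers will vary by machine and Python version.

```python
import timeit
from itertools import groupby


def all_equal_1(iterable):
    "Returns True if all the elements are equal to each other"
    g = groupby(iterable)
    return next(g, True) and not next(g, False)


def all_equal_2(iterable):
    it = iter(iterable)
    sentinel = object()
    first = next(it, sentinel)
    return all(x == first for x in it)


# Hypothetical inputs: one with identical values, one with distinct values.
same = [1] * 10_000
different = list(range(10_000))

timings = {}
for label, data in (("same", same), ("different", different)):
    for func in (all_equal_1, all_equal_2):
        # Total seconds for 100 calls; lower is better.
        seconds = timeit.timeit(lambda: func(data), number=100)
        timings[(func.__name__, label)] = seconds
        print(f"{func.__name__:12} {label:10} {seconds:.6f}s")
```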

On 6 October 2016 at 23:45, Filipp Bakanov <filipp@bakanov.su> wrote:
If the items are hashable, you can already just dump them in a set:

    len(set(iterable)) <= 1

If they're not hashable or you want to exit ASAP on larger inputs, you'll want an algorithm that works the same way any/all do:

    def all_same(iterable):
        itr = iter(iterable)
        try:
            first = next(itr)
        except StopIteration:
            return True
        return all(x == first for x in itr)

(Checking the SO question, both of those are given in the first answer)

If you know you have a sequence, you can also do:

    not seq or all(x == seq[0] for x in seq)

Exactly which of those options makes sense is going to depend on what format your data is in, and what other operations you're planning to do with it - without a context of use in the SO question, it sounds more like someone seeking help with their algorithms and data structures homework than it does a practical programming problem.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
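The hashability caveat can be seen in a small sketch like the one below, which applies the set-based check and the any/all-style loop to a list of lists. The name `all_same_set` and the sample `rows` data are assumptions introduced for this example.

```python
def all_same_set(iterable):
    # Only works when every item is hashable.
    return len(set(iterable)) <= 1


def all_same(iterable):
    # Relies on == only, so unhashable items are fine.
    itr = iter(iterable)
    try:
        first = next(itr)
    except StopIteration:
        return True
    return all(x == first for x in itr)


# Hypothetical data: lists are unhashable, so they cannot go into a set.
rows = [[1, 2], [1, 2], [1, 2]]

try:
    all_same_set(rows)
    set_worked = True
except TypeError:
    set_worked = False

loop_result = all_same(rows)
print(set_worked, loop_result)
```

The loop version also accepts one-shot iterators such as generators, which the `seq[0]` variant cannot handle because it needs indexing.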

participants (9)
- Chris Angelico
- Ethan Furman
- Filipp Bakanov
- Mark Lawrence
- Nick Coghlan
- Paul Moore
- Sjoerd Job Postmus
- Steven D'Aprano
- אלעזר