Mailman 3 Add "equal" builtin function - Python-ideas

Add "equal" builtin function

Filipp Bakanov

Oct. 6, 2016

1:45 p.m.

For now there are many useful builtin functions like "any", "all", etc. I'd like to propose a new builtin function "equal". It should accept an iterable, and return True if all items in the iterable are the same or the iterable is empty. That's quite a popular problem; there is a discussion of how to do it on stackoverflow (http://stackoverflow.com/questions/3844801/check-if-all-elements-in-a-list-a...) - all suggestions are either slow or not very elegant. What do you think about it?

Paul Moore

October 2016

2:01 p.m.

On 6 October 2016 at 14:45, Filipp Bakanov <filipp@bakanov.su> wrote:

...

For now there are many useful builtin functions like "any", "all", etc. I'd like to propose a new builtin function "equal". It should accept an iterable, and return True if all items in the iterable are the same or the iterable is empty. That's quite a popular problem; there is a discussion of how to do it on stackoverflow (http://stackoverflow.com/questions/3844801/check-if-all-elements-in-a-list-a...) - all suggestions are either slow or not very elegant. What do you think about it?

It's not a problem I've needed to solve often, if at all (in real-world code). But even if we assume it is worth having as a builtin, what would you propose as the implementation? The stackoverflow discussion highlights a lot of approaches, all with their own trade-offs. One problem with a builtin is that it would have to work on all iterables, which is likely to preclude a number of the faster solutions (which rely on the argument being an actual list).

It's an interesting optimisation problem, and the discussion gives some great insight into how to micro-optimise an operation like this, but I'd question whether it needs to be a language/stdlib feature.

Paul

Sjoerd Job Postmus

2:43 p.m.

On Thu, Oct 06, 2016 at 03:01:36PM +0100, Paul Moore wrote:

It's not a problem I've needed to solve often, if at all (in real-world code). But even if we assume it is worth having as a builtin, what would you propose as the implementation? The stackoverflow discussion highlights a lot of approaches, all with their own trade-offs. One problem with a builtin is that it would have to work on all iterables, which is likely to preclude a number of the faster solutions (which rely on the argument being an actual list).

It's an interesting optimisation problem, and the discussion gives some great insight into how to micro-optimise an operation like this, but I'd question whether it needs to be a language/stdlib feature.

Paul

I've needed it several times, but can't really remember what for anymore, which makes me think it's not really that important. A motivating reason for adding it to the builtins would be that it can be written in C instead of Python, and hence be a lot faster. The single slowest solution is actually the fastest when the difference is detected very soon (case s3), all others are `O(n)` and not `O(first-mismatch)`. Though, that means it could also be written in C and provided to PyPI, at the cost of asking others to install an extra package.
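The `O(n)` versus `O(first-mismatch)` distinction can be made concrete by counting comparisons. The following is only a sketch: the `Counted` wrapper, the helper names, and the choice of a `list.count`-based version as the "always scans everything" variant are assumptions made for illustration, not the specific solutions benchmarked in the stackoverflow answer.

```python
class Counted:
    """Illustrative wrapper that counts how many == comparisons are made."""
    calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.calls += 1
        return self.value == other.value


def all_equal_short_circuit(iterable):
    # Stops at the first element that differs from the first one.
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        return True
    return all(x == first for x in it)


def all_equal_full_scan(seq):
    # list.count always walks the whole list, even after a mismatch.
    return not seq or seq.count(seq[0]) == len(seq)


def comparisons_used(func, data):
    # Reset the shared counter, run the function, report the comparisons made.
    Counted.calls = 0
    func(data)
    return Counted.calls


# The mismatch sits at index 1, so a short-circuiting version can stop at once.
data = [Counted(0), Counted(1)] + [Counted(0) for _ in range(998)]
early = comparisons_used(all_equal_short_circuit, data)
full = comparisons_used(all_equal_full_scan, data)
print(early, full)
```

A C implementation would shrink the constant factor of either approach, but it would not change which of the two stops early.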

Mark Lawrence

6:52 p.m.

On 06/10/2016 15:43, Sjoerd Job Postmus wrote:

I've needed it several times, but can't really remember what for anymore, which makes me think it's not really that important. A motivating reason for adding it to the builtins would be that it can be written in C instead of Python, and hence be a lot faster.

This should be on the bug tracker as "release blocker" as we clearly need something that is fast that isn't that important.

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence

אלעזר

2:45 p.m.

It is a real problem. People are used to writing `seq == [1, 2, 3]`, and it passes unnoticed (even with type checkers) that if seq changes to e.g. a tuple, it will cause subtle bugs. It is inconvenient to write `len(seq) == 3 and seq == [1, 2, 3]` and people often don't notice the need to write it.

(I'd like to note that it makes sense for this operation to be written as `*iter1 == *lst`, although it requires a significant change to the language, so a Sequence.equal function makes sense)

Elazar

On Thu, Oct 6, 2016 at 5:02 PM Paul Moore <p.f.moore@gmail.com> wrote:

It's not a problem I've needed to solve often, if at all (in real-world code). But even if we assume it is worth having as a builtin, what would you propose as the implementation? The stackoverflow discussion highlights a lot of approaches, all with their own trade-offs. One problem with a builtin is that it would have to work on all iterables, which is likely to preclude a number of the faster solutions (which rely on the argument being an actual list).

It's an interesting optimisation problem, and the discussion gives some great insight into how to micro-optimise an operation like this, but I'd question whether it needs to be a language/stdlib feature.

Paul

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Sjoerd Job Postmus

2:52 p.m.

On Thu, Oct 06, 2016 at 02:45:11PM +0000, אלעזר wrote:

...

It is a real problem. People are used to writing `seq == [1, 2, 3]`, and it passes unnoticed (even with type checkers) that if seq changes to e.g. a tuple, it will cause subtle bugs. It is inconvenient to write `len(seq) == 3 and seq == [1, 2, 3]` and people often don't notice the need to write it.

(I'd like to note that it makes sense for this operation to be written as

*iter1 == *lst

although it requires a significant change to the language, so a Sequence.equal function makes sense)

Elazar

I think you're mistaken about the suggestion. It's not about a function

    def equal(it1: Iterable, it2: Iterable) -> bool:

but about a function

    def equal(it: Iterable) -> bool:
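To make the two readings concrete, here is a minimal sketch of both. The names `pairwise_equal` and `all_equal` and the `_MISSING` sentinel are hypothetical, chosen only for illustration; neither function exists in the standard library under these names.

```python
from itertools import zip_longest
from typing import Iterable

_MISSING = object()  # illustrative sentinel marking "one iterable ran out first"


def pairwise_equal(it1: Iterable, it2: Iterable) -> bool:
    """Two-argument reading: both iterables yield equal items in the same order."""
    return all(a == b for a, b in zip_longest(it1, it2, fillvalue=_MISSING))


def all_equal(it: Iterable) -> bool:
    """One-argument reading (the actual proposal): all items of one iterable are equal."""
    iterator = iter(it)
    try:
        first = next(iterator)
    except StopIteration:
        return True
    return all(x == first for x in iterator)


# A list never compares equal to a tuple, even with the same items ...
print([1, 2, 3] == (1, 2, 3))
# ... while an element-wise comparison ignores the container type.
print(pairwise_equal([1, 2, 3], (1, 2, 3)))
# The proposal asks a different question: are all items of one iterable the same?
print(all_equal([1, 2, 3]), all_equal([2, 2, 2]))
```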

אלעזר

2:56 p.m.

On Thu, Oct 6, 2016 at 5:53 PM Sjoerd Job Postmus <sjoerdjob@sjoerdjob.com> wrote:

I think you're mistaken about the suggestion.

You are right of course. Sorry. Elazar

Steven D'Aprano

3:23 p.m.

On Thu, Oct 06, 2016 at 04:45:01PM +0300, Filipp Bakanov wrote:

...

For now there are many useful builtin functions like "any", "all", etc. I'd like to propose a new builtin function "equal". It should accept an iterable, and return True if all items in the iterable are the same or the iterable is empty. That's quite a popular problem; there is a discussion of how to do it on stackoverflow (http://stackoverflow.com/questions/3844801/check-if-all-elements-in-a-list-a...) - all suggestions are either slow or not very elegant.

I haven't checked the link, but just off the top of my head, how's this?

    def all_equal(iterable):
        it = iter(iterable)
        sentinel = object()
        first = next(it, sentinel)
        return all(x == first for x in it)

I think that's neat, elegant, fast, and short enough that I don't mind writing it myself when I need it (although I wouldn't mind adding it to my own personal toolbox).

+0.3 to adding it to the standard library.

+0.1 to adding it to built-ins

-0.1 on adding it to built-ins under the name "equal", as that will confuse too many people.

--
Steve
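Two properties of this function are easy to miss: the sentinel default keeps `next()` from raising `StopIteration` on empty input, and a one-shot iterator is only consumed up to the first mismatch. The sketch below repeats the function so that it runs on its own; the `numbers` generator is a made-up example.

```python
def all_equal(iterable):
    it = iter(iterable)
    sentinel = object()         # unique placeholder object
    first = next(it, sentinel)  # with a default, next() never raises StopIteration
    return all(x == first for x in it)


def numbers():
    # A one-shot generator: once consumed, its values are gone.
    yield from (7, 7, 8, 7)


gen = numbers()
result = all_equal(gen)  # stops as soon as it sees the 8
leftover = list(gen)     # whatever all_equal did not consume

print(result, leftover)
print(all_equal([]), all_equal([5]))
```

On empty input `first` is the sentinel and `all()` of an empty iterator is True, which matches the "or the iterable is empty" part of the proposal.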

Chris Angelico

3:42 p.m.

On Fri, Oct 7, 2016 at 2:23 AM, Steven D'Aprano <steve@pearwood.info> wrote:

...

+0.3 to adding it to the standard library.

+0.1 to adding it to built-ins

-0.1 on adding it to built-ins under the name "equal", as that will confuse too many people.

I'll go further: -0.5 on adding to built-ins. +0.5 on adding it to itertools or the itertools recipes.

ChrisA

Filipp Bakanov

5:09 p.m.

It seems the itertools recipes already have an "all_equal" function. What do you think about moving it from the recipes into itertools? I suggest a C implementation with optimisations for builtin collections.

2016-10-06 18:42 GMT+03:00 Chris Angelico <rosuav@gmail.com>:

I'll go further: -0.5 on adding to built-ins. +0.5 on adding it to itertools or the itertools recipes.

ChrisA

אלעזר

5:20 p.m.

The name might be a little confusing; it can be understood as comparing two sequences, so passing two sequences may seem reasonable to a reviewer.

Elazar

On Thu, Oct 6, 2016 at 20:15, Filipp Bakanov <filipp@bakanov.su> wrote:

...

Seems like itertools recipes already have "all_equal" function. What do you think about moving it from recipes to itertools? I suggest a C implementation with optimisations for builtin collections.


Paul Moore

7:15 p.m.

On 6 October 2016 at 18:09, Filipp Bakanov <filipp@bakanov.su> wrote:

...

Seems like itertools recipes already have "all_equal" function. What do you think about moving it from recipes to itertools? I suggest a C implementation with optimisations for builtin collections.

Interestingly, the recipe given there was not mentioned in the stackoverflow thread. Testing it against Steven's example given above:

recipe.py:

    from itertools import groupby

    def all_equal_1(iterable):
        "Returns True if all the elements are equal to each other"
        g = groupby(iterable)
        return next(g, True) and not next(g, False)

    def all_equal_2(iterable):
        it = iter(iterable)
        sentinel = object()
        first = next(it, sentinel)
        return all(x == first for x in it)

Results:

    # Itertools recipe, all different
    py -m perf timeit -s "from recipe import all_equal_1, all_equal_2; x = range(1000); y = [0] * 1000" "all_equal_1(x)"
    ....................
    Median +- std dev: 596 ns +- 10 ns

    # Itertools recipe, all the same
    py -m perf timeit -s "from recipe import all_equal_1, all_equal_2; x = range(1000); y = [0] * 1000" "all_equal_1(y)"
    ....................
    Median +- std dev: 7.17 us +- 0.05 us

    # Steven's recipe, all different
    py -m perf timeit -s "from recipe import all_equal_1, all_equal_2; x = range(1000); y = [0] * 1000" "all_equal_2(x)"
    ....................
    Median +- std dev: 998 ns +- 12 ns

    # Steven's recipe, all the same
    py -m perf timeit -s "from recipe import all_equal_1, all_equal_2; x = range(1000); y = [0] * 1000" "all_equal_2(y)"
    ....................
    Median +- std dev: 84.3 us +- 0.9 us

So the itertools recipe is just under twice as fast for all-different values, and over 10 times faster if all the values are the same. The moral here is probably to check the itertools recipes, they are really well coded.

If you really feel that it's worth promoting this recipe to an actual itertools function, you should probably create a tracker item for it, aimed at Python 3.7, with a patch implementing it. My feeling is that Raymond (who's in charge of the itertools module) won't think it's worth including - he's typically very cautious about adding itertools unless they have proven broad value. But that's just my guess, and the only way to know for sure is to ask.

BTW, given that this *is* already an itertools recipe, it seems clear to me that the only reasonable place to put it if it does go into core Python would be the itertools module.

Paul
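For readers unfamiliar with `groupby`, here is a small sketch of why the recipe works. `groupby` collapses each run of consecutive equal items into one group, so an all-equal input produces at most one group. A likely reason for the speed difference (an assumption, not something measured here) is that `groupby` does its comparisons in C, while the generator expression runs Python bytecode for every item.

```python
from itertools import groupby


def all_equal(iterable):
    "Returns True if all the elements are equal to each other"
    g = groupby(iterable)
    # At most one group may exist: the first next() takes it (or the default
    # True for empty input), and the second next() must find nothing more.
    return next(g, True) and not next(g, False)


# Each run of consecutive equal items becomes a single group.
runs_same = [key for key, _ in groupby([0, 0, 0])]
runs_mixed = [key for key, _ in groupby([1, 1, 2, 1])]

print(runs_same, runs_mixed)
print(all_equal([0] * 1000), all_equal(range(1000)), all_equal([]))
```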

Ethan Furman

3:39 p.m.

On 10/06/2016 06:45 AM, Filipp Bakanov wrote:

...

For now there are many useful builtin functions like "any", "all", etc. I'd like to propose a new builtin function "equal". It should accept an iterable, and return True if all items in the iterable are the same or the iterable is empty.

That's quite a popular problem; there is a discussion of how to do it on stackoverflow - all suggestions are either slow or not very elegant.

What do you think about it?

I don't know if it's common enough to warrant being a built-in, but I know I've needed it several times, and wrote my own.

--
~Ethan~

Nick Coghlan

2:55 p.m.

On 6 October 2016 at 23:45, Filipp Bakanov <filipp@bakanov.su> wrote:

...

For now there are many useful builtin functions like "any", "all", etc. I'd like to propose a new builtin function "equal". It should accept an iterable, and return True if all items in the iterable are the same or the iterable is empty.

If the items are hashable, you can already just dump them in a set:

    len(set(iterable)) <= 1

If they're not hashable or you want to exit ASAP on larger inputs, you'll want an algorithm that works the same way any/all do:

    def all_same(iterable):
        itr = iter(iterable)
        try:
            first = next(itr)
        except StopIteration:
            return True
        return all(x == first for x in itr)

(Checking the SO question, both of those are given in the first answer)

If you know you have a sequence, you can also do:

    not seq or all(x == seq[0] for x in seq)

Exactly which of those options makes sense is going to depend on what format your data is in, and what other operations you're planning to do with it - without a context of use in the SO question, it sounds more like someone seeking help with their algorithms and data structures homework than it does a practical programming problem.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
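The constraints attached to each option ("if the items are hashable", "if you know you have a sequence") show up as soon as the input breaks them. The sketch below wraps the three expressions in functions; the names `all_same_set` and `all_same_seq` and the sample data are made up for illustration.

```python
def all_same_set(iterable):
    # Requires hashable items and always consumes the whole input.
    return len(set(iterable)) <= 1


def all_same(iterable):
    # Works on any iterable and exits at the first mismatch.
    itr = iter(iterable)
    try:
        first = next(itr)
    except StopIteration:
        return True
    return all(x == first for x in itr)


def all_same_seq(seq):
    # Requires a real sequence: truth-testing for emptiness and indexing with [0].
    return not seq or all(x == seq[0] for x in seq)


rows = [[1, 2], [1, 2], [1, 2]]  # lists are not hashable

try:
    all_same_set(rows)
except TypeError as exc:
    print("set version fails:", exc)

print(all_same(rows), all_same_seq(rows))

try:
    all_same_seq(ch for ch in "aaa")  # a generator cannot be indexed
except TypeError as exc:
    print("sequence version fails:", exc)

print(all_same(ch for ch in "aaa"))
```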

participants (9)

Chris Angelico
Ethan Furman
Filipp Bakanov
Mark Lawrence
Nick Coghlan
Paul Moore
Sjoerd Job Postmus
Steven D'Aprano
אלעזר