Re: dict_items.__getitem__?
I think there definitely should be a more obvious way to do this (specifically the first and last keys/values/items of a dictionary — I'm ambivalent about the others, since they won't always be fast, as discussed). An anti-pattern you see quite often on Stack Overflow to get the first key of a dictionary is something like the following:

    first_key = list(mydict.keys())[0]

Whereas obviously, a much better way (especially if it's a very large dictionary) is to do:

    first_key = next(iter(mydict))

[Christopher Barker]
I'll leave it as an exercise for the reader to find that thread
For reference, the (very long) previous thread is here: https://mail.python.org/archives/list/python-ideas@python.org/thread/S7UMTWK.... [Inada Naoki]
I think we can add `itertools.first()` for this idiom, and `itertools.last()` for `next(iter(reversed(x)))` idiom.
I like this idea, a lot. Another possibility I've been wondering about was whether several methods should be added to the dict interface:

    dict.first_key = lambda self: next(iter(self))
    dict.first_val = lambda self: next(iter(self.values()))
    dict.first_item = lambda self: next(iter(self.items()))
    dict.last_key = lambda self: next(reversed(self))
    dict.last_val = lambda self: next(reversed(self.values()))
    dict.last_item = lambda self: next(reversed(self.items()))

But I think I like a lot more the idea of adding general ways of doing these things to itertools. Best, Alex
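For concreteness, the two helpers Inada Naoki floats might look like the following. This is a hypothetical sketch: `first()` and `last()` are not real itertools functions, and the names and signatures here are assumptions.

```python
# Hypothetical pure-Python sketches of the proposed itertools.first()
# and itertools.last() -- these do NOT exist in the stdlib.
def first(iterable):
    # For a fresh iterable this is the first item; for an iterator,
    # it is whatever item is next (see later in the thread).
    return next(iter(iterable))

def last(iterable):
    # Requires a reversible iterable (sequences, dicts and views on 3.8+).
    return next(reversed(iterable))

d = {"a": 1, "b": 2, "c": 3}
print(first(d))          # a
print(last(d))           # c
print(first(d.items()))  # ('a', 1)
```

Usage against dict views works because keys, values and items views are all iterable and (since Python 3.8) reversible.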
On 5 Oct 2021, at 05:30, Christopher Barker <pythonchb@gmail.com> wrote: On Mon, Oct 4, 2021 at 5:46 PM Erik Demaine <edemaine@mit.edu> wrote:
Have folks thought about allowing indexing dictionary views as in the following code, where d is a dict object?
    d.keys()[0]
    d.keys()[-1]
    d.values()[0]
    d.values()[-1]
    d.items()[0]
    d.items()[-1]  # item that would be returned by d.popitem()
since dicts were made order-preserving, indexing the keys, items, etc does make some sense.
I've also often wanted to get an arbitrary item/key from a dictionary, and
This is indeed one of the use cases identified.
I found some related discussion in https://mail.python.org/archives/list/python-ideas@python.org/thread/QVTGZD6... but not this exact idea.
That's a pretty different idea, but this exact idea has been discussed on this list relatively recently. I still like it, but there wasn't much general support.
I'll leave it as an exercise for the reader to find that thread, but it is there, and I suggest you look for it if you want to further pursue this idea.
-CHB
-- Christopher Barker, PhD (Chris)
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/RAEDZP... Code of Conduct: http://python.org/psf/codeofconduct/
Alex Waygood writes:
Whereas obviously,
The temptation to insist "see, YAGNI!" at this point I shall resist.
a much better way (especially if it's a very large dictionary) is to do:
first_key = next(iter(mydict))
[Inada Naoki]
I think we can add `itertools.first()` for this idiom, and `itertools.last()` for `next(iter(reversed(x)))` idiom.
I was +0 sympathetic to that until you posted this obvious extension of the idea:
I like this idea, a lot. Another possibility I've been wondering about was whether several methods should be added to the dict interface:

    dict.first_key = lambda self: next(iter(self))
    dict.first_val = lambda self: next(iter(self.values()))
    dict.first_item = lambda self: next(iter(self.items()))
    dict.last_key = lambda self: next(reversed(self))
    dict.last_val = lambda self: next(reversed(self.values()))
    dict.last_item = lambda self: next(reversed(self.items()))
That's an awful lot of new methods to take advantage of what for many applications of dict (in fact, *all* of my applications ever[1]) is an irrelevant ordering. And anyone who wants it can do it themselves:

    def first_key(dct): return next(iter(dct))
    def first_val(dct): return next(iter(dct.values()))
    def first_item(dct): return next(iter(dct.items()))
    def last_key(dct): return next(reversed(dct))
    def last_val(dct): return next(reversed(dct.values()))
    def last_item(dct): return next(reversed(dct.items()))

These defs do something undefined on unordered mappings (ie, not based on dict), and may be dangerous in that sense. OTOH, I suspect the methods will do the wrong thing with many ordered mappings based on dict that support orders other than insertion order.
But I think I like a lot more the idea of adding general ways of doing these things to itertools.
If you want to convince others, you really need to be more specific about the requirements that lead you to this conclusion. In the current implementation, positional indexing is time-expensive, much more so than keeping an auxiliary list and using dct[lst[ndx]]. It could also allow timing attacks if used in security-sensitive code. And I wonder if any code is being written obscurely to ensure that keys get added to dicts in the "right" order, instead of keeping an explicit, efficient, and reorderable auxiliary list of keys?

Footnotes:
[1] It's true that some code I import is probably improved because the developers don't waste time tracking down test failures in dict-based data structures. :-)
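The auxiliary-list pattern Stephen alludes to (`dct[lst[ndx]]`) can be sketched as follows. This is a minimal illustration with made-up names, not code from the thread:

```python
# Keep an explicit key list alongside the dict, so positional lookups
# are O(1) list indexing rather than iterating the dict each time.
scores = {}
order = []  # explicit, reorderable list of keys

def add(key, value):
    if key not in scores:
        order.append(key)
    scores[key] = value

add("alice", 10)
add("bob", 20)
add("carol", 30)

# Positional access goes through the list, not the dict's iteration order.
print(scores[order[0]])   # 10
print(scores[order[-1]])  # 30

order.reverse()           # reordering is cheap and explicit
print(scores[order[0]])   # 30
```

The point being that any ordering policy (not just insertion order) can be expressed this way, at the cost of keeping the two structures in sync.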
The temptation to insist "see, YAGNI!" at this point I shall resist.
*You* might not need it, but I've seen it come up a lot on Stack Overflow, and all too often people end up going for the much less efficient solution. I personally have also written code with practical applications using `next(iter(mydict))`.
I was +0 sympathetic to that until you posted this obvious extension of the idea
I don't see these dict methods as an "obvious extension" of the idea; I see them as unneeded/irrelevant *if* a more general way of doing this is added to itertools.
That's an awful lot of new methods to take advantage of what for many applications of dict (in fact, *all* of my applications ever[1]) is an irrelevant ordering.
I agree that it's a lot of methods to add. That's precisely why I prefer Inada Naoki's suggestion of additions to itertools, and why I've never bothered starting a thread on the subject myself. But since the topic was being discussed after Erik started this thread, I thought I might mention the alternative solution that's been in the back of my head for a while.
[A]nyone who wants it can do it themselves:
As I said in my previous message, the existing idiom (`next(iter(mydict))`) appears to be extremely non-obvious for beginners in Python, causing them to often go for much more inefficient options. No one is claiming "there is no way to do this" at present; the argument is that the best way to do this is, at present, *not sufficiently obvious*.
These defs do something undefined on unordered mappings (ie, not based on dict), and may be dangerous in that sense.
Excellent point (though note that my suggestion was to extend the dict interface, not the interface for all mappings).
In the current implementation, positional indexing is time-expensive
Yes, that's exactly why I don't support Erik's original suggestion of positional indexing for dict.keys(), dict.values() and dict.items(), and why I *only* support easy ways of fetching the first/last object from the view.
I wonder if any code is being written obscurely to ensure that keys get added to dicts in the "right" order
I personally have used dictionaries as ordered sets in the past (skip to the last code snippet in this answer: https://codereview.stackexchange.com/a/264437/24517), and would object to this being called an "obscure" use of the data structure. It is, by now, both well-known and well-advertised in the documentation that dictionaries are guaranteed to maintain insertion order. It's obviously not how the data structure has historically been used, but I don't see how that makes it an invalid use of the data structure *now*. Why shouldn't users be expected to exploit an advertised feature of the language?
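The dict-as-ordered-set idiom Alex describes is commonly spelled with `dict.fromkeys`; the snippet below is a generic sketch of the idiom, not the code from the linked answer:

```python
# A dict's keys behave like an insertion-ordered set: deduplicate a
# sequence while preserving first-seen order.
items = ["b", "a", "b", "c", "a"]
ordered_unique = list(dict.fromkeys(items))
print(ordered_unique)  # ['b', 'a', 'c']
```

This works precisely because of the insertion-order guarantee being debated: re-inserting an existing key does not move it.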
On 6 Oct 2021, at 10:29, Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
[...]
On Wed, Oct 06, 2021 at 11:11:09AM +0100, Alex Waygood wrote:
The temptation to insist "see, YAGNI!" at this point I shall resist.
*You* might not need it, but I've seen it come up a lot on Stack Overflow, and all too often people end up going for the much less efficient solution. I personally have also written code with practical applications using `next(iter(mydict))`.
Under what circumstances do you care what the first key in a dict is, without going on to care about the second, third, fourth etc? They are surely extremely niche, or artificial, or both, e.g. the Stackoverflow problem you link to: "find the first non-repeating character in a string -- using only one loop". Why the *first* rather than any, or all?

In any case, the presence of one or two uses for a piece of functionality doesn't mandate that we make this a builtin. Your solution with next() is perfectly adequate.

The other suggested methods are even more obscure. Why have a method for returning the first value, without knowing the key? "I don't know what the first key is, and I don't care, but I know that whatever it is, it maps to the value 17." Now what are you going to do with that knowledge? This seems like a method in desperate need of a use-case.

[...]
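For reference, the Stack Overflow exercise Steven mentions is typically solved with a counting dict whose insertion order lets `next(...)` pick the answer. One possible sketch (one loop over the input, plus a pass over the counts):

```python
# Count characters in a single loop, then rely on dict insertion order
# to find the first character whose count is 1.
def first_non_repeating(s):
    counts = {}
    for ch in s:  # the single loop over the input
        counts[ch] = counts.get(ch, 0) + 1
    return next((ch for ch, n in counts.items() if n == 1), None)

print(first_non_repeating("aabbcdd"))  # c
print(first_non_repeating("aabb"))     # None
```

This is exactly the kind of use case where the dict's insertion order, rather than any particular key, carries the answer.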
I agree that it's a lot of methods to add. That's precisely why I prefer Inada Naoki's suggestion of additions to itertools
Whether they are added to dict or itertools, there are still nine of them, and they are pretty much near clones of each other:

    # first_ and last_ whatsits
    next([iter|reversed](obj.[keys|values|items]()))

if you will excuse the misuse of hybrid Python/BNF syntax :-)

-- Steve
Whether they are added to dict or itertools, there are still nine of them
No, the suggestion was to add two functions to itertools (first() and last(), which would work with any iterable, not just dicts), rather than adding nine methods to the dict interface. This was precisely why I was saying that I liked the itertools solution more.
On 6 Oct 2021, at 15:01, Steven D'Aprano <steve@pearwood.info> wrote:
[...]
+1. These would be handy for any iterable. It'll work on dict keys and values; bonus. On Wed, 2021-10-06 at 15:42 +0100, Alex Waygood wrote:
Whether they are added to dict or itertools, there are still nine of them
No, the suggestion was to add two functions to itertools (first() and last(), which would work with any iterable, not just dicts), rather than adding nine methods to the dict interface. This was precisely why I was saying that I liked the itertools solution more.
[...]
Alex Waygood writes:
the suggestion was to add two functions to itertools (first() and last(), which would work with any iterable,
OK, that wasn't obvious to me either, but good enough. I see the analogy to str's startswith and endswith, but I'm still -0 on these.

The names suggest indexing by position, ie, a non-destructive operation. For sequences, they are non-destructive, but they're just alternative spellings of [0] and [-1], with the added bonus (?!) of extra overhead. But with iterators, they will be destructive, and first() is just a synonym for next() (with a tiny bit of extra overhead). IMO, that's icky, YMMV.

I'm not terribly sympathetic to the argument that "Python beginners use inefficient alternatives". A Python beginner who needs this to be both efficient and polymorphic over sequences and iterators is already 10 feet down in the deep end, head-first. I'm sympathetic to the *beginner* -- I've found myself in that situation a half-dozen times with a half-dozen different languages. Thing is, one more idiom isn't going to kill the camel. In learning all the other stuff they'll need to know about iterables, the odds are good they'll pick this up en passant. I've had that experience, too, about 5 of 6 times, and in the last case Sempai noticed me and had pity.

Steve
On Wed, Oct 06, 2021 at 03:42:28PM +0100, Alex Waygood wrote:
Whether they are added to dict or itertools, there are still nine of them
No, the suggestion was to add two functions to itertools (first() and last(), which would work with any iterable, not just dicts), rather than adding nine methods to the dict interface. This was precisely why I was saying that I liked the itertools solution more.
Okay.

In another post, I've explained why I don't think that putting first() in itertools would actually be useful (TL;DR: newbies wouldn't know it was there, and to people experienced enough to know it was there, it's likely easier to just use next() directly).

When I started writing this post, I was going to argue that we should put first() and last() in itertools as recipes, for their pedagogical value:

    def first(iterable):
        return next(iter(iterable))

    def last(iterable):
        return next(reversed(iterable))

I had got so far as to open bugs.python.org to create a ticket. But I've convinced myself that they aren't even worthwhile as recipes. The problem is that if we are being precise and accurate, and we should be, the names are simply *wrong* and the behaviour will be confusing to those we're supposedly trying to help.

The problem is that first() does not return the first item of the iterator, but the *next item still available*.

    a = list('12345')
    b = iter('12345')

    first(a) == first(a)  # True
    first(b) == first(b)  # False

If that behaviour makes sense to you, and is what you expected, then congratulations, you're probably a very sophisticated Pythonista who understands that iterators mutate after each item is retrieved, in which case you probably aren't going to get any benefit at all from a named first() function.

But if you are one of those newbies who (we're told) need a named function, then you will probably be totally gobsmacked that sometimes first() will return the first item, and always the first item, and sometimes it will return the first, second, third... items in sequence.

If we are to be precise and accurate, *sequences* have a first item, but iterators only have a next item.

-- Steve
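Steven's point is easy to verify at the interpreter: against a sequence the recipe is repeatable, against an iterator it consumes one item per call. A runnable sketch of the same demonstration:

```python
def first(iterable):
    # The recipe under discussion: returns the *next available* item,
    # which only equals "the first item" for fresh, re-iterable inputs.
    return next(iter(iterable))

a = list("12345")
b = iter("12345")

# iter(a) builds a fresh iterator each call, so the result is stable.
print(first(a), first(a))  # 1 1

# iter(b) returns b itself, so each call advances the same iterator.
print(first(b), first(b))  # 1 2
```

The asymmetry comes from `iter()`: on a list it makes a new iterator, on an iterator it is the identity function.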
The problem is that first() does not return the first item of the iterator, but the *next item still available*.
a = list('12345') b = iter('12345')
first(a) == first(a) # True first(b) == first(b) # False
This is an excellent point, and something I hadn't considered. Unless someone can think of a good workaround that doesn't make the implementation hideously complex, I retract my support for adding `first()` and `last()` to itertools.
On 10 Oct 2021, at 05:09, Steven D'Aprano <steve@pearwood.info> wrote:
[...]
I'm not a huge fan. Sure, dicts are ordered now, but I doubt that many people use that feature. I honestly still think of them as unordered ;)

Let's talk code clarity. After all, to quote GvR, "Code is more often read than written." (I may have gotten the wording wrong, I just wrote it off the top of my head.) To me, the presence of a dict suggests that order doesn't matter. If you want order, communicate that by using `collections.OrderedDict`, a fully-featured dict subclass where the point is the order! You can get the first or last key/item pairs with `.popitem()`. It works!

OrderedDict documentation: https://docs.python.org/3.10/library/collections.html#collections.OrderedDic...

We could add indexing to OrderedDict, which would return key/value pairs. (While we're talking about collections, why don't we return a namedtuple ;) ) As for adding functions to `itertools`, sure, I'm for it. We don't need people writing `next(iter(iterable))` just to get the first item.

-- Finn

On Wed, Oct 6, 2021, 8:02 AM Steven D'Aprano <steve@pearwood.info> wrote:
[...]
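For what it's worth, `OrderedDict.popitem` does take a `last` flag for both ends, though it is destructive, unlike the non-destructive indexing being discussed. A sketch:

```python
from collections import OrderedDict

od = OrderedDict(a=1, b=2, c=3)

# popitem() removes and returns an item from either end.
print(od.popitem(last=False))  # ('a', 1) -- first item
print(od.popitem(last=True))   # ('c', 3) -- last item

print(list(od.items()))        # [('b', 2)] -- both items are gone
```

So it answers "give me the first/last pair" only if you are also happy to remove it (or re-insert it afterwards).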
On Oct 6, 2021, at 10:53 AM, Finn Mason <finnjavier08@gmail.com> wrote:
I'm not a huge fan. Sure, dicts are ordered now, but I doubt that many people use that feature. I honestly still think of them as unordered ;)
There’s tons of code that relies on dicts being ordered. Some just don’t know it. For example, dataclasses relies on ordered dicts. I think we can rely on dicts being ordered as a language guarantee for the rest of time. Eric
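One concrete instance of Eric's point: a dataclass derives its field order (and hence its `__init__` signature) from the class's `__annotations__` dict, which preserves definition order. A small sketch:

```python
from dataclasses import dataclass, fields

@dataclass
class Point:
    x: int
    y: int
    label: str = "origin"

# Field order follows the order of the annotations dict.
print([f.name for f in fields(Point)])  # ['x', 'y', 'label']
print(Point(1, 2))                      # Point(x=1, y=2, label='origin')
```

If annotation dicts were unordered, positional arguments to `Point(...)` would have no defined meaning.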
[...]
On Wed, Oct 06, 2021 at 11:17:07AM -0400, Eric V. Smith wrote:
I think we can rely on dicts being ordered as a language guarantee for the rest of time.
Indeed. That's official and documented. "Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6." https://docs.python.org/3/library/stdtypes.html#mapping-types-dict If we were ever to make them unordered again, that would be a breaking change that would need to go through a full deprecation process. Given how serious that would be, it would be a **LONG** deprecation process, so it wouldn't happen until at least Python 5000. We're more likely to add new, specialised mapping types which may not be ordered, rather than breaking that guarantee for dicts. And even that is unlikely unless there is a pressing need for them in the stdlib. -- Steve
On Wed, Oct 6, 2021 at 10:55 AM Finn Mason <finnjavier08@gmail.com> wrote:
I'm not a huge fan. Sure, dicts are ordered now, but I doubt that many people use that feature. I honestly still think of them as unordered ;)
I've seen several people say this, so I'll be a voice on the other side: I am not a pro developer, so my practices should probably not be weighted all that much. But nevertheless, I have been constantly relying on order-ness in regular dicts ever since it was a non-official thing in cpython. I actually did a little happy dance in my chair when RH announced this at pycon years ago. I am sure I am not the only one.

--- Ricky.

"I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
On Wed, Oct 6, 2021, 9:23 AM Ricky Teachey <ricky@teachey.org> wrote:

[Snip...]
Perhaps I'm wrong about that. However, I would still like things relying on ordered dictionaries to be implemented in OrderedDict first. I might be alone in this opinion, though.
On 2021-10-06 at 08:52:22 -0600, Finn Mason <finnjavier08@gmail.com> wrote:
I'm not a huge fan. Sure, dicts are ordered now, but I doubt that many people use that feature. I honestly still think of them as unordered ;)
+1 on still thinking of mappings as unordered (but finding myself screaming Get Off My Lawn more and more). I won't comment on popularity, except to note that there's a big part of me that thinks that it's popular for all the wrong reasons.

I don't have any new arguments, but call me -1 on making mappings less distinct from lists, unless and until there is *no* performance penalty (space or speed) for using the former instead of the latter.
Finn Mason writes:
We don't need people writing `next(iter(iterable))` just to get the first item.
We already don't need that. `sequence[0]` and `next(iterator)` do the trick. You only need `next(iter(iterable))` if you need all three of:

- an expression (`for first in iterable: break` gives the other two),
- efficiency (although the difference with the `for` idiom should usually be irrelevant),
- polymorphism over sequences and iterators (both special notations are more efficient expressions),

and it makes it explicit that that's what you're after.

At least for beginners, putting this idiom in the Tutorial is probably the best idea. With the exception of wacko iterables that define `__next__` but not `__iter__` (which `first` could take care of, but do we really want to do that? I guess we should if we're going to define `first`), not only is the idiom the efficient polymorphic expression, but understanding why it works teaches a lot about iterables vs. iterators vs. sequences.
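To make the polymorphism point concrete: the same expression handles sequences, iterators, and dict views, while each specialized form fails on one of them (`get_first` is a throwaway name for this demo):

```python
def get_first(obj):
    return next(iter(obj))

assert get_first([10, 20, 30]) == 10                    # sequence
assert get_first(iter(range(5))) == 0                   # iterator
assert get_first({"a": 1, "b": 2}.items()) == ("a", 1)  # dict view

# The specialized forms are not interchangeable:
try:
    iter(range(5))[0]        # iterators don't support subscripting
except TypeError:
    pass
try:
    next([10, 20, 30])       # sequences aren't iterators
except TypeError:
    pass
```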
It really is frustrating how often we repeat entire conversations on this list :-(

But last time, one of the use cases was "get a random item from a dict", and there really is not a terribly easy (and efficient) way to do that now.

Anyway, I do see the benefit of adding first() to itertools -- there really is a key problem with:

next(iter(an_iterable))

for newbies -- you can get really really far in Python without ever having to call either next() or iter(). Sure, if it's a recipe, people can use it without really understanding it, but having an easy and obvious solution would be nice.

NOTE: I actually use this as a teaching moment in my intro to Python class: we have an assignment where there is a need to grab an arbitrary (not necessarily random) item from a dict (without removing it). Most students come up with something like:

random.choice(list(the_dict.items()))

which works fine for all but the largest of dicts, but is not an optimal solution. I don't think anyone has ever come up with

next(iter(the_dict.items()))

on their own. So I have a little lesson where I run through the options, and use it as a chance to introduce the iterator protocol. So it's a good teaching moment, but not a great example of how Python usually has an easy and obvious way to do seemingly simple operations.

-CHB
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
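The two answers from that assignment, side by side (`the_dict` is a stand-in name, not anything from the actual class materials):

```python
import random

the_dict = {c: ord(c) for c in "abcdefg"}

# The usual student answer: copies every item into a list first,
# O(n) time and space, but genuinely random.
key, value = random.choice(list(the_dict.items()))
assert key in the_dict and the_dict[key] == value

# O(1) and copy-free, but *arbitrary* rather than random:
# it always yields the first item in insertion order.
key, value = next(iter(the_dict.items()))
assert (key, value) == ("a", 97)
```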
Christopher Barker writes:
But last time, one of the use cases was "get [an arbitrary] item from a dict", and there really is not a terribly easy (and efficient) way to do that now.
What's wrong with thedict.popitem()? Works in Python 2.7, BTW. Granted, this doesn't help for a random item (and it's last, not first), but I think dicts are a pretty special case. The fact that they're ubiquitous and ordered in Python 3 makes "indexing into dicts" a real problem to be solved, but not by itertools.first, IMO.

OTOH, getting an arbitrary element, or state-dependent first element, from an iterable *is* simple, easy, *and obvious* to beginners: index with [0] if it's a sequence, and apply next() if it's an iterator. How many builtin or stdlib types are iterable, but neither sequences nor iterators? Builtin open(), zip(), reversed(), and enumerate(), for example, return iterators. Not to mention that only in the case of open() are they likely to have the iterator bound to an identifier they can apply itertools.first to.
So it's a good teaching moment, but not a great example of how Python usually has an easy and obvious way to do seemingly simple operations.
Polymorphic functions are not simple, almost by definition. IMHO, next(iter()) *is* Python giving us an easy and Zen-obvious way to do a polymorphic operation. It makes the following caveats Zen-obvious: it's state-dependent and destructive on iterators. Steve
On Sun, Oct 10, 2021 at 01:51:52AM +0900, Stephen J. Turnbull wrote:
Christopher Barker writes:
But last time, one of the use cases was "get [an arbitrary] item from a dict", and there really is not a terribly easy (and efficient) way to do that now.
What's wrong with thedict.popitem()? Works in Python 2.7, BTW.
It removes the key and value, not just retrieve them. -- Steve
Steven D'Aprano writes:
On Sun, Oct 10, 2021 at 01:51:52AM +0900, Stephen J. Turnbull wrote:
Christopher Barker writes:
But last time, one of the use cases was "get [an arbitrary] item from a dict", and there really is not a terribly easy (and efficient) way to do that now.
What's wrong with thedict.popitem()? Works in Python 2.7, BTW.
It removes the key and value, not just retrieve them.
So does next() on an iterator. If you're sampling with replacement, replace them. At least you can do that with dicts!

The incoherence of polymorphic 'first' is my main point. I don't see why this is any worse than modifying an iterator in 'first'.

Steve
On Mon, Oct 11, 2021 at 12:17 AM Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
Steven D'Aprano writes:
On Sun, Oct 10, 2021 at 01:51:52AM +0900, Stephen J. Turnbull wrote:
Christopher Barker writes:
But last time, one of the use cases was "get [an arbitrary] item from a dict", and there really is not a terribly easy (and efficient) way to do that now.
What's wrong with thedict.popitem()? Works in Python 2.7, BTW.
It removes the key and value, not just retrieve them.
So does next() on an iterator. If you're sampling with replacement, replace them. At least you can do that with dicts!
That's only relevant to the iterator itself. Using popitem mutates the underlying dict. That's a tad more likely to affect other parts of the code. ChrisA
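The difference in blast radius can be shown in a few lines:

```python
d = {"a": 1, "b": 2, "c": 3}

# Consuming an iterator affects only that iterator:
it = iter(d.items())
next(it)
assert len(d) == 3          # the dict itself is untouched

# popitem() affects every other reader of the dict:
k, v = d.popitem()          # removes ("c", 3), the last item
assert len(d) == 2
d[k] = v                    # "replacement" puts it back, but only at
assert list(d)[-1] == k     # the end (here it happened to be last anyway)
```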
Chris Angelico writes:
Using popitem mutates the underlying dict. That's a tad more likely to affect other parts of the code.
Granted. The context is opposition to itertools.first. "Advocating" popitem is tongue-in-cheek, with the serious point being that it's obvious from the name that it alters state (at least if you have any knowledge of data structures). You can restore the state trivially (OK, race condition, yadda yadda). Neither is true of itertools.first. It's neither obviously destructive from the name, nor is it necessarily possible to restore state.
On Fri, Oct 08, 2021 at 01:42:35PM -0700, Christopher Barker wrote:
It really ius frustrating how often we repeat entire conversations on this list :-(
That's because y'all don't just admit I'm right the first time *wink*
But last time, one of the use cases was "get a random item from a dict", and there really is not a terribly easy (and efficient) way to do that now.
Dicts are hash tables, not sequences. I think that the most efficient way to get a random item from a dict is to convert it to a list first, and then get the random item from the list. Not every data structure is well-suited to random access by position.

But what's your use-case for getting a random item from a dict? Apart from a programming exercise and teaching moment, why would you want to get a random key and value from a dict? In 25-ish years of using Python, I think that the number of times I've needed to do that is zero. A random item from a list, dozens of times. But from a dict, never. (I'm not even convinced that dict.popitem is useful either. But maybe that's just me, and others use it ten times a day.)

I'm not saying that there is no good reason to do so, but it is hardly a common task. In any case, if you want a *random* item, using `first()` and getting the first item every time is hardly random. It's not even really a good match for an *arbitrary* item. "In the case of a tie, the winner is the candidate sitting closest to the door."

We introduced random.choice() back in Python 2.1. Nobody noticed until 3.4 that it didn't work on dicts, or at least they didn't care enough to raise a feature request to support dicts directly. https://bugs.python.org/issue33098

For many years in Python 2, random.choice() accidentally appeared to work with dicts if they were keyed with integers 0...len(dict), but returned values rather than keys. And nobody seems to have noticed. At least I can't find any reports in the bug tracker.

So the evidence from the bug tracker suggests that getting a random key, value or item from a dict is more common as a programming exercise than a genuine thing that people need to do to solve real problems.

-- Steve
On Sat, Oct 9, 2021, 10:44 PM Steven D'Aprano
Apart from a programming exercise and teaching moment, why would you want to get a random key and value from a dict? In 25-ish years of using Python, I think that the number of times I've needed to do that is zero. A random item from a list, dozens of times. But from a dict, never. (I'm not even convinced that dict.popitem is useful either. But maybe that's just me, and others use it ten times a day.)
I've used dict.popitem() from time to time. Usually for the purpose of consuming items in a loop. But if I want to, I can put items back when I want to. I don't see a need for anything past that. On rare occasion, this is fine:

somekey, someval = mydict.popitem()
mydict[somekey] = someval
do_stuff(somekey, someval)
As was discussed the last time, I don't know that selecting a random item from a dict has a lot of use cases — I'm pretty sure the only time I've needed to do it was for Dave Thomas's "trigrams" coding kata. Though I'm not sure I've used random.choice for anything more "real" either.

But it is a more compelling reason to add a feature to dict than getting the first or last item :-)

-CHB
On Fri, Oct 08, 2021 at 01:42:35PM -0700, Christopher Barker wrote:
Anyway, I do see the benefit of adding first() to itertools -- there really is a key problem with:
next(iter(an_iterable))
for newbies -- you can get really really far in Python without ever having to call either next() or iter(). Sure, if it's a recipe, people can use it without really understanding it, but having an easy and obvious solution would be nice.
Yes? How does that make it a *problem*? I disagree strongly that needing to learn a simple, basic technique is a problem to be solved.

Let's take a simple example: the successor of a Unicode character, e.g. B follows A, ρ (rho) follows π (pi). https://stackoverflow.com/questions/2673360/most-efficient-way-to-get-next-l...

The obvious solution is:

chr(ord(c)+1)

but perhaps not so obvious to newbies. We can get very far without learning about ord() and chr(), so perhaps we need a string method `nextchar()` so that newbies don't get confused by the complexities of addition, ord() and chr(). Perhaps not. Programming is about composing functionality. I think it is fine to just teach them to use ord() and chr(). Not every one-liner needs to be a builtin, or even a named function in the stdlib.

I think the same applies to `first()` and `last()`. Just learn to compose the existing functionality. The beauty of *teaching people to fish* instead of just handing them a fish is that they can then decide for themselves how to handle this. If their iterable is already an iterator, they can save one function call by dropping the call to iter() and just calling next() directly. If they want the second item as well, they can call next() a second time. If they want a default value when the iterable is empty, they can provide a default to the next() call. Instead of a fairly trivial named function that probably nobody is going to use in practice, it becomes a learning moment.

Honestly, it really isn't clear that *anyone* would use these functions in practice. Especially if you already know you have an iterator, it is easier to just write

a = next(myiterator)

than to do this:

# scroll to the top of the module, where the imports live
from itertools import first
# scroll back to where you were working
a = first(myiterator)

Newbies won't know first() lives in itertools, and those experienced enough to know it is there probably won't bother to use it.

-- Steve
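Both compositions mentioned above, spelled out (`next_char` is a hypothetical helper name used only for illustration):

```python
# Successor of a Unicode character, composed from ord() and chr():
def next_char(c):
    return chr(ord(c) + 1)

assert next_char("A") == "B"
assert next_char("π") == "ρ"        # U+03C0 -> U+03C1

# next() already accepts a default, so "first item or a fallback"
# needs no new helper:
assert next(iter([1, 2, 3])) == 1
assert next(iter([]), "fallback") == "fallback"
```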
On Sat, Oct 9, 2021, 9:56 PM Steven D'Aprano <steve@pearwood.info> wrote:
[Snip...]
Newbies won't know first() lives in itertools, and those experienced
enough to know it is there probably won't bother to use it.
A very good point. Let's get back to the original topic. Should `dict.items()` be indexable now that dicts are ordered? I say yes. Why shouldn't it? -- Finn Mason
On Sun, Oct 10, 2021 at 3:05 PM Finn Mason <finnjavier08@gmail.com> wrote:
On Sat, Oct 9, 2021, 9:56 PM Steven D'Aprano <steve@pearwood.info> wrote:
[Snip...]
Newbies won't know first() lives in itertools, and those experienced enough to know it is there probably won't bother to use it.
A very good point.
Let's get back to the original topic. Should `dict.items()` be indexable now that dicts are ordered? I say yes. Why shouldn't it?
I say no, because dicts may retain order, but still aren't sequences. Under what situations do you actually want the 43rd item out of a dictionary? Asking for the first or the last MAY make some sense, but asking for arbitrary indexes doesn't really. And if you do, list(d) will give you the keys as a fully subscriptable sequence. ChrisA
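That is, the explicit-copy route when positional access is genuinely needed:

```python
d = {f"key{i}": i for i in range(100)}

keys = list(d)                # one O(n) copy buys O(1) subscripting and slicing
assert keys[43] == "key43"
assert keys[-1] == "key99"

items = list(d.items())
assert items[0] == ("key0", 0)
```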
On Sun, 10 Oct 2021 at 05:06, Finn Mason <finnjavier08@gmail.com> wrote:
Let's get back to the original topic. Should `dict.items()` be indexable now that dicts are ordered? I say yes. Why shouldn't it?
I say no. "Why shouldn't it?" isn't sufficient justification for a change.

Because it costs someone time and effort to implement it, and that time and effort is wasted unless people *actually use it*. Because no convincing use cases have been presented demonstrating that it would improve real-world code. Because dictionaries (mappings) and lists (sequences) are intended for different purposes. Because no-one is willing to implement this idea.

Consider: "Should lists be indexable by arbitrary values, not just by integers? I say yes. Why shouldn't they?" "Should tuples be mutable? I say yes. Why shouldn't they?" "Should integers be allowed to have complex parts? I say yes. Why shouldn't they?"

It's up to the person proposing a change to explain why the change *should* happen - not to everyone else to have to explain why it shouldn't.

Paul
Should `dict.items()` be indexable now that dicts are ordered? I say yes. Why shouldn't it?
Would there be a way to ensure that this had the same time complexity as indexing of sequences?

If "yes", I would support this — I think it would be useful in some situations, and it would be more efficient than existing mechanisms to obtain the nth key from a dictionary.

If (as I presume) the answer is "no", then I would not support this — I think it would give the misleading impression that obtaining the nth key/value from a dictionary is just as efficient as obtaining the nth item from a list or tuple.

Best, Alex
You have to check the C code to be sure, but IIRC the latest dict implementation has a dense array of the values in insert order, and the hash table (which has gaps) contains indexes into the values array. So you could easily index into the values array (which I believe also has the keys) in O(1) time.

Though what happens to the dense array when a key is deleted? It must leave a gap there too. So, never mind, you'd have to walk through the array counting items but not gaps, and that's O(n). Which explains why we don't have such an API. But please check the C code!

—Guido (mobile)
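A toy pure-Python model of the layout Guido describes (this is a sketch for intuition only, not the real C implementation; see Objects/dictobject.c in CPython for the actual dense-entries design):

```python
# Entries are stored densely in insertion order; the hash table (not
# modeled here) maps hashes to positions in this array.  Deleting a key
# leaves a tombstone (None), so finding the n-th *live* entry is O(n).
entries = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
entries[1] = None  # simulate `del d["b"]`

def nth_item(entries, n):
    """Walk the array, counting live slots and skipping tombstones."""
    live = -1
    for entry in entries:
        if entry is not None:
            live += 1
            if live == n:
                return entry
    raise IndexError(n)

assert nth_item(entries, 0) == ("a", 1)
assert nth_item(entries, 1) == ("c", 3)  # "b" is a tombstone, so "c" is now index 1
```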
here is the thread from the last time that this was brought up:

https://mail.python.org/archives/list/python-ideas@python.org/thread/S7UMTWK...

It was very thoroughly discussed then.

-CHB
here is the thread from the last time that this was brought up:
https://mail.python.org/archives/list/python-ideas@python.org/thread/S7UMTWK...
Thanks, that's very helpful. Sounds like Guido is right, and the short answer is "while there would be ways of making it fast (and faster than the status quo) for small dicts, there is probably no way of making it O(1), as indexing is for lists and tuples".

For me, that's a deal-breaker. I think if you're adding in a way to get the *n*th key of a dict by sequence-style indexing, it's going to be very surprising for newbies if they find out that getting the 8743rd key of the dict takes a lot longer than getting the 1st key of the dict, given that that's not at all the behaviour of lists and tuples.

Best, Alex
Alex Waygood writes:
Should `dict.items()` be indexable now that dicts are ordered? I say yes. Why shouldn't it?
Read on, MacDuff!
Would there be a way to ensure that this had the same time complexity as indexing of sequences?
Depends on what you mean by "same". I wrote the following earlier, but apparently didn't send it; if I did, sorry for the duplication. It's been a while since I looked at the implementation, so I may have missed something. Caveat coder! Earlier I wrote:
The fact that [dicts are] ubiquitous and ordered in Python 3 makes "indexing into dicts" a problem to be solved, but not by itertools.first, IMO.
I imagine the following has been suggested already somewhere in these threads, but I don't recall it.

To recap, now that dicts are ordered (since a few versions back in Python 3), the idea of indexing directly into dicts or dict views makes conceptual sense. The reason that this is not practically obvious to implement is that dicts and their views use an internal hash table whose values are indices into an array of items (key-value pairs). The latter array is in insertion order[1], but the active ones are not necessarily contiguous. "Holes" happen only when a key is deleted, if I understand correctly. The implementation of __delitem__ replaces the index with a marker that is skipped in iteration. If the dict needs to be grown, only the active entries are rehashed, so compaction already happens at that time. However, the table resize strategy is such that resizing takes place with geometrically increasing sizes, giving amortized O(1) for insertions.

So the barrier to indexing into dicts is that in this implementation accessing and mutating dict items is O(1) (this is why dicts are attractive data structures), but adding the indexing capability requires that *some* fundamental operation become O(n). There are these possibilities: del could always compact the underlying data structure, the indexing operation could compact (and then be O(1) until the next del), or the indexing operation could just be linear search (and so O(n)).

I would use del, because del is relatively infrequent in my experience, and indexing compacting as-needed would add overhead to all indexing and view operations (checking whether the underlying structure is currently compact). Linear search is just inefficient. On the other hand, if you are going to try to add this to the builtin dict, compacting as-needed in the indexing operation is the way to go. Unless del is frequent, this is O(n) with a small coefficient.
It adds a tiny bit of overhead to del, but to my knowledge doesn't impact other core dict functionality at all. However, there are comments in the source about "shared tables", whose design and implementation I did not study. That might require some additional complexity, or perhaps shared tables would have to be omitted from the IndexableDict implementation. There are probably other complexities too, but I'm pretty sure the basic strategy is sound: since you do need to compact this structure into a contiguous array at some point, something needs to be O(n).

Deriving such an IndexableDict from dict is conceptually simple:

1. Break out the __compact method (this needs to be done in C, of course).
2. Override __delitem__ with a version that calls __compact after calling super's __delitem__.
3. Document that del is O(n).
4. Add an index API.

I don't think overriding __getitem__ and __setitem__ is a good idea, but YMMV -- there are already specialized implementations of dict that only accept strs as keys, so isinstance(key, int) would work for such applications.

I'm pessimistic that it's worth putting such a class into the stdlib, but that depends on people coming forward with compelling use cases. I'm sure there will be extremely strong pushback on putting the feature into dict itself: dict accounts for a fraction of CPU cycles used by Python that's visible to the naked eye. Even additional complexity that might make it difficult to further optimize this design in the future would be opposed, I believe.

Footnotes:
[1] Note that a deleted key is not remembered. If you delete the first key of several, then reinsert it, it is now ordered last. AFAIK, this means that the dict semantics of IndexableDict are truly dict; only the performance is degraded.
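The idea above can be approximated in pure Python. The sketch below is only an illustration of the proposed semantics, not the C-level compaction design: it keeps a parallel list of keys so positional lookups are O(1), and pays the O(n) cost in __delitem__ instead (the method names key_at/item_at are made up for this sketch):

```python
class IndexableDict(dict):
    """Rough pure-Python sketch of the IndexableDict idea.

    A parallel list of keys makes positional lookups O(1); the cost is
    moved into __delitem__, which is O(n) because it removes the key
    from that list.  (update(), pop(), popitem(), setdefault() and
    clear() are deliberately not covered in this sketch.)
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._keys = list(self)

    def __setitem__(self, key, value):
        if key not in self:
            self._keys.append(key)
        super().__setitem__(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)
        self._keys.remove(key)  # the O(n) "compaction" step

    def key_at(self, index):
        """Return the key at the given insertion-order position."""
        return self._keys[index]

    def item_at(self, index):
        """Return the (key, value) pair at the given position."""
        key = self._keys[index]
        return key, self[key]
```

Note that deleting a key and reinserting it appends it at the end, matching the semantics described in footnote [1].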
On Sun, 10 Oct 2021 at 04:56, Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Oct 08, 2021 at 01:42:35PM -0700, Christopher Barker wrote:
Anyway, I do see the benefit of adding first() to itertools -- there really is a key problem with:
next(iter(an_iterable))
for newbies -- you can get really really far in Python without ever having to call either next() or iter(). Sure, if it's a recipe, people can use it without really understanding it, but having an easy and obvious solution would be nice.
Yes? How does that make it a *problem*? I disagree strongly that needing to learn a simple, basic technique is a problem to be solved.
The real problem is the fact that it raises the wrong kind of exception in the degenerate case of an empty iterable:

In [50]: next(iter([]))
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-50-bfed92c5b1cf> in <module>
----> 1 next(iter([]))

A leaky StopIteration can wreak all sorts of havoc. There was a PEP that attempted to solve this by turning StopIteration into RuntimeError if it gets caught in a generator but that PEP (which was rushed through very quickly IIRC) missed the fact that generators are not the only iterators. It remains a problem that leaking a StopIteration into map, filter etc will terminate iteration of an outer loop.

The culprit for the problem of leaking StopIteration is next itself which in the 1-arg form is only really suitable for use when implementing an iterator and not for the much more common case of simply wanting to extract something from an iterable. Numerous threads here and on stackoverflow and elsewhere suggesting that you can simply use next(iter(obj)) are encouraging bug magnet code. Worse, the bug when it arises will easily manifest in something like silent data loss and can be hard to debug.

The correct usage of next/iter in most cases would be something like:

try:
    val = next(iter(obj))
except StopIteration:
    raise AnotherError

or perhaps:

val = next(iter(obj), None)
if val is None:
    raise AnotherError

The real advantage of providing first (or "take" or any of the other names that have been proposed in the past) is that it should raise a different exception like ValueError so that it would be safe to use by default.

-- Oscar
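A minimal sketch of the first() Oscar describes, raising ValueError rather than StopIteration, could look like this (using a for-loop so that next() never appears at all):

```python
def first(iterable):
    """Return the first item of *iterable*.

    Raises ValueError -- not StopIteration -- when the iterable is
    empty, so the error cannot be silently absorbed by an enclosing
    for-loop or by tools like map() and filter().
    """
    for item in iterable:
        return item
    raise ValueError("first() called on an empty iterable")
```

Because the empty case raises ValueError, a bug like an unexpectedly empty input produces a noisy traceback instead of quietly terminating an outer iteration.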
On Tue, Oct 12, 2021 at 8:43 PM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
A leaky StopIteration can wreak all sorts of havoc. There was a PEP that attempted to solve this by turning StopIteration into RuntimeError if it gets caught in a generator but that PEP (which was rushed through very quickly IIRC) missed the fact that generators are not the only iterators. It remains a problem that leaking a StopIteration into map, filter etc will terminate iteration of an outer loop.
Generators are special because they never mention StopIteration. They are written like functions, but behave like iterators. That is why StopIteration leaking is such a problem. In every other situation, StopIteration is part of the API of what you're working with. It is a bug to call next() without checking for StopIteration (or knowingly and intentionally permitting it to bubble).
The culprit for the problem of leaking StopIteration is next itself which in the 1-arg form is only really suitable for use when implementing an iterator and not for the much more common case of simply wanting to extract something from an iterable. Numerous threads here and on stackoverflow and elsewhere suggesting that you can simply use next(iter(obj)) are encouraging bug magnet code. Worse, the bug when it arises will easily manifest in something like silent data loss and can be hard to debug.
That's no worse than getattr() and AttributeError. If you call getattr and you aren't checking for AttributeError, then you could be running into the exact same sorts of problems, because AttributeError is part of the function's API.
The correct usage of next/iter in most cases would be something like:
try:
    val = next(iter(obj))
except StopIteration:
    raise AnotherError
Yes. Or whatever other method you have for coping with the lack of a first element.
or perhaps
val = next(iter(obj), None)
if val is None:
    raise AnotherError
Definitely not. The two-arg form is a short-hand for this:

try:
    val = next(iter(obj))
except StopIteration:
    val = None

If your except clause would simply set a default, use two-arg next. Otherwise, don't open yourself up to data-specific bugs.
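The data-specific bug Chris warns about (None being mistaken for "empty") can be avoided with a unique sentinel. A small illustration, where first_or_raise is a made-up helper name for this sketch:

```python
# A unique sentinel avoids the data-specific bug: None might be a
# legitimate value in the iterable, but this fresh object cannot be.
_MISSING = object()

def first_or_raise(iterable):
    """Illustrative helper: two-arg next() with a private sentinel."""
    val = next(iter(iterable), _MISSING)
    if val is _MISSING:
        raise ValueError("empty iterable")
    return val
```

With this version, first_or_raise([None]) correctly returns None, whereas the `if val is None` pattern quoted above would wrongly report an empty iterable.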
The real advantage of providing first (or "take" or any of the other names that have been proposed in the past) is that it should raise a different exception like ValueError so that it would be safe to use by default.
ValueError is no safer. The first() function would have, as its API, "returns the first element or raises ValueError if there is none". So now the caller of first() has to use try/except to handle the case where there is no value. Failing to do so is *just as buggy* as leaking a StopIteration. A leaky StopIteration is a majorly confusing bug inside a __next__ function, because StopIteration is part of that function's API. A leaky KeyError is a majorly confusing bug inside a __getitem__ function, for the same reason. A leaky AttributeError inside a __getattr__ function, ditto. Anywhere else, those exceptions will all just bubble up normally, and most likely get printed to the console.
>>> def leak():
...     return next(iter([]))  # ooops
...
>>> for foo in leak():
...     print("Hello")
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in leak
StopIteration
>>> for foo in "test":
...     print(leak())
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in leak
StopIteration
They don't prematurely end the loop because they're not happening in places where that's the API you're working with. The only times you should need to think about StopIteration are calling next(), and implementing __next__. ChrisA
On Tue, 12 Oct 2021 at 11:48, Chris Angelico <rosuav@gmail.com> wrote:
On Tue, Oct 12, 2021 at 8:43 PM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
A leaky StopIteration can wreak all sorts of havoc. There was a PEP that attempted to solve this by turning StopIteration into RuntimeError if it gets caught in a generator but that PEP (which was rushed through very quickly IIRC) missed the fact that generators are not the only iterators. It remains a problem that leaking a StopIteration into map, filter etc will terminate iteration of an outer loop.
Generators are special because they never mention StopIteration. They are written like functions, but behave like iterators. That is why StopIteration leaking is such a problem.
Generators are a common case and are important so the PEP definitely helps. It is incomplete though because the problem remains for other cases. StopIteration is rarely mentioned anywhere, e.g. there is nothing about it in the docs for map: https://docs.python.org/3/library/functions.html#map

In every other situation, StopIteration is part of the API of what you're working with. It is a bug to call next() without checking for StopIteration (or knowingly and intentionally permitting it to bubble).
Exactly: simple usage of next is often a bug. We need to be careful about this every time someone suggests that it's straight-forward to do next(iter(obj)).
The culprit for the problem of leaking StopIteration is next itself which in the 1-arg form is only really suitable for use when implementing an iterator and not for the much more common case of simply wanting to extract something from an iterable. Numerous threads here and on stackoverflow and elsewhere suggesting that you can simply use next(iter(obj)) are encouraging bug magnet code. Worse, the bug when it arises will easily manifest in something like silent data loss and can be hard to debug.
That's no worse than getattr() and AttributeError. If you call getattr and you aren't checking for AttributeError, then you could be running into the exact same sorts of problems, because AttributeError is part of the function's API.
The difference is that you usually don't try to catch AttributeError in a higher up frame. A function that leaks StopIteration is not iterator-safe and can not be used with functional iterator tools like map. The exact reason for the danger of bare next is not obvious even to experienced Python programmers. Before the discussions around the PEP I had pointed it out several times and saw experienced commenters on lists like this being confused about what exactly the problem was. Maybe I'm not good at explaining myself but if the problem was obvious then it shouldn't have needed careful explanation.
The real advantage of providing first (or "take" or any of the other names that have been proposed in the past) is that it should raise a different exception like ValueError so that it would be safe to use by default.
ValueError is no safer. The first() function would have, as its API, "returns the first element or raises ValueError if there is none". So now the caller of first() has to use try/except to handle the case where there is no value. Failing to do so is *just as buggy* as leaking a StopIteration.
A leaky StopIteration is a majorly confusing bug inside a __next__ function, because StopIteration is part of that function's API.
On the contrary: a __next__ function is the only place where it could possibly be valid to raise StopIteration. The fact that next raises StopIteration which passes through to the caller can be useful in this situation and this situation alone: https://github.com/python/cpython/blob/b37dc9b3bc9575adc039c6093c643b7ae5e91... In any other situation it would be better to call first() and have something like ValueError instead. Oscar
On Tue, Oct 12, 2021 at 10:24 PM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
On Tue, 12 Oct 2021 at 11:48, Chris Angelico <rosuav@gmail.com> wrote:
On Tue, Oct 12, 2021 at 8:43 PM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
A leaky StopIteration can wreak all sorts of havoc. There was a PEP that attempted to solve this by turning StopIteration into RuntimeError if it gets caught in a generator but that PEP (which was rushed through very quickly IIRC) missed the fact that generators are not the only iterators. It remains a problem that leaking a StopIteration into map, filter etc will terminate iteration of an outer loop.
Generators are special because they never mention StopIteration. They are written like functions, but behave like iterators. That is why StopIteration leaking is such a problem.
Generators are a common case and are important so the PEP definitely helps. It is incomplete though because the problem remains for other cases. StopIteration is rarely mentioned anywhere e.g. there is nothing about it in the docstring for map: https://docs.python.org/3/library/functions.html#map
If you want to report it as a bug in map(), feel free to do so. It's not a general issue to be solved. I would say that this version of map() is naive, and that version is safe:

class map_naive:
    def __init__(self, func, it):
        self.func = func; self.it = iter(it)
    def __iter__(self):
        return self
    def __next__(self):
        return self.func(next(self.it))

class map_safe:
    def __init__(self, func, it):
        self.func = func; self.it = iter(it)
    def __iter__(self):
        return self
    def __next__(self):
        value = next(self.it)
        try:
            return self.func(value)
        except StopIteration:
            raise ValueError("StopIteration raised by map function")

def map_alsosafe(func, it):
    for value in it:
        yield func(value)

The distinction between naive and safe is *inside the definition of __next__*, and nowhere else. The fault isn't in the function that you pass to map, any more than having it raise AttributeError would be a fault. The reason generators are special is that, despite not having __next__ visible anywhere, they still have that same consideration. That's why they automatically transform StopIterations.
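The naive/safe distinction can be demonstrated end to end. The following self-contained sketch reuses the map_naive/map_safe shapes from this message, with an illustrative buggy callback that leaks StopIteration:

```python
class map_naive:
    def __init__(self, func, it):
        self.func, self.it = func, iter(it)
    def __iter__(self):
        return self
    def __next__(self):
        # func's StopIteration is indistinguishable from exhaustion here
        return self.func(next(self.it))

class map_safe(map_naive):
    def __next__(self):
        value = next(self.it)          # exhaustion is signalled here only
        try:
            return self.func(value)
        except StopIteration:
            raise ValueError("StopIteration raised by map function")

def buggy(seq):
    return next(iter(seq))  # leaks StopIteration when seq is empty

data = [[1], [], [2]]
print(list(map_naive(buggy, data)))  # the [2] group is silently lost
```

The naive version returns [1] and silently drops everything after the empty group; the safe version raises ValueError at the same point, so the bug surfaces as a traceback instead of data loss.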
In every other situation, StopIteration is part of the API of what you're working with. It is a bug to call next() without checking for StopIteration (or knowingly and intentionally permitting it to bubble).
Exactly: simple usage of next is often a bug. We need to be careful about this every time someone suggests that it's straight-forward to do next(iter(obj)).
Yes, but "give me the first entry" is underspecified anyway. What SHOULD happen if there is no first entry? Is ValueError particularly different? If you do the naive thing and leak StopIteration, most likely it'll end up on the console.
The culprit for the problem of leaking StopIteration is next itself which in the 1-arg form is only really suitable for use when implementing an iterator and not for the much more common case of simply wanting to extract something from an iterable. Numerous threads here and on stackoverflow and elsewhere suggesting that you can simply use next(iter(obj)) are encouraging bug magnet code. Worse, the bug when it arises will easily manifest in something like silent data loss and can be hard to debug.
That's no worse than getattr() and AttributeError. If you call getattr and you aren't checking for AttributeError, then you could be running into the exact same sorts of problems, because AttributeError is part of the function's API.
The difference is that you usually don't try to catch AttributeError in a higher up frame. A function that leaks StopIteration is not iterator-safe and can not be used with functional iterator tools like map. The exact reason for the danger of bare next is not obvious even to experienced Python programmers. Before the discussions around the PEP I had pointed it out several times and saw experienced commenters on lists like this being confused about what exactly the problem was. Maybe I'm not good at explaining myself but if the problem was obvious then it shouldn't have needed careful explanation.
Nor do you usually catch StopIteration. There are very very few cases where a StopIteration will silently truncate something, and they are all cases where the function should probably be changed. In user code, it's the rule of thumb that I described: be aware of StopIteration when writing __next__ or calling next(), otherwise it shouldn't be a problem. The problem is most definitely NOT obvious, because most situations are simply *not a problem*, and most of the ones that ARE a problem would still be just as much of a problem with any other exception.
The real advantage of providing first (or "take" or any of the other names that have been proposed in the past) is that it should raise a different exception like ValueError so that it would be safe to use by default.
ValueError is no safer. The first() function would have, as its API, "returns the first element or raises ValueError if there is none". So now the caller of first() has to use try/except to handle the case where there is no value. Failing to do so is *just as buggy* as leaking a StopIteration.
A leaky StopIteration is a majorly confusing bug inside a __next__ function, because StopIteration is part of that function's API.
On the contrary: a __next__ function is the only place where it could possibly be valid to raise StopIteration. The fact that next raises StopIteration which passes through to the caller can be useful in this situation and this situation alone: https://github.com/python/cpython/blob/b37dc9b3bc9575adc039c6093c643b7ae5e91...
In any other situation it would be better to call first() and have something like ValueError instead.
Yes, but that's an example of __next__ specifically chaining to next() - exactly like defining __getattr__ to look for an attribute of something else (maybe you're writing a proxy of some sort). You expect that a bubbling-up exception is fundamentally equivalent to one you raise yourself. Please give a real example of where calling first() and getting ValueError is safer than calling next(iter(x)) and getting StopIteration. So far, I am undeterred in believing that the two exceptions have equivalent effect if the caller isn't expecting them. ChrisA
Shouldn't your safe_map raise RuntimeError rather than ValueError? That's what PEP 479 does *wink* https://www.python.org/dev/peps/pep-0479/
On Tue, Oct 12, 2021 at 11:47 PM Steven D'Aprano <steve@pearwood.info> wrote:
Shouldn't your safe_map raise RuntimeError rather than ValueError? That's what PEP 479 does *wink*
If I'm explicitly choosing the exception to raise, ValueError seems better, although I'd also see TypeError as plausible. But the point isn't *which* exception is raised; the point is that it won't simply early-abort, as the naive one does. I never intended to replicate PEP 479 semantics here, just the equivalent level of safety against bizarre behaviours. Thanks for nitpicking, though. In return, I'll point out that I never actually created anything called safe_map :) ChrisA
On Tue, Oct 12, 2021 at 4:51 AM Chris Angelico <rosuav@gmail.com> wrote:
Exactly: simple usage of next is often a bug. We need to be careful about this every time someone suggests that it's straight-forward to do next(iter(obj)).
<snip>
Please give a real example of where calling first() and getting ValueError is safer than calling next(iter(x)) and getting StopIteration. So far, I am undeterred in believing that the two exceptions have equivalent effect if the caller isn't expecting them.
I don't know about safer, but it is a clear example of why using next(iter(obj)) requires a pretty complete knowledge of the iteration protocol. I can guarantee you I'd get some questions from my students when they got a StopIteration! If one DID write a first() function, it may or may not need to raise a different exception, but it should certainly provide a better error message:
>>> next(iter([]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
Is not very helpful. -CHB
ChrisA _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/2AL5FE... Code of Conduct: http://python.org/psf/codeofconduct/
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Wed, Oct 13, 2021 at 2:39 AM Christopher Barker <pythonchb@gmail.com> wrote:
On Tue, Oct 12, 2021 at 4:51 AM Chris Angelico <rosuav@gmail.com> wrote:
Exactly: simple usage of next is often a bug. We need to be careful about this every time someone suggests that it's straight-forward to do next(iter(obj)).
<snip>
Please give a real example of where calling first() and getting ValueError is safer than calling next(iter(x)) and getting StopIteration. So far, I am undeterred in believing that the two exceptions have equivalent effect if the caller isn't expecting them.
I don't know about safer, but it is a clear example of why using next(iter(obj)) requires a pretty complete knowledge of the iteration protocol.
I can guarantee you I'd get some questions from my students when they got a StopIteration!
If one DID write a first() function, it may or may not need to raise a different exception, but it should certainly provide a better error message:
>>> next(iter([]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
Is not very helpful.
If they're actually writing it explicitly, like that, then this is a perfect opportunity to teach the meanings of iterators and next(). If it's buried behind a first() function, it would simply be an error, just like [][0] is an error. Either way, there's no real difference between them, and asking "what is the first element of an empty collection?" is always going to result in an error. (Unless you are JavaScript, in which case the answer is "undefined, of course".) ChrisA
If one DID write a first() function, it may or may not need to raise a different exception, but it should certainly provide a better error message
For reference, the more-itertools package on PyPI has `first()` and `last()` functions: https://more-itertools.readthedocs.io/en/stable/_modules/more_itertools/more.... (Not endorsing details of their implementations -- just providing a reference to a piece of prior art.) On Tue, Oct 12, 2021 at 4:41 PM Christopher Barker <pythonchb@gmail.com> wrote:
On Tue, Oct 12, 2021 at 4:51 AM Chris Angelico <rosuav@gmail.com> wrote:
Exactly: simple usage of next is often a bug. We need to be careful about this every time someone suggests that it's straight-forward to do next(iter(obj)).
<snip>
Please give a real example of where calling first() and getting ValueError is safer than calling next(iter(x)) and getting StopIteration. So far, I am undeterred in believing that the two exceptions have equivalent effect if the caller isn't expecting them.
I don't know about safer, but it is a clear example of why using next(iter(obj)) requires a pretty complete knowledge of the iteration protocol.
I can guarantee you I'd get some questions from my students when they got a StopIteration!
If one DID write a first() function, it may or may not need to raise a different exception, but it should certainly provide a better error message:
>>> next(iter([]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
Is not very helpful.
-CHB
ChrisA
-- Christopher Barker, PhD (Chris)
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Tue, 12 Oct 2021 at 12:50, Chris Angelico <rosuav@gmail.com> wrote:
On Tue, Oct 12, 2021 at 10:24 PM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
On Tue, 12 Oct 2021 at 11:48, Chris Angelico <rosuav@gmail.com> wrote:
ValueError is no safer. The first() function would have, as its API, "returns the first element or raises ValueError if there is none". So now the caller of first() has to use try/except to handle the case where there is no value. Failing to do so is *just as buggy* as leaking a StopIteration.
A leaky StopIteration is a majorly confusing bug inside a __next__ function, because StopIteration is part of that function's API.
On the contrary: a __next__ function is the only place where it could possibly be valid to raise StopIteration. The fact that next raises StopIteration which passes through to the caller can be useful in this situation and this situation alone: https://github.com/python/cpython/blob/b37dc9b3bc9575adc039c6093c643b7ae5e91...
In any other situation it would be better to call first() and have something like ValueError instead.
Yes, but that's an example of __next__ specifically chaining to next() - exactly like defining __getattr__ to look for an attribute of something else (maybe you're writing a proxy of some sort). You expect that a bubbling-up exception is fundamentally equivalent to one you raise yourself.
Please give a real example of where calling first() and getting ValueError is safer than calling next(iter(x)) and getting StopIteration. So far, I am undeterred in believing that the two exceptions have equivalent effect if the caller isn't expecting them.
I think that the situation where I first came across this was something analogous to wanting to separate the header line of a CSV file:

csvfiles = [
    ['name', 'joe', 'dave'],
    ['name', 'steve', 'chris'],
    [],  # whoops, empty csv file
    ['name', 'oscar'],
]

def remove_header(csvfile):
    it = iter(csvfile)
    next(it)
    return it

# print all names from all csv files
for names in map(remove_header, csvfiles):
    for name in names:
        print(name)

If you run the above you get:

$ python t.py
joe
dave
steve
chris

The data following the empty file (i.e. "oscar") was silently discarded. The context where I found this was something that took much longer to run and was harder to check and debug etc. I have not personally made the same mistake again because I have since been automatically wary of any usage of next(). I couldn't possibly count the number of times I've seen unsafe usage of next in code suggestions on mailing lists like this though (see all the examples above in this thread).

The problem is that the erroneous case, which is the empty file, leads to a StopIteration. Unlike a normal exception though, a StopIteration can be caught without try/except. In the above it is the *for-loop* that swallows the exception. Had it been literally any other exception type (here, remove_header changed to call a first() that raises ValueError instead of bare next()) then you would have been looking at a traceback instead of silently discarded data:

$ python t.py
joe
dave
steve
chris
Traceback (most recent call last):
  File "t.py", line 21, in <module>
    for names in map(remove_header, csvfiles):
  File "t.py", line 18, in remove_header
    first(it)
  File "t.py", line 14, in first
    raise ValueError from None
ValueError

Your suggestion is that this is a bug in map(), which is a fair alternative view. Followed through to its conclusion, your suggestion is that every possible function like map, filter, and all the iterator implementations in itertools and in the wild should carefully wrap any internal non-next function call in try/except to change any potential StopIteration into a different exception type.
My view is that it would be better to have a basic primitive for getting an element from an iterable or for advancing an iterator that does not raise StopIteration in the first place. I would probably call that function something like "take" rather than "first" though. The reason I prefer introducing an alternative to next() is because I think that if both primitives were available then in the majority of situations next() would not be the preferred option. -- Oscar
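The take() primitive Oscar describes could be sketched on top of itertools.islice; the name and the exact signature here are only illustrative:

```python
from itertools import islice

def take(iterator, n=1):
    """Advance *iterator* by n items and return them as a list.

    Raises ValueError -- never StopIteration -- if fewer than n items
    remain, so a short input cannot silently end an enclosing loop.
    """
    items = list(islice(iterator, n))
    if len(items) < n:
        raise ValueError(
            f"take() needed {n} item(s), iterator had only {len(items)}")
    return items
```

With this, the remove_header in the CSV example above would call take(it) instead of next(it), and the empty file would produce an immediate ValueError traceback rather than silent data loss.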
On Thu, Oct 14, 2021 at 1:36 AM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Your suggestion is that this is a bug in map() which is a fair alternative view. Following through to its conclusion your suggestion is that every possible function like map, filter, and all the iterator implementations in itertools and in the wild should carefully wrap any internal non-next function call in try/except to change any potential StopIteration into a different exception type.
Yes, because it is the map function that is leaking StopIteration.
My view is that it would be better to have a basic primitive for getting an element from an iterable or for advancing an iterator that does not raise StopIteration in the first place. I would probably call that function something like "take" rather than "first" though. The reason I prefer introducing an alternative to next() is because I think that if both primitives were available then in the majority of situations next() would not be the preferred option.
How will that solve anything though? You still need a way to advance an iterator and get a value from it, or get told that there is no such value. No matter what exception you choose, it will ALWAYS be possible for the same problem to occur. Exceptions like ValueError will, instead of early-aborting a map(), cause something to mistakenly think that it couldn't parse a number, or something like that. Try replacing your map() with one of my safer versions. (Or, of course, replacing your remove_header with a version that isn't itself buggy.) The problem will disappear. ChrisA
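Chris's point that ValueError merely shifts the failure mode can be made concrete. In this hypothetical sketch (the function names are invented for illustration), a ValueError-raising first() collides with an except clause meant only for int() parsing failures:

```python
def first(iterable):
    """Return the first item, raising ValueError if there is none."""
    for item in iterable:
        return item
    raise ValueError("empty iterable")

def parse_leading_ints(groups):
    results = []
    for group in groups:
        try:
            results.append(int(first(group)))
        except ValueError:
            # Meant to catch "not a number" from int() -- but it also
            # silently absorbs the "empty group" error from first().
            results.append(None)
    return results

print(parse_leading_ints([["1"], [], ["x"]]))  # [1, None, None]
```

The empty group and the unparseable group become indistinguishable: the same data-dependent masking problem, just with a different exception type.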
On Wed, 13 Oct 2021 at 18:30, Chris Angelico <rosuav@gmail.com> wrote:
On Thu, Oct 14, 2021 at 1:36 AM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Your suggestion is that this is a bug in map() which is a fair alternative view. Following through to its conclusion your suggestion is that every possible function like map, filter, and all the iterator implementations in itertools and in the wild should carefully wrap any internal non-next function call in try/except to change any potential StopIteration into a different exception type.
Yes, because it is the map function that is leaking StopIteration.
But it is not the map function that *raises* StopIteration. The exception "leaks" through map just like *all* exceptions "leak" through *all* Python functions in the absence of try/except. This is not normally referred to as "leaking" but rather as "propagating" and it is precisely the design of exceptions that they should propagate to the calling frame. The difference in the case of StopIteration is that it can be caught even if there is no try/except.
My view is that it would be better to have a basic primitive for getting an element from an iterable or for advancing an iterator that does not raise StopIteration in the first place. I would probably call that function something like "take" rather than "first" though. The reason I prefer introducing an alternative to next() is because I think that if both primitives were available then in the majority of situations next() would not be the preferred option.
How will that solve anything though? You still need a way to advance an iterator and get a value from it, or get told that there is no such value. No matter what exception you choose, it will ALWAYS be possible for the same problem to occur. Exceptions like ValueError will, instead of early-aborting a map(), cause something to mistakenly think that it couldn't parse a number, or something like that.
I find it surreal that I am arguing that StopIteration is a uniquely problematic exception and that you seem to be arguing that it is not. Yet at the same time you are an author of a (successful!) PEP that was *entirely* about this very subject: https://www.python.org/dev/peps/pep-0479/

The first two paragraphs of the rationale from the PEP:

"""
The interaction of generators and StopIteration is currently somewhat surprising, and can conceal obscure bugs. An unexpected exception should not result in subtly altered behaviour, but should cause a noisy and easily-debugged traceback. Currently, StopIteration raised accidentally inside a generator function will be interpreted as the end of the iteration by the loop construct driving the generator.

The main goal of the proposal is to ease debugging in the situation where an unguarded next() call (perhaps several stack frames deep) raises StopIteration and causes the iteration controlled by the generator to terminate silently. (Whereas, when some other exception is raised, a traceback is printed pinpointing the cause of the problem.)
"""

I agree entirely with the above, but every occurrence of "generators" should have been generalised to "iterators" in order to address the problem fully. You think this should be fixed in map. I think that the root of the problem is next.

The PEP discusses changing next: https://www.python.org/dev/peps/pep-0479/#converting-the-exception-inside-ne...

The idea was rejected on backward compatibility grounds. I am proposing that an alternative function could be added which, unlike changing next, would not cause compatibility problems. Although we may disagree about what is the best way to fix this, I don't see how we can disagree that StopIteration is a uniquely problematic exception to raise (as you seem to argue above).

-- Oscar
On Thu, Oct 14, 2021 at 8:04 AM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
On Wed, 13 Oct 2021 at 18:30, Chris Angelico <rosuav@gmail.com> wrote:
On Thu, Oct 14, 2021 at 1:36 AM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Your suggestion is that this is a bug in map() which is a fair alternative view. Following through to its conclusion your suggestion is that every possible function like map, filter, and all the iterator implementations in itertools and in the wild should carefully wrap any internal non-next function call in try/except to change any potential StopIteration into a different exception type.
Yes, because it is the map function that is leaking StopIteration.
But it is not the map function that *raises* StopIteration. The exception "leaks" through map just like *all* exceptions "leak" through *all* Python functions in the absence of try/except. This is not normally referred to as "leaking" but rather as "propagating" and it is precisely the design of exceptions that they should propagate to the calling frame. The difference in the case of StopIteration is that it can be caught even if there is no try/except.
Wrong. You still won't catch StopIteration unless it is in one very specific place: a __next__ function. Exactly the same as AttributeError can be silently caught inside a __getattr__ function. See my earlier post for full context, but the problem is right here:

    def __next__(self):
        return self.func(next(self.it))

The problem isn't the transformation function; the problem is that __next__ is putting two completely different concepts (pumping the iterator, and calling the transformation function) inside, effectively, the same exception handling context.
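A minimal sketch of the separation Chris is describing (the class name and structure here are illustrative, not CPython's actual map implementation): guard only the user-supplied function, so a stray StopIteration from it surfaces as a loud error instead of silently ending the iteration.

```python
class Map:
    """map()-like iterator that distinguishes genuine exhaustion of the
    source iterator from a StopIteration raised inside the function."""

    def __init__(self, func, iterable):
        self.func = func
        self.it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        value = next(self.it)        # StopIteration here means: we're done
        try:
            return self.func(value)  # StopIteration here means: a bug
        except StopIteration:
            raise RuntimeError(
                "function passed to Map raised StopIteration") from None
```

With this guard, `list(Map(lambda x: x * 2, [1, 2, 3]))` gives `[2, 4, 6]`, while a buggy function that raises StopIteration produces a traceback instead of a silently truncated result.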
My view is that it would be better to have a basic primitive for getting an element from an iterable or for advancing an iterator that does not raise StopIteration in the first place. I would probably call that function something like "take" rather than "first" though. The reason I prefer introducing an alternative to next() is because I think that if both primitives were available then in the majority of situations next() would not be the preferred option.
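A rough sketch of the kind of primitive Oscar describes (the name and signature are hypothetical, not an existing API): it advances the iterable one step, but reports emptiness with ValueError or a caller-supplied default rather than StopIteration.

```python
_MISSING = object()  # sentinel so that None can be used as a real default

def take(iterable, default=_MISSING):
    """Return the next/first item of *iterable* without ever letting
    StopIteration escape to the caller."""
    for value in iterable:
        return value
    if default is _MISSING:
        raise ValueError("iterable was empty")
    return default
```

For example, `take([10, 20])` returns 10, `take(iter([]), default=None)` returns None, and `take([])` raises ValueError, which propagates loudly through a surrounding map() instead of terminating it.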
How will that solve anything though? You still need a way to advance an iterator and get a value from it, or get told that there is no such value. No matter what exception you choose, it will ALWAYS be possible for the same problem to occur. Exceptions like ValueError will, instead of early-aborting a map(), cause something to mistakenly think that it couldn't parse a number, or something like that.
I find it surreal that I am arguing that StopIteration is a uniquely problematic exception and that you seem to be arguing that it is not. Yet at the same time you are an author of a (successful!) PEP that was *entirely* about this very subject: https://www.python.org/dev/peps/pep-0479/
I find it surreal that people keep holding up PEP 479, disagreeing with the document's wording, and assuming that I believe the altered wording. I don't.
The first two paragraphs of the rationale from the PEP: """ The interaction of generators and StopIteration is currently somewhat surprising, and can conceal obscure bugs. An unexpected exception should not result in subtly altered behaviour, but should cause a noisy and easily-debugged traceback. Currently, StopIteration raised accidentally inside a generator function will be interpreted as the end of the iteration by the loop construct driving the generator.
The main goal of the proposal is to ease debugging in the situation where an unguarded next() call (perhaps several stack frames deep) raises StopIteration and causes the iteration controlled by the generator to terminate silently. (Whereas, when some other exception is raised, a traceback is printed pinpointing the cause of the problem.) """ I agree entirely with the above, but every occurrence of "generators" should have been generalised to "iterators" in order to address the problem fully.
No! Generators *ARE* special, because they don't have that same concept. You can write map safely like this:

    def map(func, iterable):
        for value in iterable:
            yield func(value)

Since there's no call to next(), there's no expectation of StopIteration. Since there's no __next__ function being defined, there's no expectation that StopIteration has meaning. That's why generators are different.
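The post-PEP 479 behaviour can be demonstrated directly: a StopIteration escaping a generator body is converted to RuntimeError, rather than silently ending the loop driving the generator.

```python
def gen():
    yield 1
    raise StopIteration  # deliberate bug: PEP 479 turns this into RuntimeError

it = gen()
print(next(it))  # 1
try:
    next(it)
except RuntimeError as e:
    print("caught:", e)  # the stray StopIteration did not pass silently
```

Before PEP 479 (or with the old behaviour), a `for` loop over `gen()` would simply have stopped after the first item, concealing the bug.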
You think this should be fixed in map. I think that the root of the problem is next. The PEP discusses changing next: https://www.python.org/dev/peps/pep-0479/#converting-the-exception-inside-ne... The idea was rejected on backward compatibility grounds: I am proposing that an alternative function could be added which unlike changing next would not cause compatibility problems.
Although we may disagree about what is the best way to fix this I don't see how we can disagree that StopIteration is a uniquely problematic exception to raise (as you seem to argue above).
Where do I argue that StopIteration is unique? It was unique pre-479 only in the odd interaction with generators. It is now exactly like every other exception that is used in a protocol: a signal that there is no value to be returned. In fact, NotImplemented is more special - it would be more consistent to have __add__ raise a special exception than to return a magical value. StopIteration, AttributeError, LookupError, etc, are all used the same way. ChrisA
- dict.first_key = lambda self: next(iter(self)) - dict.first_val = lambda self: next(iter(self.values())) - dict.first_item = lambda self: next(iter(self.items())) - dict.last_key = lambda self: next(reversed(self)) - dict.last_val = lambda self: next(reversed(self.values())) - dict.last_item = lambda self: next(reversed(self.items()))
But I think I like a lot more the idea of adding general ways of doing these things to itertools.
Except many iterables don’t have a last item. And many more can’t give you the last item efficiently. -CHB On 5 Oct 2021, at 05:30, Christopher Barker <pythonchb@gmail.com> wrote:
On Mon, Oct 4, 2021 at 5:46 PM Erik Demaine <edemaine@mit.edu> wrote:
Have folks thought about allowing indexing dictionary views as in the following code, where d is a dict object?
d.keys()[0] d.keys()[-1] d.values()[0] d.values()[-1] d.items()[0] d.items()[-1] # item that would be returned by d.popitem()
since dicts were made order-preserving, indexing the keys, items, etc does make some sense.
I've also often wanted to get an arbitrary item/key from a dictionary, and
This is indeed one of the use cases identified.
I found some related discussion in
https://mail.python.org/archives/list/python-ideas@python.org/thread/QVTGZD6... but not this exact idea.
That's a pretty different idea, but this exact idea has been discussed on this list relatively recently. I still like it, but there wasn't much general support.
I'll leave it as an exercise for the reader to find that thread, but it is there, and I suggest you look for it if you want to further pursue this idea.
-CHB
-- Christopher Barker, PhD (Chris)
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/RAEDZP...
Code of Conduct: http://python.org/psf/codeofconduct/
--
Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
Except many iterables don’t have a last item. And many more can’t give you the last item efficiently.
That's manageable - reversed won't work either unless the object implements either __reversed__, or both __len__ and __getitem__. last could simply fail under the same conditions, in which case you could use last(list(obj)). I think first, second and last, with optional default parameters, would be great additions to itertools. I use the toolz library's first and last functions frequently. The latter is not "smart" in the way described above and just consumes the iterable, possibly indefinitely. Alex On Wed, Oct 6, 2021 at 10:46 AM Christopher Barker <pythonchb@gmail.com> wrote:
On Tue, Oct 05, 2021 at 08:45:55AM +0100, Alex Waygood wrote:
I think there definitely should be a more obvious way to do this (specifically the first and last keys/values/items of a dictionary
What's your use-case for caring what the first and last key in a dict is?
An anti-pattern you see quite often on Stack Overflow to get the first key of a dictionary is something like the following:
first_key = list(mydict.keys())[0]
Example number 9758 of why not to trust everything you see on Stackoverflow :-)
Another possibility I've been wondering about was whether several methods should be added to the dict interface:
dict.first_key = lambda self: next(iter(self)) dict.first_val = lambda self: next(iter(self.values())) dict.first_item = lambda self: next(iter(self.items())) dict.last_key = lambda self: next(reversed(self)) dict.last_val = lambda self: next(reversed(self.values())) dict.last_item = lambda self: next(reversed(self.items()))
Not every *one* line function needs to be a builtin.
But I think I like a lot more the idea of adding general ways of doing these things to itertools.
How about some recipes? `next(iter(mydict))` etc is a simple, easy, memorable, readable, maintainable way to get what you want. Composition of simple operations is great! Not everything needs to be a named function:

    def addone(x):
        """Return x + 1.

        >>> addone(32)
        33
        """
        return x + 1

-- Steve
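In the recipe spirit Steven suggests, the idiom with a default is itself a one-liner, since next() already accepts a default argument (the function name is just a suggested recipe, not an existing stdlib function):

```python
def first(iterable, default=None):
    """Recipe: first item of any iterable, or *default* if it is empty."""
    return next(iter(iterable), default)

d = {'x': 1, 'y': 2}
print(first(d))           # 'x' -- the first key
print(first(d.values()))  # 1
print(first({}))          # None
```

Passing the default to next() means no try/except around StopIteration is needed at the call site.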
There seems to be a growing list of issues with adding `itertools.first(x)` as shorthand for `next(iter(x))`:

* If `x` is an iterator, it modifies the iterator, which is counterintuitive given the name `first`.
* It'll still be difficult for new users to find/figure out.

In the end, I feel like the main case where I want to use `first` and `last` functions is on `dict`s; other objects like `range`, `str`, `list`, `tuple` all support `[0]` and `[-1]`. So I wonder whether we should go back to this idea: On Tue, 5 Oct 2021, Alex Waygood wrote:
[...] Another possibility I've been wondering about was whether several methods should be added to the dict interface: * dict.first_key = lambda self: next(iter(self)) * dict.first_val = lambda self: next(iter(self.values())) * dict.first_item = lambda self: next(iter(self.items())) * dict.last_key = lambda self: next(reversed(self)) * dict.last_val = lambda self: next(reversed(self.values())) * dict.last_item = lambda self: next(reversed(self.items())) But I think I like a lot more the idea of adding general ways of doing these things to itertools.
At the least, I wonder whether a `dict.lastitem` method that's the nondestructive analog of `dict.popitem` would be good to add. This would solve the case of "I want an arbitrary item from this dict, I don't care which one, but I don't want to modify the dict so I'd rather not use popitem" which I've seen repeated a few times in this thread. By contrast, I don't think `next(iter(my_dict))` is an intuitive way to solve this problem, even for many experts; and I don't think it's as efficient as `my_dict.lastitem()` would be, because the current `dict` code maintains a pointer to the last item but not to the first item. [I also admit that I've mostly forgotten the original situation where I wanted this functionality. I believe it was an exhaustive search, where I wanted to branch on an arbitrary item of a dict, and nondestructively build new versions of that dict for recursive calls (instead of modifying before recursion and unmodifying afterward).] One more idea to throw around: Consider the following "anonymous unpacking" syntax. ``` first, * = [1, 2, 3] *, last = [1, 2, 3] ``` For someone used to unpacking syntax, this seems like a natural extension to what we have now, and is far more flexible than just extracting the first element. The distinction from the existing methods (with e.g. `*_`) is that it wouldn't waste time extracting elements you don't want. And it could work well with things like `dict` (and `dict_items` etc.). Erik -- Erik Demaine | edemaine@mit.edu | http://erikdemaine.org/
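For reference, the nondestructive behaviour Erik wants from a hypothetical `dict.lastitem()` can be spelled today as follows (though, as he notes, not necessarily as efficiently as a builtin could manage):

```python
def lastitem(d):
    """Nondestructive analog of dict.popitem(): return the (key, value)
    pair popitem() would remove, without modifying the dict."""
    if not d:
        raise KeyError("lastitem(): dictionary is empty")
    return next(reversed(d.items()))

d = {'a': 1, 'b': 2}
print(lastitem(d))  # ('b', 2)
print(d)            # unchanged: {'a': 1, 'b': 2}
```

This relies on dicts being reversible (Python 3.8+); `d.popitem()` would return the same pair but also remove it.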
On Mon, Oct 11, 2021 at 06:59:14PM -0400, Erik Demaine wrote:
In the end, I feel like the main case I want to use a `first` and `last` functions on are `dict`s;
"I want to use first on dicts" is not really a use-case. Presumably you're not just doing:

    d = {key: value, ...}
    who_cares = first(d)
    del who_cares
    process(d)

I assume you're not just extracting the first value for the LOLs, you must have some reason for it. It is *that reason* which counts as a use-case. I think the fact that it seems hard to get a really compelling use-case that isn't extremely niche does suggest that this doesn't need to be a named function. We don't bloat classes and the builtins with named trivial one-liners unless they are frequently useful, fundamental and really compelling:

    float.add_one()           # return float + 1.0
    float.double()            # return float*2.0
    list.length_equals_one()  # return len(list) == 1

I think that even if there are occasional uses for first as an alias for next(iter(dict)), it fails to be useful *enough*, fundamental or compelling to justify bloating the API with such a simple one-liner. If people disagree, you can argue by demonstrating good use-cases (hopefully *common* use-cases), or by demonstrating that other languages provide this functionality. As for the argument that the idiom `next(iter(obj))` is "not intuitive", true. Neither is `seq[0]`, or with statements, or range, or classes, or importing, or async, or comprehensions, or regular expressions, or pretty much everything else in Python. And yet somehow we cope. It is okay to learn how to do something.

-- Steve
Please do go back and read that previous thread, there was a LOT of discussion and much detail.

"I want to use first on dicts" is not really a use-case.

quite true -- what IS the use case(s) here? But honestly, I'm even more confused by the desire for the last item -- what's the use case for that? I can see a use case for "arbitrary item", for which the first would work. There is also the use for a random item, and random.choice does not work with Mappings, but would work if we made dict.items() indexable. But there doesn't seem to be a lot of clamoring for that. And if there were, maybe adding it as a feature to random.choice would be the way to go. NOTE: a key reason an indexable dict.items() could be an attractive nuisance is if one were to index into it many times: IF you want one first, last, or random item, then having it built in would be the fastest way to go. But if you wanted multiple random or other indexed items, then making a list out of it ( O(n) ), and then indexing ( O(1) ) many times, would be more efficient. So saying "the way to index the contents of a dict is list(dict.items())" is good advice. As for a first in itertools, this is running into the way Python provides (almost) the same interface to iterables as iterators -- first() makes some sense for iterables, but none for iterators, so confusion is likely. We don't bloat classes and the builtins with named
trivial one-liners unless they are frequently useful, fundamental and really compelling:
indeed. float.add_one() # return float + 1.0
float.double() # return float*2.0 list.length_equals_one() # return len(list) == 1
Come on, those examples are pretty disingenuous. They are analogous to adding first() to the Sequence ABC, when we have [0] -- which the OP has clearly said is unnecessary. As for the argument that the idiom `next(iter(obj))` is "not intuitive",
true. Neither is `seq[0]`, or with statements, or range, or classes, or importing, or async, or comprehensions, or regular expressions, or pretty much everything else in Python.
"intuitive" is impossible to pin down -- but using next(iter(obj)) is an advanced topic (or intermediate anyway) -- you can get a LOT done in Python without ever using next() or iter() -- they are far more often called implicitly (via for loops, making lists, etc). I've said it before, "getting a random (or even arbitrary) item from a dict" is a rarity in Python, and there is not an easy and obvious way to do it (without knowing the iteration protocol). Whether that's something people need to do often enough to justify a new function is another story, but it is NOT as trivial as making a function that adds one to something. -CHB
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Mon, Oct 11, 2021 at 8:08 PM Steven D'Aprano <steve@pearwood.info> wrote:
I assume you're not just extracting the first value for the LOLs, you must have some reason for it. It is *that reason* which counts as a use-case.
I dug through our application code base (500k lines) and found just one use case of the first/last pattern, so my opinion is that it's quite rare. This case is building the GUI labels for an undo list, and specializes the text of the label when the dict is exactly one long. Looks something like this:

    # 'undos' is a dict of attribute name:value pairs
    if len(undos) == 1:
        undo_label = 'Set %s to %s' % list(undos.items())[0]
    else:
        undo_label = 'Modify blah blah blah'
    ...
    # Use 'undos' to do the database mods and so on...
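For comparison, the same label can avoid the listification with the next(iter(...)) idiom discussed earlier in the thread (the dict contents below are made up for illustration):

```python
undos = {'colour': 'red'}  # hypothetical single-attribute undo entry

if len(undos) == 1:
    # next(iter(...)) yields the one (key, value) tuple without
    # building a throwaway list of all items first
    undo_label = 'Set %s to %s' % next(iter(undos.items()))
else:
    undo_label = 'Modify blah blah blah'

print(undo_label)  # Set colour to red
```

The `len(undos) == 1` guard already rules out the empty-dict case, so the bare next() here cannot raise StopIteration.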
So I implemented these functions as operators in a downloaded source of CPython... the differences are insane! (Sorry if this produces nested quotes)
import timeit

# d + 1 vs list(d.values())[0]: 2133x speedup
timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "d + 1"])
2000000 loops, best of 5: 165 nsec per loop
timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "list(d.values())[0]"])
1000 loops, best of 5: 352 usec per loop

# d - 1 vs list(d.values())[-1]: 2017x speedup
timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "d - 1"])
2000000 loops, best of 5: 168 nsec per loop
timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "list(d.values())[-1]"])
1000 loops, best of 5: 354 usec per loop

# d * 1 vs list(d.keys())[0]: 3663x speedup
timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "d * 1"])
2000000 loops, best of 5: 166 nsec per loop
timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "list(d.keys())[0]"])
1000 loops, best of 5: 608 usec per loop

# d / 1 vs list(d.keys())[-1]: 2163x speedup
timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "d / 1"])
2000000 loops, best of 5: 166 nsec per loop
timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "list(d.keys())[-1]"])
1000 loops, best of 5: 359 usec per loop

# d >> 1 vs list(d.items())[0]: 15302x speedup
timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "d >> 1"])
1000000 loops, best of 5: 281 nsec per loop
timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "list(d.items())[0]"])
50 loops, best of 5: 4.3 msec per loop

# d << 1 vs list(d.items())[-1]: 15357x speedup
timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "d << 1"])
1000000 loops, best of 5: 280 nsec per loop
timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "list(d.items())[-1]"])
50 loops, best of 5: 4.3 msec per loop
On Thu, Oct 14, 2021 at 7:37 PM Jeremiah Vivian <nohackingofkrowten@gmail.com> wrote:
So I implemented these functions as operators in a downloaded source of CPython... the differences are insane! (Sorry if this produces nested quotes)
import timeit # d + 1 vs list(d.values())[0]: 2133x speedup timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "d + 1"]) 2000000 loops, best of 5: 165 nsec per loop timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "list(d.values())[0]"]) 1000 loops, best of 5: 352 usec per loop
Insane speedup but what does it prove? That if you handcraft C code to do the precise and narrow feature you need, it's faster than a naive listification? Try the actually-recommended way: next(iter(d)). See how much difference you get. Not having your hacked-on interpreter I can't fully test this, but in my test of vanilla, d[next(iter(d))] was three orders of magnitude faster than list(d.values())[0] - bringing it within cooee of your "d + 1" version. Neither form copes with the possibility of an empty dictionary, so for proper comparisons, you'd need to try/except it; although recent CPythons have very very low cost to a try block that doesn't end up catching anything, so it probably won't make a material difference. You've created some *incredibly* arbitrary operators which have no justification other than "this operator isn't being used". Even if you *do* manage to see a two thousand to one speedup, that is a microoptimization that comes at an extremely steep readability penalty. In no way does this make mathematical sense as "adding 1 to the dictionary", nor does it parallel the way that other types behave, nor is it internally consistent (multiplying isn't repeated addition), nor is it particularly apt use of a symbol (like Path/str). And I suspect that your speedups are really in the order of 2:1, not 2000:1. ChrisA
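The comparison Chris suggests can be run without patching the interpreter; both spellings return the same value for a non-empty dict, so only the timings differ (a sketch: the absolute numbers will vary by machine).

```python
import timeit

setup = "d = {x: x + 1 for x in range(10000)}"
for stmt in ("next(iter(d.values()))", "list(d.values())[0]"):
    t = timeit.timeit(stmt, setup=setup, number=1_000)
    print(f"{stmt:25s} {t:.4f}s")  # idiom vs naive listification

# Sanity check that the two spellings agree:
d = {x: x + 1 for x in range(10000)}
assert next(iter(d.values())) == list(d.values())[0] == 1
```

The next(iter(...)) form is O(1) while the list(...) form copies all 10000 values, so the gap grows with dict size; that is the "three orders of magnitude" Chris refers to.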
Results are in (tested `next(iter(d))` / `next(iter(d.values()))` / `next(iter(d.items()))` and their `next(reversed(...))` counterparts): `*` / `/` implemented is 2x faster than `next(iter(d))`/`next(reversed(d))` `+` / `-` implemented is approximately 3x faster than `next(iter(d.values()))`/`next(reversed(d.values()))` `<<` / `>>` implemented is at least 4x faster than `next(iter(d.items()))`/`next(reversed(d.items()))`
On Thu, Oct 14, 2021 at 11:03 PM Jeremiah Vivian <nohackingofkrowten@gmail.com> wrote:
Results are in (tested `next(iter(d))` / `next(iter(d.values()))` / `next(iter(d.items()))` and their `next(reversed(...))` counterparts): `*` / `/` implemented is 2x faster than `next(iter(d))`/`next(reversed(d))` `+` / `-` implemented is approximately 3x faster than `next(iter(d.values()))`/`next(reversed(d.values()))` `<<` / `>>` implemented is at least 4x faster than `next(iter(d.items()))`/`next(reversed(d.items()))`
So, negligible benefits. Thanks for checking. ChrisA
On Thu, Oct 14, 2021 at 11:15:52PM +1100, Chris Angelico wrote:
On Thu, Oct 14, 2021 at 11:03 PM Jeremiah Vivian <nohackingofkrowten@gmail.com> wrote:
Results are in (tested `next(iter(d))` / `next(iter(d.values()))` / `next(iter(d.items()))` and their `next(reversed(...))` counterparts): `*` / `/` implemented is 2x faster than `next(iter(d))`/`next(reversed(d))` `+` / `-` implemented is approximately 3x faster than `next(iter(d.values()))`/`next(reversed(d.values()))` `<<` / `>>` implemented is at least 4x faster than `next(iter(d.items()))`/`next(reversed(d.items()))`
So, negligible benefits. Thanks for checking.
Be fair Chris :-) A 2x or 4x speed-up (even of a micro-benchmark) is not negligible. If someone managed a mere 20% or 30% speedup to next(), we would probably be more than happy to take it. A better way to put this is that while the speed benefit to one uncommon task is non-negligible, the cost to readability and comprehensibility is horrendous. This is premature optimization: there's no evidence that getting the first element of a dict is a common operation, let alone a bottleneck that needs optimising. Ultimately, for every millisecond in program runtime saved by using obscure operators for uncommon operations on dicts, we would probably cost a dozen programmers five or ten minutes in confusion while they try to decipher what on earth `mydict << 1` means. There are programming languages designed to be terse and even deliberately obfuscated, especially code-golfing languages. I'm glad Python is not one of them :-) -- Steve
On Fri, Oct 15, 2021 at 12:16 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Oct 14, 2021 at 11:15:52PM +1100, Chris Angelico wrote:
On Thu, Oct 14, 2021 at 11:03 PM Jeremiah Vivian <nohackingofkrowten@gmail.com> wrote:
Results are in (tested `next(iter(d))` / `next(iter(d.values()))` / `next(iter(d.items()))` and their `next(reversed(...))` counterparts): `*` / `/` implemented is 2x faster than `next(iter(d))`/`next(reversed(d))` `+` / `-` implemented is approximately 3x faster than `next(iter(d.values()))`/`next(reversed(d.values()))` `<<` / `>>` implemented is at least 4x faster than `next(iter(d.items()))`/`next(reversed(d.items()))`
So, negligible benefits. Thanks for checking.
Be fair Chris :-)
A 2x or 4x speed-up (even of a micro-benchmark) is not negligible. If someone managed a mere 20% or 30% speedup to next(), we would probably be more than happy to take it.
Okay, lemme rephrase. Relatively insignificant benefits, considering that (as I said in the preceding post, which was based on approximations rather than measurements) this involves hand-rolled C code instead of composing a concept out of pre-existing Python functions. Any time you rewrite Python code in C, you can expect a measurable improvement. Having this be merely 2-4 times is rather underwhelming for a microbenchmark, although of course that same benefit for your whole project would be quite notable. But let's face it: if next(iter(d)) is the bottleneck in your code, something's wrong with your methodology or algorithm. A percentage speedup to an existing function? Definitely, that's basically free performance gains for everything that uses it. An improvement to one very specific operation on one single data type? It has to be either an incredibly common one, or a spectacular improvement, to be more than "negligible". For a fair comparison, I'd like to see a function that uses next(iter(d)) get cythonized. Or run in PyPy. Anything that can give drastic performance improvements for the original code. Then see whether the handrolled one is still better, and if so, how much. I suspect it will still benchmark measurably higher (since it doesn't involve two global lookups), but the difference would narrow even further. Of course, it's hard to get a fair comparison between handrolled C and PyPy, so this is probably academic, but perhaps it'll give some idea of why I consider anything less than an order of magnitude to be negligible here.
A better way to put this is that while the speed benefit to one uncommon task is non-negligible, the cost to readability and comprehensibility is horrendous. This is premature optimization: there's no evidence that getting the first element of a dict is a common operation, let alone a bottleneck that needs optimising.
Right. That's the main thing.
Ultimately, for every millisecond in program runtime saved by using obscure operators for uncommon operations on dicts, we would probably cost a dozen programmers five or ten minutes in confusion while they try to decipher what on earth `mydict << 1` means.
There are programming languages designed to be terse and even deliberately obfuscated, especially code-golfing languages. I'm glad Python is not one of them :-)
Indeed :) And I can't even picture this as being particularly useful for golfing! ChrisA
I am gonna think about that, but all of these operators (though I think you are calling them "confusing") are basically the equivalent of indexing into `iter(dict_object)`. So they're fast, and can also extract something from the middle of a 'dict' object.
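For completeness, extracting something from the middle without listifying the whole dict is already possible with itertools.islice, though it is O(n) in the position (the helper name here is made up for illustration):

```python
from itertools import islice

def nth_key(d, n):
    """Key at position n of dict d; negative n counts from the end.
    Walks the dict from the nearer end, never building a list."""
    if n >= 0:
        return next(islice(iter(d), n, None))
    return next(islice(reversed(d), -n - 1, None))

d = {'a': 1, 'b': 2, 'c': 3}
print(nth_key(d, 1))   # 'b'
print(nth_key(d, -1))  # 'c'
```

An out-of-range n makes the bare next() raise StopIteration, so a real version would want a default or a translated exception, per the earlier discussion in this thread.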
On Thu, Oct 14, 2021 at 08:36:37AM -0000, Jeremiah Vivian wrote:
So I implemented these functions as operators in a downloaded source of CPython... the differences are insane! (Sorry if this produces nested quotes)
import timeit # d + 1 vs list(d.values())[0]: 2133x speedup
d is a dict. You are trying to add 1 to a dict. Why? `list(d.values())[0]` is a terrible way to extract a single value from a dict. Of course it is going to be slow. -- Steve
Fixed reply: So I implemented these functions as operators in CPython... the differences are insane!
>>> import timeit

# d + 1 vs list(d.values())[0]: 2133x speedup
>>> timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "d + 1"])
2000000 loops, best of 5: 165 nsec per loop
>>> timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "list(d.values())[0]"])
1000 loops, best of 5: 352 usec per loop

# d - 1 vs list(d.values())[-1]: 2017x speedup
>>> timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "d - 1"])
2000000 loops, best of 5: 168 nsec per loop
>>> timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "list(d.values())[-1]"])
1000 loops, best of 5: 354 usec per loop

# d * 1 vs list(d.keys())[0]: 3663x speedup
>>> timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "d * 1"])
2000000 loops, best of 5: 166 nsec per loop
>>> timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "list(d.keys())[0]"])
1000 loops, best of 5: 608 usec per loop

# d / 1 vs list(d.keys())[-1]: 2163x speedup
>>> timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "d / 1"])
2000000 loops, best of 5: 166 nsec per loop
>>> timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "list(d.keys())[-1]"])
1000 loops, best of 5: 359 usec per loop

# d >> 1 vs list(d.items())[0]: 15302x speedup
>>> timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "d >> 1"])
1000000 loops, best of 5: 281 nsec per loop
>>> timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "list(d.items())[0]"])
50 loops, best of 5: 4.3 msec per loop

# d << 1 vs list(d.items())[-1]: 15357x speedup
>>> timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "d << 1"])
1000000 loops, best of 5: 280 nsec per loop
>>> timeit.main(['-s', "d = {x: x+1 for x in range(10000)}", "list(d.items())[-1]"])
50 loops, best of 5: 4.3 msec per loop
participants (18)

- 2QdxY4RzWzUUiLuE@potatochowder.com
- Alex Waygood
- Alexander Hill
- Chris Angelico
- Christopher Barker
- David Mertz, Ph.D.
- Eric Fahlgren
- Eric V. Smith
- Erik Demaine
- Finn Mason
- Guido van Rossum
- Jeremiah Vivian
- Oscar Benjamin
- Paul Bryan
- Paul Moore
- Ricky Teachey
- Stephen J. Turnbull
- Steven D'Aprano