Re: Argumenting in favor of first()

On Sat, Dec 14, 2019 at 2:50 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Did you intend to reply only to me?
Nope - I hate lists that don’t default to replying to the list (I know that’s an unpopular opinion). I was wondering why no one commented on it.
I’m quite convinced that first() would NOT cause any particular
confusion, neither for sets nor iterators.
I don't see how you make that judgment so confidently, since you don't claim experience with `first`, only the educationally problematic `next(iter())`.
Well, that’s why I said *I’m* convinced, rather than state it as a general truism. I have a lot of experience with newbies, so I think I have a pretty good idea what tends to trip them up. I could be wrong of course, but without actually introducing it and using it in a couple of years' worth of teaching, we’ll never know for sure :-)
I don't agree. I can't speak for others, but once I understood the difference between "iterable" and "iterator", "next(iter(iterable))" was self-explanatory when I first encountered it.
Sure — but the newbies I’m talking about don’t yet understand the difference between "iterable" and "iterator", and many people get very far before they do. I *think* the primary target audience for first() would be those folks, so it’s kind of ironic that we’re concerned about confusing them.
I don't see how you
explain all that in terms of first() without the concept of iter(), and the detail that (iter(iterator)) == iterator.
I don’t think you need to -- that’s my whole point. In common code, folks rarely work with bare iterators (usually they are created implicitly, in a for loop for instance), so that distinction doesn’t come up. And when it *does* come up, it applies to every use of the iteration protocol (for loops, next(), etc.), so there is nothing special about first(). I think I made that point already, probably in a note sent to only one person by mistake.
Off the top of my head, I can’t think of a single non-contrived example of using a bare iterator in code that is not specifically doing something “special”. Can anyone else?
Sure, you can *tell* students that's how first() works, but how do they apply it in their own code?
You tell them that they can use it to get the first item out of a container or other iterable. And you’re done. Some time in the future, they may encounter a bare iterator, and be surprised that first() consumes an item. But as I said before, they’ll have other issues as well, and will need to learn about iterators then.
-CHB
--
Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Dec 14, 2019, at 12:36, Christopher Barker <pythonchb@gmail.com> wrote:
Off the top of my head, I can’t think of a single non-contrived example of using a bare iterator in code that is not specifically doing something “special”.
Calling iter on a container is hardly the only way to get an Iterator. You also get Iterators from open, map, filter, zip, genexprs, generator functions, most itertools functions, etc. And I’m pretty sure at least the first of those is learned pretty early and used pretty often by novices. In fact, I think files are one of the most common ways people learn about iterators. There are certainly a lot of StackOverflow dups asking why `for line in file:` gives them no lines when just 10 lines earlier the same file had 20 lines.
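The StackOverflow situation described above can be reproduced with a short sketch. Here io.StringIO stands in for a real open file (an assumption, made only so the example is self-contained), and the variable names are illustrative.

```python
import io

# io.StringIO stands in for an open text file (assumption for self-containment).
f = io.StringIO("a\nb\nc\n")

first_pass = [line for line in f]    # consumes the file iterator
second_pass = [line for line in f]   # nothing left: a file is its own iterator

# A container, by contrast, hands out a fresh iterator for every loop.
lines = ["a\n", "b\n", "c\n"]
list_first = [line for line in lines]
list_second = [line for line in lines]
```

The second loop over the file sees no lines because iter(f) returns f itself, whereas iter(lines) builds a new list iterator each time.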

A pattern I've written a number of times is roughly:
    lines = open(fname)
    header = next(lines)
    for line in lines:
        process(line, header)
That's not so artificial, I think. Of course, first() would also work here. But I'm not sure it's any particular advantage in this case.
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/BP2VRK... Code of Conduct: http://python.org/psf/codeofconduct/

On Sun, 15 Dec 2019 at 05:54, David Mertz <mertz@gnosis.cx> wrote:
A pattern I've written a number of times is roughly:
    lines = open(fname)
    header = next(lines)
    for line in lines:
        process(line, header)
That's not so artificial, I think. Of course, first() would also work here. But I'm not sure it's any particular advantage in this case.
I think you haven't understood the point about wanting a different exception instead of StopIteration. This is precisely an example of what not to do because it can fail in strange ways if the file is empty (see above in this thread or PEP 479). The simplest way to fix this is with 2-arg next:
    with open(fname) as lines:
        header = next(lines, None)
        if header is None:
            # Not sure what you want to do here...
            pass
        for line in lines:
            process(line, header)
--
Oscar
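One way the "strange" failure can show up is when the header-reading code sits inside a generator function. The sketch below assumes exactly that setting; rows_unsafe and rows_safe are hypothetical names, and an empty iterator stands in for an empty file.

```python
def rows_unsafe(lines):
    # One-argument next(): on empty input, StopIteration is raised
    # inside the generator body.
    header = next(lines)
    for line in lines:
        yield header, line


def rows_safe(lines):
    # Two-argument next(): the empty case is handled explicitly.
    header = next(lines, None)
    if header is None:
        return  # nothing to yield for empty input
    for line in lines:
        yield header, line


try:
    list(rows_unsafe(iter([])))
    outcome = "no error"
except RuntimeError:
    # Since PEP 479 (always on from Python 3.7), a StopIteration that
    # escapes a generator body is converted to RuntimeError.
    outcome = "RuntimeError"
```

Before PEP 479, the same StopIteration would have ended the generator silently, which is the kind of hard-to-trace behaviour the message warns about.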

Yes, of course. I was just trying to illustrate using next() in a non-artificial way. In real code (but truthfully, probably not in my quick "one off" scripts) I write
    lines = get_lines_file_or_elswhere(resource)
    header = next(lines, sentinel)
    if looks_like_header(header):
        for line in lines:
            ...

On Sun, Dec 15, 2019 at 6:40 AM David Mertz <mertz@gnosis.cx> wrote:
Yes, of course. I was just trying to illustrate using next() in a non-artificial way. In real code (but truthfully, probably not in my quick "one off" scripts) I write
    lines = get_lines_file_or_elswhere(resource)
    header = next(lines, sentinel)
    if looks_like_header(header):
        for line in lines:
            ...
Hmm, interesting -- so this means that you do write code expecting a generic iterator, rather than a file-like object. I can't say I've ever done that, nor seen anyone else do that.
I'm curious: what other iterators might this code be expected to work with? (That is, a list of lines, as returned by file.readlines(), would not work -- you'd have to wrap it in iter() first.)
But apparently it reflects the move that Python has been making toward being all about iterators. I think first() would be a help mostly to folks that DON'T think primarily in terms of iterators. And frankly, I think that is a population Python should strive to support.
For the record, I write a lot of code that looks like:
    data_file = get_lines_file_(resource)
    header = data_file.readline()
    if looks_like_header(header):
        for line in data_file:
            ...
That is, I'm always expecting a file-like object, rather than a generic iterator. That may be because I developed habits long before files were iterators....
BTW: this has been a REALLY LONG thread -- I think it's time for a concrete proposal to be written up, since it appears we're not all clear on what we're talking about. For my part I think a first() function would be nice, and am open to a couple of variations, so someone with a stronger opinion should propose something.
Tim: what version do you have in mind?
-CHB

On 12/15/2019 2:21 PM, Christopher Barker wrote:
On Sun, Dec 15, 2019 at 6:40 AM David Mertz <mertz@gnosis.cx> wrote:
Yes, of course. I was just trying to illustrate using next() in a non-artificial way. In real code (but truthfully, probably not in my quick "one off" scripts) I write
    lines = get_lines_file_or_elswhere(resource)
    header = next(lines, sentinel)
    if looks_like_header(header):
        for line in lines:
            ...
Hmm, interesting -- so this means that you do write code expecting a generic iterator, rather than a file-like object.
I can't say I've ever done that, nor seen anyone else do that.
I'm curious: what other iterators might this code be expected to work with? (That is, a list of lines, as returned by file.readlines(), would not work -- you'd have to wrap it in iter() first.)
I do this a lot for test cases. Instead of having a test file, I just have a list of lines that would be in the file, and pass that list in to a function that just takes an iterator. The function normally is passed a file, but also works for my test code which doesn't use files.
Eric
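A minimal sketch of that testing style follows. It is one possible arrangement, not necessarily how the code described above is written: parse is a hypothetical function that calls iter() on its argument so that both a file-like object and a plain list of lines work, and io.StringIO stands in for a real file.

```python
import io


def parse(lines):
    """Hypothetical parser: accepts any iterable of lines."""
    lines = iter(lines)            # returns the file itself; wraps a list
    header = next(lines, None)     # None signals empty input
    return header, [line.rstrip("\n") for line in lines]


# Test data as a plain list of lines, no file needed.
from_list = parse(["name\n", "x\n", "y\n"])

# The same function fed a file-like object.
from_file = parse(io.StringIO("name\nx\ny\n"))
```

Calling iter() up front costs nothing for a file and removes the need for callers to wrap their lists.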

On Sun, Dec 15, 2019, 2:21 PM Christopher Barker <pythonchb@gmail.com> wrote:
On Sun, Dec 15, 2019 at 6:40 AM David Mertz <mertz@gnosis.cx> wrote:
    lines = get_lines_file_or_elswhere(resource)
    header = next(lines, sentinel)
    if looks_like_header(header):
        for line in lines:
            ...
Hmm, interesting -- so this means that you do write code expecting a generic iterator, rather than a file-like object.
How file-like do you need? I've certainly written things that usually take an actual file, but sometimes get io.StringIO, or an SQL cursor. On the other hand, I doubt I'd ever have the same handling of anything coming from itertools.permutations().
    data_file = get_lines_file_(resource)
    header = data_file.readline()
    if looks_like_header(header):
        for line in data_file:
            ...
That is, I'm always expecting a file-like object, rather than a generic iterator. That may be because I developed habits long before files were iterators....
I'm 20 years with Python too. But one habit I've changed is to use next() rather than .readline() in this sort of code. Even if my current need is for a file-as-such, there's no reason to restrict that. On the other hand, I use .readlines() and .read() plenty often enough. Sometimes greedy and concrete is appropriate.

On Sun, Dec 15, 2019 at 11:45 AM David Mertz <mertz@gnosis.cx> wrote:
Hmm, interesting -- so this means that you do write code expecting a
generic iterator, rather than a file-like object.
How file-like do you need? I've certainly written things that usually take an actual file, but sometimes get io.StringIO, or an SQL cursor.
File-like enough to have a .readline() method, which I expect an SQL cursor does not (or does it?). It all depends on what interface a given object is emulating -- I've always thought in terms of file-like objects, but maybe iterator_of_lines is more generic.
-CHB

[Christopher Barker <pythonchb@gmail.com>]
... BTW: this has been a REALLY LONG thread -- I think it's time for a concrete proposal to be written up, since it appears we're not all clear on what we're talking about. For my part I think a first() function would be nice, and am open to a couple of variations, so someone with a stronger opinion should propose something.
Tim: what version do you have in mind?
Same as always: exactly what more-itertools has supplied for years already :-)
If the iterable is empty/exhausted, by default ValueError is raised, but that can be overridden by also passing an optional argument to return instead (like dict.pop() in this respect).
So, e.g.,
    first([42]) returns 42
    first([]) raises ValueError
    first([], 42) and first([], default=42) return 42
I don't think it belongs in the builtins. It doesn't perfectly fit anywhere, but given its prior history in the more-itertools and itertoolz packages, Python's itertools package seems the least annoying ;-) home for it.
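The behaviour described above can be sketched in a few lines. This is a minimal illustration written from that description, not the more-itertools source; the sentinel name _MISSING and the error message are assumptions.

```python
_MISSING = object()  # private sentinel, so that None remains a usable default


def first(iterable, default=_MISSING):
    """Return the first item of iterable, or default if it is empty."""
    for item in iterable:
        return item
    if default is _MISSING:
        raise ValueError("first() was called on an empty iterable")
    return default
```

Using a private sentinel rather than None lets a caller write first(xs, None) and still distinguish "no default given" from "default is None".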

On Sun, Dec 15, 2019 at 12:08 PM Tim Peters <tim.peters@gmail.com> wrote:
Tim: what version do you have in mind?
Same as always: exactly what more-itertools has supplied for years already :-)
If the iterable is empty/exhausted, by default ValueError is raised, but that can be overridden by also passing an optional argument to return instead (like dict.pop() in this respect).
So, e.g.,
    first([42]) returns 42
    first([]) raises ValueError
    first([], 42) and first([], default=42) return 42
Perfect -- thanks -- we now have a concrete proposal.
+1 from me. :-)
-CHB

On Dec 14, 2019, at 21:52, David Mertz <mertz@gnosis.cx> wrote:
A pattern I've written a number of times is roughly:
    lines = open(fname)
    header = next(lines)
    for line in lines:
        process(line, header)
That's not so artificial, I think. Of course, first() would also work here. But I'm not sure it's any particular advantage in this case.
I think first is actually worse here. If you might ever refactor this code to, say, take an arbitrary iterable as input rather than opening (and leaking) a file by name, using next means that forgetting to call iter somewhere is an easy-to-debug TypeError, but using first means it’s a subtle bug where the header is counted again as a normal row.
But I also don’t think anyone will reach for first here. People who are already thinking of the file in terms of being an iterable are going to continue to use next. Everyone else is just going to call readline instead of either next or first.
I think the advantages are in cases like the one Chris pointed out, where you have a dict view (or some other case where you know you have a non-iterator iterable but it isn’t, or at least might not be, a sequence), where first is helpful, for the reasons he described.
For example, you’re at the REPL interactively working out how to process some web service:
    >>> p = requests.post(url)
    >>> r = p.json()
    >>> first(r.items())
    ... here I see what I need to deal with
Or maybe even that ends up bigger than you expected, but then you write:
    >>> things = _['things']
    >>> things[0]
    ... still too big? ...
    >>> thing = things[0]
    >>> first(thing.items())
    ... aha, now I get it
Looking at my own REPL history and pile of throwaway scripts, dicts actually seem to make up a lot of my own uses of more_itertools.first.
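The refactoring hazard mentioned at the start of this message can be made concrete. In the sketch below, first is a simplified stand-in for the proposed function (it ignores the empty case), and the list rows plays the role of "an arbitrary iterable" passed in after the refactor.

```python
def first(iterable):
    # Simplified stand-in for the proposed first(); hypothetical,
    # and it does not handle empty input.
    return next(iter(iterable))


rows = ["header", "row1", "row2"]   # a list, not an iterator

# With next(), forgetting to call iter() fails loudly.
try:
    next(rows)
    next_result = "ok"
except TypeError:
    next_result = "TypeError"

# With first(), the same mistake passes silently: the list is not
# consumed, so the loop sees the header again as a data row.
header = first(rows)
processed = [row for row in rows]
```

After this runs, processed still starts with "header", which is the subtle double-counting described above.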

On Sat, Dec 14, 2019 at 9:46 PM Andrew Barnert <abarnert@yahoo.com> wrote:
Off the top of my head, I can’t think of a single non-contrived example of using a bare iterator in code that is not specifically doing something “special”.
Calling iter on a container is hardly the only way to get an Iterator. You also get Iterators from open, map, filter, zip, genexprs, generator functions, most itertools functions, etc.
Thanks -- I was being pretty brain dead. But most of those are usually immediately iterated over again. So in common use, I'd say the file object from open() is a biggie -- that's one of the few where folks will commonly do something other than simply iterate over the whole thing at once anyway.
In fact, I think files are one of the most common ways people learn about iterators. There are certainly a lot of StackOverflow dups asking why `for line in file:` gives them no lines when just 10 lines earlier the same file had 20 lines.
EXACTLY! Here we have it -- the most common case of folks working directly with an iterator, and they are confused by "for". So first() wouldn't add any new source of confusion.
Yes, I can see that folks might do something like:
    my_file = open(something)
    the_header = first(my_file)
and then later on be surprised that another call to first() produces a different result. But I think they'd only be surprised once :-)
And file objects are a bit special, because they have a whole other API:
    my_file.readline()
In fact, in my code, and most code I've seen, that's exactly what's used:
    # get the header
    header = my_file.readline()
    # parse the rest
    for line in my_file:
        do_something(line)
I could use next() instead of readline() but I never do. Probably because I learned Python long before files were iterators. But I don't see newbies doing it, either (of course, I'm not teaching it that way ;-) ).
But I'm not sure it's just that - I think it's because most of us are writing that kind of code to deal with any file-like object, not any generic iterator. As in: a file object happens to be an iterator (so I can put it in a for loop), rather than: a file object is an iterator that happens to have some other methods for working with it.
Now that I think about it, that's also the case with sequences -- we think of the fact that we are working with a list, and it also happens to be an iterable, so we can put it in a for loop, rather than thinking about writing general code for any iterable, which might be a sequence. Which is why I'd use a_list[0] rather than next(iter(a_list)). But I can't use dict.keys()[0], for instance. So it would be nice to have a way to do that that was more obvious than:
    next(iter(d.keys()))
-CHB
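The dict-view case at the end of this message is easy to check directly. The dictionary contents below are made up for illustration.

```python
d = {"spam": 1, "eggs": 2}

# Dict views are iterable but not indexable.
try:
    d.keys()[0]
    indexable = True
except TypeError:
    indexable = False

# The spelling the thread considers non-obvious.
first_key = next(iter(d.keys()))

# A view is not an iterator, so asking again gives the same answer.
same_again = next(iter(d.keys()))
```

Because the view is a non-iterator iterable, repeated calls are stable, unlike the file example earlier in the message.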
participants (6)
- Andrew Barnert
- Christopher Barker
- David Mertz
- Eric V. Smith
- Oscar Benjamin
- Tim Peters