Re: Argumenting in favor of first()
On Fri, Dec 06, 2019 at 09:11:44AM -0400, Juancarlo Añez wrote: [...]
Sure, but in this case, it isn't a fragment of a larger function, and that's not what it looks like. If it looked like what you wrote, I would understand it. But it doesn't, so I didn't really understand what it was supposed to do, until I read the equivalent version using first/next.
Exactly my point.
Indeed, and I agree with that. But I still don't see what advantage there is to having a `first` builtin which does so little. It's a really thin wrapper around `next` that:

* calls iter() on its iterable argument
* supplies a default
* and then calls next() with two arguments

I guess my question is asking you to justify adding a builtin rather than just educating people how to use next effectively.

This is how I would implement the function in Python:

    def first(iterable, default=None):
        return next(iter(iterable), default)

But there's a major difference in behaviour depending on your input, and one which is surely going to lead to bugs from people who didn't realise that iterator arguments and iterable arguments will behave differently:

    # non-iterator iterable
    py> obj = [1, 2, 3, 4]
    py> [first(obj) for __ in range(5)]
    [1, 1, 1, 1, 1]

    # iterator
    py> obj = iter([1, 2, 3, 4])
    py> [first(obj) for __ in range(5)]
    [1, 2, 3, 4, None]

We could document the difference in behaviour, but it will still bite people and surprise them.

We could, I guess, eliminate the difference by adding the ability to peek ahead to the next value of an arbitrary iterator without consuming that value. This would have to be done by the interpreter, not in Python code, and it would open new complexities.

Consider a generator which yields values which depend, in some complex and unpredictable way, on *when* it is called. Say, the amount of disk space available, or the number of records in a database, or the current time. If I peek into the generator, I would see the time at the moment I peeked:

    now = get_current_time_generator()
    peek(now)  # returns 11:25:30am

Since peek can't literally see into the future, it cannot be otherwise. But what happens when I call next?

    time.sleep(60)
    next(now)  # what will this return?

There are two alternatives:

1. `next(now)` will return 11:25:30am, the same value that peek gave;
2. `next(now)` will return 11:26:30am, the current time.

Option 1 keeps the invariant that peeking will show you the next value without advancing the iterable, but it violates the invariant that `next(now)` yields the current time. Option 2 keeps the `next` invariant, but violates the `peek` invariant. Whichever option we choose, peeking into arbitrary iterators will break somebody's expectations.

Bringing it back to `first`:

* It seems to me that `first` adds very little that `next` doesn't already give us.
* The simple and obvious implementation of `first` would have a troublesome difference in behaviour between iterator arguments and non-iterator arguments.
* To eliminate that difference would require the ability to peek ahead into arbitrary iterators, including generators, which is a much bigger change, and equally troublesome.

-- Steven
On Dec 6, 2019, at 16:44, Steven D'Aprano <steve@pearwood.info> wrote:
We could, I guess, eliminate the difference by adding the ability to peek ahead to the next value of an arbitrary iterator without consuming that value. This would have to be done by the interpreter, not in Python code,
You can easily wrap an iterator to make it peekable. Untested, off the top of my head on my phone:

    from collections.abc import Iterator  # assuming this is the Iterator meant here

    class Peekable(Iterator):
        _sentinel = object()

        def __init__(self, it):
            self._peek = self._sentinel
            self._it = iter(it)

        def __iter__(self):
            return self

        def __next__(self):
            if self._peek is not self._sentinel:
                result, self._peek = self._peek, self._sentinel
                return result
            return next(self._it)

        def peek(self):
            if self._peek is self._sentinel:
                self._peek = next(self._it)
            return self._peek

You can easily add a default value for peek, a prepend method, an isempty method (or just call it __bool__), multiple levels of peek (use a deque, or tee), combine the last two to prepend multiple values (which is equivalent to chain, but sometimes more readable as a method), add indexing on top of the multi-peek, ...

There's a version of this included in more-itertools, which I've used quite a few times. I don't remember exactly which extra features it comes with, because usually I just want the basic peek.
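For what it's worth, a quick hedged demo of how the wrapper above would behave if it works as written (illustrative only, and as untested as the original):

    p = Peekable([1, 2, 3])
    p.peek()   # 1 -- buffered, not consumed
    next(p)    # 1 -- the buffered value comes out first
    next(p)    # 2 -- then iteration continues as normal
    list(p)    # [3]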
On Fri, Dec 06, 2019 at 07:27:19PM -0800, Andrew Barnert wrote:
On Dec 6, 2019, at 16:44, Steven D'Aprano <steve@pearwood.info> wrote:
We could, I guess, eliminate the difference by adding the ability to peek ahead to the next value of an arbitrary iterator without consuming that value. This would have to be done by the interpreter, not in Python code,
You can easily wrap an iterator to make it peekable.
Fair enough, in hindsight I'm not sure what I was thinking when I said you couldn't do it from pure Python.

Nevertheless, you still have a fundamental problem when it comes to iterators where the value yielded varies in time. Your Peekable wrapper reports the past state of whatever value the underlying iterator produces (the value at the time peek was called), not the current state.

-- Steven
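A small sketch of that staleness, reusing the Peekable wrapper from the previous message and a hypothetical time-yielding generator (names are illustrative, not from the thread):

    import time

    def current_time():
        while True:
            yield time.time()  # yields the time at the moment it is asked

    now = Peekable(current_time())
    t_peek = now.peek()   # the time at the moment of the peek
    time.sleep(60)
    t_next = next(now)    # returns the cached t_peek, not the current time
    assert t_next == t_peek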
On Sat, 7 Dec 2019 at 03:45, Steven D'Aprano <steve@pearwood.info> wrote:
This is how I would implement the function in Python:
    def first(iterable, default=None):
        return next(iter(iterable), default)
A somewhat related discussion was somewhere inside the November 2017 <https://mail.python.org/archives/list/python-ideas@python.org/thread/QX6IRM4SNCK74IYQNFOCLDAZ5RIIRUKL/> thread "How assignment should work with generators?". The main idea was to extend assignment syntax to something like `first, second, ... = gen`, which would be equivalent to `first, second = next(gen), next(gen)`. Another idea (which nobody liked) was to change the behaviour of `x, y, *tail = iter` so that the starred part is not consumed and becomes a generator instead of a list. I understand that the topic discussed here is slightly different, but in my understanding it is all eggs from the same chicken.

I still like the idea of partial assignment syntax `x, y, ... = iter`, but as was the case back then, I don't have enough English knowledge or time to write a proposal on the topic, let alone to follow the thread and defend the idea :( The main objection was that this is all easily achievable with `islice` from `itertools` (sketched below).

Back to the topic - maybe `first` would be a good citizen for the `itertools` module, because this idea (about first) comes up roughly every two years.

with kind regards,
-gdg
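For context, the `islice` spelling that was offered as the status-quo answer looks roughly like this (a minimal sketch, not code from the original thread):

    from itertools import islice

    gen = (n * n for n in range(10))
    x, y = islice(gen, 2)   # take just the first two values: x == 0, y == 1
    rest = list(gen)        # the generator was only advanced by two items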
+1 for itertools.first(seq, default=Exception) *and* itertools.one(seq, default=Exception)

This is a very common pattern (particularly with RDF / JSON-LD (@type), where there can be multiple instances of any predicate-object / attribute-value pair).

SQLAlchemy also has .first(), .one(), and .one_or_none() (with no default=, and with sqlalchemy.orm.exc.NoResultFound and sqlalchemy.orm.exc.MultipleResultsFound).

On Sat, Dec 7, 2019, 9:20 AM Kirill Balunov <kirillbalunov@gmail.com> wrote: [...]
On Dec 7, 2019, at 07:33, Wes Turner <wes.turner@gmail.com> wrote:
+1 for itertools.first(seq, default=Exception) *and* itertools.one(seq, default=Exception)
What does default=Exception mean? What happens if you pass a different value? Does it do one thing if the argument is a type that's a subclass of Exception (or of BaseException?) and a different thing if it's any other value?

Also, why do you want these to be different from similar recipes like nth and first_true, and from the versions of exactly these functions in more_itertools, and the other related ones like only?

Also, "seq" implies that you're expecting these to be used on sequences, not general iterables. In that case, why not just use [0]?

Arguably, first, and maybe some of its cousins, should go into the recipes. And I don't see any reason they shouldn't be identical to the versions in more-itertools, but if there is one, it should be coordinated with Erik Rose in some way so they stay in sync.

Maybe first is so useful, so much more so than all of the other very useful recipes, including things like consume, flatten, and unique (which IIRC were the ones that convinced everyone it's time to add a more-itertools link to the docs), that it needs to be slightly more discoverable, e.g. by itertools.<TAB> completion? But that seems unlikely given that they've been recipes for decades and first wasn't. And it seems even less likely for one, which nobody has mentioned in this thread yet.

If there's a general argument that linking to more-itertools hasn't helped anything, or that the recipes are still useless until someone makes the often-proposed/never-followed-through change of finding a way to make the recipes individually searchable and linkable, or whatever, that's fine, but it's not really an argument against making a special case for one that isn't made for unique or consume.
On Sat, Dec 7, 2019, 8:20 PM Andrew Barnert <abarnert@yahoo.com> wrote:
On Dec 7, 2019, at 07:33, Wes Turner <wes.turner@gmail.com> wrote:
+1 for itertools.first(seq, default=Exception) *and* itertools.one(seq,
default=Exception)
What does default=Exception mean? What happens if you pass a different value? Does it do one thing if the argument is a type that’s a subclass of Exception (or of BaseException?) and a different thing if it’s any other value?
That's a good point: Exception is a bad sentinel value. Is None a good default value? What if the genexpr'd iterable is [None, 2, 3]?
Also, why do you want these to be different from similar recipes like nth and first_true, and from the versions of exactly these functions in more_itertools, and the other related ones like only?
Also, “seq” implies that you’re expecting these to be used on sequences, not general iterables. In that case, why not just use [0]?
I chose `seq` as the argument because I was looking at toolz.itertoolz.first(), which has no default= argument.

Though, .first() (or .one()) on an unordered iterable is effectively first(shuffle(iterable)), which *could* raise an annotation exception at compile time. Sets are unordered iterables and so aren't sequences; they aren't OrderedIterables.
Arguably, first, and maybe some of it’s cousins, should go into the recipes. And I don’t see any reason they shouldn’t be identical to the versions in more-itertools, but if there is one, it should be coordinated with Erik Rose in some way so they stay in sync.
Oh hey, "more-itertools". I should've found that link in the cpython docs.

https://more-itertools.readthedocs.io/en/stable/_modules/more_itertools/more... :

    def first(iterable, default=_marker)

That makes more sense than default=Exception.

FWIW, more-itertools .one() raises ValueError (or whatever's passed as the too_short= or too_long= kwargs). Maybe default subclasses of ValueError aren't justified?
Maybe first is so useful, so much more so than all of the other very useful recipes, including things like consume, flatten, and unique (which IIRC were the ones that convinced everyone it’s time to add a more-itertools link to the docs), that it needs to be slightly more discoverable—e.g., by itertools.<TAB> completion? But that seems unlikely given that they’ve been recipes for decades and first wasn’t.
    def itertools._check_more_itertools():
        """
        https://more-itertools.readthedocs.io/en/stable/api.html
        """
And it seems even less likely for one, which nobody has mentioned in this thread yet.
If there’s a general argument that linking to more-itertools hasn’t helped anything, or that the recipes are still useless until someone makes the often-proposed/never-followed-through change of finding a way to make the recipes individually searchable and linkable, or whatever, that’s fine, but it’s not really an argument against making a special case for one that isn’t made for unique or consume.
Is programming by Exception faster or preferable to a sys.version_info conditional?

    try:
        from itertools import one, first
    except ImportError:
        from more_itertools.more import one, first

    # Or:
    if sys.version_info[:2] > (3, 7):
        from itertools import one, first
    else:
        from more_itertools.more import one, first

    # Or, just:
    # install_requires=["more-itertools"]
    from more_itertools.more import one, first
On Dec 7, 2019, at 18:09, Wes Turner <wes.turner@gmail.com> wrote:
On Sat, Dec 7, 2019, 8:20 PM Andrew Barnert <abarnert@yahoo.com> wrote: On Dec 7, 2019, at 07:33, Wes Turner <wes.turner@gmail.com> wrote:
+1 for itertools.first(seq, default=Exception) *and* itertools.one(seq, default=Exception)
What does default=Exception mean? What happens if you pass a different value? Does it do one thing if the argument is a type that’s a subclass of Exception (or of BaseException?) and a different thing if it’s any other value?
That's a good point: Exception is a bad sentinel value. Is None a good default value? What if the genexpr'd iterable is [None, 2, 3]
That's a common issue in Python. When you can't use None as a sentinel because it could be a valid user input or return value, you just create a private module or class attribute that can't equal anything the user could pass in, like this:

    _sentinel = object()

And then:

    def spam(stuff, default=_sentinel):
        if default is _sentinel:
            ...  # do single-argument stuff here
        else:
            ...  # do default-value stuff here

This seems like the kind of thing that should be explained somewhere in every tutorial (including the official one), but most people end up finding it only by accident, reading some code that uses it and trying to figure out what it does and why. The same way people figure out how useful two-argument iter is, and a couple other things.
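Applied to the function under discussion, the idiom might look something like this (a sketch for illustration, not anyone's proposed implementation):

    _sentinel = object()

    def first(iterable, default=_sentinel):
        # None, or any other value, can still be passed explicitly as the default
        for item in iterable:
            return item
        if default is _sentinel:
            raise ValueError("first() called on an empty iterable")
        return default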
Also, “seq” implies that you’re expecting these to be used on sequences, not general iterables. In that case, why not just use [0]?
I chose `seq` as the argument because I was looking at toolz.itertoolz.first(),which has no default= argument.
Ah. I’m not sure why toolz uses seq for arguments that are usually iterators, but I guess that’s not horrible or anything. In itertools and more-itertools, the argument is usually called iterable, and seq is reserved specifically for the ones that should be sequences, like chunked(iterable, n) vs. sliced(seq, n). But as useful as that convention is, I suppose it’s not a universal thing that everyone knows and follows; it’s not documented or explained anywhere, you just kind of have to guess the distinction from the names.
Though, .first() (or .one()) on an unordered iterable is effectively first(shuffle(iterable)), which *could* raise an annotation exception at compile time.
I'm not sure what you mean by an "annotation exception". You mean an error from static type checking in mypy or something? I'm not sure why it would be an error, unless you got the annotation wrong. It should be Iterable, and that will work for iterators and sequences and sets and so on just fine.

Also, it's not really like shuffle, because even most "unordered iterables" in Python, like sets, actually have an order. It's not guaranteed to be a meaningful one, but it's not guaranteed to be meaningless either. If you need that (e.g., you're creating a guessing game where you don't want the answer to be the same every time anyone runs the game, or for security reasons), you really do need to explicitly randomize.

For example, if s = set(range(10, 0, -1)), it's not guaranteed anywhere that next(iter(s)) will be 1, but it is still always 1 in any version of CPython. Worse, whatever next(iter(s)) is, if you call next(iter(s)) again (without mutating s in between), you'll get the same value from the new iterator in any version of any Python implementation.

But if you don't care whether it's meaningful or meaningless, first, one, etc. on a set are fine.
Sets are unordered iterables and so aren't sequences; arent OrderedIterables.
Right, but a sequence isn’t just an ordered iterable, it’s also random-access indexable (plus a few other things). An itertools.count(), a typical sorteddict type, a typical linked list, etc. are all ordered but not sequences. The more-itertools functions that require sequences (and name them seq) usually require indexing or slicing.
Arguably, first, and maybe some of it’s cousins, should go into the recipes. And I don’t see any reason they shouldn’t be identical to the versions in more-itertools, but if there is one, it should be coordinated with Erik Rose in some way so they stay in sync.
Oh hey, "more-itertools". I should've found that link in the cpython docs.
Well, it was only added to the docs in, I believe, 3.8, so a lot of people probably haven’t seen the link yet. (That’s always a problem for a widely-used decades-old language that evolves over 18-month cycles and carefully preserves backward compatibility; you can’t expect everyone to always know the latest of anything the way you can with something like Swift. But if we’re talking about further changes beyond what’s in 3.8, I think we have to assume that the docs change will start being effective before anything new we propose.)
https://more-itertools.readthedocs.io/en/stable/_modules/more_itertools/more... :
def first(iterable, default=_marker)
That makes more sense than default=Exception.
FWIW, more-itertools .one() raises ValueError (or whatever's passed as too_short= or too_long= kwargs). Default subclasses of ValueError may not be justified?
I don’t _think_ they are. It’s probably pretty rare that you want to switch on the type programmatically (e.g., use different handlers for too short and too long), and it’s pretty trivial to add your own subclasses. You do often want to be able to distinguish them as a human when debugging your code, but that’s already taken care of by the exception message text. (That’s only documented in the examples, but is that a problem?)
Maybe first is so useful, so much more so than all of the other very useful recipes, including things like consume, flatten, and unique (which IIRC were the ones that convinced everyone it’s time to add a more-itertools link to the docs), that it needs to be slightly more discoverable—e.g., by itertools.<TAB> completion? But that seems unlikely given that they’ve been recipes for decades and first wasn’t.
def itertools._check_more_itertools(): """ https://more-itertools.readthedocs.io/en/stable/api.html """
I’m not sure what this is intended to mean. Are you suggesting we could add this as an empty function just so that tab completion, dir, help, IDE mechanisms, etc. could make it more discoverable? If so, you’d want to give it a non-private name (most of those things will ignore a name starting with an underscore; some of them will use __all__ to override it, but others won’t.) But otherwise, it might not be a bad idea. A lot of people do explore by IDE completion, apparently.
And it seems even less likely for one, which nobody has mentioned in this thread yet.
If there’s a general argument that linking to more-itertools hasn’t helped anything, or that the recipes are still useless until someone makes the often-proposed/never-followed-through change of finding a way to make the recipes individually searchable and linkable, or whatever, that’s fine, but it’s not really an argument against making a special case for one that isn’t made for unique or consume.
Is programming by Exception faster or preferable to a sys.version_info conditional?
I don’t know if it’s faster (and I doubt that matters), and I’m sure you could argue other pros and cons each way, but it is a long-standing common idiom (at least back to the 2.5 days, when half the web services on the internet probably started by importing json with a fallback to simplejson) to do it your first way:
    try:
        from itertools import one, first
    except ImportError:
        from more_itertools.more import one, first
But why does `one` need to be added to itertools in the first place? Is it really that much more common a need than flatten, consume, etc., or so much harder to write yourself (maybe not inherently, but because its target audience is more novice-y), or what? You need some argument for that to overcome the status quo, the cost of expanding itertools (making it harder to find the stuff that really is necessary to have there), the fact that you'd either have to implement it in C or convert itertools to a Python-and-C module to do it, etc. Otherwise, either just adding it to the recipes, or doing nothing at all, seems like the right choice.
On Sat, Dec 7, 2019, 11:30 PM Andrew Barnert <abarnert@yahoo.com> wrote:
On Dec 7, 2019, at 18:09, Wes Turner <wes.turner@gmail.com> wrote:
On Sat, Dec 7, 2019, 8:20 PM Andrew Barnert <abarnert@yahoo.com> wrote:
On Dec 7, 2019, at 07:33, Wes Turner <wes.turner@gmail.com> wrote:
+1 for itertools.first(seq, default=Exception) *and* itertools.one(seq,
default=Exception)
What does default=Exception mean? What happens if you pass a different value? Does it do one thing if the argument is a type that’s a subclass of Exception (or of BaseException?) and a different thing if it’s any other value?
That's a good point: Exception is a bad sentinel value. Is None a good default value? What if the genexpr'd iterable is [None, 2, 3]
Here are more_itertools.more.one() and more_itertools.more.first() without docstrings from https://more-itertools.readthedocs.io/en/stable/_modules/more_itertools/more... :
```
def one(iterable, too_short=None, too_long=None):
    it = iter(iterable)

    try:
        value = next(it)
    except StopIteration:
        raise too_short or ValueError('too few items in iterable (expected 1)')

    try:
        next(it)
    except StopIteration:
        pass
    else:
        raise too_long or ValueError('too many items in iterable (expected 1)')

    return value


def first(iterable, default=_marker):
    try:
        return next(iter(iterable))
    except StopIteration:
        # I'm on the edge about raising ValueError instead of StopIteration. At
        # the moment, ValueError wins, because the caller could conceivably
        # want to do something different with flow control when I raise the
        # exception, and it's weird to explicitly catch StopIteration.
        if default is _marker:
            raise ValueError('first() was called on an empty iterable, and no '
                             'default value was provided.')
        return default
```

I would argue that there could be subclasses of ValueError for .one() that would also be appropriate for .first() (and/or .take(iterable, count=1, default=_default)):

    class TooShortValueError(ValueError): ...
    class TooLongValueError(ValueError): ...

(Where, again, SQLAlchemy has NoResultFound and MultipleResultsFound.)

The names are less important than being able to distinguish the difference between the cases. And then itertools.one() could be interface-compatible with this signature in more_itertools.more.one():

    def one(iterable, too_short=TooShortValueError, too_long=TooLongValueError): ...
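A rough sketch of what such an interface-compatible one() might look like, using the subclass names proposed above (these names and defaults are the proposal under discussion, not the existing more-itertools API, which takes exception instances):

    class TooShortValueError(ValueError):
        pass

    class TooLongValueError(ValueError):
        pass

    def one(iterable, too_short=TooShortValueError, too_long=TooLongValueError):
        it = iter(iterable)
        try:
            value = next(it)
        except StopIteration:
            raise too_short('too few items in iterable (expected 1)')
        try:
            next(it)
        except StopIteration:
            return value
        raise too_long('too many items in iterable (expected 1)')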
That’s a common issue in Python. When you can’t use None as a sentinel because it could be a valid user input or return value, you just create a private module or class attribute that can’t equal anything the user could pass in, like this:
_sentinel = object()
And then:
    def spam(stuff, default=_sentinel):
        if default is _sentinel:
            ...  # do single-argument stuff here
        else:
            ...  # do default-value stuff here
`None` is not a good default value for .first() (or .one()) because None may be the first item in the iterable. It should be necessary to explicitly specify default=None if that's what's expected.
This seems like the kind of thing that should be explained somewhere in every tutorial (including the official one), but most people end up finding it only by accident, reading some code that uses it and trying to figure out what it does and why. The same way people figure out how useful two-argument iter is, and a couple other things.
I'll second a recommendation to note the existence of two-argument iter() and two-argument next() in the docstring for itertools.first()
Also, “seq” implies that you’re expecting these to be used on sequences,
not general iterables. In that case, why not just use [0]?
I chose `seq` as the argument because I was looking at toolz.itertoolz.first(),which has no default= argument.
Ah. I’m not sure why toolz uses seq for arguments that are usually iterators, but I guess that’s not horrible or anything. In itertools and more-itertools, the argument is usually called iterable, and seq is reserved specifically for the ones that should be sequences, like chunked(iterable, n) vs. sliced(seq, n). But as useful as that convention is, I suppose it’s not a universal thing that everyone knows and follows; it’s not documented or explained anywhere, you just kind of have to guess the distinction from the names.
Though, .first() (or .one()) on an unordered iterable is effectively first(shuffle(iterable)), which *could* raise an annotation exception at compile time.
I’m not sure what you mean by an “annotation exception”. You mean an error from static type checking in mypy or something? I’m not sure why it would be an error, unless you got the annotation wrong. It should be Iterable, and that will work for iterators and sequences and sets and so on just fine.
Also, it's not really like shuffle, because even most "unordered iterables" in Python, like sets, actually have an order. It's not guaranteed to be a meaningful one, but it's not guaranteed to be meaningless either. If you need that (e.g., you're creating a guessing game where you don't want the answer to be the same every time anyone runs the game, or for security reasons), you really do need to explicitly randomize. For example, if s = set(range(10,0,-1)), it's not guaranteed anywhere that next(iter(s)) will be 1, but it is still always 1 in any version of CPython. Worse, whatever next(iter(s)) is, if you call next(iter(s)) again (without mutating s in between), you'll get the same value from the new iterator in any version of any Python implementation.
Taking first(unordered_sequence) *is* like shuffle. Sets merely seem to be ordered when the items are integers that hash to said integer: https://stackoverflow.com/a/45589593

Does .first() need to solve for this with type annotations and/or just a friendly docstring reminder?
But if you don’t care whether it’s meaningful or meaningless, first, one, etc. on a set are fine.
Sets are unordered iterables and so aren't sequences; arent OrderedIterables.
Right, but a sequence isn’t just an ordered iterable, it’s also random-access indexable (plus a few other things). An itertools.count(), a typical sorteddict type, a typical linked list, etc. are all ordered but not sequences.
In terms of math, itertools.count() is an infinite ordered sequence (for which there is no implementation of lookup by subscript). In terms of Python, the iterator returned by itertools.count() is an Iterable (hasattr('__iter__')) that does not implement __getitem__ (it doesn't implement the Mapping abstract type interface).

https://github.com/python/typeshed/blob/master/stdlib/3/itertools.pyi :

    _N = TypeVar('_N', int, float)
    def count(start: _N = ..., step: _N = ...) -> Iterator[_N]: ...  # more general types?

A collections.abc.Ordered type might make sense if Reversible does not imply Ordered. A hasattr('__iter_ordered__') might've made sense.

    hasattr('__getitem__')  !=>  Sequence
    Sequence  =>  hasattr('__getitem__')

The more-itertools functions that require sequences (and name them seq) usually require indexing or slicing.
That may be a good convention. But in terms of type annotations - https://docs.python.org/3/library/collections.abc.html -

- [x] Iterable (__iter__)
- [x] Collection (__contains__, __iter__, __len__)
- [x] Mapping / MutableMapping (Collection)
- [x] Sequence / MutableSequence (Reversible, Collection (Iterable))
- [x] Reversible
- [ ] Ordered

Does 'Reversible' => (imply) Ordered; which would then be redundant?

Math definition (setting aside a Number sequence-element type restriction):

    Sequence = AllOf(Iterable, Ordered)

More-itertools convention, AFAIU?:

    seq => AllOf(Iterable, Mapping, Ordered)
    seq => all(hasattr(seq, x) for x in ('__iter__', '__getitem__'))

How does this apply to .first()? If I call .first() on an unordered Iterable like a set, I may not get the first item; this can/may/will sometimes fail:

    assert first({'a', 'b'}) == 'a'

If there was an Ordered ABC (maybe unnecessarily in addition to Reversible), we could specify:

    # collections.abc
    class OrderedIterable(Iterable, Ordered):
        pass

    # itertools
    def first(iterable: OrderedIterable, default=_default): ...

And then type checking would fail at linting time. But then we'd want take(iterable: Iterable, count=1, default=_default) for use with unordered iterables like sets.

Implicit in a next() call is a hasattr(obj, '__iter__') check; but a user calling .first() may or may not be aware that there is no check that the passed Iterable is ordered. Type annotations could catch that mistake. "Dicts are now insertion-ordered (when there are no deletes), so everything is ordered and .first() is deterministic" is not correct, and documentation in .first() may be pedantic but not redundant.
On 9 Dec 2019, at 20:07, Wes Turner <wes.turner@gmail.com> wrote:
class TooShortValueError(ValueError): ...
class TooLongValueError(ValueError): ...

I had to think about the short and long for a moment. I'd suggest:

    class TooManyItems(ValueError): ...
    class TooFewItems(ValueError): ...

Barry
On Dec 9, 2019, at 12:08, Wes Turner <wes.turner@gmail.com> wrote:
On Sat, Dec 7, 2019, 11:30 PM Andrew Barnert <abarnert@yahoo.com> wrote:
On Dec 7, 2019, at 18:09, Wes Turner <wes.turner@gmail.com> wrote:
On Sat, Dec 7, 2019, 8:20 PM Andrew Barnert <abarnert@yahoo.com> wrote:
I would argue that there could be subclasses of ValueError for .one() that would also be appropriate for .first() (and/or .take(iterable, count=1, default=_default)
...
The names are less important than being able to distinguish the difference between the cases.
But again, the need to be able to distinguish is, while not nonexistent, pretty rare. And cases where you need to distinguish them but don’t care what the types are otherwise are even less common. So, is that common enough to be worth adding two more exception types to Python (or just to itertools) that aren’t used anywhere else? Just saying that they might be useful somewhere doesn’t answer that question.
That’s a common issue in Python. When you can’t use None as a sentinel because it could be a valid user input or return value, you just create a private module or class attribute that can’t equal anything the user could pass in, like this:
_sentinel = object()
And then:
    def spam(stuff, default=_sentinel):
        if default is _sentinel:
            ...  # do single-argument stuff here
        else:
            ...  # do default-value stuff here
`None` is not a good default value for .first() (or .one()) because None may be the first item in the iterable.
Yes. And, as I said, this is a common case in Python, with a standard idiom (which more-itertools uses) to deal with it.
This seems like the kind of thing that should be explained somewhere in every tutorial (including the official one), but most people end up finding it only by accident, reading some code that uses it and trying to figure out what it does and why. The same way people figure out how useful two-argument iter is, and a couple other things.
I'll second a recommendation to note the existence of two-argument iter() and two-argument next() in the docstring for itertools.first()
I don't think 2-arg iter belongs anywhere near first, just that it belongs somewhere in itertools tutorials, and maybe the module docs. As for 2-arg next, notice that the existing docs for more_itertools.first cover that by saying "It is marginally shorter than next(iter(iterable), default)". I think maybe a stdlib version of first should be a bit less dismissive of its own value, but noting this relationship is really all you need to teach people 2-arg next.
Though, .first() (or .one()) on an unordered iterable is effectively first(shuffle(iterable)), which *could* raise an annotation exception at compile time.
I’m not sure what you mean by an “annotation exception”. You mean an error from static type checking in mypy or something? I’m not sure why it would be an error, unless you got the annotation wrong. It should be Iterable, and that will work for iterators and sequences and sets and so on just fine.
Also, it's not really like shuffle, because even most "unordered iterables" in Python, like sets, actually have an order. It's not guaranteed to be a meaningful one, but it's not guaranteed to be meaningless either. If you need that (e.g., you're creating a guessing game where you don't want the answer to be the same every time anyone runs the game, or for security reasons), you really do need to explicitly randomize. For example, if s = set(range(10,0,-1)), it's not guaranteed anywhere that next(iter(s)) will be 1, but it is still always 1 in any version of CPython. Worse, whatever next(iter(s)) is, if you call next(iter(s)) again (without mutating s in between), you'll get the same value from the new iterator in any version of any Python implementation.
Taking first(unordered_sequence) *is* like shuffle. Sets merely seem to be ordered when the items are integers that hash to said integer:
It’s not a matter of when they seem to have some specific order. It’s that they always do have an order, even if it often isn’t a meaningful one. If you need to actually guarantee not having a meaningful order, you need to ask for that explicitly (whether with shuffle or something else).
Does .first() need to solve for this with type annotations and/or just a friendly docstring reminder?
Solve for what? People should know that sets have no guarantee about the meaningfulness of their order, but the right place to teach that is on sets, not on every function that works with iterables.
Right, but a sequence isn’t just an ordered iterable, it’s also random-access indexable (plus a few other things). An itertools.count(), a typical sorteddict type, a typical linked list, etc. are all ordered but not sequences.
In terms of math, itertools.count() is an infinite ordered sequence (for which there is not implementation of lookup by subscript)
Right, and? Sorted dicts and linked lists are ordered but not sequences despite being generally finite, and even Sized. That isn’t the key distinction that makes a Sequence; it’s being random-access indexable (which, e.g., a linked list usually isn’t, because it can’t do it in constant time).
In terms of Python, the generator returned by itertools.count() is an Iterable (hasattr('__iter__')) that does not implement __getitem__ ( doesn't implement the Mapping abstract type interface ).
Not quite. The Mapping interface is not for everything that’s indexable, it’s for everything that’s subscriptable by keys rather than indexes. Things that are subscriptable by indexes are Sequences, not Mappings. Almost nothing is both. (The fact that Python’s type system can’t distinguish those—e.g., you can use ints as keys and as indexes—is why neither of these can be an implicit structural ABC like Iterable, and instead they need to register types manually.)
https://github.com/python/typeshed/blob/master/stdlib/3/itertools.pyi :
_N = TypeVar('_N', int, float) def count(start: _N = ..., step: _N = ...) -> Iterator[_N]: ... # more general types?
A collections.abc.Ordered type might make sense if Reversible does not imply Ordered. A hasattr('__iter_ordered__') might've made sense.
But what would ordered mean here? Just that there is some ordering? That the ordering is consistent between iterations if nothing is mutated? That it’s consistent even after mutations except for the mutated bits? Something even more strict? If you don’t have any code that needs to switch on any of those distinctions, there’s no need for an ABC.
hasattr('__getitem__') !=> Sequence Sequence => hasattr('__getitem__')
Yes. Mappings also have __getitem__ and they’re not Sequences. And not-quite-Mapping types. And “old-style sequence protocol” types (which can be consistently indexed from 0 up to the smallest int that raises IndexError, but don’t necessarily have __len__, or even __iter__). And so on.
The more-itertools functions that require sequences (and name them seq) usually require indexing or slicing.
That may be a good convention. But in terms of type annotations - https://docs.python.org/3/library/collections.abc.html - [x] Iterable (__iter__) - [x] Collection (__getitem__, __iter__, __len__) - [x] Mapping / MutableMapping (Collection) - [x] Sequence / MutableSequence (Sequence, Reversible, Collection (Iterable)) - [x] Reversible - [ ] Ordered
Does 'Reversible' => (imply) Ordered; which would then be redundant?
Which more-itertools functions require testing for Reversible, or Ordered, but not Sequence? There might be some of the former, but I doubt there are any of the latter. Most take Iterable, the rest take Sequence or Iterator, and I don’t think anything is left out, or had to be crammed into either of those as a hacky workaround or anything. So what are you trying to fix here?
Math definition (setting aside a Number sequence-element type restriction):
Sequence = AllOf(Iterable, Ordered)
So your Ordered implies Sized and Container?
More_itertools convention, AFAIU?:
seq => AllOf(Iterable, Mapping, Ordered) seq => all(hasattr(x) for x in (' __iter__', '__getitem__'))
I think it's a lot simpler. seq => Sequence. There may be a bit of looseness in that some functions can take an old-style half-sequence or various other things, but no more than any other code annotated with Sequence in Python.
How does this apply to .first()?
If I call .first() on an unordered Iterable like a set, I may not get the first item; this can/may/will sometimes fail:
assert first({'a', 'b'}) == 'a'
But 'a' isn't the first element in the set just because it came first in the display. Consider this:

    assert first(sortedlist('zyx')) == 'z'

Clearly that should fail, because the first item in a sorted list of those letters is x, not z. The fact that you constructed it with z first isn't relevant; they're kept in sorted order, and x sorts before z. But surely you wouldn't say that a sorted list isn't ordered?

Meanwhile, notice that in either case, first(it) always returns the same thing that list(it)[0] would (except for a different exception if it is empty). That's guaranteed by the way iteration works. In that sense, all iterables are ordered. There are other senses in which that's not true, but without having a specific sense in mind that you're trying to distinguish, the word doesn't help anything.
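To make that invariant concrete, a tiny hedged check (assuming a first() along the lines sketched earlier in the thread, and a set that is not mutated in between):

    items = {'spam', 'eggs', 'ham'}
    assert first(items) == list(items)[0]   # holds whichever element that happens to be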
If there was an Ordered ABC (maybe unnecessarily in addition to Reversible), we could specify:
# collections.abc class OrderedIterable(Iterable, Ordered): pass
So your Ordered doesn't imply Iterable? What kinds of things are ordered but not Iterable? Again, what are you actually trying to solve with this distinction?
Implicit in a next() call is a hasattr(obj, '__iter__') check
No there isn’t. It’s almost always true, because the only things that normally have __next__ are iterators, and they always have __iter__ as well. But there’s no need to check for that. If you create a type that has __next__ but not __iter__ for some reason, you expect that it can’t be used in a for loop, but why shouldn’t it be usable in a next call? Why would we want to go out of our way to block that when nobody ever does it, and it would be a clear “consenting adults” case if anyone ever did?
; but a user calling .first() may or may not be aware that there is no check that the passed Iterable is ordered. Type annotations could catch that mistake.
Only with some meaningful (and universally meaningful) definition of “ordered”. And I don’t know what definition you have in mind, or even could have in mind, that would alleviate potential confusion.
On Sat, 7 Dec 2019 at 00:43, Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Dec 06, 2019 at 09:11:44AM -0400, Juancarlo Añez wrote:
[...]
Sure, but in this case, it isn't a fragment of a larger function, and that's not what it looks like. If it looked like what you wrote, I would understand it. But it doesn't, so I didn't really understand what it was supposed to do, until I read the equivalent version using first/next.
Exactly my point.
Indeed, and I agree with that. But I still don't see what advantage there is to having a `first` builtin which does so little. It's a really thin wrapper around `next` that:
calls iter() on its iterable argument supplies a default and then calls next() with two arguments
I guess my question is asking you to justify adding a builtin rather than just educating people how to use next effectively.
The real problem with next is the fact that it raises StopIteration with no default. That can be useful when you are *implementing* iterators but it is very much not what you want when you are just *using* iterators. That makes next something of a footgun because it's tempting to write something like

    first = next(iter(iterable))

but if there is no applicable default value that should really be

    try:
        first = next(iter(iterable))
    except StopIteration:
        raise ValueError

There is a PEP that attempted to solve this problem:

PEP 479 -- Change StopIteration handling inside generators
https://www.python.org/dev/peps/pep-0479/

However PEP 479 (wrongly IMO) attributed the problem to generators rather than iterators. Consequently the fix does nothing for users of itertools-type functions like map etc. The root of the problem it attempted to fix is the fact that bare next raises StopIteration and so is not directly suitable in situations where you just want to get the next/first element of an iterable. So you can have something like this:

    csvfiles = [
        ['header', '1', '2', '3'],
        [],                          # <----- file has no header
        ['header', '4', '5', '6'],   # This file is skipped
    ]

    def total_csvfile(lines):
        lines = iter(lines)
        header = next(lines)  # Skip header
        return sum(int(row) for row in lines)

    for total in map(total_csvfile, csvfiles):
        print(total)

This prints out the total of the first csv file. Then a StopIteration is emitted from attempting to skip the missing header of the second csvfile. That StopIteration leaks out from map.__iter__ and is "caught" by the enclosing for loop. If you change the end of the script to

    totals = map(total_csvfile, csvfiles)
    for total in totals:
        print(total)
    for total in totals:
        print(total)

then you will see totals for the files after the empty file, showing that it is the for loop that caught the StopIteration.

The reason this is particularly pernicious is that it leads to silent action-at-a-distance failure and can be hard to debug. This was considered enough of a problem for PEP 479 to attempt to solve in the case of generators (but not iterators in general).
This is how I would implement the function in Python:
    def first(iterable, default=None):
        return next(iter(iterable), default)
I agree that that doesn't need to be a builtin. However I would advocate for a function like this:

    def first(iterable, *default):
        iterator = iter(iterable)
        if default:
            (default,) = default
            return next(iterator, default)
        else:
            try:
                return next(iterator)
            except StopIteration:
                raise ValueError('Empty iterable')

This has the following behaviour:
    >>> first({1, 2, 3})
    1
    >>> first(x for x in [1, 2])
    1
    >>> first([])
    Traceback (most recent call last):
    ...
    ValueError: Empty iterable
    >>> first([], 2)
    2
You can use it safely with map e.g. to get the first element of a bunch of iterables:

    # raises ValueError if any of csvfiles is empty
    for header in map(first, csvfiles):
        print(header)

With next that would be:

    # silently aborts if any of csvfiles is empty
    for header in map(lambda e: next(iter(e)), csvfiles):
        print(header)
But there's a major difference in behaviour depending on your input, and one which is surely going to lead to bugs from people who didn't realise that iterator arguments and iterable arguments will behave differently:
    # non-iterator iterable
    py> obj = [1, 2, 3, 4]
    py> [first(obj) for __ in range(5)]
    [1, 1, 1, 1, 1]

    # iterator
    py> obj = iter([1, 2, 3, 4])
    py> [first(obj) for __ in range(5)]
    [1, 2, 3, 4, None]
We could document the difference in behaviour, but it will still bite people and surprise them.
This kind of confusion can come with iterators and iterables all the time. I can see that the name "first" is potentially confusing. Another possible name is "take", which might make more sense in the context of partially consumed iterators. Essentially the idea should just be that this is next for users rather than implementers of iterables.

-- Oscar
On Mon, Dec 9, 2019 at 12:48 AM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
On Sat, 7 Dec 2019 at 00:43, Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Dec 06, 2019 at 09:11:44AM -0400, Juancarlo Añez wrote:
[...]
Sure, but in this case, it isn't a fragment of a larger function, and that's not what it looks like. If it looked like what you wrote, I would understand it. But it doesn't, so I didn't really understand what it was supposed to do, until I read the equivalent version using first/next.
Exactly my point.
Indeed, and I agree with that. But I still don't see what advantage there is to having a `first` builtin which does so little. It's a really thin wrapper around `next` that:
calls iter() on its iterable argument supplies a default and then calls next() with two arguments
I guess my question is asking you to justify adding a builtin rather than just educating people how to use next effectively.
The real problem with next is the fact that it raises StopIteration with no default. That can be useful when you are *implementing* iterators but it is very much not what you want when you are just *using* iterators. That makes next something of a footgun because it's tempting to write something like
first = next(iter(iterable))
but if there is no applicable default value that should really be
    try:
        first = next(iter(iterable))
    except StopIteration:
        raise ValueError
If you're defining a first() function, then this would absolutely be correct. I don't think it's a fundamental problem with next(), since its job is to grab the next value from an iterator, or tell you to stop iterating. (BTW, when you're converting exceptions like this in production code, use "raise ValueError from None" to hide the StopIteration from the traceback.)
There is a PEP that attempted to solve this problem: PEP 479 -- Change StopIteration handling inside generators https://www.python.org/dev/peps/pep-0479/
However PEP 479 (wrongly IMO) attributed the problem to generators rather than iterators. Consequently the fix does nothing for users of itertools type functions like map etc. The root of the problem it attempted to fix is the fact that bare next raises StopIteration and so is not directly suitable in situations where you just want to get the next/first element of an iterable.
Hmm. Actually, I'd say that PEP 479 was correct, but that map() is wrong. If you define map() in the most obvious pure-Python way, it will be a generator:

    def map(func, *iter):
        while True:
            args = [next(it) for it in iter]
            yield func(*args)

(modulo some error handling)

Written thus, it would be guarded by PEP 479's conversion of StopIteration. I'd say that a more correct implementation of map would be something like:

    def map(func, *iter):
        while True:
            args = [next(it) for it in iter]
            try:
                yield func(*args)
            except StopIteration:
                raise RuntimeError("mapped function raised StopIteration")
The reason this is particularly pernicious is that it leads to silent action-at-a-distance failure and can be hard to debug. This was considered enough of a problem for PEP 479 to attempt to solve in the case of generators (but not iterators in general).
Agreed, but the problem isn't iterators or next. The problem is with functions that convert iterators into other iterators, while doing other work along the way; if the *other work* raises StopIteration, it causes problems.
This is how I would implement the function in Python:
    def first(iterable, default=None):
        return next(iter(iterable), default)
I agree that that doesn't need to be a builtin. However I would advocate for a function like this:
    def first(iterable, *default):
        iterator = iter(iterable)
        if default:
            (default,) = default
            return next(iterator, default)
        else:
            try:
                return next(iterator)
            except StopIteration:
                raise ValueError('Empty iterable')
Ahh the good ol' bikeshedding. The simpler form guarantees that next() is always given a default, ergo it shouldn't ever leak. If you'd prefer it to raise ValueError, then I reckon don't bother implementing the version that takes a default - just let next() do that job, and implement first() the easy way:

    def first(iterable):
        it = iter(iterable)
        try:
            return next(it)
        except StopIteration:
            raise ValueError("Empty iterable")
    # silently aborts if any of csvfiles is empty
    for header in map(lambda e: next(iter(e)), csvfiles):
        print(header)
(With files, there's no point calling iter, as it'll return the same thing. So you could write this as map(next, csvfiles).)
This kind of confusion can come with iterators and iterables all the time. I can see that the name "first" is potentially confusing. Another possible name is "take" which might make more sense in the context of partially consumed iterators. Essentially the idea should just be that this is next for users rather than implementers of iterables.
Maybe. If it's imported from itertools, though, there shouldn't be any confusion.

Since it's somewhat orthogonal to the discussion of first(), I'm going to spin off a separate thread to look at PEP479ifying some iterator conversion functions.

ChrisA
We're not changing next(). It's too fundamental to change even subtly.

We might add itertools.first(), but not builtins.first(). This kind of functionality is not fundamental, but it's easy to get slightly wrong (witness many hasty attempts in these threads).

itertools.first() should be implemented in C, but its semantics should be given by this (well, let me see if I can get it right):

    def first(it, /, default=None):
        it = iter(it)
        try:
            return next(it)
        except StopIteration:
            return default

Note the call to iter() -- this ensures it works if the argument is e.g. a collection. Because iter() on an iterator returns itself, the function also works if the argument is already an iterator (e.g. first(iter("abc"))).
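To spell out the point about iter(), a few illustrative calls under that definition (examples added here for clarity, not from the original message):

    first([10, 20, 30])          # 10  -- a plain collection works
    first(iter("abc"))           # 'a' -- an existing iterator works too
    first([], default="empty")   # 'empty'
    first([])                    # None, the default default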
On Mon, Dec 9, 2019 at 12:48 AM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
On Sat, 7 Dec 2019 at 00:43, Steven D'Aprano <steve@pearwood.info>
wrote:
On Fri, Dec 06, 2019 at 09:11:44AM -0400, Juancarlo Añez wrote:
[...]
Sure, but in this case, it isn't a fragment of a larger function,
and
that's not what it looks like. If it looked like what you wrote, I would understand it. But it doesn't, so I didn't really understand what it was supposed to do, until I read the equivalent version using first/next.
Exactly my point.
Indeed, and I agree with that. But I still don't see what advantage there is to having a `first` builtin which does so little. It's a really thin wrapper around `next` that:
calls iter() on its iterable argument supplies a default and then calls next() with two arguments
I guess my question is asking you to justify adding a builtin rather than just educating people how to use next effectively.
The real problem with next is the fact that it raises StopIteration with no default. That can be useful when you are *implementing* iterators but it is very much not what you want when you are just *using* iterators. That makes next something of a footgun because it's tempting to write something like
first = next(iter(iterable))
but if there is no applicable default value that should really be
try: first = next(iter(iterable)) except StopIteration: raise ValueError
If you're defining a first() function, then this would absolutely be correct. I don't think it's a fundamental problem with next(), since its job is to grab the next value from an iterator, or tell you to stop iterating. (BTW, when you're converting exceptions like this in production code, use "raise ValueError from None" to hide the StopIteration from the traceback.)
There is a PEP that attempted to solve this problem: PEP 479 -- Change StopIteration handling inside generators https://www.python.org/dev/peps/pep-0479/
However PEP 479 (wrongly IMO) attributed the problem to generators rather than iterators. Consequently the fix does nothing for users of itertools type functions like map etc. The root of the problem it attempted to fix is the fact that bare next raises StopIteration and so is not directly suitable in situations where you just want to get the next/first element of an iterable.
Hmm. Actually, I'd say that PEP 479 was correct, but that map() is wrong. If you define map() in the most obvious pure-Python way, it will be a generator:
def map(func, *iter): while True: args = [next(it) for it in iter] yield func(*args)
(modulo some error handling)
Written thus, it would be guarded by PEP 479's conversion of StopIteration. I'd say that a more correct implementation of map would be something like:
def map(func, *iter): while True: args = [next(it) for it in iter] try: yield func(*args) except StopIteration: raise RuntimeError("mapped function raised StopIteration")
The reason this is particularly pernicious is that it leads to silent action-at-a-distance failure and can be hard to debug. This was considered enough of a problem for PEP 479 to attempt to solve in the case of generators (but not iterators in general).
Agreed, but the problem isn't iterators or next. The problem is with functions that convert iterators into other iterators, while doing other work along the way; if the *other work* raises StopIteration, it causes problems.
This is how I would implement the function in Python:
def first(iterable, default=None): return next(iter(iterable), default)
I agree that that doesn't need to be a builtin. However I would advocate for a function like this:
def first(iterable, *default): iterator = iter(iterable) if default: (default,) = default return next(iterator, default) else: try: return next(iterator) except StopIteration: raise ValueError('Empty iterable')
Ahh the good ol' bikeshedding. The simpler form guarantees that next() is always given a default, ergo it shouldn't ever leak. If you'd prefer it to raise ValueError, then I reckon don't bother implementing the version that takes a default - just let next() do that job, and implement first() the easy way:
def first(iterable): it = iter(iterable) try: return next(it) except StopIteration: raise ValueError("Empty iterable")
# silently aborts if any of csvfiles is empty for header in map(lambda e: next(iter(e)), csvfiles): print(header)
(With files, there's no point calling iter, as it'll return the same thing. So you could write this as map(next, csvfiles).)
This kind of confusion can come with iterators and iterables all the time. I can see that the name "first" is potentially confusing. Another possible name is "take" which might make more sense in the context of partially consumed iterators. Essentially the idea should just be that this is next for users rather than implementers of iterables.
Maybe. If it's imported from itertools, though, there shouldn't be any confusion.
Since it's somewhat orthogonal to the discussion of first(), I'm going to spin off a separate thread to look at PEP479ifying some iterator conversion functions.
ChrisA
-- --Guido van Rossum (python.org/~guido)
On 8 Dec 2019, at 19:40, Guido van Rossum <guido@python.org> wrote:
We're not changing next(). It's too fundamental to change even subtly.
We might add itertools.first(), but not builtins.first(). This kind of functionality is not fundamental but it's easy to get slightly wrong (witness many hasty attempts in these threads).
itertools.first() should be implemented in C, but its semantics should be given by this (well, let me see if I can get it right):
def first(it, /, default=None):
    it = iter(it)
    try:
        return next(it)
    except StopIteration:
        return default
Why ban the use of keyword argument for the default value?
On Sun, Dec 8, 2019 at 11:03 AM Anders Hovmöller <boxed@killingar.net> wrote:
On 8 Dec 2019, at 19:40, Guido van Rossum <guido@python.org> wrote:
def first(it, /, default=None):
    it = iter(it)
    try:
        return next(it)
    except StopIteration:
        return default
Why ban the use of keyword argument for the default value?
You must misremember PEP 570. This use of `/` means that `it` must be passed positional (it's mandatory and it's the "main" argument), but that `default` can be specified positionally or via a keyword. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
On 8 Dec 2019, at 20:11, Guido van Rossum <guido@python.org> wrote:
On Sun, Dec 8, 2019 at 11:03 AM Anders Hovmöller <boxed@killingar.net> wrote:
On 8 Dec 2019, at 19:40, Guido van Rossum <guido@python.org> wrote:
def first(it, /, default=None):
    it = iter(it)
    try:
        return next(it)
    except StopIteration:
        return default
Why ban the use of keyword argument for the default value?
You must misremember PEP 570. This use of `/` means that `it` must be passed positional (it's mandatory and it's the "main" argument), but that `default` can be specified positionally or via a keyword.
Ah. Sorry. My bad.
[Guido]
We're not changing next(). It's too fundamental to change even subtly.
Note that `next()` already accepts two arguments (the second is an optional default in case its iterable argument is exhausted). Like:
>>> next(iter([]), 42)
42
We might add itertools.first(), but not builtins.first(). This kind of functionality is not fundamental but it's easy to get slightly wrong (witness many hasty attempts in these threads).
itertools.first() should be implemented in C, but its semantics should be given by this (well, let me see if I can get it right):
def first(it, /, default=None):
    it = iter(it)
    try:
        return next(it)
    except StopIteration:
        return default
Or, more succinctly, the body can be replaced by:

return next(iter(it), default)

I expect - but don't know - that people who think `first()` is a trivial use of `next()` have that in mind. I'd nevertheless like to see it in `itertools`. But then functional languages have a long tradition of supplying all sorts of things trivially spelled in terms of other things, and I believe that tradition is appropriate _in that context_ (when your view of the world builds on layering functions, you don't want to stop to write a common boilerplate function no matter how trivial it is, and neither do functional language readers want to stop to figure out how _you_ named a common boilerplate function). In other words, in that context, the bar for "building it in" consists far more of "will it be used?" than "is it difficult or tricky or long-winded?".
On Sun, Dec 8, 2019 at 2:09 PM Tim Peters <tim.peters@gmail.com> wrote:
[Guido]
We're not changing next(). It's too fundamental to change even subtly.
Note that `next()` already accepts two arguments (the second is an optional default in case its iterable argument is exhausted). Like:
>>> next(iter([]), 42)
42
Which I learned earlier in this thread but still haven't quite internalized. :-)
We might add itertools.first(), but not builtins.first(). This kind of functionality is not fundamental but it's easy to get slightly wrong (witness many hasty attempts in these threads).
itertools.first() should be implemented in C, but its semantics should be given by this (well, let me see if I can get it right):
def first(it, /, default=None):
    it = iter(it)
    try:
        return next(it)
    except StopIteration:
        return default
Or, more succinctly, the body can be replaced by:
return next(iter(it), default)
What I said. :-)
I expect - but don't know - that people who think `first()` is a trivial use of `next()` have that in mind. I'd nevertheless like to see it in `itertools`.
I actually think most people who think first() is a trivial use of next() don't care about what happens when the iterator is empty -- either because they *know* it won't happen (maybe they checked for len(a) == 1 before) or because they *assume* it won't happen (or perhaps because if it's empty their code is already broken or they consider its input invalid and are not yet in the stage of development where they care about input validation).
But then functional languages have a long tradition of supplying all sorts of things trivially spelled in terms of other things, and I believe that tradition is appropriate _in that context_ (when your view of the world builds on layering functions, you don't want to stop to write a common boilerplate function no matter how trivial it is, and neither do functional language readers want to stop to figure out how _you_ named a common boilerplate function).
I don't know how functional-minded people think.
In other words, in that context, the bar for "building it in" consists far more of "will it be used?" than "is it difficult or tricky or long-winded?".
But even if you know about 2-arg next(), the next(iter(it), default) version is not quite trivial to come up with -- you have to remember to put the iter() call in -- but IMO the main problem is that not enough people know about 2-arg next(), and that makes it pass the second bar. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
On Dec 8, 2019, at 20:59, Guido van Rossum <guido@python.org> wrote:
But even if you know about 2-arg next(), the next(iter(it), default) version is not quite trivial to come up with -- you have to remember to put the iter() call in -- but IMO the main problem is that not enough people know about 2-arg next(), and that makes it pass the second bar.
Are people who never find 2-arg next going to find itertools.first? I think most people who use itertools regularly enough to look there are people who already know 2-arg next (and also, mostly people who are “thinking functionally”, for that matter). Other people will discover it if someone points them there on -list or on StackOverflow or as student help or whatever, but they can already discover 2-arg next the same ways. In fact, what they’re probably going to find on StackOverflow is an answer all about 2-arg next, with a little edited-in footnote or comment saying “if you’re using the upcoming 3.9, you can use first instead of next and leave out the call to iter”.
On Sun, Dec 8, 2019 at 9:27 PM Andrew Barnert <abarnert@yahoo.com> wrote:
On Dec 8, 2019, at 20:59, Guido van Rossum <guido@python.org> wrote:
But even if you know about 2-arg next(), the next(iter(it), default)
version is not quite trivial to come up with -- you have to remember to put the iter() call in -- but IMO the main problem is that not enough people know about 2-arg next(), and that makes it pass the second bar.
Are people who never find 2-arg next going to find itertools.first?
Nobody is going to write a blog post about 2-arg next() (there just isn't enough for more than a sentence or two) but people write tutorials about itertools all the time, since it's such a rich module. So I think it's likely that first() will get some exposure that way.
I think most people who use itertools regularly enough to look there are people who already know 2-arg next (and also, mostly people who are “thinking functionally”, for that matter). Other people will discover it if someone points them there on -list or on StackOverflow or as student help or whatever, but they can already discover 2-arg next the same ways. In fact, what they’re probably going to find on StackOverflow is an answer all about 2-arg next, with a little edited-in footnote or comment saying “if you’re using the upcoming 3.9, you can use first instead of next and leave out the call to iter”.
If the only argument against adding a new feature is that existing documentation doesn't yet describe it, I'm not too concerned. :-) I do have to admit that I'm probably biased because I didn't recall 2-arg next() myself until it was mentioned in this thread. But I doubt that I'm alone -- I've seen plenty of code written by people (other than me :-) who clearly didn't know about it either. I can't show examples, since what I recall was in a large proprietary code base to which I no longer have access. I did find this, in test_itertools.py no less:
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    try:
        next(b)
    except StopIteration:
        pass
    return zip(a, b)
Methinks that could have been three lines shorter using next(b, None). -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
[Guido]
... I do have to admit that I'm probably biased because I didn't recall 2-arg next() myself until it was mentioned in this thread.
I knew about it once, but had forgotten all about it too until this thread :-) Which does indeed make the case stronger for adding itertools.first, even if - unlike me - you're not in favor of adopting more of a "will it be used?" standard for itertools (you don't have to know how functional language folks think to realize that's what such people want - just look, e.g., at the extensive lists of simple-to-implement functions in the Python more_itertools and toolz.itertools packages). Here's another: 2-argument `iter()`. I totally forget about that one at least twice every year ;-) BTW, another change I'd make is to break the tradition of coding every itertools function in C. That makes the implementation bar much higher, and the other similar packages (more_itertools, toolz.itertools) don't bother. There's also that pypy has trouble optimizing code using itertools heavily, _because_ it's written in C instead of Python. But I don't mean to hijack this thread. So just +1 from me for itertools.first (and even if it's implemented as a Python 1-liner).
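As a reminder, 2-argument iter(callable, sentinel) calls the callable repeatedly until it returns the sentinel. A common sketch (the filename is a placeholder and the print is a stand-in for real per-chunk processing):

with open('data.bin', 'rb') as f:
    # keep calling f.read(8192) until it returns b'', the sentinel
    for chunk in iter(lambda: f.read(8192), b''):
        print(len(chunk))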
On Sun, Dec 8, 2019 at 10:02 PM Tim Peters <tim.peters@gmail.com> wrote:
BTW, another change I'd make is to break the tradition of coding every itertools function in C. That makes the implementation bar much higher, and the other similar packages (more_itertools, toolz.itertools) don't bother. There's also that pypy has trouble optimizing code using itertools heavily, _because_ it's written in C instead of Python.
This deserves a separate thread. If we're going to gradually add more recipes to itertools, that sounds like a good idea. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
On Dec 8, 2019, at 22:01, Tim Peters <tim.peters@gmail.com> wrote:
BTW, another change I'd make is to break the tradition of coding every itertools function in C. That makes the implementation bar much higher, and the other similar packages (more_itertools, toolz.itertools) don't bother. There's also that pypy has trouble optimizing code using itertools heavily, _because_ it's written in C instead of Python.
Didn’t PyPy already make the fix years ago of rewriting all of itertools (for both 2.7 and 3.3 or whenever) as “Python builtins” in the underlying namespace? Also, even if I’m remembering wrong, just writing a Python module in front of the C module, with most of the functions still being C-only, wouldn’t help PyPy. You’d still need to port every function to Python (and be aware that the “equivalent code” in the help is usually only a rough equivalent with subtle differences, so you’d have to spot, fix, and write unit tests for all of those), with the C only an optional accelerator, a la PEP 399 (the requirements for C accelerators in newly-added modules). Which is far from impossible, it’s just more work than it seems like anyone’s ever been willing to do each time it comes up (and you’re right, it comes up every time a new potentially useful itertools tool is proposed…). If someone cares about first enough to finally do that, I’m +1 on the proposal instead of 0.
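For concreteness, a rough sketch of the PEP 399-style layout being discussed (the module split shown here is hypothetical; itertools is not currently organized this way):

# Lib/itertools.py -- hypothetical pure-Python layer
def first(iterable, default=None):
    return next(iter(iterable), default)

# ... other pure-Python definitions ...

try:
    # an optional C accelerator, if present, overrides the pure-Python versions
    from _itertools import *
except ImportError:
    pass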
PyPy apart, we wouldn’t have to rewrite everything. It would just be simpler to add new functions written in Python.
On Mon, Dec 9, 2019 at 09:29 Andrew Barnert <abarnert@yahoo.com> wrote:
On Dec 8, 2019, at 22:01, Tim Peters <tim.peters@gmail.com> wrote:
BTW, another change I'd make is to break the tradition of coding every itertools function in C. That makes the implementation bar much higher, and the other similar packages (more_itertools, toolz.itertools) don't bother. There's also that pypy has trouble optimizing code using itertools heavily, _because_ it's written in C instead of Python.
Didn’t PyPy already make the fix years ago of rewriting all of itertools (for both 2.7 and 3.3 or whenever) as “Python builtins” in the underlying namespace?
Also, even if I’m remembering wrong, just writing a Python module in front of the C module, with most of the functions still being C-only, wouldn’t help PyPy. You’d still need to port every function to Python (and be aware that the “equivalent code” in the help is usually only a rough equivalent with subtle differences, so you’d have to spot, fix, and write unit tests for all of those), with the C only an optional accelerator, a la PEP 399 (the requirements for C accelerators in newly-added modules).
Which is far from impossible, it’s just more work than it seems like anyone’s ever been willing to do each time it comes up (and you’re right, it comes up every time a new potentially useful itertools tool is proposed…). If someone cares about first enough to finally do that, I’m +1 on the proposal instead of 0.
-- --Guido (mobile)
[Andrew Barnert <abarnert@yahoo.com>]
Didn’t PyPy already make the fix years ago of rewriting all of itertools (for both 2.7 and 3.3 or whenever) as “Python builtins” in the underlying namespace?
I don't know.
Also, even if I’m remembering wrong, just writing a Python module in front of the C module, with most of the functions still being C-only, wouldn’t help PyPy.
I wasn't suggesting that. I was suggesting that we drop the tradition of writing _every_ itertools function in C and _only_ C. That would lower the bar for adding new functions. Many of the many functions in the more_itertools and toolz.itertools packages are implemented by brief pure Python functions, often just 1-liners. Works fine for them.
You’d still need to port every function to Python
Why? My comments about pypy were a footnote to the main point: that functional language people don't hesitate to "build in" any number of functions easily implemented in terms of other ones. This started already with LISP, which very quickly, e.g., added (CADR x) for (CAR (CDR x)), (CADDR x) for (CAR (CDR (CDR x))) and so on - then went on to also add additional spellings (FIRST, SECOND, NTH, etc). The point in that context is to have _common_ spelling and endcase behavior for things - no matter how simple - that are reinvented every day otherwise.
(and be aware that the “equivalent code” in the help is usually only a rough equivalent with subtle differences, so you’d have to spot, fix, and write unit tests for all of those), with the C only an optional accelerator, a la PEP 399 (the requirements for C accelerators in newly-added modules).
I'm not at all suggesting to rewrite itertools. I am suggesting that, for most of itertools's natural audience most of the time, an implementation in Python _only_ is "good enough", and that it would be best if we recognized that for _new_ itertools functions.
Which is far from impossible, it’s just more work than it seems like anyone’s ever been willing to do each time it comes up (and you’re right, it comes up every time a new potentially useful itertools tool is proposed…). If someone cares about first enough to finally do that, I’m +1 on the proposal instead of 0.
Different itch. I'm a "practicality beats purity" guy ;-)
On Mon, Dec 9, 2019 at 10:03 AM Tim Peters <tim.peters@gmail.com> wrote:
[Andrew Barnert <abarnert@yahoo.com>]
Didn’t PyPy already make the fix years ago of rewriting all of itertools (for both 2.7 and 3.3 or whenever) as “Python builtins” in the underlying namespace?
I don't know.
Also, even if I’m remembering wrong, just writing a Python module in front of the C module, with most of the functions still being C-only, wouldn’t help PyPy.
I wasn't suggesting that. I was suggesting that we drop the tradition of writing _every_ itertools function in C and _only_ C. That would lower the bar for adding new functions. Many of the many functions in the more_itertools and toolz.itertools packages are implemented by brief pure Python functions, often just 1-liners. Works fine for them.
Plus you have to start somewhere. ;) Any new interpreter in the future plus existing ones would benefit if we said that the C versions were just accelerators instead of the sole implementation.
You’d still need to port every function to Python
Why? My comments about pypy were a footnote to the main point: that functional language people don't hesitate to "build in" any number of functions easily implemented in terms of other ones. This started already with LISP, which very quickly, e.g., added (CADR x) for (CAR (CDR x)), (CADDR x) for (CAR (CDR (CDR x))) and so on - then went on to also add additional spellings (FIRST, SECOND, NTH, etc). The point in that context is to have _common_ spelling and endcase behavior for things - no matter how simple - that are reinvented every day otherwise.
I've been learning Clojure and being reminded of this again: the uniformity of being able to use these common functions across sequences is where the real power lies. From there you end up with abstractions that everyone can build upon, caring only about duck typing against something that's an iterable.
(and be aware that the “equivalent code” in the help is usually only a rough equivalent with subtle differences, so you’d have to spot, fix, and write unit tests for all of those), with the C only an optional accelerator, a la PEP 399 (the requirements for C accelerators in newly-added modules).
I'm not at all suggesting to rewrite itertools. I am suggesting that, for most of itertools's natural audience most of the time, an implementation in Python _only_ is "good enough", and that it would be best if we recognized that for _new_ itertools functions.
+1, but I wrote https://www.python.org/dev/peps/pep-0399/ so this shouldn't shock anyone. :) -Brett
Which is far from impossible, it’s just more work than it seems like anyone’s ever been willing to do each time it comes up (and you’re right, it comes up every time a new potentially useful itertools tool is proposed…). If someone cares about first enough to finally do that, I’m +1 on the proposal instead of 0.
Different itch. I'm a "practicality beats purity" guy ;-)
On 9 Dec 2019, at 18:31, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
On Dec 8, 2019, at 22:01, Tim Peters <tim.peters@gmail.com> wrote:
BTW, another change I'd make is to break the tradition of coding every itertools function in C. That makes the implementation bar much higher, and the other similar packages (more_itertools, toolz.itertools) don't bother. There's also that pypy has trouble optimizing code using itertools heavily, _because_ it's written in C instead of Python.
Didn’t PyPy already make the fix years ago of rewriting all of itertools (for both 2.7 and 3.3 or whenever) as “Python builtins” in the underlying namespace?
Also, even if I’m remembering wrong, just writing a Python module in front of the C module, with most of the functions still being C-only, wouldn’t help PyPy. You’d still need to port every function to Python (and be aware that the “equivalent code” in the help is usually only a rough equivalent with subtle differences, so you’d have to spot, fix, and write unit tests for all of those), with the C only an optional accelerator, a la PEP 399 (the requirements for C accelerators in newly-added modules).
Which is far from impossible, it’s just more work than it seems like anyone’s ever been willing to do each time it comes up (and you’re right, it comes up every time a new potentially useful itertools tool is proposed…). If someone cares about first enough to finally do that, I’m +1 on the proposal instead of 0.
This is a perfect case for mutation testing! Just pip install mutmut; mutmut run. I'm willing to do the test suite if there's a python implementation.
09.12.19 07:41, Guido van Rossum wrote:
Nobody is going to write a blog post about 2-arg next() (there just isn't enough for more than a sentence or two) but people write tutorials about itertools all the time, since it's such a rich module. So I think it's likely that first() will get some exposure that way.
Would not adding a recipe in the itertools documentation or the tutorial help?
On Mon, Dec 9, 2019 at 3:29 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
09.12.19 07:41, Guido van Rossum wrote:
Nobody is going to write a blog post about 2-arg next() (there just isn't enough for more than a sentence or two) but people write tutorials about itertools all the time, since it's such a rich module. So I think it's likely that first() will get some exposure that way.
Would not adding a recipe in the itertools documentation or the tutorial help?
Not nearly as much as just adding the function. We'll be doing humanity a favor if we just add it. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
Serhiy Storchaka wrote:
Would not adding a recipe in the itertools documentation or the tutorial help?
I think adding a recipe in the itertools documentation might help, but I don't know that it would be a great fit for the tutorial. It seems a bit too specific and could be a distraction from the main purpose of the tutorial, which is to get a basic understanding of the fundamentals of Python.
On Mon, Dec 9, 2019 at 6:29 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
09.12.19 07:41, Guido van Rossum wrote:
Nobody is going to write a blog post about 2-arg next() (there just isn't enough for more than a sentence or two) but people write tutorials about itertools all the time, since it's such a rich module. So I think it's likely that first() will get some exposure that way.
Would not adding a recipe in the itertools documentation or the tutorial help?
On Dec 8, 2019, at 21:41, Guido van Rossum <guido@python.org> wrote:
On Sun, Dec 8, 2019 at 9:27 PM Andrew Barnert <abarnert@yahoo.com> wrote: On Dec 8, 2019, at 20:59, Guido van Rossum <guido@python.org> wrote:
But even if you know about 2-arg next(), the next(iter(it), default) version is not quite trivial to come up with -- you have to remember to put the iter() call in -- but IMO the main problem is that not enough people know about 2-arg next(), and that makes it pass the second bar.
Are people who never find 2-arg next going to find itertools.first?
Nobody is going to write a blog post about 2-arg next() (there just isn't enough for more than a sentence or two)
You’d be surprised. I wrote the (at least at the time) StackOverflow answer to the canonical duplicate for all related questions. It was _originally_ two sentences showing how and why to use 2-arg next (plus an explanation for why the OP’s attempt to write peek wasn’t necessary and why it didn’t work but check out more_itertools.peekable), but not by the time the bike shedding was over (it has to mention all the performance details, including comparing to for: break, and mention “functional style” but also explain that iterators only pretend to be monads when they’re actually exposed state, etc., or someone isn’t happy…). But, more importantly:
but people write tutorials about itertools all the time, since it's such a rich module. So I think it's likely that first() will get some exposure that way.
But many such tutorials are already using 2-arg next (although some don’t explain it, because they expect you to have read their previous tutorial on Iterator basics and they explained it there). Sure, they’ll mostly grow an extra paragraph on how first is handy for those cases where you have an iterable that might be an iterator but also might be a list or a set or anything else, which will make it a little more discoverable or memorable. I just don’t know that it’s going to have as much benefit as you’re hoping.
I think most people who use itertools regularly enough to look there are people who already know 2-arg next (and also, mostly people who are “thinking functionally”, for that matter). Other people will discover it if someone points them there on -list or on StackOverflow or as student help or whatever, but they can already discover 2-arg next the same ways. In fact, what they’re probably going to find on StackOverflow is an answer all about 2-arg next, with a little edited-in footnote or comment saying “if you’re using the upcoming 3.9, you can use first instead of next and leave out the call to iter”.
If the only argument against adding a new feature is that existing documentation doesn't yet describe it, I'm not too concerned. :-)
It’s not that the existing documentation doesn’t yet describe the new way, it’s that the existing documentation already describes the existing way and apparently nobody’s finding it. Which casts doubt on how many people will find the new way.
I do have to admit that I'm probably biased because I didn't recall 2-arg next() myself until it was mentioned in this thread.
That’s because you learned Python before 2.6, when there was no 2-arg next (because next was a method). That being said, a lot of old 2.x code—and tutorials even—that’s been ported didn’t take advantage of the new feature. In fact:
But I doubt that I'm alone -- I've seen plenty of code written by people (other than me :-) who clearly didn't know about it either. I can't show examples, since what I recall was in a large proprietary code base to which I no longer have access. I did find this, in test_itertools.py no less:
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    try:
        next(b)
    except StopIteration:
        pass
    return zip(a, b)
Methinks that could have been three lines shorter using next(b, None).
The pairwise recipe in the docs does use next(b, None), but obviously someone missed the copy in the unit tests. And searching for tutorials on things like CSV files, I found some that were clearly updated for 3.x but still explicitly checked for StopIteration anyway. If adding first causes some people to re-evaluate old code like that in tutorials and improve a few of them, maybe that’s enough to be worth it on its own. Also, the new docs on first itself will presumably mention 2-arg next, which is one more place for people to learn about that.
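For reference, the shortened form both messages are pointing at (essentially the recipe in the itertools docs that Andrew mentions):

from itertools import tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)   # advance b one step; the None default absorbs the empty case
    return zip(a, b)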
One more and then I'll let this go.
On Mon, Dec 9, 2019 at 10:49 AM Andrew Barnert <abarnert@yahoo.com> wrote:
On Dec 8, 2019, at 21:41, Guido van Rossum <guido@python.org> wrote:
I do have to admit that I'm probably biased because I didn't recall 2-arg next() myself until it was mentioned in this thread.
That’s because you learned Python before 2.6, when there was no 2-arg next (because next was a method).
I know that for me something different was at play, and I suspect it's the same for many others. 1-arg next() is essential in straddling code because .next() was renamed .__next__() in Python 3, so everybody doing any migration work at all quickly learns about it by example. But 2-arg next() is *not* essential and one is much less likely to learn about it from reading other code (except for itertools lovers). Another thing is that 1-arg next() raises StopIteration, and almost every next() caller has to handle that. So again many people see examples of how to do this. (It's telling that we have PEP 479 to "tame" uncaught StopIteration exceptions.) But cases where 2-arg next() can be used instead of try/except are rare. In fact, I found that one case in test_itertools.py by grepping the stdlib for 'except StopIteration'. Almost no code I found that way was amenable to using 2-arg next() -- that one test in test_itertools.py was literally the first example I found that was, after inspecting dozens of occurrences. So while 1-arg next() and the try/except StopIteration pattern are essential and well-known, 2-arg next() is relatively esoteric and consequently (I think) not nearly as well-known. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
So while 1-arg next() and the try/except StopIteration pattern are essential and well-known, 2-arg next() is relatively esoteric and consequently (I think) not nearly as well-known.
And knowing that you must use *iter()* for 2-arg *next()* to (maybe) work right is idiosyncratic. It takes a "Python historian" to understand why it *may be correct* to use:

the_first_item_if_ordered = next(iter(container), 'not found')

While the semantics of *first()* (whichever the chosen implementation) are straightforward to explain:

one_item_if_any = first(return_a_set(), default=-1)

or:

the_first_item = first(sorted(container))

I agree with others in that the "*default*" argument should be explicit instead of implied. It's how *dict.get()*, and *dict.pop()*, etc., work. The exceptions raised when nothing can be returned from *first()* and there is no *default=* should be the same. -- Juancarlo *Añez*
Juancarlo Añez writes:
the_first_item_if_ordered = next(iter(container), 'not found')
Ouch!
one_item_if_any = first(return_a_set(), default=-1)
Is "first" really the right color for this bikeshed? Maybe it's OK, but speaking precisely you can't ask for "first" of a set. The question is "will 'any one' do?" There may be a natural (pre)order on the objects of a set, so that first_item_if_any = first(sorted(return_a_set(), default=-1)) is the desired result. I'm not sure if this is occurs more than very rarely in practice (but I know variations in iteration order for sets across invocations has bitten me in testing). It would be nasty to debug. Steve
On Mon, Dec 9, 2019 at 6:19 PM Juancarlo Añez <apalala@gmail.com> wrote:
I agree with others in that the "*default*" argument should be explicit instead of implied. It's how *dict.get()*, and *dict.pop()*, etc., work. The exceptions raised when nothing can be returned from *first()* and there is no *default=* should be the same.
Python historian here. dict.get() does *not* require you to specify a default -- in fact its *sole* purpose in life is to not raise an exception. And it has the sensible default default of None. There is already a spelling that does raise: dict[key]. Similarly, there is already a spelling of first() (or close enough) that raises: 1-arg next(). If 1-arg first() would also raise, it would fail the rule "similar things should be spelled similarly, and different things should be spelled differently". I am not making this rule up -- it's an old design principle that Lambert Meertens used for ABC, Python's predecessor. It just didn't make it in the Zen of Python, unless you want to interpret "there should be one [...] way to do it" as its spiritual descendant. IMO "one raises StopIteration and one raises ValueError" is not enough to warrant two different ones, nor is "one calls iter() and the other doesn't." But "one raises and the other doesn't" is a significant enough difference (as the example of dict.get() shows). FWIW I think "first()" is a fine name to get one element from a set -- it's not like the iteration order is a secret, and it's not called "lowest". The other schemes to get one item out of a set in O(1) time will return the same element, since the only sensible way to do it is to iterate and stop after one iteration. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
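To spell out the parallel in code (first() here stands for the default-returning semantics Guido proposed earlier in the thread, not an existing function):

d = {'a': 1}
d['missing']            # raises KeyError -- the raising spelling
d.get('missing')        # returns None -- never raises; sensible default default of None
d.get('missing', 0)     # returns the explicit default

next(iter([]))          # raises StopIteration -- the existing raising spelling
first([])               # would return None -- the proposed non-raising spelling
first([], default=0)    # would return the explicit default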
[Guido]
... Similarly, there is already a spelling of first() (or close enough) that raises: 1-arg next(). If 1-arg first() would also raise, it would fail the rule "similar things should be spelled similarly, and different things should be spelled differently".
The "... unless you're Dutch" part of the Zen may be at work here ;-) Under your semantics for first(), first(iterator) next(iterator) look darned near identical, but act very differently if the iterator is exhausted: the second raises StopIteration, while the former returns None. The alternative proposal is that they both raise an exception if the iterator is exhausted: then they look much the same _and_ act much the same. Likewise under the alternative if a default is explicitly given to both. It's true that dict.get(key) returns None too, but (a) that's a method rather than a function; and, (b) `first` and `next` are very closely related to each other, but not to any dict methods; and, (c) next(iterable) and dict.get(key) already look kinda similar but already did quite different things in their respective end cases ("iterator already exhausted" and "key not present").
I am not making this rule up -- it's an old design principle that Lambert Meertens used for ABC, Python's predecessor. It just didn't make it in the Zen of Python, unless you want to interpret "there should be one [...] way to do it" as its spiritual descendant.
Or unless you want to add it now as the mysterious 20th aphorism that's been waiting to be revealed :-)
IMO "one raises StopIteration and one raises ValueError" is not enough to warrant two different ones,
Two different _whats_?
nor is "one calls iter() and the other doesn't." But "one raises and the other doesn't" is a significant enough difference (as the example of dict.get() shows).
Well, ya: that's a very significant difference, so doesn't the Meertens Principle suggest that "one raises but the other doesn't" should _not_ look identical?
FWIW I think "first()" is a fine name to get one element from a set -- it's not likez . the iteration order is a secret, and it's not called "lowest". The other schemes to get one item out of a set in O(1) time will return the same element, since the only sensible way to do it is to iterate and stop after one iteration.
Wholly agreed there. `first()` may or may not be self-evident at first glance, but the meaning becomes obvious and impossible to forget after the first minor effort to grasp it.
The argument that first(it) and next(it) "look the same" doesn't convince me; if these look the same then all function applications look the same, and that can certainly not have been Meertens' intention. But if everyone thinks that first() should raise, fine, this thread is way too long already (I should have kept it muted :-).
On Mon, Dec 9, 2019 at 8:56 PM Tim Peters <tim.peters@gmail.com> wrote:
[Guido]
... Similarly, there is already a spelling of first() (or close enough) that raises: 1-arg next(). If 1-arg first() would also raise, it would fail the rule "similar things should be spelled similarly, and different things should be spelled differently".
The "... unless you're Dutch" part of the Zen may be at work here ;-) Under your semantics for first(),
first(iterator)
next(iterator)
look darned near identical, but act very differently if the iterator is exhausted: the second raises StopIteration, while the former returns None. The alternative proposal is that they both raise an exception if the iterator is exhausted: then they look much the same _and_ act much the same.
Likewise under the alternative if a default is explicitly given to both.
It's true that dict.get(key) returns None too, but (a) that's a method rather than a function; and, (b) `first` and `next` are very closely related to each other, but not to any dict methods; and, (c) next(iterable) and dict.get(key) already look kinda similar but already did quite different things in their respective end cases ("iterator already exhausted" and "key not present").
I am not making this rule up -- it's an old design principle that Lambert Meertens used for ABC, Python's predecessor. It just didn't make it in the Zen of Python, unless you want to interpret "there should be one [...] way to do it" as its spiritual descendant.
Or unless you want to add it now as the mysterious 20th aphorism that's been waiting to be revealed :-)
IMO "one raises StopIteration and one raises ValueError" is not enough to warrant two different ones,
Two different _whats_?
nor is "one calls iter() and the other doesn't." But "one raises and the other doesn't" is a significant enough difference (as the example of dict.get() shows).
Well, ya: that's a very significant difference, so doesn't the Meertens Principle suggest that "one raises but the other doesn't" should _not_ look identical?
FWIW I think "first()" is a fine name to get one element from a set --
it's not like the iteration order is a secret, and it's not called "lowest". The other schemes to get one item out of a set in O(1) time will return the same element, since the only sensible way to do it is to iterate and stop after one iteration.
Wholly agreed there. `first()` may or may not be self-evident at first glance, but the meaning becomes obvious and impossible to forget after the first minor effort to grasp it.
-- --Guido van Rossum (python.org/~guido)
[Guido]
The argument that first(it) and next(it) "look the same" doesn't convince me;
I'm sorry - I guess then I have absolutely no idea what you were trying to say, and read it completely wrong.
if these look the same then all function applications look the same, and that can certainly not have been Meertens' intention.
No argument on that from me ;-)
But if everyone thinks that first() should raise, fine, this thread is way too long already (I should have kept it muted :-).
It was being discussed. The 1-argument more-itertools `first()` does raise on an exhausted iterator, and that does make most sense to me. In my algorithms I usually "know" I'm not trying to take an element from an empty iterable, and have no use for a default value in such a case. Since there's no non-destructive way to assert that the iterable is not exhausted, raising an exception if it is exhausted is most useful.

_empty = object()
a = first(iterable, _empty)
if a is _empty:
    raise ...

is a PITA by comparison, as is my _current_ idiom:

for a in iterable:
    break
else:
    raise ...

Plain old

a = first(iterable)

would be perfect - but only if it raised.
def next(iterable[, default]):
    """
    Returns:
        Return the next item from the iterator.
    Raises:
        StopIteration: when iterator is empty and default is not specified
    """

def first(iterable, default=None):
    """
    Returns:
        default (which defaults to None) if the iterator is empty,
        or the first item of the iterable
    Raises:
        StopIteration: when iterator is empty and default is not specified
    """

If this is the distinction, I'm now -1 on adding .first() at all due to the likelihood that I, being an idiot, will do:

iterables = [[None, 2, 3], []]
for it in iterables:
    x = first(it)
    if x is not None:
        print(x)

And that's not a risk I'm willing to take. IMHO, a StopIteration / ValueError / TooFewItems(ValueError) exception is preferable in most cases. more_itertools.more.first questions the reasoning behind catching StopIteration and raising ValueError instead.

_default = object()

def first(iterable, default=_default):
    """
    Returns:
        the first item of iterable, or default if iterable raises
        StopIteration and default is specified.
    Raises:
        StopIteration: when iterator is empty and default is not specified
        ValueError: when iterator is empty and default is not specified
        TooFewItems: when iterator is empty and default is not specified

    .. note:: :func:`next` accepts a default value as a second argument.
       :func:`next` raises StopIteration.
    """

For a .one(iterable, default=_default) function, it makes sense to have TooFewItems and TooManyItems Exceptions that subclass ValueError, because the alternative (calling next again and determining whether it's the first or the second call to next() that's raising StopIteration) is verbose. Does this justify new builtin Exceptions? If so, .first() should raise TooFewItems(ValueError) as well. https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.one

If we're looking to add list.get(n, default=None), that would seem to be a different argument. ...

s = set('abc')
assert s == {'a', 'b', 'c'}

# This fails roughly 2/3 of the time (whenever 'a' doesn't happen to come first)
assert next(iter(s)) == 'a'

# This fails roughly 2/3 of the time
assert [x for x in s][0] == 'a'

# This fails roughly 2/3 of the time
assert list(s)[0] == tuple(s)[0] == 'a'

# This'll also fail roughly 2/3 of the time;
# this is not obvious to most non-core developers.
assert first(s) == 'a'

Sets are not collections.abc.Sequence because they do not implement __getitem__. Are there other unordered Iterables in the standard library that the .first() docstring could mention after mentioning 2-arg next()?
On Tue, Dec 10, 2019 at 12:44 AM Tim Peters <tim.peters@gmail.com> wrote:
[Guido]
The argument that first(it) and next(it) "look the same" doesn't convince me;
I'm sorry - I guess then I have absolutely no idea what you were trying to say, and read it completely wrong.
if these look the same then all function applications look the same, and that can certainly not have been Meertens' intention.
No argument on that from me ;-)
But if everyone thinks that first() should raise, fine, this thread is way too long already (I should have kept it muted :-).
It was being discussed. The 1-argument more-itertools `first()` does raise on an exhausted iterator, and that does make most sense to me. In my algorithms I usually "know" I'm not trying to take an element from an empty iterable, and have no use for a default value in such a case. Since there's no non-destructive way to assert that the iterable is not exhausted, raising an exception if it is exhausted is most useful.
_empty = object()
a = first(iterable, _empty)
if a is _empty:
    raise ...
is a PITA by comparison, as is my _current_ idiom:
for a in iterable:
    break
else:
    raise ...
Is this pattern in the tutorial?
Plain old
a = first(iterable)
would be perfect - but only if it raised.
+1. How is this distinct from:

first = next
On Dec 9, 2019, at 22:07, Wes Turner <wes.turner@gmail.com> wrote:
Sets are not collections.abc.Sequence because they do not implement __getitem__. Are there other unordered Iterables in the standard library
Again, it depends on what you’re trying to distinguish by that word “unordered”. Mappings (including dict) and their views, Sets (including frozenset), Iterators (including file objects, map, filter, generators, and most itertools functions), and many other things are Iterables without being Sequences. I have no idea which if any of these you mean by “unordered”. For every single one of them, just like for set, first(it) will give you the same value as list(it)[0] (albeit a different exception if they’re empty), so I don’t see why any of them are confusing. What else would you expect first to do? And, without knowing why you think first should mention any of them, I have no idea which ones it should mention.
On Tue, Dec 10, 2019 at 12:44 AM Tim Peters <tim.peters@gmail.com> wrote:
... as is my _current_ idiom:
for a in iterable:
    break
else:
    raise ...
Is this pattern in the tutorial?
No, but you will find it on StackOverflow. I don’t think it needs to be in the tutorial. If needing to raise a specific exception—or just not StopIteration—is common enough that novices need to learn it, I think it’s common enough that it should be given a name and put in the itertools recipes or module, or at least pointed out in more-itertools, rather than teaching people to spell it out every time.
Plain old
a = first(iterable)
would be perfect - but only if it raised.
+1. How is this distinct from:
first = next
Because first works with any Iterable; next works only with iterators.
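A quick illustration of that distinction, assuming first() is the next(iter(...), default) wrapper discussed throughout the thread:

next([10, 20, 30])        # TypeError: 'list' object is not an iterator
next(iter([10, 20, 30]))  # 10 -- next() needs the explicit iter() call
first([10, 20, 30])       # 10 -- first() accepts any iterable directly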
On Tue, Dec 10, 2019 at 1:37 AM Andrew Barnert <abarnert@yahoo.com> wrote:
On Dec 9, 2019, at 22:07, Wes Turner <wes.turner@gmail.com> wrote:
Sets are not collections.abc.Sequence because they do not implement __getitem__. Are there other unordered Iterables in the standard library
Again, it depends on what you’re trying to distinguish by that word “unordered”. Mappings (including dict) and their views, Sets (including frozenset), Iterators (including file objects, map, filter, generators, and most itertools functions), and many other things are Iterables without being Sequences. I have no idea which if any of these you mean by “unordered”. For every single one of them, just like for set, first(it) will give you the same value as list(it)[0] (albeit a different exception if they’re empty), so I don’t see why any of them are confusing. What else would you expect first to do? And, without knowing why you think first should mention any of them, I have no idea which ones it should mention.
(naively) 'first' would seem to imply a stable ordering that doesn't change with PYTHONHASHSEED='random' / -R (the default since 3.3). The .first() docstring could mention that sets, in particular, may have a different ordering for different interpreter invocations (unless the elements are integers less than sys.hash_info.modulus, which hash to themselves).
On Tue, Dec 10, 2019 at 12:44 AM Tim Peters <tim.peters@gmail.com> wrote:
... as is my _current_ idiom:
for a in iterable:
    break
else:
    raise ...
Is this pattern in the tutorial?
No, but you will find it on StackOverflow. I don’t think it needs to be in the tutorial. If needing to raise a specific exception—or just not StopIteration—is common enough that novices need to learn it, I think it’s common enough that it should be given a name and put in the itertools recipes or module, or at least pointed out in more-itertools, rather than teaching people to spell it out every time.
Plain old
a = first(iterable)
would be perfect - but only if it raised.
+1. How is this distinct from:
first = next
Because first works with any Iterable; next works only with iterators.
def first(iterable, *args):
    return next(iter(iterable), *args)

first.__doc__ = (next.__doc__.replace('next(', 'first(') + """
.. note:: :func:`next` takes a second argument as a default value
.. warning:: first(set('abc')) will only sometimes return 'a'""")
On Mon, Dec 9, 2019 at 9:43 PM Tim Peters <tim.peters@gmail.com> wrote:
[Guido]
The argument that first(it) and next(it) "look the same" doesn't convince me;
I'm sorry - I guess then I have absolutely no idea what you were trying to say, and read it completely wrong.
if these look the same then all function applications look the same, and that can certainly not have been Meertens' intention.
No argument on that from me ;-)
But if everyone thinks that first() should raise, fine, this thread is way too long already (I should have kept it muted :-).
It was being discussed. The 1-argument more-itertools `first()` does raise on an exhausted iterator, and that does make most sense to me. In my algorithms I usually "know" I'm not trying to take an element from an empty iterable, and have no use for a default value in such a case. Since there's no non-destructive way to assert that the iterable is not exhausted, raising an exception if it is exhausted is most useful.
_empty = object()
a = first(iterable, _empty)
if a is _empty:
    raise ...
is a PITA by comparison, as is my _current_ idiom:
for a in iterable:
    break
else:
    raise ...
Plain old
a = first(iterable)
would be perfect - but only if it raised.
Thinking out loud here...

What idiom are we trying to replace with one that's more obvious and whose semantics are easy to grasp? `first(iterable)` that raises StopIteration is `next(iter(iterable))`. `first(iterable)` that defaults to None and doesn't raise is `next(iter(iterable), None)`. Only if the raised exception changes do you end up with something like Tim's examples, where more than one line is definitely needed.

So I think the question is what problem we are trying to solve here. Is it the lack of knowledge of the 2-argument next() form? Or is it that people regularly want the first item from an iterable and, when it doesn't exist, want to raise an exception different from StopIteration (and what is that alternative exception)?

If it's the former then I think the case could be made that more education about the one-liner might be all that's needed here. Now Guido did the research and showed that the stdlib doesn't seem to realize this form really exists, so it might be quite the education. ;) But if it's the latter then there's a greater savings in complexity from providing first() with those semantics. But once again the question becomes how often that comes up.

I obviously have no answers to provide. :) My gut is suggesting that if it's the one-liner replacement it might not be worth it, but if it's to raise a different exception I could see more of a use for adding something to the stdlib.
[Brett Cannon <brett@python.org>]
Thinking out loud here...
What idiom are we trying to replace with one that's more obvious and whose semantics are easy to grasp?
For me, most of the time, it's to have an obvious, uniform way to spell "non-destructively pick an object from a container (set, dict, list, deque, heap, tuple, custom tree class, ...)". I don't even have iterators in mind then, except as an implementation detail. For that reason, raising `StopIteration` if the container is empty grates. The _value_ (the state of the container) I passed is inappropriate then, so more-itertool's ValueError makes more sense to me. The toolz.itertoolz version of `first()` differs. That one is just next(iter(argument)). No default. I like the more-itertools flavor better. As to which idiom it intends to replace, _that's_ the annoyance being addressed: there is no "one obvious way to do it" now. Although for each individual container type, there's sometimes an obvious way to do it for objects of that type (e.g., object[0] for a list or heap, or object.root for a rooted tree class).
`first(iterable)` that raises StopIteration is `next(iter(iterable))`. `first(iterable)` that defaults to None and doesn't raise is `next(iter(iterable), None)`. Only if the raised exception changes do you end up with something like Tim's examples, where more than one line is definitely needed.
Talking about "needed" is treating this like an axiomatic system where redundancy is in Very Bad Taste. But, to the contrary, in functional languages the _implementers_ think very hard about creating a minimal core, but the user interface supplies everything _useful_ and sometimes doesn't even note whether a thing is part of the minimal core. When I want `first()`, I want `first()`. I don't care how it's implemented, and I couldn't care less that I _could_ write it myself by composing other functions in a brief one-liner.
So I think the question is what problem are we trying to solve here? Is it the lack of knowledge of the 2-argument next() form? Or is it that people are regularly wanting the first item from an iterable and when it doesn't exist they want to raise an exception different from StopIteration (and what is that alternative exception)?
If it's the former then I think the case could be made that more education of the one-liner might be all that's needed here. Now Guido did the research and showed that the stdlib doesn't seem to realize this form really exists, so it might be quite the education. ;)
`first()` definitely isn't _needed_. Without it, people will continue reinventing their own ad hoc methods of getting it done, and they'll succeed.
But if it's the latter then there's a greater savings in complexity from providing first() with those semantics. But once again the question becomes how often does that come up?
Often enough that both relevant packages (more-itertools and toolz.itertoolz) have supplied it for years, although with different endcase behavior. Certainly not often enough to merit being a builtin.
I obviously have no answers to provide. :) My gut is suggesting that if it's the one-liner replacement it might not be worth it, but if it's to raise a different exception I could see more of a use for adding something to the stdlib.
As above, `first()` is an atomic concept in my head. It _can_ be synthesized out of more basic concepts, but in the ways I think about getting a problem solved, it's a primitive. As a primitive, passing an empty argument is a ValueError in the absence of also passing an explicit default to return in that case. I can live without it, but that's not really the point ;-)
On Dec 10, 2019, at 13:50, Tim Peters <tim.peters@gmail.com> wrote:
Talking about "needed" is treating this like an axiomatic system where redundancy is in Very Bad Taste. But, to the contrary, in functional languages the _implementers_ think very hard about creating a minimal core, but the user interface supplies everything _useful_ and sometimes doesn't even note whether a thing is part of the minimal core.
There’s a pretty clear internal philosophy to what’s in itertools, and a just as clear but very different one for more-itertools. The former is just the building blocks that you couldn’t build yourself, and you’re supposed to compose them up yourself, with the recipes as a guide; if you really insist on out-of-the-box stuff, here’s a link to more-itertools. Meanwhile, more-itertools is your lispy philosophy: anything that people are likely to want to use as a primitive belongs there, even if it’s dead obvious how to compose it up yourself as a one-liner. I think you’re arguing that the philosophy of itertools is just wrong, at least for the kind of code you usually write with it and the kind of people who usually write that code. Is that fair, or am I misrepresenting you here? Meanwhile, I think Guido and some others accept the itertools philosophy but argue that first needs to be there because many of the kinds of people who need it actually can’t just write it themselves (they don’t know about 2-arg next, or they don’t understand the subtleties of leaking StopIteration, or whatever). That’s a pretty different argument. (Not that there can’t be something to both arguments, of course.)
[Andrew Barnert <abarnert@yahoo.com>]
... I think you’re arguing that the philosophy of itertools is just wrong, at least for the kind of code you usually write with it and the kind of people who usually write that code. Is that fair, or am I misrepresenting you here?
It's fair enough, although rather than "wrong" I'd say more that it's inappropriately applying design principles that work well in most of Python's libraries to an area where they don't. The very fact that half the itertools docs are devoted to "recipes" kinda suggests it knows it's leaving its intended audience hanging ;-) It's hardly coincidence that the more-itertools and itertoolz packages are richer, in very similar ways, than Python's version. The primary itertoolz author does a nice job of explaining that project's heritage starting here: https://toolz.readthedocs.io/en/latest/heritage.html
Meanwhile, I think Guido and some others accept the itertools philosophy
Don't know about Guido, but certainly applies to Raymond. Guido is more a practicality-beats-purity guy, but has no natural attraction to functional languages (Raymond partly does, with his APL background). Guido just sees a function here that would be nice to have.

I do like functional languages, and always have, so it seems to fall on me here to advocate for what those weirdos value. In part, no, the itertools namespace is not a precious resource that must be vigilantly minimized ;-)
but argue that first needs to be there because many of the kinds of people who need it actually can’t just write it themselves (they don’t know about 2-arg next, or they don’t understand the subtleties of leaking StopIteration, or whatever). That’s a pretty different argument. (Not that there can’t be something to both arguments, of course.)
And I expect Raymond would say `first()` isn't needed at all - if it has to be addressed, make it recipe #30. There's something to that argument too.
Tim Peters wrote:
It's fair enough, although rather than "wrong" I'd say more that it's inappropriately applying design principles that work well in most of Python's libraries to an area where they don't. The very fact that half the itertools docs are devoted to "recipes" kinda suggests it knows it's leaving its intended audience hanging ;-)
Yeah, I'm personally not a huge fan of how large the recipes section of itertools is, compared to the actual module contents. My personal opinion is that the recipes section should be primarily for niche or complex algorithms that wouldn't be general purpose enough to be included in the module, but still have legitimate use cases. If it's general purpose enough and has a legitimate use case, it should be added as a function, not as a recipe. Instead, it seems to have gradually turned into: "Put everything in the recipes section that might be useful and can be implemented using existing tools, even if adding a dedicated function would be more readable. Only add a new function if it doesn't already exist in some other form, no matter how obscure." (oversimplified, but that's been my interpretation). The last time a new function was added to itertools was back in 3.2 (2011), for itertools.accumulate().

I think itertools.first() is a good example of something that would be useful and general-purpose enough to be included as a function in itertools. `first_item = first(iterable)` is much easier to read than `first_item = next(iter(iterable))`, not to mention first(iterable) raising a ValueError instead of StopIteration on an empty iterable, which seems much more clear to me. I also find the default version (where an exception isn't raised) to be a lot easier to read with first, especially if *default* is able to be passed as a keyword argument. Compare `first_item = first(iterable, default=None)` to `first_item = next(iter(iterable), None)` (next doesn't take any kwargs).

If even those who have been involved with Python's development since its infancy aren't overly familiar with 2-arg next(), how could we reasonably expect the average user to be? I'll admit that I had not even heard about the 2-arg next() until reading the posts in this thread. My method of extracting the first value from an iterable (when I wanted it to default to None instead of raising StopIteration) was:

```py
try:
    first_item = next(iter(iterable))
except StopIteration:
    first_item = None
```
On Tue, Dec 10, 2019 at 09:47:08PM -0500, Kyle Stanley wrote:
I think itertools.first() is a good example of something that would be useful and general-purpose enough to be included as a function in itertools.
And I think that this discussion demonstrates why it is so hard to decide which recipes are promoted to full functions. Apologies to Raymond in advance if I fail to channel him correctly, but I think his argument will be that this discussion shows why people ought to implement their own `first` if they need it:

- should it raise on an empty iterable or return a default value?
- if we should raise, what should we raise?
- if the caller doesn't provide a default, should it raise or return a default default, like None?
- should it work on non-iterators? if yes, the behaviour between iterators and non-iterators is subtly different and a potential bug magnet for anyone who calls `first` twice on the same argument
If even those who have been involved with Python's development since its infancy aren't overly familiar with 2-arg next(), how could we reasonably expect the average user to be?
I think we're reading too much into Guido's momentary lapse of memory or slight lack of knowledge, whichever the case may be. Nobody can be expected to know *everything*, not even Guido, and we're all entitled to miss the odd thing here or there.

But having said that, `next` is a builtin, not some obscure corner of some little-used library. Python has a wonderfully effective interactive interpreter where documentation for `next` is one command away:

py> help(next)
Help on built-in function next in module builtins:

next(...)
    next(iterator[, default])

    Return the next item from the iterator. If default is given and the
    iterator is exhausted, it is returned instead of raising StopIteration.

If you google for `next`, the very first result mentions the default right there on the search page, no need to click through:

https://duckduckgo.com/?q=python+next

(Your mileage may vary when using other search engines.)

In addition, the very first post in this thread suggested using the two-argument form of `next` in their implementation. And there's no `first` recipe, which suggests that Raymond thought it was too obvious to bother with. (I'm kinda-sorta in agreement with that, but maybe this thread shows different.)

I'm not saying that it is an unforgivable failure for a developer to not know about the 2-arg form of next, that would be ludicrous. But it's not unreasonable to expect developers to use the 2-arg form of `next`. The suggested `first` is just a simple composition of two well-known builtins; saying that nobody can be expected to know the 2-arg form, or be able to plug the pieces together, is unconvincing.

I'm really on the fence with this one. Comparing the two:

    next(iter(obj), default)

    itertools.first(obj, default)

I can easily see myself preferring the first as obvious and explicit and easier than having to import a module. But if I had already imported the module for other functions, I might prefer the second. Same reason I will often just write `x**(1/2)` or `pow(x, 0.5)` rather than `import math; math.sqrt(x)` unless I've already needed the math module for something else.

-- Steven
On Dec 10, 2019, at 17:12, Tim Peters <tim.peters@gmail.com> wrote:
[Andrew Barnert <abarnert@yahoo.com>]
... I think you’re arguing that the philosophy of itertools is just wrong, at least for the kind of code you usually write with it and the kind of people who usually write that code. Is that fair, or am I misrepresenting you here?
It's fair enough, although rather than "wrong" I'd say more that it's inappropriately applying design principles that work well in most of Python's libraries to an area where they don't. The very fact that half the itertools docs are devoted to "recipes" kinda suggests it knows it's leaving its intended audience hanging ;-)
I don’t want to put words in Raymond’s mouth, but I don’t think he thought he was leaving his audience hanging. People will build their own tools out of the toolkit and decide what to share among them, and who knows where it’ll go from there? And where it turned out to go from there is two big, competing, still-evolving collections of tools on PyPI, and itertools even links to one of them, which (at least for me) looks like success. But I suspect this is exactly what you mean by “principles that work well in most of Python’s libraries”. Haskell doesn’t make you cabal install a hackage to get first. And likewise for Clojure, F#, Scala, etc. (Hell, when you npm install underscorejs, you don’t have to also go install more_underscore as well to actually use it.) But I think Python is more dependent on its package ecosystem than most of those languages (despite being the one that advertises “batteries included”), and I’m not sure that’s a bad thing here any more than anywhere else. The set of useful things you can build out of the toolkit doesn’t end with the set that come with Clojure.
Meanwhile, I think Guido and some others accept the itertools philosophy
Don't know about Guido, but certainly applies to Raymond.
Well, he probably wouldn’t have written it to that philosophy if he didn’t agree with it. :)
I do like functional languages, and always have, so it seems to fall on me here to advocate for what those weirdos value.
I’m one of those weirdos too. It may be almost an accident that Python turned out to be a great language for lots of functional techniques that other imperative languages didn’t discover until a decade or two later, but it’s a very happy accident.
In part, no, the itertools namespace is not a precious resource that must be vigilantly minimized ;-)
Sure, but it also doesn’t have to be maximized until it reaches parity with Haskell if it works very well (again, at least for me) to let third-party libraries compete to do that. That doesn’t mean it doesn’t ever need to be extended _at all_, but I think the same conservative attitude that works for the rest of Python actually does work fine here. (Although maybe first does meet that conservative standard.)
but argue that first needs to be there because many of the kinds of people who need it actually can’t just write it themselves (they don’t know about 2-arg next, or they don’t understand the subtleties of leaking StopIteration, or whatever). That’s a pretty different argument. (Not that there can’t be something to both arguments, of course.)
And I expect Raymond would say `first()` isn't needed at all - if it has to be addressed, make it recipe #30.
There's something to that argument too.
Yes. I’m not sure I buy it in this case, but I do think it needs to be answered: Why isn’t it sufficient to just add a first recipe (and anyone who follows the link to more-itertools gets it that way)? But I think by this point, we already have at least two answers (yours and Guido’s), it’s just a question of working out whether either one is sufficient for someone to go try to convince Raymond with it.
For the record, more than 80 messages into this thread, I am no longer interested in pursuing this. In any case, this is not the kind of thing Raymond would ever want to see added to itertools. Let's add a recipe for first() to the list of recipes in the itertools docs. If people want to change itertools' philosophy to add essentially *all* the recipes to the module, putting more-itertools and itertoolz out of business, that's a PEP-worthy undertaking and maybe you can get the new Steering Council behind that. But (as is no secret) I'm not one of those functionally-minded people, and I don't use itertools much in my own hobby endeavors, so I won't mind much one way or another.
--Guido van Rossum (python.org/~guido)
On Wed, 11 Dec 2019 at 01:15, Tim Peters <tim.peters@gmail.com> wrote:
It's fair enough, although rather than "wrong" I'd say more that it's inappropriately applying design principles that work well in most of Python's libraries to an area where they don't. The very fact that half the itertools docs are devoted to "recipes" kinda suggests it knows it's leaving its intended audience hanging ;-)
I'm with Tim on pretty much everything he's said in this thread.
I do like functional languages, and always have, so it seems to fall on me here to advocate for what those weirdos value. In part, no, the itertools namespace is not a precious resource that must be vigilantly minimized ;-)
I'm also fond of functional languages. Just because Tim argues so eloquently, doesn't mean he's the only one who feels like this :-) Not having first() won't ruin any of my code, but having it would definitely remove one of the "speed bumps" I occasionally hit when writing scripts. (The same is true of many of the itertools recipes, BTW). Paul
On Tue, Dec 10, 2019 at 03:50:19PM -0600, Tim Peters wrote:
For me, most of the time, it's to have an obvious, uniform way to spell "non-destructively pick an object from a container (set, dict, list, deque, heap, tuple, custom tree class, ...)". I don't even have iterators in mind then, except as an implementation detail.
You can't *non-destructively* pick the first (or next, or any) element of an iterator. Doing so changes the state of the iterator and consumes the element you just retrieved. Given a container, we have:

    assert first(container) is first(container)

but the same doesn't apply to iterators.

It sounds to me that what you actually want is an analogue to Mapping.get that applies to all containers and sequences and leaves iterators completely out of it.

    a = [2, 4, 8, 16]
    a.get(0)    # returns 2
    a.get(100)  # returns None by default

I could completely get behind this idea! The only tricky part is the "index" isn't well-defined for mappings and sets, or tree-like containers.

-- Steven
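For concreteness, the kind of sequence "get" being floated above might look something like this sketch (the name seq_get is made up here; nothing like it is being formally proposed):

def seq_get(sequence, index, default=None):
    # Return sequence[index], or `default` if the index is out of range.
    try:
        return sequence[index]
    except IndexError:
        return default

seq_get([2, 4, 8, 16], 0)     # -> 2
seq_get([2, 4, 8, 16], 100)   # -> None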
[Tim]
For me, most of the time, it's to have an obvious, uniform way to spell "non-destructively pick an object from a container (set, dict, list, deque, heap, tuple, custom tree class, ...)". I don't even have iterators in mind then, except as an implementation detail.
[Steven]
You can't *non-destructively* pick the first (or next, or any) element of an iterator.
Obviously. That's why I wrote "container", and then gave 7 concrete examples in case that distinction was too subtle ;-) But do note my "most of the time" too. There are also uses for iterators, but for _me_ those are not most common. For others they may be.
... It sounds to me that what you actually want is an analogue to Mapping.get that applies to all containers and sequences and leaves iterators completely out of it.
No, I want `first()`. It solves more than one problem. For iterators I nearly always use plain `next(iterator)`, but there are (for _me_, rare) cases where I'd like to do, e.g., `first(iterator, sentinel)` instead. I'm not at all bothered that for some arguments `first()` mutates state and for others it doesn't, no more than I'm bothered that `for x in iterable:` may or may not "consume" the iterable.
... I could completely get behind this idea! The only tricky part is the "index" isn't well-defined for mappings and sets, or tree-like containers.
While the meaning of `first()` is clear for any iterable argument.
On Tue, Dec 10, 2019 at 07:21:13PM -0600, Tim Peters wrote:
[Tim]
For me, most of the time, it's to have an obvious, uniform way to spell "non-destructively pick an object from a container (set, dict, list, deque, heap, tuple, custom tree class, ...)". I don't even have iterators in mind then, except as an implementation detail.
[Steven]
You can't *non-destructively* pick the first (or next, or any) element of an iterator.
Obviously. That's why I wrote "container", and then gave 7 concrete examples in case that distinction was too subtle ;-)
It wasn't :-) but we're talking about adding a function to **itertools** not "container tools", one which will behave subtly different with containers and iterators. Your use-case ("first item in a container") is not the same as the semantics "next element of an iterator", even if we call the second one "first". [...]
I'm not at all bothered that for some arguments `first()` mutates state and for others it doesn't, no more than I'm bothered that `for x in iterable:` may or may not "consume" the iterable.
*shrug* Okay, but I am. Iterating over an iterable is a very different use-case. [...]
While the meaning of `first()` is clear for any iterable argument.
Sorry Tim, I have to disagree. The meaning of `first` is:

    return the first element of a sequence or container (in standard
    iteration order), OR the *next* element of an iterator

and I don't think that this is even a little bit clear from the name.

-- Steven
On Wed, Dec 11, 2019 at 5:14 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, Dec 10, 2019 at 07:21:13PM -0600, Tim Peters wrote:
While the meaning of `first()` is clear for any iterable argument.
Sorry Tim, I have to disagree. The meaning of `first` is:
return the first element of a sequence or container (in standard iteration order), OR the *next* element of an iterator
and I don't think that this is even a little bit clear from the name.
You know that cliche about how today is the first day of the rest of your life? Your life is an iterator. "Next" and "first" are basically synonymous when you can't go backwards. IMO the distinction you describe here isn't actually significant at all - either way, you get the first element of "whatever remains", and the only difference is whether it's nondestructive (with most containers) or destructive (iterators). ChrisA
On Wed, Dec 11, 2019 at 05:20:13PM +1100, Chris Angelico wrote:
On Wed, Dec 11, 2019 at 5:14 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, Dec 10, 2019 at 07:21:13PM -0600, Tim Peters wrote:
While the meaning of `first()` is clear for any iterable argument.
Sorry Tim, I have to disagree. The meaning of `first` is:
return the first element of a sequence or container (in standard iteration order), OR the *next* element of an iterator
and I don't think that this is even a little bit clear from the name.
You know that cliche about how today is the first day of the rest of your life?
There's a difference between the first day of *the rest* of your life and the first day of your life. There's a difference between the first item of *the rest* of the iterator and the first item of the iterator. Cliches and platitudes will only take you so far. What counts here is the behaviour of the code.

When you iterate over an iterable using a for loop, you get the same sequence of items whether it is an iterator or not. But that's not what happens if you call `first(iterable)` multiple times. Calling it once is fine, but people will call it multiple times.

I am writing some code as we speak to process a bunch of lines of text from an iterable. I'm expecting a mandatory header as the first line, an optional line of dashes "-------", and then one or more lines that need processing. If all I remembered is that `first` works on both iterators and non-iterators, I might write something like this:

    def process(lines):
        # can't use next
        header = first(lines, '')
        line = first(lines, '')
        if is_dashes(line):
            line = first(lines, '')
        while line:
            do_stuff(line)
            line = first(lines, '')

Seems reasonable, if you read "first" as meaning "first line in the remaining iterator". But as soon as I pass a concrete sequence of lines, rather than an iterator, I'll have an infinite loop.

Let me try to anticipate your likely objection:

"Well of course it's not going to work, you're calling `first` instead of `next`. When you want to consume items, you should call `next`."

Okay, but isn't it your position that the difference between "first" and "next" is a difference that makes no difference? (See quote below.)

Obviously I could re-write that function in many ways, but the simplest fix is to call `lines = iter(lines)` at the top of the function. But if I do that, I don't need "first", I can just use `next`, which reads better. What does "first" give me? I know, I can use it to look-ahead into a sequence:

    head = first(lines)  # like lines[0] but works if it's empty
    if head is None:
        print("empty")
    else:
        do_stuff(lines)

Works fine... until I pass an iterator, and then wonder why the first line is skipped.

The thing is, we're fooled by the close similarity of iteration over iterators and other iterables (sequences and containers). Destructive iteration and non-destructive iteration is a big difference. Utility functions like the proposed `first` that try to pretend there is no such difference are, I believe, a gotcha waiting to happen.
Your life is an iterator.
Speak for yourself. My life is a box of chocolates.
"Next" and "first" are basically synonymous when you can't go backwards. IMO the distinction you describe here isn't actually significant at all - either way, you get the first element of "whatever remains", and the only difference is whether it's nondestructive (with most containers) or destructive (iterators).
-- Steven
On Wed, Dec 11, 2019 at 7:46 PM Steven D'Aprano <steve@pearwood.info> wrote:
There's a difference between the first day of *the rest* of your life and the first day of your life.
There's a difference between the first item of *the rest* of the iterator and the first item of the iterator.
Cliches and platitudes will only take you so far. What counts here is the behaviour of the code. When you iterate over an iterable using a for loop, you get the same sequence of items whether it is an iterator or not. But that's not what happens if you call `first(iterable)` multiple times. Calling it once is fine, but people will call it multiple times.
An iterator doesn't HAVE anything other than "the rest". By definition, an iterator's contents is the same as the iterator's remaining contents. Do you consider it a fundamental flaw of the "in" operator that, when used with an iterator, it is destructive?
>>> x = [1, 2, 3, 4, 5]
>>> 3 in x
True
>>> 3 in x
True
>>> x = iter(x)
>>> 3 in x
True
>>> 3 in x
False
Does the meaning of "in" change when used on an iterator? Or is this an acceptable consequence of the inherently destructive nature of querying an iterator?
I am writing some code as we speak to process a bunch of lines of text from an iterable. I'm expecting a mandatory header as the first line, an optional line of dashes "-------", and then one or more lines that need processing. If all I remembered is that `first` works on both iterators and non-iterators, I might write something like this:
def process(lines):
    # can't use next
    header = first(lines, '')
    line = first(lines, '')
    if is_dashes(line):
        line = first(lines, '')
    while line:
        do_stuff(line)
        line = first(lines, '')
Seems reasonable, if you read "first" as meaning "first line in the remaining iterator". But as soon as I pass a concrete sequence of lines, rather than an iterator, I'll have an infinite loop.
Let me try to anticipate your likely objection:
"Well of course it's not going to work, you're calling `first` instead of `next`. When you want to consume items, you should call `next`."
Okay, but isn't it your position that the difference between "first" and "next" is a difference that makes no difference? (See quote below.)
Obviously I could re-write that function in many ways, but the simplest fix is to call `lines = iter(lines)` at the top of the function.
But if I do that, I don't need "first", I can just use `next`, which reads better. What does "first" give me?
Nothing. It's not the tool for this job. You're trying to shoehorn first() into a job that isn't its job.
I know, I can use it to look-ahead into a sequence:
head = first(lines)  # like lines[0] but works if it's empty
if head is None:
    print("empty")
else:
    do_stuff(lines)
Works fine... until I pass an iterator, and then wonder why the first line is skipped.
The thing is, we're fooled by the close similarity of iteration over iterators and other iterables (sequences and containers). Destructive iteration and non-destructive iteration is a big difference. Utility functions like the proposed `first` that try to pretend there is no such difference are, I believe, a gotcha waiting to happen.
And ordered vs unordered is also a big difference. Should first() raise an error with sets because there's no real concept of "the first element"? With a list, first(x) will remain the same value even if you add more to the end of the list, but unrelated mutations to a set might change which element is "first". Does that mean that first() and next() are undefined for sets? No. We just accept that there are these differences. ChrisA
Chris Angelico writes:
And ordered vs unordered is also a big difference. Should first() raise an error with sets because there's no real concept of "the first element"?
Probably not. I would prefer that it not be implemented at all, but if it is implemented, its behavior should respect the intuition of the majority of those who want it, which seems to me to be "a variant of next() that doesn't raise and returns None by default on an empty iterable."
With a list, first(x) will remain the same value even if you add more to the end of the list, but unrelated mutations to a set might change which element is "first".
Worse, running the same program again with the *same* set can change which element is first, I believe. Also, the elements of the set might have a natural (pre)order, which first() won't respect. I'm not sure if this last is a real problem, given that sequences have the same issue (the sequence's order differs from the natural order). However, to me the fact that a set's iteration order is implicit while a sequence's is explicit suggests it might in some contexts.
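A throwaway illustration of that mismatch, assuming a first() that simply follows iteration order:

s = {"pear", "apple", "mango"}
min(s)         # 'apple' -- the "natural" first, alphabetically
next(iter(s))  # whichever element happens to come first in iteration
               # order; with string hash randomization that can change
               # from one run to the next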
Does that mean that first() and next() are undefined for sets?
first() is undefined. next() is defined by reference to iterating over the set (that's why I don't have a problem with iterating over a set).
No. We just accept that there are these differences.
Well, if first() is implemented, we'll have to accept it. It's not clear to me that we should.
Does that mean that first() and next() are undefined for sets?
[Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp>]
first() is undefined. next() is defined by reference to iterating over the set (that's why I don't have a problem with iterating over a set).
Every suggestion here so far has satisfied that, if S is a non-empty set,

    assert next(iter(S)) is first(S)

succeeds. That is, `first()` is _defined_ by reference to iteration order. It's "the first" in that order (hence the name).
Tim Peters writes:
Every suggestion here so far has satisfied that, if S is a non-empty set,
assert next(iter(S)) is first(S)
succeeds. That is, `first()` is _defined_ by reference to iteration order. It's "the first" in that order (hence the name).
The problem I'm concerned with is that sometimes users' definitions of words differ from a computer language's definitions of words. That's why I used the word "natural", which doesn't really have much to do with the way a computer language defines things, but frequently features in human thought. Whether that potential difference matters here is an empirical question. Theoretically, I can say "Explicit is better than implicit." I.e., the call to 'iter' tells us exactly what order is being used. That's *my* opinion in this case, and I don't hold *you* to it just because I'm quoting you. (I am amused, though.) Steve
[Tim]
Every suggestion here so far has satisfied that, if S is a non-empty set,
assert next(iter(S)) is first(S)
succeeds. That is, `first()` is _defined_ by reference to iteration order. It's "the first" in that order (hence the name).
[Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp>]
The problem I'm concerned with is that sometimes users' definitions of words differ from a computer language's definitions of words. That's why I used the word "natural", which doesn't really have much to do with the way a computer language defines things, but frequently features in human thought.
Whether that potential difference matters here is an empirical question.
Theoretically, I can say "Explicit is better than implicit." I.e., the call to 'iter' tells us exactly what order is being used.
But hasn't it already been settled by experience? There is nothing unique about `first(set)` implicitly appealing to iteration order. A set passed as the iterable to _any_ itertools function does exactly the same:
>>> import itertools
>>> list(itertools.zip_longest(set("abcde"), range(5)))  # can vary from run to run
[('a', 0), ('e', 1), ('c', 2), ('d', 3), ('b', 4)]
>>> list(itertools.compress(set("abcde"), [1]*5))  # but is consistent within a run
['a', 'e', 'c', 'd', 'b']
>>> list(itertools.takewhile(lambda x: True, set("abcde")))
['a', 'e', 'c', 'd', 'b']
So do, e.g., some builtins:
>>> list(map(ord, set("abcde")))
[97, 101, 99, 100, 98]
>>> [ord(ch) for ch in set("abcde")]
[97, 101, 99, 100, 98]
That's *my* opinion in this case, and I don't hold *you* to it just because I'm quoting you. (I am amused, though.)
I just don't see the potential for "new" bafflement if itertools.first works exactly the same way as itertools.anything_else _has_ worked, in this respect, all along. Give users some credit. Programming is baffling, period, at first. But to those who persist, nothing becomes truly unbearable ;-)
On Wed, Dec 11, 2019 at 9:48 PM Tim Peters <tim.peters@gmail.com> wrote:
... I just don't see the potential for "new" bafflement if itertools.first works exactly the same way as itertools.anything_else _has_ worked, in this respect, all along.
I'm also sure the docs will say "Returns the first item yielded by the iterable." That last word is a dead give-away on how the choice will be made on any collection, Sequence or not. 😉 (Doubly true if this goes into itertools.)

There's also the contingency that users who think of the question "how would this function decide what 'first' is for unordered collections?" will simply think "iterator" and be done with thinking.

IOW I understand the desire to have a function that is fully self-explanatory, but I view take() as more ambiguous, as it doesn't say where you will take from (e.g. are you going to treat a list as a deque or a stack?), while first() has a clear meaning the instant you think about this operating on iterables (which is as universal a data-structure concept across collection types as we have).
On Dec 12, 2019, at 10:34, Brett Cannon <brett@python.org> wrote:
I'm also sure the docs will say "Returns the first item yielded by the iterable." That last word is a dead give-away on how the choice will be made on any collection, Sequence or not. 😉 (Doubly true if this goes into itertools.)
The docs for more_itertools.first say it’s equivalent to next of iter, which makes the behavior even more obvious to anyone who understands iteration in Python, even if they’ve never used itertools before.
[Brett Cannon]
I'm also sure the docs will say "Returns the first item yielded by the iterable." That last word is a dead give-away on how the choice will be made on any collection, Sequence or not. (Doubly true if this goes into itertools.)

There's also the contingency that users who think of the question "how would this function decide what 'first' is for unordered collections?" will simply think "iterator" and be done with thinking.

IOW I understand the desire to have a function that is fully self-explanatory, but I view take() as more ambiguous, as it doesn't say where you will take from (e.g. are you going to treat a list as a deque or a stack?), while first() has a clear meaning the instant you think about this operating on iterables (which is as universal a data-structure concept across collection types as we have).
I agree that, for people who understand Python, "with respect to iteration order" is the only meaning "first" _could_ reasonably have. But I wouldn't object much to changing the name to, e.g., "firstiter" or "iterfirst" if people are overly concerned about that.

BTW, you can go a long way in Python without knowing anything about `iter()` or `next()`. But not without mastering `for` loops. That's why I prefer to say that, for ordinary cases,

    a = first(it)

has the same effect as:

    for a in it:
        break

with a note for "advanced" users that this is also the same as next(iter(it)).
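A quick sanity check that the two spellings agree (a list is used only so the outcome is predictable):

xs = ["spam", "eggs", "ham"]

for a in xs:        # the `for` explanation
    break
b = next(iter(xs))  # the "advanced" explanation

assert a == b == "spam"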
On Dec 12, 2019, at 20:51, Tim Peters <tim.peters@gmail.com> wrote:
BTW, you can go a long way in Python without knowing anything about `iter()` or `next()`. But not without mastering `for` loops. That's why I prefer to say that, for ordinary cases,
a = first(it)
has the same effect as:
for a in it: break
I just had to explain to someone today that a loop like this doesn’t just use up and throw away a value from `it`, but leaves `a` bound to that first value. Admittedly, this wasn't a novice, but a pretty experienced C++ and Java developer who doesn’t do much Python, who just didn’t know that Python scopes are functions rather than all compound statements. That’s not a confusion a novice would have. But would a novice actually get that it is usefully assigning the first thing in `it` to `a`? It seems obvious to me, but I’m not sure it would to most people who don’t know enough Python to know about iter.

For that matter, even if they do know this leaves `a` bound to something useful, do they know what it does with a set? If not, what have we solved here? And do we even really need to solve this problem? How many novices are going to be confused about what first does with a set, and need to know the answer?

Plus, if first really has to be this completely understandable to people who don’t know about iter, how can we put it in itertools, a module whose docs start off with a nice friendly introduction about the building blocks of an algebra for iterators that you can compose into powerful tools?
[Tim]
BTW, you can go a long way in Python without knowing anything about `iter()` or `next()`. But not without mastering `for` loops. That's why I prefer to say that, for ordinary cases,
a = first(it)
has the same effect as:
for a in it: break
[Andrew Barnert]
I just had to explain to someone today that a loop like this doesn’t just use up and throw away a value from it, but leaves a bound to that first value.
Admittedly, this wasn't a novice, but a pretty experienced C++ and Java developer who doesn’t do much Python, who just didn’t know that Python scopes are functions rather than all compound statements. That’s not a confusion a novice would have.
But would a novice actually get that it is usefully assigning the first thing in it to a? It seems obvious to me, but I’m not sure it would to most people who don’t know enough Python to know about iter.
I couldn't care less whether things are "obvious" at first - unless they're Dutch ;-) Nothing about programming is obvious at first. What I do care about is explanations that "stick" _after_ someone makes the effort to learn them.
For that matter, even if they do know this leaves a bound to something useful, do they know what it does with a set? If not, what have we solved here?
You seriously want to claim that it's A Mystery to people what iterating over a set does? That even a newbie would be baffled by the output of:

    for a in {1, 2, 3}:
        print(a)

? They may indeed be baffled by the _order_ in which 1, 2, and 3 are printed, but not in the slightest by that exactly those 3 values _are_ printed.
And do we even really need to solve this problem How many novices are going to be confused about what first does with a set, and need to know the answer?
My original statement was "you can go a long way in Python without knowing anything about `iter()` or `next()`, which goes far beyond novices. I didn't even mention newbies.
Plus, if first really has to be this completely understandable to people who don’t know about iter, how can we put it in itertools, a module whose docs start off with a nice friendly introduction about the building blocks of an algebra for iterators that you can compose into powerful tools?
As above, I didn't really have newbies in mind. I did have the learning curve all Python programmers need to climb in mind. `next()` and `iter()` didn't even exist in earlier versions of Python, yet we somehow managed ;-) They've _become_ basic to explaining how (among other things) `for` is _implemented_ now, but it's easy to play with `for` all by itself to figure out how `for` works. And I can't imagine a Python course that even mentioned `iter()` or `next()` before covering `for`. I also said "next(iter(it))" should also be given as a more succinct explanation for "advanced" users - but _expect_ that the "for" explanation would be more accessible to more users. Why are you so irked at an attempt to give "the simplest explanation that could possibly work"?
On Dec 12, 2019, at 21:52, Tim Peters <tim.peters@gmail.com> wrote:
I couldn't care less whether things are "obvious" at first - unless they're Dutch ;-) Nothing about programming is obvious at first. What I do care about is explanations that "stick" _after_ someone makes the effort to learn them.
Sure, but you can also explain first just fine by saying it returns the first thing in its argument, and that will stick as well. This whole subthread is an attempt to come up with wording that solves a problem that I don’t think exists: Wes suggested everyone will be confused by using first on a set. I don’t think anyone will be. Do you disagree? If not, why do we need to solve that nonexistent confusion?
For that matter, even if they do know this leaves a bound to something useful, do they know what it does with a set? If not, what have we solved here?
You seriously want to claim that it's A Mystery to people what iterating over a set does? That even a newbie would be baffled by the output of:
for a in {1, 2, 3}: print(a)
No, I want to claim almost the exact opposite: that they won’t be baffled by that, and they won’t be baffled by the fact that first on their set returns 2 either. The first time you have to learn that sets iterate in arbitrary order, you have to learn it. But then you know it, and it’s not baffling when a for statement gives you an arbitrary order just like print did, but it’s also not baffling when list(s) gives you an arbitrary order just like print and for did, or when more_itertools.first gives you the first element of an arbitrary order just like everything else does.
Plus, if first really has to be this completely understandable to people who don’t know about iter, how can we put it in itertools, a module whose docs start off with a nice friendly introduction about the building blocks of an algebra for iterators that you can compose into powerful tools?
As above, I didn't really have newbies in mind. I did have the learning curve all Python programmers need to climb in mind. `next()` and `iter()` didn't even exist in earlier versions of Python, yet we somehow managed ;-)
Neither did itertools. And, just as with next and iter, you can go a long way in Python without learning itertools. And by the time you get to looking for functions in itertools to use on a set, you’re not going to be baffled by the fact that a set’s order is just as arbitrary when used with itertools functions as it is with everything else.
Why are you so irked at an attempt to give "the simplest explanation that could possibly work"?
I’m not, but the simplest explanation is just “Return the first item of *iterable*.” Of course you also need to follow up with the details, like “If *iterable* is empty, return *default*, or raise ValueError if no *default* is provided.” And add an example, and maybe a sentence on why you might want this function. But the first sentence explains what it does, and it’s hard to get any simpler than that.

I’m not all that irked, but it does annoy me a bit that it’s so easy for someone to derail a thread that’s making progress just by raising a spurious problem. People jump to trying to come up with the best solution without asking whether the problem actually needs to be solved.
[Andrew Barnert <abarnert@yahoo.com>]
Sure, but you can also explain first just fine by saying it returns the first thing in its argument, and that will stick as well.
We have different meanings for "fine" in this context. "The first thing in its argument" is English prose, and prone to misunderstanding. I'm absolutely fine with starting the docs that way, but not with letting it end there. It's too sloppy.

There are (at least) two simple and rigorously, exhaustively correct ways to explain normal (iterable `it` not empty/exhausted) behavior. That

    a = first(it)

binds `a` to the same object that

    for a in it:
        break

would bind it to, or that

    a = next(iter(it))

would bind it to.

The first way requires no knowledge of the `next` or `iter` builtins. I happen to use those a lot, because I often implement "general purpose" functions that operate _on_ iterables (along the lines of what most functions in `itertools` do). But I routinely see thousands of lines of Python code that never use them explicitly, not even once. I rarely see even dozens of lines of Python code that don't explicitly use `for`. For that reason I remain of the opinion that the former (`for`) way would be more accessible to more people. But both are useful explanations.

Why you're so determined to fight against adding a brief, 100% true, explanation in the docs remains unclear to me.
This whole subthread is an attempt to come up with wording that solves a problem that I don’t think exists:
Writing good docs is always "a problem", in that it's not trivial. Newbies need an intuitive hint, experts may need 100% rigorously true and exhaustive specification.
Wes suggested everyone will be confused by using first on a set. I don’t think anyone will be. Do you disagree?
Again, nothing is obvious at first. I, again, aim at explanations that "stick": hard to forget _after_ they're learned.
If not, why do we need to solve that nonexistent confusion?
Well, why document first() at all? ;-)
... and they won’t be baffled by the fact that first on their set returns 2 either.
If only that were true, Stackoverflow could shut down ;-) ...
As above, I didn't really have newbies in mind. I did have the learning curve all Python programmers need to climb in mind. `next()` and `iter()` didn't even exist in earlier versions of Python, yet we somehow managed ;-)
Neither did itertools.
My point exactly ;-) first() can be - and "should be" - rigorously explained without reference to iter(), next(), or itertools.
And, just as with next and iter, you can go a long way in Python without learning itertools.
Nobody is asking anyone to learn itertools here. That just happens to be the most natural place for `first()` to live that anyone has suggested, and follows the prior art of the itertoolz and more-itertools packages. To _use_ it no more requires learning anything about itertools than, e.g., using fractions.gcd requires learning anything about fractions.Fraction. However, in that case, the fractions module was a weird place to put an advertised general-purpose integer function like gcd, so gcd eventually (3.5) moved to the math module (while it _really_ belongs in a doesn't-exist-yet imath module). Do you have a better suggestion for which namespace `first` should live in? There does, so far, appear to be consensus that it's not compelling enough to merit adding to the builtins.
... And by the time you get to looking for functions in itertools to use on a set, you’re not going to be baffled by the fact that a set’s order is just as arbitrary when used with itertools functions as it is with everything else.
I suppose it's possible someone will stumble into `first` by searching itertools for functions to use on a set, but I think this other way is much more likely: someone has a set, and asks a question (whether on a mailing list, an online forum, or to a colleague):

"I want to get an element from the set, but not destructively. I've been doing `a = set.pop(); set.add(a)` but that code smells. Is there a better way?"

"Sure! Do `a = itertools.first(set)`. More, that way will work for any iterable object, although it will consume the next object from an iterator."
... I’m not all that irked, but it does annoy me a bit that it’s so easy for someone to derail a thread that’s making progress just by raising a spurious problem. People jump to trying to come up with the best solution without asking whether the problem actually needs to be solved.
Writing good docs doesn't happen by accident or magic, and I don't agree "returns the first thing in its argument" is _sufficient_ for "good docs". Thinking about the docs is also essential to making progress.
Tim Peters writes:
But hasn't it already been settled by experience? There is nothing unique about `first(set)` implicitly appealing to iteration order.
I'm not sure. I wouldn't think to use "first" on a set in my own code (and in others' code I'd consider it a code smell in many contexts, such as the example below). I rarely think of the order involved in iteration: I think of "for" as "for each". My focus is on a set of objects and what I do with each, not necessarily limited by the von Neumann architecture.

I'm also bothered that "first(x); first(x)" has "first, second" semantics when x is an iterator and "first, first" semantics when x is not an iterator. I'd like to be able to generically recommend

    column_labels = first(table)
    for row in table:
        process(row, column_labels)

because it's very elegant, but it's probably a bug if table is a list of tuples rather than an open file object. For me any use of "first" that doesn't throw away the rest of the iterable would be suspect, because changing a sequence to an iterator or the other way around is a frequent refactoring for me. It's still elegant if used carefully:

    itable = iter(table)
    column_labels = first(itable)
    for row in itable:
        process(row, column_labels)

but then "first" is just a slightly inefficient way of spelling next.
I just don't see the potential for "new" bafflement if itertools.first works exactly the same way as itertools.anything_else _has_ worked, in this respect, all along.
Perhaps not. While I don't really trust a vote of the posters to this list on issues of potential for bafflement ;-), the fact that nobody else seems concerned is, uh, "suggestive".
Give users some credit.
I'm happy to do that at 21.6% per annum, but wouldn't it be more Pythonic to cut them some slack and keep them out of debt? Steve
Stephen J. Turnbull wrote:
Worse, running the same program again with the *same* set can change which element is first, I believe.
Yes, this is correct. For example, if you had a minimal Python file named "some_set.py" with only the following:

```
s = set('abcdefg')
```

and then run something like "import some_set; print(list(some_set.s))", the order of the elements will be randomized each time. Within each individual interpreter, the order will not change though (including if you del and then import some_set again):

```
[aeros:~]$ python -c "import some_set; print(list(some_set.s)); print(list(some_set.s))"
['f', 'a', 'c', 'g', 'b', 'd', 'e']
['f', 'a', 'c', 'g', 'b', 'd', 'e']
[aeros:~]$ python -c "import some_set; print(list(some_set.s)); print(list(some_set.s))"
['a', 'd', 'f', 'b', 'e', 'g', 'c']
['a', 'd', 'f', 'b', 'e', 'g', 'c']
[aeros:~]$ python -c "import some_set; print(list(some_set.s)); del some_set; import some_set; print(list(some_set.s))"
['f', 'b', 'g', 'e', 'c', 'a', 'd']
['f', 'b', 'g', 'e', 'c', 'a', 'd']
[aeros:~]$ python -c "import some_set; print(list(some_set.s)); del some_set; import some_set; print(list(some_set.s))"
['c', 'e', 'a', 'f', 'g', 'b', 'd']
['c', 'e', 'a', 'f', 'g', 'b', 'd']
```

IIUC, the "ordering seed" for all sets is randomly determined when the interpreter is first created, and remains the same throughout its lifespan. I'm not 100% certain that it's interpreter dependent, but it seems like it would be.

Personally, I don't think this makes `first()` any worse off though. I think it should be expected that attempting to extract the "first" element from an unordered container would not consistently return the same one.
Chris Angelico writes:
And ordered vs unordered is also a big difference. Should first() raise an error with sets because there's no real concept of "the first element"?
Probably not. I would prefer that it not be implemented at all, but if it is implemented, its behavior should respect the intuition of the majority of those who want it, which seems to me to be "a variant of next() that doesn't raise and returns None by default on an empty iterable."
With a list, first(x) will remain the same value even if you add more to the end of the list, but unrelated mutations to a set might change which element is "first".
Worse, running the same program again with the *same* set can change which element is first, I believe. Also, the elements of the set might have a natural (pre)order, which first() won't respect. I'm not sure if this last is a real problem, given that sequences have the same issue (the sequence's order differs from the natural order). However, to me the fact that a set's iteration order is implicit while a sequence's is explicit suggests it might be a problem in some contexts.
Does that mean that first() and next() are undefined for sets?
first() is undefined. next() is defined by reference to iterating over the set (that's why I don't have a problem with iterating over a set).
No. We just accept that there are these differences.
Well, if first() is implemented, we'll have to accept it. It's not clear to me that we should.
12.12.19 03:22, Stephen J. Turnbull wrote:
I would prefer that it not be implemented at all, but if it is implemented, its behavior should respect the intuition of the majority of those who want it, which seems to me to be "a variant of next() that doesn't raise and returns None by default on an empty iterable."
This is not what the majority expects and is not how first() from more-itertools works. A variant of next() that doesn't raise and returns None on an empty iterable is a two-argument next().
On Fri, Dec 13, 2019 at 09:24:20AM +0200, Serhiy Storchaka wrote:
12.12.19 03:22, Stephen J. Turnbull wrote:
I would prefer that it not be implemented at all, but if it is implemented, its behavior should respect the intuition of the majority of those who want it, which seems to me to be "a variant of next() that doesn't raise and returns None by default on an empty iterable."
This is not what the majority expects and is not how first() from more-itertools works.
I don't know who this majority is, or how you know they are a majority, but in this thread, the OP Juancarlo's implementation behaved as Stephen suggested; Guido's version behaved as Stephen suggested; my version behaved as Stephen suggested. I don't recall who wants a version of first that raises, or when that would be useful. It would greatly limit its usefulness in expressions, since it needs to be guarded by a try...except. -- Steven
[Steven D'Aprano] ...
I don't recall who wants a version of first that raises,
I do, for one. But I want both (raising and non-raising) behaviors at times, and the more-itertools version supplies both.
or when that would be useful.
There is no practical way to assert that an iterable isn't exhausted. In most of my algorithms, I "know" that first's argument is not empty or exhausted, and it's a logic error if it is. So I want an attempt to retrieve something that doesn't exist to raise. Stuff like:

    _empty = object()
    a = first(it, _empty)
    if a is _empty:
        raise ValueError(...)

is a PITA. I can't generally _assume_, e.g., that None is a magical value in this context.
It would greatly limit it's usefulness in expressions, since it needs to be guarded by a try...except.
The non-raising version is obtained by explicitly passing a default to return in case the iterable is empty/exhausted. [from a different reply]
py> {1:'a'}.get(999) is None True
As Guido said before he tuned out, it's the _sole purpose_ of dict.get(key) not to raise, so there's no reason to require an explicit default in that context. If you want a spelling that does raise, fine: dict[key] - or dict.__getitem__(key) - is what you want. The more-itertools first() works not like dict.get(), but dict.pop(): raise an exception in the endcase by default, but that can be overridden by explicitly passing a value to return in that case. And for the same reason: raising and non-raising versions are both desirable.
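For concreteness, the behaviour described there can be sketched in a few lines of Python; this is only an illustration of the described semantics, not the actual more-itertools source:

```
_marker = object()   # private sentinel, so that None can be a legitimate default

def first(iterable, default=_marker):
    # Raise by default; an explicitly supplied default suppresses the
    # exception, mirroring how dict.pop() treats its optional default.
    try:
        return next(iter(iterable))
    except StopIteration:
        if default is _marker:
            raise ValueError("first() called on an empty iterable "
                             "and no default was given") from None
        return default
```

So `first([])` raises ValueError, while `first([], None)` quietly returns None.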
13.12.19 12:45, Steven D'Aprano wrote:
On Fri, Dec 13, 2019 at 09:24:20AM +0200, Serhiy Storchaka wrote:
12.12.19 03:22, Stephen J. Turnbull wrote:
I would prefer that it not be implemented at all, but if it is implemented, its behavior should respect the intuition of the majority of those who want it, which seems to me to be "a variant of next() that doesn't raise and returns None by default on an empty iterable."
This is not what the majority expects and is not how first() from more-itertools works.
I don't know who this majority is, or how you know they are a majority, but in this thread, the OP Juancarlo's implementation behaved as Stephen suggested; Guido's version behaved as Stephen suggested; my version behaved as Stephen suggested.
I don't recall who wants a version of first that raises, or when that would be useful. It would greatly limit it's usefulness in expressions, since it needs to be guarded by a try...except.
Yes, Guido's version does not raise, and neither does the OP's version. But all the others discuss a version which is equivalent to next(iter(iterable)), except that it raises a ValueError instead of StopIteration. That makes more sense, because such a version is less trivial and its use is less error-prone. You can silence the exception by passing the default argument, but getting an exception out of a first() that never raises is more complex. If all this time you have been discussing a version that does not raise, sorry, I missed that.
On 11/12/19 9:45 pm, Steven D'Aprano wrote:
But that's not what happens if you call `first(iterable)` multiple times. Calling it once is fine, but people will call it multiple times.
Would it help if it were called "one" instead of "first"? -- Greg
On Thu, Dec 12, 2019 at 12:53:26AM +1300, Greg Ewing wrote:
On 11/12/19 9:45 pm, Steven D'Aprano wrote:
But that's not what happens if you call `first(iterable)` multiple times. Calling it once is fine, but people will call it multiple times.
Would it help if it were called "one" instead of "first"?
Only in the sense that if I saw a function called "one", I would have absolutely no idea whatsoever what it did, so I would be forced to read the docs, which hopefully would explicitly document the two different kinds of behaviour. -- Steven
On Dec 11, 2019, at 03:57, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
On 11/12/19 9:45 pm, Steven D'Aprano wrote:
But that's not what happens if you call `first(iterable)` multiple times. Calling it once is fine, but people will call it multiple times.
Would it help if it were called "one" instead of "first"?
I’d expect one to be “like first, but raise if there are two or more elements”, because that’s what it means in a number of functional languages and database libraries, and more-itertools.
Greg Ewing writes:
On 11/12/19 9:45 pm, Steven D'Aprano wrote:
But that's not what happens if you call `first(iterable)` multiple times. Calling it once is fine, but people will call it multiple times.
Would it help if it were called "one" instead of "first"?
That would be my preference.
On Thu, Dec 12, 2019 at 10:22:33AM +0900, "Stephen J. Turnbull" <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Greg Ewing writes:
On 11/12/19 9:45 pm, Steven D'Aprano wrote:
But that's not what happens if you call `first(iterable)` multiple times. Calling it once is fine, but people will call it multiple times.
Would it help if it were called "one" instead of "first"?
That would be my preference.
take_one()? takeone()? take1()? Oleg. -- Oleg Broytman https://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.
take_one()? takeone()? take1()?
That could work. The docs can mention that for anything with ordering the result is guaranteed to be the *first* in the order. Do note that the main use case for the function is to consume only the first result of an iterable, generators in particular. In that context, *takeone()* sounds a lot like any random result would be fine. The docs can be simpler (fewer special cases) if it's called *first()*, by just explaining that the result is *next(iter(it))* with provisions for non-yielding iterators and default return values. -- Juancarlo *Añez*
11.12.19 10:45, Steven D'Aprano wrote:
The thing is, we're fooled by the close similarity of iteration over iterators and other iterables (sequences and containers). Destructive iteration and non-destructive iteration is a big difference. Utility functions like the proposed `first` that try to pretend there is no such difference are, I believe, a gotcha waiting to happen.
This is a good argument against first(). I was only -0 before, but now I am closer to a strong -1. To use first() correctly you should know that it is merely a combination of iter() and next(), and if you know this, you no longer need first(). Using first() without knowing this is error-prone.
On Thu, Dec 12, 2019 at 11:35 PM Serhiy Storchaka <storchaka@gmail.com> wrote:
11.12.19 10:45, Steven D'Aprano wrote:
The thing is, we're fooled by the close similarity of iteration over iterators and other iterables (sequences and containers). Destructive iteration and non-destructive iteration is a big difference. Utility functions like the proposed `first` that try to pretend there is no such difference are, I believe, a gotcha waiting to happen.
This is a good argument against first().
But this is true all across Python -- probably for historical reasons, you can generally use either an iterator or iterable in the same context, and the "destructive" nature will be different. Even for loops, which I'm sure we all agree are going to be used by EVERY python programmer:

```
In [21]: my_list = [3, 4, 5, 6, 7, 8]

In [22]: my_iterator = iter(my_list)

In [23]: for i in my_list:
    ...:     print(i)
    ...:
3
4
5
6
7
8

In [24]: for i in my_list:
    ...:     print(i)
    ...:
3
4
5
6
7
8

In [25]: for i in my_iterator:
    ...:     print(i)
    ...:
3
4
5
6
7
8

In [26]: for i in my_iterator:
    ...:     print(i)
    ...:
# Hey what happened to the contents????
```

My point is that the distinction between an iterable and an iterator is potentially going to bite people in most contexts in Python -- there's nothing special about the proposed first() in this regard. The other key thing to remember is that in most contexts, folks are working with iterables, not iterators (directly) anyway, which is why this does not constantly mess up novices.

This thread has gotten pretty out of hand (well, not more than many on this list :-) ) -- we don't NEED to get into the whole theory of what itertools is for, functional programming, etc -- there is a simple question on the table: Do we add a "first()" function to the standard library? And if so, where do we put it, and how exactly should it work?

On that second point, I think we all agree that it does not belong in __builtins__, so it needs to go somewhere, and itertools seems to make the most sense. Not because it is a "building block[s] of an algebra for iterators", but because no one has suggested another place to put it. I can see telling folks: if you want to get the first item from a container or other iterable, you can use the itertools.first() function to do that. I suppose they *might* get confused when they read about "an algebra for iterators", or even the first line in the module docstring: "Functional tools for creating and using iterators.", but I doubt it -- I suspect they'll only read the docs for the function itself anyway. And it IS a "tool for using iterators", so is it really THAT confusing?

-CHB

-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On 14/12/19 6:44 am, Christopher Barker wrote:
I think we all agree that it does not belong in __builtins__,
Do we? I'm not convinced. We already have all() and any() in builtins, which are similar in that they operate on iterators or iterables. -- Greg
[Christopher Barker]
I think we all agree that it does not belong in __builtins__,
[Greg Ewing]
Do we?
Nobody yet has argued in favor of it - or even suggested it.
I'm not convinced.
And that remains true even now ;-) The new part here is that yours is the first message to mention it that did _not_ say outright that first() does not belong in the builtins.
We already have all() and any() in builtins, which are similar in that they operate on iterators or iterables.
Also things like map() and zip(), but things like that predate itertools. My view is that first() just isn't likely to be used often enough to merit making it a builtin. all() and any() are. If we had it to do over, I bet zip() would have been assigned to itertools. map() is too close to call, although these days, in new code, I usually see a list comprehension where I used to see map().
On Fri, 13 Dec 2019 at 22:47, Tim Peters <tim.peters@gmail.com> wrote:
[Christopher Barker]
I think we all agree that it does not belong in __builtins__,
[Greg Ewing]
Do we?
Nobody yet has argued in favor of it - or even suggested it.
I'm not convinced.
And that remains true even now ;-) The new part here is that yours is the first message to mention it that did _not_ say outright that first() does not belong in the builtins.
We already have all() and any() in builtins, which are similar in that they operate on iterators or iterables.
Also things like map() and zip(), but things like that predate itertools. My view is that first() just isn't likely to be used often enough to merit making it a builtin. all() and any() are. If we had it to do over, I bet zip() would have been assigned to itertools. map() is too close to call, although these days, in new code, I usually see a list comprehension where I used to see map().
I think that first could get wider usage than next. Outside of implementing abstract iterator tools my experience is that the bulk of situations in which next is used/suggested would be better handled by (the raising version of) first. -- Oscar
Would the builtins import look like this:

    if not hasattr(__builtins__, 'first'):
        from more_itertools.more import first

Or this:

    import sys
    if sys.version_info[:2] < (3,9):
        from more_itertools.more import first

Having the same argspec as more_itertools.more.first would get us a free backport.

On Fri, Dec 13, 2019, 6:05 PM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
On Fri, 13 Dec 2019 at 22:47, Tim Peters <tim.peters@gmail.com> wrote:
[Christopher Barker]
I think we all agree that it does not belong in __builtins__,
[Greg Ewing]
Do we?
Nobody yet has argued in favor of it - or even suggested it.
I'm not convinced.
And that remains true even now ;-) The new part here is that yours is the first message to mention it that did _not_ say outright that first() does not belong in the builtins.
We already have all() and any() in builtins, which are similar in that they operate on iterators or iterables.
Also things like map() and zip(), but things like that predate itertools. My view is that first() just isn't likely to be used often enough to merit making it a builtin. all() and any() are. If we had it to do over, I bet zip() would have been assigned to itertools. map() is too close to call, although these days, in new code, I usually see a list comprehension where I used to see map().
I think that first could get wider usage than next. Outside of implementing abstract iterator tools my experience is that the bulk of situations in which next is used/suggested would be better handled by (the raising version of) first.
-- Oscar
On Fri, Dec 13, 2019 at 06:27:41PM -0500, Wes Turner wrote:
Would the builtins import look like this:
if not hasattr(__builtins__, 'first'):
Not that. `__builtins__` is a private CPython implementation detail, so the above is always wrong in user code. Better:

    import builtins
    try:
        builtins.first
    except AttributeError:
        ...

You don't even need the builtins import:

    try:
        first
    except NameError:
        ...

Remember how we say "not every one-liner needs to be a builtin"? Why is this trivial one-line wrapper around next important enough to be a builtin?
import sys if sys.version_info[:2] < (3,9): from more_itertools.more import first
Feature detection is better and more reliable than version checks. -- Steven
On Fri, Dec 13, 2019, at 19:24, Steven D'Aprano wrote:
`__builtins__` is a private CPython implementation detail, so the above is always wrong in user code. Better:
Wait, it is? Is there then no portable way to do things like:

- providing an alternate __builtins__ to evaluated code, with some changed or removed
- providing an alternate __builtins__ for the current module, to override things like __import__, __build_class__, etc. [that cannot be simply replaced with globals] to change the behavior of import/class statements without having an effect on other modules?

(This is very far from the topic of first(), and maybe belongs on python-list, but I was surprised to hear that... I don't want to start a long digression here though.)
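For the first of those cases, the commonly seen pattern is to pass exec() a globals dict that carries its own "__builtins__" entry; a minimal sketch (this relies on documented exec() behaviour, and it is emphatically not a security sandbox):

```
code = "print(len('spam'))"

# Only the names placed here are visible as builtins to the executed code.
restricted_builtins = {"print": print, "len": len}

exec(code, {"__builtins__": restricted_builtins})   # prints 4
```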
I think we all agree that it does not belong in __builtins__,
Do we? I'm not convinced. We already have all() and any() in builtins, which are similar in that they operate on iterators or iterables.
Good point — I was assuming with all the hostility (OK, skepticism) to the idea, that it was a non-starter. But maybe some of that skepticism is due to the association with itertools. If it was just me: I probably would put it in builtins. But not before looking around to see how common a name “first” is in existing code. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
[Steven D'Aprano <steve@pearwood.info>]
It wasn't :-) but we're talking about adding a function to **itertools** not "container tools", one which will behave subtly different with containers and iterators. Your use-case ("first item in a container") is not the same as the semantics "next element of an iterator", even if we call the second one "first".
All itertools functions accept arbitrary iterables, which includes all iterators of all kinds, and includes everything else `iter()` can be applied to (such as most containers). `first()` would be exactly the same as all other itertools functions in this respect. Yes, you _can_ make mistakes because of it. That gets a yawn from me. It's nothing new, and comes with the territory of Python being a multi-paradigm language. ...
While the meaning of `first()` is clear for any iterable argument.
Sorry Tim, I have to disagree. The meaning of `first` is:
return the first element of a sequence or container (in standard iteration order), OR the *next* element of an iterator
As I said, the meaning _is_ clear to you.
and I don't think that this is even a little bit clear from the name.
I didn't claim it was clear from the name alone. As always, what I care about is whether the name is sufficient to remind users of the meaning _after_ they've learned what it means. For a non-empty iterable `it`,

    a = first(it)

is the same as:

    for a in it:
        break

Once that's grasped, it's essentially impossible to forget what `first()` means (it's simply the first value returned from iterating over `it`).
I think a solution nobody has proposed in this thread is relaxing the next builtin, so it calls iter() on the argument when it doesn't implement the iterator protocol. That would make next([1, 2]) == 1, and also make next([], "missing") == "missing". After that, all that is needed is educating users a bit about the 2-argument form of next (which, after this thread, sounds like a good idea by itself anyway).

I don't think the performance impact should be big (it would add a check and a method call, but only on a path that currently raises a TypeError, which typically shouldn't be in a hot path). Compatibility-wise, it potentially turns TypeError-raising code into non-raising code, which might make a difference if someone is catching this and doing something special with it, but IMHO that's not something that should happen a lot (and it is the kind of backward-incompatible change that is tolerated between releases).

Do you see any drawbacks I missed? Do you think this fails to cover the original problems in any way?

On Wed, 11 Dec 2019 at 01:22, Tim Peters <tim.peters@gmail.com> wrote:
[Tim]
For me, most of the time, it's to have an obvious, uniform way to spell "non-destructively pick an object from a container (set, dict, list, deque, heap, tuple, custom tree class, ...)". I don't even have iterators in mind then, except as an implementation detail.
[Steven]
You can't *non-destructively* pick the first (or next, or any) element of an iterator.
Obviously. That's why I wrote "container", and then gave 7 concrete examples in case that distinction was too subtle ;-)
But do note my "most of the time" too. There are also uses for iterators, but for _me_ those are not most common. For others they may be.
... It sounds to me that what you actually want is an analogue to Mapping.get that applies to all containers and sequences and leaves iterators completely out of it.
No, I want `first()`. It solves more than one problem. For iterators I nearly always use plain `next(iterator)`, but there are (for _me_, rare) cases where I'd like to do, e.g., `first(iterator, sentinel)` instead.
I'm not at all bothered that for some arguments `first()` mutates state and for others it doesn't, no more than I'm bothered that `for x in iterable:` may or may not "consume" the iterable.
... I could completely get behind this idea! The only tricky part is the "index" isn't well-defined for mappings and sets, or tree-like containers.
While the meaning of `first()` is clear for any iterable argument.
On Dec 11, 2019, at 07:43, Daniel Moisset <dfmoisset@gmail.com> wrote: I think a solution nobody has proposed in this thread is relaxing the next builtin, so it calls iter() on the argument when it doesn't implement the iterator protocol. That would make next([1, 2]) == 1, and also make next([], "missing") == "missing".
    while True:
        yield (next(it), next(it))

I've seen this in real-life code, and more complicated versions of the same idea. What does it do? It gives you pairs of elements from it (not overlapping ones like pairwise, but adjacent ones). But with your change, that's no longer true. It gives you pairs of elements if it is an Iterator, infinite pairs of the first element twice if it's a container. That's no longer a useful function.

And consider Steven's case of processing a file with a header. Even when you don't need the optional second line of hyphens, it's still a problem. The standard way to write that is:

    def process(lines):
        header = next(lines, '')
        for line in lines:
            do_stuff(header, line)

And with your change, calling it with a container (say because you called readlines, which you should almost never do but people actually do all the time) means it will fail silently and subtly instead of raising.
Compatibility-wise, it potentially turns TypeError raising code into non-raising code, which might make a difference if someone is catching this and doing something special with it, but IMHO I don't think that's something that should happen a lot
If someone is checking for an Iterator by EAFP’ing next instead of LBYL’ing collections.abc.Iterator, this would break their code. I don’t know how often people do that, but I don’t think code that does that would be wrong or bad; using EAFP instead of LBYL is usually considered a good thing in Python. But also, most of the time, even when you aren’t testing for the error, you’re relying on it. The pairs and process functions above are only correct because of that TypeError. Take it away, and the standard idioms for at least two common things are no longer valid.
On 12/12/19 4:39 am, Daniel Moisset wrote:
I think a solution nobody has proposed in this thread is relaxing the next builtin, so it calls iter() on the argument when it doesn't implement the iterator protocol. Do you think this fails to cover the original problems in any way?
It would still raise StopIteration if the iterator is empty and no default is specified. Part of the problem being addressed is that this has a tendency to turn obvious bugs into non-obvious ones. -- Greg
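For concreteness, a rough Python sketch of the relaxed next() being discussed might look like the following (the name is invented here, and a real change would live in the C implementation of next(); note that, as Greg points out, it still raises StopIteration when the iterable is empty and no default is given):

```
from collections.abc import Iterator

_MISSING = object()

def relaxed_next(obj, default=_MISSING):
    # Accept plain iterables as well as iterators by calling iter() first.
    if not isinstance(obj, Iterator):
        obj = iter(obj)
    if default is _MISSING:
        return next(obj)         # still StopIteration on an empty iterable
    return next(obj, default)
```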
On Tue, Dec 10, 2019 at 1:50 PM Tim Peters <tim.peters@gmail.com> wrote:
[Brett Cannon <brett@python.org>]
Thinking out loud here...
What idiom are we trying to replace with one that's more obviously and whose semantics are easy to grasp?
For me, most of the time, it's to have an obvious, uniform way to spell "non-destructively pick an object from a container (set, dict, list, deque, heap, tuple, custom tree class, ...)". I don't even have iterators in mind then, except as an implementation detail. For that reason, raising `StopIteration` if the container is empty grates. The _value_ (the state of the container) I passed is inappropriate then, so more-itertool's ValueError makes more sense to me.
Fair enough. I will fully admit that while I'm currently learning Clojure the fact they have such a uniform approach to containers is enviable, hence why this discussion interests me. :) (Although I was already a functional fan.)
The toolz.itertoolz version of `first()` differs. That one is just next(iter(argument)). No default. I like the more-itertools flavor better.
As to which idiom it intends to replace, _that's_ the annoyance being addressed: there is no "one obvious way to do it" now. Although for each individual container type, there's sometimes an obvious way to do it for objects of that type (e.g., object[0] for a list or heap, or object.root for a rooted tree class).
`first(iterable)` that raises StopIteration is `next(iter(iterable))`. `first(iterable)` that defaults to None and doesn't raise is `next(iter(iterable), None)`. Now only if the raised exception changes do you end up with something like Tim's examples where more than one line is definitely needed.
Talking about "needed" is treating this like an axiomatic system where redundancy is in Very Bad Taste.
Sorry, "needed" was too strong of a word. It's more about justification for including in the stdlib and deciding to support it for a decade or more versus the answer we give for simple one-liners of "put in your personal toolbox if you don't want to type it out every time".
But, to the contrary, in functional languages the _implementers_ think very hard about creating a minimal core, but the user interface supplies everything _useful_ and sometimes doesn't even note whether a thing is part of the minimal core.
Yep, and the general abstraction to a universally applicable core is very nice.
When I want `first()`, I want `first()`. I don't care how it's implemented, and I couldn't care less that I _could_ write it myself by composing other functions in a brief one-liner.
Sure, but the question I think that this thread and me are proposing are what "first()" means to everyone. I think you and I are on the same page, but it's a question as to whether others are as well. :)
So I think the question is what problem are we trying to solve here? Is it the lack of knowledge of the 2-argument next() form? Or is it that people are regularly wanting the first item from an iterable and when it doesn't exist they want to raise an exception different from StopIteration (and what is that alternative exception)?
If it's the former then I think the case could be made that more education of the one-liner might be all that's needed here. Now Guido did the research and showed that the stdlib doesn't seem to realize this form really exists, so it might be quite the education. ;)
`first()` definitely isn't _needed_. Without it, people will continue reinventing their own ad hoc methods of getting it done, and they'll succeed.
But if it's the latter then there's a greater savings in complexity from providing first() with those semantics. But once again the question becomes how often does that come up?
Often enough that both relevant packages (more-itertools and toolz.itertoolz) have supplied it for years, although with different endcase behavior. Certainly not often enough to merit being a builtin.
I agree.
I obviously have no answers to provide. :) My gut is suggesting that if it's the one-liner replacement it might not be worth it, but if it's to raise a different exception I could see more of a use for adding something to the stdlib.
As above, `first()` is an atomic concept in my head. It _can_ be synthesized out of more basic concepts, but in the ways I think about getting a problem solved, it's a primitive. As a primitive, passing an empty argument is a ValueError in the absence of also passing an explicit default to return in that case.
Fair enough.
I can live without it, but that's not really the point ;-)
:)
[Brett Cannon <brett@python.org>]
... Sorry, "needed" was too strong of a word. It's more about justification for including in the stdlib and deciding to support it for a decade or more versus the answer we give for simple one-liners of "put in your personal toolbox if you don't want to type it out every time".
It's slightly tricky to get right. Good enough for me ;-) Note that I coupled this with leaving it in Python too. `itertools` has been quite an inactive module this decade, and looks like it would have enjoyed few commits at all _except_ that it's coded in C. Commits have been mostly due to C fashion changes, like

    Use PyXXX_GET_SIZE macros rather than Py_SIZE for concrete types.
    Renamed Py_SETREF to Py_XSETREF
    _Py_identifier to _Py_IDENTIFIER

and various bug fixes due to lack of 100% perfect type checking (& such) at the C level. The support burden for C code is much higher than for Python code.
... Sure, but the question I think that this thread and me are proposing are what "first()" means to everyone. I think you and I are on the same page, but it's a question as to whether others are as well. :)
Except that we're not jumping in cold here. I've been pushing to adopt _exactly_ what more-itertools has done for years already. Steven appears to think `first()` shouldn't exist at all unless it's restricted to container types and spelled in some other way. I'm not sure anyone else has complained about the intended semantics. People always argue about end cases, but more-itertools gave a choice in a simple way: for an exhausted iterator, raise an exception, which can be suppressed by supplying a default to return instead. Nobody has actually argued that it should always - or never - raise an exception in that case. Guido's _implementation_ happened to never raise an exception, but not even he could be bothered to say a word about why that may be _desirable_. And I snipped the rest because it was so relentlessly agreeable ;-)
On Dec 9, 2019, at 18:23, Juancarlo Añez <apalala@gmail.com> wrote:
So while 1-arg next() and the try/except StopIteration pattern are essential and well-known, 2-arg next() is relatively esoteric and consequently (I think) not nearly as well-known.
And knowing that you must use iter() for 2-arg next() to (maybe) work right is idiosyncratic.
It takes a "Python historian" to understand why it may be correct to use:
the_first_item_if_ordered = next(iter(container), 'not found')
Why “may be correct”? It’s always correct. You can always call iter on any Iterable, you can always call next on the result with a default, so this always works. And you don’t need to be a Python historian to know why it works; it follows directly from the documentation of the two functions and the meaning of Iterable. (You may need to be a Python historian to understand why people often don’t remember this and therefore don’t use it, but that seems like the kind of thing you’d expect to go to a historian for.)
While the semantics of first() (whichever the chosen implementation) are straightforward to explain:
one_item_if_any = first(return_a_set(), default=-1)
or: the_first_item = first(sorted(container))
But they both work exactly as well with next:

    one_item_if_any = next(iter(return_a_set()), -1)

That's exactly what first means, and the doc string for the more_itertools version even directly tells you that it's just a slightly shorter way to write the same thing. If the argument for first is that it can do things you can't do otherwise, or that there's some subtle and complicated case in which next may not work that only historians can understand, the argument is just wrong. The real argument for first is that it's (hopefully) more discoverable than 2-arg next. (That, and Tim's argument that we should lower the bar for inclusion in itertools.)
I agree with others in that the "default" argument should be explicit instead of implied. It's how dict.get(), and dict.pop(), etc., work. The exceptions raised when nothing can be returned from first() and there is no default= should be the same.
KeyError? Why? I think the ValueError suggested by many people in this thread (and used in multiple places in more_itertools) makes more sense. Trying to get the first value out of an empty Iterable is a lot like trying to use tuple unpacking on an empty Iterable; it’s not much like trying to look up a key in an empty dict.
On Mon, Dec 09, 2019 at 10:19:43PM -0400, Juancarlo Añez wrote:
And knowing that you must use *iter()* for 2-arg *next()* to (maybe) work right is idiosyncratic.
If you think that knowing to use `iter` before calling `next` is somehow advanced or difficult knowledge, I think that you have an exaggerated idea of the ineptitude of the average Python programmer. Anyone who calls `next` directly on a list or other non-iterator will get a TypeError:

    TypeError: 'list' object is not an iterator

which makes it pretty obvious that the solution is to call `iter` first. I know absolute beginners won't read error messages, but that's a skill that people learn pretty quickly.
It takes a "Python historian" to understand why it *may be correct* to use:
the_first_item_if_ordered = next(iter(container), 'not found')
What do you mean by "may be correct"? Can you give an example of when it isn't correct, assuming `container` is an iterable?
While the semantics of *first()* (whichever the chosen implementation) are straightforward to explain:
one_item_if_any = first(return_a_set(), default=-1)
I don't think it's more straightforward than the `next` version. There are at least two gotchas, or possibly two sides of the same gotcha, one minor and one (in my opinion) major. The first is that, in a sense, the name `first` is misleading: it doesn't return the *first* item from an iterator, since the first item may be long gone; it returns the *next* item of an iterator. If I have `letters = iter("abcde...z")` and have already advanced into the middle of the iterator, a naive user might expect that first(letters) will return "a" rather than whatever the next letter happens to be. But the more serious gotcha is that `first` behaves very differently when called repeatedly on an iterator compared to other iterables.
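To make that first gotcha concrete, assuming the simple first() that merely wraps next(iter(...)) discussed earlier in the thread:

```
def first(iterable, default=None):
    return next(iter(iterable), default)

letters = iter("abcdefgh")
next(letters)            # the iterator has already been advanced past 'a'
print(first(letters))    # prints 'b' -- the *next* letter, not the first
```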
I agree with others in that the "*default*" argument should be explicit instead of implied. It's how *dict.get()*, and *dict.pop()*, etc., work.
When you say "explicit instead of implied", do you mean that there is no default value for the default? If so, that's not how dict.get works: py> {}.get('some key') is None True -- Steven
On Sun, 8 Dec 2019 at 18:42, Guido van Rossum <guido@python.org> wrote:
We're not changing next(). It's too fundamental to change even subtly.
I don't think that anyone has proposed to change the behaviour of next. I have suggested that if there is to be a new function very similar to next then it can also solve another problem with next which is the case where there should be no default and an empty iterator should raise (something other than StopIteration).
We might add itertools.first(), but not builtins.first(). This kind of functionality is not fundamental but it's easy to get slightly wrong (witness many hasty attempts in these threads).
itertools.first() should be implemented in C, but its semantics should be given by this (well, let me see if I can get it right):
def first(it, /, default=None):
    it = iter(it)
    try:
        return next(it)
    except StopIteration:
        return default
This version assumes a default default of None so it can't be used to raise on an empty iterable:
>>> print(first([]))
None
-- Oscar
On Sun, Dec 8, 2019 at 2:23 PM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
On Sun, 8 Dec 2019 at 18:42, Guido van Rossum <guido@python.org> wrote:
We're not changing next(). It's too fundamental to change even subtly.
I don't think that anyone has proposed to change the behaviour of next. I have suggested that if there is to be a new function very similar to next then it can also solve another problem with next which is the case where there should be no default and an empty iterator should raise (something other than StopIteration).
Isn't that much less common?
We might add itertools.first(), but not builtins.first(). This kind of functionality is not fundamental but it's easy to get slightly wrong (witness many hasty attempts in these threads).
itertools.first() should be implemented in C, but its semantics should be given by this (well, let me see if I can get it right):
def first(it, /, default=None):
    it = iter(it)
    try:
        return next(it)
    except StopIteration:
        return default
This version assumes a default default of None so it can't be used to raise on an empty iterable:
>>> print(first([]))
None
The whole point of first() would be to make it *not* raise. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
On Sun, Dec 08, 2019 at 10:37:51AM -0800, Guido van Rossum wrote:
We're not changing next(). It's too fundamental to change even subtly.
We might add itertools.first(), but not builtins.first(). This kind of functionality is not fundamental but it's easy to get slightly wrong (witness many hasty attempts in these threads).
What do you think of my suggestion that we promote the itertools recipe "take" into a function?

https://mail.python.org/archives/list/python-ideas@python.org/message/O5RYM6...

I don't think that "first N items" is a case of YAGNI, because I absolutely *have* needed it. In the past I've done this:

    items = list(items)
    # Pad with None to make sure that there are enough items.
    items.extend((None,) * (count - len(items)))
    items = iter(items)
    values = [next(items) for i in range(count)]

but using itertools is much better:

    values = take(count, items, default=None)

Using `take` is more reliable than repeated calls to `first`:

    values = [first(items, default=None) for i in range(count)]

due to the difference in behaviour when passed an iterator versus a non-iterator iterable. It's true that repeated calls to `take` will have the same issue, but for the simple use-case of wanting the first N elements, including N=1, `take` avoids needing an explicit loop and so avoids the surprising difference between iterators and other iterables.

-- Steven
On 2019-12-09 01:04, Steven D'Aprano wrote:
On Sun, Dec 08, 2019 at 10:37:51AM -0800, Guido van Rossum wrote:
We're not changing next(). It's too fundamental to change even subtly.
We might add itertools.first(), but not builtins.first(). This kind of functionality is not fundamental but it's easy to get slightly wrong (witness many hasty attempts in these threads).
What do you think of my suggestion that we promote the itertools recipe "take" into a function?
https://mail.python.org/archives/list/python-ideas@python.org/message/O5RYM6...
I don't think that "first N items" is a case of YAGNI, because I absolutely *have* needed it. In the past I've done this:
    items = list(items)
    # Pad with None to make sure that there are enough items.
    items.extend((None,) * (count - len(items)))
    items = iter(items)
    values = [next(items) for i in range(count)]
but using itertools is much better:
values = take(count, items, default=None)
Using `take` is more reliable than repeated calls to `first`:
values = [first(items, default=None) for i in range(count)]
due to the difference in behaviour when passed an iterator versus a non-iterator iterable.
It's true that repeated calls to `take` will have the same issue, but for the simple use-case of wanting the first N elements, including N=1, `take` avoids needing an explicit loop and so avoids the surprising difference between iterators and other iterables.
Why is the count first? Why not have the (definitely required) items first and let the count have a default of 1?
On Mon, Dec 09, 2019 at 01:44:15AM +0000, MRAB wrote:
values = take(count, items, default=None)
[...]
Why is the count first? Why not have the (definitely required) items first and let the count have a default of 1?
I lifted the bulk of the function, including the signature, from the recipe in the itertools documentation. I suspect the reason the recipe specifies the count first is because that follows the standard order in English: "take two of the eggs" rather than "take eggs two of". -- Steven
[Steven D'Aprano <steve@pearwood.info>] wrote:
values = take(count, items, default=None)
[MRAB]
Why is the count first? Why not have the (definitely required) items first and let the count have a default of 1?
[Steven]
I lifted the bulk of the function, including the signature, from the recipe in the itertools documentation.
I suspect the reason the recipe specifies the count first is because that follows the standard order in English:
"take two of the eggs"
rather than "take eggs two of".
Part of it, but I believe it's more following prior art, like Haskell's take:

http://zvon.org/other/haskell/Outputprelude/take_f.html

In that language, the case for putting the count first is overwhelming: all functions in Haskell take a single argument, and currying is ubiquitous. Being able, e.g., to write

    take3 = take 3

to get a function that returns the first 3 elements of whatever that function is applied to is far more useful, e.g., than being able to write

    take_from_x = take x

to get a function such that `take_from_x n` returns the first `n` elements of `x`. The same follows in a weaker way in Python for fans of functools.partial.
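The Python analogue, combining the count-first take() from the itertools recipes with functools.partial, is roughly:

```
from functools import partial
from itertools import islice

def take(n, iterable):
    # the count-first signature used by the itertools recipe
    return list(islice(iterable, n))

take3 = partial(take, 3)    # the Python spelling of Haskell's `take 3`
print(take3("abcdefg"))     # ['a', 'b', 'c']
```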
On Dec 9, 2019, at 17:16, Tim Peters <tim.peters@gmail.com> wrote:
[Steven D'Aprano <steve@pearwood.info>] wrote:
values = take(count, items, default=None)
[MRAB]
Why is the count first? Why not have the (definitely required) items first and let the count have a default of 1?
[Steven]
I lifted the bulk of the function, including the signature, from the recipe in the itertools documentation.
I suspect the reason the recipe specifies the count first is because that follows the standard order in English:
"take two of the eggs"
rather than "take eggs two of".
Part of it, but I believe it's more following prior art, like Haskell's take:
http://zvon.org/other/haskell/Outputprelude/take_f.html
In that language, the case for putting the count first is overwhelming: all functions in Haskell take a single argument, and currying is ubiquitous.
Being able, e.g., to write
take3 = take 3
to get a function that returns the first 3 elements of whatever that function is applied to is far more useful, e.g., than being able to write
take_from_x = take x
to get a function such that `take_from_x n` returns the first `n` elements of `x`.
But also, the decision isn’t as important in Haskell, because you can always flip take and then curry that. So you can just follow the convention everywhere without worrying about whether this is one of the rare cases where the other way around might actually be useful more often.
On Mon, Dec 09, 2019 at 07:12:20PM -0600, Tim Peters wrote:
Part of it, but I believe it's more following prior art, like Haskell's take:
http://zvon.org/other/haskell/Outputprelude/take_f.html
In that language, the case for putting the count first is overwhelming: all functions in Haskell take a single argument, and currying is ubiquitous. [ snip details ]
Oh nice! I should remember the usefulness of currying and functools.partial when designing function signatures. -- Steven
On Sun, Dec 8, 2019 at 5:20 PM Steven D'Aprano <steve@pearwood.info> wrote:
What do you think of my suggestion that we promote the itertools recipe "take" into a function?
https://mail.python.org/archives/list/python-ideas@python.org/message/O5RYM6...
I'll leave it to others to weigh in on that. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
[Steven D'Aprano <steve@pearwood.info>]
What do you think of my suggestion that we promote the itertools recipe "take" into a function?
https://mail.python.org/archives/list/python-ideas@python.org/message/O5RYM6...
That it's independent of whether `first()` should be added. I would _much_ rather write - and read:

    a = first(iterable, default)

than

    a = take(1, iterable, default)[0]

for much the same reasons I'd much rather write and read "2" than "int(10 / 5)" ;-)

But, as stated before, I'm not a minimalist when it comes to itertools. In turn, I'd much rather write & read `take(n, iterable, default)` than the stuff it takes to plug in "enough" defaults, when needed, without it.
On Mon, Dec 9, 2019 at 10:39 AM Tim Peters <tim.peters@gmail.com> wrote:
[Steven D'Aprano <steve@pearwood.info>]
What do you think of my suggestion that we promote the itertools recipe "take" into a function?
https://mail.python.org/archives/list/python-ideas@python.org/message/O5RYM6...
That it's independent of whether `first()` should be added.
I would _much_ rather write - and read:
a = first(iterable, default)
than
a = take(1, iterable, default)[0]
for much the same reasons I'd much rather write and read "2" than "int(10 / 5)" ;-)
Ditto from me. I bumped up against calling next() recently for the first item and had to rely on the fact that I controlled the code to ignore StopIteration, since it was in a small script. But I have needed just the first item so many times before, and didn't want any exception to propagate out, that I would have loved to have such a function instead of having to add in a 'try' as well. -Brett
But, as stated before, I'm not a minimalist when it comes to itertools. In turn, I'd much rather write & read `take(n, iterable, default)` than the stuff it takes to plug in "enough" defaults, when needed, without it.
On Mon, Dec 09, 2019 at 12:35:28PM -0600, Tim Peters wrote:
I would _much_ rather write - and read:
a = first(iterable, default)
than
a = take(1, iterable, default)[0]
for much the same reasons I'd much rather write and read "2" than "int(10 / 5)" ;-)
Fair enough. If you're binding directly to a result, you can avoid the subscripting by using sequence unpacking, which might look nicer: (a,) = take(1, iterable, default) or write your own one-liner helper :-) I'm thinking that, given Raymond's long reluctance to add additional functions to itertools, it might be easier to add `take` since it is a strictly more powerful function than `first`. If we can only get one, I'd go for `take` since it can do everything `first` would do and more. -- Steven
On 9/12/19 7:37 am, Guido van Rossum wrote:
def first(it, /, default=None):
    it = iter(it)
    try:
        return next(it)
    except StopIteration:
        return default
Can you provide any insight into why you think it's better for it never to raise an exception, as opposed to raising something other than StopIteration when the iterator is empty and no default is specified?

There seem to be two kinds of use case for this:

1. The iterator may or may not be empty, and you don't want the hassle of having to catch an exception.

2. You expect the iterator to never be empty; if it is, then it's a bug, and you would like to get an exception, but not StopIteration because that can mess other things up.

Your version of the function seems to be aimed exclusively at case 1. If it were to raise ValueError on an empty iterable unless a default were explicitly given, it would address both cases.

-- Greg
[Guido]
def first(it, /, default=None):
    it = iter(it)
    try:
        return next(it)
    except StopIteration:
        return default
[Greg Ewing <greg.ewing@canterbury.ac.nz>]
Can you provide any insight into why you think it's better for it never to raise an exception, as opposed to raising something other than StopIteration when the iterator is empty and no default is specified?
Worth pursuing. As Wes Turner's post just reminded me, the more-itertools `first()` raises an exception (ValueError) when the iterable argument is exhausted and a default is not supplied. And I think that's The Rightest Thing, for reasons you explained (repeated here for completeness, with no new comments from me, beyond that #2 is my most common use case):
There seem to be two kinds of use case for this:
1. The iterator may or may not be empty, and you don't want the hassle of having to catch an exception.
2. You expect the iterator to never be empty; if it is, then it's a bug, and you would like to get an exception, but not StopIteration because that can mess other things up.
Your version of the function seems to be aimed exclusively at case 1. If it were to raise ValueError on an empty iterable unless a default were explicitly given, it would address both cases.
On 12/9/2019 4:54 PM, Greg Ewing wrote:
On 9/12/19 7:37 am, Guido van Rossum wrote:
def first(it, /, default=None):
    it = iter(it)
    try:
        return next(it)
    except StopIteration:
        return default
Can you provide any insight into why you think it's better for it never to raise an exception, as opposed to raising something other than StopIteration when the iterator is empty and no default is specified?
There seem to be two kinds of use case for this:
1. The iterator may or may not be empty, and you don't want the hassle of having to catch an exception.
2. You expect the iterator to never be empty; if it is, then it's a bug, and you would like to get an exception, but not StopIteration because that can mess other things up.
Your version of the function seems to be aimed exclusively at case 1. If it were to raise ValueError on an empty iterable unless a default were explicitly given, it would address both cases.
I agree that case 2 is more common for me. It's the same reason I like:

    s = ''
    # some code
    s[0]

to raise an exception: I'm expecting something to be there, and it's a programming error if it's not.

Eric
On Tue, Dec 10, 2019 at 10:54:10AM +1300, Greg Ewing wrote:
Can you provide any insight into why you think it's better for it never to raise an exception, as opposed to raising something other than StopIteration when the iterator is empty and no default is specified?
Speaking for myself, not Guido, functions which raise are often difficult to use, especially if you can't "Look Before You Leap", since you have to wrap them in a try...except block to use them. Since we don't have an expression form to catch exceptions, you have to lift the possibly-may-fail call out of the expression. E.g. with strings you can write variants of:

    result = spam() if thestring.find(x) > 2 else eggs()

versus something like this:

    try:
        idx = thestring.index(x)
    except ValueError:
        idx = -1
    result = spam() if idx > 2 else eggs()

which is much less convenient and much more boilerplatey.
Your version of the function seems to be aimed exclusively at case 1. If it were to raise ValueError on an empty iterable unless a default were explicitly given, it would address both cases.
You can always pass an impossible value and test for it. If your iterables are never None, pass None as the default. Or NotImplemented also makes a good sentinel. If either of those could be a legitimate value, make your own sentinel:

    NULL = object()
    result = first(iterable, NULL)
    if result is NULL:
        handle_error()

-- Steven
On Tue, 10 Dec 2019 at 00:26, Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, Dec 10, 2019 at 10:54:10AM +1300, Greg Ewing wrote:
Can you provide any insight into why you think it's better for it never to raise an exception, as opposed to raising something other than StopIteration when the iterator is empty and no default is specified?
Speaking for myself, not Guido, functions which raise are often difficult to use, especially if you can't "Look Before You Leap", since you have to wrap them in a try...except block to use them.
If the function allows a default to be supplied instead of raising then you don't need try/except:

    val = first(obj)           # raises on empty
    val = first(obj, default)  # gives default on empty

That's how next works and also how first from more-itertools works.

-- Oscar
On Sun, Dec 08, 2019 at 01:45:08PM +0000, Oscar Benjamin wrote:
On Sat, 7 Dec 2019 at 00:43, Steven D'Aprano <steve@pearwood.info> wrote:
[...]
But there's a major difference in behaviour depending on your input, and one which is surely going to lead to bugs from people who didn't realise that iterator arguments and iterable arguments will behave differently:
# non-iterator iterable
py> obj = [1, 2, 3, 4]
py> [first(obj) for __ in range(5)]
[1, 1, 1, 1, 1]
# iterator
py> obj = iter([1, 2, 3, 4])
py> [first(obj) for __ in range(5)]
[1, 2, 3, 4, None]
We could document the difference in behaviour, but it will still bite people and surprise them.
This kind of confusion can come with iterators and iterables all the time.
Do you have some concrete examples of where this is common, because I don't recall seeing anything like this ever. Since next() doesn't accept a non-iterator, this confusion doesn't come up for next. I suppose it could come up with itertools islice:

    py> s = "abcdefghijklmn"
    py> [list(itertools.islice(s, 0, 1)) for __ in range(5)]
    [['a'], ['a'], ['a'], ['a'], ['a']]

but I've never seen that in real code, so I wouldn't say it happens all the time, or even a lot of the time. YMMV I guess, but I like to think I have a reasonable grasp of the kinds of gotchas people often trip over, and this isn't one of them. The closest I can think of is the perennial gotcha that you can iterate over a sequence as often as you like, but an iterator only once.
I can see that the name "first" is potentially confusing. Another possible name is "take" which might make more sense in the context of partially consumed iterators.
There's already a take() in the itertools recipes. Rather than add a new "first" itertools function, I'd rather promote `take` out of the recipes and give it an optional default, something like this:

    def take(n, iterable, /, *default):
        if default:
            (default,) = default
            iterable = chain(iterable, repeat(default))
        return list(islice(iterable, n))

"Get the first item" with a default then becomes:

    a = take(1, iterable, default)[0]

"Get the first 5 items" becomes:

    a, b, c, d, e = take(5, iterable, default)

If you want to distinguish the case where the iterable is empty or shorter than you expect, you can pass a known sentinel and check for that, or pass no default at all and test the length of the resulting list.

-- Steven
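A quick usage sketch of that version of take (the definition is repeated so the snippet stands alone; outputs are shown as comments):

```
from itertools import chain, islice, repeat

def take(n, iterable, /, *default):
    if default:
        (default,) = default
        iterable = chain(iterable, repeat(default))
    return list(islice(iterable, n))

print(take(5, [1, 2, 3]))        # [1, 2, 3] -- no padding without a default
print(take(5, [1, 2, 3], None))  # [1, 2, 3, None, None]
print(take(1, [], 'missing'))    # ['missing'] -- the first(x, default) case
```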
Excuse me, but extraordinarily I agree with D'Aprano :D

Usually if you want the first element of an iterable, you just have to do:
```
it = iter(iterable)
first = next(it)
```
Yes, `first()` is really sexy... but a simple question: where is the iterator? With the code above, I can keep the iterator around and use it to access the rest of the iterable. `first()` creates an iterator, uses it and throws it away. What a waste! Greta Thunberg is very angry with you all :D

So 200 posts for one line less? I really don't catch the point.
On Thu, Dec 26, 2019 at 7:58 PM Marco Sulla via Python-ideas < python-ideas@python.org> wrote:
So 200 posts for one line less? I really don't catch the point.
You apparently did not read the posts, because the point was whether it raises or returns a default value, not whether it saves one line or ten.
Eric Fahlgren wrote:
You apparently did not read the posts, because the point was whether it raises or returns a default value, not whether it saves one line or ten.
You apparently don't know Python:
```
next(iterator)             # raises an exception if no more elements
next(iterator, _sentinel)  # returns _sentinel if no more elements
```
So, what's the advantage of having `first()`? Furthermore, **how can you be sure this is the __real first element__ of the iterable, if you pass an iterator?**

The disadvantage is that you hide the iterator, that could be useful later in the code.

KISSes.
Hi folks, moderator here. No need to respond. thanks, --titus On Fri, Dec 27, 2019 at 03:37:14PM -0000, Marco Sulla via Python-ideas wrote:
Eric Fahlgren wrote:
You apparently did not read the posts, because the point was whether it raises or returns a default value, not whether it saves one line or ten.
You apparently don't know Python:
```
next(iterator)             # raises an exception if no more elements
next(iterator, _sentinel)  # returns _sentinel if no more elements
```
So, what's the advantage of having `first()`? Furthermore, **how can you be sure this is the __real first element__ of the iterable, if you pass an iterator?**
The disadvantage is that you hide the iterator, that could be useful later in the code.
KISSes.
-- C. Titus Brown, ctbrown@ucdavis.edu
On 12/27/2019 10:37 AM, Marco Sulla via Python-ideas wrote:
Eric Fahlgren wrote:
You apparently did not read the posts, because the point was whether it raises or returns a default value, not whether it saves one line or ten.

You apparently don't know Python:
Please be more respectful.
```
next(iterator)             # raises an exception if no more elements
next(iterator, _sentinel)  # returns _sentinel if no more elements
```
So, what's the advantage of having `first()`?
As discussed, it would raise a different exception. One not involved in the normal looping control mechanism.
Furthermore, **how can you be sure this is the __real first element__ of the iterable, if you pass an iterator?**
The disadvantage is that you hide the iterator, that could be useful later in the code.
You could pass in an iterator, and continue to use it. Many examples have shown this. Eric
participants (26)

- Anders Hovmöller
- Andrew Barnert
- Barry Scott
- Brett Cannon
- C. Titus Brown
- Chris Angelico
- Christopher Barker
- Daniel Moisset
- Eric Fahlgren
- Eric V. Smith
- Greg Ewing
- Guido van Rossum
- Juancarlo Añez
- Kirill Balunov
- Kyle Stanley
- Marco Sulla
- MRAB
- Oleg Broytman
- Oscar Benjamin
- Paul Moore
- Random832
- Serhiy Storchaka
- Stephen J. Turnbull
- Steven D'Aprano
- Tim Peters
- Wes Turner