On Sat, Dec 7, 2019, 11:30 PM Andrew Barnert <abarnert@yahoo.com> wrote:
On Dec 7, 2019, at 18:09, Wes Turner <wes.turner@gmail.com> wrote:
On Sat, Dec 7, 2019, 8:20 PM Andrew Barnert <abarnert@yahoo.com> wrote:
On Dec 7, 2019, at 07:33, Wes Turner <wes.turner@gmail.com> wrote:
+1 for itertools.first(seq, default=Exception) *and* itertools.one(seq, default=Exception)
What does default=Exception mean? What happens if you pass a different value? Does it do one thing if the argument is a type that’s a subclass of Exception (or of BaseException?) and a different thing if it’s any other value?
That's a good point: Exception is a bad sentinel value. Is None a good default value? What if the genexpr'd iterable is [None, 2, 3]?
Here are more_itertools.more.one() and more_itertools.more.first() without docstrings from https://more-itertools.readthedocs.io/en/stable/_modules/more_itertools/more... :
```
_marker = object()  # private module-level sentinel in more_itertools.more


def one(iterable, too_short=None, too_long=None):
    it = iter(iterable)

    try:
        value = next(it)
    except StopIteration:
        raise too_short or ValueError('too few items in iterable (expected 1)')

    try:
        next(it)
    except StopIteration:
        pass
    else:
        raise too_long or ValueError('too many items in iterable (expected 1)')

    return value


def first(iterable, default=_marker):
    try:
        return next(iter(iterable))
    except StopIteration:
        # I'm on the edge about raising ValueError instead of StopIteration. At
        # the moment, ValueError wins, because the caller could conceivably
        # want to do something different with flow control when I raise the
        # exception, and it's weird to explicitly catch StopIteration.
        if default is _marker:
            raise ValueError('first() was called on an empty iterable, and no '
                             'default value was provided.')
        return default
```

I would argue that there could be subclasses of ValueError for .one() that would also be appropriate for .first() (and/or .take(iterable, count=1, default=_default)):

    class TooShortValueError(ValueError): pass
    class TooLongValueError(ValueError): pass

(Where, again, SQLAlchemy has NoResultFound and MultipleResultsFound.) The names are less important than being able to distinguish between the two cases. And then itertools.one() could be interface-compatible with more_itertools.more.one():

    def one(iterable, too_short=TooShortValueError, too_long=TooLongValueError): ...
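A minimal sketch of that suggestion, assuming the hypothetical exception names TooShortValueError and TooLongValueError (neither exists in itertools or more-itertools). Because both subclass ValueError, existing `except ValueError` handlers keep working, while new callers can tell "empty" from "too many". Unlike the more_itertools version quoted above, which raises whatever object is passed, this sketch treats too_short and too_long as exception classes; describe() is a hypothetical caller.

```python
class TooShortValueError(ValueError):
    """Hypothetical: the iterable had no items."""


class TooLongValueError(ValueError):
    """Hypothetical: the iterable had more than one item."""


def one(iterable, too_short=TooShortValueError, too_long=TooLongValueError):
    """Return the only item of iterable (sketch; exception *classes* as defaults)."""
    it = iter(iterable)
    try:
        value = next(it)
    except StopIteration:
        raise too_short('too few items in iterable (expected 1)') from None
    try:
        next(it)
    except StopIteration:
        return value  # exactly one item
    raise too_long('too many items in iterable (expected 1)')


def describe(iterable):
    """Usage example: callers can tell the two failure cases apart."""
    try:
        return f'exactly one: {one(iterable)!r}'
    except TooShortValueError:
        return 'empty'
    except TooLongValueError:
        return 'more than one'
```

Note that one() has to pull a second item to detect the "too many" case, so on an iterator it consumes up to two items.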
That’s a common issue in Python. When you can’t use None as a sentinel because it could be a valid user input or return value, you just create a private module or class attribute that can’t equal anything the user could pass in, like this:
_sentinel = object()
And then:
def spam(stuff, default=_sentinel):
    if default is _sentinel:
        ...  # do single-argument stuff here
    else:
        ...  # do default-value stuff here
`None` is not a good default value for .first() (or .one()) because None may be the first item in the iterable. It should be necessary to explicitly specify default=None if that's what's expected.
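Putting the sentinel pattern and the None concern together, here is a minimal sketch of a first() (an illustrative name, not an existing itertools API) where None works both as a real first item and as an explicit default:

```python
_sentinel = object()  # private marker object; callers cannot pass it by accident


def first(iterable, default=_sentinel):
    """Return the first item of iterable, or default if it is empty.

    With no default, an empty iterable raises ValueError, so None stays
    available both as a real item and as an explicit default.
    """
    for item in iterable:
        return item  # stop after the first item
    if default is _sentinel:
        raise ValueError('first() called on an empty iterable with no default')
    return default
```

With this, first([None, 2, 3]) returns None because None really is the first item, first([], default=None) returns None because the caller asked for it, and first([]) raises ValueError.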
This seems like the kind of thing that should be explained somewhere in every tutorial (including the official one), but most people end up finding it only by accident, reading some code that uses it and trying to figure out what it does and why, the same way people figure out how useful two-argument iter is, along with a couple of other things.
I'll second a recommendation to note the existence of two-argument iter() and two-argument next() in the docstring for itertools.first()
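For reference, this is what the two stdlib features look like (only built-ins and stdlib modules; the variable names are illustrative):

```python
import functools
import io

# Two-argument next(iterator, default): returns default once the iterator is
# exhausted, instead of raising StopIteration.
missing = next(iter([]), None)          # None
present = next(iter(['a', 'b']), None)  # 'a'

# Two-argument iter(callable, sentinel): calls callable() repeatedly and stops
# as soon as it returns a value equal to sentinel. A classic use is reading
# fixed-size blocks until read() returns b''.
stream = io.BytesIO(b'abcdefgh')
blocks = list(iter(functools.partial(stream.read, 3), b''))  # [b'abc', b'def', b'gh']
```

next(iter(iterable), default) is essentially first() with a default already, which is why pointing at it from the docstring seems worthwhile.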
Also, “seq” implies that you’re expecting these to be used on sequences, not general iterables. In that case, why not just use [0]?
I chose `seq` as the argument because I was looking at toolz.itertoolz.first(), which has no default= argument.
Ah. I’m not sure why toolz uses seq for arguments that are usually iterators, but I guess that’s not horrible or anything. In itertools and more-itertools, the argument is usually called iterable, and seq is reserved specifically for the ones that should be sequences, like chunked(iterable, n) vs. sliced(seq, n). But as useful as that convention is, I suppose it’s not a universal thing that everyone knows and follows; it’s not documented or explained anywhere, you just kind of have to guess the distinction from the names.
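To make the iterable/seq naming convention concrete, here are simplified sketches (hypothetical names; not the more-itertools implementations) of why a chunked()-style function can take any iterable while a sliced()-style function needs a sequence:

```python
from itertools import count, islice, takewhile


def chunked_sketch(iterable, n):
    """Yield lists of up to n items; works on any iterable, even iterators."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))  # only needs next(), via islice
        if not chunk:
            return
        yield chunk


def sliced_sketch(seq, n):
    """Yield seq[i:i + n] slices; needs a sliceable sequence, not an iterator."""
    # Slicing keeps the input type (str slices are str, tuple slices are
    # tuples); takewhile(len, ...) stops at the first empty slice.
    return takewhile(len, (seq[i:i + n] for i in count(0, n)))
```

Passing an iterator to sliced_sketch fails with a TypeError as soon as it tries to subscript it, which is exactly what the seq parameter name is warning about.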
Though, .first() (or .one()) on an unordered iterable is effectively first(shuffle(iterable)), which *could* raise an annotation exception at compile time.
I’m not sure what you mean by an “annotation exception”. You mean an error from static type checking in mypy or something? I’m not sure why it would be an error, unless you got the annotation wrong. It should be Iterable, and that will work for iterators and sequences and sets and so on just fine.
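As a sketch of the plain Iterable annotation being described (function name and sentinel are illustrative), typing.overload lets a type checker infer the result type with and without a default, and lists, sets, and iterators are all accepted:

```python
from typing import Iterable, TypeVar, Union, overload

T = TypeVar('T')
D = TypeVar('D')

_MISSING = object()  # private sentinel; never equal to a caller's value


@overload
def first(iterable: Iterable[T]) -> T: ...


@overload
def first(iterable: Iterable[T], default: D) -> Union[T, D]: ...


def first(iterable, default=_MISSING):
    """Runtime implementation shared by both overloads above."""
    for item in iterable:
        return item
    if default is _MISSING:
        raise ValueError('first() called on an empty iterable with no default')
    return default
```

With these annotations, a checker such as mypy should infer int for first([1, 2]) and Optional[int] for first(some_ints, None), and it has nothing to object to when a set is passed.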
Also, it’s not really like shuffle, because even most “unordered iterables” in Python, like sets, actually have an order. It’s not guaranteed to be a meaningful one, but it’s not guaranteed to be meaningless either. If you need a meaningless order (e.g., you’re creating a guessing game where you don’t want the answer to be the same every time anyone runs the game, or for security reasons), you really do need to explicitly randomize. For example, if s = set(range(10,0,-1)), it’s not guaranteed anywhere that next(iter(s)) will be 1, but it is still always 1 in any version of CPython. Worse, whatever next(iter(s)) is, if you call next(iter(s)) again (without mutating s in between), you’ll get the same value from the new iterator in any version of any Python implementation.
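A short stdlib-only illustration of both points: repeated iteration over an unmodified set starts at the same element, and genuine unpredictability has to be requested explicitly (variable names are illustrative):

```python
import random

s = set(range(10, 0, -1))

# Without mutation in between, two fresh iterators start at the same element.
a = next(iter(s))
b = next(iter(s))

# When the choice really must be unpredictable, randomize explicitly.
# Sets are not indexable, and random.sample() no longer accepts sets
# (removed in Python 3.11), so convert to a sequence first.
pick = random.choice(list(s))
three = random.sample(sorted(s), 3)
```

So first() on a set is deterministic in practice but arbitrary in principle; neither property makes it a substitute for random.choice().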
Taking first(unordered_sequence) *is* like shuffle. Sets merely seem to be ordered when the items are integers that hash to said integer: https://stackoverflow.com/a/45589593 Does .first() need to solve for this with type annotations and/or just a friendly docstring reminder?
But if you don’t care whether it’s meaningful or meaningless, first, one, etc. on a set are fine.
Sets are unordered iterables, so they aren't sequences; they aren't OrderedIterables.
Right, but a sequence isn’t just an ordered iterable, it’s also random-access indexable (plus a few other things). An itertools.count(), a typical sorteddict type, a typical linked list, etc. are all ordered but not sequences.
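The distinctions being drawn here can be checked directly against collections.abc (stdlib only). Being ordered, having __getitem__, being a Sequence, and being Reversible are separate properties:

```python
from collections.abc import Iterable, Iterator, Mapping, Reversible, Sequence
from itertools import count

c = count()
count_is_iterable = isinstance(c, Iterable)     # True
count_is_iterator = isinstance(c, Iterator)     # True
count_is_sequence = isinstance(c, Sequence)     # False: no indexing, no len()

# Having __getitem__ does not make something a Sequence: dict has it, but is a Mapping.
d = {'b': 1, 'a': 2}
dict_has_getitem = hasattr(d, '__getitem__')    # True
dict_is_sequence = isinstance(d, Sequence)      # False
dict_is_mapping = isinstance(d, Mapping)        # True

# Every built-in Sequence does have __getitem__.
seqs = [[1, 2], (1, 2), 'ab', range(2)]
seqs_ok = all(isinstance(x, Sequence) and hasattr(x, '__getitem__') for x in seqs)  # True

# Reversible is a separate axis: dict has __reversed__ (Python 3.8+), set does
# not, and neither does count().
dict_reversible = isinstance(d, Reversible)      # True on 3.8+
set_reversible = isinstance({1, 2}, Reversible)  # False
count_reversible = isinstance(c, Reversible)     # False
```

count() is ordered but neither a Sequence nor Reversible, which is why neither ABC can stand in for "ordered".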
In terms of math, itertools.count() is an infinite ordered sequence (for which there is no implementation of lookup by subscript). In terms of Python, the iterator returned by itertools.count() is an Iterable (hasattr(x, '__iter__')) that does not implement __getitem__ (so it implements neither the Sequence nor the Mapping abstract type interface).

https://github.com/python/typeshed/blob/master/stdlib/3/itertools.pyi :

    _N = TypeVar('_N', int, float)
    def count(start: _N = ..., step: _N = ...) -> Iterator[_N]: ...  # more general types?

A collections.abc.Ordered type might make sense if Reversible does not imply Ordered. A hasattr('__iter_ordered__') might've made sense.

    hasattr('__getitem__') !=> Sequence
    Sequence => hasattr('__getitem__')

The more-itertools functions that require sequences (and name them seq) usually require indexing or slicing.
That may be a good convention. But in terms of type annotations (https://docs.python.org/3/library/collections.abc.html):

- [x] Iterable (__iter__)
- [x] Collection (__contains__, __iter__, __len__)
- [x] Mapping / MutableMapping (Collection)
- [x] Sequence / MutableSequence (Sequence, Reversible, Collection (Iterable))
- [x] Reversible
- [ ] Ordered

Does 'Reversible' => (imply) Ordered, which would then be redundant?

Math definition (setting aside a Number sequence-element type restriction):

    Sequence = AllOf(Iterable, Ordered)

More_itertools convention, AFAIU?:

    seq => AllOf(Iterable, Mapping, Ordered)
    seq => all(hasattr(seq, attr) for attr in ('__iter__', '__getitem__'))

How does this apply to .first()? If I call .first() on an unordered Iterable like a set, I may not get the first item; this can/may/will sometimes fail:

    assert first({'a', 'b'}) == 'a'

If there were an Ordered ABC (maybe unnecessarily in addition to Reversible), we could specify:

    # collections.abc
    class OrderedIterable(Iterable, Ordered):
        pass

    # itertools
    def first(iterable: OrderedIterable, default=_default):
        ...

And then type checking would fail at linting time. But then we'd want take(iterable: Iterable, count=1, default=_default) for use with unordered iterables like sets.

Implicit in the next(iter(iterable)) call is an iterability check (roughly hasattr(obj, '__iter__')), but a user calling .first() may or may not be aware that there is no check that the passed Iterable is ordered. Type annotations could catch that mistake.

"Dicts are now insertion-ordered (when there are no deletes), so everything is ordered and .first() is deterministic" is not correct, so documentation in .first() may be pedantic, but it is not redundant.
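A runtime sketch of the Ordered idea, assuming a hypothetical marker ABC named Ordered (collections.abc has no such class) that types opt into with register(). It only approximates at run time what the OrderedIterable annotation would express statically. first() and take() here are hypothetical too, and take() omits the default parameter mentioned above because its semantics weren't specified.

```python
from abc import ABC
from collections.abc import Iterator
from itertools import islice


class Ordered(ABC):
    """Hypothetical marker ABC: 'iteration order is meaningful'. Not in the stdlib."""


# Opt in built-in types whose iteration order is defined (dict: insertion order).
for _cls in (list, tuple, str, range, dict):
    Ordered.register(_cls)

_default = object()  # private sentinel


def first(iterable, default=_default):
    """Hypothetical first() that refuses iterables not registered as Ordered.

    Plain iterators are let through: nothing records where their order came from.
    """
    if not isinstance(iterable, (Ordered, Iterator)):
        raise TypeError(
            f'first() needs an ordered iterable, got {type(iterable).__name__}')
    for item in iterable:
        return item
    if default is _default:
        raise ValueError('first() called on an empty iterable with no default')
    return default


def take(iterable, count=1):
    """Hypothetical take(): up to count items from any iterable, ordered or not."""
    return list(islice(iterable, count))
```

Here first({'a', 'b'}) raises TypeError instead of silently returning an arbitrary element, while take({'a', 'b'}) still works. The iterator loophole (first(iter(some_set)) is accepted) is one reason this may be better handled by annotations and documentation than by a runtime check.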