[Python-ideas] Re: Argumenting in favor of first()

Dec. 9, 2019

      On Dec 9, 2019, at 12:08, Wes Turner <wes.turner@gmail.com> wrote:
...
...
On Sat, Dec 7, 2019, 11:30 PM Andrew Barnert <abarnert@yahoo.com> wrote:
...
On Dec 7, 2019, at 18:09, Wes Turner <wes.turner@gmail.com> wrote:
...
On Sat, Dec 7, 2019, 8:20 PM Andrew Barnert <abarnert@yahoo.com> wrote:
I would argue that there could be subclasses of ValueError for .one() that would also be appropriate for .first() (and/or .take(iterable, count=1, default=_default)).
...
...
The names are less important than being able to distinguish the difference between the cases.
But again, the need to be able to distinguish is, while not nonexistent, pretty rare. And cases where you need to distinguish them but don’t care what the types are otherwise are even less common. So, is that common enough to be worth adding two more exception types to Python (or just to itertools) that aren’t used anywhere else? Just saying that they might be useful somewhere doesn’t answer that question.
...
...
That’s a common issue in Python. When you can’t use None as a sentinel because it could be a valid user input or return value, you just create a private module or class attribute that can’t equal anything the user could pass in, like this:
_sentinel = object()
And then:
def spam(stuff, default=_sentinel):
    if default is _sentinel:
        ...  # do single-argument stuff here
    else:
        ...  # do default-value stuff here
`None` is not a good default value for .first() (or .one()) because None may be the first item in the iterable.
Yes. And, as I said, this is a common case in Python, with a standard idiom (which more-itertools uses) to deal with it.
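To make the idiom concrete, here is a minimal sketch of how a first() with an optional default might look. It only illustrates the sentinel pattern described above and is not the more-itertools source; the error message is an assumption.

```python
# Private marker: no caller-supplied value can be identical to it.
_sentinel = object()


def first(iterable, default=_sentinel):
    """Return the first item of iterable, or default if it is empty."""
    for item in iterable:
        return item
    if default is _sentinel:
        # No default was given, so an empty iterable is an error.
        raise ValueError("first() was called on an empty iterable")
    return default


print(first([None, 1, 2]))      # None really is the first item here
print(first([], default=None))  # None is also usable as a default
```

Because the check is "default is _sentinel" rather than "default is None", a caller can pass None as a perfectly ordinary default.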
...
...
This seems like the kind of thing that should be explained somewhere in every tutorial (including the official one), but most people end up finding it only by accident, reading some code that uses it and trying to figure out what it does and why. The same way people figure out how useful two-argument iter is, and a couple other things.
I'll second a recommendation to note the existence of two-argument iter() and two-argument next() in the docstring for itertools.first()
I don’t think 2-arg iter belongs anywhere near first, just that it belongs somewhere in itertools tutorials, and maybe the module docs.

As for 2-arg next, notice that the existing docs for more_itertools.first cover that by saying “It is marginally shorter than next(iter(iterable), default)”. I think maybe a stdlib version of first should be a bit less dismissive of its own value, but noting this relationship is really all you need to teach people 2-arg next.
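For readers who have not met either form, a short sketch of both follows. The in-memory stream and the chunk size of 3 are assumptions chosen purely for illustration.

```python
import io
from functools import partial

# Two-argument next(): the second argument is returned instead of
# raising StopIteration when the iterator is exhausted.
assert next(iter([10, 20, 30]), None) == 10
assert next(iter([]), "empty") == "empty"

# Two-argument iter(): call a zero-argument callable repeatedly until it
# returns the sentinel value (here, the empty bytes object at end of data).
stream = io.BytesIO(b"abcdefgh")
chunks = list(iter(partial(stream.read, 3), b""))
print(chunks)
```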
...
...
...
...
Though, .first() (or .one()) on an unordered iterable is effectively first(shuffle(iterable)), which *could* raise an annotation exception at compile time.
I’m not sure what you mean by an “annotation exception”. You mean an error from static type checking in mypy or something? I’m not sure why it would be an error, unless you got the annotation wrong. It should be Iterable, and that will work for iterators and sequences and sets and so on just fine.
Also, it’s not really like shuffle, because even most “unordered iterables” in Python, like sets, actually have an order. It’s not guaranteed to be a meaningful one, but it’s not guaranteed to be meaningless either. If you need that (e.g., you’re creating a guessing game where you don’t want the answer to be the same every time anyone runs the game, or for security reasons), you really do need to explicitly randomize. For example, if s = set(range(10,0,-1)), it’s not guaranteed anywhere that next(iter(s)) will be 1, but it is still always 1 in any version of CPython. Worse, whatever next(iter(s)) is, if you call next(iter(s)) again (without mutating s in between), you’ll get the same value from the new Iterator in any version of any Python implementation.
Taking first(unordered_sequence) *is* like shuffle. Sets merely seem to be ordered when the items are integers that hash to said integer:
It’s not a matter of when they seem to have some specific order. It’s that they always do have an order, even if it often isn’t a meaningful one. If you need to actually guarantee not having a meaningful order, you need to ask for that explicitly (whether with shuffle or something else).
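A small sketch of the difference between "has some order" and "is randomized". The sample words are arbitrary, and no particular iteration order is assumed.

```python
import random

s = set("spam and eggs".split())

# Whatever order the set happens to have, it is stable while the set
# is not mutated: every fresh iterator starts at the same element.
assert next(iter(s)) == next(iter(s)) == list(s)[0]

# If the pick must not be predictable, ask for randomness explicitly.
items = list(s)
random.shuffle(items)
pick = items[0]
assert pick in s
```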
...
Does .first() need to solve for this with type annotations and/or just a friendly docstring reminder?
Solve for what? People should know that sets have no guarantee about the meaningfulness of their order, but the right place to teach that is on sets, not on every function that works with iterables.
...
...
Right, but a sequence isn’t just an ordered iterable, it’s also random-access indexable (plus a few other things). An itertools.count(), a typical sorteddict type, a typical linked list, etc. are all ordered but not sequences.
In terms of math, itertools.count() is an infinite ordered sequence (for which there is no implementation of lookup by subscript).
Right, and? Sorted dicts and linked lists are ordered but not sequences despite being generally finite, and even Sized. That isn’t the key distinction that makes a Sequence; it’s being random-access indexable (which, e.g., a linked list usually isn’t, because it can’t do it in constant time).
...
In terms of Python, the generator returned by itertools.count() is an Iterable (hasattr('__iter__')) that does not implement __getitem__ (doesn't implement the Mapping abstract type interface).
Not quite. The Mapping interface is not for everything that’s indexable, it’s for everything that’s subscriptable by keys rather than indexes. Things that are subscriptable by indexes are Sequences, not Mappings. Almost nothing is both. (The fact that Python’s type system can’t distinguish those—e.g., you can use ints as keys and as indexes—is why neither of these can be an implicit structural ABC like Iterable, and instead they need to register types manually.)
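The structural-versus-registered distinction can be checked directly with isinstance. In this sketch the two tiny classes are hypothetical and exist only to show which ABCs recognize them.

```python
from collections.abc import Iterable, Iterator, Mapping, Sequence
from itertools import count


class OnlyIter:
    """Defines __iter__ and nothing else."""
    def __iter__(self):
        return iter(())


class OnlyGetitem:
    """Defines __getitem__ but is not registered with any ABC."""
    def __getitem__(self, key):
        return key


# Iterable is structural: having __iter__ is enough.
assert isinstance(OnlyIter(), Iterable)

# Sequence and Mapping are not: __getitem__ alone satisfies neither.
assert not isinstance(OnlyGetitem(), Sequence)
assert not isinstance(OnlyGetitem(), Mapping)

# Built-in types are registered explicitly, and almost nothing is both.
assert isinstance([], Sequence) and not isinstance([], Mapping)
assert isinstance({}, Mapping) and not isinstance({}, Sequence)

# itertools.count() is an ordered, infinite Iterator, not a Sequence.
c = count()
assert isinstance(c, Iterator) and not isinstance(c, Sequence)
```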
...
https://github.com/python/typeshed/blob/master/stdlib/3/itertools.pyi :
_N = TypeVar('_N', int, float)
def count(start: _N = ...,
          step: _N = ...) -> Iterator[_N]: ...  # more general types?
A collections.abc.Ordered type might make sense if Reversible does not imply Ordered.
A hasattr('__iter_ordered__') might've made sense.
But what would ordered mean here? Just that there is some ordering? That the ordering is consistent between iterations if nothing is mutated? That it’s consistent even after mutations except for the mutated bits? Something even more strict?

If you don’t have any code that needs to switch on any of those distinctions, there’s no need for an ABC.
...
hasattr('__getitem__') !=> Sequence
Sequence => hasattr('__getitem__')
Yes. Mappings also have __getitem__ and they’re not Sequences. And not-quite-Mapping types. And “old-style sequence protocol” types (which can be consistently indexed from 0 up to the smallest int that raises IndexError, but don’t necessarily have __len__, or even __iter__). And so on.
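As an illustration of the “old-style sequence protocol” case, here is a hypothetical class with nothing but __getitem__. It can still be iterated, yet none of the relevant ABCs recognize it.

```python
from collections.abc import Iterable, Sequence


class Squares:
    """Old-style sequence: only __getitem__, indexed from 0 until IndexError."""
    def __getitem__(self, index):
        if not 0 <= index < 5:
            raise IndexError(index)
        return index * index


sq = Squares()

# iter() falls back to calling __getitem__ with 0, 1, 2, ... so this works.
assert list(sq) == [0, 1, 4, 9, 16]

# Yet the ABCs do not recognize it: there is no __iter__ and no __len__.
assert not isinstance(sq, Iterable)
assert not isinstance(sq, Sequence)
```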
...
...
The more-itertools functions that require sequences (and name them seq) usually require indexing or slicing.
That may be a good convention.
But in terms of type annotations
- https://docs.python.org/3/library/collections.abc.html
  - [x] Iterable (__iter__)
  - [x] Collection (__contains__, __iter__, __len__)
  - [x] Mapping / MutableMapping (Collection)
  - [x] Sequence  / MutableSequence (Sequence, Reversible, Collection (Iterable))
  - [x] Reversible 
  - [ ] Ordered
Does 'Reversible' => (imply) Ordered; which would then be redundant?
Which more-itertools functions require testing for Reversible, or Ordered, but not Sequence? There might be some of the former, but I doubt there are any of the latter. Most take Iterable, the rest take Sequence or Iterator, and I don’t think anything is left out, or had to be crammed into either of those as a hacky workaround or anything. So what are you trying to fix here?
...
Math definition (setting aside a Number sequence-element type restriction):
Sequence = AllOf(Iterable, Ordered)
So your Ordered implies Sized and Container?
...
More_itertools convention, AFAIU?:
seq => AllOf(Iterable, Mapping, Ordered)
  seq => all(hasattr(seq, x) for x in ('__iter__', '__getitem__'))
I think it’s a lot simpler. seq => Sequence. There may be a bit of looseness in that some functions can take an old-style half-sequence or various other things, but no more than any other code annotated with Sequence in Python.
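In annotation terms, the convention might be sketched like this. The function middle() is a hypothetical example of something that needs indexing, and neither signature is taken from more-itertools.

```python
from typing import Iterable, Sequence, TypeVar

T = TypeVar("T")


def first(iterable: Iterable[T]) -> T:
    """Needs only iteration, so any Iterable is an acceptable argument."""
    return next(iter(iterable))


def middle(seq: Sequence[T]) -> T:
    """Needs len() and indexing, so the parameter is annotated as Sequence."""
    return seq[len(seq) // 2]


assert first([3, 1, 2]) == 3               # list
assert first(x * x for x in [4, 5]) == 16  # generator
assert first({"only"}) == "only"           # set
assert middle("abcde") == "c"              # str is a Sequence
```

A static checker would accept a set for first() and reject it for middle(), which is all the distinction the functions themselves need.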
...
How does this apply to .first()?
If I call .first() on an unordered Iterable like a set, I may not get the first item; this can/may/will sometimes fail:
assert first({'a', 'b'}) == 'a'
But 'a' isn’t the first element in the set just because it came first in the display. Consider this:

    assert first(sortedlist('zyx')) == 'z'

Clearly that should fail, because the first item in a sorted list of those letters is x, not z. The fact that you constructed it with z first isn’t relevant; they’re kept in sorted order, and x sorts before z. But surely you wouldn’t say that a sorted list isn’t ordered?

Meanwhile, notice that in either case, first(it) always returns the same thing that list(it)[0] would (except for a different exception if it is empty). That’s guaranteed by the way iteration works. In that sense, all iterables are ordered. There are other senses in which that’s not true, but without having a specific sense in mind that you’re trying to distinguish, the word doesn’t help anything.
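A quick check of that claim, as a sketch: first() here is a simplified stand-in with no default handling, and a plain sorted list stands in for the sortedlist type.

```python
def first(iterable):
    # Simplified stand-in: no default handling, StopIteration if empty.
    return next(iter(iterable))


# The construction order 'z', 'y', 'x' is irrelevant, because iteration
# follows the sorted order.
letters = sorted("zyx")
assert first(letters) == "x"

# For any non-empty iterable that can be iterated repeatedly,
# first(it) agrees with list(it)[0], whatever order that happens to be.
for it in (letters, {"a", "b"}, {"k": 1, "j": 2}, range(5, 0, -1)):
    assert first(it) == list(it)[0]
```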
...
If there was an Ordered ABC (maybe unnecessarily in addition to Reversible), we could specify:
# collections.abc
  class OrderedIterable(Iterable, Ordered):
      pass
So your Ordered doesn’t imply Iterable? What kinds of things are ordered but not Iterable?

Again, what are you actually trying to solve with this distinction?
...
Implicit in a next() call is a hasattr(obj, '__iter__') check
No there isn’t. It’s almost always true, because the only things that normally have __next__ are iterators, and they always have __iter__ as well. But there’s no need to check for that. If you create a type that has __next__ but not __iter__ for some reason, you expect that it can’t be used in a for loop, but why shouldn’t it be usable in a next call? Why would we want to go out of our way to block that when nobody ever does it, and it would be a clear “consenting adults” case if anyone ever did?
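This is easy to try with a hypothetical type that has __next__ and deliberately omits __iter__:

```python
class Countdown:
    """Hypothetical type with __next__ but deliberately no __iter__."""
    def __init__(self, start):
        self.current = start

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


c = Countdown(2)
assert next(c) == 2
assert next(c) == 1
assert next(c, "done") == "done"  # two-argument next() works too

# A for loop needs iter(), which fails without __iter__ or __getitem__.
try:
    for _ in Countdown(2):
        pass
except TypeError:
    loop_failed = True
else:
    loop_failed = False
assert loop_failed
```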
...
; but a user calling .first() may or may not be aware that there is no check that the passed Iterable is ordered. Type annotations could catch that mistake.
Only with some meaningful (and universally meaningful) definition of “ordered”. And I don’t know what definition you have in mind, or even could have in mind, that would alleviate potential confusion.