[Python-ideas] Re: Argumenting in favor of first()

9 Dec 2019

On Sat, Dec 7, 2019, 11:30 PM Andrew Barnert <abarnert@yahoo.com> wrote:
...
On Dec 7, 2019, at 18:09, Wes Turner <wes.turner@gmail.com> wrote:
On Sat, Dec 7, 2019, 8:20 PM Andrew Barnert <abarnert@yahoo.com> wrote:
...
On Dec 7, 2019, at 07:33, Wes Turner <wes.turner@gmail.com> wrote:
...

+1 for itertools.first(seq, default=Exception) *and* itertools.one(seq,
default=Exception)
What does default=Exception mean? What happens if you pass a different
value? Does it do one thing if the argument is a type that’s a subclass of
Exception (or of BaseException?) and a different thing if it’s any other
value?
That's a good point: Exception is a bad sentinel value. Is None a good
default value? What if the genexpr'd iterable is [None, 2, 3]?
Here are more_itertools.more.one() and more_itertools.more.first() without
docstrings from
https://more-itertools.readthedocs.io/en/stable/_modules/more_itertools/more...
:
```
def one(iterable, too_short=None, too_long=None):
    it = iter(iterable)

    try:
        value = next(it)
    except StopIteration:
        raise too_short or ValueError('too few items in iterable (expected 1)')

    try:
        next(it)
    except StopIteration:
        pass
    else:
        raise too_long or ValueError('too many items in iterable (expected 1)')

    return value

_marker = object()  # module-level sentinel that first() relies on

def first(iterable, default=_marker):
    try:
        return next(iter(iterable))
    except StopIteration:
        # I'm on the edge about raising ValueError instead of StopIteration.
        # At the moment, ValueError wins, because the caller could conceivably
        # want to do something different with flow control when I raise the
        # exception, and it's weird to explicitly catch StopIteration.
        if default is _marker:
            raise ValueError('first() was called on an empty iterable, and no '
                             'default value was provided.')
        return default
```

I would argue that there could be subclasses of ValueError for .one() that
would also be appropriate for .first() (and/or .take(iterable, count=1,
default=_default)):

class TooShortValueError(ValueError): pass
class TooLongValueError(ValueError): pass

(Where, again, SQLAlchemy has NoResultFound and MultipleResultsFound)

The names are less important than being able to distinguish the difference
between the cases.

And then itertools.one() could be interface-compatible with this in
more_itertools.more.one()

def one(iterable, too_short=TooShortValueError, too_long=TooLongValueError):
...
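A minimal sketch of how that could look, assuming the two exception classes named above. The docstrings are illustrative, and this is not the more_itertools or itertools implementation. Because the defaults here are exception classes rather than None, the sketch raises whatever it is given, class or instance:

```python
class TooShortValueError(ValueError):
    """Hypothetical: the iterable had no items."""


class TooLongValueError(ValueError):
    """Hypothetical: the iterable had more than one item."""


def one(iterable, too_short=TooShortValueError, too_long=TooLongValueError):
    """Return the only item of iterable, or raise if there are 0 or 2+ items."""
    it = iter(iterable)
    try:
        value = next(it)
    except StopIteration:
        # `from None` hides the internal StopIteration from the traceback.
        raise too_short from None
    try:
        next(it)
    except StopIteration:
        return value
    raise too_long
```

Callers that only catch ValueError keep working, while callers that care can tell the two cases apart.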
That’s a common issue in Python. When you can’t use None as a sentinel
because it could be a valid user input or return value, you just create a
private module or class attribute that can’t equal anything the user could
pass in, like this:
_sentinel = object()
And then:
def spam(stuff, default=_sentinel):
    if default is _sentinel:
        ...  # do single-argument stuff here
    else:
        ...  # do default-value stuff here
`None` is not a good default value for .first() (or .one()) because None
may be the first item in the iterable.
It should be necessary to explicitly specify default=None if that's what's
expected.
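To make the difference concrete, here is a rough sketch of first() with a private sentinel. The name _marker follows the more_itertools excerpt above; the body is my own simplification, not the library's code:

```python
_marker = object()  # private sentinel; never equal to any caller-supplied value


def first(iterable, default=_marker):
    """Return the first item, or default if given, else raise ValueError."""
    # Two-argument next() returns _marker instead of raising StopIteration.
    item = next(iter(iterable), _marker)
    if item is not _marker:
        return item
    if default is _marker:
        raise ValueError("first() was called on an empty iterable, "
                         "and no default value was provided.")
    return default


print(first([None, 2, 3]))      # None: the actual first item
print(first([], default=None))  # None: the explicitly requested default
```

With no default, first([]) raises ValueError, so a None result is never ambiguous unless the caller asked for it.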
...
This seems like the kind of thing that should be explained somewhere in
every tutorial (including the official one), but most people end up finding
it only by accident, reading some code that uses it and trying to figure
out what it does and why. The same way people figure out how useful
two-argument iter is, and a couple other things.
I'll second a recommendation to note the existence of two-argument iter()
and two-argument next() in the docstring for itertools.first()
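For anyone who hasn't run into them, a short illustration of both two-argument forms (the StringIO stream is just a stand-in for any callable that signals the end with a fixed value):

```python
import io

# Two-argument next(): the second argument is returned instead of raising
# StopIteration when the iterator is exhausted.
empty = iter([])
fallback = next(empty, "nothing here")

# Two-argument iter(): call the callable repeatedly until it returns the
# sentinel value (here the empty string that readline() returns at EOF).
stream = io.StringIO("alpha\nbeta\n")
lines = list(iter(stream.readline, ""))

print(fallback)  # nothing here
print(lines)     # ['alpha\n', 'beta\n']
```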
...
Also, “seq” implies that you’re expecting these to be used on sequences,
not general iterables. In that case, why not just use [0]?
I chose `seq` as the argument because I was looking at
toolz.itertoolz.first(), which has no default= argument.
Ah. I’m not sure why toolz uses seq for arguments that are usually
iterators, but I guess that’s not horrible or anything. In itertools and
more-itertools, the argument is usually called iterable, and seq is
reserved specifically for the ones that should be sequences, like
chunked(iterable, n) vs. sliced(seq, n). But as useful as that convention
is, I suppose it’s not a universal thing that everyone knows and follows;
it’s not documented or explained anywhere, you just kind of have to guess
the distinction from the names.
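The distinction is easier to see in code. These are simplified stand-ins written for this message, not the more-itertools source: the first needs only iteration, the second needs len() and slicing.

```python
from itertools import islice


def chunked(iterable, n):
    """Yield lists of up to n items; needs only iteration."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk


def sliced(seq, n):
    """Yield slices of up to n items; needs len() and slicing."""
    for start in range(0, len(seq), n):
        yield seq[start:start + n]


print(list(chunked(iter("abcde"), 2)))  # [['a', 'b'], ['c', 'd'], ['e']]
print(list(sliced("abcde", 2)))         # ['ab', 'cd', 'e']
```

Passing an iterator to this sliced() fails with TypeError as soon as it is consumed, because iterators have no len().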
Though, .first() (or .one()) on an unordered iterable is effectively
first(shuffle(iterable)), which *could* raise an annotation exception at
compile time.
I’m not sure what you mean by an “annotation exception”. You mean an error
from static type checking in mypy or something? I’m not sure why it would
be an error, unless you got the annotation wrong. It should be Iterable,
and that will work for iterators and sequences and sets and so on just fine.
Also, it’s not really like shuffle, because even most “unordered
iterables” in Python, like sets, actually have an order. It’s not
guaranteed to be a meaningful one, but it’s not guaranteed to be
meaningless either. If you need that (e.g., you’re creating a guessing game
where you don’t want the answer to be the same every time anyone runs the
game, or for security reasons), you really do need to explicitly randomize.
For example, if s = set(range(10, -1, -1)), it’s not guaranteed anywhere that
next(iter(s)) will be 0, but it is still always 0 in any version of
CPython. Worse, whatever next(iter(s)) is, if you call next(iter(s)) again
(without mutating s in between), you’ll get the same value from the new
iterator in any version of any Python implementation.
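A small sketch of the "arbitrary but repeatable" point, with an explicit random choice for contrast. The set contents are made up for illustration:

```python
import random

s = {"spam", "eggs", "ham"}

# Arbitrary but repeatable: two fresh iterators over an unmodified set
# start from the same element.
a = next(iter(s))
b = next(iter(s))
print(a == b)  # True

# Which element comes first may differ from one process to the next
# (string hashing is randomized by default), but within a run it is fixed.
# For a value that should vary on every call, randomize explicitly:
pick = random.choice(sorted(s))
```

For anything security-sensitive, the secrets module would be the better tool than random.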
Taking first(unordered_sequence) *is* like shuffle. Sets merely seem to be
ordered when the items are integers that hash to said integer:

https://stackoverflow.com/a/45589593

Does .first() need to solve for this with type annotations and/or just a
friendly docstring reminder?
...
But if you don’t care whether it’s meaningful or meaningless, first, one,
etc. on a set are fine.
Sets are unordered iterables and so aren't sequences; they aren't
OrderedIterables.
Right, but a sequence isn’t just an ordered iterable, it’s also
random-access indexable (plus a few other things). An itertools.count(), a
typical sorteddict type, a typical linked list, etc. are all ordered but
not sequences.
In terms of math, itertools.count() is an infinite ordered sequence (for
which there is no implementation of lookup by subscript).

In terms of Python, the iterator returned by itertools.count() is an
Iterable (hasattr(obj, '__iter__')) that does not implement __getitem__
(so it implements neither the Sequence nor the Mapping abstract interface).
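These claims are easy to check against the ABCs in collections.abc; the starting value 10 is arbitrary:

```python
import itertools
from collections.abc import Iterable, Iterator, Mapping, Sequence

counter = itertools.count(10)

print(isinstance(counter, Iterable))    # True
print(isinstance(counter, Iterator))    # True
print(isinstance(counter, Sequence))    # False
print(isinstance(counter, Mapping))     # False
print(hasattr(counter, "__getitem__"))  # False

# Ordered, but the items are only reachable by iterating:
head = list(itertools.islice(counter, 3))
print(head)  # [10, 11, 12]
```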

https://github.com/python/typeshed/blob/master/stdlib/3/itertools.pyi :

_N = TypeVar('_N', int, float)
def count(start: _N = ...,
          step: _N = ...) -> Iterator[_N]: ...  # more general types?

A collections.abc.Ordered type might make sense if Reversible does not
imply Ordered.
A hasattr(obj, '__iter_ordered__') check might've made sense.

hasattr(obj, '__getitem__') !=> Sequence
Sequence => hasattr(obj, '__getitem__')
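A quick check of both directions. Lookup is a hypothetical class invented for this example: it is subscriptable without being a Sequence.

```python
from collections.abc import Sequence


class Lookup:
    """Hypothetical class: subscriptable, but not a Sequence."""

    def __getitem__(self, key):
        return key * 2


# Map each type name to (has __getitem__, is a Sequence).
results = {
    type(obj).__name__: (hasattr(obj, "__getitem__"), isinstance(obj, Sequence))
    for obj in ([1, 2], (1, 2), "ab", {"a": 1}, Lookup())
}

for name, (subscriptable, is_sequence) in results.items():
    print(f"{name:8} __getitem__={subscriptable!s:5} Sequence={is_sequence}")
```

list, tuple and str come out as both; dict and Lookup have __getitem__ but are not Sequences.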

The more-itertools functions that require sequences (and name them seq)
usually require indexing or slicing.
That may be a good convention.
But in terms of type annotations
- https://docs.python.org/3/library/collections.abc.html
  - [x] Iterable (__iter__)
  - [x] Collection (__contains__, __iter__, __len__)
  - [x] Mapping / MutableMapping (Collection)
  - [x] Sequence / MutableSequence (Reversible, Collection)
  - [x] Reversible
  - [ ] Ordered

Does Reversible imply Ordered, which would make a separate Ordered ABC redundant?
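The checks below don't settle that question, but they do show that the converse fails: itertools.count() is ordered yet not Reversible. They also show that a set is a Collection without being subscriptable.

```python
import itertools
from collections.abc import Collection, Reversible

checks = {
    "list is Reversible": isinstance([1, 2, 3], Reversible),
    "set is Reversible": isinstance(set(), Reversible),
    "count() is Reversible": isinstance(itertools.count(), Reversible),
    "set is a Collection": isinstance(set(), Collection),
    "set has __getitem__": hasattr(set(), "__getitem__"),
}

for label, result in checks.items():
    print(f"{label}: {result}")
```

Only the first and fourth come out True.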

Math definition (setting aside a Number sequence-element type restriction):

  Sequence = AllOf(Iterable, Ordered)

More_itertools convention, AFAIU?:

  seq => AllOf(Iterable, Mapping, Ordered)
  seq => all(hasattr(seq, attr) for attr in ('__iter__', '__getitem__'))

How does this apply to .first()?

If I call .first() on an unordered Iterable like a set, I may not get the
first item; this can/may/will sometimes fail:

  assert first({'a', 'b'}) == 'a'

If there was an Ordered ABC (maybe unnecessarily in addition to
Reversible), we could specify:

  # collections.abc
  class OrderedIterable(Iterable, Ordered):
      pass

  # itertools
  def first(iterable: OrderedIterable, default=_default): ...

And then type checking would fail at linting time when first() is passed a set.
But then we'd want take(iterable: Iterable, count=1, default=_default) for
use with unordered iterables like sets.

Implicit in the iter() call inside .first() is a check that the object is
iterable; but a user calling .first() may or may not be aware that there is no
check that the passed Iterable is ordered. Type annotations could catch that
mistake.
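Since there is no Ordered ABC today, the closest runtime approximation I can think of is to reject the known-unordered case, collections.abc.Set. The sketch below is hypothetical (first_ordered and _default are names made up here) and has obvious gaps: dict key views are also Sets even though they are ordered, and an iterator over a set slips through unnoticed.

```python
from collections.abc import Set

_default = object()  # private sentinel for "no default given"


def first_ordered(iterable, default=_default):
    """Hypothetical first() variant that refuses set-like arguments."""
    if isinstance(iterable, Set):
        raise TypeError(
            f"{type(iterable).__name__} has no meaningful first item; "
            "sort it or pick an element explicitly"
        )
    for item in iterable:
        return item
    if default is _default:
        raise ValueError("first_ordered() was called on an empty iterable")
    return default


print(first_ordered([3, 1, 2]))  # 3
```

A static Ordered annotation would catch the same mistake earlier, but only if such an ABC existed and the standard types were registered with it.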

"Dicts are now insertion-ordered (when there are no deletes), so everything
is ordered and .first() is deterministic" is not correct, so documentation
in .first() may be pedantic but is not redundant.
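For reference, this is what "first" means for a dict on Python 3.7+, where insertion order is part of the language. The keys are arbitrary examples:

```python
d = {"b": 1, "a": 2}
first_before = next(iter(d))
print(first_before)  # b  (insertion order, not sorted order)

del d["b"]
d["b"] = 3  # a re-inserted key goes to the end
first_after = next(iter(d))
print(first_after)  # a
print(list(d))      # ['a', 'b']
```

So even for dicts, "first" depends on the history of inserts and deletes rather than on the keys themselves, and sets make no ordering promise at all.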
...