[Python-ideas] Re: Argumenting in favor of first()

Dec. 8, 2019

      On Dec 7, 2019, at 18:09, Wes Turner <wes.turner@gmail.com> wrote:
...
...
On Sat, Dec 7, 2019, 8:20 PM Andrew Barnert <abarnert@yahoo.com> wrote:
On Dec 7, 2019, at 07:33, Wes Turner <wes.turner@gmail.com> wrote:
...

+1 for itertools.first(seq, default=Exception) *and* itertools.one(seq, default=Exception)
What does default=Exception mean? What happens if you pass a different value? Does it do one thing if the argument is a type that’s a subclass of Exception (or of BaseException?) and a different thing if it’s any other value?
That's a good point: Exception is a bad sentinel value. Is None a good default value? What if the genexpr'd iterable is [None, 2, 3]?
That’s a common issue in Python. When you can’t use None as a sentinel because it could be a valid user input or return value, you just create a private module or class attribute that can’t equal anything the user could pass in, like this:

    _sentinel = object()

And then:

    def spam(stuff, default=_sentinel):
        if default is _sentinel:
            ...  # do single-argument stuff here
        else:
            ...  # do default-value stuff here

This seems like the kind of thing that should be explained somewhere in every tutorial (including the official one), but most people end up finding it only by accident, reading some code that uses it and trying to figure out what it does and why. The same way people figure out how useful two-argument iter is, and a couple of other things.
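A minimal, self-contained sketch of that pattern, applied to a hypothetical `lookup` helper (not part of any library). Calling it without `default` re-raises, while passing `default=None` still works, because the check is `is _sentinel` rather than a `None` or truthiness test.

```python
# Private sentinel: a fresh object() can't be identical to anything a caller
# could pass in, and we compare with `is`, so None stays a usable default.
_sentinel = object()


def lookup(mapping, key, default=_sentinel):
    """Return mapping[key]; fall back to default only if one was passed."""
    try:
        return mapping[key]
    except KeyError:
        if default is _sentinel:
            raise  # single-argument behavior: propagate the KeyError
        return default  # default-value behavior, even when default is None


settings = {"debug": False}
print(lookup(settings, "debug"))          # False
print(lookup(settings, "verbose", None))  # None
```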
...
...
Also, “seq” implies that you’re expecting these to be used on sequences, not general iterables. In that case, why not just use [0]?
I chose `seq` as the argument because I was looking at toolz.itertoolz.first(), which has no default= argument.
Ah. I’m not sure why toolz uses seq for arguments that are usually iterators, but I guess that’s not horrible or anything. In itertools and more-itertools, the argument is usually called iterable, and seq is reserved specifically for the ones that should be sequences, like chunked(iterable, n) vs. sliced(seq, n). But as useful as that convention is, I suppose it’s not a universal thing that everyone knows and follows; it’s not documented or explained anywhere, you just kind of have to guess the distinction from the names.
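To illustrate why the naming distinction matters, here are two rough sketches (hypothetical `chunked_sketch` and `sliced_sketch`, not the more-itertools source): the chunking version only needs iteration, so it works on iterators, even infinite ones, while the slicing version needs `len()` and slicing, i.e. a sequence.

```python
from itertools import count, islice


def chunked_sketch(iterable, n):
    """Yield lists of up to n items; works on any iterable, including iterators."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk


def sliced_sketch(seq, n):
    """Yield slices of length up to n; needs len() and slicing, i.e. a sequence."""
    for start in range(0, len(seq), n):
        yield seq[start:start + n]


print(list(chunked_sketch(range(7), 3)))            # [[0, 1, 2], [3, 4, 5], [6]]
print(list(sliced_sketch("abcdefg", 3)))            # ['abc', 'def', 'g']
print(list(islice(chunked_sketch(count(), 2), 2)))  # [[0, 1], [2, 3]]
```

Passing an iterator to `sliced_sketch` fails with a `TypeError` from `len()`, which is exactly what the `seq` name warns about.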
...
Though, .first() (or .one()) on an unordered iterable is effectively first(shuffle(iterable)), which *could* raise an annotation exception at compile time.
I’m not sure what you mean by an “annotation exception”. You mean an error from static type checking in mypy or something? I’m not sure why it would be an error, unless you got the annotation wrong. It should be Iterable, and that will work for iterators and sequences and sets and so on just fine.
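A minimal sketch of that annotation, assuming a single-argument `first` (no default, to keep the signature simple; the name and behavior here are illustrative). Lists, generators, and sets all satisfy `Iterable[int]`, so a checker such as mypy has nothing to complain about.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def first(iterable: Iterable[T]) -> T:
    """Return the first item; raise ValueError if the iterable is empty."""
    for item in iterable:
        return item
    raise ValueError("first() called on an empty iterable")


def countdown(n: int) -> Iterator[int]:
    """A plain generator, i.e. an iterator rather than a sequence."""
    while n > 0:
        yield n
        n -= 1


# All three arguments are Iterable[int], so the same annotation covers them.
print(first([3, 1, 2]))     # 3
print(first(countdown(5)))  # 5
print(first({42}))          # 42
```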

Also, it’s not really like shuffle, because even most “unordered iterables” in Python, like sets, actually have an order. It’s not guaranteed to be a meaningful one, but it’s not guaranteed to be meaningless either. If you need it to be meaningless (e.g., you’re creating a guessing game where you don’t want the answer to be the same every time anyone runs the game, or for security reasons), you really do need to explicitly randomize. For example, if s = set(range(10,0,-1)), it’s not guaranteed anywhere that next(iter(s)) will be 1, but it is still always 1 in any version of CPython. Worse, whatever next(iter(s)) is, if you call next(iter(s)) again (without mutating s in between), you’ll get the same value from the new iterator in any version of any Python implementation.

But if you don’t care whether it’s meaningful or meaningless, first, one, etc. on a set are fine.
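A small sketch of the difference between "arbitrary" and "random" for sets: repeated iteration over an unmutated set starts at the same element, so if the choice must actually vary, pick explicitly. Both `random.choice` and `secrets.choice` need a sequence, hence the `list(s)`.

```python
import random
import secrets

s = set(range(10, 0, -1))

# Arbitrary but not random: two fresh iterators over the same unmutated set
# yield the same first element.
print(next(iter(s)) == next(iter(s)))  # True

# A guessing game that should differ between runs: choose explicitly.
guess = random.choice(list(s))

# Security-sensitive choices: use the secrets module instead of random.
secret_pick = secrets.choice(list(s))

print(guess in s, secret_pick in s)  # True True
```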
...
Sets are unordered iterables and so aren't sequences; aren't OrderedIterables.
Right, but a sequence isn’t just an ordered iterable, it’s also random-access indexable (plus a few other things). An itertools.count(), a typical sorteddict type, a typical linked list, etc. are all ordered but not sequences. The more-itertools functions that require sequences (and name them seq) usually require indexing or slicing.
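A quick check with the standard `collections.abc` classes shows the distinction: `itertools.count()` is ordered and iterable, so "take the first item" works on it, but it is not a `Sequence` and can't be indexed.

```python
from collections.abc import Iterable, Sequence
from itertools import count

ordered_not_sequence = count()

print(isinstance(ordered_not_sequence, Iterable))  # True
print(isinstance(ordered_not_sequence, Sequence))  # False
print(isinstance(range(10), Sequence))             # True

# Taking the first item only needs iteration...
print(next(iter(ordered_not_sequence)))  # 0

# ...but [0] needs indexing, which count() doesn't support.
try:
    ordered_not_sequence[0]
except TypeError as exc:
    print("no indexing:", exc)
```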
...
...
Arguably, first, and maybe some of its cousins, should go into the recipes. And I don’t see any reason they shouldn’t be identical to the versions in more-itertools, but if there is one, it should be coordinated with Erik Rose in some way so they stay in sync.
Oh hey, "more-itertools". I should've found that link in the cpython docs.
Well, it was only added to the docs in, I believe, 3.8, so a lot of people probably haven’t seen the link yet. (That’s always a problem for a widely-used decades-old language that evolves over 18-month cycles and carefully preserves backward compatibility; you can’t expect everyone to always know the latest of anything the way you can with something like Swift. But if we’re talking about further changes beyond what’s in 3.8, I think we have to assume that the docs change will start being effective before anything new we propose.)
...
https://more-itertools.readthedocs.io/en/stable/_modules/more_itertools/more... :
def first(iterable, default=_marker)
That makes more sense than default=Exception.
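A minimal sketch of a `first` with that signature, assuming (as the more-itertools docs describe) that an empty iterable with no default raises `ValueError`. This is an illustration, not the library's source. It also answers the `[None, 2, 3]` worry: `None` as an element and `None` as a default are distinguishable because the check is against a private `_marker`.

```python
_marker = object()  # private sentinel; never equal to a user-supplied default


def first(iterable, default=_marker):
    """Return the first item of iterable, or default if it is empty.

    With no default, an empty iterable raises ValueError.
    """
    for item in iterable:
        return item
    if default is _marker:
        raise ValueError("first() was called on an empty iterable, "
                         "and no default value was provided.")
    return default


print(first([None, 2, 3]))      # None  (a real element, not the default)
print(first([], default=None))  # None  (the default)
print(first(iter("xyz")))       # x
```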
FWIW, more-itertools .one() raises ValueError (or whatever's passed as too_short= or too_long= kwargs). Default subclasses of ValueError may not be justified?
I don’t _think_ they are. It’s probably pretty rare that you want to switch on the type programmatically (e.g., use different handlers for too short and too long), and it’s pretty trivial to add your own subclasses. You do often want to be able to distinguish them as a human when debugging your code, but that’s already taken care of by the exception message text. (That’s only documented in the examples, but is that a problem?)
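A sketch of how easy it is to add your own subclasses, under stated assumptions: `one`, `too_short`, and `too_long` mirror the more-itertools parameter names mentioned above, but this is a hypothetical reimplementation, and `TooShortError`/`TooLongError` are invented names. Because they subclass `ValueError`, existing `except ValueError` handlers still catch them.

```python
class TooShortError(ValueError):
    """Hypothetical: raised when the iterable is empty."""


class TooLongError(ValueError):
    """Hypothetical: raised when the iterable has more than one item."""


def one(iterable, too_short=None, too_long=None):
    """Return the only item of iterable; raise if there are zero or several."""
    it = iter(iterable)
    try:
        item = next(it)
    except StopIteration:
        if too_short is None:
            too_short = ValueError("too few items in iterable (expected 1)")
        raise too_short from None
    try:
        extra = next(it)
    except StopIteration:
        return item  # exactly one item
    if too_long is None:
        too_long = ValueError(
            f"expected exactly one item, got {item!r}, {extra!r}, and perhaps more"
        )
    raise too_long


print(one([7]))  # 7

try:
    one([], too_short=TooShortError("nothing found"))
except TooShortError as exc:
    print("too short:", exc)  # too short: nothing found

try:
    one([1, 2], too_long=TooLongError("ambiguous match"))
except ValueError as exc:  # the subclass is still a ValueError
    print(type(exc).__name__, exc)  # TooLongError ambiguous match
```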
...
...
Maybe first is so useful, so much more so than all of the other very useful recipes, including things like consume, flatten, and unique (which IIRC were the ones that convinced everyone it’s time to add a more-itertools link to the docs), that it needs to be slightly more discoverable—e.g., by itertools.<TAB> completion? But that seems unlikely given that they’ve been recipes for decades and first wasn’t.
# in itertools:
def _check_more_itertools():
    """ https://more-itertools.readthedocs.io/en/stable/api.html """
I’m not sure what this is intended to mean. Are you suggesting we could add this as an empty function just so that tab completion, dir, help, IDE mechanisms, etc. could make it more discoverable? 

If so, you’d want to give it a non-private name (most of those things will ignore a name starting with an underscore; some of them will use __all__ to override it, but others won’t.) But otherwise, it might not be a bad idea. A lot of people do explore by IDE completion, apparently.
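A rough illustration of the underscore issue, using a throwaway module built with `types.ModuleType` (the module name and both function names are hypothetical). Filtering out underscore names only approximates what many completion tools show by default; `__all__` is what a star import consults, and it does override the underscore rule.

```python
import sys
import types


def _check_more_itertools():
    """https://more-itertools.readthedocs.io/en/stable/api.html"""


def see_more_itertools():
    """https://more-itertools.readthedocs.io/en/stable/api.html"""


mod = types.ModuleType("fake_itertools")
mod._check_more_itertools = _check_more_itertools
mod.see_more_itertools = see_more_itertools

# Approximation of a completion list: names without a leading underscore.
public = [name for name in dir(mod) if not name.startswith("_")]
print(public)  # ['see_more_itertools']

# A star import honors __all__, including underscore names listed there.
mod.__all__ = ["_check_more_itertools", "see_more_itertools"]
sys.modules["fake_itertools"] = mod
ns = {}
exec("from fake_itertools import *", ns)
print("_check_more_itertools" in ns)  # True
```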
...
...
And it seems even less likely for one, which nobody has mentioned in this thread yet.
If there’s a general argument that linking to more-itertools hasn’t helped anything, or that the recipes are still useless until someone makes the often-proposed/never-followed-through change of finding a way to make the recipes individually searchable and linkable, or whatever, that’s fine, but it’s not really an argument for making a special case for one that isn’t made for unique or consume.
Is programming by Exception faster or preferable to a sys.version_info conditional?
I don’t know if it’s faster (and I doubt that matters), and I’m sure you could argue other pros and cons each way, but it is a long-standing common idiom (at least back to the 2.5 days, when half the web services on the internet probably started by importing json with a fallback to simplejson) to do it your first way:
...
try:
    from itertools import one, first
except ImportError:
    from more_itertools.more import one, first
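A runnable version of that idiom, extended with a hypothetical third fallback so it works even where more-itertools isn't installed. The `itertools` import is expected to fail, since no released itertools (as of this thread) has `first` or `one`; the local definitions are simplified sketches, not the more-itertools code. A `sys.version_info` check would instead need a release number that added these functions, which doesn't exist yet.

```python
from itertools import islice

try:
    # Expected to fail: no released itertools has first/one (as of this thread).
    from itertools import first, one
except ImportError:
    try:
        # Third-party package; its top-level namespace re-exports these.
        from more_itertools import first, one
    except ImportError:
        # Hypothetical local fallbacks, simplified for illustration.
        _marker = object()

        def first(iterable, default=_marker):
            """Return the first item, or default if the iterable is empty."""
            for item in iterable:
                return item
            if default is _marker:
                raise ValueError("first() called on an empty iterable")
            return default

        def one(iterable):
            """Return the only item; raise ValueError for zero or several."""
            items = list(islice(iterable, 2))  # look at no more than two items
            if len(items) != 1:
                raise ValueError("expected exactly one item")
            return items[0]


print(first("abc"))  # a
print(one([42]))     # 42
```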
But why does one need to be added to itertools in the first place? Is it really that much more common a need than flatten, consume, etc., or so much harder to write yourself (maybe not inherently, but because its target audience is more novice-y), or what?

You need some argument for that strong enough to overcome the status quo, the cost of expanding itertools (which makes it harder to find the stuff that really is necessary to have there), the fact that you’d either have to implement it in C or convert itertools to a Python-and-C module to do it, etc. Otherwise, either just adding it to the recipes or doing nothing at all seems like the right choice.