[Python-ideas] for/except/else

Thu Mar 2 22:36:29 EST 2017

On 2 March 2017 at 21:06, Wolfgang Maier <
wolfgang.maier at biologie.uni-freiburg.de> wrote:

> On 02.03.2017 06:46, Nick Coghlan wrote:
>
>> The proposal in this thread then has the significant downside of only
>> covering the "nested side effect" case:
>>
>>     for item in iterable:
>>         if condition(item):
>>             break
>>     except break:
>>         operation(item)
>>     else:
>>         condition_was_never_true(iterable)
>>
>> While being even *less* amenable to being pushed down into a helper
>> function (since converting the "break" to a "return" would bypass the
>> "except break" clause).
>>
>
> I'm actually not quite buying this last argument. If you wanted to
> refactor this to "return" instead of "break", you could simply put the
> return into the except break block. In many real-world situations with
> multiple breaks from a loop this could actually make things easier instead
> of worse.
>

Fair point - so that would be even with the "single nested side effect"
case, but simpler when you had multiple break conditions (and weren't
already combining them with "and").

> Personally, the "nested side effect" form makes me uncomfortable every
> time I use it because the side effects on breaking or not breaking the loop
> don't end up at the same indentation level and not necessarily together.
> However, I'm gathering from the discussion so far that not too many people
> are thinking like me about this point, so maybe I should simply adjust my
> mind-set.
>

This is why I consider the "search only" form of the loop, where the else
clause either sets a default value, or else prevents execution of the code
after the loop body (via raise, return, or continue), to be the preferred
form: there aren't any meaningful side effects hidden away next to the
break statement. If I can't do that, I'm more likely to switch to a classic
flag variable that gets checked post-loop execution than I am to push the
side effect inside the loop body:

    search_result = _not_found = object()
    for item in iterable:
        if condition(item):
            search_result = item
            break
    if search_result is _not_found:
        # Handle the "not found" case
        pass
    else:
        # Handle the "found" case
        pass
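For comparison, the "search only" form described above might look something
like the following minimal sketch. The names find_first and is_even, and the
sample data, are hypothetical and only there for illustration:

```python
def find_first(iterable, condition, default=None):
    """Return the first item satisfying condition, else default."""
    for item in iterable:
        if condition(item):
            break  # no side effect here; 'item' itself is the result
    else:
        # The loop finished without a break: fall back to the default
        item = default
    return item


def is_even(n):
    return n % 2 == 0


print(find_first([1, 3, 4, 5], is_even))       # first even number
print(find_first([1, 3, 5], is_even, "none"))  # no match, default used
```

The else clause only sets a default, so everything that happens to the
result lives after the loop rather than next to the break.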

> All that said, this is a very nice abstract view on things! I really
> learned quite a bit from this, thank you :)
>
> As always though, reality can be expected to be quite a bit more
> complicated than theory so I decided to check the stdlib for real uses of
> break. This is quite a tedious task since break is used in many different
> ways and I couldn't come up with a good automated way of classifying them.
> So what I did is just go through stdlib code (in reverse alphabetical
> order) containing the break keyword and put it into categories manually. I
> only got up to socket.py before losing my enthusiasm, but here's what I
> found:
>
> - overall I looked at 114 code blocks that contain one or more breaks
>

Thanks for doing that research :)

> Of the remaining 19 non-trivial cases
>
> - 9 are variations of your classical search idiom above, i.e., there's an
> else clause there and nothing more is needed
>
> - 6 are variations of your "nested side-effects" form presented above with
> debatable (see above) benefit from except break
>
> - 2 do not use an else clause currently, but have multiple breaks that do
> partly redundant things that could be combined in a single except break
> clause
>

Those 8 cases could also be reviewed to see whether a flag variable might
be clearer than relying on nested side effects or code repetition.
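As a rough sketch of what such a review might produce for the "multiple
breaks doing partly redundant things" case: each break records why the loop
stopped, and the shared handling is written once after the loop. The function
consume_until_stop and its parameters are made up for this example and are
not taken from the stdlib code that was surveyed:

```python
def consume_until_stop(lines, max_lines=3):
    """Collect lines until a blank line or the line limit is hit."""
    stop_reason = None  # flag: stays None if the loop runs to completion
    consumed = []
    for line in lines:
        if not line:
            stop_reason = "blank line"
            break
        if len(consumed) >= max_lines:
            stop_reason = "limit reached"
            break
        consumed.append(line)
    if stop_reason is not None:
        # Shared "we broke out" handling, written once instead of per break
        consumed.append("<stopped: %s>" % stop_reason)
    return consumed


print(consume_until_stop(["a", "", "b"]))
print(consume_until_stop(["a", "b"]))
```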

> - 1 is an example of breaking out of two loops; from sre_parse._parse_sub:
>
> [...]
>     # check if all items share a common prefix
>     while True:
>         prefix = None
>         for item in items:
>             if not item:
>                 break
>             if prefix is None:
>                 prefix = item[0]
>             elif item[0] != prefix:
>                 break
>         else:
>             # all subitems start with a common "prefix".
>             # move it out of the branch
>             for item in items:
>                 del item[0]
>             subpatternappend(prefix)
>             continue # check next one
>         break
> [...]
>

This is a case where a flag variable may be easier to read than loop state
manipulations:

    may_have_common_prefix = True
    while may_have_common_prefix:
        prefix = None
        for item in items:
            if not item:
                may_have_common_prefix = False
                break
            if prefix is None:
                prefix = item[0]
            elif item[0] != prefix:
                may_have_common_prefix = False
                break
        else:
            # all subitems start with a common "prefix".
            # move it out of the branch
            for item in items:
                del item[0]
            subpatternappend(prefix)

Although the whole thing could likely be cleaned up even more via
itertools.zip_longest:

    for first_uncommon_idx, aligned_entries in enumerate(
            itertools.zip_longest(*items)):
        if not all_true_and_same(aligned_entries):
            break
    else:
        # Everything was common, so clear all entries
        first_uncommon_idx = None
    for item in items:
        del item[:first_uncommon_idx]

(Batching the deletes like that may even be slightly faster than deleting
common entries one at a time)

Given the following helper function:

    def all_true_and_same(entries):
        itr = iter(entries)
        try:
            first_entry = next(itr)
        except StopIteration:
            return False
        if not first_entry:
            return False
        for entry in itr:
            if not entry or entry != first_entry:
                return False
        return True
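To make that concrete, here is a self-contained sketch that wraps the
zip_longest loop in a function and runs it on sample data. The wrapper name
strip_common_prefix and the idea of returning the removed prefix (so the
caller can pass it to something like subpatternappend) are assumptions made
for this example, not part of sre_parse:

```python
import itertools


def all_true_and_same(entries):
    # Same logic as the helper above, repeated so this sketch runs on its own
    itr = iter(entries)
    try:
        first_entry = next(itr)
    except StopIteration:
        return False
    if not first_entry:
        return False
    return all(entry and entry == first_entry for entry in itr)


def strip_common_prefix(items):
    """Delete the shared leading entries from each list and return them."""
    for first_uncommon_idx, aligned_entries in enumerate(
            itertools.zip_longest(*items)):
        # zip_longest pads shorter lists with None, which is falsy,
        # so running off the end of any list stops the scan here
        if not all_true_and_same(aligned_entries):
            break
    else:
        # Everything was common, so clear all entries
        first_uncommon_idx = None
    prefix = list(items[0][:first_uncommon_idx]) if items else []
    for item in items:
        del item[:first_uncommon_idx]
    return prefix


items = [["a", "b", "c"], ["a", "b", "d"], ["a", "b"]]
print(strip_common_prefix(items), items)
```

Note that the helper treats any falsy entry as "not common", so this sketch
only suits lists whose entries are always truthy.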

>
> - finally, 1 is a complicated break dance to achieve sth that clearly
> would have been easier with except break; from typing.py:
>
> [...]
>     def __subclasscheck__(self, cls):
>         if cls is Any:
>             return True
>         if isinstance(cls, GenericMeta):
>             # For a class C(Generic[T]) where T is co-variant,
>             # C[X] is a subclass of C[Y] iff X is a subclass of Y.
>             origin = self.__origin__
>             if origin is not None and origin is cls.__origin__:
>                 assert len(self.__args__) == len(origin.__parameters__)
>                 assert len(cls.__args__) == len(origin.__parameters__)
>                 for p_self, p_cls, p_origin in zip(self.__args__,
>                                                    cls.__args__,
>                                                    origin.__parameters__):
>                     if isinstance(p_origin, TypeVar):
>                         if p_origin.__covariant__:
>                             # Covariant -- p_cls must be a subclass of p_self.
>                             if not issubclass(p_cls, p_self):
>                                 break
>                         elif p_origin.__contravariant__:
>                             # Contravariant.  I think it's the opposite. :-)
>                             if not issubclass(p_self, p_cls):
>                                 break
>                         else:
>                             # Invariant -- p_cls and p_self must equal.
>                             if p_self != p_cls:
>                                 break
>                     else:
>                         # If the origin's parameter is not a typevar,
>                         # insist on invariance.
>                         if p_self != p_cls:
>                             break
>                 else:
>                     return True
>                 # If we break out of the loop, the superclass gets a chance.
>         if super().__subclasscheck__(cls):
>             return True
>         if self.__extra__ is None or isinstance(cls, GenericMeta):
>             return False
>         return issubclass(cls, self.__extra__)
> [...]
>

I think this is another case that is asking for the inner loop to be factored
out to a named function, not for reasons of re-use, but for reasons of
making the code more readable and self-documenting :)
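A simplified sketch of what that factoring could look like is below. The
function name params_are_compatible and its standalone signature are
hypothetical; the real method works on GenericMeta internals that aren't
reproduced here, so this only illustrates how the breaks become returns:

```python
from typing import TypeVar


def params_are_compatible(self_args, cls_args, origin_params):
    """Check each type argument pair against its parameter's variance."""
    for p_self, p_cls, p_origin in zip(self_args, cls_args, origin_params):
        if isinstance(p_origin, TypeVar) and p_origin.__covariant__:
            # Covariant: p_cls must be a subclass of p_self
            compatible = issubclass(p_cls, p_self)
        elif isinstance(p_origin, TypeVar) and p_origin.__contravariant__:
            # Contravariant: the subclass relationship is reversed
            compatible = issubclass(p_self, p_cls)
        else:
            # Invariant typevar, or not a typevar at all: must be equal
            compatible = p_self == p_cls
        if not compatible:
            return False
    return True


T = TypeVar("T")
T_co = TypeVar("T_co", covariant=True)
print(params_are_compatible((int,), (bool,), (T_co,)))
print(params_are_compatible((int,), (bool,), (T,)))
```

The caller then reduces to a single "if params_are_compatible(...): return
True" check, with the fall-through to the superclass left as it is.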

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia