[Python-ideas] Re: Exception spaces

10 Apr 2020

On Sat, Apr 11, 2020 at 3:31 AM Greg Ewing wrote:
>
> On 11/04/20 2:34 am, Chris Angelico wrote:
> > AttributeError, KeyError/IndexError, and GeneratorExit (and
> > StopAsyncIteration) want to say hi too.
>
> Okay, there are a few others. But the important thing for our
> purposes is that APIs designed specifically to use exceptions
> for flow control (such as StopIteration and GeneratorExit)
> define their own special exceptions for the purpose. Nothing
> else raises them, so it's usually fairly safe to catch them.
>
> On the other hand, I don't really think of AttributeError,
> KeyError or IndexError as exceptions intended primarily for
> flow control. To my mind they're in the same category as
> TypeError and ValueError -- they signal that you tried to
> do something that can't be done.
>
> While you *can* use them for flow control, you need to be
> careful how you go about it. And there is usually another
> way to get the same result that doesn't require catching an
> exception, e.g., dict.get().
>
> The only one of these that can be a bit of a problem is
> AttributeError, because there is no other way to attempt
> to get an attribute that may or may not be there (even
> hasattr() calls getattr() under the covers and catches
> AttributeError).

They all serve the same broad purpose - as mentioned, the protocol
consists of "return this value" and needs a way to say "there is no
value". StopIteration means "there is nothing to yield".
AttributeError means "there is no such attribute". KeyError means
"there is no such key". None of them are inherently flow control
(apart from StopIteration's association with for loops); they're
fundamentally about a protocol that has to be capable of returning
*any* value, and also capable of signalling the absence of a value.
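
To make the "any value" point concrete, here is a small sketch using a
plain dict. The names `settings` and `describe` are made up for
illustration. When absence is signalled through the return value, a
stored None cannot be told apart from a missing key; when it is
signalled with an exception, the two cases stay distinct.

```python
settings = {"timeout": None}  # None is a legitimate stored value

# Signalling absence through the return value is ambiguous:
print(settings.get("timeout"))  # key present, value happens to be None
print(settings.get("retries"))  # key absent, default is also None

# Signalling absence with an exception is not:
def describe(key):
    try:
        value = settings[key]
    except KeyError:
        return "absent"
    return f"present: {value!r}"

print(describe("timeout"))
print(describe("retries"))
```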

(IMO dict.get() is the same as hasattr() - unless you duplicate the
lookup code into it, the most logical way to implement it is to
attempt a __getitem__ and, if it raises, return the default.)
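
As a rough sketch of that idea, a get()-style helper can be written
purely in terms of __getitem__. The helper `get` and the class
`Squares` below are hypothetical names for illustration. This is not
how the built-in dict.get() is implemented, but the get() mixin method
on collections.abc.Mapping works along these lines.

```python
def get(mapping, key, default=None):
    """Return mapping[key], or default if the lookup raises KeyError."""
    try:
        return mapping[key]
    except KeyError:
        return default


class Squares:
    """Hypothetical mapping-like class that only defines __getitem__."""

    def __getitem__(self, key):
        if not isinstance(key, int):
            raise KeyError(key)  # protocol: "there is no such key"
        return key * key


print(get(Squares(), 4))        # 16
print(get(Squares(), "x", -1))  # -1
```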

With every one of these protocols, it's entirely possible to mask a
bug *in the protocol function itself* this way. If you're implementing
__getitem__, you need to be careful not to accidentally leak a
KeyError. One option would be to wrap everything:

def __getitem__(self, item):
    try:
        if ...:
            return ...
    except KeyError:
        raise RuntimeError
    # If we didn't hit a return, then the key wasn't found
    raise KeyError

Tada! No leakage. But much more commonly, __getitem__ is going to
delegate in a way that means that a "leaked" KeyError is actually the
correct behaviour. So it might be necessary to guard just *part* of
the function:

class LazyMapper:
    def __init__(self, func, base): ...
    def __getitem__(self, item):
        orig = self.base[item] # If this raises, let it raise
        try:
            return self.func(orig) # If this raises, it's a bug
        except KeyError:
            raise RuntimeError

There's no way to solve this problem from the outside. The protocol
exists for a reason, and exception handling is an extremely convenient
way to delegate the entire protocol. The ONLY way to know which
KeyErrors are bugs and which are correct use of protocol is to write
the try/except accordingly.
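
Here is a runnable version of the LazyMapper sketch, assuming the
elided constructor simply stores its two arguments. The `lookup` dict
and the deliberately buggy lambda are invented for the demonstration.
From the caller's side, a missing key and a bug in func are now two
different exceptions.

```python
class LazyMapper:
    def __init__(self, func, base):
        # Assumed constructor body; the original leaves it elided.
        self.func = func
        self.base = base

    def __getitem__(self, item):
        orig = self.base[item]  # a KeyError here is correct use of protocol
        try:
            return self.func(orig)
        except KeyError as exc:
            # a KeyError here is a bug in func, so report it as one
            raise RuntimeError("KeyError leaked from func") from exc


lookup = {"a": 1}
# func does its own dict lookup, which fails for the value "b"
buggy = LazyMapper(lambda v: lookup[v], {"x": "a", "y": "b"})

print(buggy["x"])  # 1
for key in ("y", "z"):
    try:
        buggy[key]
    except KeyError:
        print(key, "-> KeyError (no such key)")
    except RuntimeError:
        print(key, "-> RuntimeError (bug in func)")
```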

With this proposed "espace" mechanism, what you'd end up with is
exactly the same, just inverted:

def __getitem__(self, item):
    try:
        orig = self.base[item]
    except KeyError:
        raise KeyError in espace
    return self.func(orig)

You STILL need to indicate which exceptions are proper use of protocol
and which are leaks; only now, you can do that with a brand new
mechanism with lots of overhead, instead of using the mechanism that
we already have.

The only reason that StopIteration is special in generators is that
they *already* have a way to produce any value or to signal that there
are no more values: "yield" and "return". That means it's actually
possible to place the guard around the outside; and it also means that
the implementation of the function doesn't clearly and obviously make
use of an exception as protocol. If you're writing a __next__ method,
it should be obvious that StopIteration is significant. If you're
calling next(it) inside a __next__ method, then you're clearly and
logically delegating. That's normal use of protocol. (And, in fact,
there was some pushback to PEP 479 on the basis that generators were
legitimately delegating to an unguarded next() call as part of
protocol.)
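
For reference, this is roughly what the PEP 479 guard looks like in
practice, assuming Python 3.7 or later (where the behaviour is always
on). The generator names are made up. An unguarded next() that runs
dry inside a generator surfaces as RuntimeError instead of silently
ending the generator; delegating deliberately means catching
StopIteration and returning.

```python
def first_two(iterable):
    it = iter(iterable)
    yield next(it)  # unguarded: StopIteration escapes if `it` runs dry
    yield next(it)


try:
    list(first_two([1]))
except RuntimeError as exc:
    print("RuntimeError:", exc)


def first_two_guarded(iterable):
    it = iter(iterable)
    for _ in range(2):
        try:
            yield next(it)
        except StopIteration:
            return  # explicit: the source ending ends this generator too


print(list(first_two_guarded([1])))  # [1]
```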

If you really want to handle things externally in some way, the only
way would be to change the internal and external protocols. Maybe your
__getitem__ function could be written to wrap its return value in a
tuple, or to return an empty tuple if the item couldn't be found. Then
you could have a simple wrapper:

def __getitem__(self, key):
    try:
        result = self.__real__getitem__(key)
    except KeyError:
        raise RuntimeError
    if result:
        return result[0]
    raise KeyError

But then it becomes that much harder for __real__getitem__ to delegate
to something else. I don't think it really benefits us any.
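
A runnable sketch of that tuple-wrapping scheme, for anyone who wants
to poke at it. The class name `Wrapped` and the inner method name
`_real_getitem` are placeholders, not part of any real API. Note how
the inner method has to translate the underlying mapping's KeyError
into the empty tuple by hand, which is the extra delegation cost
mentioned above.

```python
class Wrapped:
    def __init__(self, data):
        self.data = data

    def _real_getitem(self, key):
        # Inner protocol: (value,) when found, () when not found.
        try:
            value = self.data[key]  # delegate to an ordinary mapping
        except KeyError:
            return ()               # translate into the inner protocol
        return (value,)

    def __getitem__(self, key):
        try:
            result = self._real_getitem(key)
        except KeyError as exc:
            # The inner protocol never raises KeyError on purpose.
            raise RuntimeError("unexpected KeyError") from exc
        if result:  # a one-element tuple is truthy even if it holds 0 or None
            return result[0]
        raise KeyError(key)


w = Wrapped({"n": 0})
print(w["n"])  # 0
try:
    w["missing"]
except KeyError as exc:
    print("KeyError:", exc)
```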

ChrisA