[Python-ideas] PEP 532: A circuit breaking operator and protocol

Tue Nov 8 11:15:11 EST 2016

On Sat, Nov 05, 2016 at 07:50:44PM +1000, Nick Coghlan wrote:
> Hi folks,
> 
> As promised, here's a follow-up to the withdrawn PEP 531 that focuses
> on coming up with a common protocol driven solution to conditional
> evaluation of subexpressions that also addresses the element-wise
> comparison chaining problem that Guido noted when rejecting PEP 335.

This feels much more promising to me, but I'm still not quite convinced, 
possibly because it's a very complicated proposal. I've had to read it 
multiple times to really understand it.

I wonder whether part of the difficulty is the size of the proposal. 
Perhaps this should be split into two PEPs: one to describe the protocol 
alone, and a second to propose built-ins and new syntax that takes 
advantage of the protocol. That might help keep this proposal in 
digestible pieces.

(By the way, I really like the "circuit breaker" name.)

> Inspired by PEP 335, PEP 505, PEP 531, and the related discussions, this PEP
> proposes the addition of a new protocol-driven circuit breaking operator to
> Python that allows the left operand to decide whether or not the expression
> should short circuit and return a result immediately, or else continue
> on with evaluation of the right operand::
> 
>     exists(foo) else bar
>     missing(foo) else foo.bar()
> 
> These two expressions can be read as:
> 
> * "the expression result is 'foo' if it exists, otherwise it is 'bar'"
> * "the expression result is 'foo' if it is missing, otherwise it is 'foo.bar()'"

These built-in names are quite problematic. I've seen people on Reddit 
and elsewhere who understood this as a form of "is this variable 
defined?", which is badly wrong. That is, they understood:

    exists(foo) else bar

as being equivalent to:

    try:
        result = foo
    except NameError:
        # variable foo doesn't exist/isn't defined
        result = bar

So I think that is going to be an area of confusion.

I'm not really keen on these names for the proposed built-ins. The names 
are much too generic for what they actually do, which is test for None, 
not just some generic idea of "existence".
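The difference between the two readings can be shown with a small stand-in. This is only an illustration: a dict lookup (and KeyError) stands in for name lookup (and NameError), both function names are hypothetical, and it assumes, per the PEP, that `exists(foo) else bar` means "foo, unless foo is None".

```python
def misread_exists_else(namespace, name, fallback):
    """The mistaken reading: use the name if it is defined at all."""
    try:
        return namespace[name]
    except KeyError:
        # stand-in for NameError: the name is not defined
        return fallback


def proposed_exists_else(namespace, name, fallback):
    """The proposed meaning: an undefined name still fails; only a
    None value triggers the fallback."""
    value = namespace[name]  # raises KeyError (our "NameError") if undefined
    return value if value is not None else fallback


defined_as_none = {"foo": None}
print(misread_exists_else(defined_as_none, "foo", "bar"))   # None
print(proposed_exists_else(defined_as_none, "foo", "bar"))  # bar
```

The two readings disagree exactly when `foo` is defined but bound to None, which is the very case the proposal is about.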

I don't really have a better solution if your aim is to dispense with 
the ?? operator. Some (bad) ideas: isnone, is_none, nullity. None of 
these are as compact as the ?? operator.

On the other hand, if we keep the ?? operator, and implicitly define it 
as:

    lhs_expression ?? rhs_expression
    => exists(lhs_expression) else rhs_expression

then there's no need for the user to explicitly write out exists() in 
their code, and the name can be as precise and as long as we need:

    # ?? operator
    types.NoneCoalescingType(lhs_expression) else rhs_expression
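A rough emulation of that desugaring, for illustration only. Since `LHS else RHS` is not valid syntax today, the hypothetical helper `circuit_else` applies the PEP's expansion (`type(_lhs).__then__(_lhs) if _lhs else type(_lhs).__else__(_lhs, RHS)`) with the right operand passed as a lambda, so it is evaluated only when needed. `NoneCoalescingType` is a hypothetical class, not an existing member of `types`.

```python
class NoneCoalescingType:
    """Hypothetical circuit breaker backing `lhs ?? rhs`."""

    def __init__(self, value):
        self.value = value

    def __bool__(self):
        # Truthy means "short-circuit": the left operand is not None
        return self.value is not None

    def __then__(self):
        # Short-circuit result: the original left operand
        return self.value

    def __else__(self, other):
        # Left operand was None: the result is the right operand
        return other


def circuit_else(lhs, rhs_thunk):
    """Emulate the proposed `lhs else rhs` (rhs passed as a lambda)."""
    cls = type(lhs)
    if lhs:
        return cls.__then__(lhs)
    return cls.__else__(lhs, rhs_thunk())


def none_coalesce(lhs, rhs_thunk):
    # lhs ?? rhs  =>  NoneCoalescingType(lhs) else rhs
    return circuit_else(NoneCoalescingType(lhs), rhs_thunk)


print(none_coalesce(None, lambda: "default"))  # default
print(none_coalesce(0, lambda: 1 / 0))         # 0 (right side never evaluated)
```

Because the wrapper is applied implicitly by `??`, users never see or spell its name, which is what frees it to be long and precise.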

> Execution of these expressions relies on a new circuit breaking protocol that
> implicitly avoids repeated evaluation of the left operand while letting
> that operand fully control the result of the expression, regardless of whether
> it skips evaluation of the right operand or not::
> 
>     _lhs = LHS
>     type(_lhs).__then__(_lhs) if _lhs else type(_lhs).__else__(_lhs, RHS)

My first reaction on reading that was to wonder if you had written the 
first two terms backwards. My reading was that *any* truthy value would 
trigger the __then__ method call, and any falsey value the __else__ 
call, which would make this (almost) just another spelling of the 
existing `or` operator.

My thought was that people could write:

    None else 2

which would return NoneType.__else__(None, 2), which I presumed 
would be 2. But I now think that's wrong.

I thought that the intention was to give `object` default __then__ and 
__else__ methods, and over-ride them as needed, but I now think that's 
wrong. I think that should be clarified early in the PEP, as I wasted a 
lot of time thinking about this wrongly.

So I now think that most objects will not define __then__ and __else__, 
and consequently code like `None else 2` will raise an error. Is that 
right?

I now think that only the builtins `exists` and `missing`, and the 
CircuitBreaker actually need __then__ and __else__ methods. (Plus of 
course any user-defined types.)
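That reading can be checked against a rough pure-Python emulation of the expansion quoted from the PEP. In this sketch, the helper `circuit_else` and the simplified `CircuitBreaker`, `exists` and `missing` are assumptions (the PEP's actual definitions may differ in detail). Only the breaker type defines `__then__` and `__else__`, so a plain `None` as the left operand fails; in the emulation that surfaces as AttributeError, and the real proposal may specify a different exception.

```python
class CircuitBreaker:
    """Simplified circuit breaker: a value plus a short-circuit flag."""

    def __init__(self, value, condition):
        self.value = value
        self.condition = bool(condition)

    def __bool__(self):
        return self.condition

    def __then__(self):
        return self.value

    def __else__(self, other):
        return other


def exists(value):
    # Short-circuit (keep the value) when it is not None
    return CircuitBreaker(value, value is not None)


def missing(value):
    # Short-circuit (keep the value) when it is None
    return CircuitBreaker(value, value is None)


def circuit_else(lhs, rhs_thunk):
    """Emulate `lhs else rhs` via the PEP's expansion; rhs is a lambda."""
    cls = type(lhs)
    if lhs:
        return cls.__then__(lhs)
    return cls.__else__(lhs, rhs_thunk())


print(circuit_else(exists(None), lambda: 2))  # 2
try:
    circuit_else(None, lambda: 2)  # emulates `None else 2`
except AttributeError as exc:
    print("error:", exc)
```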

[...]
> In addition to being usable as simple boolean operators (e.g. as in
> ``assert all(exists, items)`` or ``if any(missing, items):``), these circuit
> breakers will allow existence checking fallback operations (aka None-coalescing
> operations) to be written as::
> 
>     value = exists(expr1) else exists(expr2) else expr3

I take it that your intention is to avoid the ?? None-coalescing 
operator. Your exists() version risks being misunderstood as testing for 
the existence of the expression/name (e.g. catching NameError, 
LookupError), and it's still quite verbose compared to the ?? 
operator but without being any more explicit in what it is doing.

    # your exists() proposal
    value = exists(expr1) else exists(expr2) else expr3

    # ?? operator
    value = expr1 ?? expr2 ?? expr3
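Whichever spelling is chosen, the chain amounts to "the first operand that is not None, evaluated lazily from left to right, with the last operand returned unconditionally". A plain-Python sketch of that behaviour, using a hypothetical `coalesce` helper that takes each operand as a lambda:

```python
def coalesce(*thunks):
    """Return the first non-None result, calling each thunk lazily in order.
    Like `expr1 ?? expr2 ?? expr3`, the last operand is returned as-is."""
    *head, last = thunks
    for thunk in head:
        value = thunk()
        if value is not None:
            return value
    return last()


config_value = None
env_value = 0
print(coalesce(lambda: config_value, lambda: env_value, lambda: 42))  # 0
```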

[...]
> Overloading logical inversion (``not``)
> ---------------------------------------
> 
> Any circuit breaker definition will have a logical inverse that is still a
> circuit breaker, but inverts the answer as to whether or not to short circuit
> the expression evaluation. For example, the ``exists`` and ``missing`` circuit
> breakers proposed in this PEP are each other's logical inverse.
> 
> A new protocol method, ``__not__(self)``, will be introduced to permit circuit
> breakers and other types to override ``not`` expressions to return their
> logical inverse rather than a coerced boolean result.

So how will this affect the existing semantics of `not`?

Currently, I think `not obj` is equivalent to `not bool(obj)`, where 
bool calls the __bool__ method, or __len__ if that's not defined, and 
falls back to True. I think your proposal means that this will change 
to something close to this:

    def not_(obj):
        # Possible semantics of `not obj` under the proposal
        __not__ = getattr(type(obj), '__not__', None)
        if __not__ is not None:
            x = __not__(obj)
            if x is not NotImplemented:
                return x
        flag = bool(obj)
        return False if flag else True

That means that now any object, not just Circuit Breakers, can customise 
how it responds to `not`.
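For circuit breakers specifically, the point of `__not__` is that inverting `exists(x)` should give something equivalent to `missing(x)` rather than a plain bool. A sketch under that assumption, with a simplified hypothetical `CircuitBreaker` and a `proposed_not` function emulating the fallback logic sketched above:

```python
class CircuitBreaker:
    """Simplified circuit breaker that also supports the proposed __not__."""

    def __init__(self, value, condition):
        self.value = value
        self.condition = bool(condition)

    def __bool__(self):
        return self.condition

    def __not__(self):
        # Logical inverse: same value, opposite short-circuit decision
        return CircuitBreaker(self.value, not self.condition)


def exists(value):
    return CircuitBreaker(value, value is not None)


def missing(value):
    return CircuitBreaker(value, value is None)


def proposed_not(obj):
    """Emulate `not obj` if types could override it via __not__."""
    not_method = getattr(type(obj), "__not__", None)
    if not_method is not None:
        result = not_method(obj)
        if result is not NotImplemented:
            return result
    return not bool(obj)


inverted = proposed_not(exists(None))
print(type(inverted).__name__, bool(inverted))  # CircuitBreaker True
print(proposed_not([]))                         # True (ordinary types unchanged)
```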

[...]
> This PEP has been designed specifically to address the risks and concerns
> raised when discussing PEPs 335, 505 and 531.
> 
> * it defines a new operator and adjusts the definition of chained comparison
>   rather than impacting the existing ``and`` and ``or`` operators

That's a point in its favour.

> * the changes to the ``not`` unary operator are defined in such a way that
>   control flow optimisations based on the existing semantics remain valid

I think the changes to `not` are neutral.

> * rather than the cryptic ``??``, it uses ``else`` as the operator keyword in
>   exactly the same sense as it is already used in conditional expressions

I don't think that ?? is really cryptic.

Like any symbol, it has to be learned, but ?? is used in a number of 
other major languages. It's also obviously related to the ?. and ?[] 
sugar, so once people learn that they are related to None testing they 
will have a strong clue that ?? is too.

I completely accept that it is not self-explanatory. That's the cost of 
symbols: they have to be learned, because they aren't self-explanatory 
in the way that well-named functions may be. But many things need to be 
learned, including such fundamental and necessary (pseudo-)operators as . 
for attribute lookup and [] for item lookup, as well as maths operators 
like ** etc.

To fairly label a symbol as "cryptic", I would expect that its effects 
are mysterious, hard to understand or complicated. "await" and "async" 
are cryptic because to understand them, you have to grok asynchronous 
programming, and that's hard. Nevertheless, they're important enough 
that they deserve to be syntax, hard to understand or not.

But ?? is not hard to understand. It has a very simple explanation:

    expr1 ?? expr2

is just equivalent to 

    _tmp = expr1
    _tmp if _tmp is not None else expr2

except it doesn't create a temporary variable.
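The practical difference from the existing `x or default` idiom is that only None triggers the fallback; other falsey values such as 0, '' and [] are kept. A small illustration using that expansion, with hypothetical function names (note that, unlike the operator, these functions evaluate `default` eagerly):

```python
def default_with_or(value, default):
    # Existing idiom: ANY falsey value is replaced
    return value or default


def default_with_coalesce(value, default):
    # The `value ?? default` expansion: only None is replaced
    _tmp = value
    return _tmp if _tmp is not None else default


for value in (None, 0, "", []):
    print(repr(value), "->", repr(default_with_or(value, "fallback")),
          repr(default_with_coalesce(value, "fallback")))
```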

> * it defines a general purpose short-circuiting binary operator that can even
>   be used to express the existing semantics of ``and`` and ``or`` rather than
>   focusing solely and inflexibly on existence checking

Another point in its favour.
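To illustrate that claim, here is a sketch of how two truthiness-based breakers could reproduce `or` and `and` through the same emulated expansion. The names `if_true`, `if_false` and `circuit_else` are hypothetical stand-ins, not necessarily what the PEP proposes.

```python
class CircuitBreaker:
    """Simplified circuit breaker: a value plus a short-circuit flag."""

    def __init__(self, value, condition):
        self.value = value
        self.condition = bool(condition)

    def __bool__(self):
        return self.condition

    def __then__(self):
        return self.value

    def __else__(self, other):
        return other


def circuit_else(lhs, rhs_thunk):
    """Emulate `lhs else rhs` via the PEP's expansion; rhs is a lambda."""
    cls = type(lhs)
    if lhs:
        return cls.__then__(lhs)
    return cls.__else__(lhs, rhs_thunk())


def if_true(value):
    # Short-circuit on a truthy value: `if_true(a) else b` acts like `a or b`
    return CircuitBreaker(value, bool(value))


def if_false(value):
    # Short-circuit on a falsey value: `if_false(a) else b` acts like `a and b`
    return CircuitBreaker(value, not value)


samples = [0, 1, "", "x", None, []]
for a in samples:
    for b in samples:
        assert circuit_else(if_true(a), lambda: b) is (a or b)
        assert circuit_else(if_false(a), lambda: b) is (a and b)
print("or/and semantics reproduced for all sample pairs")
```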

> * it names the proposed builtins in such a way that they provide a strong
>   mnemonic hint as to when the expression containing them will short-circuit
>   and skip evaluating the right operand

This is my biggest problem with your proposal. I just don't think this 
part is correct. I think that the exists() and missing() builtins are 
the weakest part of the proposal:

- exists() can be easily misunderstood as testing for NameError or
  LookupError, rather than whether the value is None;

- likewise for missing(), in reverse.

The problem here is that exists() *seems* to be so self-evident and 
obvious that there's no need to read the documentation in any detail. 
It's actually misleading -- while it is true that `is None` is a 
special case of existence checking for applications that treat None 
as a missing value, that's not what most people expect "existence" to 
mean, and I'm not convinced that those coming from languages with a ?? 
operator think of it as existence checking either.

That's why they call it Null Coalescing rather than Existence Checking.

Whereas ?? is just unfamiliar enough to keep the user on their toes and 
discourage them from making assumptions. If they know the ?? operator 
from another language, there won't be any big surprises for them. If 
they don't, they should find it documented as both the actual 
implementation in terms of the `else` circuit breaker (for experts!) and 
as a conceptually simpler form in terms of if...else.

* * * 

Whew! Nick, this is a big, complex PEP, and thank you for taking the 
time to write it. But it's also hard to understand -- there's a lot of 
detail, and there are places where it is easy for the reader to be 
misled by their assumptions and only get corrected deep, deep into the 
PEP. At least that's my experience.

I think I'd find this PEP easier to understand if it were split into 
two, like Guido's type-hints PEPs: one to explain the protocol, and one 
to detail the new built-ins and syntactic sugar which rely on the 
protocol. Or maybe that's just me.

I really like this PEP as a way of introducing configurable 
short-circuiting behaviour, without messing with `and` and `or`. That's 
really, really nice. I like your decision to keep the ?. and ?[] sugar, 
as short-cuts for code based on this Circuit Breaker protocol.

But I cannot support the exists() and missing() builtins as they stand. 
I think the circuit breakers themselves work well as a concept, but:

- I think that the names will be more harmful than helpful;

- I don't think that having to explicitly call a circuit breaker is a 
  good substitute for the ?? operator.

If I absolutely had to choose between this and nothing, I'd say +1 for 
this. But if I had to choose between ?? as an operator, and a generic 
circuit breaking protocol with no operator but an exists() builtin 
instead, well, that would be a really hard decision.

-- 
Steve