[Python-ideas] PEP 532: A circuit breaking operator and protocol

Sat Nov 5 05:50:44 EDT 2016

Hi folks,

As promised, here's a follow-up to the withdrawn PEP 531 that focuses
on coming up with a common protocol driven solution to conditional
evaluation of subexpressions that also addresses the element-wise
comparison chaining problem that Guido noted when rejecting PEP 335.

I quite like how it came out, but see the "Risks & Concerns" section
for a discussion of an internal inconsistency it would introduce into
the language if accepted as currently written, and the potentially
far-reaching consequences actually resolving that inconsistency might
have on the way people write their Python code (if the PEP was
subsequently approved).

Regards,
Nick.

=================================
PEP: 532
Title: A circuit breaking operator and protocol
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan at gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-Oct-2016
Python-Version: 3.7
Post-History: 5-Nov-2016

Abstract
========

Inspired by PEP 335, PEP 505, PEP 531, and the related discussions, this PEP
proposes the addition of a new protocol-driven circuit breaking operator to
Python that allows the left operand to decide whether or not the expression
should short circuit and return a result immediately, or else continue
on with evaluation of the right operand::

    exists(foo) else bar
    missing(foo) else foo.bar()

These two expressions can be read as:

* "the expression result is 'foo' if it exists, otherwise it is 'bar'"
* "the expression result is 'foo' if it is missing, otherwise it is 'foo.bar()'"

Execution of these expressions relies on a new circuit breaking protocol that
implicitly avoids repeated evaluation of the left operand while letting
that operand fully control the result of the expression, regardless of whether
it skips evaluation of the right operand or not::

    _lhs = LHS
    type(_lhs).__then__(_lhs) if _lhs else type(_lhs).__else__(_lhs, RHS)

To properly support logical negation of circuit breakers, a new ``__not__``
protocol method would also be introduced allowing objects to control
the result of ``not obj`` expressions.

As shown in the basic example above, the PEP further proposes the addition of
builtin ``exists`` and ``missing`` circuit breakers that provide conditional
branching based on whether or not an object is ``None``, but return the
original object rather than the existence checking wrapper when the expression
evaluation short circuits.

In addition to being usable as simple boolean operators (e.g. as in
``assert all(exists, items)`` or ``if any(missing, items):``), these circuit
breakers will allow existence checking fallback operations (aka None-coalescing
operations) to be written as::

    value = exists(expr1) else exists(expr2) else expr3

and existence checking precondition operations (aka None-propagating
or None-severing operations) to be written as::

    value = missing(obj) else obj.field.of.interest
    value = missing(obj) else obj["field"]["of"]["interest"]

A change to the definition of chained comparisons is also proposed, where
the comparison chaining will be updated to use the circuit breaking operator
rather than the logical disjunction (``and``) operator if the left hand
comparison returns a circuit breaker as its result.

While there are some practical complexities arising from the current handling
of single-valued arrays in NumPy, this change should be sufficient to allow
elementwise chained comparison operations for matrices, where the result
is a matrix of boolean values, rather than tautologically returning ``True``
or raising ``ValueError``.

Relationship with other PEPs
============================

This PEP is a direct successor to PEP 531, replacing the existence checking
protocol and the new ``?then`` and ``?else`` syntactic operators defined there
with a single protocol driven ``else`` operator and adjustments to the ``not``
operator. The existence checking use cases are taken from that PEP.

It is also a direct successor to PEP 335, which proposed the ability to
overload the ``and`` and ``or`` operators directly, with the ability to
overload the semantics of comparison chaining being one of the consequences
of that change. The proposal in this PEP to instead handle the element-wise
comparison use case by changing the semantic definition of comparison chaining
is drawn from Guido's rejection of PEP 335.

This PEP competes with the dedicated null-coalescing operator in PEP 505,
proposing that improved support for null-coalescing operations be offered
through a more general protocol-driven short circuiting operator and related
builtins, rather than through a dedicated single-purpose operator.

It doesn't compete with PEP 505's proposed shorthands for existence checking
attribute access and subscripting, but instead offers an alternative underlying
semantic framework for defining them:

* ``EXPR?.attr`` would be syntactic sugar for ``missing(EXPR) else EXPR.attr``
* ``EXPR?[key]`` would be syntactic sugar for ``missing(EXPR) else EXPR[key]``

In both cases, the dedicated syntactic form could be optimised to avoid
actually creating the circuit breaker instance.

Specification
=============

The circuit breaking operator (``else``)
----------------------------------------

Circuit breaking expressions would be written using ``else`` as a new binary
operator, akin to the existing ``and`` and ``or`` logical operators::

    LHS else RHS

Ignoring the hidden variable assignment, this is semantically equivalent to::

    _lhs = LHS
    type(_lhs).__then__(_lhs) if _lhs else type(_lhs).__else__(_lhs, RHS)

The key difference relative to the existing ``or`` operator is that the value
determining which branch of the conditional expression gets executed *also*
gets a chance to postprocess the results of the expressions on each of the
branches.

As part of the short-circuiting behaviour, interpreter implementations
are expected to access only the protocol method needed for the branch
that is actually executed, but it is still recommended that circuit
breaker authors that always return ``True`` or always return ``False`` from
``__bool__`` explicitly raise ``NotImplementedError`` with a suitable
message from branch methods that are never expected to be executed (see the
comparison chaining use case in the Rationale section below for an example
of that).

It is proposed that the ``else`` operator use a new precedence level that binds
less tightly than the ``or`` operator by adjusting the relevant line in
Python's grammar from the current::

    test: or_test ['if' or_test 'else' test] | lambdef

to instead be::

    test: else_test ['if' or_test 'else' test] | lambdef
    else_test: or_test ['else' test]

The definition of ``test_nocond`` would remain unchanged, so circuit
breaking expressions would require parentheses when used in the ``if``
clause of comprehensions and generator expressions just as conditional
expressions themselves do.

This grammar definition means precedence/associativity in the otherwise
ambiguous case of ``expr1 if cond else expr2 else epxr3`` resolves as
``(expr1 if cond else expr2) else epxr3``.

A guideline will also be added to PEP 8 to say "don't do that", as such a
construct will be inherently confusing for readers, regardless of how the
interpreter executes it.

Overloading logical inversion (``not``)
---------------------------------------

Any circuit breaker definition will have a logical inverse that is still a
circuit breaker, but inverts the answer as to whether or not to short circuit
the expression evaluation. For example, the ``exists`` and ``missing`` circuit
breakers proposed in this PEP are each other's logical inverse.

A new protocol method, ``__not__(self)``, will be introduced to permit circuit
breakers and other types to override ``not`` expressions to return their
logical inverse rather than a coerced boolean result.

To preserve the semantics of existing language optimisations, ``__not__``
implementations will be obliged to respect the following invariant::

    assert not bool(obj) == bool(not obj)

Chained comparisons
-------------------

A chained comparison like ``0 < x < 10`` written as::

    LEFT_BOUND left_op EXPR right_op RIGHT_BOUND

is currently roughly semantically equivalent to::

    _expr = EXPR
    _lhs_result = LEFT_BOUND left_op _expr
    _expr_result = _lhs_result and (_expr right_op RIGHT_BOUND)

This PEP proposes that this be changed to explicitly check if the left
comparison returns a circuit breaker, and if so, use ``else`` rather than
``and`` to implement the comparison chaining::

    _expr = EXPR
    _lhs_result = LEFT_BOUND left_op _expr
    if hasattr(type(_lhs_result), "__then__"):
        _expr_result = _lhs_result else (_expr right_op RIGHT_BOUND)
    else:
        _expr_result = _lhs_result and (_expr right_op RIGHT_BOUND)

This allows types like NumPy arrays to control the behaviour of chained
comparisons by returning circuit breakers from comparison operations.

Existence checking comparisons
------------------------------

Two new builtins implementing the new protocol are proposed to encapsulate the
notion of "existence checking": seeing if a value is ``None`` and either
falling back to an alternative value (an operation known as "None-coalescing")
or passing it through as the result of the overall expression (an operation
known as "None-severing" or "None-propagating").

These builtins would be defined as follows::

    class CircuitBreaker:
        """Base circuit breaker type (available as types.CircuitBreaker)"""
        def __init__(self, value, condition, inverse_type):
            self.value = value
            self._condition = condition
            self._inverse_type = inverse_type
        def __bool__(self):
            return self._condition
        def __not__(self):
            return self._inverse_type(self.value)
        def __then__(self):
            return self.value
        def __else__(self, other):
            if other is self:
                return self.value
            return other

    class exists(types.CircuitBreaker):
        """Circuit breaker for 'EXPR is not None' checks"""
        def __init__(self, value):
            super().__init__(value, value is not None, missing)

    class missing(types.CircuitBreaker):
        """Circuit breaker for 'EXPR is None' checks"""
        def __init__(self, value):
            super().__init__(value, value is None, exists)

Aside from changing the definition of ``__bool__`` to be based on
``is not None`` rather than normal truth checking, the key characteristic of
``exists`` is that when it is used as a circuit breaker, it is *ephemeral*:
when it is told that short circuiting has taken place, it returns the original
value, rather than the existence checking wrapper.

``missing`` is defined as the logically inverted counterpart of ``exists``:
``not exists(obj)`` is semantically equivalent to ``missing(obj)``.

The ``__else__`` implementations for both builtin circuit breakers are defined
such that the wrapper will always be removed even if you explicitly pass the
circuit breaker to both sides of the ``else`` expression::

    breaker = exists(foo)
    assert (breaker else breaker) is foo
    breaker = missing(foo)
    assert (breaker else breaker) is foo

Other conditional constructs
----------------------------

No changes are proposed to if statements, while statements, conditional
expressions, comprehensions, or generator expressions, as the boolean clauses
they contain are already used for control flow purposes.

However, it's worth noting that while such proposals are outside the scope of
this PEP, the circuit breaking protocol defined here would be sufficient to
support constructs like::

    while exists(dynamic_query()) as result:
        ... # Code using result

and:

    if exists(re.search(pattern, text)) as match:
        ... # Code using match

Leaving the door open to such a future extension is the main reason for
recommending that circuit breaker implementations handle the ``self is other``
case in ``__else__`` implementations the same way as they handle the
short-circuiting behaviour in ``__then__``.

Style guide recommendations
---------------------------

The following additions to PEP 8 are proposed in relation to the new features
introduced by this PEP:

* In the absence of other considerations, prefer the use of the builtin
  circuit breakers ``exists`` and ``missing`` over the corresponding
  conditional expressions

* Do not combine conditional expressions (``if-else``) and circuit breaking
  expressions (the ``else`` operator) in a single expression - use one or the
  other depending on the situation, but not both.

Rationale
=========

Adding a new operator
---------------------

Similar to PEP 335, early drafts of this PEP focused on making the existing
``and`` and ``or`` operators less rigid in their interpretation, rather than
proposing new operators. However, this proved to be problematic for a few
reasons:

* defining a shared protocol for both ``and`` and ``or`` was confusing, as
  ``__then__`` was the short-circuiting outcome for ``or``, while ``__else__``
  was the short-circuiting outcome for ``and``
* the ``and`` and ``or`` operators have a long established and stable meaning,
  so readers would inevitably be surprised if their meaning now became
  dependent on the type of the left operand. Even new users would be confused
  by this change due to 25+ years of teaching material that assumes the
  current well-known semantics for these operators
* Python interpreter implementations, including CPython, have taken advantage
  of the existing semantics of ``and`` and ``or`` when defining runtime and
  compile time optimisations, which would all need to be reviewed and
  potentially discarded if the semantics of those operations changed

Proposing a single new operator instead resolves all of those issues -
``__then__`` always indicates short circuiting, ``__else__`` only indicates
"short circuiting" if the circuit breaker itself is also passed in as the
right operand, and the semantics of ``and`` and ``or`` remain entirely
unchanged. While the semantics of the unary ``not`` operator do change, the
invariant required of ``__not__`` implementations means that existing
expression optimisations in boolean contexts will remain valid.

As a result of that design simplification, the new protocol and operator would
even allow us to expose ``operator.true`` and ``operator.false``
as circuit breaker definitions if we chose to do so::

    class true(types.CircuitBreaker):
        """Circuit breaker for 'bool(EXPR)' checks"""
        def __init__(self, value):
            super().__init__(value, bool(value), when_false)

    class false(types.CircuitBreaker):
        """Circuit breaker for 'not bool(EXPR)' checks"""
        def __init__(self, value):
            super().__init__(value, not bool(value), when_true)

Given those circuit breakers:

* ``LHS or RHS`` would be roughly ``operator.true(LHS) else RHS``
* ``LHS and RHS`` would be roughly ``operator.false(LHS) else RHS``

Naming the operator and protocol
--------------------------------

The names "circuit breaking operator", "circuit breaking protocol" and
"circuit breaker" are all inspired by the phrase "short circuiting operator":
the general language design term for operators that only conditionally
evaluate their right operand.

The electrical analogy is that circuit breakers in Python detect and handle
short circuits in expressions before they trigger any exceptions similar to the
way that circuit breakers detect and handle short circuits in electrical
systems before they damage any equipment or harm any humans.

The Python level analogy is that just as a ``break`` statement lets you
terminate a loop before it reaches its natural conclusion, a circuit breaking
expression lets you terminate evaluation of the expression and produce a result
immediately.

Using an existing keyword
-------------------------

Using an existing keyword has the benefit of allowing the new expression to
be introduced without a ``__future__`` statement.

``else`` is semantically appropriate for the proposed new protocol, and the
only syntactic ambiguity introduced arises when the new operator is combined
with the explicit ``if-else`` conditional expression syntax.

Element-wise chained comparisons
--------------------------------

In ultimately rejecting PEP 335, Guido van Rossum noted [1_]:

    The NumPy folks brought up a somewhat separate issue: for them,
    the most common use case is chained comparisons (e.g. A < B < C).

To understand this observation, we first need to look at how comparisons work
with NumPy arrays::

    >>> import numpy as np
    >>> increasing = np.arange(5)
    >>> increasing
    array([0, 1, 2, 3, 4])
    >>> decreasing = np.arange(4, -1, -1)
    >>> decreasing
    array([4, 3, 2, 1, 0])
    >>> increasing < decreasing
    array([ True,  True, False, False, False], dtype=bool)

Here we see that NumPy array comparisons are element-wise by default, comparing
each element in the lefthand array to the corresponding element in the righthand
array, and producing a matrix of boolean results.

If either side of the comparison is a scalar value, then it is broadcast across
the array and compared to each individual element::

    >>> 0 < increasing
    array([False,  True,  True,  True,  True], dtype=bool)
    >>> increasing < 4
    array([ True,  True,  True,  True, False], dtype=bool)

However, this broadcasting idiom breaks down if we attempt to use chained
comparisons::

    >>> 0 < increasing < 4
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element
is ambiguous. Use a.any() or a.all()

The problem is that internally, Python implicitly expands this chained
comparison into the form::

    >>> 0 < increasing and increasing < 4
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element
is ambiguous. Use a.any() or a.all()

And NumPy only permits implicit coercion to a boolean value for single-element
arrays where ``a.any()`` and ``a.all()`` can be assured of having the same
result::

    >>> np.array([False]) and np.array([False])
    array([False], dtype=bool)
    >>> np.array([False]) and np.array([True])
    array([False], dtype=bool)
    >>> np.array([True]) and np.array([False])
    array([False], dtype=bool)
    >>> np.array([True]) and np.array([True])
    array([ True], dtype=bool)

The proposal in this PEP would allow this situation to be changed by updating
the definition of element-wise comparison operations in NumPy to return a
dedicated subclass that implements the new circuit breaking protocol and also
changes the result array's interpretation in a boolean context to always
return ``False`` and hence never trigger the short-circuiting behaviour::

    class ComparisonResultArray(np.ndarray):
        def __bool__(self):
            return False
        def _raise_NotImplementedError(self):
            msg = ("Comparison array truth values are ambiguous outside "
                   "chained comparisons. Use a.any() or a.all()")
            raise NotImplementedError(msg)
        def __not__(self):
            self._raise_NotImplementedError()
        def __then__(self):
            self._raise_NotImplementedError()
        def __else__(self, other):
            return np.logical_and(self, other.view(ComparisonResultArray))

With this change, the chained comparison example above would be able to return::

    >>> 0 < increasing < 4
    ComparisonResultArray([ False,  True,  True,  True, False], dtype=bool)

Existence checking expressions
------------------------------

An increasingly common requirement in modern software development is the need
to work with "semi-structured data": data where the structure of the data is
known in advance, but pieces of it may be missing at runtime, and the software
manipulating that data is expected to degrade gracefully (e.g. by omitting
results that depend on the missing data) rather than failing outright.

Some particularly common cases where this issue arises are:

* handling optional application configuration settings and function parameters
* handling external service failures in distributed systems
* handling data sets that include some partial records

At the moment, writing such software in Python can be genuinely awkward, as
your code ends up littered with expressions like:

* ``value1 = expr1.field.of.interest if expr1 is not None else None``
* ``value2 = expr2["field"]["of"]["interest"] if expr2 is not None else None``
* ``value3 = expr3 if expr3 is not None else expr4 if expr4 is not
None else expr5``

PEP 531 goes into more detail on some of the challenges of working with this
kind of data, particularly in data transformation pipelines where dealing with
potentially missing content is the norm rather than the exception.

The combined impact of the proposals in this PEP is to allow the above sample
expressions to instead be written as:

* ``value1 = missing(expr1) else expr1.field.of.interest``
* ``value2 = missing(expr2) else expr2.["field"]["of"]["interest"]``
* ``value3 = exists(expr3) else exists(expr4) else expr5``

In these forms, significantly more of the text presented to the reader is
immediately relevant to the question "What does this code do?", while the
boilerplate code to handle missing data by passing it through to the output
or falling back to an alternative input, has shrunk to two uses of the new
``missing`` builtin, and two uses of the new ``exists`` builtin.

In the first two examples, the 31 character boilerplate suffix
``if exprN is not None else None`` (minimally 27 characters for a single letter
variable name) has been replaced by a 19 character ``missing(expr1) else``
prefix (minimally 15 characters with a single letter variable name), markedly
improving the signal-to-pattern-noise ratio of the lines (especially if it
encourages the use of more meaningful variable and field names rather than
making them shorter purely for the sake of expression brevity). The additional
syntactic sugar proposals in PEP 505 would further reduce this boilerplate to
a single ``?`` character that also eliminated the repetition of the expession
being checked for existence.

In the last example, not only are two instances of the 21 character boilerplate,
`` if exprN is not None`` (minimally 17 characters) replaced with the
8 character function call ``exists()``, but that function call is placed
directly around the original expression, eliminating the need to duplicate it
in the conditional existence check.

Risks and concerns
==================

This PEP has been designed specifically to address the risks and concerns
raised when discussing PEPs 335, 505 and 531.

* it defines a new operator and adjusts the definition of chained comparison
  rather than impacting the existing ``and`` and ``or`` operators
* the changes to the ``not`` unary operator are defined in such a way that
  control flow optimisations based on the existing semantics remain valid
* rather than the cryptic ``??``, it uses ``else`` as the operator keyword in
  exactly the same sense as it is already used in conditional expressions
* it defines a general purpose short-circuiting binary operator that can even
  be used to express the existing semantics of ``and`` and ``or`` rather than
  focusing solely and inflexibly on existence checking
* it names the proposed builtins in such a way that they provide a strong
  mnemonic hint as to when the expression containing them will short-circuit
  and skip evaluating the right operand

Possible confusion with conditional expressions
-----------------------------------------------

The proposal in this PEP is essentially for an "implied ``if``" where if you
omit the ``if`` clause from a conditional expression, you invoke the circuit
breaking protocol instead. That is::

    exists(foo) else calculate_default()

invokes the new protocol, but::

    foo.field.of.interest if exists(foo) else calculate_default()

bypasses it entirely, *including* the non-short-circuiting ``__else__`` method.

This mostly wouldn't be a problem for the proposed ``types.CircuitBreaker``
implementation (and hence the ``exists`` and ``missing`` builtins), as the
only purpose the extended protocol serves in that case is to remove the
wrapper in the short-circuiting case - the ``__else__`` method passes the
right operand through unchanged.

However, this discrepancy could potentially be eliminated entirely by also
updating conditional expressions to use the circuit breaking protocol if
the condition defines those methods. In that case, ``__then__`` would need
to be updated to accept the left operand as a parameter, with short-circuiting
indicated by passing in the circuit breaker itself::

    class CircuitBreaker:
        """Base circuit breaker type (available as types.CircuitBreaker)"""
        def __init__(self, value, condition, inverse_type):
            self.value = value
            self._condition = condition
            self._inverse_type = inverse_type
        def __bool__(self):
            return self._condition
        def __not__(self):
            return self._inverse_type(self.value)
        def __then__(self, other):
            if other is not self:
                return other
            return self.value # Short-circuit, remove the wrapper
        def __else__(self, other):
            if other is not self:
                return other
            return self.value # Short-circuit, remove the wrapper

With this symmetric protocol, the definition of conditional expressions
could be updated to also make the ``else`` clause optional::

    test: else_test ['if' or_test ['else' test]] | lambdef
    else_test: or_test ['else' test]

(We would avoid the apparent simplification to ``else_test ('if' else_test)*``
in order to make it easier to correctly preserve the semantics of normal
conditional expressions)

Given that expanded definition, the following statements would be
functionally equivalent::

    foo = calculate_default() if missing(foo)
    foo = calculate_default() if foo is None else foo

Just as the base proposal already makes the following equivalent::

    foo = exists(foo) else calculate_default()
    foo = foo if foo is not None else calculate_default()

The ``if`` based circuit breaker form has the virtue of reading significantly
better when used for conditional imperative commands like debug messages::

    print(some_expensive_query()) if verbosity > 2

If we went down this path, then ``operator.true`` would need to be declared
as the nominal implicit circuit breaker when the condition didn't define the
circuit breaker protocol itself (so the above example would produce ``None``
if the debugging message was printed, and ``False`` otherwise)

The main objection to this expansion of the proposal is that it makes it a
more intrusive change that may potentially affect the behaviour of existing
code, while the main point in its favour is that allowing both ``if`` and
``else`` as circuit breaking operators and also supporting the circuit breaking
protocol for normal conditional expressions would be significantly more
self-consistent than special-casing a bare ``else`` as a stand-alone operator.

Design Discussion
=================

Arbitrary sentinel objects
--------------------------

Unlike PEPs 505 and 531, this proposal readily handles custom sentinel objects::

    class defined(types.CircuitBreaker):
        MISSING = object()
        def __init__(self, value):
            super().__init__(self, value is not self.MISSING, undefined)

    class undefined(types.CircuitBreaker):
        def __init__(self, value):
            super().__init__(self, value is defined.MISSING, defined)

    # Using the sentinel to check whether or not an argument was supplied
    def my_func(arg=defined.MISSING):
        arg = defined(arg) else calculate_default()

Implementation
==============

As with PEP 505, actual implementation has been deferred pending in-principle
interest in the idea of making these changes - aside from the possible
syntactic ambiguity concerns covered by the grammer proposals above, the
implementation isn't really the hard part of these proposals, the hard part
is deciding whether or not this is a change where the long term benefits for
new and existing Python users outweigh the short term costs involved in the
wider ecosystem (including developers of other implementations, language
curriculum developers, and authors of other Python related educational
material) adjusting to the change.

...TBD...

Acknowledgements
================

Thanks go to Mark E. Haase for feedback on and contributions to earlier drafts
of this proposal. However, his clear and exhaustive explanation of the original
protocol design that modified the semantics of ``if-else`` conditional
expressions to use an underlying ``__then__``/``__else__`` protocol helped
convince me it was too complicated to keep, so this iteration contains neither
that version of the protocol, nor Mark's explanation of it.

References
==========

.. [1] PEP 335 rejection notification
   (http://mail.python.org/pipermail/python-dev/2012-March/117510.html)

Copyright
=========

This document has been placed in the public domain under the terms of the
CC0 1.0 license: https://creativecommons.org/publicdomain/zero/1.0/

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia