PEP 531: Existence checking operators

Hi folks,

After the recent discussions of PEP 505's null-coalescing operator (and the significant confusion around why anyone would ever want a feature like that), I was inspired to put together a competing proposal that focuses more on defining a new "existence checking" protocol that generalises the current practices of:

* obj is not None (many different use cases)
* obj is not Ellipsis (in multi-dimensional slicing)
* obj is not NotImplemented (in operand coercion)
* math.isnan(value)
* cmath.isnan(value)
* decimal.getcontext().is_nan(value)

Given that protocol as a basis, it then proceeds to define "?then" and "?else" as existence checking counterparts to the truth-checking "and" and "or", as well as "?.", "?[]" and "?=" as abbreviations for particular idiomatic uses of "?then" and "?else".

I think this approach frames the discussion in a more productive way, as it gives us a series of questions to consider in order, where a collective answer of "No" at any point would be enough to kill this particular proposal (or parts of it), but precisely *where* we say "No" will determine which future alternatives might be worth considering:

1. Do we collectively agree that "existence checking" is a useful general concept that exists in software development and is distinct from the concept of "truth checking"?
2. Do we collectively agree that the Python ecosystem would benefit from an existence checking protocol that permits generalisation of algorithms (especially short circuiting ones) across different "data missing" indicators, including those defined in the language definition, the standard library, and custom user code?
3. Do we collectively agree that it would be easier to use such a protocol effectively if existence-checking equivalents to the truth-checking "and" and "or" control flow operators were available?

Only if we have at least some level of consensus on the above questions regarding whether or not this is a conceptual modeling problem worth addressing at the language level does it then make sense to move on to the more detailed questions regarding the specific proposed *solution* to the problem in the PEP:

4. Do we collectively agree that "?then" and "?else" would be reasonable spellings for such operators?
5a. Do we collectively agree that "access this attribute only if the object exists" would be a particularly common use case for such operators?
5b. Do we collectively agree that "access this subscript only if the object exists" would be a particularly common use case for such operators?
5c. Do we collectively agree that "bind this value to this target only if the value currently bound to the target nominally doesn't exist" would be a particularly common use case for such operators?
6a. Do we collectively agree that 'obj?.attr' would be a reasonable spelling for "access this attribute only if the object exists"?
6b. Do we collectively agree that 'obj?[expr]' would be a reasonable spelling for "access this subscript only if the object exists"?
6c. Do we collectively agree that 'target ?= expr' would be a reasonable spelling for "bind this value to this target only if the value currently bound to the target nominally doesn't exist"?

To be clear, this would be a *really* big addition to the language that would have significant long term ramifications for how the language gets taught to new developers.
At the same time, asking whether or not an object represents an absence of data rather than the truth of a proposition seems to me like a sufficiently common problem in a wide enough variety of domains that it may be worth elevating to the level of giving it dedicated syntactic support. Regards, Nick. Rendered HTML version: https://www.python.org/dev/peps/pep-0531/ =============================== PEP: 531 Title: Existence checking operators Version: $Revision$ Last-Modified: $Date$ Author: Nick Coghlan <ncoghlan@gmail.com> Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 25-Oct-2016 Python-Version: 3.7 Post-History: 28-Oct-2016 Abstract ======== Inspired by PEP 505 and the related discussions, this PEP proposes the addition of two new control flow operators to Python: * Existence-checking precondition ("exists-then"): ``expr1 ?then expr2`` * Existence-checking fallback ("exists-else"): ``expr1 ?else expr2`` as well as the following abbreviations for common existence checking expressions and statements: * Existence-checking attribute access: ``obj?.attr`` (for ``obj ?then obj.attr``) * Existence-checking subscripting: ``obj?[expr]`` (for ``obj ?then obj[expr]``) * Existence-checking assignment: ``value ?= expr`` (for ``value = value ?else expr``) The common ``?`` symbol in these new operator definitions indicates that they use a new "existence checking" protocol rather than the established truth-checking protocol used by if statements, while loops, comprehensions, generator expressions, conditional expressions, logical conjunction, and logical disjunction. This new protocol would be made available as ``operator.exists``, with the following characteristics: * types can define a new ``__exists__`` magic method (Python) or ``tp_exists`` slot (C) to override the default behaviour. This optional method has the same signature and possible return values as ``__bool__``. * ``operator.exists(None)`` returns ``False`` * ``operator.exists(NotImplemented)`` returns ``False`` * ``operator.exists(Ellipsis)`` returns ``False`` * ``float``, ``complex`` and ``decimal.Decimal`` will override the existence check such that ``NaN`` values return ``False`` and other values (including zero values) return ``True`` * for any other type, ``operator.exists(obj)`` returns True by default. Most importantly, values that evaluate to False in a truth checking context (zeroes, empty containers) will still evaluate to True in an existence checking context Relationship with other PEPs ============================ While this PEP was inspired by and builds on Mark Haase's excellent work in putting together PEP 505, it ultimately competes with that PEP due to significant differences in the specifics of the proposed syntax and semantics for the feature. It also presents a different perspective on the rationale for the change by focusing on the benefits to existing Python users as the typical demands of application and service development activities are genuinely changing. It isn't an accident that similar features are now appearing in multiple programming languages, and while it's a good idea for us to learn from how other language designers are handling the problem, precedents being set elsewhere are more relevant to *how* we would go about tackling this problem than they are to whether or not we think it's a problem we should address in the first place. 
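As a rough illustration of the semantics sketched in the Abstract above, a pure-Python approximation of the proposed ``operator.exists`` might look like the following. This is only a sketch: the real protocol would require a ``tp_exists`` slot and interpreter support, and the builtin numeric types would supply ``__exists__`` themselves rather than being special-cased as they are here::

    import math
    import cmath
    import decimal

    _MISSING_SINGLETONS = (None, NotImplemented, Ellipsis)

    def exists(obj):
        """Approximation of the proposed existence checking protocol."""
        # The placeholder singletons indicate "no data", not a specific value
        if any(obj is singleton for singleton in _MISSING_SINGLETONS):
            return False
        # Types would be able to override the default via __exists__
        override = getattr(type(obj), "__exists__", None)
        if override is not None:
            return bool(override(obj))
        # NaN values also report themselves as non-existent
        if isinstance(obj, float):
            return not math.isnan(obj)
        if isinstance(obj, complex):
            return not cmath.isnan(obj)
        if isinstance(obj, decimal.Decimal):
            return not obj.is_nan()
        # Everything else exists, including falsey values like 0, "" and []
        return True

Unlike ``bool``, such a check reports ``True`` for empty containers and zero values, since those are data that merely happen to be false.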
Rationale
=========

Existence checking expressions
------------------------------

An increasingly common requirement in modern software development is the need to work with "semi-structured data": data where the structure of the data is known in advance, but pieces of it may be missing at runtime, and the software manipulating that data is expected to degrade gracefully (e.g. by omitting results that depend on the missing data) rather than failing outright.

Some particularly common cases where this issue arises are:

* handling optional application configuration settings and function parameters
* handling external service failures in distributed systems
* handling data sets that include some partial records

It is the latter two cases that are the primary motivation for this PEP - while needing to deal with optional configuration settings and parameters is a design requirement at least as old as Python itself, the rise of public cloud infrastructure, the development of software systems as collaborative networks of distributed services, and the availability of large public and private data sets for analysis means that the ability to degrade operations gracefully in the face of partial service failures or partial data availability is becoming an essential feature of modern programming environments.

At the moment, writing such software in Python can be genuinely awkward, as your code ends up littered with expressions like:

* ``value1 = expr1.field.of.interest if expr1 is not None else None``
* ``value2 = expr2["field"]["of"]["interest"] if expr2 is not None else None``
* ``value3 = expr3 if expr3 is not None else expr4 if expr4 is not None else expr5``

If these are only occasional, then expanding out to full statement forms may help improve readability, but if you have 4 or 5 of them in a row (which is a fairly common situation in data transformation pipelines), then replacing them with 16 or 20 lines of conditional logic really doesn't help matters.

Expanding the three examples above that way hopefully helps illustrate that::

    _expr1 = expr1
    if _expr1 is not None:
        value1 = _expr1.field.of.interest
    else:
        value1 = None

    _expr2 = expr2
    if _expr2 is not None:
        value2 = _expr2["field"]["of"]["interest"]
    else:
        value2 = None

    _expr3 = expr3
    if _expr3 is not None:
        value3 = _expr3
    else:
        _expr4 = expr4
        if _expr4 is not None:
            value3 = _expr4
        else:
            value3 = expr5

The combined impact of the proposals in this PEP is to allow the above sample expressions to instead be written as:

* ``value1 = expr1?.field.of.interest``
* ``value2 = expr2?["field"]["of"]["interest"]``
* ``value3 = expr3 ?else expr4 ?else expr5``

In these forms, almost all of the information presented to the reader is immediately relevant to the question "What does this code do?", while the boilerplate code to handle missing data by passing it through to the output or falling back to an alternative input has shrunk to two uses of the ``?`` symbol and two uses of the ``?else`` keyword.

In the first two examples, the 31 character boilerplate clause `` if exprN is not None else None`` (minimally 27 characters for a single letter variable name) has been replaced by a single ``?`` character, substantially improving the signal-to-pattern-noise ratio of the lines (especially if it encourages the use of more meaningful variable and field names rather than making them shorter purely for the sake of expression brevity).
In the last example, two instances of the 21 character boilerplate, `` if exprN is not None`` (minimally 17 characters) are replaced with single characters, again substantially improving the signal-to-pattern-noise ratio.

Furthermore, each of our 5 "subexpressions of potential interest" is included exactly once, rather than 4 of them needing to be duplicated or pulled out to a named variable in order to first check if they exist.

The existence checking precondition operator is mainly defined to provide a clear conceptual basis for the existence checking attribute access and subscripting operators:

* ``obj?.attr`` is roughly equivalent to ``obj ?then obj.attr``
* ``obj?[expr]`` is roughly equivalent to ``obj ?then obj[expr]``

The main semantic difference between the shorthand forms and their expanded equivalents is that the common subexpression to the left of the existence checking operator is evaluated only once in the shorthand form (similar to the benefit offered by augmented assignment statements).

Existence checking assignment
-----------------------------

Existence-checking assignment is proposed as a relatively straightforward expansion of the concepts in this PEP to also cover the common configuration handling idiom:

* ``value = value if value is not None else expensive_default()``

by allowing that to instead be abbreviated as:

* ``value ?= expensive_default()``

This is mainly beneficial when the target is a subscript operation or subattribute, as even without this specific change, the PEP would still permit this idiom to be updated to:

* ``value = value ?else expensive_default()``

The main argument *against* adding this form is that it's arguably ambiguous and could mean either:

* ``value = value ?else expensive_default()``; or
* ``value = value ?then value.subfield.of.interest``

The second form isn't at all useful, but if this concern was deemed significant enough to address while still keeping the augmented assignment feature, the full keyword could be included in the syntax:

* ``value ?else= expensive_default()``

Alternatively, augmented assignment could just be dropped from the current proposal entirely and potentially reconsidered at a later date.

Existence checking protocol
---------------------------

The existence checking protocol is included in this proposal primarily to allow for proxy objects (e.g. local representations of remote resources) and mock objects used in testing to correctly indicate non-existence of target resources, even though the proxy or mock object itself is not None.

However, with that protocol defined, it then seems natural to expand it to provide a type independent way of checking for ``NaN`` values in numeric types - at the moment you need to be aware of the exact data type you're working with (e.g. builtin floats, builtin complex numbers, the decimal module) and use the appropriate operation (e.g. ``math.isnan``, ``cmath.isnan``, ``decimal.getcontext().is_nan()``, respectively).

Similarly, it seems reasonable to declare that the other placeholder builtin singletons, ``Ellipsis`` and ``NotImplemented``, also qualify as objects that represent the absence of data more so than they represent data.

Proposed symbolic notation
--------------------------

Python has historically only had one kind of implied boolean context: truth checking, which can be invoked directly via the ``bool()`` builtin.
As this PEP proposes a new kind of control flow operation based on existence checking rather than truth checking, it is considered valuable to have a reminder directly in the code when existence checking is being used rather than truth checking.

The mathematical symbol for existence assertions is U+2203 'THERE EXISTS': ``∃``

Accordingly, one possible approach to the syntactic additions proposed in this PEP would be to use that already defined mathematical notation:

* ``expr1 ∃then expr2``
* ``expr1 ∃else expr2``
* ``obj∃.attr``
* ``obj∃[expr]``
* ``target ∃= expr``

However, there are two major problems with that approach, one practical, and one pedagogical.

The practical problem is the usual one that most keyboards don't offer any easy way of entering mathematical symbols other than those used in basic arithmetic (even the symbols appearing in this PEP were ultimately copied & pasted from [3]_ rather than being entered directly).

The pedagogical problem is that the symbols for existence assertions (``∃``) and universal assertions (``∀``) aren't going to be familiar to most people the way basic arithmetic operators are, so we wouldn't actually be making the proposed syntax easier to understand by adopting ``∃``.

By contrast, ``?`` is one of the few remaining unused ASCII punctuation characters in Python's syntax, making it available as a candidate syntactic marker for "this control flow operation is based on an existence check, not a truth check".

Taking that path would also have the advantage of aligning Python's syntax with corresponding syntax in other languages that offer similar features. Drawing from the existing summary in PEP 505 and the Wikipedia articles on the "safe navigation operator" [1]_ and the "null coalescing operator" [2]_, we see:

* The ``?.`` existence checking attribute access syntax precisely aligns with:

  * the "safe navigation" attribute access operator in C# (``?.``)
  * the "optional chaining" operator in Swift (``?.``)
  * the "safe navigation" attribute access operator in Groovy (``?.``)
  * the "conditional member access" operator in Dart (``?.``)

* The ``?[]`` existence checking subscripting syntax precisely aligns with:

  * the "safe navigation" subscript operator in C# (``?[]``)
  * the "optional subscript" operator in Swift (``?[]``)

* The ``?else`` existence checking fallback syntax semantically aligns with:

  * the "null-coalescing" operator in C# (``??``)
  * the "null-coalescing" operator in PHP (``??``)
  * the "nil-coalescing" operator in Swift (``??``)

To be clear, these aren't the only spellings of these operators used in other languages, but they're the most common ones, and the ``?`` symbol is the most common syntactic marker by far (presumably prompted by the use of ``?`` to introduce the "then" clause in C-style conditional expressions, which many of these languages also offer).

Proposed keywords
-----------------

Given the symbolic marker ``?``, it would be syntactically unambiguous to spell the existence checking precondition and fallback operations using the same keywords as their truth checking counterparts:

* ``expr1 ?and expr2`` (instead of ``expr1 ?then expr2``)
* ``expr1 ?or expr2`` (instead of ``expr1 ?else expr2``)

However, while syntactically unambiguous when written, this approach makes the code incredibly hard to *pronounce* (What's the pronunciation of "?"?)
and also hard to *describe* (given reused keywords, there are no obvious shorthand terms for "existence checking precondition (?and)" and "existence checking fallback (?or)" that would distinguish them from "logical conjunction (and)" and "logical disjunction (or)").

We could try to encourage folks to pronounce the ``?`` symbol as "exists", making the shorthand names the "exists-and expression" and the "exists-or expression", but there'd be no way of guessing those names purely from seeing them written in a piece of code.

Instead, this PEP takes advantage of the proposed symbolic syntax to introduce a new keyword (``?then``) and borrow an existing one (``?else``) in a way that allows people to refer to "then expressions" and "else expressions" without ambiguity.

These keywords also align well with the conditional expressions that are semantically equivalent to the proposed expressions.

For ``?else`` expressions, ``expr1 ?else expr2`` is equivalent to::

    _lhs_result = expr1
    _lhs_result if operator.exists(_lhs_result) else expr2

Here the parallel is clear, since the ``else expr2`` appears at the end of both the abbreviated and expanded forms.

For ``?then`` expressions, ``expr1 ?then expr2`` is equivalent to::

    _lhs_result = expr1
    expr2 if operator.exists(_lhs_result) else _lhs_result

Here the parallel isn't as immediately obvious due to Python's traditionally anonymous "then" clauses (introduced by ``:`` in ``if`` statements and suffixed by ``if`` in conditional expressions), but it's still reasonably clear as long as you're already familiar with the "if-then-else" explanation of conditional control flow.

Risks and concerns
==================

Readability
-----------

Learning to read and write the new syntax effectively mainly requires internalising two concepts:

* expressions containing ``?`` include an existence check and may short circuit
* if ``None`` or another "non-existent" value is an expected input, and the correct handling is to propagate that to the result, then the existence checking operators are likely what you want

Currently, these concepts aren't explicitly represented at the language level, so it's a matter of learning to recognise and use the various idiomatic patterns based on conditional expressions and statements.

Magic syntax
------------

There's nothing about ``?`` as a syntactic element that inherently suggests ``is not None`` or ``operator.exists``. The main current use of ``?`` as a symbol in Python code is as a trailing suffix in IPython environments to request help information for the result of the preceding expression.

However, the notion of existence checking really does benefit from a pervasive visual marker that distinguishes it from truth checking, and that calls for a single-character symbolic syntax if we're going to do it at all.

Conceptual complexity
---------------------

This proposal takes the currently ad hoc and informal concept of "existence checking" and elevates it to the status of being a syntactic language feature with a clearly defined operator protocol.

In many ways, this should actually *reduce* the overall conceptual complexity of the language, as many more expectations will map correctly between truth checking with ``bool(expr)`` and existence checking with ``operator.exists(expr)`` than currently map between truth checking and existence checking with ``expr is not None`` (or ``expr is not NotImplemented`` in the context of operand coercion, or the various NaN-checking operations in mathematical libraries).
As a simple example of the new parallels introduced by this PEP, compare::

    all_are_true = all(map(bool, iterable))
    at_least_one_is_true = any(map(bool, iterable))
    all_exist = all(map(operator.exists, iterable))
    at_least_one_exists = any(map(operator.exists, iterable))

Design Discussion
=================

Subtleties in chaining existence checking expressions
------------------------------------------------------

Similar subtleties arise in chaining existence checking expressions as already exist in chaining logical operators: the behaviour can be surprising if the right hand side of one of the expressions in the chain itself returns a value that doesn't exist.

As a result, ``value = arg1 ?then f(arg1) ?else default()`` would be dubious for essentially the same reason that ``value = cond and expr1 or expr2`` is dubious: the former will evaluate ``default()`` if ``f(arg1)`` returns ``None``, just as the latter will evaluate ``expr2`` if ``expr1`` evaluates to ``False`` in a boolean context.

Ambiguous interaction with conditional expressions
--------------------------------------------------

In the proposal as currently written, the following is a syntax error:

* ``value = f(arg) if arg ?else default``

While the following is a valid operation that checks a second condition if the first doesn't exist rather than merely being false:

* ``value = expr1 if cond1 ?else cond2 else expr2``

The expression chaining problem described above means that the argument can be made that the first operation should instead be equivalent to:

* ``value = f(arg) if operator.exists(arg) else default``

requiring the second to be written in the arguably clearer form:

* ``value = expr1 if (cond1 ?else cond2) else expr2``

Alternatively, the first form could remain a syntax error, and the existence checking symbol could instead be attached to the ``if`` keyword:

* ``value = expr1 if? cond else expr2``

Existence checking in other truth-checking contexts
----------------------------------------------------

The truth-checking protocol is currently used in the following syntactic constructs:

* logical conjunction (and-expressions)
* logical disjunction (or-expressions)
* conditional expressions (if-else expressions)
* if statements
* while loops
* filter clauses in comprehensions and generator expressions

In the current PEP, switching from truth-checking with ``and`` and ``or`` to existence-checking is a matter of substituting in the new keywords, ``?then`` and ``?else``, in the appropriate places. For other truth-checking contexts, it proposes either importing and using the ``operator.exists`` API, or else continuing with the current idiom of checking specifically for ``expr is not None`` (or the context appropriate equivalent).

The simplest possible enhancement in that regard would be to elevate the proposed ``exists()`` API from an operator module function to a new builtin function. Alternatively, the ``?`` existence checking symbol could be supported as a modifier on the ``if`` and ``while`` keywords to indicate the use of an existence check rather than a truth check.

However, it isn't at all clear that the potential consistency benefits gained for either suggestion would justify the additional disruption, so they've currently been omitted from the proposal.
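To make the chaining subtlety described above concrete, here is the existing truth-checking version of the same trap, which a naive ``?then``/``?else`` chain would inherit in existence-checking form::

    def f(arg):
        return 0   # a perfectly legitimate result that happens to be falsey

    cond = True
    value = cond and f(cond) or "default"
    print(value)   # prints 'default', not 0, because 0 is false in a truth check

    value = f(cond) if cond else "default"
    print(value)   # prints 0 - the conditional expression avoids the trap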
Defining expected invariant relations between ``__bool__`` and ``__exists__``
------------------------------------------------------------------------------

The PEP currently leaves the definition of ``__bool__`` on all existing types unmodified, which ensures the entire proposal remains backwards compatible, but results in the following cases where ``bool(obj)`` returns ``True``, but the proposed ``operator.exists(obj)`` would return ``False``:

* ``NaN`` values for ``float``, ``complex``, and ``decimal.Decimal``
* ``Ellipsis``
* ``NotImplemented``

The main argument for potentially changing these is that it becomes easier to reason about potential code behaviour if we have a recommended invariant in place saying that values which indicate they don't exist in an existence checking context should also report themselves as being ``False`` in a truth checking context.

Failing to define such an invariant would lead to arguably odd outcomes like ``float("NaN") ?else 0.0`` returning ``0.0`` while ``float("NaN") or 0.0`` returns ``NaN``.

Limitations
===========

Arbitrary sentinel objects
--------------------------

This proposal doesn't attempt to provide syntactic support for the "sentinel object" idiom, where ``None`` is a permitted explicit value, so a separate sentinel object is defined to indicate missing values::

    _SENTINEL = object()

    def f(obj=_SENTINEL):
        return obj if obj is not _SENTINEL else default_value()

This could potentially be supported at the expense of making the existence protocol definition significantly more complex, both to define and to use:

* at the Python layer, ``operator.exists`` and ``__exists__`` implementations would return the empty tuple to indicate non-existence, and otherwise return a singleton tuple containing a reference to the object to be used as the result of the existence check
* at the C layer, ``tp_exists`` implementations would return NULL to indicate non-existence, and otherwise return a ``PyObject *`` pointer as the result of the existence check

Given that change, the sentinel object idiom could be rewritten as::

    class Maybe:
        SENTINEL = object()

        def __init__(self, value):
            self._result = (value,) if value is not self.SENTINEL else ()

        def __exists__(self):
            return self._result

    def f(obj=Maybe.SENTINEL):
        return Maybe(obj) ?else default_value()

However, I don't think cases where the 3 proposed standard sentinel values (i.e. ``None``, ``Ellipsis`` and ``NotImplemented``) can't be used are going to be anywhere near common enough for the additional protocol complexity and the loss of symmetry between ``__bool__`` and ``__exists__`` to be worth it.

Specification
=============

The Abstract already gives the gist of the proposal and the Rationale gives some specific examples. If there's enough interest in the basic idea, then a full specification will need to provide a precise correspondence between the proposed syntactic sugar and the underlying conditional expressions that is sufficient to guide the creation of a reference implementation.

...TBD...
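As a point of reference for the invariant discussion above, the current truth-checking behaviour of the affected values can be checked directly (NaN, ``Ellipsis`` and ``NotImplemented`` are all true in a boolean context today)::

    nan = float("nan")

    print(bool(nan), bool(Ellipsis), bool(NotImplemented))   # True True True
    print(nan or 0.0)   # nan - NaN is "truthy", so ``or`` stops at it
    # Under this PEP, ``nan ?else 0.0`` would instead evaluate to 0.0,
    # since ``operator.exists(nan)`` would be False.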
Implementation ============== As with PEP 505, actual implementation has been deferred pending in-principle interest in the idea of adding these operators - the implementation isn't the hard part of these proposals, the hard part is deciding whether or not this is a change where the long term benefits for new and existing Python users outweigh the short term costs involved in the wider ecosystem (including developers of other implementations, language curriculum developers, and authors of other Python related educational material) adjusting to the change. ...TBD... References ========== .. [1] Wikipedia: Safe navigation operator (https://en.wikipedia.org/wiki/Safe_navigation_operator) .. [2] Wikipedia: Null coalescing operator (https://en.wikipedia.org/wiki/Null_coalescing_operator) .. [3] FileFormat.info: Unicode Character 'THERE EXISTS' (U+2203) (http://www.fileformat.info/info/unicode/char/2203/index.htm) Copyright ========= This document has been placed in the public domain under the terms of the CC0 1.0 license: https://creativecommons.org/publicdomain/zero/1.0/ .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I certainly like the concept, but I worry that use of __exists__() could generalize it a bit beyond what you're intending in practice. It seems like this should only check if an object exists, and that adding the magic method would only lead to confusion. -Ryan Birmingham On 28 October 2016 at 04:30, Nick Coghlan <ncoghlan@gmail.com> wrote:

On 28 October 2016 at 23:35, Ryan Birmingham <rainventions@gmail.com> wrote:
The same can be said of using __bool__, __nonzero__ and __len__ to influence normal condition checks, and folks have proven to be pretty responsible using those in practice (or, more accurately, when they're used in problematic ways, users object, and they either eventually get fixed, or folks move on to using other APIs that they consider better behaved). I also don't think the idea is sufficiently general to be worthy of dedicated syntax if it's limited specifically to "is not None" checks - None's definitely special, but it's not *that* special. Unifying None, NaN, NotImplemented and Ellipsis into a meta-category of objects that indicate the absence of information rather than a specific value, though? And also allowing developers to emulate the protocol for testing purposes? That's enough to pique my interest. That's why these are my first two questions on the list - if we don't agree on the core premise that there's a general concept here worth modeling as an abstract protocol, I'm -1 on the whole idea. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
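For readers unfamiliar with the analogy being drawn here, this is the kind of ``__bool__`` customisation that already influences ordinary condition checks today (an illustrative example only, not anything from the PEP):

    class RemoteResult:
        """Proxy for a value fetched from a remote service."""
        def __init__(self, payload):
            self._payload = payload

        def __bool__(self):
            # Truth checking already lets a proxy report "nothing useful here"
            return self._payload is not None

    result = RemoteResult(None)
    if not result:
        print("no payload")   # this branch runs

The proposed ``__exists__`` would play the same role for existence checks, which is why the comparison is being made.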

On 28 October 2016 at 15:40, Nick Coghlan <ncoghlan@gmail.com> wrote:
I think that's the key for me - new syntax for "is not None" types of test seems of limited value (sure, other languages have such things, but that's not compelling - the contexts are different). However, I'm not convinced by your proposal that we can unify None, NaN, NotImplemented and Ellipsis in the way you suggest. I wouldn't expect a[1, None, 2] to mean the same as a[1, ..., 2], so why should an operator that tested for "Ellipsis or None" be useful? Same for NotImplemented - we're not proposing to allow rich comparison operators to return None rather than NotImplemented. The nearest to plausible is NaN vs None - but even there I don't see the two as the same. So, to answer your initial questions, in my opinion: 1. The concept of "checking for existence" is valid. 2. I don't see merging domain-specific values under a common "does not exist" banner as useful. Specifically, because I wouldn't want the "does not exist" values to become interchangeable. 3. I don't think there's much value in specific existence-checking syntax, precisely because I don't view it as a good thing to merge multiple domain-specific "does not exist", and therefore the benefit is limited to a shorter way of saying "is not None". As you noted, given my answers to 1-3, there's not much point in considering the remaining questions. However, I do think that there's another concept tied up in the proposals here, that of "short circuiting attribute access / indexing". The call was for something that said roughly "a.b if a is not None, otherwise None". But this is only one form of this pattern - there's a similar pattern, "a.b if a has an attribute b, otherwise None". And that's been spelled "getattr(a, 'b', None)" for a long time now. The existence of getattr, and the fact that no-one is crying out for it to be replaced with syntax, implies to me that before leaping to a syntax solution we should be looking at a normal function (possibly a builtin, but maybe even just a helper). I'd like to see a justification for why "a.b if a is not None, else None" deserves syntax when "a.b if a has attribute b, else None" doesn't. IMO, there's no need for syntax here. There *might* be some benefit in some helper functions, though. The cynic in me wonders how much of this debate is rooted in the fact that it's simply more fun to propose new syntax, than to just write a quick helper and get on with coding your application... Paul
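A rough sketch of the kind of helper functions being suggested here, with hypothetical names, covering both the "only if it isn't None" pattern and the long-standing getattr() spelling for "only if the attribute exists":

    def attr_if_not_none(obj, name, default=None):
        """Return obj.<name>, or default when obj itself is None."""
        if obj is None:
            return default
        return getattr(obj, name)

    def item_if_not_none(obj, key, default=None):
        """Return obj[key], or default when obj itself is None."""
        if obj is None:
            return default
        return obj[key]

    # versus the existing spelling for "a.b if a has an attribute b, else None":
    # getattr(a, "b", None)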

On Fri, Oct 28, 2016 at 11:17 AM, Paul Moore <p.f.moore@gmail.com> wrote:
First thing is that I definitely DO NOT want new SYNTAX to do this. I wouldn't mind having a new built-in function for this purpose if we could get the call signature right. Maybe it would be called 'exists()', but actually something like 'valued()' feels like a better fit. For the unusual case where the "null-coalescing" operation is what I'd want, I definitely wouldn't mind something like Barry's proposal of processing a string version of the expression. Sure, *some* code editors might not highlight it as much, but it's a corner case at most, to my mind. For that, I can type 'valued("x.y.z[w]", coalesce=ALL)' or whatever.
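A much-simplified sketch of that kind of helper (the name, signature and ``Box`` class below are hypothetical; the quoted string form would additionally need access to the caller's namespace, so this version takes the root object explicitly and only handles dotted attribute paths):

    def valued(obj, path, default=None):
        """Follow a dotted attribute path, stopping early if any step is None."""
        for name in path.split("."):
            if obj is None:
                return default
            obj = getattr(obj, name)
        return obj if obj is not None else default

    class Box:
        def __init__(self, **kwargs):
            self.__dict__.update(kwargs)

    cfg = Box(server=Box(port=8080), proxy=None)
    print(valued(cfg, "server.port"))    # 8080
    print(valued(cfg, "proxy.port", 0))  # 0 - stops as soon as cfg.proxy is None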
I *especially* think None and nan have very different meanings. A list of [1.1, nan, 3.3] means that I have several floating point numbers, but one came from a calculation that escaped the real domain. A list with [1.1, None, 3.3] means that I have already calculated three values, but am marking the fact I need later to perform a calculation to figure out the middle one. These are both valid and important use cases, but they are completely different from each other. Yours, David... -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
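A small illustration of the distinction being drawn, with the two kinds of "missing" handled by deliberately different code (the values and the placeholder calculation are made up):

    import math

    measured = [1.1, float("nan"), 3.3]  # NaN: a calculation left the real domain
    pending = [1.1, None, 3.3]           # None: deliberately not computed yet

    def compute_middle():
        return 2.2   # stand-in for the deferred calculation

    usable = [x for x in measured if not math.isnan(x)]
    filled = [x if x is not None else compute_middle() for x in pending]

    print(usable)  # [1.1, 3.3]      - NaNs get dropped (or propagated)
    print(filled)  # [1.1, 2.2, 3.3] - Nones get filled in later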

On Oct 28, 2016 3:30 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
I'd hope so!
I {%think_string if think_string is not None else 'think'%} so.
Personally, I find that kind of ugly. What's wrong with just ? instead of ?else?
Pretty sure I've done this like a zillion times.
I haven't really ever had to do this exactly, but it makes sense.
Yes. I see stuff like this a lot: if x is None: x = []
' '.join(['Yes!']*3)
-- Ryan (ライアン) [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something’s wrong. http://kirbyfan64.github.io/

On 29 October 2016 at 01:46, Ryan Gonzalez <rymg19@gmail.com> wrote:
When you see the expression "LHS ? RHS", there's zero indication of how to read it other than naming the symbol: "LHS question mark RHS". By contrast, "LHS ?then RHS" and "LHS ?else RHS" suggest the pronunciations "LHS then RHS" and "LHS else RHS", which in turn are potentially useful mnemonics for the full expansions "if LHS exists then RHS else LHS" and "LHS if LHS exists else RHS". (Knowing that "?" indicates an existence check is still something you'd have to learn, but even without knowing that, the keywords could get you quite some way towards correctly understanding what the construct means) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
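Those two expansions can already be spelled as ordinary functions today, using zero-argument callables on the right-hand side to preserve the short-circuiting (the names are hypothetical, and only the ``is not None`` case is handled):

    def exists(obj):
        return obj is not None   # stand-in for the proposed operator.exists

    def exists_then(lhs, rhs_thunk):
        """LHS ?then RHS: 'if LHS exists then RHS else LHS'."""
        return rhs_thunk() if exists(lhs) else lhs

    def exists_else(lhs, rhs_thunk):
        """LHS ?else RHS: 'LHS if LHS exists else RHS'."""
        return lhs if exists(lhs) else rhs_thunk()

    obj = None
    print(exists_then(obj, lambda: obj.upper()))  # None - RHS never evaluated
    print(exists_else(obj, lambda: "fallback"))   # fallback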

On 29 October 2016 at 04:08, Mark Dickinson <dickinsm@gmail.com> wrote:
It's more often checked the other way around: "if Ellipsis is passed in, then work out the multi-dimensional slice from the underlying object" And that reflects the problem Paul and David highlighted: in any *specific* context, there's typically either only one sentinel we want to either propagate or else replace with a calculated value, or else we want to handle different sentinel values differently, which makes the entire concept of a unifying duck-typing protocol pragmatically dubious, and hence calls into question the idea of introducing new syntax for working with it. On the other hand, if you try to do this as an "only None is special" kind of syntax, then any of the *other* missing data sentinels (of which we have 4 in the builtins alone, and 5 when you add the decimal module) end up being on much the same level as "arbitrary sentinel objects" in the draft PEP 531, which I would consider to be an incredibly poor outcome for a change as invasive as adding new syntax: https://www.python.org/dev/peps/pep-0531/#arbitrary-sentinel-objects Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
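For reference, the kind of ``__getitem__``-side Ellipsis handling being described looks roughly like this (a toy container, not anything from NumPy):

    class Grid:
        """Toy 2-D container that expands ``...`` to full slices when indexing."""
        def __init__(self, rows):
            self._rows = rows

        def __getitem__(self, key):
            if key is Ellipsis:
                # work out the multi-dimensional slice from the underlying object
                key = (slice(None), slice(None))
            rows, cols = key
            return [row[cols] for row in self._rows[rows]]

    g = Grid([[1, 2, 3], [4, 5, 6]])
    print(g[...])         # [[1, 2, 3], [4, 5, 6]]
    print(g[0:1, 1:3])    # [[2, 3]]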

On 29 October 2016 at 14:52, Nick Coghlan <ncoghlan@gmail.com> wrote:
Considering this question of "Am I attempting to extract the right underlying design pattern?" further puts me in mind of Greg Ewing's old rejected proposal to permit overloading the "and" and "or" operators: https://www.python.org/dev/peps/pep-0335/

After all, the proposed "?then" and "?else" operators in PEP 531 are really just customised variants of "and" and "or" that use a slightly modified definition of truth-checking.

PEP 335 attempted to tackle that operator overloading problem directly, but now I'm wondering if it may be more fruitful to instead consider the problem in terms of the expanded conditional expressions:

* "LHS and RHS" -> "RHS if LHS else LHS"
* "LHS or RHS" -> "LHS if LHS else RHS"

A short-circuiting if-else protocol for arbitrary "THEN if COND else ELSE" expressions could then look like this:

    _condition = COND
    if _condition:
        _then = THEN
        if hasattr(_condition, "__then__"):
            return _condition.__then__(_then)
        return _then
    else:
        _else = ELSE
        if hasattr(_condition, "__else__"):
            return _condition.__else__(_else)
        return _else

"and" and "or" would then be simplified versions of that, where the condition expression was re-used as either the "THEN" subexpression ("or") or the "ELSE" subexpression ("and").

The reason I think this is potentially interesting in the context of PEPs 505 and 531 is that with that protocol defined, the null-coalescing "operator" wouldn't need to be a new operator, it could just be a new builtin that defined the appropriate underlying control flow:

    value = if_set(expr1) or if_set(expr2) or expr3

where if_set was defined as:

    class if_set:
        def __init__(self, value):
            self.value = value
        def __bool__(self):
            return self.value is not None
        def __then__(self, result):
            if result is self:
                return self.value
            return result
        def __else__(self, result):
            if result is self:
                return self.value
            return result

Checking for a custom sentinel value instead of ``None`` would then be as straightforward as using a different conditional control flow manager that replaced the ``__bool__`` check against ``None`` with a check against the specific sentinel of interest.

Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
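Since no such protocol exists yet, one way to see the intended behaviour is to run the expansion by hand with an ordinary function (the helper below is hypothetical; the real proposal would hook this into ``or`` itself):

    class if_set:
        # as sketched above, with __bool__ checking the wrapped value
        def __init__(self, value):
            self.value = value
        def __bool__(self):
            return self.value is not None
        def __then__(self, result):
            return self.value if result is self else result
        def __else__(self, result):
            return self.value if result is self else result

    def protocol_or(lhs, rhs_thunk):
        """Hand-run 'lhs or <rhs>' under the sketched __then__/__else__ protocol."""
        if lhs:
            hook = getattr(type(lhs), "__then__", None)
            # for 'or', the THEN subexpression is the condition itself
            return hook(lhs, lhs) if hook else lhs
        rhs = rhs_thunk()   # only evaluated when the condition is false
        hook = getattr(type(lhs), "__else__", None)
        return hook(lhs, rhs) if hook else rhs

    expr1, expr2, expr3 = None, 0, "fallback"
    value = protocol_or(if_set(expr1),
                        lambda: protocol_or(if_set(expr2), lambda: expr3))
    print(value)   # 0 - unlike plain 'or', a set-but-falsey value wins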

On 29 October 2016 at 07:21, Nick Coghlan <ncoghlan@gmail.com> wrote:
This seems to have some potential to me. It doesn't seem massively intrusive (there's a risk that it might be considered a step too far in "making the language mutable", but otherwise it's just a new extension protocol around an existing construct). The biggest downside I see is that it could be seen as simply generalisation for the sake of it. But with the null-coalescing / sentinel checking use case, plus Greg's examples from the motivation section of PEP 335, there may well be enough potential uses to warrant such a change. Paul

On Sat, Oct 29, 2016 at 02:52:42PM +1000, Nick Coghlan wrote:
Hmmm. I see your point, but honestly, None *is* special. Even for special objects, None is even more special. Here are your examples again:

* obj is not None (many different use cases)
* obj is not Ellipsis (in multi-dimensional slicing)
* obj is not NotImplemented (in operand coercion)
* math.isnan(value)
* cmath.isnan(value)
* decimal.getcontext().is_nan(value)

Aside from the first, the rest are quite unusual:

- Testing for Ellipsis occurs in __getitem__, and not even always then.

- Testing for NotImplemented occurs in operator dunders, rarely if ever outside those methods. (You probably should never see NotImplemented except in an operator dunder.) In both cases, this will be a useful feature for the writer of the class, not the user of the class.

- Testing for NAN is really only something of interest to those writing heavily numeric code and not even always then. You can go a LONG way with numeric code by just assuming that x is a regular number, and leaving NANs for "version 2". Especially in Python, which typically raises an exception where it could return a NAN. In other words, it's quite hard to generate an unexpected NAN in Python.

So these examples are all quite special and of very limited applicability and quite marginal utility. My guess is that the majority of programmers will never care about these cases, and of those who do, they'll only need it quite rarely. (We use classes far more often than we write classes.)

But None is different. My guess is that every Python programmer, from the newest novice to the most experienced guru, will need to check for None, and likely frequently. So my sense is that the use-cases for existence checking divide into two categories:

- checking for None (> 95%)
- everything else (< 5%)

I did a very rough search of the Python code on my system and found this:

    is [not] None:            10955
    is [not] Ellipsis:           13
    is [not] NotImplemented:    285
    is_nan( / isnan( :          470

which is not far from my guess.

-- Steve
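For anyone who hasn't written one, the operator-dunder pattern being referred to looks like this; the class is purely illustrative:

    class Metres:
        def __init__(self, value):
            self.value = value

        def __add__(self, other):
            if not isinstance(other, Metres):
                # "I don't know how to do this": Python will then try
                # other.__radd__, and only raises TypeError if that also fails.
                return NotImplemented
            return Metres(self.value + other.value)

    print((Metres(2) + Metres(3)).value)   # 5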

On Sat, Oct 29, 2016 at 3:53 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Hmmm. I see your point, but honestly, None *is* special. Even for special objects, None is even more special.
As a contributor to and user of many numerical computing libraries in Python (e.g., NumPy, pandas, Dask, TensorFlow) I also agree here. Implicit non-existence for NotImplemented and Ellipsis seems particularly problematic, because these types are rarely used, and the meaning of these types intentionally differs from other missing types:

- In NumPy, None is a valid indexing argument, used as a sentinel marker for "insert a new axis here". Thus x[..., None] means "insert a new axis at the end."

- Likewise, implicit checks for NotImplemented would be problematic in arithmetic, because NaN is also a perfectly valid result value for arithmetic. Especially in this case, checking for "missingness" could look attractive at first glance to implementors of special methods for arithmetic but could later lead to subtle bugs.

I have more mixed feelings about testing for NaNs. NaNs propagate, so explicit testing is rarely needed. Also, in numerical computing we usually work with arrays of NaN, so operator.exists() and all this nice syntax would not be a substitute for numpy.isnan or pandas.isnull.

On the whole, I do think that adding systematic checks for None to Python with dedicated syntax would be a win. If making NaN "missing" and allowing user defined types to be "missing" would help make that happen, then sure, go ahead, but I see few use cases.
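For readers who haven't run into it, the NumPy behaviour being referred to (requires numpy to be installed):

    import numpy as np

    x = np.arange(6).reshape(2, 3)
    print(x.shape)              # (2, 3)
    print(x[..., None].shape)   # (2, 3, 1) - None inserts a new trailing axis
    print(x[None, ...].shape)   # (1, 2, 3)
    print(np.newaxis is None)   # True - np.newaxis is literally None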

On Sat, Oct 29, 2016 at 11:44 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
NaNs *usually* propagate. The NaN domain isn't actually closed under IEEE 754.
The last one isn't really mandated by IEEE 754, and is weird when you consider `min(nan, 1)`. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Sun, Oct 30, 2016 at 06:26:13AM -0700, David Mertz wrote:
NaNs *usually* propagate. The NaN domain isn't actually closed under IEEE 754.
[...]
Python's min() and max() don't treat NANs correctly according to IEEE 754. The 2008 revision to the standard says that:

    min(x, NaN) = min(NaN, x) = x
    max(x, NaN) = max(NaN, x) = x

https://en.wikipedia.org/wiki/IEEE_754_revision#min_and_max

Professor Kahan, one of the IEEE 754 committee members, writes:

    For instance max{x, y} should deliver the same result as max{y, x} but almost no implementations do that when x is NaN. There are good reasons to define max{NaN, 5} := max{5, NaN} := 5 though many would disagree.

Page 9, https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF

I believe that the standard does allow implementations to define a second pair of functions that implement "NAN poisoning", that is, they return NAN when given a NAN argument.

-- Steve
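The order dependence in CPython today, which is what falls short of the behaviour quoted above:

    nan = float("nan")

    print(min(nan, 1))   # nan - 1 < nan is False, so the first argument wins
    print(min(1, nan))   # 1   - nan < 1 is False, so the first argument wins
    print(max(nan, 1))   # nan
    print(max(1, nan))   # 1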

Hi Nick, thanks for writing all of this down and composing a PEP. On 28.10.2016 10:30, Nick Coghlan wrote:
Right, to your first question: if I were to answer this in a world of black and white, I would need to say yes. In the real world it's probably more like: you can map existence-checking to truth checking in most practical cases without any harm. So, its usefulness and distinctness are quite reduced.
I cannot speak for the stdlib. For custom user code, I may repeat what I already said: it might be useful for the outer code working at the boundaries of systems, as incoming data is hardly perfect. It might harm the inner workings of software if bad datastructure design permeates it and requires constant checking for existence (or other things). Language definition-wise, I would say that if we can curb the issue described in the former paragraph, we'll be fine. Then it will shine through to all user code and the stdlib as well. However, I don't think we are going to achieve this. The current language design does indeed favor clean datastructure design, since messy datastructures are hard to handle in current Python. So, this naturally minimizes the usage of messy datastructures, which is not a bad thing IMHO. From my experience, clean datastructure design leads naturally to easy-to-read, clean code. If people get their datastructures right in the inner parts of their software, that's the most important step. If they subsequently would like to provide some convenience to their consumers (API, UI, etc.), that's a good cause. Still, it keeps the mess/checking in check, plus it keeps it in a small number of places (the boundary code). And it also guides consumers towards clean usage (which is also not a bad thing IMHO).
It's "just" shortcuts. So, yes. However, as truth checking is already available, it might even increase confusion about which kind of checking to use. I think most developers need fewer but more powerful tools to achieve their full potential.
All in all, you can imagine that I am -1 on this.
I know, I don't need to mention this because questions 1 to 3 are already problematic, but just my two cents here. To me it's unclear what the ? would refer to anyway: is it the obj that needs checking or is it the attribute/subscript access? I get the feeling that this is totally unclear from the syntax (also confirmed by Paul's post). Still, thanks a lot for your work, Nick. :) Regards, Sven

On Fri, Oct 28, 2016 at 06:30:05PM +1000, Nick Coghlan wrote: [...]
Not speaking for "we", only for myself: of course.
Maybe, but probably not. Checking for "data missing" or other sentinels is clearly an important thing to do, but it remains to be seen whether (1) it should be generalised and (2) there is a benefit to making it a protocol. My sense so far is that generalising beyond None is YAGNI. None of the other examples you give strike me as common enough to justify special syntax, or even a protocol. I'm not *against* the idea, I just remain unconvinced.

But in particular, I *don't* think it is useful to introduce a concept similar to "truthiness" for existence. Duck-typing truthiness is useful: most of the time, I don't care which truthy value I have, only that it is truthy. But I'm having difficulty seeing when I would want to extend that to existence checking. The existence singletons are not generally interchangeable:

- operator dunder methods don't allow you to pass None instead of NotImplemented, nor should they;
- (1 + nan) returns a nan, but (1 + Ellipsis) is an error;
- array[...] and array[NotImplemented] probably mean different things;

etc. More on this below.
I'm not sure about this one. [...]
4. Do we collectively agree that "?then" and "?else" would be reasonable spellings for such operators?
As in...

    spam ?then eggs

meaning (conceptually):

    if (spam is None or spam is NotImplemented
            or spam is Ellipsis or isnan(spam)):
        return spam
    else:
        return eggs

I don't know... I can't see myself ever not caring which "missing" value I have, only that it is "missingly" (by analogy with "truthy"). If I'm writing an operator dunder method, I want to treat NotImplemented as "missing", but anything else (None, Ellipsis, NAN) would be a regular value. If I'm writing a maths function that supports NANs, I'd probably want to treat None, Ellipsis and NotImplemented as errors. While I agree that "existence checking" is a concept, I don't think existence generalises in the same way Truth generalises to truthiness.
Yes, but only for the "object is not None" case. Note that NANs ought to support the same attributes as other floats. If they don't, I'd call it an error:

    py> nan = float('nan')
    py> nan.imag
    0.0
    py> nan.real
    nan

So I shouldn't have to write:

    y = x if math.isnan(x) else x.attr

I should be able to just write:

    y = x.attr

and have NANs do the right thing. But if we have a separate, dedicated NA/Missing value, like R's NA, things may be different.
I'd be surprised if it were very common, but it might be "not uncommon".
You mean a short-cut for:

    if obj is None:
        obj = spam

Sure, that's very common. But:

    if (obj is None or obj is NotImplemented
            or obj is Ellipsis or isnan(obj)):
        obj = spam

not so much.
6a. Do we collectively agree that 'obj?.attr' would be a reasonable spelling for "access this attribute only if the object exists"?
I like that syntax.
I don't hate either of those. Thanks for writing the PEP! -- Steve

On 29 October 2016 at 21:44, Steven D'Aprano <steve@pearwood.info> wrote:
I considered this the weakest link in the proposal when I wrote it, and the discussion on the list has persuaded me that it's not just a weak link, it's a fatal flaw. Accordingly, I've withdrawn the PEP, and explained why with references back to this discussion: https://github.com/python/peps/commit/9a70e511ada63b976699bbab9da14237934075... However, as noted there, I find the possible link back to the rejected boolean operator overloading proposal in PEP 335 interesting, so I'm going to invest some time in writing that up to the same level as I did the existence checking one (i.e. Abstract, Rationale & design discussion, without a full specification or reference implementation yet). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I certainly like the concept, but I worry that use of __exists__() could generalize it a bit beyond what you're intending in practice. It seems like this should only check if an object exists, and that adding the magic method would only lead to confusion. -Ryan Birmingham On 28 October 2016 at 04:30, Nick Coghlan <ncoghlan@gmail.com> wrote:

On 28 October 2016 at 23:35, Ryan Birmingham <rainventions@gmail.com> wrote:
The same can be said of using __bool__, __nonzero__ and __len__ to influence normal condition checks, and folks have proven to be pretty responsible using those in practice (or, more accurately, when they're used in problematic ways, users object, and they either eventually get fixed, or folks move on to using other APIs that they consider better behaved). I also don't think the idea is sufficiently general to be worthy of dedicated syntax if it's limited specifically to "is not None" checks - None's definitely special, but it's not *that* special. Unifying None, NaN, NotImplemented and Ellipsis into a meta-category of objects that indicate the absence of information rather than a specific value, though? And also allowing developers to emulate the protocol for testing purposes? That's enough to pique my interest. That's why these are my first two questions on the list - if we don't agree on the core premise that there's a general concept here worth modeling as an abstract protocol, I'm -1 on the whole idea. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 28 October 2016 at 15:40, Nick Coghlan <ncoghlan@gmail.com> wrote:
I think that's the key for me - new syntax for "is not None" types of test seems of limited value (sure, other languages have such things, but that's not compelling - the contexts are different). However, I'm not convinced by your proposal that we can unify None, NaN, NotImplemented and Ellipsis in the way you suggest. I wouldn't expect a[1, None, 2] to mean the same as a[1, ..., 2], so why should an operator that tested for "Ellipsis or None" be useful? Same for NotImplemented - we're not proposing to allow rich comparison operators to return None rather than NotImplemented. The nearest to plausible is NaN vs None - but even there I don't see the two as the same. So, to answer your initial questions, in my opinion: 1. The concept of "checking for existence" is valid. 2. I don't see merging domain-specific values under a common "does not exist" banner as useful. Specifically, because I wouldn't want the "does not exist" values to become interchangeable. 3. I don't think there's much value in specific existence-checking syntax, precisely because I don't view it as a good thing to merge multiple domain-specific "does not exist", and therefore the benefit is limited to a shorter way of saying "is not None". As you noted, given my answers to 1-3, there's not much point in considering the remaining questions. However, I do think that there's another concept tied up in the proposals here, that of "short circuiting attribute access / indexing". The call was for something that said roughly "a.b if a is not None, otherwise None". But this is only one form of this pattern - there's a similar pattern, "a.b if a has an attribute b, otherwise None". And that's been spelled "getattr(a, 'b', None)" for a long time now. The existence of getattr, and the fact that no-one is crying out for it to be replaced with syntax, implies to me that before leaping to a syntax solution we should be looking at a normal function (possibly a builtin, but maybe even just a helper). I'd like to see a justification for why "a.b if a is not None, else None" deserves syntax when "a.b if a has attribute b, else None" doesn't. IMO, there's no need for syntax here. There *might* be some benefit in some helper functions, though. The cynic in me wonders how much of this debate is rooted in the fact that it's simply more fun to propose new syntax, than to just write a quick helper and get on with coding your application... Paul

On Fri, Oct 28, 2016 at 11:17 AM, Paul Moore <p.f.moore@gmail.com> wrote:
First thing is that I definitely DO NOT want new SYNTAX to do this. I wouldn't mind having a new built-in function for this purpose if we could get the call signature right. Maybe it would be called 'exists()', but actually something like 'valued()' feels like a better fit. For the unusual case where the "null-coalescing" operation is what I'd want, I definitely wouldn't mind something like Barry's proposal of processing a string version of the expression. Sure, *some* code editors might not highlight it as much, but it's a corner case at most, to my mind. For that, I can type 'valued("x.y.z[w]", coalesce=ALL)' or whatever.
I *especially* think None and nan have very different meanings. A list of [1.1, nan, 3.3] means that I have several floating point numbers, but one came from a calculation that escaped the real domain. A list with [1.1, None, 3.3] means that I have already calculated three values, but am marking the fact I need later to perform a calculation to figure out the middle one. These are both valid and important use cases, but they are completely different from each other. Yours, David... -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Oct 28, 2016 3:30 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
I'd hope so!
I {%think_string if think_string is not None else 'think'%} so.
Personally, I find that kind of ugly. What's wrong with just ? instead of ?else?
Pretty sure I've done this like a zillion times.
I haven't really ever had to do this exactly, but it makes sense.
Yes. I see stuff like this a lot: if x is not None: x = []
' '.join(['Yes!']*3)
-- Ryan (ライアン) [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something’s wrong. http://kirbyfan64.github.io/

On 29 October 2016 at 01:46, Ryan Gonzalez <rymg19@gmail.com> wrote:
When you see the expression "LHS ? RHS", there's zero indication of how to read it other than naming the symbol: "LHS question mark RHS". By contrast, "LHS ?then RHS" and "LHS ?else RHS" suggest the pronunciations "LHS then RHS" and "LHS else RHS", which in turn are potentially useful mnemonics for the full expansions "if LHS exists then RHS else LHS" and "LHS if LHS exists else RHS". (Knowing that "?" indicates an existence check is still something you'd have to learn, but even without knowing that, the keywords could get you quite some way towards correctly understanding what the construct means) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
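[A rough illustration of the pronunciation mnemonics above, written as ordinary conditional expressions over a simplified ``exists()`` predicate; the function names are placeholders, the predicate is approximated as "is not None", and unlike the proposed operators these helpers cannot short-circuit evaluation of RHS::

    def exists(obj):
        # Simplified stand-in for the proposed operator.exists(); the full
        # protocol would also treat NotImplemented, Ellipsis and NaN as
        # non-existent.
        return obj is not None

    def exists_then(lhs, rhs):   # LHS ?then RHS
        return rhs if exists(lhs) else lhs

    def exists_else(lhs, rhs):   # LHS ?else RHS
        return lhs if exists(lhs) else rhs
]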

On 29 October 2016 at 04:08, Mark Dickinson <dickinsm@gmail.com> wrote:
It's more often checked the other way around: "if Ellipsis is passed in, then work out the multi-dimensional slice from the underlying object". And that reflects the problem Paul and David highlighted: in any *specific* context, there's typically only one sentinel we want to either propagate or replace with a calculated value, or else we want to handle different sentinel values differently, which makes the entire concept of a unifying duck-typing protocol pragmatically dubious, and hence calls into question the idea of introducing new syntax for working with it. On the other hand, if you try to do this as an "only None is special" kind of syntax, then any of the *other* missing data sentinels (of which we have 4 in the builtins alone, and 5 when you add the decimal module) end up being on much the same level as "arbitrary sentinel objects" in the draft PEP 531, which I would consider to be an incredibly poor outcome for a change as invasive as adding new syntax: https://www.python.org/dev/peps/pep-0531/#arbitrary-sentinel-objects Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
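[A toy illustration of the "checked the other way around" pattern described above; the ``Grid`` class and its behaviour are assumptions for illustration, not code from the thread::

    class Grid:
        """Toy container: `...` is expanded, not treated as 'missing'."""
        def __init__(self, shape):
            self.shape = shape

        def __getitem__(self, key):
            if key is Ellipsis:
                # Work out the full multi-dimensional slice from the
                # underlying object, rather than short-circuiting on it.
                key = tuple(slice(None) for _ in self.shape)
            return key  # a real container would now index its storage
]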

On 29 October 2016 at 14:52, Nick Coghlan <ncoghlan@gmail.com> wrote:
Considering this question of "Am I attempting to extract the right underlying design pattern?" further puts me in mind of Greg Ewing's old rejected proposal to permit overloading the "and" and "or" operators: https://www.python.org/dev/peps/pep-0335/ After all, the proposed "?then" and "?else" operators in PEP 531 are really just customised variants of "and" and "or" that use a slightly modified definition of truth-checking. PEP 335 attempted to tackle that operator overloading problem directly, but now I'm wondering if it may be more fruitful to instead consider the problem in terms of the expanded conditional expressions:

* "LHS and RHS" -> "RHS if LHS else LHS"
* "LHS or RHS" -> "LHS if LHS else RHS"

A short-circuiting if-else protocol for arbitrary "THEN if COND else ELSE" expressions could then look like this::

    _condition = COND
    if _condition:
        _then = THEN
        if hasattr(_condition, "__then__"):
            return _condition.__then__(_then)
        return _then
    else:
        _else = ELSE
        if hasattr(_condition, "__else__"):
            return _condition.__else__(_else)
        return _else

"and" and "or" would then be simplified versions of that, where the condition expression was re-used as either the "THEN" subexpression ("or") or the "ELSE" subexpression ("and"). The reason I think this is potentially interesting in the context of PEPs 505 and 531 is that with that protocol defined, the null-coalescing "operator" wouldn't need to be a new operator, it could just be a new builtin that defined the appropriate underlying control flow::

    value = if_set(expr1) or if_set(expr2) or expr3

where if_set was defined as::

    class if_set:
        def __init__(self, value):
            self.value = value
        def __bool__(self):
            return self.value is not None
        def __then__(self, result):
            if result is self:
                return self.value
            return result
        def __else__(self, result):
            if result is self:
                return self.value
            return result

Checking for a custom sentinel value instead of ``None`` would then be as straightforward as using a different conditional control flow manager that replaced the ``__bool__`` check against ``None`` with a check against the specific sentinel of interest. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
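[Since the ``__then__``/``__else__`` hooks described above don't exist in today's Python, here is a rough, function-based emulation of that control flow; the name ``emulated_or`` and the thunk-based calling convention are assumptions for illustration only::

    class if_set:
        # Same class as in the sketch above (with __bool__ checking the
        # wrapped value against None), repeated here for runnability.
        def __init__(self, value):
            self.value = value
        def __bool__(self):
            return self.value is not None
        def __then__(self, result):
            return self.value if result is self else result
        def __else__(self, result):
            return self.value if result is self else result

    def emulated_or(lhs, rhs_thunk):
        # "LHS or RHS" under the sketched protocol; RHS is passed as a
        # zero-argument callable so it is only evaluated when needed,
        # preserving short-circuiting.
        if lhs:
            return lhs.__then__(lhs) if hasattr(lhs, "__then__") else lhs
        rhs = rhs_thunk()
        return lhs.__else__(rhs) if hasattr(lhs, "__else__") else rhs

    print(emulated_or(if_set(None), lambda: "fallback"))  # fallback
    print(emulated_or(if_set(0), lambda: "fallback"))     # 0 (zero still "exists")
]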

On 29 October 2016 at 07:21, Nick Coghlan <ncoghlan@gmail.com> wrote:
This seems to have some potential to me. It doesn't seem massively intrusive (there's a risk that it might be considered a step too far in "making the language mutable", but otherwise it's just a new extension protocol around an existing construct). The biggest downside I see is that it could be seen as simply generalisation for the sake of it. But with the null-coalescing / sentinel checking use case, plus Greg's examples from the motivation section of PEP 335, there may well be enough potential uses to warrant such a change. Paul

On Sat, Oct 29, 2016 at 02:52:42PM +1000, Nick Coghlan wrote:
Hmmm. I see your point, but honestly, None *is* special. Even for special objects, None is even more special. Here are your examples again:

* obj is not None (many different use cases)
* obj is not Ellipsis (in multi-dimensional slicing)
* obj is not NotImplemented (in operand coercion)
* math.isnan(value)
* cmath.isnan(value)
* decimal.getcontext().is_nan(value)

Aside from the first, the rest are quite unusual:

- Testing for Ellipsis occurs in __getitem__, and not even always then.
- Testing for NotImplemented occurs in operator dunders, rarely if ever outside those methods. (You probably should never see NotImplemented except in an operator dunder.) In both cases, this will be a useful feature for the writer of the class, not the user of the class.
- Testing for NAN is really only something of interest to those writing heavily numeric code, and not even always then. You can go a LONG way with numeric code by just assuming that x is a regular number, and leaving NANs for "version 2". Especially in Python, which typically raises an exception where it could return a NAN. In other words, it's quite hard to generate an unexpected NAN in Python.

So these examples are all quite special, of very limited applicability and quite marginal utility. My guess is that the majority of programmers will never care about these cases, and of those who do, they'll only need it quite rarely. (We use classes far more often than we write classes.)

But None is different. My guess is that every Python programmer, from the newest novice to the most experienced guru, will need to check for None, and likely frequently. So my sense is that the use-cases for existence checking divide into two categories:

- checking for None (> 95%)
- everything else (< 5%)

I did a very rough search of the Python code on my system and found this:

    is [not] None: 10955
    is [not] Ellipsis: 13
    is [not] NotImplemented: 285
    is_nan( / isnan( : 470

which is not far from my guess. -- Steve
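[A rough sketch of how such a count might be reproduced; the exact regular expressions, corpus and resulting numbers are assumptions and will differ from the figures quoted above::

    import pathlib
    import re

    patterns = {
        "is [not] None": re.compile(r"\bis\s+(?:not\s+)?None\b"),
        "is [not] Ellipsis": re.compile(r"\bis\s+(?:not\s+)?Ellipsis\b"),
        "is [not] NotImplemented": re.compile(r"\bis\s+(?:not\s+)?NotImplemented\b"),
        "is_nan( / isnan(": re.compile(r"\bis_?nan\("),
    }

    root = pathlib.Path(".")   # point this at whatever code tree you want to scan
    counts = dict.fromkeys(patterns, 0)
    for path in root.rglob("*.py"):
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for name, pattern in patterns.items():
            counts[name] += len(pattern.findall(text))
    print(counts)
]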

On Sat, Oct 29, 2016 at 3:53 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Hmmm. I see your point, but honestly, None *is* special. Even for special objects, None is even more special.
As a contributor to and user of many numerical computing libraries in Python (e.g., NumPy, pandas, Dask, TensorFlow), I also agree here. Implicit non-existence for NotImplemented and Ellipsis seems particularly problematic, because these types are rarely used, and the meaning of these types intentionally differs from other missing types:

- In NumPy, None is a valid indexing argument, used as a sentinel marker for "insert a new axis here". Thus x[..., None] means "insert a new axis at the end."
- Likewise, implicit checks for NotImplemented would be problematic in arithmetic, because NaN is also a perfectly valid result value for arithmetic. Especially in this case, checking for "missingness" could look attractive at first glance to implementors of special methods for arithmetic, but could later lead to subtle bugs.

I have more mixed feelings on testing for NaNs. NaNs propagate, so explicit testing is rarely needed. Also, in numerical computing we usually work with arrays, so operator.exists() and all this nice syntax would not be a substitute for numpy.isnan or pandas.isnull.

On the whole, I do think that adding systematic checks for None to Python with dedicated syntax would be a win. If making NaN "missing" and allowing user-defined types to be "missing" would help make that happen, then sure, go ahead, but I see few use cases.
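[A quick check of the NumPy indexing behaviour described above, plus the element-wise NaN testing that syntax-level existence checks couldn't replace; the array values are illustrative (requires NumPy)::

    import numpy as np

    x = np.arange(3.0)
    print(x[..., None].shape)     # (3, 1): None means "insert a new axis here"

    arr = np.array([1.1, np.nan, 3.3])
    print(np.isnan(arr))          # [False  True False] -- checked per array cell
]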

On Sat, Oct 29, 2016 at 11:44 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
NaN's *usually* propagate. The NaN domain isn't actually closed under IEEE 754.
The last one isn't really mandated by IEEE 754, and is weird when you consider `min(nan, 1)`. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Sun, Oct 30, 2016 at 06:26:13AM -0700, David Mertz wrote:
NaN's *usually* propagate. The NaN domain isn't actually closed under IEEE 754.
[...]
Python's min() and max() don't treat NANs correctly according to IEEE 754. The 2008 revision to the standard says that::

    min(x, NaN) = min(NaN, x) = x
    max(x, NaN) = max(NaN, x) = x

https://en.wikipedia.org/wiki/IEEE_754_revision#min_and_max

Professor Kahan, one of the IEEE 754 committee members, writes:

    For instance max{x, y} should deliver the same result as max{y, x} but almost no implementations do that when x is NaN. There are good reasons to define max{NaN, 5} := max{5, NaN} := 5 though many would disagree.

Page 9, https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF

I believe that the standard does allow implementations to define a second pair of functions that implement "NAN poisoning", that is, they return NAN when given a NAN argument. -- Steve
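[A quick illustration of the order-dependence being discussed: Python's builtin min() and max() keep the first argument when every comparison involving a NaN is False, which is neither NaN-poisoning nor the IEEE 754-2008 minNum/maxNum behaviour::

    nan = float("nan")
    print(min(nan, 1))   # nan  (first argument wins: 1 < nan is False)
    print(min(1, nan))   # 1
    print(max(nan, 1))   # nan
    print(max(1, nan))   # 1
]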

Hi Nick, thanks for writing all of this down and composing a PEP. On 28.10.2016 10:30, Nick Coghlan wrote:
Right, to your first question: if I were to answer this in a world of black and white, I would need to say yes. In the real world it's probably more like this: you can map existence-checking to truth-checking in most practical cases without any harm. So, its usefulness and distinctness are quite reduced.
I cannot speak for the stdlib. For custom user code, I may repeat what I already said: it might be useful for the outer code working on the boundaries of systems, as incoming data is hardly perfect. It might harm the inner workings of software if bad datastructure design permeates it and requires constant checking for existence (or other things). Language definition-wise, I would say that if we can curb the issue described in the former paragraph, we'll be fine. Then it will shine through to all user code and the stdlib as well. However, I don't think we are going to achieve this. The current language design does indeed favor clean datastructure design, since messy datastructures are hard to handle in current Python. So, this naturally minimizes the usage of messy datastructures, which is not a bad thing IMHO. From my experience, clean datastructure design naturally leads to easy-to-read, clean code. If people get their datastructures right in the inner parts of their software, that's the most important step. If they subsequently would like to provide some convenience to their consumers (API, UI, etc.), that's a good cause. Still, it keeps the mess/checking in check, plus it keeps it in a small number of places (the boundary code). And it also guides consumers towards clean usage (which is also not a bad thing IMHO).
It's "just" shortcuts. So, yes. However, as truth checking is already available, it might even increase confusion about which kind of checking to use. I think most developers need fewer but more powerful tools to achieve their full potential.
All in one, you can imagine that I am -1 on this.
I know I don't need to mention this because questions 1 to 3 are already problematic, but just my two cents here. To me it's unclear what the ? would refer to anyway: is it the obj that needs checking, or is it the attribute/subscript access? I get the feeling that this is totally unclear from the syntax (also confirmed by Paul's post). Still, thanks a lot for your work, Nick. :) Regards, Sven

On Fri, Oct 28, 2016 at 06:30:05PM +1000, Nick Coghlan wrote: [...]
Not speaking for "we", only for myself: of course.
Maybe, but probably not. Checking for "data missing" or other sentinels is clearly an important thing to do, but it remains to be seen whether (1) it should be generalised and (2) there is a benefit to making it a protocol. My sense so far is that generalising beyond None is YAGNI. None of the other examples you give strike me as common enough to justify special syntax, or even a protocol. I'm not *against* the idea, I just remain unconvinced. But in particular, I *don't* think it is useful to introduce a concept similar to "truthiness" for existence. Duck-typing truthiness is useful: most of the time, I don't care which truthy value I have, only that it is truthy. But I'm having difficulty seeing when I would want to extend that to existence checking. The existence singletons are not generally interchangeable:

- operator dunder methods don't allow you to return None instead of NotImplemented, nor should they;
- (1 + nan) returns a nan, but (1 + Ellipsis) is an error;
- array[...] and array[NotImplemented] probably mean different things;

etc. More on this below.
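[A quick way to see that the sentinels listed above really aren't interchangeable in practice; this snippet is illustrative only, not from the thread::

    print(1 + float("nan"))   # nan: arithmetic quietly propagates the NaN
    try:
        1 + ...               # Ellipsis is not a number, so this raises
    except TypeError as exc:
        print(exc)            # unsupported operand types for +
]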
I'm not sure about this one. [...]
4. Do we collectively agree that "?then" and "?else" would be reasonable spellings for such operators?
As in... spam ?else eggs, meaning (conceptually)::

    if (spam is None or spam is NotImplemented
            or spam is Ellipsis or isnan(spam)):
        return eggs
    else:
        return spam

I don't know... I can't see myself ever not caring which "missing" value I have, only that it is "missingly" (by analogy with "truthy"). If I'm writing an operator dunder method, I want to treat NotImplemented as "missing", but anything else (None, Ellipsis, NAN) would be a regular value. If I'm writing a maths function that supports NANs, I'd probably want to treat None, Ellipsis and NotImplemented as errors. While I agree that "existence checking" is a concept, I don't think existence generalises in the same way Truth generalises to truthiness.
Yes, but only for the "object is not None" case. Note that NANs ought to support the same attributes as other floats. If they don't, I'd call it an error::

    py> nan = float('nan')
    py> nan.imag
    0.0
    py> nan.real
    nan

So I shouldn't have to write::

    y = x if math.isnan(x) else x.attr

I should be able to just write::

    y = x.attr

and have NANs do the right thing. But if we have a separate, dedicated NA/Missing value, like R's NA, things may be different.
I'd be surprised if it were very common, but it might be "not uncommon".
You mean a short-cut for::

    if obj is None:
        obj = spam

Sure, that's very common. But::

    if (obj is None or obj is NotImplemented
            or obj is Ellipsis or isnan(obj)):
        obj = spam

not so much.
6a. Do we collectively agree that 'obj?.attr' would be a reasonable spelling for "access this attribute only if the object exists"?
I like that syntax.
I don't hate either of those. Thanks for writing the PEP! -- Steve

On 29 October 2016 at 21:44, Steven D'Aprano <steve@pearwood.info> wrote:
I considered this the weakest link in the proposal when I wrote it, and the discussion on the list has persuaded me that it's not just a weak link, it's a fatal flaw. Accordingly, I've withdrawn the PEP, and explained why with references back to this discussion: https://github.com/python/peps/commit/9a70e511ada63b976699bbab9da14237934075... However, as noted there, I find the possible link back to the rejected boolean operator overloading proposal in PEP 335 interesting, so I'm going to invest some time in writing that up to the same level as I did the existence checking one (i.e. Abstract, Rationale & design discussion, without a full specification or reference implementation yet). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia