On Nov 12, 2019, at 17:00, Samuel Muldoon <muldoonsamuel@gmail.com> wrote:

Currently, the `in` operator (also known as `__contains__`) always uses the rightmost argument's implementation.

For example,

    status = obj in "xylophone"

Is similar to:

    status = "xylophone".__contains__( obj )

The current implementation of `__contains__` is similar to the way that `+` used to only look to the leftmost argument for implementation.

    total = 4 + obj
    total = int.__add__(4, obj)

When was this? I’m pretty sure __radd__ was there in 1.x.

However, these days, `__radd__` gives us the following:

    try:
        total = type(4).__add__(4, obj)
    except NotImplementedError:
        total = type(obj).__radd__(obj, 4)

We propose something similar for `__contains__`: That a new dunder/magic method `__lcontains__` be created and that the `in` operator be implemented similarly to the following:

    import io

    # IMPLEMENTATION OF
    #     status = obj in "xylophone"
    try:
        status = "xylophone".__contains__(obj)
    except NotImplementedError:
        status = False
    if not status:
        try:
            status = obj.__lcontains__("xylophone")
        except AttributeError:
            # type(obj) does not have an `__lcontains__` method
            with io.StringIO() as string_stream:
                print(
                    "unsupported operand type(s) for `in`:",
                    repr(type("xylophone").__name__),
                    "and",
                    repr(type(obj).__name__),
                    file=string_stream
                )
                msg = string_stream.getvalue()
            raise TypeError(msg) from None

You’ve specified rules that are different from the one you gave for __radd__, and also different from the actual rules for __radd__. Is that intentional? If so, why?

To summarize the rules: If type(rhs) is a proper subclass of type(lhs), check rhs.__radd__ first and fall back to lhs.__add__. Otherwise, if they’re the same type, only check lhs.__add__. Otherwise, check lhs.__add__ first and fall back to rhs.__radd__. In each case, the check uses special method lookup, not normal getattr. Also, fallback happens if lookup raises an AttributeError or the call returns NotImplemented; it does not happen if either one raises NotImplementedError. And finally, if the fallback fails in the same way, you get a TypeError.
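Here is a minimal sketch of those rules in action. The classes `Base`, `Sub`, `Declines`, and `Raises` are hypothetical and exist only to show which method ends up handling each `+`:

```python
class Base:
    def __add__(self, other):
        return "Base.__add__"

    def __radd__(self, other):
        return "Base.__radd__"


class Sub(Base):
    # Proper subclass of Base that overrides the reflected method.
    def __radd__(self, other):
        return "Sub.__radd__"


class Declines:
    # Unrelated type whose __add__ declines by returning NotImplemented.
    def __add__(self, other):
        return NotImplemented


class Raises:
    # Unrelated type whose __add__ raises NotImplementedError instead.
    def __add__(self, other):
        raise NotImplementedError


same_type = Base() + Base()      # same type: only lhs.__add__ is checked
subclass_rhs = Base() + Sub()    # rhs is a proper subclass: rhs.__radd__ goes first
fallback = Declines() + Base()   # lhs returned NotImplemented: fall back to rhs.__radd__

try:
    Raises() + Base()
    no_fallback = False
except NotImplementedError:
    no_fallback = True           # the exception propagates; __radd__ is never tried

try:
    Declines() + Declines()
    type_error = False
except TypeError:
    type_error = True            # nothing handled the operation
```

The `Raises` case is the one that matters here: raising NotImplementedError is not the same signal as returning NotImplemented, and only the latter triggers the fallback.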

As an example application, one might develop a tree in which each node represents a string (the strings being unique within the tree). A property of the tree might be that node `n` is a descendant of node `m` if and only if `n` is a sub-string of `m`. For example, the string "yell" is a descendant of "yellow". We might want the root node of the tree to be a special object, `root`, such that every string is in `root` and that `root` is in no string.

I don’t understand why you’d want this. If your tree is defined as substrings of a string, why isn’t your root the maximal string, instead of an empty string? Also, why does `node in "yellow"` work in the first place, when "yellow" is a str, not a Node? Also, any string is a substring of itself; do you actually want every Node to be a descendant of itself? (And, if so, is the root a descendant of itself or not?) And finally, doesn’t this mean the root of any tree contains every descendant of every possible tree, not just its own descendants?

Most of all, why can’t you implement your rule in Python today, without any new methods?

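    class Node:
        def __init__(self, label, isroot=False):
            self.label = label
            self.isroot = isroot

        def __contains__(self, other):
            if self.isroot: return True    # everything is in the root
            if other.isroot: return False  # the root is in nothing else
            return other.label in self.label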

The only reason you need __radd__ is to handle interaction with different types, especially ones you don’t control. When you’re just building a single type, you can put all the logic in __add__. And the same thing ought to be true for __lcontains__.

Not understanding the point of this example makes it hard to evaluate how well the proposal solves it, but I don’t think it actually does.

That is, the code `root in "yellow"` should return `False`. If `__lcontains__` were implemented, then we could implement the node as follows:

    class RootNode(Node):

        def __contains__(container, element):
            return True

        def __lcontains__(element, container):
            return False

Presumably the rhs’s __contains__ method exists and does not raise NotImplementedError, right? Then by your rules, RootNode.__lcontains__ would never get called. 

This is the reason for those complicated rules about proper subclasses, identical classes, and unrelated classes being handled differently by __radd__. But even with those rules, your rhs isn’t even a Node, it’s a str. And str.__contains__ definitely exists and doesn’t raise NotImplementedError, and, as it’s an unrelated class, it will get called first, so you’ll just get a TypeError without ever having the chance to get your __lcontains__ called.
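To make that concrete, here is a rough emulation of the proposed lookup as a plain function. `proposed_in` is a hypothetical helper that follows the rules as I read them from the proposal, `__lcontains__` does not exist in Python today, and the `calls` list is only there to record whether the hook was ever reached:

```python
calls = []


class Node:
    def __init__(self, label):
        self.label = label


class RootNode(Node):
    def __contains__(container, element):
        return True

    def __lcontains__(element, container):  # hypothetical hook from the proposal
        calls.append("__lcontains__")
        return False


def proposed_in(obj, container):
    """Emulate the proposed rules: __contains__ first, then __lcontains__."""
    try:
        status = type(container).__contains__(container, obj)
    except NotImplementedError:
        status = False
    if not status:
        try:
            status = type(obj).__lcontains__(obj, container)
        except AttributeError:
            raise TypeError("unsupported operand type(s) for `in`") from None
    return status


root = RootNode("")
try:
    proposed_in(root, "yellow")
    raised = False
except TypeError:
    # str.__contains__ rejects the non-str operand before the hook is consulted.
    raised = True
```

After this runs, `raised` is true and `calls` is still empty: the TypeError comes out of str.__contains__ itself, so `RootNode.__lcontains__` never gets a turn.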

And that means that you can’t actually get the benefits without massively breaking backward compatibility. The only reason you can use __radd__ to make new types be addable to int is that int.__add__ doesn’t raise TypeError on unknown types, it returns NotImplemented. And the same for every other builtin, stdlib, and third-party numeric type. But every builtin, stdlib, and third-party container type raises TypeError from __contains__ on unknown types. So for __lcontains__ to be useful, they’d all have to be changed to return NotImplemented instead.
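The difference is easy to check at the interpreter. This is a small sketch on CPython; `Metres` is a hypothetical type used only to show `__radd__` cooperating with int:

```python
# Numeric types decline unknown operands by returning NotImplemented.
int_result = int.__add__(4, "spam")
float_result = float.__add__(1.5, "spam")

# str.__contains__ raises TypeError for a non-str operand instead.
try:
    "xylophone".__contains__(4)
    str_raises = False
except TypeError:
    str_raises = True


class Metres:
    def __init__(self, value):
        self.value = value

    def __radd__(self, other):
        # Reached only because int.__add__ returned NotImplemented.
        return Metres(other + self.value)


total = 4 + Metres(3)
```

Here `int_result` and `float_result` are both the NotImplemented singleton, which is what lets `4 + Metres(3)` reach `Metres.__radd__`. There is no equivalent opening in `"xylophone".__contains__(4)`.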

I think __lcontains__ (following the same rules as __radd__, and with the change to every existing __contains__, and probably at least two versions’ worth of __future__) could be useful, and if I were designing a new Python-like language I’d probably include it unless someone came up with a good reason not to. But adding it today would definitely be disruptive. So it needs a real killer use case that’s worth all that disruption.