I've said a few times that I think it would be good if the behavior were defined in terms of __getitem__'s behavior. If the rough behavior is this:

def removeprefix(self, prefix):
    if self.startswith(prefix):
        return self[len(prefix):]
    else:
        return self[:]

Then you can shift all the guarantees about whether the subtype is str and whether it might return `self` when the prefix is missing onto the implementation of __getitem__.
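
To make that concrete, here is a minimal sketch of the delegation. It is written as a free function so it can run on any Python 3 version. The names `removeprefix_via_getitem` and `PlainSub` are made up for illustration and are not part of the proposal.

```python
def removeprefix_via_getitem(s, prefix):
    # Both branches go through type(s).__getitem__, so the return type
    # (and whether `s` itself may come back) is decided there.
    if s.startswith(prefix):
        return s[len(prefix):]
    else:
        return s[:]


class PlainSub(str):
    # Hypothetical subclass that does not override __getitem__.
    pass


p = PlainSub("spam-eggs")
found = removeprefix_via_getitem(p, "spam-")    # prefix present
missing = removeprefix_via_getitem(p, "ham-")   # prefix absent
print(type(found).__name__, type(missing).__name__)
```

On CPython, both results should be plain str objects, because the inherited str.__getitem__ does not preserve the subclass.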

For CPython's implementation of str, `self[:]` returns `self`, so it's clearly true that __getitem__ is allowed to return `self` in some situations. Subclasses that do not override __getitem__ will get base str objects back, and subclasses that do override __getitem__ can choose what they want to do. So someone could make their subclass do this:

class MyStr(str):
    def __getitem__(self, key):
        if isinstance(key, slice) and key.start is key.stop is key.step is None:
            return self
        return type(self)(super().__getitem__(key))

They would then get "removeprefix" and "removesuffix" for free, with the desired semantics and optimizations.
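
A quick way to check this is the sketch below. It assumes the methods really are defined in terms of __getitem__, and uses a stand-in function `removeprefix` in place of the proposed method. The subclass tests the slice attributes start, stop, and step to recognize a full slice.

```python
class MyStr(str):
    def __getitem__(self, key):
        # A full slice (no start, stop, or step) hands back the same object.
        if isinstance(key, slice) and key.start is key.stop is key.step is None:
            return self
        # Everything else is re-wrapped so the subclass type is preserved.
        return type(self)(super().__getitem__(key))


def removeprefix(s, prefix):
    # Stand-in for the proposed method, written in terms of __getitem__.
    if s.startswith(prefix):
        return s[len(prefix):]
    return s[:]


s = MyStr("spam-eggs")
stripped = removeprefix(s, "spam-")   # prefix present: a new MyStr
unchanged = removeprefix(s, "ham-")   # prefix absent: the same object
print(type(stripped).__name__, unchanged is s)
```

Here the subclass author, not the method's specification, decides both the return type and whether `self` comes back unchanged.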

If we go with this approach (which again I think is much friendlier to subclassers), that obviates the problem of whether `self[:]` is a good summary of something that can return `self`: since "does the same thing as self[:]" is the behavior it's trying to describe, there's no ambiguity.

Best,
Paul

On 3/25/20 1:36 PM, Dennis Sweeney wrote:
I'm removing the tuple feature from this PEP. So now, if I understand
correctly, I don't think there's disagreement about behavior, just about
how that behavior should be summarized in Python code. 

Ethan Furman wrote:
It appears that in CPython, ``self[:] is self`` is true for base ``str``
objects, so I think ``return self[:]`` is consistent with (1) the premise
that returning self is an implementation detail that is neither mandated
nor forbidden, and (2) the premise that the methods should return base
str objects even when called on str subclasses.

The Python interpreter in my head sees ``self[:]`` and returns a copy. A
note that says a str is returned would be more useful than trying to
exactly mirror internal details in the Python "roughly equivalent" code.

I think I'm still in the camp that ``return self[:]`` more precisely prescribes
the desired behavior. It would feel strange to me to write ``return self``
and then say "but you don't actually have to return self, and in fact
you shouldn't when working with subclasses". To me, it feels like

    return (the original object unchanged, or a copy of the object, 
            depending on implementation details, 
            but always make a copy when working with subclasses)

is well-summarized by

   return self[:]

especially if followed by the text

    Note that ``self[:]`` might not actually make a copy -- if the affix
    is empty or not found, and if ``type(self) is str``, then these methods
    may, but are not required to, make the optimization of returning ``self``.
    However, when called on instances of subclasses of ``str``, these
    methods should return base ``str`` objects, not ``self``.

...which is a necessary explanation regardless. Granted, ``return self[:]``
isn't perfect if ``__getitem__`` is overridden, but at the cost of three
characters, the Python code gains accuracy over both the optional nature of
returning ``self`` in all cases and the impossibility (assuming no dunders
are overridden) of returning self for subclasses. It also dissuades readers
from relying on the behavior of returning self, which we're specifying is
an implementation detail.

Is that text explanation satisfactory?
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/4E77QD52JCMHSP7O62C57XILLQN6SPCT/