I imagine that which ones depend on __getitem__ is an implementation detail.

The only methods that would be reasonably amenable to a guarantee like "always returns the same thing as __getitem__" would be (l|r|)strip(), split(), splitlines(), and partition(), because they only ever return subsets of the input string.
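
As a small illustration of the "subsets of the input string" point (the sample strings are made up for this example), every result of these methods can be written as a slice of the original, which is not true of something like upper():

```python
s = "  foo bar  "
# The strip family returns a contiguous piece of the input.
assert s.strip() == s[2:-2]
assert s.lstrip() == s[2:]
assert s.rstrip() == s[:-2]

t = "foo bar"
# split() and partition() return several pieces, each one a slice of t.
assert t.split() == [t[0:3], t[4:7]]
assert t.partition(" ") == (t[0:3], t[3:4], t[4:7])

u = "foo\nbar"
assert u.splitlines() == [u[0:3], u[4:7]]

# By contrast, upper() has to build new content that is not in the input.
assert t.upper() not in t
```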

Most of the other stuff involves constructing new strings and it's harder to cast them in terms of other "primitive operations" since strings are immutable.

I suspect that, to the extent that the methods that could be implemented in terms of __getitem__ return base strings, it's either because no one thought about it at the time and another mechanism was used, or because it was a deliberate choice to be consistent with the other methods.

I don't see explicitly implementing removeprefix and removesuffix in terms of slicing operations as a huge win. You've demonstrated that someone who wants a persistent string subclass would still need to override a lot of methods, so two more shouldn't hurt much. I just don't think that "consistent with most of the other methods" is a particularly good reason to avoid explicitly defining these operations in terms of __getitem__. The default semantics are the same (i.e. if you don't explicitly change the return type of __getitem__, the return type of the remove* methods won't change either). The only difference is that for all the other methods it's an implementation detail whether they call __getitem__, whereas for the remove* methods it would be explicitly documented.
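
To make "defined in terms of __getitem__" concrete, here is a minimal pure-Python sketch. The free functions and the MyStr subclass are only for illustration; this is an assumption about what such a definition could look like, not the actual implementation:

```python
class MyStr(str):
    # Illustrative subclass: slicing returns the subclass, not base str.
    def __getitem__(self, key):
        return type(self)(super().__getitem__(key))


def removeprefix(s, prefix):
    # Written purely in terms of slicing, so the result type is
    # whatever s.__getitem__ returns.
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s[:]


def removesuffix(s, suffix):
    # The "suffix and" guard matters: s[:-0] would be the empty string.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s[:]


# A plain str stays a plain str; the subclass is preserved only
# because it chose to override __getitem__.
assert type(removeprefix("foobar", "foo")) is str
assert type(removeprefix(MyStr("foobar"), "foo")) is MyStr
```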

In my ideal world, a lot of these methods would be redefined in terms of a small set of primitives that people writing subclasses could implement as a protocol, so that the inherited methods would return instances of the subclass, but I think the time for that has passed. Still, I don't think it would hurt for new methods to be defined in terms of what primitive operations exist where possible.
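
Very roughly, the kind of thing I have in mind looks like the sketch below. SliceBasedStr and Tagged are hypothetical names, and str does not actually work this way; here the strip family is rebuilt on top of the single __getitem__ primitive, so a subclass keeps its type without overriding each method:

```python
class SliceBasedStr(str):
    """Hypothetical base class: derived methods go through __getitem__."""

    def __getitem__(self, key):
        # The one primitive a subclass would need to get right.
        return type(self)(super().__getitem__(key))

    def lstrip(self, chars=None):
        # Let str work out how much to drop, then take a slice of self.
        kept = len(super().lstrip(chars))
        return self[len(self) - kept:]

    def rstrip(self, chars=None):
        kept = len(super().rstrip(chars))
        return self[:kept]

    def strip(self, chars=None):
        return self.lstrip(chars).rstrip(chars)


class Tagged(SliceBasedStr):
    pass


result = Tagged("  foo  ").strip()
assert result == "foo"
assert type(result) is Tagged
```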


On 3/25/20 3:09 PM, Dennis Sweeney wrote:
I was surprised by the following behavior:

    class MyStr(str):
        def __getitem__(self, key):
            # A bare s[:] slice (start, stop and step all None) returns self unchanged.
            if isinstance(key, slice) and key.start is key.stop is key.step is None:
                return self
            return type(self)(super().__getitem__(key))

    my_foo = MyStr("foo")
    MY_FOO = MyStr("FOO")
    My_Foo = MyStr("Foo")
    empty = MyStr("")

    assert type(my_foo.casefold()) is str
    assert type(MY_FOO.capitalize()) is str
    assert type(my_foo.center(3)) is str
    assert type(my_foo.expandtabs()) is str
    assert type(my_foo.join(())) is str
    assert type(my_foo.ljust(3)) is str
    assert type(my_foo.lower()) is str
    assert type(my_foo.lstrip()) is str
    assert type(my_foo.replace("x", "y")) is str
    assert type(my_foo.split()[0]) is str
    assert type(my_foo.splitlines()[0]) is str
    assert type(my_foo.strip()) is str
    assert type(empty.swapcase()) is str
    assert type(My_Foo.title()) is str
    assert type(MY_FOO.upper()) is str
    assert type(my_foo.zfill(3)) is str

    assert type(my_foo.partition("z")[0]) is MyStr
    assert type(my_foo.format()) is MyStr

I was under the impression that all of the ``str`` methods exclusively returned base ``str`` objects. Is there any reason why those two are different, and is there a reason that would apply to ``removeprefix`` and ``removesuffix`` as well?
Python-Dev mailing list -- python-dev@python.org
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/TVDATHMCK25GT4OTBUBDWG3TBJN6DOKK/