I imagine it's an implementation detail which ones depend on
__getitem__.
The only methods that would be reasonably amenable to a guarantee
like "always returns the same thing as __getitem__" would be
(l|r|)strip(), split(), splitlines(), and .partition(), because
they only work with subsets of the input string.
Most of the other stuff involves constructing new strings and it's
harder to cast them in terms of other "primitive operations" since
strings are immutable.
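To make the distinction concrete, here is a small sketch (the sample string and variable names are just for illustration) showing that the results of strip() and partition() are always equal to some slice of the input, while something like upper() has to build new character data:

```python
s = "  hello world  "

# strip() yields a contiguous slice of the input.
stripped = s.strip()
start = s.find(stripped)
assert stripped == s[start:start + len(stripped)]

# partition() yields three pieces that are consecutive slices of the input.
head, sep, tail = s.partition("o")
i = s.index("o")
assert (head, sep, tail) == (s[:i], s[i:i + 1], s[i + 1:])

# upper() cannot be expressed as a slice: its result is new text.
assert s.upper() not in s
```

This only shows that the results are expressible as slices; it says nothing about whether a given implementation actually goes through __getitem__.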
I suspect that to the extent that the ones that could be
implemented in terms of __getitem__ are returning base strings,
it's either because no one thought about doing it at the time and
they used another mechanism or it was a deliberate choice to be
consistent with the other methods.
I don't see removeprefix and removesuffix explicitly being
implemented in terms of slicing operations as a huge win - you've
demonstrated that someone who wants a persistent string subclass
still would need to override a lot of methods, so two more
shouldn't hurt much - I just don't think that "consistent with most of
the other methods" is a particularly good reason to avoid
explicitly defining these operations in terms of __getitem__. The
default semantics are the same (i.e. if you don't
explicitly change the return type of __getitem__, it won't change
the return type of the remove* methods), and the only difference
is that for all the other methods, it's an implementation
detail whether they call __getitem__, whereas for the remove*
methods it would be explicitly documented.
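As a rough sketch of what "defined in terms of __getitem__" could look like (these are hypothetical free functions written for illustration, not the actual implementation of the methods):

```python
def removeprefix(s, prefix):
    # Hypothetical spelling: every result is produced by slicing s.
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s[:]


def removesuffix(s, suffix):
    # The "suffix and" guard avoids s[:-0], which would be an empty slice.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s[:]


class MyStr(str):
    # Illustrative subclass whose slices keep the subclass type.
    def __getitem__(self, key):
        return type(self)(super().__getitem__(key))


# Default semantics: a plain str in, a plain str out.
assert type(removeprefix("foobar", "foo")) is str

# A subclass that overrides __getitem__ gets its own type back.
assert type(removeprefix(MyStr("foobar"), "foo")) is MyStr
assert removesuffix(MyStr("foobar"), "bar") == "foo"
```

With this spelling, a subclass that leaves __getitem__ alone sees no difference at all; only a subclass that changes what slicing returns would notice.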
In my ideal world, a lot of these methods would be redefined in
terms of a small set of primitives that people writing subclasses
could implement as a protocol that would allow methods called on
subclass instances to retain their class, but I think the time for that
has passed. Still, I don't think it would hurt for new
methods to be defined in terms of what primitive operations exist
where possible.
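Purely as a sketch of the kind of protocol I have in mind (PrimitiveStr, _from_str and Tagged are made-up names, and this is not how str works today), a base class could route every derived method through a couple of primitives:

```python
class PrimitiveStr(str):
    """Hypothetical base class: derived methods use only two primitives."""

    # Primitive 1: build an instance of the current class from a plain str.
    @classmethod
    def _from_str(cls, value):
        return cls(value)

    # Primitive 2: indexing and slicing, which keep the subclass.
    def __getitem__(self, key):
        return self._from_str(super().__getitem__(key))

    # Derived method that selects a subset of the input: slicing suffices.
    def strip(self, chars=None):
        stripped = super().strip(chars)
        start = len(self) - len(super().lstrip(chars))
        return self[start:start + len(stripped)]

    # Derived method that builds new text: route the result via _from_str.
    def upper(self):
        return self._from_str(super().upper())


class Tagged(PrimitiveStr):
    """A subclass gets class-preserving methods without overriding them."""


t = Tagged("  foo ")
assert type(t.strip()) is Tagged
assert type(t.upper()) is Tagged
```

A subclass with a different constructor signature would only need to override the two primitives, rather than every method individually.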
Best,
Paul

I was surprised by the following behavior:

    class MyStr(str):
        def __getitem__(self, key):
            # A full slice such as s[:] returns the object itself.
            # (slice objects have start, stop and step; there is no .end)
            if isinstance(key, slice) and key.start is key.stop is key.step is None:
                return self
            return type(self)(super().__getitem__(key))

    my_foo = MyStr("foo")
    MY_FOO = MyStr("FOO")
    My_Foo = MyStr("Foo")
    empty = MyStr("")

    # These all return base str objects.
    assert type(my_foo.casefold()) is str
    assert type(MY_FOO.capitalize()) is str
    assert type(my_foo.center(3)) is str
    assert type(my_foo.expandtabs()) is str
    assert type(my_foo.join(())) is str
    assert type(my_foo.ljust(3)) is str
    assert type(my_foo.lower()) is str
    assert type(my_foo.lstrip()) is str
    assert type(my_foo.replace("x", "y")) is str
    assert type(my_foo.split()[0]) is str
    assert type(my_foo.splitlines()[0]) is str
    assert type(my_foo.strip()) is str
    assert type(empty.swapcase()) is str
    assert type(My_Foo.title()) is str
    assert type(MY_FOO.upper()) is str
    assert type(my_foo.zfill(3)) is str

    # These two return the subclass.
    assert type(my_foo.partition("z")[0]) is MyStr
    assert type(my_foo.format()) is MyStr

I was under the impression that all of the ``str`` methods exclusively
returned base ``str`` objects. Is there any reason why those two are
different, and is there a reason that would apply to ``removeprefix``
and ``removesuffix`` as well?
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/TVDATHMCK25GT4OTBUBDWG3TBJN6DOKK/