
I imagine it's an implementation detail which ones depend on __getitem__. The only methods that would be reasonably amenable to a guarantee like "always returns the same thing as __getitem__" would be (l|r|)strip(), split(), splitlines(), and .partition(), because they only work with subsets of the input string. Most of the other stuff involves constructing new strings, and it's harder to cast them in terms of other "primitive operations" since strings are immutable.

I suspect that to the extent that the ones that /could/ be implemented in terms of __getitem__ are returning base strings, it's either because no one thought about doing it at the time and they used another mechanism, or it was a deliberate choice to be consistent with the other methods.

I don't see removeprefix and removesuffix explicitly being implemented in terms of slicing operations as a huge win - you've demonstrated that someone who wants a persistent string subclass would still need to override a /lot/ of methods, so two more shouldn't hurt much - I just don't think that "consistent with most of the other methods" is a /particularly/ good reason to avoid explicitly defining these operations in terms of __getitem__. The /default/ semantics are the same (i.e. if you don't explicitly change the return type of __getitem__, it won't change the return type of the remove* methods), and the only difference is that for all the /other/ methods, it's an implementation detail whether they call __getitem__, whereas for the remove* methods it would be explicitly documented.

In my ideal world, a lot of these methods would be redefined in terms of a small set of primitives that people writing subclasses could implement as a protocol, which would allow methods called on the subclass to retain their class, but I think the time for that has passed. Still, I don't think it would /hurt/ for new methods to be defined in terms of what primitive operations exist where possible.
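To make "defined in terms of __getitem__" concrete, here is a rough sketch, not the actual implementation of anything. The free functions removeprefix() and removesuffix() are hypothetical stand-ins for the proposed methods, and MyStr is a simplified version of the subclass from the quoted message. The point is only that if the operation is spelled as slicing, a subclass that overrides __getitem__ gets its own type back without overriding anything else.

```python
class MyStr(str):
    """Hypothetical subclass whose slices keep the subclass type."""

    def __getitem__(self, key):
        # Wrap whatever str.__getitem__ returns back into the subclass.
        return type(self)(super().__getitem__(key))


def removeprefix(s, prefix):
    """Stand-in for the proposed method, written purely as slicing."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s[:]


def removesuffix(s, suffix):
    """Stand-in for the proposed method, written purely as slicing."""
    # The "suffix and" guard matters: s[:-0] would be the empty string.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s[:]


print(type(removeprefix(MyStr("foobar"), "foo")).__name__)
print(type(removeprefix("foobar", "foo")).__name__)
```

With a plain str the return type is unchanged, which is the "default semantics are the same" point above; only a subclass that deliberately changes __getitem__ sees a difference.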
Best,
Paul

On 3/25/20 3:09 PM, Dennis Sweeney wrote:
I was surprised by the following behavior:
class MyStr(str):
    def __getitem__(self, key):
        # A full slice (s[:]) has start, stop and step all None.
        if isinstance(key, slice) and key.start is key.stop is key.step:
            return self
        return type(self)(super().__getitem__(key))

my_foo = MyStr("foo")
MY_FOO = MyStr("FOO")
My_Foo = MyStr("Foo")
empty = MyStr("")

assert type(my_foo.casefold()) is str
assert type(MY_FOO.capitalize()) is str
assert type(my_foo.center(3)) is str
assert type(my_foo.expandtabs()) is str
assert type(my_foo.join(())) is str
assert type(my_foo.ljust(3)) is str
assert type(my_foo.lower()) is str
assert type(my_foo.lstrip()) is str
assert type(my_foo.replace("x", "y")) is str
assert type(my_foo.split()[0]) is str
assert type(my_foo.splitlines()[0]) is str
assert type(my_foo.strip()) is str
assert type(empty.swapcase()) is str
assert type(My_Foo.title()) is str
assert type(MY_FOO.upper()) is str
assert type(my_foo.zfill(3)) is str

assert type(my_foo.partition("z")[0]) is MyStr
assert type(my_foo.format()) is MyStr

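The same kind of check can be written as a small probe instead of a list of assertions, which makes it easier to try on other methods or interpreters. This is only a sketch: subclass_preserving() and the calls table are hypothetical names introduced here, and MyStr uses key.step for the full-slice test. What partition() returns when the separator is missing is an implementation detail, so its row may differ between interpreters and versions.

```python
class MyStr(str):
    def __getitem__(self, key):
        # A full slice (s[:]) has start, stop and step all None.
        if isinstance(key, slice) and key.start is key.stop is key.step:
            return self
        return type(self)(super().__getitem__(key))


def subclass_preserving(value, calls):
    """Map each label to True if the call's result keeps type(value)."""
    return {
        label: type(call(value)) is type(value)
        for label, call in calls.items()
    }


# Hypothetical selection of operations to probe; extend as needed.
calls = {
    "upper()": lambda s: s.upper(),
    "replace('o', '0')": lambda s: s.replace("o", "0"),
    "[1:]": lambda s: s[1:],
    "partition('z')[0]": lambda s: s.partition("z")[0],
}

report = subclass_preserving(MyStr("foo"), calls)
for label, kept in report.items():
    print(f"{label:20} {'MyStr' if kept else 'str'}")
```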
I was under the impression that all of the ``str`` methods exclusively returned base ``str`` objects. Is there any reason why those two are different, and is there a reason that would apply to ``removeprefix`` and ``removesuffix`` as well?
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/TVDATHMC...
Code of Conduct: http://python.org/psf/codeofconduct/