
Sorry, I think I accidentally left out a clause here - I meant that the rationale for /always returning a 'str'/ (as opposed to returning a subclass) is missing, it just says in the PEP:
The only difference between the real implementation and the above is that, as with other string methods like replace, the methods will raise a TypeError if any of self, pre or suf is not an instace of str, and will cast subclasses of str to builtin str objects.
I think the rationale for these differences is not made entirely clear, specifically the "and will cast subclasses of str to builtin str objects" part. I think it would be best to define the truncation in terms of __getitem__ - possibly with the caveat that implementations are allowed (but not required) to return `self` unchanged if no match is found. Best, Paul P.S. Dennis - just noticed in this reply that there is a typo in the PEP - s/instace/instance On 3/22/20 12:15 PM, Victor Stinner wrote:
tl; dr A method implemented in C is more efficient than hand-written pure-Python code, and it's less error-prone
I don't think if it has already been said previously, but I hate having to compute manually the string length when writing:
if line.startswith("prefix"): line = line[6:]
Usually what I do is to open a Python REPL and I type: len("prefix") and copy-paste the result :-)
Passing directly the length is a risk of mistake. What if I write line[7:] and it works most of the time because of a space, but sometimes the space is omitted randomly and the application fails?
--
The lazy approach is:
if line.startswith("prefix"): line = line[len("prefix"):]
Such code makes my "micro-optimizer hearth" bleeding since I know that Python is stupid and calls len() at runtime, the compiler is unable to optimize it (sadly for good reasons, len name can be overriden) :-(
=> line.cutprefix("prefix") is more efficient! ;-) It's also also shorter.
Victor
Le dim. 22 mars 2020 à 17:02, Paul Ganssle <paul@ganssle.io> a écrit :
I don't see any rationale in the PEP or in the python-ideas thread (admittedly I didn't read the whole thing, I just Ctrl + F-ed "subclass" there). Is this just for consistency with other methods like .casefold?
I can understand why you'd want it to be consistent, but I think it's misguided in this case. It adds unnecessary complexity for subclass implementers to need to re-implement these two additional methods, and I can see no obvious reason why this behavior would be necessary, since these methods can be implemented in terms of string slicing.
Even if you wanted to use `str`-specific optimizations in C that aren't available if you are constrained to use the subclass's __getitem__, it's inexpensive to add a "PyUnicode_CheckExact(self)" check to hit a "fast path" that doesn't use slice.
I think defining this in terms of string slicing makes the most sense (and, notably, slice itself returns `str` unless explicitly overridden, the default is for it to return `str` anyway...).
Either way, it would be nice to see the rationale included in the PEP somewhere.
Best, Paul
On 3/22/20 7:16 AM, Eric V. Smith wrote:
On 3/22/2020 1:42 AM, Nick Coghlan wrote:
On Sun, 22 Mar 2020 at 15:13, Cameron Simpson <cs@cskk.id.au> wrote:
On 21Mar2020 12:45, Eric V. Smith <eric@trueblade.com> wrote:
On 3/21/2020 12:39 PM, Victor Stinner wrote: > Well, if CPython is modified to implement tagged pointers and > supports > storing a short strings (a few latin1 characters) as a pointer, it > may > become harder to keep the same behavior for "x is y" where x and y > are > strings. Are you suggesting that it could become impossible to write this function:
def myself(o): return o
and not be able to rely on "o is myself(o)"? That seems... a pretty nasty breaking change for the language. Other way around - because strings are immutable, their identity isn't supposed to matter, so it's possible that functions that currently return the exact same object in some cases may in the future start returning a different object with the same value.
Right now, in CPython, with no tagged pointers, we return the full existing pointer wherever we can, as that saves us a data copy. With tagged pointers, the pointer storage effectively *is* the instance, so you can't really replicate that existing "copy the reference not the storage" behaviour any more.
That said, it's also possible that identity for tagged pointers would be value based (similar to the effect of the small integer cache and string interning), in which case the entire question would become moot.
Either way, the PEP shouldn't be specifying that a new object *must* be returned, and it also shouldn't be specifying that the same object *can't* be returned. Agreed. I think the PEP should say that a str will be returned (in the event of a subclass, assuming that's what we decide), but if the argument is exactly a str, that it may or may not return the original object.
Eric
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/JHM7T6JZ... Code of Conduct: http://python.org/psf/codeofconduct/
Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/RTQWEE4K... Code of Conduct: http://python.org/psf/codeofconduct/