Sorry, I think I accidentally left out a clause here - I meant that the rationale for always returning a 'str' (as opposed to returning a subclass) is missing; the PEP just says:

The only difference between the real implementation and the above is that, as with other string methods like replace, the methods will raise a TypeError if any of self, pre or suf is not an instace of str, and will cast subclasses of str to builtin str objects.

I think the rationale for these differences is not made entirely clear, specifically the "and will cast subclasses of str to builtin str objects" part.

I think it would be best to define the truncation in terms of __getitem__ - possibly with the caveat that implementations are allowed (but not required) to return `self` unchanged if no match is found.
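
A minimal sketch of what defining the truncation in terms of __getitem__ could look like, written as pure Python. The free functions `cutprefix` and `cutsuffix` and the `Tagged` subclass are hypothetical names used only for illustration, not the PEP's reference implementation; returning the input unchanged on no match is the optional caveat mentioned above.

```python
def cutprefix(s, pre):
    # Slice via s[...] so a subclass's __getitem__ decides the result type.
    if pre and s.startswith(pre):
        return s[len(pre):]
    return s  # no match: the original object may be returned unchanged


def cutsuffix(s, suf):
    # Guard against an empty suffix: s[:-0] would be the empty string.
    if suf and s.endswith(suf):
        return s[:-len(suf)]
    return s


class Tagged(str):
    """Hypothetical str subclass whose slices stay in the subclass."""

    def __getitem__(self, key):
        return type(self)(super().__getitem__(key))


t = Tagged("prefix-body-suffix")
body = cutsuffix(cutprefix(t, "prefix-"), "-suffix")
print(type(body).__name__, body)
```

With this definition a subclass only has to get __getitem__ right; it does not need to re-implement the two new methods to keep its own type.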

Best,
Paul

P.S. Dennis - just noticed in this reply that there is a typo in the PEP - s/instace/instance

On 3/22/20 12:15 PM, Victor Stinner wrote:

tl;dr: A method implemented in C is more efficient than hand-written
pure-Python code, and it's less error-prone.

I don't know if this has already been said, but I hate
having to compute the string length manually when writing:

if line.startswith("prefix"): line = line[6:]

Usually what I do is to open a Python REPL and I type: len("prefix")
and copy-paste the result :-)

Passing the length directly invites mistakes. What if I write
line[7:] and it works most of the time because of a trailing space, but
sometimes the space is missing and the application fails?

--

The lazy approach is:

if line.startswith("prefix"): line = line[len("prefix"):]
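
To make the risk concrete, here is a small sketch comparing a hand-counted offset with one derived from the prefix itself. The helper names `strip_hardcoded` and `strip_computed` and the sample strings are made up for illustration.

```python
PREFIX = "prefix"


def strip_hardcoded(line):
    # Off by one on purpose: 7 silently assumes a space follows the prefix.
    if line.startswith(PREFIX):
        return line[7:]
    return line


def strip_computed(line):
    # The offset comes from the prefix, so it cannot drift out of sync.
    if line.startswith(PREFIX):
        return line[len(PREFIX):]
    return line


print(repr(strip_hardcoded("prefix value")))  # 'value'
print(repr(strip_hardcoded("prefixvalue")))   # 'alue'
print(repr(strip_computed("prefixvalue")))    # 'value'
```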

Such code makes my "micro-optimizer heart" bleed, since I know that
Python is stupid and calls len() at runtime; the compiler is unable to
optimize it (sadly for good reasons: the name len can be overridden) :-(

=> line.cutprefix("prefix") is more efficient! ;-) It's also shorter.

Victor

On Sun, 22 Mar 2020 at 17:02, Paul Ganssle <paul@ganssle.io> wrote:

I don't see any rationale in the PEP or in the python-ideas thread
(admittedly I didn't read the whole thing, I just Ctrl + F-ed "subclass"
there). Is this just for consistency with other methods like .casefold?

I can understand why you'd want it to be consistent, but I think it's
misguided in this case. It adds unnecessary complexity for subclass
implementers to need to re-implement these two additional methods, and I
can see no obvious reason why this behavior would be necessary, since
these methods can be implemented in terms of string slicing.

Even if you wanted to use `str`-specific optimizations in C that aren't
available if you are constrained to use the subclass's __getitem__, it's
inexpensive to add a "PyUnicode_CheckExact(self)" check to hit a "fast
path" that doesn't use slice.

I think defining this in terms of string slicing makes the most sense
(and, notably, slicing itself returns `str` unless __getitem__ is
explicitly overridden, so the default result is a `str` anyway).
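
A short sketch of the slicing behaviour referred to here. The class names `Plain` and `Keeps` are hypothetical; the point is only that a str subclass gets a plain `str` back from a slice unless it overrides __getitem__.

```python
class Plain(str):
    """Subclass that does not override __getitem__."""


class Keeps(str):
    """Subclass that explicitly keeps its own type when sliced."""

    def __getitem__(self, key):
        return Keeps(super().__getitem__(key))


plain_slice = Plain("prefix-rest")[len("prefix-"):]
kept_slice = Keeps("prefix-rest")[len("prefix-"):]

print(type(plain_slice).__name__)  # str
print(type(kept_slice).__name__)   # Keeps
```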

Either way, it would be nice to see the rationale included in the PEP
somewhere.

Best,
Paul

On 3/22/20 7:16 AM, Eric V. Smith wrote:

On 3/22/2020 1:42 AM, Nick Coghlan wrote:

On Sun, 22 Mar 2020 at 15:13, Cameron Simpson <cs@cskk.id.au> wrote:

On 21Mar2020 12:45, Eric V. Smith <eric@trueblade.com> wrote:

On 3/21/2020 12:39 PM, Victor Stinner wrote:

Well, if CPython is modified to implement tagged pointers and supports
storing short strings (a few latin1 characters) as a pointer, it may
become harder to keep the same behavior for "x is y" where x and y are
strings.

Are you suggesting that it could become impossible to write this
function:

     def myself(o):
         return o

and not be able to rely on "o is myself(o)"? That seems... a pretty
nasty breaking change for the language.

Other way around - because strings are immutable, their identity isn't
supposed to matter, so it's possible that functions that currently
return the exact same object in some cases may in the future start
returning a different object with the same value.

Right now, in CPython, with no tagged pointers, we return the full
existing pointer wherever we can, as that saves us a data copy. With
tagged pointers, the pointer storage effectively *is* the instance, so
you can't really replicate that existing "copy the reference not the
storage" behaviour any more.

That said, it's also possible that identity for tagged pointers would
be value based (similar to the effect of the small integer cache and
string interning), in which case the entire question would become
moot.

Either way, the PEP shouldn't be specifying that a new object *must*
be returned, and it also shouldn't be specifying that the same object
*can't* be returned.

Agreed. I think the PEP should say that a str will be returned (in the
event of a subclass, assuming that's what we decide), but if the
argument is exactly a str, that it may or may not return the original
object.
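
For callers, this suggests checking the result by value or length rather than by identity. A small sketch under that assumption: `cut_prefix` is a hypothetical stand-in for the proposed method, and `prefix_was_removed` is an illustrative helper, not part of any proposal.

```python
def cut_prefix(s, pre):
    # Stand-in for the proposed method; always passes its result through str().
    if pre and s.startswith(pre):
        return str(s[len(pre):])
    return str(s)


def prefix_was_removed(original, result):
    # Portable check: relies on length, never on "result is original".
    return len(result) != len(original)


line = "prefix-payload"
print(prefix_was_removed(line, cut_prefix(line, "prefix-")))  # True
print(prefix_was_removed(line, cut_prefix(line, "nope")))     # False
# Whether cut_prefix(line, "nope") is line depends on the implementation,
# so code should not branch on it.
```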

Eric

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/JHM7T6JZU56PWYRJDG45HMRBXE3CBXMX/
Code of Conduct: http://python.org/psf/codeofconduct/
