[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

March 22, 2020

Sorry, I think I accidentally left out a clause here - I meant that the
rationale for /always returning a 'str'/ (as opposed to returning a
subclass) is missing, it just says in the PEP:
...
The only difference between the real implementation and the above is
that, as with other string methods like replace, the methods will
raise a TypeError if any of self, pre or suf is not an instace of str,
and will cast subclasses of str to builtin str objects.
I think the rationale for these differences is not made entirely clear,
specifically the "and will cast subclasses of str to builtin str
objects" part.

I think it would be best to define the truncation in terms of
__getitem__ - possibly with the caveat that implementations are allowed
(but not required) to return `self` unchanged if no match is found.
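
A minimal sketch of what "defined in terms of __getitem__" could look like, written as free functions rather than methods. The names `remove_prefix`, `remove_suffix` and the `Tagged` subclass are hypothetical and only for illustration; this is not the PEP's reference implementation.

```python
class Tagged(str):
    """Hypothetical str subclass whose slices keep the subclass type."""

    def __getitem__(self, index):
        # Re-wrap whatever str.__getitem__ returns in the subclass.
        return type(self)(super().__getitem__(index))


def remove_prefix(s, prefix):
    """Drop `prefix` from `s` by slicing; return `s` itself on no match."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]  # dispatches to type(s).__getitem__
    return s


def remove_suffix(s, suffix):
    """Drop `suffix` from `s` by slicing; return `s` itself on no match."""
    # The emptiness check matters: s[:-0] would be the empty string.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s
```

With this definition a subclass that overrides slicing gets its own type back without re-implementing either function, and the no-match case hands back the original object, which is the "allowed but not required" behaviour described above.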

Best,
Paul

P.S. Dennis - just noticed in this reply that there is a typo in the PEP
- s/instace/instance

On 3/22/20 12:15 PM, Victor Stinner wrote:
...
tl;dr: A method implemented in C is more efficient than hand-written
pure-Python code, and it's less error-prone.
I don't know if this has already been said, but I hate
having to manually compute the string length when writing:
if line.startswith("prefix"): line = line[6:]
Usually what I do is to open a Python REPL and I type: len("prefix")
and copy-paste the result :-)
Passing the length directly invites mistakes. What if I write
line[7:] and it works most of the time because of a space, but
sometimes the space is omitted and the application fails?
--
The lazy approach is:
if line.startswith("prefix"): line = line[len("prefix"):]
Such code makes my "micro-optimizer heart" bleed, since I know that
Python is stupid and calls len() at runtime; the compiler is unable to
optimize it (sadly for good reasons: the name len can be overridden)  :-(
=> line.cutprefix("prefix") is more efficient! ;-) It's also shorter.
Victor
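
The hand-counted-length hazard described above can be reproduced in a few lines. This is only an illustrative sketch: `cut_prefix` is a hypothetical pure-Python stand-in for the proposed method, and the sample lines are made up.

```python
def cut_prefix(line, prefix):
    """Hypothetical helper: strip `prefix` if present, else return `line`."""
    if line.startswith(prefix):
        return line[len(prefix):]  # length derived from the prefix itself
    return line


with_space = "prefix value"
without_space = "prefixvalue"

# A hard-coded 7 silently assumes "prefix" is always followed by a space.
hand_counted_ok = with_space[7:]      # "value"
hand_counted_bad = without_space[7:]  # "alue": one character too many is cut

# Deriving the length from the prefix removes exactly the prefix.
derived_ok = cut_prefix(without_space, "prefix")  # "value"
```

The wrong slice does not raise; it quietly returns a truncated string, which is why the mistake can go unnoticed for a long time.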
On Sun, Mar 22, 2020 at 17:02, Paul Ganssle <paul@ganssle.io> wrote:
...
I don't see any rationale in the PEP or in the python-ideas thread
(admittedly I didn't read the whole thing, I just Ctrl + F-ed "subclass"
there). Is this just for consistency with other methods like .casefold?
I can understand why you'd want it to be consistent, but I think it's
misguided in this case. It adds unnecessary complexity for subclass
implementers to need to re-implement these two additional methods, and I
can see no obvious reason why this behavior would be necessary, since
these methods can be implemented in terms of string slicing.
Even if you wanted to use `str`-specific optimizations in C that aren't
available if you are constrained to use the subclass's __getitem__, it's
inexpensive to add a "PyUnicode_CheckExact(self)" check to hit a "fast
path" that doesn't use slice.
I think defining this in terms of string slicing makes the most sense
(and, notably, slicing itself returns `str` unless __getitem__ is
explicitly overridden, so the default result is a `str` anyway).
Either way, it would be nice to see the rationale included in the PEP
somewhere.
Best,
Paul
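
A rough Python-level analogue of the fast-path idea, assuming `type(s) is str` as the counterpart of the C-level PyUnicode_CheckExact(self) check. The function name `strip_prefix` and the `Plain` subclass are hypothetical; a real implementation would live in C and take a genuinely different code path for exact strings.

```python
class Plain(str):
    """Hypothetical subclass that does NOT override __getitem__."""


def strip_prefix(s, prefix):
    """Remove `prefix` via slicing, with a fast path for exact str."""
    if not prefix or not s.startswith(prefix):
        return s
    tail = slice(len(prefix), None)
    if type(s) is str:
        # Fast path: exact str, so bypass any subclass dispatch.
        return str.__getitem__(s, tail)
    # Slow path: honour whatever the subclass's __getitem__ does.
    return s[tail]


exact_result = strip_prefix("abcdef", "abc")
subclass_result = strip_prefix(Plain("abcdef"), "abc")
```

Because `Plain` inherits slicing from `str`, `subclass_result` is a plain `str`, which matches the observation that slicing returns `str` unless a subclass overrides it.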
On 3/22/20 7:16 AM, Eric V. Smith wrote:
...
On 3/22/2020 1:42 AM, Nick Coghlan wrote:
...
On Sun, 22 Mar 2020 at 15:13, Cameron Simpson <cs@cskk.id.au> wrote:
...
On 21Mar2020 12:45, Eric V. Smith <eric@trueblade.com> wrote:
...
On 3/21/2020 12:39 PM, Victor Stinner wrote:
> Well, if CPython is modified to implement tagged pointers and supports
> storing short strings (a few latin1 characters) as a pointer, it may
> become harder to keep the same behavior for "x is y" where x and y are
> strings.
Are you suggesting that it could become impossible to write this
function:
def myself(o):
    return o
and not be able to rely on "o is myself(o)"? That seems... a pretty
nasty breaking change for the language.
Other way around - because strings are immutable, their identity isn't
supposed to matter, so it's possible that functions that currently
return the exact same object in some cases may in the future start
returning a different object with the same value.
Right now, in CPython, with no tagged pointers, we return the full
existing pointer wherever we can, as that saves us a data copy. With
tagged pointers, the pointer storage effectively *is* the instance, so
you can't really replicate that existing "copy the reference not the
storage" behaviour any more.
That said, it's also possible that identity for tagged pointers would
be value based (similar to the effect of the small integer cache and
string interning), in which case the entire question would become
moot.
Either way, the PEP shouldn't be specifying that a new object *must*
be returned, and it also shouldn't be specifying that the same object
*can't* be returned.
Agreed. I think the PEP should say that a str will be returned (in the
event of a subclass, assuming that's what we decide), but if the
argument is exactly a str, that it may or may not return the original
object.
Eric
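
The distinction drawn in this sub-thread can be sketched as follows, using the existing `str.replace` as a stand-in for the proposed methods (the PEP text quoted above compares them to `replace`). The `Name` subclass is hypothetical. The sketch only asserts what callers can rely on: the value and the type, not the identity of the returned string.

```python
def myself(o):
    return o


class Name(str):
    """Hypothetical str subclass used only for this illustration."""


s = "hello"

# Passing an object through a function never copies it.
same_object = myself(s) is s  # True

# A string method that finds nothing to change returns an equal value.
unchanged = s.replace("z", "y")
# Whether `unchanged is s` holds is an implementation detail;
# portable code should compare with == instead.

# For a subclass instance, the built-in method hands back a plain str.
from_subclass = Name("hello").replace("z", "y")
```

Code that compares the result with `==` keeps working whether or not a future implementation returns the original object.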
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/JHM7T6JZ...
Code of Conduct: http://python.org/psf/codeofconduct/

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/RTQWEE4K...