[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

March 20, 2020

Hi Dennis,

Thanks for writing a proper PEP. It is easier to review a specification
than an implementation.

On Fri, Mar 20, 2020 at 20:00, Dennis Sweeney
<sweeney.dennis650@gmail.com> wrote:
...
Abstract
========
This is a proposal to add two new methods, ``cutprefix`` and
``cutsuffix``, to the APIs of Python's various string objects.
It would be nice to describe the behavior of these methods in a short
sentence here.
...
In
particular, the methods would be added to Unicode ``str`` objects,
binary ``bytes`` and ``bytearray`` objects, and
``collections.UserString``.
IMHO the abstract should stop here. You should move the above text into
the Specification section. The abstract shouldn't go into details.
...
If ``s`` is one of these objects, and ``s`` has ``pre`` as a prefix, then
``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has
been removed.  If ``s`` does not have ``pre`` as a prefix, an
unchanged copy of ``s`` is returned.  In summary, ``s.cutprefix(pre)``
is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.
The behavior of ``cutsuffix`` is analogous: ``s.cutsuffix(suf)`` is
roughly equivalent to
``s[:-len(suf)] if suf and s.endswith(suf) else s``.
(...)
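
The two "roughly equivalent" expressions quoted above can be tried out as plain functions. This is only a sketch: the standalone function names, including cutsuffix_naive, are made up for illustration and are not part of the PEP. It shows why the suffix version needs the extra "suf and" test: with an empty suffix, s[:-0] is the same slice as s[:0], which is the empty string.

```python
# Illustrative stand-ins for the proposed methods (assumed names, not PEP code).
def cutprefix(s: str, pre: str) -> str:
    return s[len(pre):] if s.startswith(pre) else s


def cutsuffix_naive(s: str, suf: str) -> str:
    # No "suf and" guard: for suf == "", s[:-0] is s[:0], the empty string.
    return s[:-len(suf)] if s.endswith(suf) else s


def cutsuffix(s: str, suf: str) -> str:
    # The guard keeps the string intact when the suffix is empty.
    return s[:-len(suf)] if suf and s.endswith(suf) else s


print(cutprefix("foobar", "foo"))           # bar
print(repr(cutsuffix_naive("foobar", "")))  # ''
print(cutsuffix("foobar", ""))              # foobar
```

The prefix version needs no such guard because s[len(""):] is s[0:], which already covers the whole string.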
...
The builtin ``str`` class will gain two new methods with roughly the
following behavior::
    def cutprefix(self: str, pre: str, /) -> str:
        if self.startswith(pre):
            return self[len(pre):]
        return self[:]
I'm not sure that I'm comfortable with leaving it unspecified whether the
method must return the string unmodified or return a copy when it doesn't
start with the prefix. It can cause subtle bugs: see the "Allow multiple
prefixes" example, which expects that it doesn't return a copy.
Usually, PyPy does its best to mimic CPython's behavior exactly anyway,
since applications rely on CPython's exact behavior (even if that's a bad
thing). Fortunately, Python 3.8 started to emit a SyntaxWarning when the
"is" operator is used to compare an object to a string literal (like:
x is "abc").

I suggest always requiring the method to return the unmodified string
in that case. Honestly, it's not hard to guarantee and implement this
behavior in Python!

IMHO you should also test if pre is non-empty just to make the intent
more explicit.

Note: please rename "pre" to "prefix".

In short, I propose:

     def cutprefix(self: str, prefix: str, /) -> str:
         if self.startswith(prefix) and prefix:
             return self[len(prefix):]
         else:
             return self
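
As a rough check of the identity guarantee discussed here, the proposal can be written as a standalone function (an assumption made for illustration; the real proposal is a str method). When nothing is removed, the caller gets back the very same object, so no behavior depends on an implementation detail of slicing.

```python
def cutprefix(s: str, prefix: str) -> str:
    # Standalone sketch of the proposed method, for illustration only.
    if s.startswith(prefix) and prefix:
        return s[len(prefix):]
    else:
        return s


s = "spam and eggs"
print(cutprefix(s, "ham") is s)   # True: no match, same object returned
print(cutprefix(s, "") is s)      # True: empty prefix, same object returned
print(cutprefix(s, "spam "))      # and eggs
```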

I call startswith() before testing whether the prefix is non-empty so
that the method inherits startswith()'s input type validation. For
example, "a".startswith(b'x') raises a TypeError.
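
The effect of the test order can be seen with two hypothetical standalone variants (both function names are invented for this sketch). Only the variant that calls startswith() first rejects an empty argument of the wrong type.

```python
def cutprefix_validate_first(s, prefix):
    # startswith() runs first, so a wrong-typed prefix always raises TypeError.
    if s.startswith(prefix) and prefix:
        return s[len(prefix):]
    return s


def cutprefix_empty_check_first(s, prefix):
    # The emptiness test short-circuits, so b"" slips through unchecked.
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s


def raises_type_error(func, *args):
    # Small helper: report whether the call raises TypeError.
    try:
        func(*args)
    except TypeError:
        return True
    return False


print(raises_type_error(cutprefix_validate_first, "abc", b""))     # True
print(raises_type_error(cutprefix_empty_check_first, "abc", b""))  # False
```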

I also suggest removing the duplicated "rough specification" from
the abstract: "s[len(pre):] if s.startswith(pre) else s". One
specification per PEP is enough ;-)
...
The two methods will also be added to ``collections.UserString``,
where they rely on the implementation of the new ``str`` methods.
I don't think that mentioning "where they rely on the implementation
of the new ``str`` methods" is worth it. The spec can leave this part
to the implementation.
...
Motivating examples from the Python standard library
====================================================
The examples below demonstrate how the proposed methods can make code
one or more of the following: (...)
IMO there are too many examples. For example, refactor.py and
c_annotations.py are more or less the same. Just keep refactor.py.

Overall, 2 or 3 examples should be enough.
...
Allow multiple prefixes
-----------------------
Some users discussed the desire to be able to remove multiple
prefixes, calling, for example, ``s.cutprefix('From: ', 'CC: ')``.
However, this adds ambiguity about the order in which the prefixes are
removed, especially in cases like ``s.cutprefix('Foo', 'FooBar')``.
After this proposal, this can be spelled explicitly as
``s.cutprefix('Foo').cutprefix('FooBar')``.
I like the ability to specify multiple prefixes or suffixes. If the
order is an issue, only allow tuple and list types and you're done.

I don't see how disallowing s.cutprefix(('Foo', 'FooBar')) but
allowing s.cutprefix('Foo').cutprefix('FooBar') prevents any risk of
mistakes.

I'm sure that there are many use cases for cutsuffix() accepting
multiple suffixes. IMO it makes the method even more attractive and
efficient.

Example: removing a newline suffix (DOS, Unix and old Mac OS newlines):
line.cutsuffix(("\r\n", "\n", "\r")). It's not ambiguous: "\r\n" is
tested first explicitly, then "\n", then "\r".
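
One possible reading of the tuple form, sketched as a standalone function: candidates are tried in the order given and only the first match is removed. This "first match wins" rule is an assumption of the sketch, not something the PEP specifies.

```python
from typing import Tuple, Union


def cutsuffix(s: str, suffix: Union[str, Tuple[str, ...]]) -> str:
    # Hypothetical semantics: try each candidate in order, cut the first match.
    suffixes = (suffix,) if isinstance(suffix, str) else suffix
    for candidate in suffixes:
        if candidate and s.endswith(candidate):
            return s[:-len(candidate)]
    return s


print(repr(cutsuffix("line\r\n", ("\r\n", "\n", "\r"))))  # 'line'
print(repr(cutsuffix("line\r\n", ("\n", "\r\n"))))        # 'line\r'
```

The second call shows that the order in the tuple decides the result, which is why an ordered type such as a tuple removes the ambiguity.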
...
Remove multiple copies of a prefix
----------------------------------
This is the behavior that would be consistent with the aforementioned
expansion of the ``lstrip/rstrip`` API -- repeatedly applying the
function until the argument is unchanged.  This behavior is attainable
from the proposed behavior via the following::
    >>> s = 'foo' * 100 + 'bar'
    >>> while s != (s := s.cutprefix("foo")): pass
    >>> s
    'bar'
Well, even if it's less efficient, I think that I would prefer to write:

while s.endswith("\n"): s = s.cutsuffix("\n")

... especially because the specification doesn't (currently) require
the method to return the string unmodified if it doesn't end with the
suffix...
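
The explicit loop can be wrapped in a small helper. This is a sketch only: cut_all_suffixes is a made-up name and cutsuffix is a standalone stand-in for the proposed method. The empty-suffix guard is an addition of the sketch, needed because s.endswith("") is always true and the loop would otherwise never end.

```python
def cutsuffix(s: str, suffix: str) -> str:
    # Stand-in for the proposed method.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


def cut_all_suffixes(s: str, suffix: str) -> str:
    if not suffix:
        return s  # endswith("") is always true; avoid an infinite loop
    while s.endswith(suffix):
        s = cutsuffix(s, suffix)
    return s


print(repr(cut_all_suffixes("text\n\n\n", "\n")))  # 'text'
```

Because the loop condition uses endswith() rather than comparing the result with the input, it does not depend on whether the method returns the same object or a copy.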
...
Raising an exception when not found
-----------------------------------
There was a suggestion that ``s.cutprefix(pre)`` should raise an
exception if ``not s.startswith(pre)``.  However, this does not match
with the behavior and feel of other string methods.  There could be
``required=False`` keyword added, but this violates the KISS
principle.
You may add that it makes the cutprefix() and cutsuffix() methods
consistent with the strip() family of methods. "abc".strip() doesn't
raise.

startswith() and endswith() methods can be used to explicitly raise an
exception if there is no match.
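
For instance, a caller who wants the strict behavior could write something like the following. The helper name cutprefix_or_raise and the choice of ValueError are assumptions of this sketch, and cutprefix is again a standalone stand-in for the proposed method.

```python
def cutprefix(s: str, prefix: str) -> str:
    # Stand-in for the proposed method.
    if s.startswith(prefix) and prefix:
        return s[len(prefix):]
    return s


def cutprefix_or_raise(s: str, prefix: str) -> str:
    # Strict variant built on top of startswith(): no match is an error.
    if not s.startswith(prefix):
        raise ValueError(f"{s!r} does not start with {prefix!r}")
    return cutprefix(s, prefix)


print(cutprefix_or_raise("From: alice", "From: "))  # alice
```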

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.

Victor Stinner