
Hi Dennis,

Thanks for writing a proper PEP. It is easier to review a specification than an implementation.

On Fri, 20 Mar 2020 at 20:00, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
Abstract
========
This is a proposal to add two new methods, ``cutprefix`` and ``cutsuffix``, to the APIs of Python's various string objects.
It would be nice to describe the behavior of these methods in a short sentence here.
In particular, the methods would be added to Unicode ``str`` objects, binary ``bytes`` and ``bytearray`` objects, and ``collections.UserString``.
IMHO the abstract should stop here. You should move the text below to the Specification section. The abstract shouldn't go into details.
If ``s`` is one of these objects, and ``s`` has ``pre`` as a prefix, then ``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has been removed. If ``s`` does not have ``pre`` as a prefix, an unchanged copy of ``s`` is returned. In summary, ``s.cutprefix(pre)`` is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.
The behavior of ``cutsuffix`` is analogous: ``s.cutsuffix(suf)`` is roughly equivalent to ``s[:-len(suf)] if suf and s.endswith(suf) else s``.
(...)
The builtin ``str`` class will gain two new methods with roughly the following behavior::

    def cutprefix(self: str, pre: str, /) -> str:
        if self.startswith(pre):
            return self[len(pre):]
        return self[:]

I'm not comfortable with leaving unspecified whether the method must return the string unmodified or return a copy when it doesn't start with the prefix. The difference can cause subtle bugs: see the "Allow multiple prefixes" example, which expects that it doesn't return a copy. In practice, PyPy does its best to mimic CPython's exact behavior anyway, since applications rely on it (even if that's a bad thing). Fortunately, Python 3.8 started to emit a SyntaxWarning when the "is" operator is used to compare an object to a string (like: x is "abc").

I suggest always requiring that the unmodified string be returned. Honestly, it's not hard to guarantee and implement this behavior in Python!

IMHO you should also test that pre is non-empty, just to make the intent more explicit.

Note: please rename "pre" to "prefix".

In short, I propose:

    def cutprefix(self: str, prefix: str, /) -> str:
        if self.startswith(prefix) and prefix:
            return self[len(prefix):]
        else:
            return self

I call startswith() before testing whether prefix is non-empty so that the method inherits startswith()'s input type validation. For example, "a".startswith(b'x') raises a TypeError.

I also suggest removing the duplicated "rough specification" from the abstract: "s[len(pre):] if s.startswith(pre) else s". One specification per PEP is enough ;-)
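To make the proposed semantics concrete, here is a minimal sketch written as free functions, since the methods cannot be added to str from Python code. The function names and the cutsuffix() counterpart are only illustrative assumptions that mirror the cutprefix() proposal above; this is not the PEP's reference implementation.

```python
def cutprefix(s: str, prefix: str, /) -> str:
    # startswith() runs first, so a wrong argument type raises TypeError
    # even when the prefix is empty.
    if s.startswith(prefix) and prefix:
        return s[len(prefix):]
    # No match (or empty prefix): return the very same object, not a copy.
    return s


def cutsuffix(s: str, suffix: str, /) -> str:
    # The non-empty test matters here: s[:-0] would be the empty string.
    if s.endswith(suffix) and suffix:
        return s[:-len(suffix)]
    return s


name = "test_refactor"
print(cutprefix(name, "test_"))         # refactor
print(cutprefix(name, "spam") is name)  # True: the input is returned as is
print(cutsuffix(name, "") is name)      # True
```

With this guarantee, code that checks "did anything change?" can rely on either equality or identity, whichever it already happens to use.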
The two methods will also be added to ``collections.UserString``, where they rely on the implementation of the new ``str`` methods.
I don't think that mentioning "where they rely on the implementation of the new ``str`` methods" is worth it. The spec can leave this part to the implementation.
Motivating examples from the Python standard library
====================================================
The examples below demonstrate how the proposed methods can make code one or more of the following: (...)
IMO there are too many examples. For example, refactor.py and c_annotations.py are more or less the same. Just keep refactor.py. Overall, 2 or 3 examples should be enough.
Allow multiple prefixes
-----------------------
Some users discussed the desire to be able to remove multiple prefixes, calling, for example, ``s.cutprefix('From: ', 'CC: ')``. However, this adds ambiguity about the order in which the prefixes are removed, especially in cases like ``s.cutprefix('Foo', 'FooBar')``. After this proposal, this can be spelled explicitly as ``s.cutprefix('Foo').cutprefix('FooBar')``.
I like the ability to specify multiple prefixes or suffixes. If the order is an issue, only allow tuple and list types and you're done. I don't see how disallowing s.cutprefix(('Foo', 'FooBar')) but allowing s.cutprefix('Foo').cutprefix('FooBar') prevents any risk of mistake.

I'm sure that there are many use cases for cutsuffix() accepting multiple suffixes. IMO it makes the method even more attractive and efficient. Example to remove a newline suffix (DOS, Unix and macOS newlines): line.cutsuffix(("\r\n", "\n", "\r")). It's not ambiguous: "\r\n" is tested first explicitly, then "\n", then "\r".
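A rough sketch of what I have in mind, again as a free function with an illustrative name. It assumes that the candidates are tried in the order given and that only the first match is removed; neither point is settled by the PEP.

```python
from typing import Sequence, Union


def cutsuffix(s: str, suffixes: Union[str, Sequence[str]], /) -> str:
    # Accept a single string as well as an ordered sequence of candidates.
    if isinstance(suffixes, str):
        suffixes = (suffixes,)
    for suffix in suffixes:
        # First match wins: the caller's ordering decides the outcome.
        if s.endswith(suffix) and suffix:
            return s[:-len(suffix)]
    return s


newlines = ("\r\n", "\n", "\r")
print(repr(cutsuffix("spam\r\n", newlines)))        # 'spam'
print(repr(cutsuffix("spam\n", newlines)))          # 'spam'
# Listing "\n" first leaves the "\r" behind, so the order is explicit:
print(repr(cutsuffix("spam\r\n", ("\n", "\r\n"))))  # 'spam\r'
```

Because the sequence is ordered, the caller states the priority explicitly, which is the same thing that chaining calls would express.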
Remove multiple copies of a prefix
----------------------------------
This is the behavior that would be consistent with the aforementioned expansion of the ``lstrip/rstrip`` API -- repeatedly applying the function until the argument is unchanged. This behavior is attainable from the proposed behavior via the following::

    >>> s = 'foo' * 100 + 'bar'
    >>> while s != (s := s.cutprefix("foo")): pass
    >>> s
    'bar'

Well, even if it's less efficient, I think that I would prefer to write:

    while s.endswith("\n"):
        s = s.cutsuffix("\n")

... especially because the specification doesn't (currently) require returning the string unmodified if it doesn't end with the suffix...
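For illustration, the loop above can be wrapped in a small helper. Both function names are hypothetical, and cutsuffix() is a stand-in for the proposed method. The guard against an empty suffix is my own addition: every string ends with "", so the loop would never terminate without it.

```python
def cutsuffix(s: str, suffix: str, /) -> str:
    # Stand-in for the proposed str.cutsuffix() method.
    if s.endswith(suffix) and suffix:
        return s[:-len(suffix)]
    return s


def cut_all_suffixes(s: str, suffix: str, /) -> str:
    # Every string ends with "", so an empty suffix would loop forever.
    if not suffix:
        return s
    while s.endswith(suffix):
        s = cutsuffix(s, suffix)
    return s


print(repr(cut_all_suffixes("spam\n\n\n", "\n")))  # 'spam'
print(repr(cut_all_suffixes("spam", "\n")))        # 'spam'
```

The loop condition does not depend on whether cutsuffix() returns the same object or a copy, so it works under either reading of the specification.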
Raising an exception when not found
-----------------------------------
There was a suggestion that ``s.cutprefix(pre)`` should raise an exception if ``not s.startswith(pre)``. However, this does not match with the behavior and feel of other string methods. There could be a ``required=False`` keyword added, but this violates the KISS principle.
You may add that it makes the cutprefix() and cutsuffix() methods consistent with the strip() family of methods: "abc".strip() doesn't raise. The startswith() and endswith() methods can be used to explicitly raise an exception if there is no match.

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.