PEP 616 -- String methods to remove prefixes and suffixes
Browser Link: https://www.python.org/dev/peps/pep-0616/

PEP: 616
Title: String methods to remove prefixes and suffixes
Author: Dennis Sweeney <sweeney.dennis650@gmail.com>
Sponsor: Eric V. Smith <eric@trueblade.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 19-Mar-2020
Python-Version: 3.9
Post-History: 30-Aug-2002


Abstract
========

This is a proposal to add two new methods, ``cutprefix`` and
``cutsuffix``, to the APIs of Python's various string objects. In
particular, the methods would be added to Unicode ``str`` objects,
binary ``bytes`` and ``bytearray`` objects, and
``collections.UserString``.

If ``s`` is one of these objects, and ``s`` has ``pre`` as a prefix,
then ``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix
has been removed. If ``s`` does not have ``pre`` as a prefix, an
unchanged copy of ``s`` is returned. In summary, ``s.cutprefix(pre)``
is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.

The behavior of ``cutsuffix`` is analogous: ``s.cutsuffix(suf)`` is
roughly equivalent to
``s[:-len(suf)] if suf and s.endswith(suf) else s``.


Rationale
=========

There have been repeated issues [#confusion]_ on the Bug Tracker and
StackOverflow related to user confusion about the existing
``str.lstrip`` and ``str.rstrip`` methods. These users are typically
expecting the behavior of ``cutprefix`` and ``cutsuffix``, but they
are surprised that the parameter for ``lstrip`` is interpreted as a
set of characters, not a substring. This repeated issue is evidence
that these methods are useful, and the new methods allow a cleaner
redirection of users to the desired behavior.

As another testimonial for the usefulness of these methods, several
users on Python-Ideas [#pyid]_ reported frequently including similar
functions in their own code for productivity. The implementation
often contained subtle mistakes regarding the handling of the empty
string (see `Specification`_).


Specification
=============

The builtin ``str`` class will gain two new methods with roughly the
following behavior::

    def cutprefix(self: str, pre: str, /) -> str:
        if self.startswith(pre):
            return self[len(pre):]
        return self[:]

    def cutsuffix(self: str, suf: str, /) -> str:
        if suf and self.endswith(suf):
            return self[:-len(suf)]
        return self[:]

The only difference between the real implementation and the above is
that, as with other string methods like ``replace``, the methods will
raise a ``TypeError`` if any of ``self``, ``pre`` or ``suf`` is not an
instace of ``str``, and will cast subclasses of ``str`` to builtin
``str`` objects.

Note that without the check for the truthiness of ``suf``,
``s.cutsuffix('')`` would be mishandled and always return the empty
string due to the unintended evaluation of ``self[:-0]``.

Methods with the corresponding semantics will be added to the builtin
``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes``
or ``bytearray`` object, then ``b.cutsuffix()`` and ``b.cutprefix()``
will accept any bytes-like object as an argument.

Note that the ``bytearray`` methods return a copy of ``self``; they do
not operate in place.

The following behavior is considered a CPython implementation detail,
but is not guaranteed by this specification::

    >>> x = 'foobar' * 10**6
    >>> x.cutprefix('baz') is x is x.cutsuffix('baz')
    True
    >>> x.cutprefix('') is x is x.cutsuffix('')
    True

That is, for CPython's immutable ``str`` and ``bytes`` objects, the
methods return the original object when the affix is not found or if
the affix is empty.
Because these types test for equality using shortcuts for identity
and length, the following equivalent expressions are evaluated at
approximately the same speed, for any ``str`` objects (or ``bytes``
objects) ``x`` and ``y``::

    >>> (True, x[len(y):]) if x.startswith(y) else (False, x)
    >>> (True, z) if x != (z := x.cutprefix(y)) else (False, x)

The two methods will also be added to ``collections.UserString``,
where they rely on the implementation of the new ``str`` methods.


Motivating examples from the Python standard library
====================================================

The examples below demonstrate how the proposed methods can make code
one or more of the following:

Less fragile:
    The code will not depend on the user to count the length of a
    literal.
More performant:
    The code does not require a call to the Python built-in ``len``
    function.
More descriptive:
    The methods give a higher-level API for code readability, as
    opposed to the traditional method of string slicing.

refactor.py
-----------

- Current::

    if fix_name.startswith(self.FILE_PREFIX):
        fix_name = fix_name[len(self.FILE_PREFIX):]

- Improved::

    fix_name = fix_name.cutprefix(self.FILE_PREFIX)

c_annotations.py:
-----------------

- Current::

    if name.startswith("c."):
        name = name[2:]

- Improved::

    name = name.cutprefix("c.")

find_recursionlimit.py
----------------------

- Current::

    if test_func_name.startswith("test_"):
        print(test_func_name[5:])
    else:
        print(test_func_name)

- Improved::

    print(test_func_name.cutprefix("test_"))

deccheck.py
-----------

This is an interesting case because the author chose to use the
``str.replace`` method in a situation where only a prefix was
intended to be removed.

- Current::

    if funcname.startswith("context."):
        self.funcname = funcname.replace("context.", "")
        self.contextfunc = True
    else:
        self.funcname = funcname
        self.contextfunc = False

- Improved::

    if funcname.startswith("context."):
        self.funcname = funcname.cutprefix("context.")
        self.contextfunc = True
    else:
        self.funcname = funcname
        self.contextfunc = False

- Arguably further improved::

    self.contextfunc = funcname.startswith("context.")
    self.funcname = funcname.cutprefix("context.")

test_i18n.py
------------

- Current::

    if test_func_name.startswith("test_"):
        print(test_func_name[5:])
    else:
        print(test_func_name)

- Improved::

    print(test_func_name.cutprefix("test_"))

- Current::

    if creationDate.endswith('\\n'):
        creationDate = creationDate[:-len('\\n')]

- Improved::

    creationDate = creationDate.cutsuffix('\\n')

shared_memory.py
----------------

- Current::

    reported_name = self._name
    if _USE_POSIX and self._prepend_leading_slash:
        if self._name.startswith("/"):
            reported_name = self._name[1:]
    return reported_name

- Improved::

    if _USE_POSIX and self._prepend_leading_slash:
        return self._name.cutprefix("/")
    return self._name

build-installer.py
------------------

- Current::

    if archiveName.endswith('.tar.gz'):
        retval = os.path.basename(archiveName[:-7])
        if ((retval.startswith('tcl') or retval.startswith('tk'))
                and retval.endswith('-src')):
            retval = retval[:-4]

- Improved::

    if archiveName.endswith('.tar.gz'):
        retval = os.path.basename(archiveName[:-7])
        if retval.startswith(('tcl', 'tk')):
            retval = retval.cutsuffix('-src')

Depending on personal style, ``archiveName[:-7]`` could also be
changed to ``archiveName.cutsuffix('.tar.gz')``.

test_core.py
------------

- Current::

    if output.endswith("\n"):
        output = output[:-1]

- Improved::

    output = output.cutsuffix("\n")

cookiejar.py
------------

- Current::

    def strip_quotes(text):
        if text.startswith('"'):
            text = text[1:]
        if text.endswith('"'):
            text = text[:-1]
        return text

- Improved::

    def strip_quotes(text):
        return text.cutprefix('"').cutsuffix('"')

- Current::

    if line.endswith("\n"):
        line = line[:-1]

- Improved::

    line = line.cutsuffix("\n")

fixdiv.py
---------

- Current::

    def chop(line):
        if line.endswith("\n"):
            return line[:-1]
        else:
            return line

- Improved::

    def chop(line):
        return line.cutsuffix("\n")

test_concurrent_futures.py
--------------------------

In the following example, the meaning of the code changes slightly,
but in context, it behaves the same.

- Current::

    if name.endswith(('Mixin', 'Tests')):
        return name[:-5]
    elif name.endswith('Test'):
        return name[:-4]
    else:
        return name

- Improved::

    return name.cutsuffix('Mixin').cutsuffix('Tests').cutsuffix('Test')

msvc9compiler.py
----------------

- Current::

    if value.endswith(os.pathsep):
        value = value[:-1]

- Improved::

    value = value.cutsuffix(os.pathsep)

test_pathlib.py
---------------

- Current::

    self.assertTrue(r.startswith(clsname + '('), r)
    self.assertTrue(r.endswith(')'), r)
    inner = r[len(clsname) + 1 : -1]

- Improved::

    self.assertTrue(r.startswith(clsname + '('), r)
    self.assertTrue(r.endswith(')'), r)
    inner = r.cutprefix(clsname + '(').cutsuffix(')')


Rejected Ideas
==============

Expand the lstrip and rstrip APIs
---------------------------------

Because ``lstrip`` takes a string as its argument, it could be viewed
as taking an iterable of length-1 strings. The API could therefore be
generalized to accept any iterable of strings, which would be
successively removed as prefixes. While this behavior would be
consistent, it would not be obvious for users to have to call
``'foobar'.cutprefix(('foo',))`` for the common use case of a single
prefix.

Allow multiple prefixes
-----------------------

Some users discussed the desire to be able to remove multiple
prefixes, calling, for example, ``s.cutprefix('From: ', 'CC: ')``.
However, this adds ambiguity about the order in which the prefixes
are removed, especially in cases like ``s.cutprefix('Foo', 'FooBar')``.
After this proposal, this can be spelled explicitly as
``s.cutprefix('Foo').cutprefix('FooBar')``.

Remove multiple copies of a prefix
----------------------------------

This is the behavior that would be consistent with the aforementioned
expansion of the ``lstrip/rstrip`` API -- repeatedly applying the
function until the argument is unchanged. This behavior is attainable
from the proposed behavior via the following::

    >>> s = 'foo' * 100 + 'bar'
    >>> while s != (s := s.cutprefix("foo")): pass
    >>> s
    'bar'

The above can be modified by chaining multiple ``cutprefix`` calls
together to achieve the full behavior of the ``lstrip``/``rstrip``
generalization, while being explicit in the order of removal.

While the proposed API could later be extended to include some of
these use cases, to do so before any observation of how these methods
are used in practice would be premature and may lead to choosing the
wrong behavior.

Raising an exception when not found
-----------------------------------

There was a suggestion that ``s.cutprefix(pre)`` should raise an
exception if ``not s.startswith(pre)``. However, this does not match
with the behavior and feel of other string methods. There could be
``required=False`` keyword added, but this violates the KISS
principle.

Alternative Method Names
------------------------

Several alternative method names have been proposed. Some are listed
below, along with commentary for why they should be rejected in favor
of ``cutprefix`` (the same arguments hold for ``cutsuffix``).

``ltrim``
    "Trim" does in other languages (e.g. JavaScript, Java, Go, PHP)
    what ``strip`` methods do in Python.

``lstrip(string=...)``
    This would avoid adding a new method, but for different behavior,
    it's better to have two different methods than one method with a
    keyword argument that selects the behavior.

``cut_prefix``
    All of the other methods of the string API, e.g.
    ``str.startswith()``, use ``lowercase`` rather than
    ``lower_case_with_underscores``.

``cutleft``, ``leftcut``, or ``lcut``
    The explicitness of "prefix" is preferred.

``removeprefix``, ``deleteprefix``, ``withoutprefix``, etc.
    All of these might have been acceptable, but they have more
    characters than ``cut``. Some suggested that the verb "cut"
    implies mutability, but the string API already contains verbs
    like "replace", "strip", "split", and "swapcase".

``stripprefix``
    Users may benefit from the mnemonic that "strip" means working
    with sets of characters, while other methods work with
    substrings, so re-using "strip" here should be avoided.


Reference Implementation
========================

See the pull request on GitHub [#pr]_.


References
==========

.. [#pr] GitHub pull request with implementation
   (https://github.com/python/cpython/pull/18939)
.. [#pyid] Discussion on Python-Ideas
   (https://mail.python.org/archives/list/python-ideas@python.org/thread/RJARZSU...)
.. [#confusion] Comment listing Bug Tracker and StackOverflow issues
   (https://mail.python.org/archives/list/python-ideas@python.org/message/GRGAFI...)


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
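
As a minimal sketch of the semantics described in the draft's Specification section (not the reference implementation from the pull request), the two methods can be approximated with plain functions. Writing them as standalone functions, and returning `s` itself rather than `s[:]` when nothing is removed, are assumptions made here so that the example runs on any Python 3 version.

```python
def cutprefix(s: str, pre: str) -> str:
    """Return s without the leading pre, or s unchanged if pre is absent."""
    if s.startswith(pre):
        return s[len(pre):]
    return s


def cutsuffix(s: str, suf: str) -> str:
    """Return s without the trailing suf, or s unchanged if suf is absent."""
    # The `suf and` guard matters: -0 == 0, so s[:-0] is s[:0], the empty string.
    if suf and s.endswith(suf):
        return s[:-len(suf)]
    return s


print(cutprefix("test_spam", "test_"))          # spam
print(cutsuffix("archive.tar.gz", ".tar.gz"))   # archive
print(cutsuffix("spam", ""))                    # spam
print("spam"[:-0] == "")                        # True
```

The last line shows the empty-suffix pitfall that the draft's truthiness check is there to avoid.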
On 03/20/2020 11:52 AM, Dennis Sweeney wrote:
Thank you, Dennis, for putting this together! And Eric for sponsoring. :) Overall I think it's a good idea, but...
Um, what mnemonic?

I am strongly opposed to the chosen names of `cut*` -- these methods do basically the same thing as the existing `strip` methods (remove something from either end of a string), and so should have similar names:

- the existence of `stripsuffix` is a clue/reminder that `strip` doesn't work with substrings
- if all of these similar methods have similar names they will be grouped together in the documentation, making discovery of the correct one much easier.

So for this iteration of the PEP, I am -1.

--
~Ethan~
Thanks for the feedback!

I meant mnemonic in the broader sense of "way of remembering things", not some kind of rhyming device or acronym. Maybe "mnemonic" isn't the perfect word. I was just trying to say that the structure of how the methods are named should show how their behavior relates to one another, which it seems you agree with.

Fair enough that ``[l/r]strip`` and the proposed methods share the behavior of "removing something from the end of a string". From that perspective, they're similar. But my thought was that ``s.lstrip("abc")`` has extremely similar behavior when changing "lstrip" to "rstrip" or "strip" -- the argument is interpreted in exactly the same way (as a character set) in each case. Looking at how the argument is used, I'd argue that ``lstrip``/``rstrip``/``strip`` are much more similar to each other than they are to the proposed methods, and that the proposed methods are perhaps more similar to something like ``str.replace``. But it does seem pretty subjective what the threshold is for behavior similar enough to have related names -- I see where you're coming from.

Also, the docs at ( https://docs.python.org/3/library/stdtypes.html?highlight=lstrip#string-meth... ) are alphabetical, not grouped by "similar names", so even ``lstrip``, ``strip``, and ``rstrip`` are already in different places. Maybe the name "stripprefix" would be more discoverable when "Ctrl-f"ing the docs, if it weren't for the following addition in the linked PR:

    .. method:: str.lstrip([chars])

       Return a copy of the string with leading characters removed. The *chars*
       argument is a string specifying the set of characters to be removed. If omitted
       or ``None``, the *chars* argument defaults to removing whitespace. The *chars*
       argument is not a prefix; rather, all combinations of its values are stripped::

          >>> '   spacious   '.lstrip()
          'spacious   '
          >>> 'www.example.com'.lstrip('cmowz.')
          'example.com'

    +  See :meth:`str.cutprefix` for a method that will remove a single prefix
    +  string rather than all of a set of characters.
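
To make the character-set versus prefix distinction concrete, here is a small sketch. The standalone `cutprefix` helper is a hypothetical stand-in for the proposed method, and the host name is an arbitrary example.

```python
def cutprefix(s: str, pre: str) -> str:
    # Hypothetical stand-in for the proposed str.cutprefix method.
    return s[len(pre):] if s.startswith(pre) else s


host = "www.wikipedia.org"

# lstrip treats "w." as the set {'w', '.'} and keeps stripping while
# the leading character is in that set, so it also eats the 'w' of "wikipedia".
as_charset = host.lstrip("w.")

# Prefix removal only drops the exact leading substring "www.".
as_prefix = cutprefix(host, "www.")

print(as_charset)  # ikipedia.org
print(as_prefix)   # wikipedia.org
```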
On Fri, 20 Mar 2020 20:49:12 -0000 "Dennis Sweeney" <sweeney.dennis650@gmail.com> wrote:
Correct, but I don't like the word "cut" because it suggests that something is cut into pieces which can be used later separately. I'd propose to use "trim" instead of "cut" because it makes clear that something is cut off and discarded, and it is clearly different from "strip".
On 21Mar2020 14:17, musbur@posteo.org <musbur@posteo.org> wrote:
Please, NO. "trim" is a VERY well known PHP function, and does what our strip does. I'm very much against this (otherwise fine) word for this reason.

I still prefer "cut", though the consensus seems to be for "strip".

Cheers,
Cameron Simpson <cs@cskk.id.au>
On Fri, Mar 20, 2020 at 11:56 AM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
The second sentence above unambiguously states that cutprefix returns 'an unchanged *copy*', but the example contradicts that and shows that 'self' may be returned and not a copy. I think it should be reworded to explicitly allow the optimization of returning self.
For clarity, I'll change

    If ``s`` does not have ``pre`` as a prefix, an unchanged copy of ``s`` is returned.

to

    If ``s`` does not have ``pre`` as a prefix, then ``s.cutprefix(pre)`` returns ``s`` or an unchanged copy of ``s``.

For consistency with the Specification section, I'll also change

    s[len(pre):] if s.startswith(pre) else s

to

    s[len(pre):] if s.startswith(pre) else s[:]

and similarly change the ``cutsuffix`` snippet.
On 20Mar2020 13:57, Eric Fahlgren <ericfahlgren@gmail.com> wrote:
My versions of these (plain old functions) return self if unchanged, and are explicitly documented as doing so. This has the concrete advantage that one can test for nonremoval of the suffix with "is", which is very fast, instead of == which may not be. So one writes (assuming methods):

    prefix = cutsuffix(s, 'abc')
    if prefix is s:
        ... no change
    else:
        ... definitely changed, s != prefix also

I am explicitly in favour of returning self if unchanged.

Cheers,
Cameron Simpson <cs@cskk.id.au>
On 3/21/2020 11:20 AM, Ned Batchelder wrote:
The only reason I can think of is to enable the test above: did a suffix/prefix removal take place? That seems like a useful thing. I think if we don't specify the behavior one way or the other, people are going to rely on CPython's behavior here, consciously or not.

Is there some Python implementation that would have a problem with the "is" test, if we were being this prescriptive? Honest question.

Of course this would open the question of what to do if the suffix is the empty string. But since "'foo'.startswith('')" is True, maybe we'd have to return a copy in that case. It would be odd to have "s.startswith('')" be true, but "s.cutprefix('') is s" also be True. Or, since there's already talk in the PEP about what happens if the prefix/suffix is the empty string, if we adopt the "is" behavior we could add more details there. Like "if the result is the same object as self, it means either the suffix is the empty string, or self didn't start with the suffix".

Eric
Well, if CPython is modified to implement tagged pointers and supports storing short strings (a few latin1 characters) as a pointer, it may become harder to keep the same behavior for "x is y" where x and y are strings.

Victor

On Sat, 21 Mar 2020 at 17:23, Eric V. Smith <eric@trueblade.com> wrote:
-- Night gathers, and now my watch begins. It shall not end until my death.
On 3/21/2020 12:39 PM, Victor Stinner wrote:
Good point. And I guess it's still a problem for interned strings, since even a copy could be the same object:
So I now agree with Ned, we shouldn't be prescriptive here, and we should explicitly say in the PEP that there's no way to tell if the strip/cut/whatever took place, other than comparing via equality, not identity. Eric
In that case, the PEP should advise using .startswith() or .endswith() explicitly if the caller needs to know whether the string is going to be modified. Example:

    modified = False
    # O(n) complexity where n=len("prefix:")
    if line.startswith("prefix:"):
        line = line.cutprefix("prefix:")
        modified = True

It should be more efficient than:

    old_line = line
    line = line.cutprefix("prefix:")
    modified = (line != old_line)  # O(n) complexity where n=len(line)

since the checked prefix is usually way shorter than the whole string.

Victor

On Sat, 21 Mar 2020 at 17:45, Eric V. Smith <eric@trueblade.com> wrote:
-- Night gathers, and now my watch begins. It shall not end until my death.
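
The pattern Victor describes can be sketched as a small helper that reports whether anything was removed, without comparing the whole string. The names `cutprefix` and `cut_and_report` are hypothetical, and the explicit empty-prefix guard is an added assumption so that an empty prefix is never reported as a modification.

```python
def cutprefix(s: str, pre: str) -> str:
    # Hypothetical stand-in for the proposed str.cutprefix method.
    return s[len(pre):] if s.startswith(pre) else s


def cut_and_report(line: str, prefix: str) -> tuple:
    """Return (new_line, modified), inspecting only len(prefix) characters."""
    # startswith() looks at the first len(prefix) characters only, so the
    # cost does not grow with the length of the whole line.
    if prefix and line.startswith(prefix):
        return cutprefix(line, prefix), True
    return line, False


print(cut_and_report("prefix:value", "prefix:"))  # ('value', True)
print(cut_and_report("other:value", "prefix:"))   # ('other:value', False)
```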
On 21Mar2020 12:45, Eric V. Smith <eric@trueblade.com> wrote:
Are you suggesting that it could become impossible to write this function:

    def myself(o):
        return o

and not be able to rely on "o is myself(o)"? That seems... a pretty nasty breaking change for the language.
Unless Victor asserts that a function like myself() above cannot be relied on to have its return value "is" its passed in value, I disagree. The beauty of returning the original object on no change is that the test is O(1) and the criterion is clear. It is easy to document that stripping an empty affix returns the original string. I guess a test for len(stripped_string) == len(unstripped_string) is also O(1), and is less prescriptive. I just don't see the weight to Ned's characterisation of "a is/is-not b" as overly prescriptive; returning the same reference as one is given seems nearly the easiest thing a function can ever do. Cheers, Cameron Simpson <cs@cskk.id.au>
On Sun, 22 Mar 2020 at 15:13, Cameron Simpson <cs@cskk.id.au> wrote:
Other way around - because strings are immutable, their identity isn't supposed to matter, so it's possible that functions that currently return the exact same object in some cases may in the future start returning a different object with the same value. Right now, in CPython, with no tagged pointers, we return the full existing pointer wherever we can, as that saves us a data copy. With tagged pointers, the pointer storage effectively *is* the instance, so you can't really replicate that existing "copy the reference not the storage" behaviour any more. That said, it's also possible that identity for tagged pointers would be value based (similar to the effect of the small integer cache and string interning), in which case the entire question would become moot. Either way, the PEP shouldn't be specifying that a new object *must* be returned, and it also shouldn't be specifying that the same object *can't* be returned. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
I don't see any rationale in the PEP or in the python-ideas thread (admittedly I didn't read the whole thing, I just Ctrl + F-ed "subclass" there). Is this just for consistency with other methods like .casefold? I can understand why you'd want it to be consistent, but I think it's misguided in this case. It adds unnecessary complexity for subclass implementers to need to re-implement these two additional methods, and I can see no obvious reason why this behavior would be necessary, since these methods can be implemented in terms of string slicing. Even if you wanted to use `str`-specific optimizations in C that aren't available if you are constrained to use the subclass's __getitem__, it's inexpensive to add a "PyUnicode_CheckExact(self)" check to hit a "fast path" that doesn't use slice. I think defining this in terms of string slicing makes the most sense (and, notably, slice itself returns `str` unless explicitly overridden, the default is for it to return `str` anyway...). Either way, it would be nice to see the rationale included in the PEP somewhere. Best, Paul On 3/22/20 7:16 AM, Eric V. Smith wrote:
tl;dr: A method implemented in C is more efficient than hand-written pure-Python code, and it's less error-prone.

I don't know if it has already been said, but I hate having to compute the string length manually when writing:

    if line.startswith("prefix"):
        line = line[6:]

Usually what I do is open a Python REPL, type len("prefix"), and copy-paste the result :-)

Passing the length directly risks mistakes. What if I write line[7:] and it works most of the time because of a space, but sometimes the space is omitted and the application fails?

--

The lazy approach is:

    if line.startswith("prefix"):
        line = line[len("prefix"):]

Such code makes my "micro-optimizer heart" bleed since I know that Python is stupid and calls len() at runtime; the compiler is unable to optimize it (sadly for good reasons, the len name can be overridden) :-(

=> line.cutprefix("prefix") is more efficient! ;-) It's also shorter.

Victor

On Sun, 22 Mar 2020 at 17:02, Paul Ganssle <paul@ganssle.io> wrote:
-- Night gathers, and now my watch begins. It shall not end until my death.
Sorry, I think I accidentally left out a clause here - I meant that the rationale for /always returning a 'str'/ (as opposed to returning a subclass) is missing, it just says in the PEP:
I think the rationale for these differences is not made entirely clear, specifically the "and will cast subclasses of str to builtin str objects" part. I think it would be best to define the truncation in terms of __getitem__ - possibly with the caveat that implementations are allowed (but not required) to return `self` unchanged if no match is found. Best, Paul P.S. Dennis - just noticed in this reply that there is a typo in the PEP - s/instace/instance On 3/22/20 12:15 PM, Victor Stinner wrote:
On Sun, Mar 22, 2020 at 4:20 AM Eric V. Smith <eric@trueblade.com> wrote:
Yes. Returning self if the class is exactly str is *just* an optimization -- it must not be mandated nor ruled out. And we *have* to decide that it returns a plain str instance if called on a subclass instance (unless overridden, of course) since the base class (str) won't know the signature of the subclass constructor. That's also why all other str methods return an instance of plain str when called on a subclass instance. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
My suggestion is to rely on __getitem__ here (for subclasses), in which case we don't actually need to know the subclass constructor. The rough implementation in the PEP shows how to do it without needing to know the subclass constructor:

    def redbikeshed(self, prefix):
        if self.startswith(prefix):
            return self[len(prefix):]
        return self[:]

The actual implementation doesn't need to be implemented that way; as long as the result is always the result of slicing the original string, it's safe to do so* and more convenient for subclass implementers (who now only have to implement __getitem__ to get the affix-trimming functions for free).

One downside to this scheme is that I think it makes getting the type hinting right more complicated, since the return type of these functions is basically, "Whatever the return type of self.__getitem__ is", but I don't think anyone will complain if you write -> str with the understanding that __getitem__ should return a str or a subtype thereof.

Best,
Paul

*Assuming they haven't messed with __getitem__ to do something non-standard, but if they've done that I think they've tossed Liskov substitution out the window and will have to re-implement these methods if they want them to work.

On 3/22/20 2:03 PM, Guido van Rossum wrote:
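
A rough sketch of what the slicing-based approach would mean for subclasses follows. The classes `Plain` and `Tagged` and the function `cutprefix_via_getitem` are hypothetical names used only for illustration; the sketch assumes the method is defined purely in terms of `startswith` and slicing, as in Paul's suggestion.

```python
class Plain(str):
    """Hypothetical subclass that does not override __getitem__."""


class Tagged(str):
    """Hypothetical subclass whose slices keep the subclass type."""

    def __getitem__(self, index):
        return type(self)(super().__getitem__(index))


def cutprefix_via_getitem(s, prefix):
    # Defined only in terms of startswith() and slicing, so the result
    # type is whatever the object's own __getitem__ returns.
    if s.startswith(prefix):
        return s[len(prefix):]
    return s[:]


# Default str slicing returns a plain str, even for a subclass instance.
print(type(cutprefix_via_getitem(Plain("foobar"), "foo")).__name__)   # str
# A subclass that overrides __getitem__ gets its own type back.
print(type(cutprefix_via_getitem(Tagged("foobar"), "foo")).__name__)  # Tagged
# Existing str methods return plain str when called on a subclass instance.
print(type(Plain("foobar").upper()).__name__)                         # str
```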
On 21/03/2020 16:15, Eric V. Smith wrote:
*If* no python implementation would have a problem with the "is" test (and from a position of total ignorance I would guess that this is the case :-)), then it would be a useful feature and it is easier to define it now than try to force conformance later. I have no problem with 's.startswith("") == True and s.cutprefix("") is s'. YMMV. Rob Cliffe
On 3/21/20 12:51 PM, Rob Cliffe via Python-Dev wrote:
Why take on that "*If*" conditional? We're constantly telling people not to compare strings with "is". So why define how "is" will behave in this PEP? It's the implementation's decision whether to return a new immutable object with the same value, or the same object. As Steven points out elsewhere in this thread, Python's builtins' behavior differ, across methods and versions, in this regard. I certainly didn't know that, and it was probably news to you as well. So why do we need to nail it down for suffixes and prefixes? There will be no conformance to force later, because if the value doesn't change, then it doesn't matter whether it's a new string or the same string. --Ned.
On Sat, Mar 21, 2020 at 12:15:21PM -0400, Eric V. Smith wrote:
On 3/21/2020 11:20 AM, Ned Batchelder wrote:
I agree with Ned -- whether the string object is returned unchanged or a copy is an implementation decision, not a language decision. [Eric]
The only reason I can think of is to enable the test above: did a suffix/prefix removal take place? That seems like a useful thing.
We don't make this guarantee about string identity for any other string method, and CPython's behaviour varies from method to method:

    py> s = 'a b c'
    py> s is s.strip()
    True
    py> s is s.lower()
    False

and version to version:

    py> s is s.replace('a', 'a')  # 2.7
    False
    py> s is s.replace('a', 'a')  # 3.5
    True

I've never seen anyone relying on this behaviour, and I don't expect these new methods will change that. Thinking that `is` is another way of writing `==`, yes, I see that frequently. But relying on object identity to see whether a new string was created by a method, no.

If you want to know whether a prefix/suffix was removed, there's a more reliable way than identity and a cheaper way than O(N) equality. Just compare the length of the string before and after. If the lengths are the same, nothing was removed.

--
Steven
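
The length comparison Steven suggests might look like the following sketch. The standalone `cutsuffix` function is a hypothetical stand-in for the proposed method; the check relies only on `len()`, which is O(1) for `str`, and not on object identity.

```python
def cutsuffix(s: str, suf: str) -> str:
    # Hypothetical stand-in for the proposed str.cutsuffix method.
    return s[:-len(suf)] if suf and s.endswith(suf) else s


def was_removed(before: str, after: str) -> bool:
    """True if cutting an affix actually shortened the string."""
    return len(after) != len(before)


name = "report.txt"
print(was_removed(name, cutsuffix(name, ".txt")))  # True
print(was_removed(name, cutsuffix(name, ".csv")))  # False
print(was_removed(name, cutsuffix(name, "")))      # False
```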
On 22Mar2020 05:09, Steven D'Aprano <steve@pearwood.info> wrote:
Well, ok, expressed on this basis, colour me convinced. I'm not ok with not mandating that no change to the string returns an equal string (but, really, _only_ because i can do a test with len(), as I consider a test of content wildly excessive - potentially quite expensive - strings are not always short).
Aye. Cheers, Cameron Simpson <cs@cskk.id.au>
Hi Dennis,

Thanks for writing a proper PEP. It is easier to review a specification than an implementation.

On Fri, 20 Mar 2020 at 20:00, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
It would be nice to describe the behavior of these methods in a short sentence here.
IMHO the abstract should stop here. You should move the above text in the Specification section. The abstract shouldn't go into details.
(...)
I'm not sure that I'm comfortable with not specifying whether the method must return the string unmodified or return a copy if it doesn't start with the prefix. It can cause subtle issues: see the "Allow multiple prefixes" example, which expects that it doesn't return a copy.

Usually, PyPy does its best to mimic CPython's exact behavior anyway, since applications rely on it (even if that's a bad thing).

Fortunately, Python 3.8 started to emit a SyntaxWarning when the "is" operator is used to compare an object to a string literal (like: x is "abc").

I suggest always requiring the unmodified string to be returned. Honestly, it's not hard to guarantee and implement this behavior in Python!

IMHO you should also test if pre is non-empty just to make the intent more explicit.

Note: please rename "pre" to "prefix".

In short, I propose:

    def cutprefix(self: str, prefix: str, /) -> str:
        if self.startswith(prefix) and prefix:
            return self[len(prefix):]
        else:
            return self

I call startswith() before testing if pre is non-empty to inherit startswith()'s input type validation. For example, "a".startswith(b'x') raises a TypeError.

I also suggest removing the duplicated "rough specification" from the abstract: "s[len(pre):] if s.startswith(pre) else s". One specification per PEP is enough ;-)
The two methods will also be added to ``collections.UserString``, where they rely on the implementation of the new ``str`` methods.
I don't think that mentioning "where they rely on the implementation of the new ``str`` methods" is worth it. The spec can leave this part to the implementation.
IMO there are too many examples. For example, refactor.py and c_annotations.py are more or less the same. Just keep refactor.py. Overall, 2 or 3 examples should be enough.
I like the ability to specify multiple prefixes or suffixes. If the order is an issue, only allow tuple and list types and you're done. I don't see how disallowing s.cutprefix(('Foo', 'FooBar')) but allowing s.cutprefix('Foo').cutprefix('FooBar') prevents any risk of mistake. I'm sure that there are many use cases for cutsuffix() accepting multiple suffixes. IMO it makes the method even more attractive and efficient. Example to remove a newline suffix (DOS, Unix and macOS newlines): line.cutsuffix(("\r\n", "\n", "\r")). It's not ambiguous: "\r\n" is tested first explicitly, then "\r".
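The newline example can be sketched as follows, assuming "first matching suffix wins" semantics. The free function `cutsuffix` and the constant `NEWLINES` are hypothetical stand-ins for the method discussed in the thread.

```python
def cutsuffix(s: str, suffixes) -> str:
    """Remove the first matching suffix; `suffixes` is a str or a tuple of str."""
    if isinstance(suffixes, str):
        suffixes = (suffixes,)
    for suffix in suffixes:
        # Skip empty suffixes: s[:-0] would wrongly produce an empty string.
        if suffix and s.endswith(suffix):
            return s[:-len(suffix)]
    return s


NEWLINES = ("\r\n", "\n", "\r")
for raw in ("dos\r\n", "unix\n", "old mac\r", "none"):
    print(repr(cutsuffix(raw, NEWLINES)))
```

Listing "\r\n" before "\n" matters under this assumption: with the order reversed, a DOS line would keep its trailing "\r".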
Well, even if it's less efficient, I think that I would prefer to write: while s.endswith("\n"): s = s.cutsuffix("\n") ... especially because the specification doesn't (currently) require to return the string unmodified if it doesn't end with the suffix...
You may add that it makes cutprefix() and cutsuffix() methods consistent with the strip() functions family. "abc".strip() doesn't raise. startswith() and endswith() methods can be used to explicitly raise an exception if there is no match. Victor -- Night gathers, and now my watch begins. It shall not end until my death.
Hi Victor. I accidentally created a new thread, but I intended everything below as a response: Thanks for the review!
This still erroneously accepts tuples and would return str subclasses unchanged. If we want to make the Python be the spec with accuracy about type-checking, then perhaps we want:

    def cutprefix(self: str, prefix: str, /) -> str:
        if not isinstance(prefix, str):
            raise TypeError(f'cutprefix() argument must be str, '
                            f'not {type(prefix).__qualname__}')
        self = str(self)
        prefix = str(prefix)
        if self.startswith(prefix):
            return self[len(prefix):]
        else:
            return self

For accepting multiple prefixes, I can't tell if there's a consensus about whether ``s = s.cutprefix("a", "b", "c")`` should be the same as

    for prefix in ["a", "b", "c"]:
        s = s.cutprefix(prefix)

or

    for prefix in ["a", "b", "c"]:
        if s.startswith(prefix):
            s = s.cutprefix(prefix)
            break

The latter seems to be harder for users to implement through other means, and it's the behavior that test_concurrent_futures.py has implemented now, so maybe that's what we want.

Also, it seems more elegant to me to accept variadic arguments, rather than a single tuple of arguments. Is it worth it to match the related-but-not-the-same API of "startswith" if it makes for uglier Python? My gut reaction is to prefer the varargs, but maybe someone has a different perspective.

I can submit a revision to the PEP with some changes soon.
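The two candidate meanings for multiple prefixes give different results on the same input. The sketch below uses hypothetical helper names (`cut_each`, `cut_first`) to show both readings side by side; neither is claimed to be the PEP's final behavior.

```python
def cut_each(s: str, prefixes) -> str:
    """Try every prefix in turn, cutting each one that matches."""
    for prefix in prefixes:
        if prefix and s.startswith(prefix):
            s = s[len(prefix):]
    return s


def cut_first(s: str, prefixes) -> str:
    """Cut only the first prefix that matches, then stop."""
    for prefix in prefixes:
        if prefix and s.startswith(prefix):
            return s[len(prefix):]
    return s


print(repr(cut_each("FooBar", ("Foo", "Bar"))))   # both prefixes are consumed
print(repr(cut_first("FooBar", ("Foo", "Bar"))))  # only "Foo" is consumed
```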
On Sun, 22 Mar 2020 at 01:45, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
I expect that "FooBar".cutprefix(("Foo", "Bar")) returns "Bar". IMO it's consistent with "FooFoo".cutprefix("Foo") which only returns "Foo" and not "": https://www.python.org/dev/peps/pep-0616/#remove-multiple-copies-of-a-prefix If you want to remove both prefixes, "FooBar".cutprefix("Foo").cutprefix("Bar") should be called to get "".
I suggest accepting a tuple of strings:

    str.cutprefix(("prefix1", "prefix2"))

To be consistent with startswith():

    str.startswith(("prefix1", "prefix2"))

cutprefix() and startswith() can be used together and so I would prefer to have the same API:

    prefixes = ("context: ", "ctx:")
    has_prefix = False
    if line.startswith(prefixes):
        line = line.cutprefix(prefixes)
        has_prefix = True

A different API would look more surprising, no? Compare it to:

    prefixes = ("context: ", "ctx:")
    has_prefix = False
    if line.startswith(prefixes):
        line = line.cutprefix(*prefixes)  # <== HERE
        has_prefix = True

The difference is even more visible if you pass the prefixes directly:

    .cutprefix("context: ", "ctx:")

vs

    .cutprefix(("context: ", "ctx:"))

Victor -- Night gathers, and now my watch begins. It shall not end until my death.
On Fri, Mar 20, 2020 at 3:28 PM Victor Stinner <vstinner@python.org> wrote:
I tend to be mistrustful of code that tries to guess the best thing to do, when something expected isn't found. How about:

    def cutprefix(self: str, pre: str, raise_on_no_match: bool=False, /) -> str:
        if self.startswith(pre):
            return self[len(pre):]
        if raise_on_no_match:
            raise ValueError('prefix not found')
        return self[:]
On 23/03/2020 14:50, Dan Stromberg wrote:
I'm firmly of the opinion that the functions should either raise or not, and should definitely not have a parameter to switch behaviours. Probably it should do nothing; if the programmer needs to know that the prefix wasn't there, cutprefix() probably wasn't the right thing to use anyway. -- Rhodri James *-* Kynesim Ltd
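A caller who does need an error on a missing prefix can get one without a switch parameter by checking with startswith() first. The following is a minimal sketch under that assumption; `require_prefix` is a hypothetical name, and `cutprefix` is written as a free function only for illustration.

```python
def cutprefix(s: str, prefix: str) -> str:
    """Lenient form: return s unchanged when the prefix is absent."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s


def require_prefix(s: str, prefix: str) -> str:
    """Hypothetical strict wrapper: the caller decides that absence is an error."""
    if not s.startswith(prefix):
        raise ValueError(f"{s!r} does not start with {prefix!r}")
    return cutprefix(s, prefix)


print(cutprefix("plain", "ctx:"))       # lenient: the input comes back unchanged
print(require_prefix("ctx:a", "ctx:"))  # strict: the prefix is present, so it is cut
```

Keeping the two behaviours in separate functions means each call site states which one it wants, rather than toggling a flag.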
On Fri, Mar 20, 2020 at 11:54 AM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
This is a proposal to add two new methods, ``cutprefix`` and ``cutsuffix``, to the APIs of Python's various string objects.
The names should use "start" and "end" instead of "prefix" and "suffix", to reduce the jargon factor and for consistency with startswith/endswith. -n -- Nathaniel J. Smith -- https://vorpus.org
On Fri, Mar 20, 2020 at 06:18:20PM -0700, Nathaniel Smith wrote:
Prefix and suffix aren't jargon. They teach those words to kids in primary school. Why the concern over "jargon"? We happily talk about exception, metaclass, thread, process, CPU, gigabyte, async, ethernet, socket, hexadecimal, iterator, class, instance, HTTP, boolean, etc without blinking, but you're shying at prefix and suffix? -- Steven
Even then, it seems that prefix is an established computer science term: [1] https://en.wikipedia.org/wiki/Substring#Prefix [2] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L. (1990). Introduction to Algorithms (1st ed.). Chapter 15.4: Longest common subsequence And a quick search reveals that it's used hundreds of times in the docs: https://docs.python.org/3/search.html?q=prefix
On Sat, Mar 21, 2020 at 11:35 AM Steven D'Aprano <steve@pearwood.info> wrote:
Whereas they don't have to teach "start" and "end", because kids already know them before they start school.
Yeah. Jargon is fine when there's no regular word with appropriate precision, but we shouldn't use jargon just for jargon's sake. Python has a long tradition of preferring regular words when possible, e.g. using not/and/or instead of !/&&/||, and startswith/endswith instead of hasprefix/hassuffix. -n -- Nathaniel J. Smith -- https://vorpus.org
On Sun, Mar 22, 2020 at 1:02 PM Nathaniel Smith <njs@pobox.com> wrote:
Given that the word "prefix" appears in help("".startswith), I don't think there's really a lot to be gained by arguing this point :) There's absolutely nothing wrong with the word. But Dennis, welcome to the wonderful world of change proposals, where you will experience insane amounts of pushback and debate on the finest points of bikeshedding, whether or not people actually even support the proposal at all... ChrisA
Lol -- thanks! In my mind, another reason that I like including the words "prefix" and "suffix" over "start" and "end" is that, even though using the verb "end" in "endswith" is unambiguous, the noun "end" can be used as either the initial or final end, as in "remove this thing from both ends of the string". So "suffix" feels more precise to me.
On Sat., 21 Mar. 2020, 11:19 am Nathaniel Smith, <njs@pobox.com> wrote:
This would also be more consistent with startswith() & endswith(). (For folks querying this: the relevant domain here is "str builtin method names", and we already use startswith/endswith there, not hasprefix/hassuffix. The most challenging relevant audience for new str builtin method *names* is also 10 year olds learning to program in school, not adults reading the documentation.)

I think the concern about stripstart() & stripend() working with substrings, while strip/lstrip/rstrip work with character sets, is valid, but I also share the concern about introducing "cut" as yet another verb to learn in the already wide string API.

The example where the new function was used instead of a questionable use of replace gave me an idea, though: what if the new functions were "replacestart()" and "replaceend()"?

* uses "start" and "with" for consistency with the existing checks
* substring based, like the "replace" method
* can be combined with an extension of "replace()" to also accept a tuple of old values to match and replace to allow for consistency with checking for multiple prefixes or suffixes.

We'd expect the most common case to be the empty string, but I think the meaning of the following is clear, and consistent with the current practice of using replace() to delete text from anywhere within the string:

    s = s.replacestart('context.', '')

This approach would also very cleanly handle the last example from the PEP:

    s = s.replaceend(('Mixin', 'Tests', 'Test'), '')

The doubled 'e' in 'replaceend' isn't ideal, but if we went this way, I think keeping consistency with other str method names would be preferable to adding an underscore to the name.

Interestingly, you could also use this to match multiple prefixes or suffixes and find out *which one* matched (since the existing methods don't report that):

    s2 = s.replaceend(suffixes, '')
    suffix_len = len(s) - len(s2)
    suffix = s[-suffix_len:] if suffix_len else None

Cheers, Nick.
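The "find out which suffix matched" trick can be tried with a small stand-in. The free function `replaceend` below is a hypothetical sketch of the idea (first matching suffix wins), not an existing `str` method, and the sample class name is invented for the example.

```python
def replaceend(s: str, old, new: str) -> str:
    """Replace the first matching suffix in `old` (a str or tuple of str) with `new`."""
    candidates = (old,) if isinstance(old, str) else old
    for candidate in candidates:
        if candidate and s.endswith(candidate):
            return s[:-len(candidate)] + new
    return s


name = "FrobnicatorTests"
stem = replaceend(name, ("Mixin", "Tests", "Test"), "")

# Recover which suffix was removed from the change in length.
suffix_len = len(name) - len(stem)
suffix = name[-suffix_len:] if suffix_len else None
print(stem, suffix)
```

The length comparison only identifies the suffix when the replacement is the empty string; with a non-empty replacement the arithmetic would need adjusting.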
Nick Coghlan wrote:
FWIW, I don't place as much value on being consistent with "startswith()" and "endswith()". But with it being substring based, I think the term "replace" actually makes a lot more sense here compared to "cut". +1 On Sat, Mar 21, 2020 at 9:46 PM Nick Coghlan <ncoghlan@gmail.com> wrote:
On Sat, Mar 21, 2020 at 6:46 PM Nick Coghlan <ncoghlan@gmail.com> wrote:
To my language sense, hasprefix/hassuffix are horrible compared to startswith/endswith. If you were to talk about this kind of condition using English instead of Python, you wouldn't say "if x has prefix y", you'd say "if x starts with y". (I doubt any programming language uses hasPrefix or has_prefix for this, making it a strawman.) *But*, what would you say if you wanted to express the idea of removing something from the start or end? It's pretty verbose to say "remove y from the end of x", and it's not easy to translate that into a method name. x.removefromend(y)? Blech! And x.removeend(y) has the double 'e', which confuses the reader. The thing is that it's hard to translate "starts" (a verb) into a noun -- the "start" of something is its very beginning (i.e., in Python, position zero), while a "prefix" is a noun that specifically describes an initial substring (and I'm glad we don't have to use *that* :-).
It's not great, and I actually think that "stripprefix" and "stripsuffix" are reasonable. (I found that in Go, everything we call "strip" is called "Trim", and there are "TrimPrefix" and "TrimSuffix" functions that correspond to the PEP 616 functions.)
This feels like a hypergeneralization. In 99.9% of use cases we just need to remove the prefix or suffix. If you want to replace the suffix with something else, you can probably use string concatenation. (In the one use case I can think of, changing "foo.c" into "foo.o", it would make sense that plain "foo" ended up becoming "foo.o", so s.stripsuffix(".c") + ".o" actually works better there.)
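The "foo.c" to "foo.o" case can be checked with two small stand-ins. Both `stripsuffix` and `replaceend` below are hypothetical free functions written only to compare the two approaches on an input that lacks the suffix.

```python
def stripsuffix(s: str, suffix: str) -> str:
    """Remove `suffix` once if present; otherwise return s unchanged."""
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


def replaceend(s: str, old: str, new: str) -> str:
    """Swap `old` for `new` at the end of s, only if `old` is present."""
    if old and s.endswith(old):
        return s[:-len(old)] + new
    return s


for name in ("foo.c", "foo"):
    # Concatenation always appends ".o"; replaceend leaves plain "foo" alone.
    print(name, "->", stripsuffix(name, ".c") + ".o", "|", replaceend(name, ".c", ".o"))
```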
This approach would also very cleanly handle the last example from the PEP:
s = s.replaceend(('Mixin', 'Tests', 'Test'), '')
Maybe the proposed functions can optionally take a tuple of prefixes/suffixes, like startswith/endswith do?
Agreed on the second part, I just really don't like the 'ee'.
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
On 22.03.2020 6:38, Guido van Rossum wrote:
I must note that names conforming to https://www.python.org/dev/peps/pep-0008/#function-and-variable-names would be "strip_prefix" and "strip_suffix".
-- Regards, Ivan
Ivan Pozdeev wrote:
In this case, being in line with the existing string API method names takes priority over PEP 8, e.g. splitlines, startswith, endswith, splitlines, etc. Although I agree that an underscore would probably be a bit easier to read here, it would be rather confusing to randomly swap between naming conventions within the same API. The benefit gained in *slightly* easier readability wouldn't make up for the headache IMO. On Sun, Mar 22, 2020 at 12:13 AM Ivan Pozdeev via Python-Dev <python-dev@python.org> wrote:
Oops, I just realized that I wrote "splitlines" twice there. I guess that goes to show how much I use that specific method in comparison to the others, but the point still stands. Here's a more comprehensive set of existing string methods to better demonstrate it (Python 3.8.2):
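One way to inspect the naming convention of the existing `str` API is to list its public method names, as in this short sketch. The exact output depends on the Python version in use, so no particular list is assumed here.

```python
# Public str attributes are those whose names do not start with an underscore.
public_methods = [name for name in dir(str) if not name.startswith("_")]

# Names that contain an underscore anywhere, for comparison with PEP 8 style.
with_underscore = [name for name in public_methods if "_" in name]

print(public_methods)
print(with_underscore)
```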
On Sun, Mar 22, 2020 at 12:17 AM Kyle Stanley <aeros167@gmail.com> wrote:
Nice PEP! That this discussion wound up in the NP-complete "naming things" territory as the main topic right from the start/prefix/beginning speaks highly of it. :) The only things left I have to add are (a) agreed: don't specify whether it is a copy or not for str and bytes, BUT (b) do specify that for bytearray. Being the only mutable type, it matters. Consistency with other bytearray methods based on https://docs.python.org/3/library/stdtypes.html#bytearray suggests copy. (Someone always wants in-place versions of bytearray methods; that is a separate topic, not for this PEP.) Fwiw I *like* your cutprefix/suffix names. Avoiding the terms strip and trim is wise to avoid confusion, and having the name read as nice English is Pythonic. I'm not going to vote on other suggestions. -gps On Sat, Mar 21, 2020, 9:32 PM Kyle Stanley <aeros167@gmail.com> wrote:
On Sun, 22 Mar 2020 at 06:07, Gregory P. Smith <greg@krypto.org> wrote:
Nice PEP! That this discussion wound up in the NP-complete "naming things" territory as the main topic right from the start/prefix/beginning speaks highly of it. :)
Maybe we should have a rule to disallow bikeshedding until the foundations of a PEP are settled. Or always create two threads per PEP: one for bikeshedding only, one for everything else :-D Victor -- Night gathers, and now my watch begins. It shall not end until my death.
On Sat, Mar 21, 2020 at 8:38 PM Guido van Rossum <guido@python.org> wrote:
Thinking a bit more, I could also get behind "removeprefix" and "removesuffix". -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
I like "removeprefix" and "removesuffix". My only concern before had been length, but three more characters than "cut***fix" is a small price to pay for clarity.
On Sun, Mar 22, 2020 at 05:00:10AM -0000, Dennis Sweeney wrote:
I personally rely on my editor's auto-complete while writing. So, thinking about these methods in "correct" terms might be more important to me than the length. +1 for removeprefix and removesuffix. Thanks, Senthil
On 2020-03-21 20:38, Guido van Rossum wrote:
To jump on the bikeshed, trimprefix and trimsuffix are the best I've read so far, due to the definitions of the words in English. Though often used interchangeably, when I think of "strip" I think of removing multiple things, somewhat indiscriminately with an arm motion, which is how the functions currently work. e.g. "strip paint", "strip clothes": https://www.dictionary.com/browse/strip to take away or remove When I think of trim, I think more of a single cut of higher precision with scissors. e.g. "trim hair", "trim branches": https://www.dictionary.com/browse/trim to put into a neat or orderly condition by clipping… Which is what this method would do. That trim matches Go is a small but decent benefit. Another person warned against inconsistency with PHP, but I don't think PHP should be considered for design guidance, IMHO. Perhaps as an example of what not to do, which happily is in agreement with the above. -Mike p.s. +1, I do support this PEP, with or without name change, since some mentioned concern over that.
Is there a proven use case for anything other than the empty string as the replacement? I prefer your "replacewhatever" to another "stripwhatever" name, and I think it's clear and nicely fits the behavior you proposed. But should we allow a naming convenience to dictate that the behavior should be generalized to a use case we're not sure exists, where the same argument is passed 99% of the time? I think a downside would be that a pass-a-string-or-a-tuple-of-strings interface would be more mental effort to keep track of than a ``*args`` variadic interface for "(cut/remove/without/trim)prefix", even if the former is how ``startswith()`` works.
On Sun, 22 Mar 2020 at 14:01, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
Is there a proven use case for anything other than the empty string as the replacement? I prefer your "replacewhatever" to another "stripwhatever" name, and I think it's clear and nicely fits the behavior you proposed. But should we allow a naming convenience to dictate that the behavior should be generalized to a use case we're not sure exists, where the same argument is passed 99% of the time?
I think so, as if we don't, then we'd end up with the following three methods on str objects (using Guido's suggested names of "removeprefix" and "removesuffix", as I genuinely like those):

* replace()
* removeprefix()
* removesuffix()

And the following questions still end up with relatively non-obvious answers:

Q: How do I do a replace, but only at the start or end of the string?
A: Use "new_prefix + s.removeprefix(old_prefix)" or "s.removesuffix(old_suffix) + new_suffix"

Q: How do I remove a substring from anywhere in a string, rather than just from the start or end?
A: Use "s.replace(substr, '')"

Most of that objection would go away if the PEP added a plain old "remove()" method in addition to removeprefix() and removesuffix(), though - the "replace the substring with an empty string" trick isn't the most obvious spelling in the world, whereas I'd expect a lot of folks to reach for "s.remove(substr)" based on the regular sequence API, and I think Guido's right that in many cases where a prefix or suffix is being changed, you also want to add it if the old prefix/suffix is missing (and in the cases where you don't, you can either use startswith()/endswith() first, or else check for a length change).
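The answers above, including the "check for a length change" fallback, can be sketched as follows. `removeprefix` is written here as a free function so the example does not depend on any particular Python version, and `swap_prefix_if_present` is a hypothetical helper name.

```python
def removeprefix(s: str, prefix: str) -> str:
    """Remove `prefix` once if present; otherwise return s unchanged."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s


def swap_prefix_if_present(s: str, old: str, new: str) -> str:
    """Replace `old` with `new` at the start, leaving s alone when `old` is absent."""
    stripped = removeprefix(s, old)
    if len(stripped) == len(s):  # no length change: nothing was removed
        return s
    return new + stripped


print("context: " + removeprefix("name", "ctx:"))            # adds the prefix even though "ctx:" was absent
print(swap_prefix_if_present("name", "ctx:", "context: "))   # leaves the string alone
print("a.b.c".replace(".", ""))                              # removal anywhere in the string
```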
I think a downside would be that a pass-a-string-or-a-tuple-of-strings interface would be more mental effort to keep track of than a ``*args`` variadic interface for "(cut/remove/without/trim)prefix", even if the former is how ``startswith()`` works.
I doubt we'd use *args for any new string methods, precisely because we don't use it for any of the existing ones. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
-1 on "cut*" because my brain keeps reading it as "cute". +1 on "trim*" as it is clear what's going on and no confusion with preexisting methods. +1 on "remove*" for the same reasons as "trim*". And if no consensus is reached in this thread for a name I would assume the SC is going to ultimately decide on the name if the PEP is accepted as the burden of being known as "the person who chose _those_ method names on str" is more than any one person should have to bear. ;)
On Tue, Mar 24, 2020 at 2:53 PM Ethan Furman <ethan@stoneleaf.us> wrote:
I think name choice is easier if you write the documentation first:

    cutprefix - Removes the specified prefix.
    trimprefix - Removes the specified prefix.
    stripprefix - Removes the specified prefix.
    removeprefix - Removes the specified prefix.

Duh. :)
I'm also most strongly in favor of "remove*" (out of the above options). I'm opposed to cut*, mainly because it's too ambiguous in comparison to other options such as "remove*" and "replace*", which would do a much better job of explaining the operation performed. Without the .NET conflict, I would normally be +1 on "trim*" as well; with it in mind though, I'd lower it down to +0. Personally, I don't consider a conflict in a different ecosystem enough to lower it down to -0, but it still has some influence on my preference. So far, the consensus seems to be in favor of "remove*" with several +1s and no arguments against it (as far as I can tell), whereas the other options have been rather controversial. On Tue, Mar 24, 2020 at 3:38 PM Steve Dower <steve.dower@python.org> wrote:
On Tue, Mar 24, 2020 at 11:55 AM Brett Cannon <brett@python.org> wrote:
"raymondLuxuryYacht*" pronounced Throatwobbler Mangrove it is! Never fear, the entire stdlib is full of naming inconsistencies and questionable choices accumulated over time. Whatever is chosen will be lost in the noise and people will happily use it. The original PEP mentioned that trim had a different use in PHP which is why I suggest avoiding that one. I don't know how much crossover there actually is between PHP and Python programmers these days outside of FB. -gps * https://montypython.fandom.com/wiki/Raymond_Luxury-Yacht _______________________________________________
On 24Mar2020 18:49, Brett Cannon <brett@python.org> wrote:
I reiterate my huge -1 on "trim" because it will confuse every PHP user who comes to us from the dark side. Over there "trim" means what our "strip" means. I've got (differing) opinions about the others, but "trim" is a big one to me. Cheers, Cameron Simpson <cs@cskk.id.au>
On 20.03.2020 21:52, Dennis Sweeney wrote:
Does it need to be separate methods? Can we augment or even replace *strip() instead? E.g. *strip(chars: str, line: str) -> str As written in the PEP preface, the very reason for the PEP is that people are continuously trying to use *strip methods for the suggested functionality -- which shows that this is where they are expecting to find it. (as a bonus, we'll be saved from bikeshedding debates over the names) --- Then, https://mail.python.org/archives/list/python-ideas@python.org/thread/RJARZSU... suggests that the use of strip with character set argument may have fallen out of favor since its adoption. If that's the case, it can be deprecated in favor of the new use, thus saving us from extra complexity in perspective.
-- Regards, Ivan
On Sun, Mar 22, 2020 at 06:57:52AM +0300, Ivan Pozdeev via Python-Dev wrote:
Does it need to be separate methods?
Yes. Overloading a single method to do two dissimilar things is poor design.
They are only expecting to find it in strip() because there is no other alternative where it could be. There's nothing inherent about strip that means to delete a prefix or suffix, but when the only other choices are such obviously wrong methods as upper(), find(), replace(), count() etc it is easy to jump to the wrong conclusion that strip does what is wanted. -- Steven
Ivan Pozdeev via Python-Dev writes:
That is true. However, the rule of thumb (due to Guido, IIRC) is if the parameter is normally going to be a literal constant, and there are few such constants (like <= 3), put them in the name of the function rather than as values for an optional parameter. Overloading doesn't save much, if any, typing in this case. That's why we have strip, rstrip, and lstrip in the first place, although nowadays we'd likely spell the modifiers out (and maybe use start/end rather than left/right, which I would guess force BIDI users to translate to start/end on the fly). Steve
On 22Mar2020 08:10, Ivan Pozdeev <vano@mail.mipt.ru> wrote:
That is not the only difference. strip() does not just remove a character from the set provided (as a str). It removes as many of them as there are; that is why "foo.ext".strip(".ext") can actually be quite misleading to someone looking for a suffix remover - it often looks like it did the right thing. By contrast, cutprefix/cutsuffix (or stripsuffix, whatever) remove only _one_ instance of the affix. To my mind they are quite different, which is the basis of my personal dislike of reusing the word "strip". Just extending "strip()" with a funky new affix mode would be even worse, since it can _still_ be misleading if the caller omitted the special mode. Cheers, Cameron Simpson <cs@cskk.id.au>
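The difference is easy to see on a few file names. In the sketch below, `cutsuffix` is a hypothetical free function standing in for the proposed method, and the file names are invented for the example.

```python
def cutsuffix(s: str, suffix: str) -> str:
    """Remove exactly one instance of `suffix` from the end, if present."""
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


for name in ("foo.ext", "index.ext", "text.ext"):
    # strip(".ext") treats its argument as the character set {'.', 'e', 'x', 't'}.
    print(name, "->", repr(name.strip(".ext")), "vs", repr(cutsuffix(name, ".ext")))
```

For "foo.ext" both calls agree, which is what makes strip() look correct at first; the other two names lose characters from the stem because those characters happen to be in the set.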
My 2c on the naming: 'start' and 'end' in 'startswith' and 'endswith' are verbs, whereas we're looking for a noun if we want to cut/strip/trim a string. You can use 'start' and 'end' as nouns for this case but 'prefix' and 'suffix' seem a more obvious choice in English to me. Pathlib has `with_suffix()` and `with_name()`, which would give us something like `without_prefix()` or `without_suffix()` in this case. I think the name "strip", and the default (no-argument) behaviour of stripping whitespace, implies that the method is used to strip something down to its bare essentials, like stripping a bed of its covers. Usually you use strip() to remove whitespace and get to the real important data. I don't think such an implication holds for removing a *specific* prefix/suffix. I also don't much like "strip" as the semantics are quite different - if I'm understanding correctly, we're removing a *single* instance of a *single* *multi-character* string. A verb like "trim" or "cut" seems appropriate to highlight that difference. Barney On Fri, 20 Mar 2020 at 18:59, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
Dennis: please add references to past discussions in python-ideas and python-dev. Link to the first email of each thread in these lists. Victor
Here's an updated version.

Online: https://www.python.org/dev/peps/pep-0616/
Source: https://raw.githubusercontent.com/python/peps/master/pep-0616.rst

Changes:
- More complete Python implementation to match what the type checking in the C implementation would be
- Clarified that returning ``self`` is an optimization
- Added links to past discussions on Python-Ideas and Python-Dev
- Specified ability to accept a tuple of strings
- Shorter abstract section and fewer stdlib examples
- Mentioned
- Typo and formatting fixes

I didn't change the name because it didn't seem like there was a strong consensus for an alternative yet. I liked the suggestions of ``dropprefix`` or ``removeprefix``.

All the best, Dennis
On 22/03/2020 22:25, Dennis Sweeney wrote:
Proofreading: it would not be obvious for users to have to call 'foobar'.cutprefix(('foo,)) for the common use case of a single prefix. Missing single quote after the last foo.
or the more obvious and readable alternative:
Er no, in both these examples s is reduced to an empty string. Best wishes Rob Cliffe
On 22Mar2020 23:33, Rob Cliffe <rob.cliffe@btinternet.com> wrote:
That surprises me too. I expect the first matching affix to be used. It is the only way for the caller to have a predictable policy. As a diversion, _are_ there use cases where an empty affix is useful or reasonable or likely? Cheers, Cameron Simpson <cs@cskk.id.au>
Cameron Simpson writes:
As a diversion, _are_ there use cases where an empty affix is useful or reasonable or likely?
In the "raise on failure" design, "aba".cutsuffix('.doc') raises, "aba".cutsuffix('.doc', '') returns "aba". BTW, since I'm here, thanks for your discussion of context managers for loop invariants. It was very enlightening.
On Sun, Mar 22, 2020 at 10:25:28PM -0000, Dennis Sweeney wrote:
I am concerned about that tuple of strings feature. First, an implementation question: you do this when the prefix is a tuple:

    if isinstance(prefix, tuple):
        for option in tuple(prefix):
            if not isinstance(option, str):
                raise TypeError()
            option_str = str(option)

which looks like two unnecessary copies:

1. Having confirmed that `prefix` is a tuple, you call tuple() to make a copy of it in order to iterate over it. Why?

2. Having confirmed that option is a string, you call str() on it to (potentially) make a copy. Why?

Aside from those questions about the reference implementation, I am concerned about the feature itself. No other string method that returns a modified copy of the string takes a tuple of alternatives.

* startswith and endswith do take a tuple of (pre/suff)ixes, but they don't return a modified copy; they just return a True or False flag;

* replace does return a modified copy, and only takes a single substring at a time;

* find/index/partition/split etc don't accept multiple substrings to search for.

That makes startswith/endswith the unusual ones, and we should be conservative before emulating them.

The difficulty here is that the notion of "cut one of these prefixes" is ambiguous if two or more of the prefixes match. It doesn't matter for startswith:

    "extraordinary".startswith(('ex', 'extra'))

since it is True whether you match left-to-right, shortest-to-largest, or even in random order. But for cutprefix, which prefix should be deleted?

Of course we can make a ruling by fiat, right now, and declare that it will cut the first matching prefix reading left to right, whether that's what users expect or not. That seems reasonable when your prefixes are hard-coded in the source, as above. But what happens here?

    prefixes = get_prefixes('user.config')
    result = mystring.cutprefix(prefixes)

Whatever decision we make -- delete the shortest match, longest match, first match, last match -- we're going to surprise and annoy the people who expected one of the other behaviours.

This is why replace() still only takes a single substring to match and this isn't supported:

    "extraordinary".replace(('ex', 'extra'), '')

We ought to get some real-life exposure to the simple case first, before adding support for multiple prefixes/suffixes.

-- Steven
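To make the ambiguity concrete, here is a minimal sketch of two of the candidate rules. The helper names `cut_first_match` and `cut_longest_match` are hypothetical and were not proposed in the thread; they only illustrate that the same tuple of prefixes can give different results depending on the rule and on the order of the tuple.

```python
def cut_first_match(s, prefixes):
    """Remove the first prefix, in tuple order, that matches."""
    for prefix in prefixes:
        if s.startswith(prefix):
            return s[len(prefix):]
    return s


def cut_longest_match(s, prefixes):
    """Remove the longest matching prefix, regardless of tuple order."""
    matches = [p for p in prefixes if s.startswith(p)]
    if not matches:
        return s
    return s[len(max(matches, key=len)):]


word = "extraordinary"
print(cut_first_match(word, ("ex", "extra")))    # traordinary
print(cut_first_match(word, ("extra", "ex")))    # ordinary
print(cut_longest_match(word, ("ex", "extra")))  # ordinary
```

With a first-match rule the result depends on how the tuple happens to be ordered, which is exactly the situation when the prefixes come from a configuration file rather than from the source.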
Steven D'Aprano wrote:
This was an attempt to ensure no one can do funny business with tuple or str subclassing. I was trying to emulate the ``PyTuple_Check`` followed by ``PyTuple_GET_SIZE`` and ``PyTuple_GET_ITEM`` that are done by the C implementation of ``str.startswith()`` to ensure that only the tuple/str methods are used, not arbitrary user subclass code. It seems that that's what most of the ``str`` methods force. I was mistaken in how to do this with pure Python. I believe I actually wanted something like:

    def cutprefix(self, prefix, /):
        if not isinstance(self, str):
            raise TypeError()

        if isinstance(prefix, tuple):
            for option in tuple.__iter__(prefix):
                if not isinstance(option, str):
                    raise TypeError()
                if str.startswith(self, option):
                    return str.__getitem__(
                        self, slice(str.__len__(option), None))
            return str.__getitem__(self, slice(None, None))

        if not isinstance(prefix, str):
            raise TypeError()

        if str.startswith(self, prefix):
            return str.__getitem__(self, slice(str.__len__(prefix), None))
        else:
            return str.__getitem__(self, slice(None, None))

... which looks even uglier.
We ought to get some real-life exposure to the simple case first, before adding support for multiple prefixes/suffixes.
I could be (and have been) convinced either way about whether or not to generalize to tuples of strings. I thought Victor made a good point about compatibility with ``startswith()``.
On 24/03/20 3:43 pm, Dennis Sweeney wrote:
The C code uses those functions for efficiency, not to prevent "funny business". PyTuple_GET_SIZE and PyTuple_GET_ITEM are macros that directly access fields of the tuple struct, and PyTuple_Check is much faster than a full isinstance check. There is no point in trying to emulate these in Python code. -- Greg
I think my confusion is about just how precise this sort of "reference implementation" should be. Should it behave with ``str`` and ``tuple`` subclasses exactly how it would when implemented? If so, I would expect the following to work:

    class S(str):
        __len__ = __getitem__ = __iter__ = None

    class T(tuple):
        __len__ = __getitem__ = __iter__ = None

    x = str.cutprefix("FooBar", T(("a", S("Foo"), 17)))
    assert x == "Bar"
    assert type(x) is str

and so I think the ``str.__getitem__(self, slice(str.__len__(prefix), None))`` monstrosity would be the most technically correct, unless I'm missing something. But I've never seen Python code so ugly. And I suppose this is a slippery slope -- should it also guard against people redefining ``len = lambda x: 5`` and ``str = list`` in the global scope? Clearly not. I think then maybe it would be preferred to use something like the following in the PEP:

    def cutprefix(self, prefix, /):
        if isinstance(prefix, str):
            if self.startswith(prefix):
                return self[len(prefix):]
            return self[:]
        elif isinstance(prefix, tuple):
            for option in prefix:
                if self.startswith(option):
                    return self[len(option):]
            return self[:]
        else:
            raise TypeError()

    def cutsuffix(self, suffix):
        if isinstance(suffix, str):
            if self.endswith(suffix):
                return self[:len(self)-len(suffix)]
            return self[:]
        elif isinstance(suffix, tuple):
            for option in suffix:
                if self.endswith(option):
                    return self[:len(self)-len(option)]
            return self[:]
        else:
            raise TypeError()

The above would fail the assertions as written before, but would pass them for subclasses ``class S(str): pass`` and ``class T(tuple): pass`` that do not override any dunder methods. Is this an acceptable compromise if it appears alongside a clarifying sentence like the following?

    These methods should always return base ``str`` objects, even when called on ``str`` subclasses.

I'm looking for guidance as to whether that's an appropriate level of precision for a PEP. If so, I'll make that change.

All the best,
Dennis
On Tue, Mar 24, 2020 at 08:14:33PM -0000, Dennis Sweeney wrote:
I think my confusion is about just how precise this sort of "reference implementation" should be. Should it behave with ``str`` and ``tuple`` subclasses exactly how it would when implemented? If so, I would expect the following to work:
I think that for the purposes of a relatively straight-forward PEP like this, you should start simple and only add complexity if needed to resolve questions. The Python implementation ought to show the desired semantics, not try to be an exact translation of the C code. Think of the Python equivalents in the itertools docs:

https://docs.python.org/3/library/itertools.html

See for example:

https://www.python.org/dev/peps/pep-0584/#reference-implementation
https://www.python.org/dev/peps/pep-0572/#appendix-b-rough-code-translations...

You already state that the methods will show "roughly the following behavior", so there's no expectation that it will be precisely what the real methods do. Aim for clarity over emulation of unusual corner cases. The reference implementation is informative not prescriptive.

-- Steven
On Tue, Mar 24, 2020 at 08:14:33PM -0000, Dennis Sweeney wrote:
Didn't we have a discussion about not mandating a copy when nothing changes? For strings, I'd just return `self`. It is only bytearray that requires a copy to be made.
I'd also remove the entire multiple substrings feature, for reasons I've already given. "Compatibility with startswith" is not a good reason to add this feature and you haven't established any good use-cases for it. A closer analog is str.replace(substring, ''), and after almost 30 years of real-world experience, that method still only takes a single substring, not a tuple. -- Steven
Steven D'Aprano wrote:
It appears that in CPython, ``self[:] is self`` is true for base ``str`` objects, so I think ``return self[:]`` is consistent with (1) the premise that returning self is an implementation detail that is neither mandated nor forbidden, and (2) the premise that the methods should return base ``str`` objects even when called on ``str`` subclasses.
The ``test_concurrent_futures.py`` example seemed to be a good use case to me. I agree that it would be good to see how common that actually is though. But it seems to me that any alternative behavior, e.g. repeated removal, could be implemented by a user on top of the remove-only-the-first-found behavior or by fluently chaining multiple method calls. Maybe you're right that it's too complex, but I think it's at least worth discussing.
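The two premises about ``self[:]`` can be checked with a short sketch. The subclass name `Plain` is hypothetical. The identity result for a base ``str`` is a CPython implementation detail, so it is printed here rather than asserted.

```python
class Plain(str):
    """Hypothetical str subclass that overrides no dunder methods."""


base = "spam"
sub = Plain("spam")

# For an exact str, CPython may hand back the same object for a full slice.
# This is an implementation detail, so no particular answer is assumed here.
print(base[:] is base)

# For a subclass that does not override __getitem__, slicing goes through
# str.__getitem__ and produces a base str, not an instance of the subclass.
sliced = sub[:]
print(type(sliced).__name__)  # str
print(sliced == sub)          # True
```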
Dennis Sweeney wrote:
Steven D'Aprano wrote:
Dennis Sweeney wrote:
The Python interpreter in my head sees `self[:]` and returns a copy. A note that says a `str` is returned would be more useful than trying to exactly mirror internal details in the Python "roughly equivalent" code.
I agree with Steven -- a tuple of options is not necessary for the affix removal methods. -- ~Ethan~
I'm removing the tuple feature from this PEP. So now, if I understand correctly, I don't think there's disagreement about behavior, just about how that behavior should be summarized in Python code. Ethan Furman wrote:
I think I'm still in the camp that ``return self[:]`` more precisely prescribes the desired behavior. It would feel strange to me to write ``return self`` and then say "but you don't actually have to return self, and in fact you shouldn't when working with subclasses". To me, it feels like

    return (the original object unchanged, or a copy of the object, depending on implementation details, but always make a copy when working with subclasses)

is well-summarized by

    return self[:]

especially if followed by the text

    Note that ``self[:]`` might not actually make a copy -- if the affix is empty or not found, and if ``type(self) is str``, then these methods may, but are not required to, make the optimization of returning ``self``. However, when called on instances of subclasses of ``str``, these methods should return base ``str`` objects, not ``self``.

...which is a necessary explanation regardless.

Granted, ``return self[:]`` isn't perfect if ``__getitem__`` is overridden, but at the cost of three characters, the Python gains accuracy over both the optional nature of returning ``self`` in all cases and the impossibility (assuming no dunders are overridden) of returning self for subclasses. It also dissuades readers from relying on the behavior of returning self, which we're specifying is an implementation detail.

Is that text explanation satisfactory?
I've said a few times that I think it would be good if the behavior were defined /in terms of __getitem__/'s behavior. If the rough behavior is this:

    def removeprefix(self, prefix):
        if self.startswith(prefix):
            return self[len(prefix):]
        else:
            return self[:]

Then you can shift all the guarantees about whether the subtype is str and whether it might return `self` when the prefix is missing onto the implementation of __getitem__.

For CPython's implementation of str, `self[:]` returns `self`, so it's clearly true that __getitem__ is allowed to return `self` in some situations. Subclasses that do not override __getitem__ will return the str base class, and subclasses that /do/ overwrite __getitem__ can choose what they want to do. So someone could make their subclass do this:

    class MyStr(str):
        def __getitem__(self, key):
            if isinstance(key, slice) and key.start is key.stop is key.step is None:
                return self
            return type(self)(super().__getitem__(key))

They would then get "removeprefix" and "removesuffix" for free, with the desired semantics and optimizations.

If we go with this approach (which again I think is much friendlier to subclassers), that obviates the problem of whether `self[:]` is a good summary of something that can return `self`: since "does the same thing as self[:]" /is/ the behavior it's trying to describe, there's no ambiguity.

Best,
Paul

On 3/25/20 1:36 PM, Dennis Sweeney wrote:
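A small sketch of what this proposal would mean in practice. It uses a free function `removeprefix_via_getitem` (a hypothetical name) standing in for a method that is defined purely in terms of slicing, and checks the slice's ``start``, ``stop`` and ``step`` attributes. This illustrates Paul's suggestion only; it is not a description of how the built-in method behaves.

```python
def removeprefix_via_getitem(s, prefix):
    """Rough behavior defined only in terms of s.startswith and s[...]."""
    if s.startswith(prefix):
        return s[len(prefix):]
    return s[:]


class MyStr(str):
    def __getitem__(self, key):
        # A full slice returns the object itself; any other key is wrapped
        # back into the subclass.
        if isinstance(key, slice) and key.start is key.stop is key.step is None:
            return self
        return type(self)(super().__getitem__(key))


m = MyStr("foobar")

cut = removeprefix_via_getitem(m, "foo")
print(cut, type(cut).__name__)            # bar MyStr

unchanged = removeprefix_via_getitem(m, "xyz")
print(unchanged is m)                     # True
```

Under this design the subclass keeps its type and its "return self" optimization without overriding the removal function itself.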
I was surprised by the following behavior:

    class MyStr(str):
        def __getitem__(self, key):
            if isinstance(key, slice) and key.start is key.stop is key.step is None:
                return self
            return type(self)(super().__getitem__(key))

    my_foo = MyStr("foo")
    MY_FOO = MyStr("FOO")
    My_Foo = MyStr("Foo")
    empty = MyStr("")

    assert type(my_foo.casefold()) is str
    assert type(MY_FOO.capitalize()) is str
    assert type(my_foo.center(3)) is str
    assert type(my_foo.expandtabs()) is str
    assert type(my_foo.join(())) is str
    assert type(my_foo.ljust(3)) is str
    assert type(my_foo.lower()) is str
    assert type(my_foo.lstrip()) is str
    assert type(my_foo.replace("x", "y")) is str
    assert type(my_foo.split()[0]) is str
    assert type(my_foo.splitlines()[0]) is str
    assert type(my_foo.strip()) is str
    assert type(empty.swapcase()) is str
    assert type(My_Foo.title()) is str
    assert type(MY_FOO.upper()) is str
    assert type(my_foo.zfill(3)) is str

    assert type(my_foo.partition("z")[0]) is MyStr
    assert type(my_foo.format()) is MyStr

I was under the impression that all of the ``str`` methods exclusively returned base ``str`` objects. Is there any reason why those two are different, and is there a reason that would apply to ``removeprefix`` and ``removesuffix`` as well?
I imagine it's an implementation detail of which ones depend on __getitem__. The only methods that would be reasonably amenable to a guarantee like "always returns the same thing as __getitem__" would be (l|r|)strip(), split(), splitlines(), and .partition(), because they only work with subsets of the input string. Most of the other stuff involves constructing new strings and it's harder to cast them in terms of other "primitive operations" since strings are immutable.

I suspect that to the extent that the ones that /could/ be implemented in terms of __getitem__ are returning base strings, it's either because no one thought about doing it at the time and they used another mechanism or it was a deliberate choice to be consistent with the other methods.

I don't see removeprefix and removesuffix explicitly being implemented in terms of slicing operations as a huge win - you've demonstrated that someone who wants a persistent string subclass still would need to override a /lot/ of methods, so two more shouldn't hurt much - I just think that "consistent with most of the other methods" is a /particularly/ good reason to avoid explicitly defining these operations in terms of __getitem__. The /default/ semantics are the same (i.e. if you don't explicitly change the return type of __getitem__, it won't change the return type of the remove* methods), and the only difference is that for all the /other/ methods, it's an implementation detail whether they call __getitem__, whereas for the remove methods it would be explicitly documented.

In my ideal world, a lot of these methods would be redefined in terms of a small set of primitives that people writing subclasses could implement as a protocol that would allow methods called on the functions to retain their class, but I think the time for that has passed. Still, I don't think it would /hurt/ for new methods to be defined in terms of what primitive operations exist where possible.
Best, Paul On 3/25/20 3:09 PM, Dennis Sweeney wrote:
I imagine it's an implementation detail of which ones depend on ``__getitem__``.
If we write

    class MyStr(str):
        def __getitem__(self, key):
            raise ZeroDivisionError()

then all of the assertions from before still pass, so in fact *none* of the methods rely on ``__getitem__``. As of now ``str`` does not behave as an ABC at all. But it's an interesting proposal to essentially make it an ABC. Although it makes me curious what all of the different reasons people actually have for subclassing ``str``. All of the examples I found in the stdlib were either (1) contrived test cases, (2) strings (e.g. grammar tokens) with some extra attributes along for the ride, or (3) string-based enums. None of types (2) or (3) ever overrode ``__getitem__``, so it doesn't feel like that common of a use case.
Making sure I understand: would you prefer the PEP to say ``return self`` rather than ``return self[:]``? I never had the intention of ``self[:]`` meaning "this must have exactly the behavior of ``self.__getitem__(slice(None, None))`` regardless of type", but I can understand if that's how you're saying it could be interpreted.
Dennis Sweeney wrote: -----------------------
Ethan Furman wrote: -------------------
The Python interpreter in my head sees self[:] and returns a copy.
Dennis Sweeney wrote: -----------------------
I don't understand that last bit -- surely, if I'm bothering to implement `removeprefix` and `removesuffix` in my subclass, I would also want to `return self` to keep my subclass? Why would I want to go through the extra overhead of either calling my own `__getitem__` method, or have the `str.__getitem__` method discard my subclass?

However, if you are saying that `self[:]` *will* call `self.__class__.__getitem__` so my subclass only has to override `__getitem__` instead of `removeprefix` and `removesuffix`, that I can be happy with.

-- ~Ethan~
I should clarify: by "when working with subclasses" I meant "when str.removeprefix() is called on a subclass that does not override removeprefix", and in that case it should return a base str. I was not taking a stance on how the methods should be overridden, and I'm not sure there are many use cases where it should be.
I was only saying that the new methods should match 20 other methods in the str API by always returning a base str (the exceptions being format, format_map, and (r)partition for some reason). I did not mean to suggest that they should ever call user-supplied ``__getitem__`` code -- I don't think they need to. I haven't found anyone trying to use ``str`` as a mixin class/ABC, and it seems that this would be very difficult to do given that none of its methods currently rely on ``self.__class__.__getitem__``.

If ``return self[:]`` in the PEP is too closely linked to "must call user-supplied ``__getitem__`` methods" for it not to be true, and so you're suggesting ``return self`` is more faithful, I can understand.

So now if I understand the dilemma up to this point we have:

Benefits of writing ``return self`` in the PEP:

- Makes it clear that the optimization of not copying is allowed
- Makes it clear that ``self.__class__.__getitem__`` isn't used

Benefits of writing ``return self[:]`` in the PEP:

- Makes it clear that returning self is an implementation detail
- For subclasses not overriding ``__getitem__`` (the majority of cases), makes it clear that this method will return a base str like the other str methods.

Did I miss anything?

All the best,
Dennis
First off, thank you for being so patient -- trying to champion a PEP can be exhausting. On 03/26/2020 05:22 PM, Dennis Sweeney wrote:
Ethan Furman wrote:
Okay.
Okay.
The only thing you missed is that, for me at least, points A, C, and D are not at all clear from the example code. If I wanted to be explicit about the return type being `str` I would write: return str(self) # subclasses are coerced to str -- ~Ethan~
I appreciate the input and attention to detail! Using the ``str()`` constructor was sort of what I had thought originally, and that's why I had gone overboard with "casting" in one iteration of the sample code. When I realized that this isn't quite "casting" and that ``__str__`` can be overridden, I went even more overboard and suggested that ``str.__getitem__(self, ...)`` and ``str.__len__(self)`` could be written, which does have the behavior of effectively "casting", but looks nasty. Do you think that the following is a happy medium?

    def removeprefix(self: str, prefix: str, /) -> str:
        # coerce subclasses to str
        self_str = str(self)
        prefix_str = str(prefix)
        if self_str.startswith(prefix_str):
            return self_str[len(prefix_str):]
        else:
            return self_str

    def removesuffix(self: str, suffix: str, /) -> str:
        # coerce subclasses to str
        self_str = str(self)
        suffix_str = str(suffix)
        if suffix_str and self_str.endswith(suffix_str):
            return self_str[:-len(suffix_str)]
        else:
            return self_str

Followed by the text:

    If ``type(self) is str`` (rather than a subclass) and if the given affix is empty or is not found, then these methods may, but are not required to, make the optimization of returning ``self``.
On Wed, Mar 25, 2020 at 5:42 PM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
may, but are not required to, make the optimization of returning
Note that ``self[:]`` might not actually make a copy of ``self``. If the affix is empty or not found, and if ``type(self)`` is immutable, then these methods may, but are not required to, make the optimization of returning ``self``. ... [...]
I was trying to start with the intended behavior of the str class, then move on to generalizing to other classes, because I think completing a single example and *then* generalizing is an instructional style that's easier to digest, whereas intermixing all of the examples at once can get confusing (can I call str.removeprefix(object(), 17)?). Is something missing that's not already there in the following sentence in the PEP?

    Although the methods on the immutable ``str`` and ``bytes`` types may make the aforementioned optimization of returning the original object, ``bytearray.removeprefix()`` and ``bytearray.removesuffix()`` should always return a copy, never the original object.

Best,
Dennis
How about just presenting pseudo code with the caveat that that's for the base str and bytes classes only, and then stipulating that for subclasses the return value is still a str/bytes/bytearray instance, and leaving it at that? After all the point of the Python code is to show what the C code should do in a way that's easy to grasp -- giving a Python implementation is not meant to constrain the C implementation to have *exactly* the same behavior in all corner cases (since that would lead to seriously contorted C code). On Fri, Mar 27, 2020 at 1:02 PM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
I like how that would take the pressure off of the Python sample. How's something like this?

Specification
=============

The builtin ``str`` class will gain two new methods which will behave as follows when ``type(self) is str``::

    def removeprefix(self: str, prefix: str, /) -> str:
        if self.startswith(prefix):
            return self[len(prefix):]
        else:
            return self

    def removesuffix(self: str, suffix: str, /) -> str:
        if suffix and self.endswith(suffix):
            return self[:-len(suffix)]
        else:
            return self

These methods, even when called on ``str`` subclasses, should always return base ``str`` objects. One should not rely on the behavior of ``self`` being returned (as in ``s.removesuffix('') is s``) -- this optimization should be considered an implementation detail. To test whether any affixes were removed during the call, one may use the constant-time behavior of comparing the lengths of the original and new strings::

    >>> string = 'Python String Input'
    >>> new_string = string.removeprefix('Py')
    >>> modified = (len(string) != len(new_string))
    >>> modified
    True

One may also continue using ``startswith()`` and ``endswith()`` methods for control flow instead of testing the lengths as above.

Note that without the check for the truthiness of ``suffix``, ``s.removesuffix('')`` would be mishandled and always return the empty string due to the unintended evaluation of ``self[:-0]``.

Methods with the corresponding semantics will be added to the builtin ``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes`` or ``bytearray`` object, then ``b.removeprefix()`` and ``b.removesuffix()`` will accept any bytes-like object as an argument. Although the methods on the immutable ``str`` and ``bytes`` types may make the aforementioned optimization of returning the original object, ``bytearray.removeprefix()`` and ``bytearray.removesuffix()`` should *always* return a copy, never the original object.

The two methods will also be added to ``collections.UserString``, with similar behavior.
My hesitation to write "return self" is resolved by saying that it should not be relied on, so I think this is a win. Best, Dennis
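The ``self[:-0]`` pitfall mentioned in the draft can be reproduced with two free functions. The names `naive_removesuffix` and `checked_removesuffix` are hypothetical; the first one deliberately omits the truthiness check to show what goes wrong.

```python
def naive_removesuffix(s, suffix):
    # Deliberately buggy: with suffix == "", s[:-0] is s[:0], the empty string.
    if s.endswith(suffix):
        return s[:-len(suffix)]
    return s


def checked_removesuffix(s, suffix):
    # The truthiness check keeps the empty suffix away from s[:-0].
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


print(repr(naive_removesuffix("report.doc", "")))    # ''
print(repr(checked_removesuffix("report.doc", "")))  # 'report.doc'
print(checked_removesuffix("report.doc", ".doc"))    # report
```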
On Fri, Mar 27, 2020 at 1:55 PM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
I'd suggest to drop the last sentence ("One should ... detail.") and instead write 'return self[:]' in the methods.
If I saw that in a code review I'd flag it for non-obviousness. One should use 'string != new_string' *unless* there is severe pressure to squeeze every nanosecond out of this particular code (and it better be inside an inner loop).
One may also continue using ``startswith()`` and ``endswith()`` methods for control flow instead of testing the lengths as above.
That's worse, in a sense, since "foofoobar".removeprefix("foo") returns "foobar" which still starts with "foo".

Note that without the check for the truthiness of ``suffix``, ``s.removesuffix('')`` would be mishandled and always return the empty string due to the unintended evaluation of ``self[:-0]``.
That's a good one (I started suggesting dropping that when I read this :-) but maybe it ought to go in a comment (and shorter -- at most one line).
This could also be simplified by writing 'return self[:]'.
Writing 'return self[:]' seems to say the same thing in fewer words though. :-)

--Guido van Rossum (python.org/~guido)
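Both review points can be seen in a few lines. The helper `remove_prefix` below is a hypothetical stand-in that mirrors the rough equivalent from the draft, so the sketch does not depend on the proposed method being available.

```python
def remove_prefix(s, prefix):
    """Stand-in mirroring the draft's rough equivalent."""
    return s[len(prefix):] if s.startswith(prefix) else s


original = "foofoobar"
stripped = remove_prefix(original, "foo")

print(stripped)                    # foobar
# Calling startswith() afterwards cannot tell whether a prefix was removed:
print(stripped.startswith("foo"))  # True
# A plain comparison states the intent directly:
print(original != stripped)        # True
```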
I meant that startswith might be called before removeprefix, as it was in the ``deccheck.py`` example.
I thought that someone had suggested that such things go in the PEP, but since these are more stylistic considerations, I would be more than happy to trim it down to just

    The builtin ``str`` class will gain two new methods which will behave as follows when ``type(self) is type(prefix) is str``::

        def removeprefix(self: str, prefix: str, /) -> str:
            if self.startswith(prefix):
                return self[len(prefix):]
            else:
                return self[:]

        def removesuffix(self: str, suffix: str, /) -> str:
            # suffix='' should not call self[:-0].
            if suffix and self.endswith(suffix):
                return self[:-len(suffix)]
            else:
                return self[:]

    These methods, even when called on ``str`` subclasses, should always return base ``str`` objects.

    Methods with the corresponding semantics will be added to the builtin ``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes`` or ``bytearray`` object, then ``b.removeprefix()`` and ``b.removesuffix()`` will accept any bytes-like object as an argument.

    The two methods will also be added to ``collections.UserString``, with similar behavior.
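For the binary types, the behavior described in this draft can be tried out as follows. This sketch assumes an interpreter in which the methods exist under the names ``removeprefix`` and ``removesuffix`` (Python 3.9 or later); the sample data is made up.

```python
data = b"HEADER:payload"

# Any bytes-like object is accepted as the affix, not only bytes.
print(data.removeprefix(bytearray(b"HEADER:")))   # b'payload'
print(data.removesuffix(memoryview(b"payload")))  # b'HEADER:'

# bytearray is mutable, so even a no-op removal yields a new object.
buf = bytearray(b"payload")
copy = buf.removeprefix(b"")
print(copy == buf)  # True
print(copy is buf)  # False
```

Because the result is a separate object, later in-place changes to `copy` do not affect `buf`.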
On Fri, Mar 27, 2020 at 3:29 PM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
Not having read the full PEP, that wasn't clear to me. Sorry!
I'm sure someone did. But not every bit of feedback is worth acting upon, and sometimes a weird compromise is cooked up that addresses somebody's nit while making things less understandable for everyone else. I think this is one of those cases.
Excellent!

--Guido van Rossum (python.org/~guido)
On Sat., 28 Mar. 2020, 8:39 am Guido van Rossum, <guido@python.org> wrote:
I think that may have been me in a tangent thread where folks were worried about O(N) checks on long strings. I know at least I temporarily forgot to account for string equality checks starting with a few O(1) checks to speed up common cases (IIRC: identity, length, first code point, last code point), which means explicitly calling len() is just as likely to slow things down as it is to speed them up. Cheers, Nick.
On 25/03/20 9:14 am, Dennis Sweeney wrote:
No, I don't think so. The purpose of a Python implementation of a proposed feature is to get the intended semantics across, not to reproduce all the quirks of an imagined C implementation. If you were to bake these details into a Python reference implementation, you would be implying that these are *intended* restrictions, which (unless I misunderstand) is not what you are intending.

(Back when yield-from was being designed, I described the intended semantics in prose, and gave an approximate Python equivalent, which went through several revisions as we thrashed out exactly how the feature should behave. But I don't think it ever exactly matched all the details of the actual implementation, nor was it intended to. The prose turned out to be much more readable, anyway. :-)

-- Greg
On 24 Mar 2020, at 2:42, Steven D'Aprano wrote:
Actually I would like for other string methods to gain the ability to search for/chop off multiple substrings too. A `find()` that supports multiple search strings (and returns the leftmost position where a search string can be found) is a great help in implementing some kind of tokenizer:

```python
def tokenize(source, delimiter):
    lastpos = 0
    while True:
        pos = source.find(delimiter, lastpos)
        if pos == -1:
            token = source[lastpos:].strip()
            if token:
                yield token
            break
        else:
            token = source[lastpos:pos].strip()
            if token:
                yield token
            yield source[pos]
            lastpos = pos + 1

print(list(tokenize(" [ 1, 2, 3] ", ("[", ",", "]"))))
```

This would output `['[', '1', ',', '2', ',', '3', ']']` if `str.find()` supported multiple substrings. Of course to be really usable `find()` would have to return **which** substring was found, which would make the API more complicated (and somewhat incompatible with the existing `find()`).

But for `cutprefix()` (or whatever it's going to be called), I'm +1 on supporting multiple prefixes. For ambiguous cases, IMHO the most straightforward option would be to chop off the first prefix found.
[...]
Servus, Walter
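Since `str.find()` accepts only a single substring, the tokenizer above does not run as written. A runnable approximation is sketched below with a helper, `find_any`, which is a hypothetical name and not part of any proposal. Like the original, the sketch assumes single-character delimiters because it yields `source[pos]`.

```python
def find_any(source, delimiters, start=0):
    """Return the leftmost index where any delimiter occurs, or -1."""
    positions = [source.find(d, start) for d in delimiters]
    found = [p for p in positions if p != -1]
    return min(found) if found else -1


def tokenize(source, delimiters):
    lastpos = 0
    while True:
        pos = find_any(source, delimiters, lastpos)
        if pos == -1:
            # No delimiter left: emit the trailing token, if any, and stop.
            token = source[lastpos:].strip()
            if token:
                yield token
            break
        # Emit the text before the delimiter, then the delimiter itself.
        token = source[lastpos:pos].strip()
        if token:
            yield token
        yield source[pos]
        lastpos = pos + 1


print(list(tokenize(" [ 1, 2, 3] ", ("[", ",", "]"))))
# ['[', '1', ',', '2', ',', '3', ']']
```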
Walter Dörwald writes:
In other words, you want the equivalent of Emacs's "(search-forward (regexp-opt list-of-strings))", which also meets the requirement of returning which string was found (as "(match-string 0)"). Since Python already has a functionally similar API for regexps, we can add a regexp-opt (with appropriate name) method to re, perhaps as .compile_string_list(), and provide a convenience function re.search_string_list() for your application. I'm applying practicality before purity, of course. To some extent we want to encourage simple string approaches, and putting this in regex is not optimal for that. Steve
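As a rough illustration of the regexp route, an alternation can be built from a list of literal strings with `re.escape`. The function below borrows the name `compile_string_list` floated in this message, but it is only a hypothetical helper; the `re` module has no such function. Sorting the alternatives longest-first is an assumption made here so that longer strings win over their own prefixes.

```python
import re


def compile_string_list(strings):
    """Compile a pattern matching any of the given literal strings."""
    ordered = sorted(strings, key=len, reverse=True)
    return re.compile("|".join(re.escape(s) for s in ordered))


delimiters = compile_string_list(["[", ",", "]"])
match = delimiters.search(" [ 1, 2, 3] ")
# The match object reports both where and which string was found.
print(match.start(), match.group(0))  # 1 [

prefixes = compile_string_list(["ex", "extra"])
print(prefixes.match("extraordinary").group(0))  # extra
```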
On 25 Mar 2020, at 9:48, Stephen J. Turnbull wrote:
Sounds like it. I'm not familiar with Emacs.
If you're using regexps anyway, building the appropriate or-expression shouldn't be a problem. I guess that's what most lexers/tokenizers do anyway.
Exactly. I'm always a bit hesitant when using regexps, if there's a simpler string approach.
Steve
Servus, Walter
Hi Dennis,

Thanks for the updated PEP, it looks way better! I love the ability to pass a tuple of strings ;-)

--

The behavior of a tuple containing an empty string is a little bit surprising.

cutsuffix("Hello World", ("", " World")) returns "Hello World", whereas cutsuffix("Hello World", (" World", "")) returns "Hello".

cutprefix() has the same behavior: the first empty string stops the loop and returns the string unchanged.

I would prefer to raise ValueError("empty separator") to avoid any risk of confusion. I'm not sure that str.cutprefix("") or str.cutsuffix("") makes any sense.

"abc".startswith("") and "abc".startswith(("", "a")) are true, but that's fine since startswith() doesn't modify the string. Moreover, we cannot change the behavior now :-) But for new methods, we can try to design them correctly to avoid any risk of confusion.

--

It reminds me of https://bugs.python.org/issue28029: "".replace("", s, n) now returns s instead of an empty string for all non-zero n. The behavior changes in Python 3.9.

There are also discussions about "abc".split("") and re.compile("").split("abc"). str.split() raises ValueError("empty separator") whereas re.split returns ['', 'a', 'b', 'c', ''] which can be (IMO) surprising.

See also https://bugs.python.org/issue28937 "str.split(): allow removing empty strings (when sep is not None)".

Note: on the other hand, str.strip("") is accepted and returns the string unmodified. But this method doesn't accept a tuple of substrings. It's different than cutprefix/cutsuffix.

Victor
-- Night gathers, and now my watch begins. It shall not end until my death.
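The order dependence described here follows directly from a first-match loop. The sketch below is a free-function version of the tuple-accepting draft discussed earlier in the thread; the name `cutsuffix_any` is hypothetical.

```python
def cutsuffix_any(s, suffixes):
    """Remove the first suffix in the tuple that matches (first-match rule)."""
    for option in suffixes:
        # Every string ends with "", so an empty option always matches
        # and ends the loop without removing anything.
        if s.endswith(option):
            return s[:len(s) - len(option)]
    return s


print(cutsuffix_any("Hello World", ("", " World")))  # Hello World
print(cutsuffix_any("Hello World", (" World", "")))  # Hello
```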
Hello, On Tue, 24 Mar 2020 19:14:16 +0100 Victor Stinner <vstinner@python.org> wrote: []
str.cutprefix("")/str.cutsuffix("") definitely makes sense, e.g.:

=== config.something ===
# If you'd like to remove some prefix from your lines, set it here
REMOVE_PREFIX = ""
======

=== src.py ===
...
line = line.cutprefix(config.REMOVE_PREFIX)
...
======

Now one may ask whether str.cutprefix(("", "nonempty")) makes sense. A response can be "the more complex the functionality, the more complex and confusing the corner cases there are to handle".

[]

--
Best regards, Paul mailto:pmiscml@gmail.com
Hello, On Tue, 24 Mar 2020 22:51:55 +0100 Victor Stinner <vstinner@python.org> wrote:
Or even just:

    if line.startswith(config.REMOVE_PREFIX):
        line = line[len(config.REMOVE_PREFIX):]

But the point is taken - indeed, any confusing, inconsistent behavior can be fixed on the users' side with more if's, once they discover it.

--
Best regards, Paul mailto:pmiscml@gmail.com
On Tue, Mar 24, 2020 at 07:14:16PM +0100, Victor Stinner wrote:
They make as much sense as any other null-operation, such as subtracting 0 or deleting empty slices from lists.

Every string s is unchanged if you prepend or concatenate the empty string:

    assert s == ''+s == s+''

so removing the empty string should obey the same invariant:

    assert s == s.removeprefix('') == s.removesuffix('')

--
Steven
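A minimal sketch of that invariant, using plain functions so it runs without the proposed methods. The names `remove_prefix` and `remove_suffix` are hypothetical stand-ins, not the PEP's reference code. The suffix case shows why hand-written versions often mishandle the empty string: `s[:-0]` is the same as `s[:0]`, which is empty.

```python
def remove_prefix(s, prefix):
    # Hypothetical stand-in for the proposed prefix-removal method.
    if s.startswith(prefix):
        return s[len(prefix):]
    return s


def remove_suffix(s, suffix):
    # Hypothetical stand-in for the proposed suffix-removal method.
    # The `suffix and` guard matters: without it, an empty suffix
    # would give s[:-0], i.e. s[:0], which is the empty string.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


for s in ("", "abc", "hello world"):
    # Adding the empty string changes nothing ...
    assert s == "" + s == s + ""
    # ... so removing it should change nothing either.
    assert s == remove_prefix(s, "") == remove_suffix(s, "")
```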
It seems that there is a consensus on the names ``removeprefix`` and ``removesuffix``. I will update the PEP accordingly. I'll also simplify sample Python implementation to primarily reflect *intent* over strict type-checking correctness, and I'll adjust the accompanying commentary accordingly. Lastly, since the issue of multiple prefixes/suffixes is more controversial and seems that it would not affect how the single-affix cases would work, I can remove that from this PEP and allow someone else with a stronger opinion about it to propose and defend a set of semantics in a different PEP. Is there any objection to deferring this to a different PEP? All the best, Dennis
On 3/24/2020 7:21 PM, Dennis Sweeney wrote:
It seems that there is a consensus on the names ``removeprefix`` and ``removesuffix``. I will update the PEP accordingly. I'll also simplify sample Python implementation to primarily reflect *intent* over strict type-checking correctness, and I'll adjust the accompanying commentary accordingly.
Lastly, since the issue of multiple prefixes/suffixes is more controversial and seems that it would not affect how the single-affix cases would work, I can remove that from this PEP and allow someone else with a stronger opinion about it to propose and defend a set of semantics in a different PEP. Is there any objection to deferring this to a different PEP?
No objection. I think that's a good idea. Eric
On Wed, 25 Mar 2020 at 00:29, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
Lastly, since the issue of multiple prefixes/suffixes is more controversial and seems that it would not affect how the single-affix cases would work, I can remove that from this PEP and allow someone else with a stronger opinion about it to propose and defend a set of semantics in a different PEP. Is there any objection to deferring this to a different PEP?
name.cutsuffix(('Mixin', 'Tests', 'Test')) is used in the "Motivating examples from the Python standard library" section. It looks like a nice usage of this feature. You added "There were many other such examples in the stdlib." What do you mean by controversial? I proposed to raise an exception if the prefix/suffix is empty to make cutsuffix(("", "suffix")) less surprising. But I'm also fine if you keep this behavior, since startswith/endswith accept an empty string, and someone wrote that accepting an empty prefix/suffix is a useful feature. Or did someone write that cutprefix/cutsuffix must not accept a tuple of strings? (I'm not sure that I was able to read all emails carefully.) I like the ability to pass multiple prefixes and suffixes because it makes the method similar to lstrip(), rstrip(), strip(), startswith(), endswith(), which all accept multiple "values" (characters to remove, prefixes, suffixes). Victor -- Night gathers, and now my watch begins. It shall not end until my death.
There were at least two comments suggesting keeping it to one affix at a time: https://mail.python.org/archives/list/python-dev@python.org/message/GPXSIDLK... https://mail.python.org/archives/list/python-dev@python.org/message/EDWFPEGQ... But I didn't see any big objections to the rest of the PEP, so I think maybe we keep it restricted for now.
Thanks for the pointers to emails.

Ethan Furman: "This is why replace() still only takes a single substring to match and this isn't supported: (...)"

Hum ok, it makes sense. I agree that we can start with only accepting str (reject tuple), and maybe reconsider the idea of accepting a tuple of str later.

Please move the idea to Rejected Ideas, but also try to summarize the reasons why the idea was rejected. I saw:

* surprising result for empty prefix/suffix
* surprising result for "FooBar text".cutprefix(("Foo", "FooBar"))
* issue with unordered sequence like set: only accept tuple which is ordered
* str.replace() only accepts str.replace(str, str) to avoid these issues: the idea of accepting str.replace(tuple of str, str) or variant was rejected multiple times. XXX does someone have references to past discussions? I found https://bugs.python.org/issue33647 which is a little bit different.

You may mention re.sub() as an existing efficient solution for the complex cases.

I have to confess that I had to think twice when I wrote my example line.cutsuffix(("\r\n", "\r", "\n")). Did I write suffixes in the correct order to get what I expect? :-) "\r\n" starts with "\r".

Victor

On Wed, 25 Mar 2020 at 01:44, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
On Wed, 25 Mar 2020 at 00:42, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
That sounds like a good idea. The issue for me is how the function should behave with a list of affixes if one is a prefix of another, e.g., removeprefix(('Test', 'Tests')). The empty string case is just one form of that. The behaviour should be defined clearly, and while I imagine "always remove the longest" is the "obvious" sensible choice, I am fairly certain there will be other opinions :-) So deferring the decision for now until we have more experience with the single-affix form seems perfectly reasonable. I'm not even sure that switching to multiple affixes later would need a PEP - it might be fine to add via a simple feature request issue. But that can be a decision for later, too. Paul
On 25Mar2020 08:14, Paul Moore <p.f.moore@gmail.com> wrote:
I'd like to preface this with "I'm fine to implement multiple affixes later, if at all". That said: To me "first match" is the _only_ sensible choice. "longest match" can always be implemented with a "first match" function by sorting on length if desired. Also, "longest first" requires the implementation to do a prescan of the supplied affixes whereas "first match" lets the implementation just iterate over the choices as supplied. I'm beginning to think I must again threaten my partner's anecdote about Netscape Proxy's rule system, which prioritised rules by the lexical length of their regexp, not their config file order of appearance. That way lies (and, indeed, lay) madness. Cheers, Cameron Simpson <cs@cskk.id.au>
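The claim that "longest match" can be layered on top of "first match" by sorting on length can be sketched as follows. Both function names are hypothetical; neither appears in the PEP or in the thread.

```python
def remove_first_matching_prefix(s, prefixes):
    """Try prefixes in the order given and remove the first that matches."""
    for prefix in prefixes:
        if s.startswith(prefix):
            return s[len(prefix):]
    return s


def remove_longest_matching_prefix(s, prefixes):
    """Build "longest match" on top of "first match" by sorting on length."""
    # Longest candidates are tried first, so the first match found
    # is also the longest one.
    by_length = sorted(prefixes, key=len, reverse=True)
    return remove_first_matching_prefix(s, by_length)


name = "Tests.run"
print(remove_first_matching_prefix(name, ("Test", "Tests")))    # s.run
print(remove_longest_matching_prefix(name, ("Test", "Tests")))  # .run
```

The reverse direction is not available: a function that always picks the longest match gives the caller no way to express a preferred order.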
PEP 616 -- String methods to remove prefixes and suffixes is available here: https://www.python.org/dev/peps/pep-0616/

Changes:

- Only accept single affixes, not tuples
- Make the specification more concise
- Make fewer stylistic prescriptions for usage
- Fix typos

A reference implementation GitHub PR is up to date here: https://github.com/python/cpython/pull/18939

Are there any more comments for it before submission?
What do you think of adding a Version History section which lists the most important changes since you proposed the first version of the PEP? I recall:

* Version 3: don't accept tuple
* Version 2: Rename cutprefix/cutsuffix to removeprefix/removesuffix, accept tuple
* Version 1: initial version

For example, for my PEP 587, I wrote detailed changes, but I don't think that you should go into the details ;-) https://www.python.org/dev/peps/pep-0587/#version-history

Victor

On Sat, 28 Mar 2020 at 06:11, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
My intent is to help people like me to follow the discussion on the PEP. There are more than 100 messages, so it's hard to follow PEP updates. Victor On Sun, 29 Mar 2020 at 14:55, Rob Cliffe via Python-Dev <python-dev@python.org> wrote:
Hello all, It seems that most of the discussion has settled down, but I didn't quite understand from reading PEP 1 what the next step should be -- is this an appropriate time to open an issue on the Steering Council GitHub repository requesting pronouncement on PEP 616? Best, Dennis
I suggest you wait one more week to let other people comment on the PEP. After this delay, if you consider that the PEP is ready for pronouncement, you can submit it to the Steering Council, right. Victor On Wed, 1 Apr 2020 at 21:56, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
On Thu., 2 Apr. 2020, 8:30 am Victor Stinner, <vstinner@python.org> wrote:
Note that the submission to the Steering Council doesn't have to be a request for immediate pronouncement - it's a notification that the PEP is mature enough for the Council to decide whether to appoint a Council member as BDFL-Delegate or to appoint someone else. The decision on whether to wait for more questions is then up to the Council and/or the appointed BDFL-Delegate. PEP 616 definitely looks mature enough for that step to me (and potentially even immediately accepted - it did get dissected pretty thoroughly, after all!) Cheers, Nick.
On 03/20/2020 11:52 AM, Dennis Sweeney wrote:
Thank you, Dennis, for putting this together! And Eric for sponsoring. :) Overall I think it's a good idea, but...
Um, what mnemonic? I am strongly opposed to the chosen names of `cut*` -- these methods do basically the same thing as the existing `strip` methods (remove something from either end of a string), and so should have similar names: - the existence of `stripsuffix` is a clue/reminder that `strip` doesn't work with substrings - if all of these similar methods have similar names they will be grouped together in the documentation making discovery of the correct one much easier. So for this iteration of the PEP, I am -1 -- ~Ethan~
Thanks for the feedback! I meant mnemonic as in the broader sense of "way of remembering things", not some kind of rhyming device or acronym. Maybe "mnemonic" isn't the perfect word. I was just trying to say that the structure of how the methods are named should reflect how their behavior relates to one another, which it seems you agree with.

Fair enough that ``[l/r]strip`` and the proposed methods share the behavior of "removing something from the end of a string". From that perspective, they're similar. But my thought was that ``s.lstrip("abc")`` has extremely similar behavior when changing "lstrip" to "rstrip" or "strip" -- the argument is interpreted in exactly the same way (as a character set) in each case. Looking at how the argument is used, I'd argue that ``lstrip``/``rstrip``/``strip`` are much more similar to each other than they are to the proposed methods, and that the proposed methods are perhaps more similar to something like ``str.replace``. But it does seem pretty subjective what the threshold is for behavior similar enough to have related names -- I see where you're coming from.

Also, the docs at ( https://docs.python.org/3/library/stdtypes.html?highlight=lstrip#string-meth... ) are alphabetical, not grouped by "similar names", so even ``lstrip``, ``strip``, and ``rstrip`` are already in different places. Maybe the name "stripprefix" would be more discoverable when "Ctrl-f"ing the docs, if it weren't for the following addition in the linked PR:

    .. method:: str.lstrip([chars])

       Return a copy of the string with leading characters removed. The *chars*
       argument is a string specifying the set of characters to be removed. If omitted
       or ``None``, the *chars* argument defaults to removing whitespace.
       The *chars* argument is not a prefix; rather, all combinations of its values are
       stripped::

          >>> ' spacious '.lstrip()
          'spacious '
          >>> 'www.example.com'.lstrip('cmowz.')
          'example.com'

    +  See :meth:`str.cutprefix` for a method that will remove a single prefix
    +  string rather than all of a set of characters.
On Fri, 20 Mar 2020 20:49:12 -0000 "Dennis Sweeney" <sweeney.dennis650@gmail.com> wrote:
Correct, but I don't like the word "cut" because it suggests that something is cut into pieces which can be used later separately. I'd propose to use "trim" instead of "cut" because it makes clear that something is cut off and discarded, and it is clearly different from "strip".
On 21Mar2020 14:17, musbur@posteo.org <musbur@posteo.org> wrote:
Please, NO. "trim" is a VERY well known PHP function, and does what our strip does. I'm very against this (otherwise fine) word for this reason. I still prefer "cut", though the consensus seems to be for "strip". Cheers, Cameron Simpson <cs@cskk.id.au>
On Fri, Mar 20, 2020 at 11:56 AM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
The second sentence above unambiguously states that cutprefix returns 'an unchanged *copy*', but the example contradicts that and shows that 'self' may be returned and not a copy. I think it should be reworded to explicitly allow the optimization of returning self.
For clarity, I'll change

    If ``s`` does not have ``pre`` as a prefix, an unchanged copy of ``s`` is returned.

to

    If ``s`` does not have ``pre`` as a prefix, then ``s.cutprefix(pre)`` returns ``s`` or an unchanged copy of ``s``.

For consistency with the Specification section, I'll also change

    s[len(pre):] if s.startswith(pre) else s

to

    s[len(pre):] if s.startswith(pre) else s[:]

and similarly change the ``cutsuffix`` snippet.
On 20Mar2020 13:57, Eric Fahlgren <ericfahlgren@gmail.com> wrote:
My versions of these (plain old functions) return self if unchanged, and are explicitly documented as doing so. This has the concrete advantage that one can test for nonremoval of the suffix with "is", which is very fast, instead of == which may not be. So one writes (assuming methods):

    prefix = cutsuffix(s, 'abc')
    if prefix is s:
        ... no change
    else:
        ... definitely changed, s != prefix also

I am explicitly in favour of returning self if unchanged.

Cheers, Cameron Simpson <cs@cskk.id.au>
On 3/21/2020 11:20 AM, Ned Batchelder wrote:
The only reason I can think of is to enable the test above: did a suffix/prefix removal take place? That seems like a useful thing. I think if we don't specify the behavior one way or the other, people are going to rely on CPython's behavior here, consciously or not. Is there some Python implementation that would have a problem with the "is" test, if we were being this prescriptive? Honest question. Of course this would open the question of what to do if the suffix is the empty string. But since "'foo'.startswith('')" is True, maybe we'd have to return a copy in that case. It would be odd to have "s.startswith('')" be true, but "s.cutprefix('') is s" also be True. Or, since there's already talk in the PEP about what happens if the prefix/suffix is the empty string, if we adopt the "is" behavior we'd add more details there. Like "if the result is the same object as self, it means either the suffix is the empty string, or self didn't start with the suffix". Eric
Well, if CPython is modified to implement tagged pointers and supports storing short strings (a few latin1 characters) as a pointer, it may become harder to keep the same behavior for "x is y" where x and y are strings. Victor On Sat, 21 Mar 2020 at 17:23, Eric V. Smith <eric@trueblade.com> wrote:
On 3/21/2020 12:39 PM, Victor Stinner wrote:
Good point. And I guess it's still a problem for interned strings, since even a copy could be the same object:
So I now agree with Ned, we shouldn't be prescriptive here, and we should explicitly say in the PEP that there's no way to tell if the strip/cut/whatever took place, other than comparing via equality, not identity. Eric
In that case, the PEP should advise using .startswith() or .endswith() explicitly if the caller needs to know whether the string is going to be modified. Example:

    modified = False
    # O(n) complexity where n=len("prefix:")
    if line.startswith("prefix:"):
        line = line.cutprefix("prefix:")
        modified = True

It should be more efficient than:

    old_line = line
    line = line.cutprefix("prefix:")
    modified = (line != old_line)  # O(n) complexity where n=len(line)

since the checked prefix is usually way shorter than the whole string.

Victor

On Sat, 21 Mar 2020 at 17:45, Eric V. Smith <eric@trueblade.com> wrote:
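The "check first, then cut" pattern suggested here can be wrapped in a small helper that reports whether anything was removed. This is a sketch only: `cut_prefix_reporting` is a hypothetical name, and treating an empty prefix as "not modified" is an assumption made for this example rather than something the thread settled.

```python
def cut_prefix_reporting(line, prefix):
    """Return (new_line, modified) for a single prefix.

    Hypothetical helper; the startswith() check costs O(len(prefix)),
    independent of how long `line` is.
    """
    # Assumption: an empty prefix never counts as a modification.
    if prefix and line.startswith(prefix):
        return line[len(prefix):], True
    return line, False


print(cut_prefix_reporting("prefix:value", "prefix:"))  # ('value', True)
print(cut_prefix_reporting("other:value", "prefix:"))   # ('other:value', False)
```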
On 21Mar2020 12:45, Eric V. Smith <eric@trueblade.com> wrote:
Are you suggesting that it could become impossible to write this function:

    def myself(o):
        return o

and not be able to rely on "o is myself(o)"? That seems... a pretty nasty breaking change for the language.
Unless Victor asserts that a function like myself() above cannot be relied on to have its return value "is" its passed in value, I disagree. The beauty of returning the original object on no change is that the test is O(1) and the criterion is clear. It is easy to document that stripping an empty affix returns the original string. I guess a test for len(stripped_string) == len(unstripped_string) is also O(1), and is less prescriptive. I just don't see the weight to Ned's characterisation of "a is/is-not b" as overly prescriptive; returning the same reference as one is given seems nearly the easiest thing a function can ever do. Cheers, Cameron Simpson <cs@cskk.id.au>
On Sun, 22 Mar 2020 at 15:13, Cameron Simpson <cs@cskk.id.au> wrote:
Other way around - because strings are immutable, their identity isn't supposed to matter, so it's possible that functions that currently return the exact same object in some cases may in the future start returning a different object with the same value. Right now, in CPython, with no tagged pointers, we return the full existing pointer wherever we can, as that saves us a data copy. With tagged pointers, the pointer storage effectively *is* the instance, so you can't really replicate that existing "copy the reference not the storage" behaviour any more. That said, it's also possible that identity for tagged pointers would be value based (similar to the effect of the small integer cache and string interning), in which case the entire question would become moot. Either way, the PEP shouldn't be specifying that a new object *must* be returned, and it also shouldn't be specifying that the same object *can't* be returned. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
I don't see any rationale in the PEP or in the python-ideas thread (admittedly I didn't read the whole thing, I just Ctrl + F-ed "subclass" there). Is this just for consistency with other methods like .casefold? I can understand why you'd want it to be consistent, but I think it's misguided in this case. It adds unnecessary complexity for subclass implementers to need to re-implement these two additional methods, and I can see no obvious reason why this behavior would be necessary, since these methods can be implemented in terms of string slicing. Even if you wanted to use `str`-specific optimizations in C that aren't available if you are constrained to use the subclass's __getitem__, it's inexpensive to add a "PyUnicode_CheckExact(self)" check to hit a "fast path" that doesn't use slice. I think defining this in terms of string slicing makes the most sense (and, notably, slice itself returns `str` unless explicitly overridden, the default is for it to return `str` anyway...). Either way, it would be nice to see the rationale included in the PEP somewhere. Best, Paul On 3/22/20 7:16 AM, Eric V. Smith wrote:
tl;dr A method implemented in C is more efficient than hand-written pure-Python code, and it's less error-prone.

I don't know if it has already been said previously, but I hate having to compute the string length manually when writing:

    if line.startswith("prefix"):
        line = line[6:]

Usually what I do is open a Python REPL, type len("prefix") and copy-paste the result :-)

Passing the length directly is a risk of mistake. What if I write line[7:] and it works most of the time because of a space, but sometimes the space is omitted randomly and the application fails?

--

The lazy approach is:

    if line.startswith("prefix"):
        line = line[len("prefix"):]

Such code makes my "micro-optimizer heart" bleed since I know that Python is stupid and calls len() at runtime; the compiler is unable to optimize it (sadly for good reasons, the len name can be overridden) :-(

=> line.cutprefix("prefix") is more efficient! ;-) It's also shorter.

Victor

On Sun, 22 Mar 2020 at 17:02, Paul Ganssle <paul@ganssle.io> wrote:
Sorry, I think I accidentally left out a clause here - I meant that the rationale for /always returning a 'str'/ (as opposed to returning a subclass) is missing, it just says in the PEP:
I think the rationale for these differences is not made entirely clear, specifically the "and will cast subclasses of str to builtin str objects" part. I think it would be best to define the truncation in terms of __getitem__ - possibly with the caveat that implementations are allowed (but not required) to return `self` unchanged if no match is found. Best, Paul P.S. Dennis - just noticed in this reply that there is a typo in the PEP - s/instace/instance On 3/22/20 12:15 PM, Victor Stinner wrote:
On Sun, Mar 22, 2020 at 4:20 AM Eric V. Smith <eric@trueblade.com> wrote:
Yes. Returning self if the class is exactly str is *just* an optimization -- it must not be mandated nor ruled out. And we *have* to decide that it returns a plain str instance if called on a subclass instance (unless overridden, of course) since the base class (str) won't know the signature of the subclass constructor. That's also why all other str methods return an instance of plain str when called on a subclass instance. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
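The point about subclass constructors can be illustrated with a short sketch. `Tagged` is a hypothetical subclass invented for this example; its constructor takes an extra required argument, so `str` itself would have no way to build a new `Tagged` from inside one of its own methods.

```python
class Tagged(str):
    """Hypothetical str subclass whose constructor needs an extra argument."""

    def __new__(cls, value, tag):
        obj = super().__new__(cls, value)
        obj.tag = tag
        return obj


t = Tagged("prefix-body", tag="demo")

# Inherited str methods cannot know about `tag`, so they hand back
# plain str objects instead of trying to construct a Tagged.
upper = t.upper()
sliced = t[len("prefix-"):]

print(type(t).__name__)       # Tagged
print(type(upper).__name__)   # str
print(type(sliced).__name__)  # str
```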
My suggestion is to rely on __getitem__ here (for subclasses), in which case we don't actually need to know the subclass constructor. The rough implementation in the PEP shows how to do it without needing to know the subclass constructor:

    def redbikeshed(self, prefix):
        if self.startswith(prefix):
            return self[len(prefix):]
        return self[:]

The actual implementation doesn't need to be implemented that way; as long as the result is always the result of slicing the original string, it's safe to do so* and more convenient for subclass implementers (who now only have to implement __getitem__ to get the affix-trimming functions for free).

One downside to this scheme is that I think it makes getting the type hinting right more complicated, since the return type of these functions is basically, "Whatever the return type of self.__getitem__ is", but I don't think anyone will complain if you write -> str with the understanding that __getitem__ should return a str or a subtype thereof.

Best, Paul

*Assuming they haven't messed with __getitem__ to do something non-standard, but if they've done that I think they've tossed Liskov substitution out the window and will have to re-implement these methods if they want them to work.

On 3/22/20 2:03 PM, Guido van Rossum wrote:
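A minimal sketch of the slicing-based alternative proposed in this message, assuming a subclass that overrides `__getitem__` to preserve its own type. `Label` and `cut_prefix_by_slicing` are hypothetical names used only for illustration; this is the rejected-in-discussion alternative, not how the built-in `str` methods behave.

```python
class Label(str):
    """Hypothetical subclass that keeps its type when indexed or sliced."""

    def __getitem__(self, index):
        return type(self)(super().__getitem__(index))


def cut_prefix_by_slicing(s, prefix):
    """Remove `prefix` using only startswith() and slicing."""
    if s.startswith(prefix):
        return s[len(prefix):]
    # Slicing in the no-match case too, so the result type is
    # always whatever __getitem__ returns.
    return s[:]


label = Label("src/module.py")
matched = cut_prefix_by_slicing(label, "src/")
unmatched = cut_prefix_by_slicing(label, "lib/")

print(type(matched).__name__, matched)      # Label module.py
print(type(unmatched).__name__, unmatched)  # Label src/module.py
```

With a plain `str` argument the same function returns plain `str`, since `str.__getitem__` does.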
On 21/03/2020 16:15, Eric V. Smith wrote:
*If* no python implementation would have a problem with the "is" test (and from a position of total ignorance I would guess that this is the case :-)), then it would be a useful feature and it is easier to define it now than try to force conformance later. I have no problem with 's.startswith("") == True and s.cutprefix("") is s'. YMMV. Rob Cliffe
On 3/21/20 12:51 PM, Rob Cliffe via Python-Dev wrote:
Why take on that "*If*" conditional? We're constantly telling people not to compare strings with "is". So why define how "is" will behave in this PEP? It's the implementation's decision whether to return a new immutable object with the same value, or the same object. As Steven points out elsewhere in this thread, the behavior of Python's builtins differs, across methods and versions, in this regard. I certainly didn't know that, and it was probably news to you as well. So why do we need to nail it down for suffixes and prefixes? There will be no conformance to force later, because if the value doesn't change, then it doesn't matter whether it's a new string or the same string. --Ned.
On Sat, Mar 21, 2020 at 12:15:21PM -0400, Eric V. Smith wrote:
On 3/21/2020 11:20 AM, Ned Batchelder wrote:
I agree with Ned -- whether the string object is returned unchanged or a copy is an implementation decision, not a language decision. [Eric]
The only reason I can think of is to enable the test above: did a suffix/prefix removal take place? That seems like a useful thing.
We don't make this guarantee about string identity for any other string method, and CPython's behaviour varies from method to method:

    py> s = 'a b c'
    py> s is s.strip()
    True
    py> s is s.lower()
    False

and version to version:

    py> s is s.replace('a', 'a')  # 2.7
    False
    py> s is s.replace('a', 'a')  # 3.5
    True

I've never seen anyone relying on this behaviour, and I don't expect these new methods will change that. Thinking that `is` is another way of writing `==`, yes, I see that frequently. But relying on object identity to see whether a new string was created by a method, no.

If you want to know whether a prefix/suffix was removed, there's a more reliable way than identity and a cheaper way than O(N) equality. Just compare the length of the string before and after. If the lengths are the same, nothing was removed.

--
Steven
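The length comparison suggested above might look like the following sketch. `remove_prefix` is a hypothetical pure-Python stand-in for the proposed method, and `was_removed` is an illustrative helper; the check relies only on `len()` and never on object identity.

```python
def remove_prefix(s, prefix):
    # Hypothetical stand-in for the proposed method.
    if s.startswith(prefix):
        return s[len(prefix):]
    return s


def was_removed(before, after):
    """True if an affix was removed, judged by length alone.

    len() on a str is O(1), so this avoids both the identity test
    and an O(N) comparison of the contents.
    """
    return len(after) != len(before)


line = "DEBUG: cache miss"
print(was_removed(line, remove_prefix(line, "DEBUG: ")))  # True
print(was_removed(line, remove_prefix(line, "INFO: ")))   # False
# Removing the empty prefix leaves the length unchanged as well.
print(was_removed(line, remove_prefix(line, "")))         # False
```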
On 22Mar2020 05:09, Steven D'Aprano <steve@pearwood.info> wrote:
Well, ok, expressed on this basis, colour me convinced. I'm not ok with not mandating that no change to the string returns an equal string (but, really, _only_ because i can do a test with len(), as I consider a test of content wildly excessive - potentially quite expensive - strings are not always short).
Aye. Cheers, Cameron Simpson <cs@cskk.id.au>
Hi Dennis, Thanks for writing a proper PEP. It is easier to review a specification than an implementation. On Fri, 20 Mar 2020 at 20:00, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
It would be nice to describe the behavior of these methods in a short sentence here.
IMHO the abstract should stop here. You should move the above text in the Specification section. The abstract shouldn't go into details.
(...)
I'm not sure that I'm comfortable with not specifying whether the method must return the string unmodified or return a copy if it doesn't start with the prefix. It can cause subtle issues: see the "Allow multiple prefixes" example, which expects that it doesn't return a copy.

Usually, PyPy does its best to mimic CPython's exact behavior anyway, since applications rely on it (even if that's a bad thing). Fortunately, Python 3.8 started to emit a SyntaxWarning when the "is" operator is used to compare an object to a string literal (like: x is "abc").

I suggest always requiring the unmodified string to be returned. Honestly, it's not hard to guarantee and implement this behavior in Python!

IMHO you should also test if pre is non-empty just to make the intent more explicit.

Note: please rename "pre" to "prefix".

In short, I propose:

    def cutprefix(self: str, prefix: str, /) -> str:
        if self.startswith(prefix) and prefix:
            return self[len(prefix):]
        else:
            return self

I call startswith() before testing if pre is non-empty to inherit startswith()'s input type validation. For example, "a".startswith(b'x') raises a TypeError.

I also suggest removing the duplicated "rough specification" from the abstract: "s[len(pre):] if s.startswith(pre) else s". Only one specification per PEP is enough ;-)
The two methods will also be added to ``collections.UserString``, where they rely on the implementation of the new ``str`` methods.
I don't think that mentioning "where they rely on the implementation of the new ``str`` methods" is worth it. The spec can leave this part to the implementation.
IMO there are too many examples. For example, refactor.py and c_annotations.py are more or less the same. Just keep refactor.py. Overall, 2 or 3 examples should be enough.
I like the ability to specify multiple prefixes or suffixes. If the order is an issue, only allow tuple and list types and you're done. I don't see how disallowing s.cutprefix(('Foo', 'FooBar')) but allowing s.cutprefix('Foo').cutprefix('FooBar') prevents any risk of mistake. I'm sure that there are many use cases for cutsuffix() accepting multiple suffixes. IMO it makes the method even more attractive and efficient. Example to remove newline suffix (Dos, Unix and macOS newlines): line.cutsuffix(("\r\n", "\n", "\r")). It's not ambiguous: "\r\n" is tested first explicitly, then "\r".
Well, even if it's less efficient, I think that I would prefer to write: while s.endswith("\n"): s = s.cutsuffix("\n") ... especially because the specification doesn't (currently) require to return the string unmodified if it doesn't end with the suffix...
You may add that it makes cutprefix() and cutsuffix() methods consistent with the strip() functions family. "abc".strip() doesn't raise. startswith() and endswith() methods can be used to explicitly raise an exception if there is no match. Victor -- Night gathers, and now my watch begins. It shall not end until my death.
Hi Victor. I accidentally created a new thread, but I intended everything below as a response: Thanks for the review!
This still erroneously accepts tuples and would return str subclasses unchanged. If we want to make the Python be the spec with accuracy about type-checking, then perhaps we want:

    def cutprefix(self: str, prefix: str, /) -> str:
        if not isinstance(prefix, str):
            raise TypeError(f'cutprefix() argument must be str, '
                            f'not {type(prefix).__qualname__}')
        self = str(self)
        prefix = str(prefix)
        if self.startswith(prefix):
            return self[len(prefix):]
        else:
            return self

For accepting multiple prefixes, I can't tell if there's a consensus about whether ``s = s.cutprefix("a", "b", "c")`` should be the same as

    for prefix in ["a", "b", "c"]:
        s = s.cutprefix(prefix)

or

    for prefix in ["a", "b", "c"]:
        if s.startswith(prefix):
            s = s.cutprefix(prefix)
            break

The latter seems to be harder for users to implement through other means, and it's the behavior that test_concurrent_futures.py has implemented now, so maybe that's what we want.

Also, it seems more elegant to me to accept variadic arguments, rather than a single tuple of arguments. Is it worth it to match the related-but-not-the-same API of "startswith" if it makes for uglier Python? My gut reaction is to prefer the varargs, but maybe someone has a different perspective.

I can submit a revision to the PEP with some changes soon.
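To make the difference between the two readings concrete, here is a small sketch. The helper names `cut_each` and `cut_first` are hypothetical and exist only for this illustration; neither is part of the proposal.

```python
def cut_each(s, prefixes):
    # First reading: try every prefix in turn, cutting each one that matches.
    for prefix in prefixes:
        if prefix and s.startswith(prefix):
            s = s[len(prefix):]
    return s


def cut_first(s, prefixes):
    # Second reading: cut only the first prefix that matches, then stop.
    for prefix in prefixes:
        if prefix and s.startswith(prefix):
            return s[len(prefix):]
    return s


print(repr(cut_each("FooBar", ("Foo", "Bar"))))   # ''
print(repr(cut_first("FooBar", ("Foo", "Bar"))))  # 'Bar'
```

The two readings agree whenever at most one prefix applies and diverge as soon as a second prefix matches what is left after the first cut.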
On Sun, 22 Mar 2020 at 01:45, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
I expect that "FooBar".cutprefix(("Foo", "Bar")) returns "Bar". IMO it's consistent with "FooFoo".cutprefix("Foo") which only returns "Foo" and not "": https://www.python.org/dev/peps/pep-0616/#remove-multiple-copies-of-a-prefix If you want to remove both prefixes, "FooBar".cutprefix("Foo").cutprefix("Bar") should be called to get "".
I suggest accepting a tuple of strings:

    str.cutprefix(("prefix1", "prefix2"))

To be consistent with startswith():

    str.startswith(("prefix1", "prefix2"))

cutprefix() and startswith() can be used together and so I would prefer to have the same API:

    prefixes = ("context: ", "ctx:")
    has_prefix = False
    if line.startswith(prefixes):
        line = line.cutprefix(prefixes)
        has_prefix = True

A different API would look more surprising, no? Compare it to:

    prefixes = ("context: ", "ctx:")
    has_prefix = False
    if line.startswith(prefixes):
        line = line.cutprefix(*prefixes)  # <== HERE
        has_prefix = True

The difference is even more visible if you pass the prefixes directly:

    .cutprefix("context: ", "ctx:")

vs

    .cutprefix(("context: ", "ctx:"))

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
On Fri, Mar 20, 2020 at 3:28 PM Victor Stinner <vstinner@python.org> wrote:
I tend to be mistrustful of code that tries to guess the best thing to do, when something expected isn't found. How about:

    def cutprefix(self: str, pre: str, raise_on_no_match: bool=False, /) -> str:
        if self.startswith(pre):
            return self[len(pre):]
        if raise_on_no_match:
            raise ValueError('prefix not found')
        return self[:]
On 23/03/2020 14:50, Dan Stromberg wrote:
I'm firmly of the opinion that the functions should either raise or not, and should definitely not have a parameter to switch behaviours. Probably it should do nothing; if the programmer needs to know that the prefix wasn't there, cutprefix() probably wasn't the right thing to use anyway. -- Rhodri James *-* Kynesim Ltd
On Fri, Mar 20, 2020 at 11:54 AM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
This is a proposal to add two new methods, ``cutprefix`` and ``cutsuffix``, to the APIs of Python's various string objects.
The names should use "start" and "end" instead of "prefix" and "suffix", to reduce the jargon factor and for consistency with startswith/endswith. -n -- Nathaniel J. Smith -- https://vorpus.org
On Fri, Mar 20, 2020 at 06:18:20PM -0700, Nathaniel Smith wrote:
Prefix and suffix aren't jargon. They teach those words to kids in primary school. Why the concern over "jargon"? We happily talk about exception, metaclass, thread, process, CPU, gigabyte, async, ethernet, socket, hexadecimal, iterator, class, instance, HTTP, boolean, etc without blinking, but you're shying at prefix and suffix? -- Steven
Even then, it seems that prefix is an established computer science term: [1] https://en.wikipedia.org/wiki/Substring#Prefix [2] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L. (1990). Introduction to Algorithms (1st ed.). Chapter 15.4: Longest common subsequence And a quick search reveals that it's used hundreds of times in the docs: https://docs.python.org/3/search.html?q=prefix
On Sat, Mar 21, 2020 at 11:35 AM Steven D'Aprano <steve@pearwood.info> wrote:
Whereas they don't have to teach "start" and "end", because kids already know them before they start school.
Yeah. Jargon is fine when there's no regular word with appropriate precision, but we shouldn't use jargon just for jargon's sake. Python has a long tradition of preferring regular words when possible, e.g. using not/and/or instead of !/&&/||, and startswith/endswith instead of hasprefix/hassuffix. -n -- Nathaniel J. Smith -- https://vorpus.org
On Sun, Mar 22, 2020 at 1:02 PM Nathaniel Smith <njs@pobox.com> wrote:
Given that the word "prefix" appears in help("".startswith), I don't think there's really a lot to be gained by arguing this point :) There's absolutely nothing wrong with the word. But Dennis, welcome to the wonderful world of change proposals, where you will experience insane amounts of pushback and debate on the finest points of bikeshedding, whether or not people actually even support the proposal at all... ChrisA
Lol -- thanks! In my mind, another reason that I like including the words "prefix" and "suffix" over "start" and "end" is that, even though using the verb "end" in "endswith" is unambiguous, the noun "end" can be used as either the initial or final end, as in "remove this thing from both ends of the string". So "suffix" feels more precise to me.
On Sat., 21 Mar. 2020, 11:19 am Nathaniel Smith, <njs@pobox.com> wrote:
This would also be more consistent with startswith() & endswith(). (For folks querying this: the relevant domain here is "str builtin method names", and we already use startswith/endswith there, not hasprefix/hassuffix. The most challenging relevant audience for new str builtin method *names* is also 10 year olds learning to program in school, not adults reading the documentation)

I think the concern about stripstart() & stripend() working with substrings, while strip/lstrip/rstrip work with character sets, is valid, but I also share the concern about introducing "cut" as yet another verb to learn in the already wide string API.

The example where the new function was used instead of a questionable use of replace gave me an idea, though: what if the new functions were "replacestart()" and "replaceend()"?

* uses "start" and "with" for consistency with the existing checks
* substring based, like the "replace" method
* can be combined with an extension of "replace()" to also accept a tuple of old values to match and replace to allow for consistency with checking for multiple prefixes or suffixes.

We'd expect the most common case to be the empty string, but I think the meaning of the following is clear, and consistent with the current practice of using replace() to delete text from anywhere within the string:

    s = s.replacestart('context.', '')

This approach would also very cleanly handle the last example from the PEP:

    s = s.replaceend(('Mixin', 'Tests', 'Test'), '')

The doubled 'e' in 'replaceend' isn't ideal, but if we went this way, I think keeping consistency with other str method names would be preferable to adding an underscore to the name.

Interestingly, you could also use this to match multiple prefixes or suffixes and find out *which one* matched (since the existing methods don't report that):

    s2 = s.replaceend(suffixes, '')
    suffix_len = len(s) - len(s2)
    suffix = s[-suffix_len:] if suffix_len else None

Cheers, Nick.
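The "which suffix matched" trick at the end of this message can be exercised with a stand-in for the suggested method. `replaceend` below is a hypothetical free function written for this sketch (no such `str` method exists), and it assumes first-match semantics when a tuple is given.

```python
def replaceend(s, old, new):
    # Hypothetical helper modelled on the suggestion; accepts one suffix
    # or a tuple of suffixes and replaces only the first one that matches.
    candidates = (old,) if isinstance(old, str) else tuple(old)
    for suffix in candidates:
        if suffix and s.endswith(suffix):
            return s[:len(s) - len(suffix)] + new
    return s


name = "SocketTests"
suffixes = ("Mixin", "Tests", "Test")

s2 = replaceend(name, suffixes, "")
suffix_len = len(name) - len(s2)
# Recover the matched suffix from the length difference.
suffix = name[-suffix_len:] if suffix_len else None

print(s2)      # Socket
print(suffix)  # Tests
```

Note that the length-difference trick only identifies the suffix when the replacement is the empty string; with a non-empty replacement the two lengths no longer differ by the suffix length.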
Nick Coghlan wrote:
FWIW, I don't place as much value on being consistent with "startswith()" and "endswith()". But with it being substring based, I think the term "replace" actually makes a lot more sense here compared to "cut". +1 On Sat, Mar 21, 2020 at 9:46 PM Nick Coghlan <ncoghlan@gmail.com> wrote:
On Sat, Mar 21, 2020 at 6:46 PM Nick Coghlan <ncoghlan@gmail.com> wrote:
To my language sense, hasprefix/hassuffix are horrible compared to startswith/endswith. If you were to talk about this kind of condition using English instead of Python, you wouldn't say "if x has prefix y", you'd say "if x starts with y". (I doubt any programming language uses hasPrefix or has_prefix for this, making it a strawman.) *But*, what would you say if you wanted to express the idea of removing something from the start or end? It's pretty verbose to say "remove y from the end of x", and it's not easy to translate that into a method name. x.removefromend(y)? Blech! And x.removeend(y) has the double 'e', which confuses the reader. The thing is that it's hard to translate "starts" (a verb) into a noun -- the "start" of something is its very beginning (i.e., in Python, position zero), while a "prefix" is a noun that specifically describes an initial substring (and I'm glad we don't have to use *that* :-).
It's not great, and I actually think that "stripprefix" and "stripsuffix" are reasonable. (I found that in Go, everything we call "strip" is called "Trim", and there are "TrimPrefix" and "TrimSuffix" functions that correspond to the PEP 616 functions.)
This feels like a hypergeneralization. In 99.9% of use cases we just need to remove the prefix or suffix. If you want to replace the suffix with something else, you can probably use string concatenation. (In the one use case I can think of, changing "foo.c" into "foo.o", it would make sense that plain "foo" ended up becoming "foo.o", so s.stripsuffix(".c") + ".o" actually works better there.)
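The "foo.c" to "foo.o" case can be sketched as follows. `stripsuffix` and `to_object_name` are hypothetical helper names used only for this illustration; `stripsuffix` stands in for the proposed method.

```python
def stripsuffix(s, suffix):
    # Stand-in for the proposed method: remove one trailing copy of suffix.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


def to_object_name(source):
    # Remove ".c" if it is there, then always append ".o".
    return stripsuffix(source, ".c") + ".o"


print(to_object_name("foo.c"))  # foo.o
print(to_object_name("foo"))    # foo.o
```

A suffix-replacing method would leave plain "foo" untouched, whereas removal followed by concatenation also covers the case where the old suffix is missing.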
This approach would also very cleanly handle the last example from the PEP:
s = s.replaceend(('Mixin', 'Tests', 'Test'), '')
Maybe the proposed functions can optionally take a tuple of prefixes/suffixes, like startswith/endswith do?
Agreed on the second part, I just really don't like the 'ee'.
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
On 22.03.2020 6:38, Guido van Rossum wrote:
I must note that names conforming to https://www.python.org/dev/peps/pep-0008/#function-and-variable-names would be "strip_prefix" and "strip_suffix".
-- Regards, Ivan
Ivan Pozdeev wrote:
In this case, being in line with the existing string API method names takes priority over PEP 8, e.g. splitlines, startswith, endswith, splitlines, etc. Although I agree that an underscore would probably be a bit easier to read here, it would be rather confusing to randomly swap between the naming convention for the same API. The benefit gained in *slightly* easier readability wouldn't make up for the headache IMO. On Sun, Mar 22, 2020 at 12:13 AM Ivan Pozdeev via Python-Dev <python-dev@python.org> wrote:
Oops, I just realized that I wrote "splitlines" twice there. I guess that goes to show how much I use that specific method in comparison to the others, but the point still stands. Here's a more comprehensive set of existing string methods to better demonstrate it (Python 3.8.2):
On Sun, Mar 22, 2020 at 12:17 AM Kyle Stanley <aeros167@gmail.com> wrote:
Nice PEP! That this discussion wound up in the NP-complete "naming things" territory as the main topic right from the start/prefix/beginning speaks highly of it. :) The only things left I have to add are (a) agreed on don't specify if it is a copy or not for str and bytes.. BUT (b) do specify that for bytearray. Being the only mutable type, it matters. Consistency with other bytearray methods based on https://docs.python.org/3/library/stdtypes.html#bytearray suggests copy. (Someone always wants inplace versions of bytearray methods, that is a separate topic not for this pep) Fwiw I *like* your cutprefix/suffix names. Avoiding the terms strip and trim is wise to avoid confusion and having the name read as nice English is Pythonic. I'm not going to vote on other suggestions. -gps On Sat, Mar 21, 2020, 9:32 PM Kyle Stanley <aeros167@gmail.com> wrote:
On Sun, 22 Mar 2020 at 06:07, Gregory P. Smith <greg@krypto.org> wrote:
Nice PEP! That this discussion wound up in the NP-complete "naming things" territory as the main topic right from the start/prefix/beginning speaks highly of it. :)
Maybe we should have a rule to disallow bikeshedding until the foundations of a PEP are settled. Or always create two threads per PEP: one for bikeshedding only, one for everything else :-D Victor -- Night gathers, and now my watch begins. It shall not end until my death.
On Sat, Mar 21, 2020 at 8:38 PM Guido van Rossum <guido@python.org> wrote:
Thinking a bit more, I could also get behind "removeprefix" and "removesuffix". -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
I like "removeprefix" and "removesuffix". My only concern before had been length, but three more characters than "cut***fix" is a small price to pay for clarity.
On Sun, Mar 22, 2020 at 05:00:10AM -0000, Dennis Sweeney wrote:
I personally rely on auto-complete of my editor while writing. So, thinking about these methods in "correct" terms might be more important to me than the length. +1 for removeprefix and removesuffix. Thanks, Senthil
On 2020-03-21 20:38, Guido van Rossum wrote:
To jump on the bikeshed, trimprefix and trimsuffix are the best I've read so far, due to the definitions of the words in English. Though often used interchangeably, when I think of "strip" I think of removing multiple things, somewhat indiscriminately with an arm motion, which is how the functions currently work. e.g. "strip paint", "strip clothes": https://www.dictionary.com/browse/strip to take away or remove When I think of trim, I think more of a single cut of higher precision with scissors. e.g. "trim hair", "trim branches": https://www.dictionary.com/browse/trim to put into a neat or orderly condition by clipping… Which is what this method would do. That trim matches Go is a small but decent benefit. Another person warned against inconsistency with PHP, but I don't think PHP should be considered for design guidance, IMHO. Perhaps as an example of what not to do, which happily is in agreement with the above. -Mike p.s. +1, I do support this PEP, with or without name change, since some mentioned concern over that.
Is there a proven use case for anything other than the empty string as the replacement? I prefer your "replacewhatever" to another "stripwhatever" name, and I think it's clear and nicely fits the behavior you proposed. But should we allow a naming convenience to dictate that the behavior should be generalized to a use case we're not sure exists, where the same argument is passed 99% of the time? I think a downside would be that a pass-a-string-or-a-tuple-of-strings interface would be more mental effort to keep track of than a ``*args`` variadic interface for "(cut/remove/without/trim)prefix", even if the former is how ``startswith()`` works.
On Sun, 22 Mar 2020 at 14:01, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
Is there a proven use case for anything other than the empty string as the replacement? I prefer your "replacewhatever" to another "stripwhatever" name, and I think it's clear and nicely fits the behavior you proposed. But should we allow a naming convenience to dictate that the behavior should be generalized to a use case we're not sure exists, where the same argument is passed 99% of the time?
I think so, as if we don't, then we'd end up with the following three methods on str objects (using Guido's suggested names of "removeprefix" and "removesuffix", as I genuinely like those):

* replace()
* removeprefix()
* removesuffix()

And the following questions still end up with relatively non-obvious answers:

Q: How do I do a replace, but only at the start or end of the string?
A: Use "new_prefix + s.removeprefix(old_prefix)" or "s.removesuffix(old_suffix) + new_suffix"

Q: How do I remove a substring from anywhere in a string, rather than just from the start or end?
A: Use "s.replace(substr, '')"

Most of that objection would go away if the PEP added a plain old "remove()" method in addition to removeprefix() and removesuffix(), though - the "replace the substring with an empty string" trick isn't the most obvious spelling in the world, whereas I'd expect a lot of folks to reach for "s.remove(substr)" based on the regular sequence API, and I think Guido's right that in many cases where a prefix or suffix is being changed, you also want to add it if the old prefix/suffix is missing (and in the cases where you don't, you can either use startswith()/endswith() first, or else check for a length change).
I think a downside would be that a pass-a-string-or-a-tuple-of-strings interface would be more mental effort to keep track of than a ``*args`` variadic interface for "(cut/remove/without/trim)prefix", even if the former is how ``startswith()`` works.
I doubt we'd use *args for any new string methods, precisely because we don't use it for any of the existing ones. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
-1 on "cut*" because my brain keeps reading it as "cute". +1 on "trim*" as it is clear what's going on and no confusion with preexisting methods. +1 on "remove*" for the same reasons as "trim*". And if no consensus is reached in this thread for a name I would assume the SC is going to ultimately decide on the name if the PEP is accepted as the burden of being known as "the person who chose _those_ method names on str" is more than any one person should have to bear. ;)
On Tue, Mar 24, 2020 at 2:53 PM Ethan Furman <ethan@stoneleaf.us> wrote:
I think name choice is easier if you write the documentation first:

cutprefix - Removes the specified prefix.
trimprefix - Removes the specified prefix.
stripprefix - Removes the specified prefix.
removeprefix - Removes the specified prefix.

Duh. :)
I'm also most strongly in favor of "remove*" (out of the above options). I'm opposed to cut*, mainly because it's too ambiguous in comparison to other options such as "remove*" and "replace*", which would do a much better job of explaining the operation performed. Without the .NET conflict, I would normally be +1 on "trim*" as well; with it in mind though, I'd lower it down to +0. Personally, I don't consider a conflict in a different ecosystem enough to lower it down to -0, but it still has some influence on my preference. So far, the consensus seems to be in favor of "remove*" with several +1s and no arguments against it (as far as I can tell), whereas the other options have been rather controversial. On Tue, Mar 24, 2020 at 3:38 PM Steve Dower <steve.dower@python.org> wrote:
On Tue, Mar 24, 2020 at 11:55 AM Brett Cannon <brett@python.org> wrote:
"raymondLuxuryYacht*" pronounced Throatwobbler Mangrove it is! Never fear, the entire stdlib is full of naming inconsistencies and questionable choices accumulated over time. Whatever is chosen will be lost in the noise and people will happily use it. The original PEP mentioned that trim had a different use in PHP which is why I suggest avoiding that one. I don't know how much crossover there actually is between PHP and Python programmers these days outside of FB. -gps * https://montypython.fandom.com/wiki/Raymond_Luxury-Yacht _______________________________________________
On 24Mar2020 18:49, Brett Cannon <brett@python.org> wrote:
I reiterate my huge -1 on "trim" because it will confuse every PHP user who comes to us from the dark side. Over there "trim" means what our "strip" means. I've got (differing) opinions about the others, but "trim" is a big one to me. Cheers, Cameron Simpson <cs@cskk.id.au>
On 20.03.2020 21:52, Dennis Sweeney wrote:
Does it need to be separate methods? Can we augment or even replace *strip() instead? E.g. *strip(chars: str, line: str) -> str As written in the PEP preface, the very reason for the PEP is that people are continuously trying to use *strip methods for the suggested functionality -- which shows that this is where they are expecting to find it. (as a bonus, we'll be saved from bikeshedding debates over the names) --- Then, https://mail.python.org/archives/list/python-ideas@python.org/thread/RJARZSU... suggests that the use of strip with character set argument may have fallen out of favor since its adoption. If that's the case, it can be deprecated in favor of the new use, thus saving us from extra complexity in perspective.
-- Regards, Ivan
On Sun, Mar 22, 2020 at 06:57:52AM +0300, Ivan Pozdeev via Python-Dev wrote:
Does it need to be separate methods?
Yes. Overloading a single method to do two dissimilar things is poor design.
They are only expecting to find it in strip() because there is no other alternative where it could be. There's nothing inherent about strip that means to delete a prefix or suffix, but when the only other choices are such obviously wrong methods as upper(), find(), replace(), count() etc it is easy to jump to the wrong conclusion that strip does what is wanted. -- Steven
Ivan Pozdeev via Python-Dev writes:
That is true. However, the rule of thumb (due to Guido, IIRC) is if the parameter is normally going to be a literal constant, and there are few such constants (like <= 3), put them in the name of the function rather than as values for an optional parameter. Overloading doesn't save much, if any, typing in this case. That's why we have strip, rstrip, and lstrip in the first place, although nowadays we'd likely spell the modifiers out (and maybe use start/end rather than left/right, which I would guess force BIDI users to translate to start/end on the fly). Steve
On 22Mar2020 08:10, Ivan Pozdeev <vano@mail.mipt.ru> wrote:
That is not the only difference. strip() does not just remove a character from the set provided (as a str). It removes as many of them as there are; that is why "foo.ext".strip(".ext") can actually be quite misleading to someone looking for a suffix remover - it often looks like it did the right thing. By contrast, cutprefix/cutsuffix (or stripsuffix, whatever) remove only _one_ instance of the affix. To my mind they are quite different, which is the basis of my personal dislike of reusing the word "strip". Just extending "strip()" with a funky new affix mode would be even worse, since it can _still_ be misleading if the caller omitted the special mode. Cheers, Cameron Simpson <cs@cskk.id.au>
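The contrast drawn here is easy to reproduce with the existing `str.rstrip` and a stand-in for the proposed method. `cutsuffix` below is a free function written for this sketch, and the file names are made up.

```python
def cutsuffix(s, suffix):
    # Stand-in for the proposed method: removes at most one trailing copy.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


# rstrip() treats ".ext" as the character set {'.', 'e', 'x', 't'}.
print("foo.ext".rstrip(".ext"))    # foo   (looks right by coincidence)
print("index.ext".rstrip(".ext"))  # ind   (eats into the name)
print("a.ext.ext".rstrip(".ext"))  # a     (removes every trailing match)

# The substring-based version removes exactly one instance of the suffix.
print(cutsuffix("index.ext", ".ext"))  # index
print(cutsuffix("a.ext.ext", ".ext"))  # a.ext
```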
My 2c on the naming: 'start' and 'end' in 'startswith' and 'endswith' are verbs, whereas we're looking for a noun if we want to cut/strip/trim a string. You can use 'start' and 'end' as nouns for this case but 'prefix' and 'suffix' seems a more obvious choice in English to me. Pathlib has `with_suffix()` and `with_name()`, which would give us something like `without_prefix()` or `without_suffix()` in this case. I think the name "strip", and the default (no-argument) behaviour of stripping whitespace implies that the method is used to strip something down to its bare essentials, like stripping a bed of its covers. Usually you use strip() to remove whitespace and get to the real important data. I don't think such an implication holds for removing a *specific* prefix/suffix. I also don't much like "strip" as the semantics are quite different - if i'm understanding correctly, we're removing a *single* instance of a *single* *multi-character* string. A verb like "trim" or "cut" seems appropriate to highlight that difference. Barney On Fri, 20 Mar 2020 at 18:59, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
Dennis: please add references to past discussions in python-ideas and python-dev. Link to the first email of each thread in these lists. Victor
Here's an updated version.

Online: https://www.python.org/dev/peps/pep-0616/
Source: https://raw.githubusercontent.com/python/peps/master/pep-0616.rst

Changes:

- More complete Python implementation to match what the type checking in the C implementation would be
- Clarified that returning ``self`` is an optimization
- Added links to past discussions on Python-Ideas and Python-Dev
- Specified ability to accept a tuple of strings
- Shorter abstract section and fewer stdlib examples
- Mentioned
- Typo and formatting fixes

I didn't change the name because it didn't seem like there was a strong consensus for an alternative yet. I liked the suggestions of ``dropprefix`` or ``removeprefix``.

All the best, Dennis
On 22/03/2020 22:25, Dennis Sweeney wrote:
Proofreading: it would not be obvious for users to have to call 'foobar'.cutprefix(('foo,)) for the common use case of a single prefix. Missing single quote after the last foo.
or the more obvious and readable alternative:
Er no, in both these examples s is reduced to an empty string. Best wishes Rob Cliffe
On 22Mar2020 23:33, Rob Cliffe <rob.cliffe@btinternet.com> wrote:
That surprises me too. I expect the first matching affix to be used. It is the only way for the caller to have a predictable policy. As a diversion, _are_ there use cases where an empty affix is useful or reasonable or likely? Cheers, Cameron Simpson <cs@cskk.id.au>
Cameron Simpson writes:
As a diversion, _are_ there use cases where an empty affix is useful or reasonable or likely?
In the "raise on failure" design, "aba".cutsuffix('.doc') raises, "aba".cutsuffix('.doc', '') returns "aba". BTW, since I'm here, thanks for your discussion of context managers for loop invariants. It was very enlightening.
On Sun, Mar 22, 2020 at 10:25:28PM -0000, Dennis Sweeney wrote:
I am concerned about that tuple of strings feature. First, an implementation question: you do this when the prefix is a tuple:

    if isinstance(prefix, tuple):
        for option in tuple(prefix):
            if not isinstance(option, str):
                raise TypeError()
            option_str = str(option)

which looks like two unnecessary copies:

1. Having confirmed that `prefix` is a tuple, you call tuple() to make a copy of it in order to iterate over it. Why?
2. Having confirmed that option is a string, you call str() on it to (potentially) make a copy. Why?

Aside from those questions about the reference implementation, I am concerned about the feature itself. No other string method that returns a modified copy of the string takes a tuple of alternatives.

* startswith and endswith do take a tuple of (pre/suff)ixes, but they don't return a modified copy; they just return a True or False flag;
* replace does return a modified copy, and only takes a single substring at a time;
* find/index/partition/split etc don't accept multiple substrings to search for.

That makes startswith/endswith the unusual ones, and we should be conservative before emulating them. The difficulty here is that the notion of "cut one of these prefixes" is ambiguous if two or more of the prefixes match. It doesn't matter for startswith:

    "extraordinary".startswith(('ex', 'extra'))

since it is True whether you match left-to-right, shortest-to-largest, or even in random order. But for cutprefix, which prefix should be deleted? Of course we can make a ruling by fiat, right now, and declare that it will cut the first matching prefix reading left to right, whether that's what users expect or not. That seems reasonable when your prefixes are hard-coded in the source, as above. But what happens here?

    prefixes = get_prefixes('user.config')
    result = mystring.cutprefix(prefixes)

Whatever decision we make -- delete the shortest match, longest match, first match, last match -- we're going to surprise and annoy the people who expected one of the other behaviours. This is why replace() still only takes a single substring to match and this isn't supported:

    "extraordinary".replace(('ex', 'extra'), '')

We ought to get some real-life exposure to the simple case first, before adding support for multiple prefixes/suffixes.

-- Steven
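The ambiguity described above can be made concrete with a small sketch. The helper `cut_prefix` and its `policy` parameter are hypothetical, invented here for illustration only; they are not part of the PEP or of any proposed API.

```python
def cut_prefix(s, prefixes, policy="first"):
    """Remove one matching prefix from s, chosen according to policy.

    Hypothetical helper: the policy names are assumptions for illustration.
    """
    matches = [p for p in prefixes if s.startswith(p)]
    if not matches:
        return s
    if policy == "first":
        chosen = matches[0]
    elif policy == "shortest":
        chosen = min(matches, key=len)
    elif policy == "longest":
        chosen = max(matches, key=len)
    else:
        raise ValueError(f"unknown policy: {policy!r}")
    return s[len(chosen):]


word = "extraordinary"
print(cut_prefix(word, ("ex", "extra"), "first"))    # traordinary
print(cut_prefix(word, ("ex", "extra"), "longest"))  # ordinary
print(cut_prefix(word, ("extra", "ex"), "first"))    # ordinary
```

With "first match" semantics the result depends on the order of the tuple, which is exactly the surprise a caller faces when the prefixes come from a config file rather than from source code.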
Steven D'Aprano wrote:
This was an attempt to ensure no one can do funny business with tuple or str subclassing. I was trying to emulate the ``PyTuple_Check`` followed by ``PyTuple_GET_SIZE`` and ``PyTuple_GET_ITEM`` that are done by the C implementation of ``str.startswith()`` to ensure that only the tuple/str methods are used, not arbitrary user subclass code. It seems that that's what most of the ``str`` methods force. I was mistaken in how to do this with pure Python. I believe I actually wanted something like:

    def cutprefix(self, prefix, /):
        if not isinstance(self, str):
            raise TypeError()
        if isinstance(prefix, tuple):
            for option in tuple.__iter__(prefix):
                if not isinstance(option, str):
                    raise TypeError()
                if str.startswith(self, option):
                    return str.__getitem__(
                        self, slice(str.__len__(option), None))
            return str.__getitem__(self, slice(None, None))
        if not isinstance(prefix, str):
            raise TypeError()
        if str.startswith(self, prefix):
            return str.__getitem__(self, slice(str.__len__(prefix), None))
        else:
            return str.__getitem__(self, slice(None, None))

... which looks even uglier.
We ought to get some real-life exposure to the simple case first, before adding support for multiple prefixes/suffixes.
I could be (and have been) convinced either way about whether or not to generalize to tuples of strings. I thought Victor made a good point about compatibility with ``startswith()``.
On 24/03/20 3:43 pm, Dennis Sweeney wrote:
The C code uses those functions for efficiency, not to prevent "funny business". PyTuple_GET_SIZE and PyTuple_GET_ITEM are macros that directly access fields of the tuple struct, and PyTuple_Check is much faster than a full isinstance check. There is no point in trying to emulate these in Python code. -- Greg
I think my confusion is about just how precise this sort of "reference implementation" should be. Should it behave with ``str`` and ``tuple`` subclasses exactly how it would when implemented? If so, I would expect the following to work:

    class S(str):
        __len__ = __getitem__ = __iter__ = None

    class T(tuple):
        __len__ = __getitem__ = __iter__ = None

    x = str.cutprefix("FooBar", T(("a", S("Foo"), 17)))
    assert x == "Bar"
    assert type(x) is str

and so I think the ``str.__getitem__(self, slice(str.__len__(prefix), None))`` monstrosity would be the most technically correct, unless I'm missing something. But I've never seen Python code so ugly. And I suppose this is a slippery slope -- should it also guard against people redefining ``len = lambda x: 5`` and ``str = list`` in the global scope? Clearly not. I think then maybe it would be preferred to use something like the following in the PEP:

    def cutprefix(self, prefix, /):
        if isinstance(prefix, str):
            if self.startswith(prefix):
                return self[len(prefix):]
            return self[:]
        elif isinstance(prefix, tuple):
            for option in prefix:
                if self.startswith(option):
                    return self[len(option):]
            return self[:]
        else:
            raise TypeError()

    def cutsuffix(self, suffix):
        if isinstance(suffix, str):
            if self.endswith(suffix):
                return self[:len(self)-len(suffix)]
            return self[:]
        elif isinstance(suffix, tuple):
            for option in suffix:
                if self.endswith(option):
                    return self[:len(self)-len(option)]
            return self[:]
        else:
            raise TypeError()

The above would fail the assertions as written before, but would pass them for subclasses ``class S(str): pass`` and ``class T(tuple): pass`` that do not override any dunder methods. Is this an acceptable compromise if it appears alongside a clarifying sentence like the following?

    These methods should always return base ``str`` objects, even when called on ``str`` subclasses.

I'm looking for guidance as to whether that's an appropriate level of precision for a PEP. If so, I'll make that change.

All the best, Dennis
On Tue, Mar 24, 2020 at 08:14:33PM -0000, Dennis Sweeney wrote:
I think my confusion is about just how precise this sort of "reference implementation" should be. Should it behave with ``str`` and ``tuple`` subclasses exactly how it would when implemented? If so, I would expect the following to work:
I think that for the purposes of a relatively straight-forward PEP like this, you should start simple and only add complexity if needed to resolve questions. The Python implementation ought to show the desired semantics, not try to be an exact translation of the C code. Think of the Python equivalents in the itertools docs: https://docs.python.org/3/library/itertools.html See for example: https://www.python.org/dev/peps/pep-0584/#reference-implementation https://www.python.org/dev/peps/pep-0572/#appendix-b-rough-code-translations... You already state that the methods will show "roughly the following behavior", so there's no expectation that it will be precisely what the real methods do. Aim for clarity over emulation of unusual corner cases. The reference implementation is informative not prescriptive. -- Steven
On Tue, Mar 24, 2020 at 08:14:33PM -0000, Dennis Sweeney wrote:
Didn't we have a discussion about not mandating a copy when nothing changes? For strings, I'd just return `self`. It is only bytearray that requires a copy to be made.
I'd also remove the entire multiple substrings feature, for reasons I've already given. "Compatibility with startswith" is not a good reason to add this feature and you haven't established any good use-cases for it. A closer analog is str.replace(substring, ''), and after almost 30 years of real-world experience, that method still only takes a single substring, not a tuple. -- Steven
Steven D'Aprano wrote:
It appears that in CPython, ``self[:] is self`` is true for base ``str`` objects, so I think ``return self[:]`` is consistent with (1) the premise that returning self is an implementation detail that is neither mandated nor forbidden, and (2) the premise that the methods should return base ``str`` objects even when called on ``str`` subclasses.
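A minimal sketch of the two premises above. The subclass `Tagged` is a hypothetical name used only for illustration, and the identity check reflects a CPython implementation detail rather than a language guarantee, so no output is asserted for it.

```python
class Tagged(str):
    """A str subclass that overrides nothing (hypothetical, for illustration)."""


plain = "python"
tagged = Tagged("python")

# Implementation detail: in CPython a full slice of an exact str may be
# the very same object; other implementations are free to copy.
print(plain[:] is plain)

# A subclass that does not override __getitem__ gets a base str back,
# with equal contents.
print(type(tagged[:]) is str)
print(tagged[:] == tagged)
```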
The ``test_concurrent_futures.py`` example seemed to be a good use case to me. I agree that it would be good to see how common that actually is though. But it seems to me that any alternative behavior, e.g. repeated removal, could be implemented by a user on top of the remove-only-the-first-found behavior or by fluently chaining multiple method calls. Maybe you're right that it's too complex, but I think it's at least worth discussing.
Dennis Sweeney wrote:
Steven D'Aprano wrote:
Dennis Sweeney wrote:
The Python interpreter in my head sees `self[:]` and returns a copy. A note that says a `str` is returned would be more useful than trying to exactly mirror internal details in the Python "roughly equivalent" code.
I agree with Steven -- a tuple of options is not necessary for the affix removal methods. -- ~Ethan~
I'm removing the tuple feature from this PEP. So now, if I understand correctly, I don't think there's disagreement about behavior, just about how that behavior should be summarized in Python code. Ethan Furman wrote:
I think I'm still in the camp that ``return self[:]`` more precisely prescribes the desired behavior. It would feel strange to me to write ``return self`` and then say "but you don't actually have to return self, and in fact you shouldn't when working with subclasses". To me, it feels like

    return (the original object unchanged, or a copy of the object, depending on implementation details, but always make a copy when working with subclasses)

is well-summarized by

    return self[:]

especially if followed by the text

    Note that ``self[:]`` might not actually make a copy -- if the affix is empty or not found, and if ``type(self) is str``, then these methods may, but are not required to, make the optimization of returning ``self``. However, when called on instances of subclasses of ``str``, these methods should return base ``str`` objects, not ``self``.

...which is a necessary explanation regardless. Granted, ``return self[:]`` isn't perfect if ``__getitem__`` is overridden, but at the cost of three characters, the Python gains accuracy over both the optional nature of returning ``self`` in all cases and the impossibility (assuming no dunders are overridden) of returning self for subclasses. It also dissuades readers from relying on the behavior of returning self, which we're specifying is an implementation detail. Is that text explanation satisfactory?
I've said a few times that I think it would be good if the behavior were defined /in terms of __getitem__/'s behavior. If the rough behavior is this:

    def removeprefix(self, prefix):
        if self.startswith(prefix):
            return self[len(prefix):]
        else:
            return self[:]

Then you can shift all the guarantees about whether the subtype is str and whether it might return `self` when the prefix is missing onto the implementation of __getitem__. For CPython's implementation of str, `self[:]` returns `self`, so it's clearly true that __getitem__ is allowed to return `self` in some situations. Subclasses that do not override __getitem__ will return the str base class, and subclasses that /do/ overwrite __getitem__ can choose what they want to do. So someone could make their subclass do this:

    class MyStr(str):
        def __getitem__(self, key):
            if isinstance(key, slice) and key.start is key.stop is key.step is None:
                return self
            return type(self)(super().__getitem__(key))

They would then get "removeprefix" and "removesuffix" for free, with the desired semantics and optimizations. If we go with this approach (which again I think is much friendlier to subclassers), that obviates the problem of whether `self[:]` is a good summary of something that can return `self`: since "does the same thing as self[:]" /is/ the behavior it's trying to describe, there's no ambiguity.

Best, Paul

On 3/25/20 1:36 PM, Dennis Sweeney wrote:
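To illustrate the proposal above, here is a sketch in which a free function stands in for a method defined purely in terms of slicing. This only models the idea being proposed; it is an assumption for illustration and does not describe how the built-in ``str`` methods behave. Note that the third attribute of a slice object is ``step``.

```python
class MyStr(str):
    def __getitem__(self, key):
        # A bare self[:] slice returns the object itself.
        if isinstance(key, slice) and key.start is key.stop is key.step is None:
            return self
        # Any other index or slice is re-wrapped in the subclass.
        return type(self)(super().__getitem__(key))


def removeprefix(self, prefix):
    """Stand-in for a method defined in terms of self[...] (illustrative)."""
    if self.startswith(prefix):
        return self[len(prefix):]
    else:
        return self[:]


s = MyStr("foobar")
cut = removeprefix(s, "foo")
same = removeprefix(s, "xyz")

print(type(cut).__name__, cut)  # MyStr bar
print(same is s)                # True
```

Under this model the subclass controls both the return type and the "return self" optimization by overriding a single method.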
I was surprised by the following behavior:

    class MyStr(str):
        def __getitem__(self, key):
            if isinstance(key, slice) and key.start is key.stop is key.step is None:
                return self
            return type(self)(super().__getitem__(key))

    my_foo = MyStr("foo")
    MY_FOO = MyStr("FOO")
    My_Foo = MyStr("Foo")
    empty = MyStr("")

    assert type(my_foo.casefold()) is str
    assert type(MY_FOO.capitalize()) is str
    assert type(my_foo.center(3)) is str
    assert type(my_foo.expandtabs()) is str
    assert type(my_foo.join(())) is str
    assert type(my_foo.ljust(3)) is str
    assert type(my_foo.lower()) is str
    assert type(my_foo.lstrip()) is str
    assert type(my_foo.replace("x", "y")) is str
    assert type(my_foo.split()[0]) is str
    assert type(my_foo.splitlines()[0]) is str
    assert type(my_foo.strip()) is str
    assert type(empty.swapcase()) is str
    assert type(My_Foo.title()) is str
    assert type(MY_FOO.upper()) is str
    assert type(my_foo.zfill(3)) is str

    assert type(my_foo.partition("z")[0]) is MyStr
    assert type(my_foo.format()) is MyStr

I was under the impression that all of the ``str`` methods exclusively returned base ``str`` objects. Is there any reason why those two are different, and is there a reason that would apply to ``removeprefix`` and ``removesuffix`` as well?
I imagine it's an implementation detail of which ones depend on __getitem__. The only methods that would be reasonably amenable to a guarantee like "always returns the same thing as __getitem__" would be (l|r|)strip(), split(), splitlines(), and .partition(), because they only work with subsets of the input string. Most of the other stuff involves constructing new strings and it's harder to cast them in terms of other "primitive operations" since strings are immutable. I suspect that to the extent that the ones that /could/ be implemented in terms of __getitem__ are returning base strings, it's either because no one thought about doing it at the time and they used another mechanism or it was a deliberate choice to be consistent with the other methods. I don't see removeprefix and removesuffix explicitly being implemented in terms of slicing operations as a huge win - you've demonstrated that someone who wants a persistent string subclass still would need to override a /lot/ of methods, so two more shouldn't hurt much - I just think that "consistent with most of the other methods" is a /particularly/ good reason to avoid explicitly defining these operations in terms of __getitem__. The /default/ semantics are the same (i.e. if you don't explicitly change the return type of __getitem__, it won't change the return type of the remove* methods), and the only difference is that for all the /other/ methods, it's an implementation detail whether they call __getitem__, whereas for the remove methods it would be explicitly documented. In my ideal world, a lot of these methods would be redefined in terms of a small set of primitives that people writing subclasses could implement as a protocol that would allow methods called on the functions to retain their class, but I think the time for that has passed. Still, I don't think it would /hurt/ for new methods to be defined in terms of what primitive operations exist where possible. 
Best, Paul On 3/25/20 3:09 PM, Dennis Sweeney wrote:
I imagine it's an implementation detail of which ones depend on ``__getitem__``.
If we write

    class MyStr(str):
        def __getitem__(self, key):
            raise ZeroDivisionError()

then all of the assertions from before still pass, so in fact *none* of the methods rely on ``__getitem__``. As of now ``str`` does not behave as an ABC at all. But it's an interesting proposal to essentially make it an ABC. Although it makes me curious what all of the different reasons people actually have for subclassing ``str``. All of the examples I found in the stdlib were either (1) contrived test cases, (2) strings (e.g. grammar tokens) with some extra attributes along for the ride, or (3) string-based enums. None of types (2) or (3) ever overrode ``__getitem__``, so it doesn't feel like that common of a use case.
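The observation above can be checked with a short sketch. The class name `Exploding` and the sample string are assumptions chosen for illustration.

```python
class Exploding(str):
    def __getitem__(self, key):
        # Any call through user-level indexing would blow up here.
        raise ZeroDivisionError("__getitem__ was called")


s = Exploding("  Foo  ")

# None of these built-in methods go through the overridden __getitem__,
# and each returns a base str rather than the subclass.
results = [s.strip(), s.lower(), s.replace("o", "0"), s.split()[0]]
print(results)                                # ['Foo', '  foo  ', '  F00  ', 'Foo']
print(all(type(r) is str for r in results))   # True

# Explicit indexing does use the override.
try:
    s[0]
except ZeroDivisionError as exc:
    print("indexing raised:", exc)
```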
Making sure I understand: would you prefer the PEP to say ``return self`` rather than ``return self[:]``? I never had the intention of ``self[:]`` meaning "this must have exactly the behavior of ``self.__getitem__(slice(None, None))`` regardless of type", but I can understand if that's how you're saying it could be interpreted.
Dennis Sweeney wrote: -----------------------
Ethan Furman wrote: -------------------
The Python interpreter in my head sees self[:] and returns a copy.
Dennis Sweeney wrote: -----------------------
I don't understand that last bit -- surely, if I'm bothering to implement `removeprefix` and `removesuffix` in my subclass, I would also want to `return self` to keep my subclass? Why would I want to go through the extra overhead of either calling my own `__getitem__` method, or have the `str.__getitem__` method discard my subclass? However, if you are saying that `self[:]` *will* call `self.__class__.__getitem__` so my subclass only has to override `__getitem__` instead of `removeprefix` and `removesuffix`, that I can be happy with. -- ~Ethan~
I should clarify: by "when working with subclasses" I meant "when str.removeprefix() is called on a subclass that does not override removeprefix", and in that case it should return a base str. I was not taking a stance on how the methods should be overridden, and I'm not sure there are many use cases where it should be.
I was only saying that the new methods should match 20 other methods in the str API by always returning a base str (the exceptions being format, format_map, and (r)partition for some reason). I did not mean to suggest that they should ever call user-supplied ``__getitem__`` code -- I don't think they need to. I haven't found anyone trying to use ``str`` as a mixin class/ABC, and it seems that this would be very difficult to do given that none of its methods currently rely on ``self.__class__.__getitem__``. If ``return self[:]`` in the PEP is too closely linked to "must call user-supplied ``__getitem__`` methods" for it not to be true, and so you're suggesting ``return self`` is more faithful, I can understand. So now if I understand the dilemma up to this point we have:

Benefits of writing ``return self`` in the PEP:

- Makes it clear that the optimization of not copying is allowed
- Makes it clear that ``self.__class__.__getitem__`` isn't used

Benefits of writing ``return self[:]`` in the PEP:

- Makes it clear that returning self is an implementation detail
- For subclasses not overriding ``__getitem__`` (the majority of cases), makes it clear that this method will return a base str like the other str methods.

Did I miss anything?

All the best, Dennis
First off, thank you for being so patient -- trying to champion a PEP can be exhausting. On 03/26/2020 05:22 PM, Dennis Sweeney wrote:
Ethan Furman wrote:
Okay.
Okay.
The only thing you missed is that, for me at least, points A, C, and D are not at all clear from the example code. If I wanted to be explicit about the return type being `str` I would write:

    return str(self)  # subclasses are coerced to str

-- ~Ethan~
I appreciate the input and attention to detail! Using the ``str()`` constructor was sort of what I had thought originally, and that's why I had gone overboard with "casting" in one iteration of the sample code. When I realized that this isn't quite "casting" and that ``__str__`` can be overridden, I went even more overboard and suggested that ``str.__getitem__(self, ...)`` and ``str.__len__(self)`` could be written, which does have the behavior of effectively "casting", but looks nasty. Do you think that the following is a happy medium?

    def removeprefix(self: str, prefix: str, /) -> str:
        # coerce subclasses to str
        self_str = str(self)
        prefix_str = str(prefix)
        if self_str.startswith(prefix_str):
            return self_str[len(prefix_str):]
        else:
            return self_str

    def removesuffix(self: str, suffix: str, /) -> str:
        # coerce subclasses to str
        self_str = str(self)
        suffix_str = str(suffix)
        if suffix_str and self_str.endswith(suffix_str):
            return self_str[:-len(suffix_str)]
        else:
            return self_str

Followed by the text:

    If ``type(self) is str`` (rather than a subclass) and if the given affix is empty or is not found, then these methods may, but are not required to, make the optimization of returning ``self``.
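The concern that ``str()`` is not quite a cast can be seen in a short sketch. The subclass `Shouty` is hypothetical and exists only to show an overridden ``__str__``.

```python
class Shouty(str):
    def __str__(self):
        # User code runs whenever str() is applied to an instance.
        return self.upper() + "!"


s = Shouty("abc")

coerced = str(s)                          # goes through Shouty.__str__
sliced = str.__getitem__(s, slice(None))  # bypasses user code entirely

print(coerced)  # ABC!
print(sliced)   # abc
print(type(coerced) is str, type(sliced) is str)  # True True
```

Both results are base ``str`` objects, but only the explicit ``str.__getitem__`` call is guaranteed to preserve the underlying characters, which is why it was described above as the "technically correct" but ugly option.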
On Wed, Mar 25, 2020 at 5:42 PM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
may, but are not required to, make the optimization of returning
Note that ``self[:]`` might not actually make a copy of ``self``. If the affix is empty or not found, and if ``type(self)`` is immutable, then these methods may, but are not required to, make the optimization of returning ``self``. ... [...]
I was trying to start with the intended behavior of the str class, then move on to generalizing to other classes, because I think completing a single example and *then* generalizing is an instructional style that's easier to digest, whereas intermixing all of the examples at once can get confused (can I call str.removeprefix(object(), 17)?). Is something missing that's not already there in the following sentence in the PEP?

    Although the methods on the immutable ``str`` and ``bytes`` types may make the aforementioned optimization of returning the original object, ``bytearray.removeprefix()`` and ``bytearray.removesuffix()`` should always return a copy, never the original object.

Best, Dennis
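A sketch of why the mutable type needs the copy. The function `removeprefix_copy` is a hypothetical pure-Python stand-in for the proposed ``bytearray`` method, written only to show the aliasing hazard.

```python
def removeprefix_copy(data: bytearray, prefix) -> bytearray:
    """Illustrative stand-in: always returns a new bytearray."""
    if data.startswith(prefix):
        return data[len(prefix):]
    # Slicing a bytearray always produces a new object, so the caller
    # can mutate the result without touching the original.
    return data[:]


buffer = bytearray(b"payload")
result = removeprefix_copy(buffer, b"missing")

print(result is buffer)  # False
result.extend(b"!")
print(buffer)            # bytearray(b'payload')
print(result)            # bytearray(b'payload!')
```

If the function returned ``data`` itself when nothing matched, the ``extend`` call would silently modify ``buffer`` as well.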
How about just presenting pseudo code with the caveat that that's for the base str and bytes classes only, and then stipulating that for subclasses the return value is still a str/bytes/bytearray instance, and leaving it at that? After all the point of the Python code is to show what the C code should do in a way that's easy to grasp -- giving a Python implementation is not meant to constrain the C implementation to have *exactly* the same behavior in all corner cases (since that would lead to seriously contorted C code). On Fri, Mar 27, 2020 at 1:02 PM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
I like how that would take the pressure off of the Python sample. How's something like this?

Specification
=============

The builtin ``str`` class will gain two new methods which will behave as follows when ``type(self) is str``::

    def removeprefix(self: str, prefix: str, /) -> str:
        if self.startswith(prefix):
            return self[len(prefix):]
        else:
            return self

    def removesuffix(self: str, suffix: str, /) -> str:
        if suffix and self.endswith(suffix):
            return self[:-len(suffix)]
        else:
            return self

These methods, even when called on ``str`` subclasses, should always return base ``str`` objects. One should not rely on the behavior of ``self`` being returned (as in ``s.removesuffix('') is s``) -- this optimization should be considered an implementation detail.

To test whether any affixes were removed during the call, one may use the constant-time behavior of comparing the lengths of the original and new strings::

    >>> string = 'Python String Input'
    >>> new_string = string.removeprefix('Py')
    >>> modified = (len(string) != len(new_string))
    >>> modified
    True

One may also continue using ``startswith()`` and ``endswith()`` methods for control flow instead of testing the lengths as above.

Note that without the check for the truthiness of ``suffix``, ``s.removesuffix('')`` would be mishandled and always return the empty string due to the unintended evaluation of ``self[:-0]``.

Methods with the corresponding semantics will be added to the builtin ``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes`` or ``bytearray`` object, then ``b.removeprefix()`` and ``b.removesuffix()`` will accept any bytes-like object as an argument. Although the methods on the immutable ``str`` and ``bytes`` types may make the aforementioned optimization of returning the original object, ``bytearray.removeprefix()`` and ``bytearray.removesuffix()`` should *always* return a copy, never the original object.

The two methods will also be added to ``collections.UserString``, with similar behavior.
My hesitation to write "return self" is resolved by saying that it should not be relied on, so I think this is a win. Best, Dennis
On Fri, Mar 27, 2020 at 1:55 PM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
I'd suggest to drop the last sentence ("One should ... detail.") and instead write 'return self[:]' in the methods.
If I saw that in a code review I'd flag it for non-obviousness. One should use 'string != new_string' *unless* there is severe pressure to squeeze every nanosecond out of this particular code (and it better be inside an inner loop).
One may also continue using ``startswith()`` and ``endswith()`` methods for control flow instead of testing the lengths as above.
That's worse, in a sense, since "foofoobar".removeprefix("foo") returns "foobar" which still starts with "foo".
Note that without the check for the truthiness of ``suffix``, ``s.removesuffix('')`` would be mishandled and always return the empty string due to the unintended evaluation of ``self[:-0]``.
That's a good one (I started suggesting dropping that when I read this :-) but maybe it ought to go in a comment (and shorter -- at most one line).
This could also be simplified by writing 'return self[:]'.
Writing 'return self[:]' seems to say the same thing in fewer words though. :-) -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
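The two review points above (prefer ``string != new_string``, and do not test ``startswith()`` after the fact) can be sketched as follows. The free function `removeprefix` mirrors the draft specification quoted in the thread and is used here only as an illustrative stand-in.

```python
def removeprefix(s, prefix):
    """Stand-in mirroring the draft specification (illustrative)."""
    if s.startswith(prefix):
        return s[len(prefix):]
    return s[:]


string = "foofoobar"
new_string = removeprefix(string, "foo")

# The obvious way to ask "did anything change?"
modified = string != new_string
print(new_string, modified)  # foobar True

# Checking startswith() on the result is not a reliable signal:
# one copy of the prefix was removed, but another remains.
print(new_string.startswith("foo"))  # True
```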
I meant that startswith might be called before removeprefix, as it was in the ``deccheck.py`` example.
I thought that someone had suggested that such things go in the PEP, but since these are more stylistic considerations, I would be more than happy to trim it down to just

    The builtin ``str`` class will gain two new methods which will behave as follows when ``type(self) is type(prefix) is str``::

        def removeprefix(self: str, prefix: str, /) -> str:
            if self.startswith(prefix):
                return self[len(prefix):]
            else:
                return self[:]

        def removesuffix(self: str, suffix: str, /) -> str:
            # suffix='' should not call self[:-0].
            if suffix and self.endswith(suffix):
                return self[:-len(suffix)]
            else:
                return self[:]

    These methods, even when called on ``str`` subclasses, should always return base ``str`` objects.

    Methods with the corresponding semantics will be added to the builtin ``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes`` or ``bytearray`` object, then ``b.removeprefix()`` and ``b.removesuffix()`` will accept any bytes-like object as an argument.

    The two methods will also be added to ``collections.UserString``, with similar behavior.
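The comment about ``self[:-0]`` in the sample above guards against a real pitfall. The sketch below contrasts a deliberately naive version (hypothetical name `removesuffix_naive`) with one that mirrors the draft; both are free functions used for illustration.

```python
def removesuffix_naive(s, suffix):
    """Deliberately buggy: no truthiness check on the suffix."""
    if s.endswith(suffix):
        return s[:-len(suffix)]  # s[:-0] is s[:0], the empty string
    return s[:]


def removesuffix(s, suffix):
    """Mirrors the draft: an empty suffix never reaches s[:-0]."""
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s[:]


name = "report.doc"
print(repr(removesuffix_naive(name, "")))  # ''
print(repr(removesuffix(name, "")))        # 'report.doc'
print(repr(removesuffix(name, ".doc")))    # 'report'
```

Every string ends with the empty string, so the naive version takes the slicing branch and discards the whole input.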
On Fri, Mar 27, 2020 at 3:29 PM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
Not having read the full PEP, that wasn't clear to me. Sorry!
I'm sure someone did. But not every bit of feedback is worth acting upon, and sometimes a weird compromise is cooked up that addresses somebody's nit while making things less understandable for everyone else. I think this is one of those cases.
Excellent! -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
On Sat., 28 Mar. 2020, 8:39 am Guido van Rossum, <guido@python.org> wrote:
I think that may have been me in a tangent thread where folks were worried about O(N) checks on long strings. I know at least I temporarily forgot to account for string equality checks starting with a few O(1) checks to speed up common cases (IIRC: identity, length, first code point, last code point), which means explicitly calling len() is just as likely to slow things down as it is to speed them up. Cheers, Nick.
On 25/03/20 9:14 am, Dennis Sweeney wrote:
No, I don't think so. The purpose of a Python implementation of a proposed feature is to get the intended semantics across, not to reproduce all the quirks of an imagined C implementation. If you were to bake these details into a Python reference implementation, you would be implying that these are *intended* restrictions, which (unless I misunderstand) is not what you are intending. (Back when yield-from was being designed, I described the intended semantics in prose, and gave an approximate Python equivalent, which went through several revisions as we thrashed out exactly how the feature should behave. But I don't think it ever exactly matched all the details of the actual implementation, nor was it intended to. The prose turned out to be much more readable, anyway. :-) -- Greg
On 24 Mar 2020, at 2:42, Steven D'Aprano wrote:
Actually I would like for other string methods to gain the ability to search for/chop off multiple substrings too. A `find()` that supports multiple search strings (and returns the leftmost position where a search string can be found) is a great help in implementing some kind of tokenizer:

```python
def tokenize(source, delimiter):
    lastpos = 0
    while True:
        pos = source.find(delimiter, lastpos)
        if pos == -1:
            token = source[lastpos:].strip()
            if token:
                yield token
            break
        else:
            token = source[lastpos:pos].strip()
            if token:
                yield token
            yield source[pos]
            lastpos = pos + 1

print(list(tokenize(" [ 1, 2, 3] ", ("[", ",", "]"))))
```

This would output `['[', '1', ',', '2', ',', '3', ']']` if `str.find()` supported multiple substrings. Of course to be really usable `find()` would have to return **which** substring was found, which would make the API more complicated (and somewhat incompatible with the existing `find()`).

But for `cutprefix()` (or whatever it's going to be called), I'm +1 on supporting multiple prefixes. For ambiguous cases, IMHO the most straight forward option would be to chop off the first prefix found.
[...]
Servus, Walter
Walter Dörwald writes:
In other words, you want the equivalent of Emacs's "(search-forward (regexp-opt list-of-strings))", which also meets the requirement of returning which string was found (as "(match-string 0)"). Since Python already has a functionally similar API for regexps, we can add a regexp-opt (with appropriate name) method to re, perhaps as .compile_string_list(), and provide a convenience function re.search_string_list() for your application. I'm applying practicality before purity, of course. To some extent we want to encourage simple string approaches, and putting this in regex is not optimal for that. Steve
On 25 Mar 2020, at 9:48, Stephen J. Turnbull wrote:
Sounds like it. I'm not familiar with Emacs.
If you're using regexps anyway, building the appropriate or-expression shouldn't be a problem. I guess that's what most lexers/tokenizers do anyway.
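As a sketch of the or-expression approach mentioned above, the tokenizer can be written with the existing `re` module instead of a multi-substring `find()`. The function and parameter names are assumptions for illustration, and sorting the delimiters longest-first is a design choice made here so that overlapping delimiters prefer the longer match.

```python
import re


def tokenize(source, delimiters):
    """Yield stripped tokens and the delimiters found between them."""
    # Escape each delimiter and join them into one alternation.
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(d) for d in ordered))
    lastpos = 0
    for match in pattern.finditer(source):
        token = source[lastpos:match.start()].strip()
        if token:
            yield token
        yield match.group(0)  # reports which delimiter was found
        lastpos = match.end()
    token = source[lastpos:].strip()
    if token:
        yield token


print(list(tokenize(" [ 1, 2, 3] ", ("[", ",", "]"))))
# ['[', '1', ',', '2', ',', '3', ']']
```

Unlike the `find()`-based version, this also handles delimiters longer than one character, because the match object reports both the position and the matched text.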
Exactly. I'm always a bit hesitant when using regexps, if there's a simpler string approach.
Steve
Servus, Walter
Hi Dennis,

Thanks for the updated PEP, it looks way better! I love the ability to pass a tuple of strings ;-)

--

The behavior of a tuple containing an empty string is a little bit surprising. cutsuffix("Hello World", ("", " World")) returns "Hello World", whereas cutsuffix("Hello World", (" World", "")) returns "Hello". cutprefix() has the same behavior: the first empty string stops the loop and returns the string unchanged. I would prefer to raise ValueError("empty separator") to avoid any risk of confusion. I'm not sure that str.cutprefix("") or str.cutsuffix("") makes any sense. "abc".startswith("") and "abc".startswith(("", "a")) are true, but that's fine since startswith() doesn't modify the string. Moreover, we cannot change the behavior now :-) But for new methods, we can try to design them correctly to avoid any risk of confusion.

--

It reminds me of https://bugs.python.org/issue28029: "".replace("", s, n) now returns s instead of an empty string for all non-zero n. The behavior changes in Python 3.9. There are also discussions about "abc".split("") and re.compile("").split("abc"). str.split() raises ValueError("empty separator") whereas re.split returns ['', 'a', 'b', 'c', ''] which can be (IMO) surprising. See also https://bugs.python.org/issue28937 "str.split(): allow removing empty strings (when sep is not None)".

Note: on the other hand, str.strip("") is accepted and returns the string unmodified. But this method doesn't accept a tuple of substrings. It's different than cutprefix/cutsuffix.

Victor
-- Night gathers, and now my watch begins. It shall not end until my death.
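The order dependence described above follows directly from "first match wins" semantics. The sketch below is a free-function stand-in for the tuple-accepting draft discussed earlier in the thread; the name `cutsuffix` is taken from the draft, while its use as a plain function is an assumption for illustration.

```python
def cutsuffix(s, suffixes):
    """First-match semantics over a tuple of suffixes (illustrative)."""
    for option in suffixes:
        if s.endswith(option):
            return s[:len(s) - len(option)]
    return s[:]


text = "Hello World"

# Every string ends with "", so an empty option always matches and
# shadows everything listed after it.
print(repr(cutsuffix(text, ("", " World"))))  # 'Hello World'
print(repr(cutsuffix(text, (" World", ""))))  # 'Hello'
```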
Hello, On Tue, 24 Mar 2020 19:14:16 +0100 Victor Stinner <vstinner@python.org> wrote: []
str.cutprefix("")/str.cutsuffix("") definitely makes sense, e.g.:
=== config.something ===
# If you'd like to remove some prefix from your lines, set it here
REMOVE_PREFIX = ""
======
=== src.py ===
...
line = line.cutprefix(config.REMOVE_PREFIX)
...
======
Now one may ask whether str.cutprefix(("", "nonempty")) makes sense. A response can be "the more complex the functionality, the more complex and confusing the corner cases there are to handle". [] -- Best regards, Paul mailto:pmiscml@gmail.com
Hello, On Tue, 24 Mar 2020 22:51:55 +0100 Victor Stinner <vstinner@python.org> wrote:
Or even just:
if line.startswith(config.REMOVE_PREFIX):
    line = line[len(config.REMOVE_PREFIX):]
But point taken - indeed, any confusing, inconsistent behavior can be fixed on the users' side with more if's, once they discover it. -- Best regards, Paul mailto:pmiscml@gmail.com
On Tue, Mar 24, 2020 at 07:14:16PM +0100, Victor Stinner wrote:
They make as much sense as any other null-operation, such as subtracting 0 or deleting empty slices from lists. Every string s is unchanged if you prepend or concatenate the empty string:
assert s == ''+s == s+''
so removing the empty string should obey the same invariant:
assert s == s.removeprefix('') == s.removesuffix('')
-- Steven
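A minimal sketch of this invariant, using module-level helpers that follow the "roughly equivalent" expressions from the PEP abstract. The function names `removeprefix` and `removesuffix` here are stand-ins written for illustration so the example does not depend on the interpreter version; they are not the C implementation.

```python
def removeprefix(s, prefix):
    # Roughly equivalent expression from the PEP abstract.
    return s[len(prefix):] if s.startswith(prefix) else s


def removesuffix(s, suffix):
    # The `suffix and` guard matters: s[:-0] would return an empty string.
    return s[:-len(suffix)] if suffix and s.endswith(suffix) else s


samples = ["", "abc", "Hello World"]

# Adding the empty string is a no-op, and so is removing it.
invariant_holds = all(
    s == "" + s == s + "" and s == removeprefix(s, "") == removesuffix(s, "")
    for s in samples
)

# The configuration use case from earlier in the thread: an empty
# REMOVE_PREFIX setting leaves every line untouched.
REMOVE_PREFIX = ""
line = removeprefix("key = value", REMOVE_PREFIX)
```

The guard in `removesuffix` is the kind of empty-string detail that the PEP's rationale says hand-written versions often got wrong.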
It seems that there is a consensus on the names ``removeprefix`` and ``removesuffix``. I will update the PEP accordingly. I'll also simplify sample Python implementation to primarily reflect *intent* over strict type-checking correctness, and I'll adjust the accompanying commentary accordingly. Lastly, since the issue of multiple prefixes/suffixes is more controversial and seems that it would not affect how the single-affix cases would work, I can remove that from this PEP and allow someone else with a stronger opinion about it to propose and defend a set of semantics in a different PEP. Is there any objection to deferring this to a different PEP? All the best, Dennis
On 3/24/2020 7:21 PM, Dennis Sweeney wrote:
It seems that there is a consensus on the names ``removeprefix`` and ``removesuffix``. I will update the PEP accordingly. I'll also simplify sample Python implementation to primarily reflect *intent* over strict type-checking correctness, and I'll adjust the accompanying commentary accordingly.
Lastly, since the issue of multiple prefixes/suffixes is more controversial and seems that it would not affect how the single-affix cases would work, I can remove that from this PEP and allow someone else with a stronger opinion about it to propose and defend a set of semantics in a different PEP. Is there any objection to deferring this to a different PEP?
No objection. I think that's a good idea. Eric
On Wed, 25 Mar 2020 at 00:29, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
Lastly, since the issue of multiple prefixes/suffixes is more controversial and seems that it would not affect how the single-affix cases would work, I can remove that from this PEP and allow someone else with a stronger opinion about it to propose and defend a set of semantics in a different PEP. Is there any objection to deferring this to a different PEP?
name.cutsuffix(('Mixin', 'Tests', 'Test')) is used in the "Motivating examples from the Python standard library" section. It looks like a nice usage of this feature. You added "There were many other such examples in the stdlib."
What do you mean by controversial? I proposed to raise an exception if the prefix/suffix is empty to make cutsuffix(("", "suffix")) less surprising. But I'm also fine if you keep this behavior, since startswith/endswith accept an empty string, and someone wrote that accepting an empty prefix/suffix is a useful feature.
Or did someone write that cutprefix/cutsuffix must not accept a tuple of strings? (I'm not sure that I was able to read all emails carefully.)
I like the ability to pass multiple prefixes and suffixes because it makes the method similar to lstrip(), rstrip(), strip(), startswith() and endswith(), which all accept multiple "values" (characters to remove, prefixes, suffixes).
Victor -- Night gathers, and now my watch begins. It shall not end until my death.
There were at least two comments suggesting keeping it to one affix at a time: https://mail.python.org/archives/list/python-dev@python.org/message/GPXSIDLK... https://mail.python.org/archives/list/python-dev@python.org/message/EDWFPEGQ... But I didn't see any big objections to the rest of the PEP, so I think maybe we keep it restricted for now.
Thanks for the pointers to emails.
Ethan Furman: "This is why replace() still only takes a single substring to match and this isn't supported: (...)"
Hum ok, it makes sense. I agree that we can start with only accepting str (reject tuple), and maybe reconsider the idea of accepting a tuple of str later.
Please move the idea to Rejected Ideas, but also try to summarize the reasons why the idea was rejected. I saw:
* surprising result for empty prefix/suffix
* surprising result for "FooBar text".cutprefix(("Foo", "FooBar"))
* issue with unordered sequence like set: only accept tuple which is ordered
* str.replace() only accepts str.replace(str, str) to avoid these issues: the idea of accepting str.replace(tuple of str, str) or variant was rejected multiple times. XXX does someone have references to past discussions? I found https://bugs.python.org/issue33647 which is a little bit different.
You may mention re.sub() as an existing efficient solution for the complex cases.
I have to confess that I had to think twice when I wrote my example line.cutsuffix(("\r\n", "\r", "\n")). Did I write suffixes in the correct order to get what I expect? :-) "\r\n" starts with "\r".
Victor
On Wed, 25 Mar 2020 at 01:44, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
-- Night gathers, and now my watch begins. It shall not end until my death.
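The ordering concerns listed in this message, and the re.sub() alternative it mentions, can be sketched as follows. `cut_first_prefix` and `cut_line_ending` are hypothetical names introduced for illustration; the first assumes the "first match wins" tuple semantics discussed in the thread, which PEP 616 ended up not adopting.

```python
import re


def cut_first_prefix(s, prefixes):
    """Hypothetical helper: remove the first prefix in `prefixes` that matches."""
    for prefix in prefixes:
        if s.startswith(prefix):
            return s[len(prefix):]
    return s


text = "FooBar text"

# The result depends on the order in which overlapping prefixes are listed.
short_first = cut_first_prefix(text, ("Foo", "FooBar"))
long_first = cut_first_prefix(text, ("FooBar", "Foo"))

# re.sub() alternation is order-sensitive in the same way for prefixes,
# so the longer alternative is listed first here.
regex_prefix = re.sub(r"\A(?:FooBar|Foo)", "", text)


def cut_line_ending(line):
    """Remove one trailing line ending; \\Z anchors the match at the very end."""
    return re.sub(r"(?:\r\n|\r|\n)\Z", "", line)


windows_line = cut_line_ending("abc\r\n")
unix_line = cut_line_ending("abc\n")
```

For the line-ending case the `\Z` anchor forces the regex engine to find a match that reaches the end of the string, so a trailing "\r\n" is removed as a unit.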
On Wed, 25 Mar 2020 at 00:42, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
That sounds like a good idea. The issue for me is how the function should behave with a list of affixes if one is a prefix of another, e.g., removeprefix(('Test', 'Tests')). The empty string case is just one form of that. The behaviour should be defined clearly, and while I imagine "always remove the longest" is the "obvious" sensible choice, I am fairly certain there will be other opinions :-) So deferring the decision for now until we have more experience with the single-affix form seems perfectly reasonable. I'm not even sure that switching to multiple affixes later would need a PEP - it might be fine to add via a simple feature request issue. But that can be a decision for later, too. Paul
On 25Mar2020 08:14, Paul Moore <p.f.moore@gmail.com> wrote:
I'd like to preface this with "I'm fine to implement multiple affixes later, if at all". That said: To me "first match" is the _only_ sensible choice. "longest match" can always be implemented with a "first match" function by sorting on length if desired. Also, "longest first" requires the implementation to do a prescan of the supplied affixes whereas "first match" lets the implementation just iterate over the choices as supplied. I'm beginning to think I must again threaten my partner's anecdote about Netscape Proxy's rule system, which prioritised rules by the lexical length of their regexp, not their config file order of appearance. That way lies (and, indeed, lay) madness. Cheers, Cameron Simpson <cs@cskk.id.au>
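The point that "longest match" can be layered on top of "first match" can be sketched like this. Both function names are hypothetical and neither is part of PEP 616, which accepts a single affix only.

```python
def remove_first_matching_prefix(s, prefixes):
    """Hypothetical "first match" semantics: iterate in the order supplied."""
    for prefix in prefixes:
        if s.startswith(prefix):
            return s[len(prefix):]
    return s


def remove_longest_matching_prefix(s, prefixes):
    """ "Longest match" built on "first match" by sorting on length first."""
    by_length = sorted(prefixes, key=len, reverse=True)
    return remove_first_matching_prefix(s, by_length)


name = "Tests_parser"

first_match = remove_first_matching_prefix(name, ("Test", "Tests"))
longest_match = remove_longest_matching_prefix(name, ("Test", "Tests"))
```

The sort is the prescan that the message refers to: a caller who wants longest-match pays for it explicitly, while the first-match helper only iterates over the choices as given.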
PEP 616 -- String methods to remove prefixes and suffixes is available here: https://www.python.org/dev/peps/pep-0616/
Changes:
- Only accept single affixes, not tuples
- Make the specification more concise
- Make fewer stylistic prescriptions for usage
- Fix typos
A reference implementation GitHub PR is up to date here: https://github.com/python/cpython/pull/18939
Are there any more comments for it before submission?
What do you think of adding a Version History section which lists the most important changes since you proposed the first version of the PEP? I recall:
* Version 3: don't accept tuple
* Version 2: Rename cutprefix/cutsuffix to removeprefix/removesuffix, accept tuple
* Version 1: initial version
For example, for my PEP 587, I wrote detailed changes, but I don't think that you should go into the details ;-) https://www.python.org/dev/peps/pep-0587/#version-history
Victor
On Sat, 28 Mar 2020 at 06:11, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
-- Night gathers, and now my watch begins. It shall not end until my death.
My intent is to help people like me to follow the discussion on the PEP. There are more than 100 messages, so it's hard to follow PEP updates. Victor
On Sun, 29 Mar 2020 at 14:55, Rob Cliffe via Python-Dev <python-dev@python.org> wrote:
-- Night gathers, and now my watch begins. It shall not end until my death.
Hello all, It seems that most of the discussion has settled down, but I didn't quite understand from reading PEP 1 what the next step should be -- is this an appropriate time to open an issue on the Steering Council GitHub repository requesting pronouncement on PEP 616? Best, Dennis
I suggest you wait one more week to let other people comment on the PEP. After this delay, if you consider that the PEP is ready for pronouncement, you can submit it to the Steering Council, right. Victor
On Wed, 1 Apr 2020 at 21:56, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
-- Night gathers, and now my watch begins. It shall not end until my death.
On Thu., 2 Apr. 2020, 8:30 am Victor Stinner, <vstinner@python.org> wrote:
Note that the submission to the Steering Council doesn't have to be a request for immediate pronouncement - it's a notification that the PEP is mature enough for the Council to decide whether to appoint a Council member as BDFL-Delegate or to appoint someone else. The decision on whether to wait for more questions is then up to the Council and/or the appointed BDFL-Delegate. PEP 616 definitely looks mature enough for that step to me (and potentially even immediately accepted - it did get dissected pretty thoroughly, after all!) Cheers, Nick.
participants (33)
- Barney Gale
- Brett Cannon
- Cameron Simpson
- Chris Angelico
- Dan Stromberg
- Dennis Sweeney
- Eric Fahlgren
- Eric V. Smith
- Ethan Furman
- Greg Ewing
- Gregory P. Smith
- Guido van Rossum
- Ivan Pozdeev
- Kyle Stanley
- Mike Miller
- MRAB
- musbur@posteo.org
- Nathaniel Smith
- Ned Batchelder
- Nick Coghlan
- Paul Ganssle
- Paul Moore
- Paul Sokolovsky
- Rhodri James
- Rob Cliffe
- Sebastian Rittau
- senthil@uthcode.com
- Stephen J. Turnbull
- Steve Dower
- Steve Holden
- Steven D'Aprano
- Victor Stinner
- Walter Dörwald