PEP 616 -- String methods to remove prefixes and suffixes
Browser Link: https://www.python.org/dev/peps/pep-0616/

PEP: 616
Title: String methods to remove prefixes and suffixes
Author: Dennis Sweeney <sweeney.dennis650@gmail.com>
Sponsor: Eric V. Smith <eric@trueblade.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 19-Mar-2020
Python-Version: 3.9
Post-History: 30-Aug-2002


Abstract
========

This is a proposal to add two new methods, ``cutprefix`` and
``cutsuffix``, to the APIs of Python's various string objects.  In
particular, the methods would be added to Unicode ``str`` objects,
binary ``bytes`` and ``bytearray`` objects, and
``collections.UserString``.

If ``s`` is one of these objects, and ``s`` has ``pre`` as a prefix, then
``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has
been removed.  If ``s`` does not have ``pre`` as a prefix, an
unchanged copy of ``s`` is returned.  In summary, ``s.cutprefix(pre)``
is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.

The behavior of ``cutsuffix`` is analogous: ``s.cutsuffix(suf)`` is
roughly equivalent to
``s[:-len(suf)] if suf and s.endswith(suf) else s``.


Rationale
=========

There have been repeated issues [#confusion]_ on the Bug Tracker
and StackOverflow related to user confusion about the existing
``str.lstrip`` and ``str.rstrip`` methods.  These users are typically
expecting the behavior of ``cutprefix`` and ``cutsuffix``, but they
are surprised that the parameter for ``lstrip`` is interpreted as a
set of characters, not a substring.  This repeated issue is evidence
that these methods are useful, and the new methods allow a cleaner
redirection of users to the desired behavior.

As another testimonial for the usefulness of these methods, several
users on Python-Ideas [#pyid]_ reported frequently including similar
functions in their own code for productivity.  The implementation
often contained subtle mistakes regarding the handling of the empty
string (see `Specification`_).

Specification
=============

The builtin ``str`` class will gain two new methods with roughly the
following behavior::

    def cutprefix(self: str, pre: str, /) -> str:
        if self.startswith(pre):
            return self[len(pre):]
        return self[:]

    def cutsuffix(self: str, suf: str, /) -> str:
        if suf and self.endswith(suf):
            return self[:-len(suf)]
        return self[:]

The only difference between the real implementation and the above is
that, as with other string methods like ``replace``, the
methods will raise a ``TypeError`` if any of ``self``, ``pre`` or
``suf`` is not an instance of ``str``, and will cast subclasses of
``str`` to builtin ``str`` objects.

Note that without the check for the truthiness of ``suf``,
``s.cutsuffix('')`` would be mishandled and always return the empty
string due to the unintended evaluation of ``self[:-0]``.

Methods with the corresponding semantics will be added to the builtin
``bytes`` and ``bytearray`` objects.  If ``b`` is either a ``bytes``
or ``bytearray`` object, then ``b.cutsuffix()`` and ``b.cutprefix()``
will accept any bytes-like object as an argument.

Note that the ``bytearray`` methods return a copy of ``self``; they do
not operate in place.

The following behavior is considered a CPython implementation detail
and is not guaranteed by this specification::

    >>> x = 'foobar' * 10**6
    >>> x.cutprefix('baz') is x is x.cutsuffix('baz')
    True
    >>> x.cutprefix('') is x is x.cutsuffix('')
    True

That is, for CPython's immutable ``str`` and ``bytes`` objects, the
methods return the original object when the affix is not found or if
the affix is empty.
Because these types test for equality using shortcuts
for identity and length, the following equivalent expressions are
evaluated at approximately the same speed, for any ``str`` objects
(or ``bytes`` objects) ``x`` and ``y``::

    >>> (True, x[len(y):]) if x.startswith(y) else (False, x)
    >>> (True, z) if x != (z := x.cutprefix(y)) else (False, x)

The two methods will also be added to ``collections.UserString``,
where they rely on the implementation of the new ``str`` methods.


Motivating examples from the Python standard library
====================================================

The examples below demonstrate how the proposed methods can make code
one or more of the following:

Less fragile:
    The code will not depend on the user to count the length of a
    literal.

More performant:
    The code does not require a call to the Python built-in
    ``len`` function.

More descriptive:
    The methods give a higher-level API for code readability, as
    opposed to the traditional method of string slicing.


refactor.py
-----------

- Current::

    if fix_name.startswith(self.FILE_PREFIX):
        fix_name = fix_name[len(self.FILE_PREFIX):]

- Improved::

    fix_name = fix_name.cutprefix(self.FILE_PREFIX)


c_annotations.py:
-----------------

- Current::

    if name.startswith("c."):
        name = name[2:]

- Improved::

    name = name.cutprefix("c.")


find_recursionlimit.py
----------------------

- Current::

    if test_func_name.startswith("test_"):
        print(test_func_name[5:])
    else:
        print(test_func_name)

- Improved::

    print(test_func_name.cutprefix("test_"))


deccheck.py
-----------

This is an interesting case because the author chose to use the
``str.replace`` method in a situation where only a prefix was
intended to be removed.

- Current::

    if funcname.startswith("context."):
        self.funcname = funcname.replace("context.", "")
        self.contextfunc = True
    else:
        self.funcname = funcname
        self.contextfunc = False

- Improved::

    if funcname.startswith("context."):
        self.funcname = funcname.cutprefix("context.")
        self.contextfunc = True
    else:
        self.funcname = funcname
        self.contextfunc = False

- Arguably further improved::

    self.contextfunc = funcname.startswith("context.")
    self.funcname = funcname.cutprefix("context.")


test_i18n.py
------------

- Current::

    if test_func_name.startswith("test_"):
        print(test_func_name[5:])
    else:
        print(test_func_name)

- Improved::

    print(test_func_name.cutprefix("test_"))

- Current::

    if creationDate.endswith('\\n'):
        creationDate = creationDate[:-len('\\n')]

- Improved::

    creationDate = creationDate.cutsuffix('\\n')


shared_memory.py
----------------

- Current::

    reported_name = self._name
    if _USE_POSIX and self._prepend_leading_slash:
        if self._name.startswith("/"):
            reported_name = self._name[1:]
    return reported_name

- Improved::

    if _USE_POSIX and self._prepend_leading_slash:
        return self._name.cutprefix("/")
    return self._name


build-installer.py
------------------

- Current::

    if archiveName.endswith('.tar.gz'):
        retval = os.path.basename(archiveName[:-7])
        if ((retval.startswith('tcl') or retval.startswith('tk'))
                and retval.endswith('-src')):
            retval = retval[:-4]

- Improved::

    if archiveName.endswith('.tar.gz'):
        retval = os.path.basename(archiveName[:-7])
        if retval.startswith(('tcl', 'tk')):
            retval = retval.cutsuffix('-src')

Depending on personal style, ``archiveName[:-7]`` could also be
changed to ``archiveName.cutsuffix('.tar.gz')``.


test_core.py
------------

- Current::

    if output.endswith("\n"):
        output = output[:-1]

- Improved::

    output = output.cutsuffix("\n")


cookiejar.py
------------

- Current::

    def strip_quotes(text):
        if text.startswith('"'):
            text = text[1:]
        if text.endswith('"'):
            text = text[:-1]
        return text

- Improved::

    def strip_quotes(text):
        return text.cutprefix('"').cutsuffix('"')

- Current::

    if line.endswith("\n"):
        line = line[:-1]

- Improved::

    line = line.cutsuffix("\n")


fixdiv.py
---------

- Current::

    def chop(line):
        if line.endswith("\n"):
            return line[:-1]
        else:
            return line

- Improved::

    def chop(line):
        return line.cutsuffix("\n")


test_concurrent_futures.py
--------------------------

In the following example, the meaning of the code changes slightly,
but in context, it behaves the same.

- Current::

    if name.endswith(('Mixin', 'Tests')):
        return name[:-5]
    elif name.endswith('Test'):
        return name[:-4]
    else:
        return name

- Improved::

    return name.cutsuffix('Mixin').cutsuffix('Tests').cutsuffix('Test')


msvc9compiler.py
----------------

- Current::

    if value.endswith(os.pathsep):
        value = value[:-1]

- Improved::

    value = value.cutsuffix(os.pathsep)


test_pathlib.py
---------------

- Current::

    self.assertTrue(r.startswith(clsname + '('), r)
    self.assertTrue(r.endswith(')'), r)
    inner = r[len(clsname) + 1 : -1]

- Improved::

    self.assertTrue(r.startswith(clsname + '('), r)
    self.assertTrue(r.endswith(')'), r)
    inner = r.cutprefix(clsname + '(').cutsuffix(')')


Rejected Ideas
==============

Expand the lstrip and rstrip APIs
---------------------------------

Because ``lstrip`` takes a string as its argument, it could be viewed
as taking an iterable of length-1 strings.  The API could therefore be
generalized to accept any iterable of strings, which would be
successively removed as prefixes.  While this behavior would be
consistent, it would not be obvious for users to have to call
``'foobar'.cutprefix(('foo',))`` for the common use case of a
single prefix.


Allow multiple prefixes
-----------------------

Some users discussed the desire to be able to remove multiple
prefixes, calling, for example, ``s.cutprefix('From: ', 'CC: ')``.
However, this adds ambiguity about the order in which the prefixes are
removed, especially in cases like ``s.cutprefix('Foo', 'FooBar')``.
After this proposal, this can be spelled explicitly as
``s.cutprefix('Foo').cutprefix('FooBar')``.


Remove multiple copies of a prefix
----------------------------------

This is the behavior that would be consistent with the aforementioned
expansion of the ``lstrip/rstrip`` API -- repeatedly applying the
function until the argument is unchanged.  This behavior is attainable
from the proposed behavior via the following::

    >>> s = 'foo' * 100 + 'bar'
    >>> while s != (s := s.cutprefix("foo")): pass
    >>> s
    'bar'

The above can be modified by chaining multiple ``cutprefix`` calls
together to achieve the full behavior of the ``lstrip``/``rstrip``
generalization, while being explicit in the order of removal.

While the proposed API could later be extended to include some of
these use cases, to do so before any observation of how these methods
are used in practice would be premature and may lead to choosing the
wrong behavior.


Raising an exception when not found
-----------------------------------

There was a suggestion that ``s.cutprefix(pre)`` should raise an
exception if ``not s.startswith(pre)``.  However, this does not match
the behavior and feel of other string methods.  A
``required=False`` keyword could be added, but this violates the KISS
principle.


Alternative Method Names
------------------------

Several alternative method names have been proposed.  Some are listed
below, along with commentary for why they should be rejected in favor
of ``cutprefix`` (the same arguments hold for ``cutsuffix``).

``ltrim``
    "Trim" does in other languages (e.g. JavaScript, Java, Go,
    PHP) what ``strip`` methods do in Python.

``lstrip(string=...)``
    This would avoid adding a new method, but for different
    behavior, it's better to have two different methods than one
    method with a keyword argument that selects the behavior.

``cut_prefix``
    All of the other methods of the string API, e.g.
    ``str.startswith()``, use ``lowercase`` rather than
    ``lower_case_with_underscores``.

``cutleft``, ``leftcut``, or ``lcut``
    The explicitness of "prefix" is preferred.

``removeprefix``, ``deleteprefix``, ``withoutprefix``, etc.
    All of these might have been acceptable, but they have more
    characters than ``cut``.  Some suggested that the verb "cut"
    implies mutability, but the string API already contains verbs
    like "replace", "strip", "split", and "swapcase".

``stripprefix``
    Users may benefit from the mnemonic that "strip" means working
    with sets of characters, while other methods work with
    substrings, so re-using "strip" here should be avoided.


Reference Implementation
========================

See the pull request on GitHub [#pr]_.


References
==========

.. [#pr] GitHub pull request with implementation
   (https://github.com/python/cpython/pull/18939)
.. [#pyid] Discussion on Python-Ideas
   (https://mail.python.org/archives/list/python-ideas@python.org/thread/RJARZSU...)
.. [#confusion] Comment listing Bug Tracker and StackOverflow issues
   (https://mail.python.org/archives/list/python-ideas@python.org/message/GRGAFI...)


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
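For anyone who wants to try the proposed semantics before the methods exist, the two functions in the Specification can be approximated as plain functions. This is only a minimal sketch under stated assumptions: the names `cut_prefix` and `cut_suffix` are hypothetical stand-ins (not part of the PEP), only `str` is handled, and no type checking is done.

```python
def cut_prefix(s: str, pre: str) -> str:
    """Return s without a leading pre; return s unchanged if pre is absent."""
    if s.startswith(pre):
        return s[len(pre):]
    return s[:]


def cut_suffix(s: str, suf: str) -> str:
    """Return s without a trailing suf; return s unchanged if suf is absent."""
    # The truthiness check mirrors the Specification: without it, an empty
    # suffix would evaluate s[:-0], which is the empty string.
    if suf and s.endswith(suf):
        return s[:-len(suf)]
    return s[:]


print(cut_prefix("test_parser", "test_"))
print(cut_suffix("archive.tar.gz", ".tar.gz"))
```

The two calls mirror the find_recursionlimit.py and build-installer.py examples from the PEP.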
On 03/20/2020 11:52 AM, Dennis Sweeney wrote:
Browser Link: https://www.python.org/dev/peps/pep-0616/
PEP: 616 Title: String methods to remove prefixes and suffixes
Thank you, Dennis, for putting this together! And Eric for sponsoring. :) Overall I think it's a good idea, but...
Alternative Method Names ------------------------
``stripprefix`` Users may benefit from the mnemonic that "strip" means working with sets of characters, while other methods work with substrings, so re-using "strip" here should be avoided.
Um, what mnemonic?

I am strongly opposed to the chosen names of `cut*` -- these methods do basically the same thing as the existing `strip` methods (remove something from either end of a string), and so should have similar names:

- the existence of `stripsuffix` is a clue/reminder that `strip` doesn't work with substrings

- if all of these similar methods have similar names they will be grouped together in the documentation making discovery of the correct one much easier.

So for this iteration of the PEP, I am -1

--
~Ethan~
Thanks for the feedback!

I meant mnemonic as in the broader sense of "way of remembering things", not some kind of rhyming device or acronym. Maybe "mnemonic" isn't the perfect word. I was just trying to say that the structure of how the methods are named should reflect how their behavior relates to one another, which it seems you agree with.

Fair enough that ``[l/r]strip`` and the proposed methods share the behavior of "removing something from the end of a string". From that perspective, they're similar. But my thought was that ``s.lstrip("abc")`` has extremely similar behavior when changing "lstrip" to "rstrip" or "strip" -- the argument is interpreted in exactly the same way (as a character set) in each case. Looking at how the argument is used, I'd argue that ``lstrip``/``rstrip``/``strip`` are much more similar to each other than they are to the proposed methods, and that the proposed methods are perhaps more similar to something like ``str.replace``. But it does seem pretty subjective what the threshold is for behavior similar enough to have related names -- I see where you're coming from.

Also, the docs at ( https://docs.python.org/3/library/stdtypes.html?highlight=lstrip#string-meth... ) are alphabetical, not grouped by "similar names", so even ``lstrip``, ``strip``, and ``rstrip`` are already in different places. Maybe the name "stripprefix" would be more discoverable when "Ctrl-f"ing the docs, if it weren't for the following addition in the linked PR:

    .. method:: str.lstrip([chars])

       Return a copy of the string with leading characters removed.  The *chars*
       argument is a string specifying the set of characters to be removed.  If omitted
       or ``None``, the *chars* argument defaults to removing whitespace.  The *chars*
       argument is not a prefix; rather, all combinations of its values are stripped::

          >>> ' spacious '.lstrip()
          'spacious '
          >>> 'www.example.com'.lstrip('cmowz.')
          'example.com'

    +  See :meth:`str.cutprefix` for a method that will remove a single prefix
    +  string rather than all of a set of characters.
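To make the character-set versus substring distinction concrete, here is a small sketch comparing ``str.lstrip`` with a prefix-removing helper. The helper name `cut_prefix` and the sample host name are assumptions made for illustration; the helper only imitates the behavior the PEP proposes for ``cutprefix``.

```python
def cut_prefix(s: str, pre: str) -> str:
    """Hypothetical stand-in for the proposed str.cutprefix."""
    return s[len(pre):] if s.startswith(pre) else s


host = "www.wonderful.org"

# lstrip treats "www." as the character set {'w', '.'}, so it also eats
# the leading 'w' of "wonderful".
print(host.lstrip("www."))       # onderful.org

# Removing the prefix as a substring leaves the rest intact.
print(cut_prefix(host, "www."))  # wonderful.org
```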
On Fri, 20 Mar 2020 20:49:12 -0000 "Dennis Sweeney" <sweeney.dennis650@gmail.com> wrote:
exactly same way (as a character set) in each case. Looking at how the argument is used, I'd argue that ``lstrip``/``rstrip``/``strip`` are much more similar to each other than they are to the proposed methods
Correct, but I don't like the word "cut" because it suggests that something is cut into pieces which can be used later separately. I'd propose to use "trim" instead of "cut" because it makes clear that something is cut off and discarded, and it is clearly different from "strip".
On 21Mar2020 14:17, musbur@posteo.org <musbur@posteo.org> wrote:
Correct, but I don't like the word "cut" because it suggests that something is cut into pieces which can be used later separately.
I'd propose to use "trim" instead of "cut" because it makes clear that something is cut off and discarded, and it is clearly different from "strip".
Please, NO. "trim" is a VERY well known PHP function, and does what our strip does. I'm very against this (otherwise fine) word for this reason.

I still prefer "cut", though the consensus seems to be for "strip".

Cheers,
Cameron Simpson <cs@cskk.id.au>
On Fri, Mar 20, 2020 at 11:56 AM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
If ``s`` is one these objects, and ``s`` has ``pre`` as a prefix, then ``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has been removed. If ``s`` does not have ``pre`` as a prefix, an unchanged copy of ``s`` is returned. In summary, ``s.cutprefix(pre)`` is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.
The second sentence above unambiguously states that cutprefix returns 'an unchanged *copy*', but the example contradicts that and shows that 'self' may be returned and not a copy. I think it should be reworded to explicitly allow the optimization of returning self.
For clarity, I'll change

    If ``s`` does not have ``pre`` as a prefix, an unchanged copy of ``s`` is returned.

to

    If ``s`` does not have ``pre`` as a prefix, then ``s.cutprefix(pre)`` returns ``s`` or an unchanged copy of ``s``.

For consistency with the Specification section, I'll also change

    s[len(pre):] if s.startswith(pre) else s

to

    s[len(pre):] if s.startswith(pre) else s[:]

and similarly change the ``cutsuffix`` snippet.
On 2020-03-20 21:49, Dennis Sweeney wrote:
For clarity, I'll change
If ``s`` does not have ``pre`` as a prefix, an unchanged copy of ``s`` is returned.
to
If ``s`` does not have ``pre`` as a prefix, then ``s.cutprefix(pre)`` returns ``s`` or an unchanged copy of ``s``.
For consistency with the Specification section, I'll also change
s[len(pre):] if s.startswith(pre) else s
to
s[len(pre):] if s.startswith(pre) else s[:]
and similarly change the ``cutsuffix`` snippet.
If ``s`` is immutable, why return a copy when the original will do?

    s[len(pre) : ] if pre and s.startswith(pre) else s

    s[ : -len(suf)] if suf and s.endswith(suf) else s
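The two guards in the expressions above are not symmetric, which a short experiment shows. This is an illustrative sketch only; the function names `cut_suffix_unguarded` and `cut_suffix_guarded` are made up for the comparison. For the prefix, `s[len(''):]` is `s[0:]`, so the `pre and` test merely skips a slice, whereas for the suffix `s[:-len('')]` is `s[:0]`, so the `suf and` test is required for correctness.

```python
def cut_suffix_unguarded(s: str, suf: str) -> str:
    # Buggy on purpose: every string ends with '', and s[:-0] is ''.
    return s[:-len(suf)] if s.endswith(suf) else s


def cut_suffix_guarded(s: str, suf: str) -> str:
    # The "suf and" test keeps the empty suffix away from the slice.
    return s[:-len(suf)] if suf and s.endswith(suf) else s


s = "foobar"
print(repr(s[len(""):]))                    # 'foobar'
print(repr(cut_suffix_unguarded(s, "")))    # ''
print(repr(cut_suffix_guarded(s, "")))      # 'foobar'
```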
On 20Mar2020 13:57, Eric Fahlgren <ericfahlgren@gmail.com> wrote:
On Fri, Mar 20, 2020 at 11:56 AM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
If ``s`` is one these objects, and ``s`` has ``pre`` as a prefix, then ``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has been removed. If ``s`` does not have ``pre`` as a prefix, an unchanged copy of ``s`` is returned. In summary, ``s.cutprefix(pre)`` is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.
The second sentence above unambiguously states that cutprefix returns 'an unchanged *copy*', but the example contradicts that and shows that 'self' may be returned and not a copy. I think it should be reworded to explicitly allow the optimization of returning self.
My versions of these (plain old functions) return self if unchanged, and are explicitly documented as doing so.

This has the concrete advantage that one can test for nonremoval of the suffix with "is", which is very fast, instead of == which may not be.

So one writes (assuming methods):

    prefix = cutsuffix(s, 'abc')
    if prefix is s:
        ... no change
    else:
        ... definitely changed, s != prefix also

I am explicitly in favour of returning self if unchanged.

Cheers,
Cameron Simpson <cs@cskk.id.au>
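A runnable version of the pattern described above might look like the following. It is a sketch under the assumption that the helper is a plain function, here called `cut_suffix`, which is documented to return its argument itself when nothing is removed; it says nothing about what the proposed methods would guarantee.

```python
def cut_suffix(s: str, suf: str) -> str:
    """Return s without a trailing suf, or the very same object s if unchanged."""
    if suf and s.endswith(suf):
        return s[:-len(suf)]
    return s  # same object, so callers may test the result with "is"


s = "report.txt"

stem = cut_suffix(s, ".csv")
if stem is s:
    print("no change")
else:
    print("changed to", stem)

stem = cut_suffix(s, ".txt")
if stem is s:
    print("no change")
else:
    print("changed to", stem)
```

The identity test is valid here only because this particular function promises to hand back its argument.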
On 3/20/20 9:34 PM, Cameron Simpson wrote:
My versions of these (plain old functions) return self if unchanged, and are explicitly documented as doing so.
This has the concrete advantage that one can test for nonremoval of the suffix with "is", which is very fast, instead of == which may not be.
So one writes (assuming methods):
    prefix = cutsuffix(s, 'abc')
    if prefix is s:
        ... no change
    else:
        ... definitely changed, s != prefix also
I am explicitly in favour of returning self if unchanged.
Why be so prescriptive? The semantics of these functions should be about what the resulting string contains. Leave it to implementors to decide when it is OK to return self or not. --Ned.
On 3/21/2020 11:20 AM, Ned Batchelder wrote:
Why be so prescriptive? The semantics of these functions should be about what the resulting string contains. Leave it to implementors to decide when it is OK to return self or not.
The only reason I can think of is to enable the test above: did a suffix/prefix removal take place? That seems like a useful thing. I think if we don't specify the behavior one way or the other, people are going to rely on CPython's behavior here, consciously or not.

Is there some Python implementation that would have a problem with the "is" test, if we were being this prescriptive? Honest question.

Of course this would open the question of what to do if the suffix is the empty string. But since "'foo'.startswith('')" is True, maybe we'd have to return a copy in that case. It would be odd to have "s.startswith('')" be true, but "s.cutprefix('') is s" also be True. Or, since there's already talk in the PEP about what happens if the prefix/suffix is the empty string, and if we adopt the "is" behavior we'd add more details there. Like "if the result is the same object as self, it means either the suffix is the empty string, or self didn't start with the suffix".

Eric
Well, if CPython is modified to implement tagged pointers and supports storing short strings (a few latin1 characters) as a pointer, it may become harder to keep the same behavior for "x is y" where x and y are strings.

Victor

On Sat, Mar 21, 2020 at 17:23, Eric V. Smith <eric@trueblade.com> wrote:
The only reason I can think of is to enable the test above: did a suffix/prefix removal take place? That seems like a useful thing. I think if we don't specify the behavior one way or the other, people are going to rely on Cpython's behavior here, consciously or not.
Is there some python implementation that would have a problem with the "is" test, if we were being this prescriptive? Honest question.
Of course this would open the question of what to do if the suffix is the empty string. But since "'foo'.startswith('')" is True, maybe we'd have to return a copy in that case. It would be odd to have "s.startswith('')" be true, but "s.cutprefix('') is s" also be True. Or, since there's already talk in the PEP about what happens if the prefix/suffix is the empty string, and if we adopt the "is" behavior we'd add more details there. Like "if the result is the same object as self, it means either the suffix is the empty string, or self didn't start with the suffix".
Eric
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/HYSZSIAZ... Code of Conduct: http://python.org/psf/codeofconduct/
-- Night gathers, and now my watch begins. It shall not end until my death.
On 3/21/2020 12:39 PM, Victor Stinner wrote:
Well, if CPython is modified to implement tagged pointers and supports storing a short strings (a few latin1 characters) as a pointer, it may become harder to keep the same behavior for "x is y" where x and y are strings.
Good point. And I guess it's still a problem for interned strings, since even a copy could be the same object:
    >>> s = 'for'
    >>> s[:] is 'for'
    True
So I now agree with Ned, we shouldn't be prescriptive here, and we should explicitly say in the PEP that there's no way to tell if the strip/cut/whatever took place, other than comparing via equality, not identity. Eric
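A sketch of what comparing via equality looks like in practice follows. The helper names `cut_prefix` and `prefix_was_cut` are hypothetical, and the identity check is printed without an expected result because it is an implementation detail that may differ between interpreters.

```python
def cut_prefix(s: str, pre: str) -> str:
    """Hypothetical stand-in for the proposed str.cutprefix."""
    return s[len(pre):] if s.startswith(pre) else s[:]


def prefix_was_cut(before: str, after: str) -> bool:
    # Equality is portable; object identity is not a reliable signal.
    return before != after


line = "From: alice@example.com"
print(prefix_was_cut(line, cut_prefix(line, "From: ")))  # True
print(prefix_was_cut(line, cut_prefix(line, "CC: ")))    # False

# An empty prefix "matches" but removes nothing, so equality reports no cut.
print(prefix_was_cut(line, cut_prefix(line, "")))        # False

# Whether a full slice returns the same object depends on the implementation.
print(line[:] is line)
```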
In that case, the PEP should advise using .startswith() or .endswith() explicitly if the caller needs to know whether the string is going to be modified. Example:

    modified = False
    # O(n) complexity where n=len("prefix:")
    if line.startswith("prefix:"):
        line = line.cutprefix("prefix:")
        modified = True

It should be more efficient than:

    old_line = line
    line = line.cutprefix("prefix:")
    modified = (line != old_line)  # O(n) complexity where n=len(line)

since the checked prefix is usually way shorter than the whole string.

Victor

On Sat, Mar 21, 2020 at 17:45, Eric V. Smith <eric@trueblade.com> wrote:
On 3/21/2020 12:39 PM, Victor Stinner wrote:
Well, if CPython is modified to implement tagged pointers and supports storing a short strings (a few latin1 characters) as a pointer, it may become harder to keep the same behavior for "x is y" where x and y are strings.
Good point. And I guess it's still a problem for interned strings, since even a copy could be the same object:
    >>> s = 'for'
    >>> s[:] is 'for'
    True
So I now agree with Ned, we shouldn't be prescriptive here, and we should explicitly say in the PEP that there's no way to tell if the strip/cut/whatever took place, other than comparing via equality, not identity.
Eric
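Victor's advice to check with .startswith() first can be wrapped in a small helper that reports whether anything was removed. This is a sketch only: `cut_prefix_reporting` is a hypothetical name, slicing stands in for the proposed ``cutprefix`` method, and treating an empty prefix as "not modified" is an assumption of this example rather than something stated in the thread.

```python
from typing import Tuple


def cut_prefix_reporting(line: str, prefix: str) -> Tuple[str, bool]:
    """Return (result, modified) without comparing whole strings.

    The startswith() test costs O(len(prefix)), which is usually far
    cheaper than comparing the old and new lines for equality.
    """
    # Assumption: an empty prefix removes nothing, so report modified=False.
    if prefix and line.startswith(prefix):
        return line[len(prefix):], True
    return line, False


print(cut_prefix_reporting("prefix:value", "prefix:"))  # ('value', True)
print(cut_prefix_reporting("other:value", "prefix:"))   # ('other:value', False)
```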
Victor
Le sam. 21 mars 2020 à 17:23, Eric V. Smith <eric@trueblade.com> a écrit :
On 3/20/20 9:34 PM, Cameron Simpson wrote:
On 20Mar2020 13:57, Eric Fahlgren <ericfahlgren@gmail.com> wrote:
On Fri, Mar 20, 2020 at 11:56 AM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
If ``s`` is one these objects, and ``s`` has ``pre`` as a prefix, then ``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has been removed. If ``s`` does not have ``pre`` as a prefix, an unchanged copy of ``s`` is returned. In summary, ``s.cutprefix(pre)`` is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.
The second sentence above unambiguously states that cutprefix returns 'an unchanged *copy*', but the example contradicts that and shows that 'self' may be returned and not a copy. I think it should be reworded to explicitly allow the optimization of returning self.
My versions of these (plain old functions) return self if unchanged, and are explicitly documented as doing so.
This has the concrete advantage that one can test for nonremoval if the suffix with "is", which is very fast, instead of == which may not be.
So one writes (assuming methods):
    prefix = cutsuffix(s, 'abc')
    if prefix is s:
        ... no change
    else:
        ... definitely changed, s != prefix also
I am explicitly in favour of returning self if unchanged.
On 3/21/2020 11:20 AM, Ned Batchelder wrote:
Why be so prescriptive? The semantics of these functions should be about what the resulting string contains. Leave it to implementors to decide when it is OK to return self or not.
The only reason I can think of is to enable the test above: did a suffix/prefix removal take place? That seems like a useful thing. I think if we don't specify the behavior one way or the other, people are going to rely on CPython's behavior here, consciously or not.
Is there some python implementation that would have a problem with the "is" test, if we were being this prescriptive? Honest question.
Of course this would open the question of what to do if the suffix is the empty string. But since "'foo'.startswith('')" is True, maybe we'd have to return a copy in that case. It would be odd to have "s.startswith('')" be true, but "s.cutprefix('') is s" also be True. Or, since there's already talk in the PEP about what happens if the prefix/suffix is the empty string, and if we adopt the "is" behavior we'd add more details there. Like "if the result is the same object as self, it means either the suffix is the empty string, or self didn't start with the suffix".
Eric
-- Night gathers, and now my watch begins. It shall not end until my death.
On 3/21/2020 12:50 PM, Victor Stinner wrote:
In that case, the PEP should advise using .startswith() or .endswith() explicitly if the caller needs to know whether the string is going to be modified. Example:

    modified = False
    # O(n) complexity where n=len("prefix:")
    if line.startswith("prefix:"):
        line = line.cutprefix("prefix: ")
        modified = True

It should be more efficient than:

    old_line = line
    line = line.cutprefix("prefix: ")
    modified = (line != old_line)  # O(n) complexity where n=len(line)

since the checked prefix is usually way shorter than the whole string.
Agreed (except the string passed to startswith should be the same as the one used in cutprefix!).
Eric
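The pattern Victor and Eric converge on can be sketched as follows. This is only a minimal sketch, not the PEP's implementation: `cutprefix` is a hypothetical pure-Python helper modelled on the "roughly equivalent" expression in the PEP's abstract, and `cut_and_report` is an illustrative name. The same prefix value is passed to both the check and the cut, as Eric notes.

```python
from typing import Tuple


def cutprefix(s: str, prefix: str) -> str:
    """Hypothetical stand-in for the proposed method (assumption, not a real str method)."""
    return s[len(prefix):] if s.startswith(prefix) else s


def cut_and_report(line: str, prefix: str) -> Tuple[str, bool]:
    """Return (possibly shortened line, whether the prefix was present).

    The startswith() check costs O(len(prefix)); the whole line never has
    to be compared to find out whether anything was removed.
    """
    if line.startswith(prefix):
        return cutprefix(line, prefix), True
    return line, False


print(cut_and_report("prefix: value", "prefix: "))
print(cut_and_report("other: value", "prefix: "))
```

Passing one `prefix` variable to both calls avoids the mismatch between "prefix:" and "prefix: " that Eric points out.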
On 21Mar2020 12:45, Eric V. Smith <eric@trueblade.com> wrote:
On 3/21/2020 12:39 PM, Victor Stinner wrote:
Well, if CPython is modified to implement tagged pointers and supports storing short strings (a few latin1 characters) as a pointer, it may become harder to keep the same behavior for "x is y" where x and y are strings.
Are you suggesting that it could become impossible to write this function:

    def myself(o):
        return o

and not be able to rely on "o is myself(o)"? That seems... a pretty nasty breaking change for the language.
Good point. And I guess it's still a problem for interned strings, since even a copy could be the same object:
    >>> s = 'for'
    >>> s[:] is 'for'
    True
So I now agree with Ned, we shouldn't be prescriptive here, and we should explicitly say in the PEP that there's no way to tell if the strip/cut/whatever took place, other than comparing via equality, not identity.
Unless Victor asserts that a function like myself() above cannot be relied on to have its return value "is" its passed in value, I disagree.
The beauty of returning the original object on no change is that the test is O(1) and the criterion is clear. It is easy to document that stripping an empty affix returns the original string.
I guess a test for len(stripped_string) == len(unstripped_string) is also O(1), and is less prescriptive.
I just don't see the weight to Ned's characterisation of "a is/is-not b" as overly prescriptive; returning the same reference as one is given seems nearly the easiest thing a function can ever do.
Cheers,
Cameron Simpson <cs@cskk.id.au>
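Cameron's length-based alternative might look like the sketch below. It is illustrative only: `cutsuffix` is a hypothetical helper following the "roughly equivalent" expression in the PEP's abstract, and `suffix_was_removed` is a made-up name. The check relies on `len()` rather than on object identity, so it does not depend on whether an implementation hands back the original object.

```python
def cutsuffix(s: str, suffix: str) -> str:
    """Hypothetical helper mirroring the PEP's rough equivalent for cutsuffix."""
    return s[:-len(suffix)] if suffix and s.endswith(suffix) else s


def suffix_was_removed(original: str, result: str) -> bool:
    """O(1) check: the result is shorter exactly when a non-empty suffix was cut."""
    return len(result) != len(original)


s = "archive.tar.gz"
cut = cutsuffix(s, ".gz")
same = cutsuffix(s, ".zip")
print(cut, suffix_was_removed(s, cut))
print(same, suffix_was_removed(s, same))
```

With an empty suffix the lengths are equal, so this check reports "not removed", which matches the reading that cutting an empty affix changes nothing.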
On Sun, 22 Mar 2020 at 15:13, Cameron Simpson <cs@cskk.id.au> wrote:
On 21Mar2020 12:45, Eric V. Smith <eric@trueblade.com> wrote:
On 3/21/2020 12:39 PM, Victor Stinner wrote:
Well, if CPython is modified to implement tagged pointers and supports storing short strings (a few latin1 characters) as a pointer, it may become harder to keep the same behavior for "x is y" where x and y are strings.
Are you suggesting that it could become impossible to write this function:
def myself(o): return o
and not be able to rely on "o is myself(o)"? That seems... a pretty nasty breaking change for the language.
Other way around - because strings are immutable, their identity isn't supposed to matter, so it's possible that functions that currently return the exact same object in some cases may in the future start returning a different object with the same value.
Right now, in CPython, with no tagged pointers, we return the full existing pointer wherever we can, as that saves us a data copy. With tagged pointers, the pointer storage effectively *is* the instance, so you can't really replicate that existing "copy the reference not the storage" behaviour any more.
That said, it's also possible that identity for tagged pointers would be value based (similar to the effect of the small integer cache and string interning), in which case the entire question would become moot.
Either way, the PEP shouldn't be specifying that a new object *must* be returned, and it also shouldn't be specifying that the same object *can't* be returned.
Cheers,
Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 3/22/2020 1:42 AM, Nick Coghlan wrote:
Either way, the PEP shouldn't be specifying that a new object *must* be returned, and it also shouldn't be specifying that the same object *can't* be returned.
Agreed. I think the PEP should say that a str will be returned (in the event of a subclass, assuming that's what we decide), but if the argument is exactly a str, that it may or may not return the original object.
Eric
I don't see any rationale in the PEP or in the python-ideas thread (admittedly I didn't read the whole thing, I just Ctrl + F-ed "subclass" there). Is this just for consistency with other methods like .casefold?
I can understand why you'd want it to be consistent, but I think it's misguided in this case. It adds unnecessary complexity for subclass implementers to need to re-implement these two additional methods, and I can see no obvious reason why this behavior would be necessary, since these methods can be implemented in terms of string slicing.
Even if you wanted to use `str`-specific optimizations in C that aren't available if you are constrained to use the subclass's __getitem__, it's inexpensive to add a "PyUnicode_CheckExact(self)" check to hit a "fast path" that doesn't use slice.
I think defining this in terms of string slicing makes the most sense (and, notably, slice itself returns `str` unless explicitly overridden, the default is for it to return `str` anyway...).
Either way, it would be nice to see the rationale included in the PEP somewhere.
Best,
Paul
On 3/22/20 7:16 AM, Eric V. Smith wrote:
Agreed. I think the PEP should say that a str will be returned (in the event of a subclass, assuming that's what we decide), but if the argument is exactly a str, that it may or may not return the original object.
Eric
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/JHM7T6JZ... Code of Conduct: http://python.org/psf/codeofconduct/
tl; dr A method implemented in C is more efficient than hand-written pure-Python code, and it's less error-prone.
I don't know if it has already been said previously, but I hate having to manually compute the string length when writing:

    if line.startswith("prefix"):
        line = line[6:]

Usually what I do is open a Python REPL, type len("prefix"), and copy-paste the result :-)
Passing the length directly risks a mistake. What if I write line[7:] and it works most of the time because of a space, but sometimes the space is omitted randomly and the application fails?
--
The lazy approach is:

    if line.startswith("prefix"):
        line = line[len("prefix"):]

Such code makes my "micro-optimizer heart" bleed, since I know that Python is stupid and calls len() at runtime; the compiler is unable to optimize it (sadly for good reasons: the len name can be overridden) :-(
=> line.cutprefix("prefix") is more efficient! ;-) It's also shorter.
Victor
On Sun, 22 Mar 2020 at 17:02, Paul Ganssle <paul@ganssle.io> wrote:
I don't see any rationale in the PEP or in the python-ideas thread (admittedly I didn't read the whole thing, I just Ctrl + F-ed "subclass" there). Is this just for consistency with other methods like .casefold?
I can understand why you'd want it to be consistent, but I think it's misguided in this case. It adds unnecessary complexity for subclass implementers to need to re-implement these two additional methods, and I can see no obvious reason why this behavior would be necessary, since these methods can be implemented in terms of string slicing.
Even if you wanted to use `str`-specific optimizations in C that aren't available if you are constrained to use the subclass's __getitem__, it's inexpensive to add a "PyUnicode_CheckExact(self)" check to hit a "fast path" that doesn't use slice.
I think defining this in terms of string slicing makes the most sense (and, notably, slice itself returns `str` unless explicitly overridden, the default is for it to return `str` anyway...).
Either way, it would be nice to see the rationale included in the PEP somewhere.
Best, Paul
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/RTQWEE4K... Code of Conduct: http://python.org/psf/codeofconduct/
-- Night gathers, and now my watch begins. It shall not end until my death.
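The failure mode Victor describes can be reproduced in a few lines. This is an illustrative sketch: `cutprefix` is a hypothetical helper following the "roughly equivalent" expression in the PEP's abstract, `cut_hardcoded` is a made-up name, and the sample strings are invented.

```python
def cut_hardcoded(line: str) -> str:
    # Hand-counted length: 7 is wrong for "prefix" (6 characters), but the
    # mistake stays hidden whenever a space happens to follow the prefix.
    if line.startswith("prefix"):
        return line[7:]
    return line


def cutprefix(s: str, prefix: str) -> str:
    """Hypothetical helper: the length is derived from the prefix itself."""
    return s[len(prefix):] if s.startswith(prefix) else s


print(repr(cut_hardcoded("prefix value")))  # a space follows the prefix
print(repr(cut_hardcoded("prefixvalue")))   # no space: one character too many is dropped
print(repr(cutprefix("prefixvalue", "prefix")))
```

Deriving the slice bound from the prefix removes the hand-counted constant, which is the error-prone part.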
Sorry, I think I accidentally left out a clause here - I meant that the rationale for /always returning a 'str'/ (as opposed to returning a subclass) is missing, it just says in the PEP:
The only difference between the real implementation and the above is that, as with other string methods like replace, the methods will raise a TypeError if any of self, pre or suf is not an instace of str, and will cast subclasses of str to builtin str objects.
I think the rationale for these differences is not made entirely clear, specifically the "and will cast subclasses of str to builtin str objects" part.
I think it would be best to define the truncation in terms of __getitem__ - possibly with the caveat that implementations are allowed (but not required) to return `self` unchanged if no match is found.
Best,
Paul
P.S. Dennis - just noticed in this reply that there is a typo in the PEP - s/instace/instance
On 3/22/20 12:15 PM, Victor Stinner wrote:
tl; dr A method implemented in C is more efficient than hand-written pure-Python code, and it's less error-prone
I don't know if it has already been said previously, but I hate having to manually compute the string length when writing:

    if line.startswith("prefix"):
        line = line[6:]

Usually what I do is open a Python REPL, type len("prefix"), and copy-paste the result :-)
Passing the length directly risks a mistake. What if I write line[7:] and it works most of the time because of a space, but sometimes the space is omitted randomly and the application fails?
--
The lazy approach is:

    if line.startswith("prefix"):
        line = line[len("prefix"):]

Such code makes my "micro-optimizer heart" bleed, since I know that Python is stupid and calls len() at runtime; the compiler is unable to optimize it (sadly for good reasons: the len name can be overridden) :-(
=> line.cutprefix("prefix") is more efficient! ;-) It's also shorter.
Victor
On 3/22/2020 12:25 PM, Paul Ganssle wrote:
Sorry, I think I accidentally left out a clause here - I meant that the rationale for /always returning a 'str'/ (as opposed to returning a subclass) is missing, it just says in the PEP:
The only difference between the real implementation and the above is that, as with other string methods like replace, the methods will raise a TypeError if any of self, pre or suf is not an instace of str, and will cast subclasses of str to builtin str objects.
I think the rationale for these differences is not made entirely clear, specifically the "and will cast subclasses of str to builtin str objects" part.
Agreed. I don't understand the rationale, either. If we stick with it, it should definitely be stated. And if we don't, that reason should be explained, too.
Eric
On Sun, Mar 22, 2020 at 4:20 AM Eric V. Smith <eric@trueblade.com> wrote:
Agreed. I think the PEP should say that a str will be returned (in the event of a subclass, assuming that's what we decide), but if the argument is exactly a str, that it may or may not return the original object.
Yes. Returning self if the class is exactly str is *just* an optimization -- it must not be mandated nor ruled out.
And we *have* to decide that it returns a plain str instance if called on a subclass instance (unless overridden, of course) since the base class (str) won't know the signature of the subclass constructor. That's also why all other str methods return an instance of plain str when called on a subclass instance.
--
--Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
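Guido's point about constructor signatures can be seen with a small subclass. This is a sketch for illustration only: `Tagged` and its `tag` argument are hypothetical. Because `str` cannot know that `Tagged` needs a second constructor argument, existing methods such as `upper()` and `replace()`, as well as plain slicing, return a plain `str`.

```python
class Tagged(str):
    """Hypothetical str subclass whose constructor takes an extra argument."""

    def __new__(cls, value, tag):
        obj = super().__new__(cls, value)
        obj.tag = tag
        return obj


t = Tagged("prefix-body", tag="demo")

# The instance itself is a Tagged, but results of str methods are plain str.
print(type(t).__name__)
print(type(t.upper()).__name__)
print(type(t.replace("-", "_")).__name__)
print(type(t[7:]).__name__, t[7:])
```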
And we *have* to decide that it returns a plain str instance if called on a subclass instance (unless overridden, of course) since the base class (str) won't know the signature of the subclass constructor. That's also why all other str methods return an instance of plain str when called on a subclass instance.
My suggestion is to rely on __getitem__ here (for subclasses), in which case we don't actually need to know the subclass constructor. The rough implementation in the PEP shows how to do it without needing to know the subclass constructor:

    def redbikeshed(self, prefix):
        if self.startswith(prefix):
            return self[len(prefix):]
        return self[:]

The actual implementation doesn't need to be implemented that way; as long as the result is always the result of slicing the original string, it's safe to do so* and more convenient for subclass implementers (who now only have to implement __getitem__ to get the affix-trimming functions for free).
One downside to this scheme is that I think it makes getting the type hinting right more complicated, since the return type of these functions is basically, "Whatever the return type of self.__getitem__ is", but I don't think anyone will complain if you write -> str with the understanding that __getitem__ should return a str or a subtype thereof.
Best,
Paul
*Assuming they haven't messed with __getitem__ to do something non-standard, but if they've done that I think they've tossed Liskov substitution out the window and will have to re-implement these methods if they want them to work.
On 3/22/20 2:03 PM, Guido van Rossum wrote:
On Sun, Mar 22, 2020 at 4:20 AM Eric V. Smith <eric@trueblade.com> wrote:
Agreed. I think the PEP should say that a str will be returned (in the event of a subclass, assuming that's what we decide), but if the argument is exactly a str, that it may or may not return the original object.
Yes. Returning self if the class is exactly str is *just* an optimization -- it must not be mandated nor ruled out.
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ZZTY3OCJ... Code of Conduct: http://python.org/psf/codeofconduct/
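Paul's slicing-based definition can be tried out with a subclass that overrides __getitem__. A sketch under stated assumptions: `cut_prefix_by_slicing` is an illustrative free function rather than the proposed method, and `Path` is a hypothetical subclass (unrelated to pathlib) whose slices keep its own type.

```python
def cut_prefix_by_slicing(s, prefix):
    """Sketch of a definition that only ever slices the original object."""
    if s.startswith(prefix):
        return s[len(prefix):]
    return s[:]


class Path(str):
    """Hypothetical subclass that preserves its type when indexed or sliced."""

    def __getitem__(self, index):
        return type(self)(super().__getitem__(index))


p = cut_prefix_by_slicing(Path("/usr/bin"), "/usr")
q = cut_prefix_by_slicing("/usr/bin", "/usr")
print(type(p).__name__, p)
print(type(q).__name__, q)
```

Under this scheme the result type is whatever the object's own __getitem__ returns, which is the type-hinting wrinkle Paul mentions.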
On 21/03/2020 16:15, Eric V. Smith wrote:
The only reason I can think of is to enable the test above: did a suffix/prefix removal take place? That seems like a useful thing. I think if we don't specify the behavior one way or the other, people are going to rely on Cpython's behavior here, consciously or not.
Is there some python implementation that would have a problem with the "is" test, if we were being this prescriptive? Honest question.
Of course this would open the question of what to do if the suffix is the empty string. But since "'foo'.startswith('')" is True, maybe we'd have to return a copy in that case. It would be odd to have "s.startswith('')" be true, but "s.cutprefix('') is s" also be True. Or, since there's already talk in the PEP about what happens if the prefix/suffix is the empty string, and if we adopt the "is" behavior we'd add more details there. Like "if the result is the same object as self, it means either the suffix is the empty string, or self didn't start with the suffix".
Eric
*If* no python implementation would have a problem with the "is" test (and from a position of total ignorance I would guess that this is the case :-)), then it would be a useful feature and it is easier to define it now than try to force conformance later. I have no problem with 's.startswith("") == True and s.cutprefix("") is s'. YMMV. Rob Cliffe
On 3/21/20 12:51 PM, Rob Cliffe via Python-Dev wrote:
On 21/03/2020 16:15, Eric V. Smith wrote:
On 3/21/2020 11:20 AM, Ned Batchelder wrote:
On 3/20/20 9:34 PM, Cameron Simpson wrote:
On 20Mar2020 13:57, Eric Fahlgren <ericfahlgren@gmail.com> wrote:
On Fri, Mar 20, 2020 at 11:56 AM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
If ``s`` is one of these objects, and ``s`` has ``pre`` as a prefix, then ``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has been removed. If ``s`` does not have ``pre`` as a prefix, an unchanged copy of ``s`` is returned. In summary, ``s.cutprefix(pre)`` is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.
The second sentence above unambiguously states that cutprefix returns 'an unchanged *copy*', but the example contradicts that and shows that 'self' may be returned and not a copy. I think it should be reworded to explicitly allow the optimization of returning self.
My versions of these (plain old functions) return self if unchanged, and are explicitly documented as doing so.
This has the concrete advantage that one can test for nonremoval of the suffix with "is", which is very fast, instead of == which may not be.
So one writes (assuming methods):

    prefix = cutsuffix(s, 'abc')
    if prefix is s:
        ...  # no change
    else:
        ...  # definitely changed, s != prefix also
I am explicitly in favour of returning self if unchanged.
Why be so prescriptive? The semantics of these functions should be about what the resulting string contains. Leave it to implementors to decide when it is OK to return self or not.
The only reason I can think of is to enable the test above: did a suffix/prefix removal take place? That seems like a useful thing. I think if we don't specify the behavior one way or the other, people are going to rely on CPython's behavior here, consciously or not.
Is there some python implementation that would have a problem with the "is" test, if we were being this prescriptive? Honest question.
Of course this would open the question of what to do if the suffix is the empty string. But since "'foo'.startswith('')" is True, maybe we'd have to return a copy in that case. It would be odd to have "s.startswith('')" be true, but "s.cutprefix('') is s" also be True. Or, since there's already talk in the PEP about what happens if the prefix/suffix is the empty string, and if we adopt the "is" behavior we'd add more details there. Like "if the result is the same object as self, it means either the suffix is the empty string, or self didn't start with the suffix".
Eric
*If* no python implementation would have a problem with the "is" test (and from a position of total ignorance I would guess that this is the case :-)), then it would be a useful feature and it is easier to define it now than try to force conformance later. I have no problem with 's.startswith("") == True and s.cutprefix("") is s'. YMMV.
Why take on that "*If*" conditional? We're constantly telling people not to compare strings with "is". So why define how "is" will behave in this PEP? It's the implementation's decision whether to return a new immutable object with the same value, or the same object.
As Steven points out elsewhere in this thread, Python's builtins' behavior differs, across methods and versions, in this regard. I certainly didn't know that, and it was probably news to you as well. So why do we need to nail it down for suffixes and prefixes?
There will be no conformance to force later, because if the value doesn't change, then it doesn't matter whether it's a new string or the same string.
--Ned.
On 21/03/2020 20:16, Ned Batchelder wrote:
On 3/21/20 12:51 PM, Rob Cliffe via Python-Dev wrote:
On 21/03/2020 16:15, Eric V. Smith wrote:
On 3/21/2020 11:20 AM, Ned Batchelder wrote:
On 3/20/20 9:34 PM, Cameron Simpson wrote:
On 20Mar2020 13:57, Eric Fahlgren <ericfahlgren@gmail.com> wrote:
On Fri, Mar 20, 2020 at 11:56 AM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
If ``s`` is one of these objects, and ``s`` has ``pre`` as a prefix, then ``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has been removed. If ``s`` does not have ``pre`` as a prefix, an unchanged copy of ``s`` is returned. In summary, ``s.cutprefix(pre)`` is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.
The second sentence above unambiguously states that cutprefix returns 'an unchanged *copy*', but the example contradicts that and shows that 'self' may be returned and not a copy. I think it should be reworded to explicitly allow the optimization of returning self.
My versions of these (plain old functions) return self if unchanged, and are explicitly documented as doing so.
This has the concrete advantage that one can test for nonremoval of the suffix with "is", which is very fast, instead of == which may not be.
So one writes (assuming methods):

    prefix = cutsuffix(s, 'abc')
    if prefix is s:
        ...  # no change
    else:
        ...  # definitely changed, s != prefix also
I am explicitly in favour of returning self if unchanged.
Why be so prescriptive? The semantics of these functions should be about what the resulting string contains. Leave it to implementors to decide when it is OK to return self or not.
The only reason I can think of is to enable the test above: did a suffix/prefix removal take place? That seems like a useful thing. I think if we don't specify the behavior one way or the other, people are going to rely on CPython's behavior here, consciously or not.
Is there some python implementation that would have a problem with the "is" test, if we were being this prescriptive? Honest question.
Of course this would open the question of what to do if the suffix is the empty string. But since "'foo'.startswith('')" is True, maybe we'd have to return a copy in that case. It would be odd to have "s.startswith('')" be true, but "s.cutprefix('') is s" also be True. Or, since there's already talk in the PEP about what happens if the prefix/suffix is the empty string, and if we adopt the "is" behavior we'd add more details there. Like "if the result is the same object as self, it means either the suffix is the empty string, or self didn't start with the suffix".
Eric
*If* no python implementation would have a problem with the "is" test (and from a position of total ignorance I would guess that this is the case :-)), then it would be a useful feature and it is easier to define it now than try to force conformance later. I have no problem with 's.startswith("") == True and s.cutprefix("") is s'. YMMV.
Why take on that "*If*" conditional? We're constantly telling people not to compare strings with "is". So why define how "is" will behave in this PEP? It's the implementation's decision whether to return a new immutable object with the same value, or the same object.
As Steven points out elsewhere in this thread, Python's builtins' behavior differs, across methods and versions, in this regard. I certainly didn't know that, and it was probably news to you as well. So why do we need to nail it down for suffixes and prefixes?
There will be no conformance to force later, because if the value doesn't change, then it doesn't matter whether it's a new string or the same string.
--Ned.
Conceded.
Rob Cliffe
On Sat, Mar 21, 2020 at 12:15:21PM -0400, Eric V. Smith wrote:
On 3/21/2020 11:20 AM, Ned Batchelder wrote:
Why be so prescriptive? The semantics of these functions should be about what the resulting string contains. Leave it to implementors to decide when it is OK to return self or not.
I agree with Ned -- whether the string object is returned unchanged or a copy is an implementation decision, not a language decision.
[Eric]
The only reason I can think of is to enable the test above: did a suffix/prefix removal take place? That seems like a useful thing.
We don't make this guarantee about string identity for any other string method, and CPython's behaviour varies from method to method:

    py> s = 'a b c'
    py> s is s.strip()
    True
    py> s is s.lower()
    False

and version to version:

    py> s is s.replace('a', 'a') # 2.7
    False
    py> s is s.replace('a', 'a') # 3.5
    True

I've never seen anyone relying on this behaviour, and I don't expect these new methods will change that. Thinking that `is` is another way of writing `==`, yes, I see that frequently. But relying on object identity to see whether a new string was created by a method, no.
If you want to know whether a prefix/suffix was removed, there's a more reliable way than identity and a cheaper way than O(N) equality. Just compare the length of the string before and after. If the lengths are the same, nothing was removed.
-- Steven
On 3/21/2020 2:09 PM, Steven D'Aprano wrote:
On Sat, Mar 21, 2020 at 12:15:21PM -0400, Eric V. Smith wrote:
On 3/21/2020 11:20 AM, Ned Batchelder wrote:
Why be so prescriptive? The semantics of these functions should be about what the resulting string contains. Leave it to implementors to decide when it is OK to return self or not. I agree with Ned -- whether the string object is returned unchanged or a copy is an implementation decision, not a language decision.
[Eric]
The only reason I can think of is to enable the test above: did a suffix/prefix removal take place? That seems like a useful thing. We don't make this guarantee about string identity for any other string method, and CPython's behaviour varies from method to method:
    py> s = 'a b c'
    py> s is s.strip()
    True
    py> s is s.lower()
    False

and version to version:

    py> s is s.replace('a', 'a') # 2.7
    False
    py> s is s.replace('a', 'a') # 3.5
    True
I've never seen anyone relying on this behaviour, and I don't expect these new methods will change that. Thinking that `is` is another way of writing `==`, yes, I see that frequently. But relying on object identity to see whether a new string was created by a method, no.
Agreed. I think the PEP should just write the Python pseudo-code without copying, and it should mention that whether or not the original string is returned is an implementation detail. Then I think the actual documentation should just omit any discussion of it, like the existing docs for lstrip().
If you want to know whether a prefix/suffix was removed, there's a more reliable way than identity and a cheaper way than O(N) equality. Just compare the length of the string before and after. If the lengths are the same, nothing was removed.
That's a good point. This should probably go in the PEP, and maybe the documentation. Eric
On 21Mar2020 14:40, Eric V. Smith <eric@trueblade.com> wrote:
On 3/21/2020 2:09 PM, Steven D'Aprano wrote:
If you want to know whether a prefix/suffix was removed, there's a more reliable way than identity and a cheaper way than O(N) equality. Just compare the length of the string before and after. If the lengths are the same, nothing was removed.
That's a good point. This should probably go in the PEP, and maybe the documentation.
+1000 to this. - Cameron
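The length comparison suggested above can be sketched as follows. This is only an illustration: `cutprefix` is written here as a hypothetical stand-alone function following the PEP's rough equivalent, and `was_cut` is a name made up for this example.

```python
def cutprefix(s: str, prefix: str) -> str:
    # Hypothetical stand-in for the proposed method, following the
    # PEP's rough equivalent.
    return s[len(prefix):] if s.startswith(prefix) else s


def was_cut(before: str, after: str) -> bool:
    # Comparing lengths is O(1) and does not depend on object identity.
    return len(after) != len(before)


line = "From: alice@example.com"
stripped = cutprefix(line, "From: ")
print(was_cut(line, stripped))                 # True
print(was_cut(line, cutprefix(line, "CC: ")))  # False
```

Note that an empty prefix also reports "nothing removed", which matches the intuition that zero characters were cut.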
On 21 Mar 2020, at 19:09, Steven D'Aprano wrote:
On Sat, Mar 21, 2020 at 12:15:21PM -0400, Eric V. Smith wrote:
On 3/21/2020 11:20 AM, Ned Batchelder wrote:
Why be so prescriptive? The semantics of these functions should be about what the resulting string contains. Leave it to implementors to decide when it is OK to return self or not.
I agree with Ned -- whether the string object is returned unchanged or a copy is an implementation decision, not a language decision.
[Eric]
The only reason I can think of is to enable the test above: did a suffix/prefix removal take place? That seems like a useful thing.
We don't make this guarantee about string identity for any other string method, and CPython's behaviour varies from method to method:
    py> s = 'a b c'
    py> s is s.strip()
    True
    py> s is s.lower()
    False

and version to version:

    py> s is s.replace('a', 'a') # 2.7
    False
    py> s is s.replace('a', 'a') # 3.5
    True
And it is different for string subclasses, because the method always returns an instance of the baseclass:
    >>> class str2(str):
    ...     pass
    ...
    >>> isinstance(str2('a b c').strip(), str2)
    False
    >>> isinstance(str2('a b c').strip(), str2)
    False
Servus, Walter
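A small runnable version of the subclass observation, as a sketch. `Str2` is a hypothetical subclass used only for illustration, and the results shown are what CPython 3 produces; other implementations are assumed, not verified, to behave the same way.

```python
class Str2(str):
    """A trivial str subclass (hypothetical, for illustration only)."""


s = Str2(" a b c ")

# Built-in str methods and slicing hand back a plain str, not Str2.
print(type(s.strip()) is str)   # True
print(isinstance(s[1:], Str2))  # False
```

So for a subclass instance, an identity test against the original can never succeed after such a call, whatever the string contents are.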
On 22Mar2020 05:09, Steven D'Aprano <steve@pearwood.info> wrote:
I agree with Ned -- whether the string object is returned unchanged or a copy is an implementation decision, not a language decision.
[Eric]
The only reason I can think of is to enable the test above: did a suffix/prefix removal take place? That seems like a useful thing.
We don't make this guarantee about string identity for any other string method, and CPython's behaviour varies from method to method:
    py> s = 'a b c'
    py> s is s.strip()
    True
    py> s is s.lower()
    False

and version to version:

    py> s is s.replace('a', 'a') # 2.7
    False
    py> s is s.replace('a', 'a') # 3.5
    True
I've never seen anyone relying on this behaviour, and I don't expect these new methods will change that. Thinking that `is` is another way of writing `==`, yes, I see that frequently. But relying on object identity to see whether a new string was created by a method, no.
Well, ok, expressed on this basis, colour me convinced. I'm now ok with not mandating that no change to the string returns the same string (but, really, _only_ because I can do a test with len(), as I consider a test of content wildly excessive - potentially quite expensive - strings are not always short).
If you want to know whether a prefix/suffix was removed, there's a more reliable way than identity and a cheaper way than O(N) equality. Just compare the length of the string before and after. If the lengths are the same, nothing was removed.
Aye. Cheers, Cameron Simpson <cs@cskk.id.au>
Hi Dennis,
Thanks for writing a proper PEP. It is easier to review a specification than an implementation.
On Fri, Mar 20, 2020 at 20:00, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
Abstract ========
This is a proposal to add two new methods, ``cutprefix`` and ``cutsuffix``, to the APIs of Python's various string objects.
It would be nice to describe the behavior of these methods in a short sentence here.
In particular, the methods would be added to Unicode ``str`` objects, binary ``bytes`` and ``bytearray`` objects, and ``collections.UserString``.
IMHO the abstract should stop here. You should move the above text in the Specification section. The abstract shouldn't go into details.
If ``s`` is one of these objects, and ``s`` has ``pre`` as a prefix, then ``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has been removed. If ``s`` does not have ``pre`` as a prefix, an unchanged copy of ``s`` is returned. In summary, ``s.cutprefix(pre)`` is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.
The behavior of ``cutsuffix`` is analogous: ``s.cutsuffix(suf)`` is roughly equivalent to ``s[:-len(suf)] if suf and s.endswith(suf) else s``.
(...)
The builtin ``str`` class will gain two new methods with roughly the following behavior::
    def cutprefix(self: str, pre: str, /) -> str:
        if self.startswith(pre):
            return self[len(pre):]
        return self[:]
I'm not sure that I'm comfortable with not specifying whether the method must return the string unmodified or return a copy when it doesn't start with the prefix. It can cause subtle issues: see the "Allow multiple prefixes" example, which expects that it doesn't return a copy. Usually, PyPy does its best to mimic CPython's behavior exactly anyway, since applications rely on CPython's exact behavior (even if that's a bad thing). Fortunately, Python 3.8 started to emit a SyntaxWarning when the "is" operator is used to compare an object to a string (like: x is "abc").
I suggest always requiring the method to return the unmodified string. Honestly, it's not hard to guarantee and implement this behavior in Python!
IMHO you should also test whether pre is non-empty, just to make the intent more explicit.
Note: please rename "pre" to "prefix".
In short, I propose:

    def cutprefix(self: str, prefix: str, /) -> str:
        if self.startswith(prefix) and prefix:
            return self[len(prefix):]
        else:
            return self

I call startswith() before testing whether pre is non-empty in order to inherit startswith()'s input type validation. For example, "a".startswith(b'x') raises a TypeError.
I also suggest removing the duplicated "rough specification" in the abstract: "s[len(pre):] if s.startswith(pre) else s". Only one specification per PEP is enough ;-)
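For readers who want to try the proposal above, here is a minimal sketch of it as a stand-alone function (the method itself does not exist on `str`, so the function form and the parameter name `s` are assumptions made for this example).

```python
def cutprefix(s: str, prefix: str) -> str:
    # startswith() runs first, so its argument type checking applies
    # before the emptiness test.
    if s.startswith(prefix) and prefix:
        return s[len(prefix):]
    # No match, or an empty prefix: hand back the original object.
    return s


print(cutprefix("context.name", "context."))  # name
print(cutprefix("name", "") == "name")        # True

try:
    cutprefix("a", b"x")
except TypeError:
    print("TypeError raised by startswith()")
```

In this sketch the unmatched case returns `s` itself, which is the behavior the message argues should be required.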
The two methods will also be added to ``collections.UserString``, where they rely on the implementation of the new ``str`` methods.
I don't think that mentioning "where they rely on the implementation of the new ``str`` methods" is worth it. The spec can leave this part to the implementation.
Motivating examples from the Python standard library ====================================================
The examples below demonstrate how the proposed methods can make code one or more of the following: (...)
IMO there are too many examples. For example, refactor.py and c_annotations.py are more or less the same. Just keep refactor.py. Overall, 2 or 3 examples should be enough.
Allow multiple prefixes -----------------------
Some users discussed the desire to be able to remove multiple prefixes, calling, for example, ``s.cutprefix('From: ', 'CC: ')``. However, this adds ambiguity about the order in which the prefixes are removed, especially in cases like ``s.cutprefix('Foo', 'FooBar')``. After this proposal, this can be spelled explicitly as ``s.cutprefix('Foo').cutprefix('FooBar')``.
I like the ability to specify multiple prefixes or suffixes. If the order is an issue, only allow tuple and list types and you're done. I don't see how disallowing s.cutprefix(('Foo', 'FooBar')) but allowing s.cutprefix('Foo').cutprefix('FooBar') prevents any risk of mistake. I'm sure that there are many use cases for cutsuffix() accepting multiple suffixes. IMO it makes the method even more attractive and efficient. Example to remove newline suffix (Dos, Unix and macOS newlines): line.cutsuffix(("\r\n", "\n", "\r")). It's not ambiguous: "\r\n" is tested first explicitly, then "\r".
Remove multiple copies of a prefix ----------------------------------
This is the behavior that would be consistent with the aforementioned expansion of the ``lstrip/rstrip`` API -- repeatedly applying the function until the argument is unchanged. This behavior is attainable from the proposed behavior via the following::
    >>> s = 'foo' * 100 + 'bar'
    >>> while s != (s := s.cutprefix("foo")): pass
    >>> s
    'bar'
Well, even if it's less efficient, I think that I would prefer to write:

    while s.endswith("\n"):
        s = s.cutsuffix("\n")

... especially because the specification doesn't (currently) require to return the string unmodified if it doesn't end with the suffix...
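The explicit loop above can be tried out with a small sketch. `cutsuffix` is a hypothetical stand-alone function following the PEP's rough equivalent, and `cut_all_suffixes` is a helper name invented for this example.

```python
def cutsuffix(s: str, suffix: str) -> str:
    # Hypothetical stand-in for the proposed method.
    return s[:-len(suffix)] if suffix and s.endswith(suffix) else s


def cut_all_suffixes(s: str, suffix: str) -> str:
    # An empty suffix matches every string, so it must be excluded
    # or the loop below would never terminate.
    if not suffix:
        return s
    while s.endswith(suffix):
        s = cutsuffix(s, suffix)
    return s


print(repr(cut_all_suffixes("line\n\n\n", "\n")))  # 'line'
```

The loop condition relies only on `endswith()`, so it works regardless of whether an unmatched call returns the same object or a copy.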
Raising an exception when not found -----------------------------------
There was a suggestion that ``s.cutprefix(pre)`` should raise an exception if ``not s.startswith(pre)``. However, this does not match with the behavior and feel of other string methods. There could be ``required=False`` keyword added, but this violates the KISS principle.
You may add that it makes cutprefix() and cutsuffix() methods consistent with the strip() functions family. "abc".strip() doesn't raise. startswith() and endswith() methods can be used to explicitly raise an exception if there is no match. Victor -- Night gathers, and now my watch begins. It shall not end until my death.
On 20/03/2020 22:21, Victor Stinner wrote:
Motivating examples from the Python standard library ====================================================
The examples below demonstrate how the proposed methods can make code one or more of the following: (...) IMO there are too many examples. For example, refactor.py and c_annotations.py are more or less the same. Just keep refactor.py.
Overall, 2 or 3 examples should be enough.
In which case adding something like `There were many other such examples in the stdlib.` would make the PEP more compelling. Rob Cliffe
Hi Victor. I accidentally created a new thread, but I intended everything below as a response: Thanks for the review!
In short, I propose:

    def cutprefix(self: str, prefix: str, /) -> str:
        if self.startswith(prefix) and prefix:
            return self[len(prefix):]
        else:
            return self

I call startswith() before testing if pre is non-empty to inherit of startswith() input type validation. For example, "a".startswith(b'x') raises a TypeError.
This still erroneously accepts tuples and would return str subclasses unchanged. If we want to make the Python be the spec with accuracy about type-checking, then perhaps we want:

    def cutprefix(self: str, prefix: str, /) -> str:
        if not isinstance(prefix, str):
            raise TypeError(f'cutprefix() argument must be str, '
                            f'not {type(prefix).__qualname__}')
        self = str(self)
        prefix = str(prefix)
        if self.startswith(prefix):
            return self[len(prefix):]
        else:
            return self

For accepting multiple prefixes, I can't tell if there's a consensus about whether ``s = s.cutprefix("a", "b", "c")`` should be the same as

    for prefix in ["a", "b", "c"]:
        s = s.cutprefix(prefix)

or

    for prefix in ["a", "b", "c"]:
        if s.startswith(prefix):
            s = s.cutprefix(prefix)
            break

The latter seems to be harder for users to implement through other means, and it's the behavior that test_concurrent_futures.py has implemented now, so maybe that's what we want.
Also, it seems more elegant to me to accept variadic arguments, rather than a single tuple of arguments. Is it worth it to match the related-but-not-the-same API of "startswith" if it makes for uglier Python? My gut reaction is to prefer the varargs, but maybe someone has a different perspective.
I can submit a revision to the PEP with some changes soon.
On Sun, Mar 22, 2020 at 01:45, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
For accepting multiple prefixes, I can't tell if there's a consensus about whether ``s = s.cutprefix("a", "b", "c")`` should be the same as
    for prefix in ["a", "b", "c"]:
        s = s.cutprefix(prefix)

or

    for prefix in ["a", "b", "c"]:
        if s.startswith(prefix):
            s = s.cutprefix(prefix)
            break
The latter seems to be harder for users to implement through other means, and it's the behavior that test_concurrent_futures.py has implemented now, so maybe that's what we want.
I expect that "FooBar".cutprefix(("Foo", "Bar")) returns "Bar". IMO it's consistent with "FooFoo".cutprefix("Foo") which only returns "Foo" and not "": https://www.python.org/dev/peps/pep-0616/#remove-multiple-copies-of-a-prefix If you want to remove both prefixes, "FooBar".cutprefix("Foo").cutprefix("Bar") should be called to get "".
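The two candidate semantics for multiple prefixes can be compared with a short sketch. All three functions are hypothetical stand-alone helpers written for this example; `cut_each` and `cut_first_match` are invented names for the two loops under discussion.

```python
def cutprefix(s: str, prefix: str) -> str:
    # Hypothetical stand-in for the proposed single-prefix method.
    return s[len(prefix):] if s.startswith(prefix) else s


def cut_each(s: str, prefixes) -> str:
    # First reading: try to cut every prefix, one after another.
    for prefix in prefixes:
        s = cutprefix(s, prefix)
    return s


def cut_first_match(s: str, prefixes) -> str:
    # Second reading: cut only the first prefix that matches.
    for prefix in prefixes:
        if s.startswith(prefix):
            return cutprefix(s, prefix)
    return s


print(repr(cut_each("FooBar", ("Foo", "Bar"))))         # ''
print(repr(cut_first_match("FooBar", ("Foo", "Bar"))))  # 'Bar'
```

The second reading gives the "Bar" result expected above, and it parallels how `startswith()` treats a tuple as "any one of these".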
Also, it seems more elegant to me to accept variadic arguments, rather than a single tuple of arguments. Is it worth it to match the related-but-not-the-same API of "startswith" if it makes for uglier Python? My gut reaction is to prefer the varargs, but maybe someone has a different perspective.
I suggest accepting a tuple of strings:

    str.cutprefix(("prefix1", "prefix2"))

To be consistent with startswith():

    str.startswith(("prefix1", "prefix2"))

cutprefix() and startswith() can be used together, and so I would prefer to have the same API:

    prefixes = ("context: ", "ctx:")
    has_prefix = False
    if line.startswith(prefixes):
        line = line.cutprefix(prefixes)
        has_prefix = True

A different API would look more surprising, no? Compare it to:

    prefixes = ("context: ", "ctx:")
    has_prefix = False
    if line.startswith(prefixes):
        line = line.cutprefix(*prefixes) # <== HERE
        has_prefix = True

The difference is even more visible if you pass the prefixes directly:

    .cutprefix("context: ", "ctx:")

vs

    .cutprefix(("context: ", "ctx:"))

Victor -- Night gathers, and now my watch begins. It shall not end until my death.
On Fri, Mar 20, 2020 at 3:28 PM Victor Stinner <vstinner@python.org> wrote:
The builtin ``str`` class will gain two new methods with roughly the following behavior::
    def cutprefix(self: str, pre: str, /) -> str:
        if self.startswith(pre):
            return self[len(pre):]
        return self[:]
I tend to be mistrustful of code that tries to guess the best thing to do, when something expected isn't found. How about:

    def cutprefix(self: str, pre: str, raise_on_no_match: bool=False, /) -> str:
        if self.startswith(pre):
            return self[len(pre):]
        if raise_on_no_match:
            raise ValueError('prefix not found')
        return self[:]
On 23/03/2020 14:50, Dan Stromberg wrote:
On Fri, Mar 20, 2020 at 3:28 PM Victor Stinner <vstinner@python.org> wrote:
The builtin ``str`` class will gain two new methods with roughly the following behavior::
    def cutprefix(self: str, pre: str, /) -> str:
        if self.startswith(pre):
            return self[len(pre):]
        return self[:]
I tend to be mistrustful of code that tries to guess the best thing to do, when something expected isn't found.
How about:
    def cutprefix(self: str, pre: str, raise_on_no_match: bool=False, /) -> str:
        if self.startswith(pre):
            return self[len(pre):]
        if raise_on_no_match:
            raise ValueError('prefix not found')
        return self[:]
I'm firmly of the opinion that the functions should either raise or not, and should definitely not have a parameter to switch behaviours. Probably it should do nothing; if the programmer needs to know that the prefix wasn't there, cutprefix() probably wasn't the right thing to use anyway. -- Rhodri James *-* Kynesim Ltd
On 3/23/2020 12:02 PM, Rhodri James wrote:
On 23/03/2020 14:50, Dan Stromberg wrote:
I tend to be mistrustful of code that tries to guess the best thing to do, when something expected isn't found.
How about:
    def cutprefix(self: str, pre: str, raise_on_no_match: bool=False, /) -> str:
        if self.startswith(pre):
            return self[len(pre):]
        if raise_on_no_match:
            raise ValueError('prefix not found')
        return self[:]
I'm firmly of the opinion that the functions should either raise or not, and should definitely not have a parameter to switch behaviours. Probably it should do nothing; if the programmer needs to know that the prefix wasn't there, cutprefix() probably wasn't the right thing to use anyway.
Agreed, and I think we shouldn't raise. If raising is important, the user can write a trivial wrapper that raises if no substitution was done. Let's not over-complicate this. Eric
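The "trivial wrapper" mentioned above might look like the following sketch. Both functions are hypothetical stand-alone helpers; `cutprefix_strict` is a name invented here, and `ValueError` is just one reasonable choice of exception.

```python
def cutprefix(s: str, prefix: str) -> str:
    # Hypothetical stand-in for the proposed non-raising method.
    return s[len(prefix):] if s.startswith(prefix) else s


def cutprefix_strict(s: str, prefix: str) -> str:
    # Raise instead of silently returning the string unchanged.
    if not s.startswith(prefix):
        raise ValueError(f"{s!r} does not start with {prefix!r}")
    return cutprefix(s, prefix)


print(cutprefix_strict("foobar", "foo"))  # bar

try:
    cutprefix_strict("bar", "foo")
except ValueError as exc:
    print(exc)  # 'bar' does not start with 'foo'
```

Keeping the raising variant outside the core method leaves the built-in API with a single, predictable behavior.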
On Fri, Mar 20, 2020 at 11:54 AM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
This is a proposal to add two new methods, ``cutprefix`` and ``cutsuffix``, to the APIs of Python's various string objects.
The names should use "start" and "end" instead of "prefix" and "suffix", to reduce the jargon factor and for consistency with startswith/endswith. -n -- Nathaniel J. Smith -- https://vorpus.org
On Fri, Mar 20, 2020 at 06:18:20PM -0700, Nathaniel Smith wrote:
On Fri, Mar 20, 2020 at 11:54 AM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
This is a proposal to add two new methods, ``cutprefix`` and ``cutsuffix``, to the APIs of Python's various string objects.
The names should use "start" and "end" instead of "prefix" and "suffix", to reduce the jargon factor
Prefix and suffix aren't jargon. They teach those words to kids in primary school. Why the concern over "jargon"? We happily talk about exception, metaclass, thread, process, CPU, gigabyte, async, ethernet, socket, hexadecimal, iterator, class, instance, HTTP, boolean, etc without blinking, but you're shying at prefix and suffix? -- Steven
On Sun, Mar 22, 2020 at 5:41 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Mar 20, 2020 at 06:18:20PM -0700, Nathaniel Smith wrote:
On Fri, Mar 20, 2020 at 11:54 AM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
This is a proposal to add two new methods, ``cutprefix`` and ``cutsuffix``, to the APIs of Python's various string objects.
The names should use "start" and "end" instead of "prefix" and "suffix", to reduce the jargon factor
Prefix and suffix aren't jargon. They teach those words to kids in primary school.
Why the concern over "jargon"? We happily talk about exception, metaclass, thread, process, CPU, gigabyte, async, ethernet, socket, hexadecimal, iterator, class, instance, HTTP, boolean, etc without blinking, but you're shying at prefix and suffix?
As a general rule, jargon from your OWN domain is easier to justify than jargon from some OTHER domain. (Though in this case, I agree that "prefix" and "suffix" shouldn't be a problem.) ChrisA
Even then, it seems that prefix is an established computer science term:
[1] https://en.wikipedia.org/wiki/Substring#Prefix
[2] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L. (1990). Introduction to Algorithms (1st ed.). Chapter 15.4: Longest common subsequence
And a quick search reveals that it's used hundreds of times in the docs: https://docs.python.org/3/search.html?q=prefix
On Sat, Mar 21, 2020 at 11:35 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Mar 20, 2020 at 06:18:20PM -0700, Nathaniel Smith wrote:
On Fri, Mar 20, 2020 at 11:54 AM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
This is a proposal to add two new methods, ``cutprefix`` and ``cutsuffix``, to the APIs of Python's various string objects.
The names should use "start" and "end" instead of "prefix" and "suffix", to reduce the jargon factor
Prefix and suffix aren't jargon. They teach those words to kids in primary school.
Whereas they don't have to teach "start" and "end", because kids already know them before they start school.
Why the concern over "jargon"? We happily talk about exception, metaclass, thread, process, CPU, gigabyte, async, ethernet, socket, hexadecimal, iterator, class, instance, HTTP, boolean, etc without blinking, but you're shying at prefix and suffix?
Yeah. Jargon is fine when there's no regular word with appropriate precision, but we shouldn't use jargon just for jargon's sake. Python has a long tradition of preferring regular words when possible, e.g. using not/and/or instead of !/&&/||, and startswith/endswith instead of hasprefix/hassuffix. -n -- Nathaniel J. Smith -- https://vorpus.org
On Sun, Mar 22, 2020 at 1:02 PM Nathaniel Smith <njs@pobox.com> wrote:
On Sat, Mar 21, 2020 at 11:35 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Mar 20, 2020 at 06:18:20PM -0700, Nathaniel Smith wrote:
On Fri, Mar 20, 2020 at 11:54 AM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
This is a proposal to add two new methods, ``cutprefix`` and ``cutsuffix``, to the APIs of Python's various string objects.
The names should use "start" and "end" instead of "prefix" and "suffix", to reduce the jargon factor
Prefix and suffix aren't jargon. They teach those words to kids in primary school.
Whereas they don't have to teach "start" and "end", because kids already know them before they start school.
Why the concern over "jargon"? We happily talk about exception, metaclass, thread, process, CPU, gigabyte, async, ethernet, socket, hexadecimal, iterator, class, instance, HTTP, boolean, etc without blinking, but you're shying at prefix and suffix?
Yeah. Jargon is fine when there's no regular word with appropriate precision, but we shouldn't use jargon just for jargon's sake. Python has a long tradition of preferring regular words when possible, e.g. using not/and/or instead of !/&&/||, and startswith/endswith instead of hasprefix/hassuffix.
Given that the word "prefix" appears in help("".startswith), I don't think there's really a lot to be gained by arguing this point :) There's absolutely nothing wrong with the word. But Dennis, welcome to the wonderful world of change proposals, where you will experience insane amounts of pushback and debate on the finest points of bikeshedding, whether or not people actually even support the proposal at all... ChrisA
But Dennis, welcome to the wonderful world of change proposals, where you will experience insane amounts of pushback and debate on the finest points of bikeshedding, whether or not people actually even support the proposal at all...
Lol -- thanks! In my mind, another reason that I like including the words "prefix" and "suffix" over "start" and "end" is that, even though using the verb "end" in "endswith" is unambiguous, the noun "end" can be used as either the initial or final end, as in "remove this thing from both ends of the string". So "suffix" feels more precise to me.
On Sat., 21 Mar. 2020, 11:19 am Nathaniel Smith, <njs@pobox.com> wrote:
On Fri, Mar 20, 2020 at 11:54 AM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
This is a proposal to add two new methods, ``cutprefix`` and ``cutsuffix``, to the APIs of Python's various string objects.
The names should use "start" and "end" instead of "prefix" and "suffix", to reduce the jargon factor and for consistency with startswith/endswith.
This would also be more consistent with startswith() & endswith(). (For folks querying this: the relevant domain here is "str builtin method names", and we already use startswith/endswith there, not hasprefix/hassuffix. The most challenging relevant audience for new str builtin method *names* is also 10 year olds learning to program in school, not adults reading the documentation)

I think the concern about stripstart() & stripend() working with substrings, while strip/lstrip/rstrip work with character sets, is valid, but I also share the concern about introducing "cut" as yet another verb to learn in the already wide string API.

The example where the new function was used instead of a questionable use of replace gave me an idea, though: what if the new functions were "replacestart()" and "replaceend()"?

* uses "start" and "with" for consistency with the existing checks
* substring based, like the "replace" method
* can be combined with an extension of "replace()" to also accept a tuple of old values to match and replace to allow for consistency with checking for multiple prefixes or suffixes.

We'd expect the most common case to be the empty string, but I think the meaning of the following is clear, and consistent with the current practice of using replace() to delete text from anywhere within the string:

    s = s.replacestart('context.', '')

This approach would also very cleanly handle the last example from the PEP:

    s = s.replaceend(('Mixin', 'Tests', 'Test'), '')

The doubled 'e' in 'replaceend' isn't ideal, but if we went this way, I think keeping consistency with other str method names would be preferable to adding an underscore to the name.

Interestingly, you could also use this to match multiple prefixes or suffixes and find out *which one* matched (since the existing methods don't report that):

    s2 = s.replaceend(suffixes, '')
    suffix_len = len(s) - len(s2)
    suffix = s[-suffix_len:] if suffix_len else None

Cheers,
Nick.
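For readers who want to try the idea, here is a minimal sketch of what "replacestart()" and "replaceend()" might look like as plain functions. Everything in it is an assumption made for illustration: the function names are borrowed from the message above, no such str methods exist, and the choices to take the first matching candidate from a tuple and to treat an empty `old` as "no match" are mine, not part of any proposal.

```python
def replacestart(s, old, new):
    """Hypothetical helper: replace a leading `old` with `new`.

    `old` may be a str or a tuple of str, mirroring str.startswith.
    """
    candidates = (old,) if isinstance(old, str) else tuple(old)
    for prefix in candidates:
        # First matching candidate wins, so tuple order matters.
        if prefix and s.startswith(prefix):
            return new + s[len(prefix):]
    return s


def replaceend(s, old, new):
    """Hypothetical helper: replace a trailing `old` with `new`."""
    candidates = (old,) if isinstance(old, str) else tuple(old)
    for suffix in candidates:
        # Skip empty suffixes: s[:-0] would be '' rather than s.
        if suffix and s.endswith(suffix):
            return s[:-len(suffix)] + new
    return s


# Recovering which suffix matched, as described in the message above.
s = "SomeTests"
s2 = replaceend(s, ("Mixin", "Tests", "Test"), "")
suffix_len = len(s) - len(s2)
suffix = s[-suffix_len:] if suffix_len else None
print(s2, suffix)  # Some Tests
```

The `if suffix_len else None` guard matters because `s[-0:]` is the whole string, not an empty one.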
Nick Coghlan wrote:
The example where the new function was used instead of a questionable use of replace gave me an idea, though: what if the new functions were "replacestart()" and "replaceend()"?
* uses "start" and "with" for consistency with the existing checks * substring based, like the "replace" method * can be combined with an extension of "replace()" to also accept a tuple of old values to match and replace to allow for consistency with checking for multiple prefixes or suffixes.
FWIW, I don't place as much value on being consistent with "startswith()" and "endswith()". But with it being substring based, I think the term "replace" actually makes a lot more sense here compared to "cut". +1
On Sat, Mar 21, 2020 at 6:46 PM Nick Coghlan <ncoghlan@gmail.com> wrote:
On Sat., 21 Mar. 2020, 11:19 am Nathaniel Smith, <njs@pobox.com> wrote:
On Fri, Mar 20, 2020 at 11:54 AM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
This is a proposal to add two new methods, ``cutprefix`` and ``cutsuffix``, to the APIs of Python's various string objects.
The names should use "start" and "end" instead of "prefix" and "suffix", to reduce the jargon factor and for consistency with startswith/endswith.
This would also be more consistent with startswith() & endswith(). (For folks querying this: the relevant domain here is "str builtin method names", and we already use startswith/endswith there, not hasprefix/hassuffix. The most challenging relevant audience for new str builtin method *names* is also 10 year olds learning to program in school, not adults reading the documentation)
To my language sense, hasprefix/hassuffix are horrible compared to startswith/endswith. If you were to talk about this kind of condition using English instead of Python, you wouldn't say "if x has prefix y", you'd say "if x starts with y". (I doubt any programming language uses hasPrefix or has_prefix for this, making it a strawman.)

*But*, what would you say if you wanted to express the idea of removing something from the start or end? It's pretty verbose to say "remove y from the end of x", and it's not easy to translate that into a method name. x.removefromend(y)? Blech! And x.removeend(y) has the double 'e', which confuses the reader.

The thing is that it's hard to translate "starts" (a verb) into a noun -- the "start" of something is its very beginning (i.e., in Python, position zero), while a "prefix" is a noun that specifically describes an initial substring (and I'm glad we don't have to use *that* :-).
I think the concern about stripstart() & stripend() working with substrings, while strip/lstrip/rstrip work with character sets, is valid, but I also share the concern about introducing "cut" as yet another verb to learn in the already wide string API.
It's not great, and I actually think that "stripprefix" and "stripsuffix" are reasonable. (I found that in Go, everything we call "strip" is called "Trim", and there are "TrimPrefix" and "TrimSuffix" functions that correspond to the PEP 616 functions.)
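The difference between character-set stripping and substring removal raised in the quoted concern is easy to see in a few lines. This is only an illustrative sketch: `remove_prefix` is a hypothetical stand-in written from the rough equivalence in the PEP abstract, not an existing str method.

```python
def remove_prefix(s, prefix):
    # Hypothetical helper following the PEP's rough equivalence:
    # s[len(pre):] if s.startswith(pre) else s
    return s[len(prefix):] if s.startswith(prefix) else s


name = "context.text"

# lstrip treats its argument as a set of characters, so it keeps removing
# leading characters for as long as they occur anywhere in "context.".
stripped = name.lstrip("context.")

# Prefix removal drops the exact leading substring, once.
cut = remove_prefix(name, "context.")

print(repr(stripped))  # ''
print(repr(cut))       # 'text'
```

Every character of "text" also occurs in "context.", which is why lstrip consumes the whole string here.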
The example where the new function was used instead of a questionable use of replace gave me an idea, though: what if the new functions were "replacestart()" and "replaceend()"?
* uses "start" and "with" for consistency with the existing checks * substring based, like the "replace" method * can be combined with an extension of "replace()" to also accept a tuple of old values to match and replace to allow for consistency with checking for multiple prefixes or suffixes.
We'd expect the most common case to be the empty string, but I think the meaning of the following is clear, and consistent with the current practice of using replace() to delete text from anywhere within the string:
s = s.replacestart('context.' , '')
This feels like a hypergeneralization. In 99.9% of use cases we just need to remove the prefix or suffix. If you want to replace the suffix with something else, you can probably use string concatenation. (In the one use case I can think of, changing "foo.c" into "foo.o", it would make sense that plain "foo" ended up becoming "foo.o", so s.stripsuffix(".c") + ".o" actually works better there.)
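The "foo.c" to "foo.o" case can be sketched like this. Both `stripsuffix` and `replaceend` below are hypothetical plain functions standing in for the two spellings being compared; neither exists on str, and a real method could behave differently.

```python
def stripsuffix(s, suffix):
    # Hypothetical helper: drop `suffix` if present, else return s unchanged.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


def replaceend(s, old, new):
    # Hypothetical helper: swap a trailing `old` for `new` only when it matches.
    if old and s.endswith(old):
        return s[:-len(old)] + new
    return s


def to_object_name(name):
    # Strip-then-concatenate always yields a name ending in ".o".
    return stripsuffix(name, ".c") + ".o"


print(to_object_name("foo.c"))           # foo.o
print(to_object_name("foo"))             # foo.o
print(replaceend("foo.c", ".c", ".o"))   # foo.o
print(replaceend("foo", ".c", ".o"))     # foo
```

The last line shows the difference: a replace-style call leaves plain "foo" untouched, while concatenation after stripping turns it into "foo.o".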
This approach would also very cleanly handle the last example from the PEP:
s = s.replaceend(('Mixin', 'Tests', 'Test'), '')
Maybe the proposed functions can optionally take a tuple of prefixes/suffixes, like startswith/endswith do?
The doubled 'e' in 'replaceend' isn't ideal, but if we went this way, I think keeping consistency with other str method names would be preferable to adding an underscore to the name.
Agreed on the second part, I just really don't like the 'ee'.
Interestingly, you could also use this to match multiple prefixes or suffixes and find out *which one* matched (since the existing methods don't report that):
s2 = s.replaceend(suffixes, '')
suffix_len = len(s) - len(s2)
suffix = s[-suffix_len:] if suffix_len else None
Cheers, Nick.
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
On 22.03.2020 6:38, Guido van Rossum wrote:
It's not great, and I actually think that "stripprefix" and "stripsuffix" are reasonable. (I found that in Go, everything we call "strip" is called "Trim", and there are "TrimPrefix" and "TrimSuffix" functions that correspond to the PEP 616 functions.)
I must note that names conforming to https://www.python.org/dev/peps/pep-0008/#function-and-variable-names would be "strip_prefix" and "strip_suffix".
-- Regards, Ivan
Ivan Pozdeev wrote:
I must note that names conforming to https://www.python.org/dev/peps/pep-0008/#function-and-variable-names would be "strip_prefix" and "strip_suffix".
In this case, being in line with the existing string API method names takes priority over PEP 8, e.g. splitlines, startswith, endswith, splitlines, etc. Although I agree that an underscore would probably be a bit easier to read here, it would be rather confusing to randomly swap between the naming convention for the same API. The benefit gained in *slightly* easier readability wouldn't make up for the headache IMO.
In this case, being in line with the existing string API method names takes priority over PEP 8, e.g. splitlines, startswith, endswith, splitlines, etc.
Oops, I just realized that I wrote "splitlines" twice there. I guess that goes to show how much I use that specific method in comparison to the others, but the point still stands. Here's a more comprehensive set of existing string methods to better demonstrate it (Python 3.8.2):
>>> [m for m in dir(str) if not m.startswith('_')]
['capitalize', 'casefold', 'center', 'count', 'encode', 'endswith',
 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum',
 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower',
 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join',
 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind',
 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split',
 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate',
 'upper', 'zfill']
Nice PEP! That this discussion wound up in the NP-complete "naming things" territory as the main topic right from the start/prefix/beginning speaks highly of it. :)

The only things left I have to add are (a) agreed: don't specify whether it is a copy or not for str and bytes, BUT (b) do specify that for bytearray. Being the only mutable type, it matters. Consistency with other bytearray methods based on https://docs.python.org/3/library/stdtypes.html#bytearray suggests copy. (Someone always wants inplace versions of bytearray methods; that is a separate topic, not for this PEP.)

Fwiw I *like* your cutprefix/suffix names. Avoiding the terms strip and trim is wise to avoid confusion, and having the name read as nice English is Pythonic. I'm not going to vote on other suggestions.

-gps
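The copy behaviour of existing bytearray methods can be checked directly, and a prefix-removal helper that follows the same rule is short to write. This is a sketch under assumptions: `cutprefix` here is a hypothetical free function using the PEP's working name, not an existing bytearray method, and always returning a new object is the behaviour suggested above rather than anything the PEP has settled.

```python
data = bytearray(b"spam")

# Existing bytearray methods such as strip() return a new object,
# even when there is nothing to strip.
stripped = data.strip()
print(stripped == data)   # True
print(stripped is data)   # False


def cutprefix(buf, prefix):
    # Hypothetical helper for bytearray: always return a new object.
    if prefix and buf.startswith(prefix):
        return buf[len(prefix):]
    # Slicing a bytearray creates a copy, so the caller never gets `buf` back.
    return buf[:]


unchanged = cutprefix(data, b"xyz")
unchanged.append(0x21)  # mutating the result must not affect the original
print(data)             # bytearray(b'spam')
```

Returning a copy in the no-match case keeps the result safe to mutate without touching the original buffer.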
In this case, being in line with the existing string API method names take priority over PEP 8, e.g. splitlines, startswith, endswith, splitlines, etc.
Oops, I just realized that I wrote "splitlines" twice there. I guess that goes to show how much I use that specific method in comparison to the others, but the point still stands. Here's a more comprehensive set of existing string methods to better demonstrate it (Python 3.8.2):
[m for m in dir(str) if not m.startswith('_')] ['capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
On Sun, Mar 22, 2020 at 12:17 AM Kyle Stanley <aeros167@gmail.com> wrote:
Ivan Pozdeez wrote:
I must note that names conforming to https://www.python.org/dev/peps/pep-0008/#function-and-variable-names would be "strip_prefix" and "strip_suffix".
In this case, being in line with the existing string API method names take priority over PEP 8, e.g. splitlines, startswith, endswith, splitlines, etc. Although I agree that an underscore would probably be a bit easier to read here, it would be rather confusing to randomly swap between the naming convention for the same API. The benefit gained in *slightly *easier readability wouldn't make up for the headache IMO.
On Sun, Mar 22, 2020 at 12:13 AM Ivan Pozdeev via Python-Dev < python-dev@python.org> wrote:
On 22.03.2020 6:38, Guido van Rossum wrote:
On Sat, Mar 21, 2020 at 6:46 PM Nick Coghlan <ncoghlan@gmail.com> wrote:
On Sat., 21 Mar. 2020, 11:19 am Nathaniel Smith, <njs@pobox.com> wrote:
On Fri, Mar 20, 2020 at 11:54 AM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
This is a proposal to add two new methods, ``cutprefix`` and ``cutsuffix``, to the APIs of Python's various string objects.
The names should use "start" and "end" instead of "prefix" and "suffix", to reduce the jargon factor and for consistency with startswith/endswith.
This would also be more consistent with startswith() & endswith(). (For folks querying this: the relevant domain here is "str builtin method names", and we already use startswith/endswith there, not hasprefix/hassuffix. The most challenging relevant audience for new str builtin method *names* is also 10 year olds learning to program in school, not adults reading the documentation)
To my language sense, hasprefix/hassuffix are horrible compared to startswith/endswith. If you were to talk about this kind of condition using English instead of Python, you wouldn't say "if x has prefix y", you'd say "if x starts with y". (I doubt any programming language uses hasPrefix or has_prefix for this, making it a strawman.)
*But*, what would you say if you wanted to express the idea of removing something from the start or end? It's pretty verbose to say "remove y from the end of x", and it's not easy to translate that into a method name. x.removefromend(y)? Blech! And x.removeend(y) has the double 'e', which confuses the reader.
The thing is that it's hard to translate "starts" (a verb) into a noun -- the "start" of something is its very beginning (i.e., in Python, position zero), while a "prefix" is a noun that specifically describes an initial substring (and I'm glad we don't have to use *that* :-).
I think the concern about stripstart() & stripend() working with substrings, while strip/lstrip/rstrip work with character sets, is valid, but I also share the concern about introducing "cut" as yet another verb to learn in the already wide string API.
It's not great, and I actually think that "stripprefix" and "stripsuffix" are reasonable. (I found that in Go, everything we call "strip" is called "Trim", and there are "TrimPrefix" and "TrimSuffix" functions that correspond to the PEP 616 functions.)
I must note that names conforming to https://www.python.org/dev/peps/pep-0008/#function-and-variable-names would be "strip_prefix" and "strip_suffix".
The example where the new function was used instead of a questionable use of replace gave me an idea, though: what if the new functions were "replacestart()" and "replaceend()"?
* uses "start" and "with" for consistency with the existing checks
* substring based, like the "replace" method
* can be combined with an extension of "replace()" to also accept a tuple of old values to match and replace to allow for consistency with checking for multiple prefixes or suffixes.
We'd expect the most common case to be the empty string, but I think the meaning of the following is clear, and consistent with the current practice of using replace() to delete text from anywhere within the string:
s = s.replacestart('context.' , '')
This feels like a hypergeneralization. In 99.9% of use cases we just need to remove the prefix or suffix. If you want to replace the suffix with something else, you can probably use string concatenation. (In the one use case I can think of, changing "foo.c" into "foo.o", it would make sense that plain "foo" ended up becoming "foo.o", so s.stripsuffix(".c") + ".o" actually works better there.)
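To make the "foo.c" to "foo.o" case concrete, here is a minimal sketch. `strip_suffix` is a hypothetical free function written only for illustration (it is not an existing str method); it follows the rough equivalent given in the PEP for the suffix case, and `to_object_name` is likewise an invented name.

```python
def strip_suffix(s: str, suf: str) -> str:
    """Hypothetical helper: drop suf from the end of s if it is there."""
    if suf and s.endswith(suf):
        return s[:-len(suf)]
    return s


def to_object_name(name: str) -> str:
    # Remove a trailing ".c" when present, then always append ".o".
    return strip_suffix(name, ".c") + ".o"


print(to_object_name("foo.c"))  # foo.o
print(to_object_name("foo"))    # foo.o
```

With "remove, then concatenate", a name that never had the old suffix still ends up with the new one, which is the behaviour described above.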
This approach would also very cleanly handle the last example from the PEP:
s = s.replaceend(('Mixin', 'Tests', 'Test'), '')
Maybe the proposed functions can optionally take a tuple of prefixes/suffixes, like startswith/endswith do?
The doubled 'e' in 'replaceend' isn't ideal, but if we went this way, I think keeping consistency with other str method names would be preferable to adding an underscore to the name.
Agreed on the second part, I just really don't like the 'ee'.
Interestingly, you could also use this to match multiple prefixes or suffixes and find out *which one* matched (since the existing methods don't report that):
s2 = s.replaceend(suffixes, '')
suffix_len = len(s) - len(s2)
suffix = s[-suffix_len:] if suffix_len else None
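The idea can be tried out today with a small stand-in. In this sketch `replaceend` is a hypothetical free function, not a real str method, and the rule that the first matching entry of the tuple wins is an assumption made here; `matched_suffix` is also an invented name that wraps the three lines above.

```python
from typing import Optional, Tuple, Union


def replaceend(s: str, old: Union[str, Tuple[str, ...]], new: str) -> str:
    """Hypothetical: replace the first suffix in old that matches with new."""
    candidates = (old,) if isinstance(old, str) else old
    for suf in candidates:
        # Skip empty suffixes so that s[:-0] is never evaluated.
        if suf and s.endswith(suf):
            return s[:-len(suf)] + new
    return s


def matched_suffix(s: str, suffixes: Tuple[str, ...]) -> Optional[str]:
    """Report which suffix was removed, using the length difference."""
    s2 = replaceend(s, suffixes, '')
    suffix_len = len(s) - len(s2)
    return s[-suffix_len:] if suffix_len else None


print(matched_suffix('QueueTests', ('Mixin', 'Tests', 'Test')))  # Tests
print(matched_suffix('Queue', ('Mixin', 'Tests', 'Test')))       # None
```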
Cheers, Nick.
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/Q33NGX3N... Code of Conduct: http://python.org/psf/codeofconduct/
-- Regards, Ivan
Le dim. 22 mars 2020 à 06:07, Gregory P. Smith <greg@krypto.org> a écrit :
Nice PEP! That this discussion wound up in the NP-complete "naming things" territory as the main topic right from the start/prefix/beginning speaks highly of it. :)
Maybe we should have a rule to disallow bikeshedding until the foundations of a PEP are settled. Or always create two threads per PEP: one for bikeshedding only, one for everything else :-D Victor -- Night gathers, and now my watch begins. It shall not end until my death.
On Sat, Mar 21, 2020 at 8:38 PM Guido van Rossum <guido@python.org> wrote:
It's not great, and I actually think that "stripprefix" and "stripsuffix" are reasonable. [explanation snipped]
Thinking a bit more, I could also get behind "removeprefix" and "removesuffix". -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
I like "removeprefix" and "removesuffix". My only concern before had been length, but three more characters than "cut***fix" is a small price to pay for clarity.
On Sun, Mar 22, 2020 at 05:00:10AM -0000, Dennis Sweeney wrote:
I like "removeprefix" and "removesuffix". My only concern before had been length, but three more characters than "cut***fix" is a small price to pay for clarity.
I personally rely on the auto-complete of my editor while writing. So, thinking about these methods in "correct" terms might be more important to me than the length. +1 for removeprefix and removesuffix. Thanks, Senthil
On 2020-03-21 20:38, Guido van Rossum wrote:
It's not great, and I actually think that "stripprefix" and "stripsuffix" are reasonable. (I found that in Go, everything we call "strip" is called "Trim", and there are "TrimPrefix" and "TrimSuffix" functions that correspond to the PEP 616 functions.)
To jump on the bikeshed, trimprefix and trimsuffix are the best I've read so far, due to the definitions of the words in English. Though often used interchangeably, when I think of "strip" I think of removing multiple things, somewhat indiscriminately with an arm motion, which is how the functions currently work, e.g. "strip paint", "strip clothes":

https://www.dictionary.com/browse/strip
to take away or remove

When I think of trim, I think more of a single cut of higher precision with scissors, e.g. "trim hair", "trim branches":

https://www.dictionary.com/browse/trim
to put into a neat or orderly condition by clipping…

Which is what this method would do. That trim matches Go is a small but decent benefit. Another person warned against inconsistency with PHP, but I don't think PHP should be considered for design guidance, IMHO. Perhaps as an example of what not to do, which happily is in agreement with the above.

-Mike

p.s. +1, I do support this PEP, with or without name change, since some mentioned concern over that.
Is there a proven use case for anything other than the empty string as the replacement? I prefer your "replacewhatever" to another "stripwhatever" name, and I think it's clear and nicely fits the behavior you proposed. But should we allow a naming convenience to dictate that the behavior should be generalized to a use case we're not sure exists, where the same argument is passed 99% of the time? I think a downside would be that a pass-a-string-or-a-tuple-of-strings interface would be more mental effort to keep track of than a ``*args`` variadic interface for "(cut/remove/without/trim)prefix", even if the former is how ``startswith()`` works.
On Sun, 22 Mar 2020 at 14:01, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
Is there a proven use case for anything other than the empty string as the replacement? I prefer your "replacewhatever" to another "stripwhatever" name, and I think it's clear and nicely fits the behavior you proposed. But should we allow a naming convenience to dictate that the behavior should be generalized to a use case we're not sure exists, where the same argument is passed 99% of the time?
I think so, as if we don't, then we'd end up with the following three methods on str objects (using Guido's suggested names of "removeprefix" and "removesuffix", as I genuinely like those):

* replace()
* removeprefix()
* removesuffix()

And the following questions still end up with relatively non-obvious answers:

Q: How do I do a replace, but only at the start or end of the string?
A: Use "new_prefix + s.removeprefix(old_prefix)" or "s.removesuffix(old_suffix) + new_suffix"

Q: How do I remove a substring from anywhere in a string, rather than just from the start or end?
A: Use "s.replace(substr, '')"

Most of that objection would go away if the PEP added a plain old "remove()" method in addition to removeprefix() and removesuffix(), though - the "replace the substring with an empty string" trick isn't the most obvious spelling in the world, whereas I'd expect a lot of folks to reach for "s.remove(substr)" based on the regular sequence API, and I think Guido's right that in many cases where a prefix or suffix is being changed, you also want to add it if the old prefix/suffix is missing (and in the cases where you don't, you can either use startswith()/endswith() first, or else check for a length change).
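The two answers, and the difference between "add the new prefix regardless" and "only swap an existing prefix", can be sketched as follows. `removeprefix` here is a hypothetical free function standing in for the proposed method, and `swap_prefix` and `swap_prefix_strict` are names invented for this illustration.

```python
def removeprefix(s: str, prefix: str) -> str:
    """Hypothetical stand-in for the proposed str method."""
    return s[len(prefix):] if s.startswith(prefix) else s


def swap_prefix(s: str, old: str, new: str) -> str:
    # "new_prefix + s.removeprefix(old_prefix)": new is added even
    # when old was not there to begin with.
    return new + removeprefix(s, old)


def swap_prefix_strict(s: str, old: str, new: str) -> str:
    # Check with startswith() first so that s is left alone when
    # the old prefix is missing.
    if s.startswith(old):
        return new + s[len(old):]
    return s


print(swap_prefix('context.add', 'context.', 'ctx.'))         # ctx.add
print(swap_prefix('add', 'context.', 'ctx.'))                 # ctx.add
print(swap_prefix_strict('add', 'context.', 'ctx.'))          # add
print('a.context.b.context.c'.replace('context.', ''))        # a.b.c
```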
I think a downside would be that a pass-a-string-or-a-tuple-of-strings interface would be more mental effort to keep track of than a ``*args`` variadic interface for "(cut/remove/without/trim)prefix", even if the former is how ``startswith()`` works.
I doubt we'd use *args for any new string methods, precisely because we don't use it for any of the existing ones. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
-1 on "cut*" because my brain keeps reading it as "cute". +1 on "trim*" as it is clear what's going on and no confusion with preexisting methods. +1 on "remove*" for the same reasons as "trim*". And if no consensus is reached in this thread for a name I would assume the SC is going to ultimately decide on the name if the PEP is accepted as the burden of being known as "the person who chose _those_ method names on str" is more than any one person should have to bear. ;)
On 24Mar2020 1849, Brett Cannon wrote:
-1 on "cut*" because my brain keeps reading it as "cute". +1 on "trim*" as it is clear what's going on and no confusion with preexisting methods. +1 on "remove*" for the same reasons as "trim*".
And if no consensus is reached in this thread for a name I would assume the SC is going to ultimately decide on the name if the PEP is accepted as the burden of being known as "the person who chose _those_ method names on str" is more than any one person should have to bear. ;)
-1 on "cut*" (feels too much like what .partition() does) -0 on "trim*" (this is the name used in .NET instead of "strip", so I foresee new confusion) +1 on "remove*" (because this is exactly what it does) Cheers, Steve
On 3/24/2020 3:30 PM, Steve Dower wrote:
On 24Mar2020 1849, Brett Cannon wrote:
-1 on "cut*" because my brain keeps reading it as "cute". +1 on "trim*" as it is clear what's going on and no confusion with preexisting methods. +1 on "remove*" for the same reasons as "trim*".
And if no consensus is reached in this thread for a name I would assume the SC is going to ultimately decide on the name if the PEP is accepted as the burden of being known as "the person who chose _those_ method names on str" is more than any one person should have to bear. ;)
-1 on "cut*" (feels too much like what .partition() does) -0 on "trim*" (this is the name used in .NET instead of "strip", so I foresee new confusion) +1 on "remove*" (because this is exactly what it does)
I actually prefer "without*" because it seems more descriptive, but I don't expect it to get any traction. So "remove" would get my +1. Eric
On 03/24/2020 01:37 PM, Eric V. Smith wrote:
On 3/24/2020 3:30 PM, Steve Dower wrote:
On 24Mar2020 1849, Brett Cannon wrote:
-1 on "cut*" because my brain keeps reading it as "cute". +1 on "trim*" as it is clear what's going on and no confusion with preexisting methods. +1 on "remove*" for the same reasons as "trim*".
And if no consensus is reached in this thread for a name I would assume the SC is going to ultimately decide on the name if the PEP is accepted as the burden of being known as "the person who chose _those_ method names on str" is more than any one person should have to bear. ;)
-1 on "cut*" (feels too much like what .partition() does) -0 on "trim*" (this is the name used in .NET instead of "strip", so I foresee new confusion) +1 on "remove*" (because this is exactly what it does)
I actually prefer "without*" because it seems more descriptive, but I don't expect it to get any traction.
So "remove" would get my +1.
I still think "strip" is the most optimal, as strip, stripprefix, and stripsuffix would all be together -- but if that's not going to happen, "remove" is good. +2 on "strip" ;-) +1 on "remove" -- ~Ethan~
On Tue, Mar 24, 2020 at 2:53 PM Ethan Furman <ethan@stoneleaf.us> wrote:
On 03/24/2020 01:37 PM, Eric V. Smith wrote:
On 3/24/2020 3:30 PM, Steve Dower wrote:
On 24Mar2020 1849, Brett Cannon wrote:
-1 on "cut*" because my brain keeps reading it as "cute". +1 on "trim*" as it is clear what's going on and no confusion with preexisting methods. +1 on "remove*" for the same reasons as "trim*".
And if no consensus is reached in this thread for a name I would assume the SC is going to ultimately decide on the name if the PEP is accepted as the burden of being known as "the person who chose _those_ method names on str" is more than any one person should have to bear. ;)
-1 on "cut*" (feels too much like what .partition() does) -0 on "trim*" (this is the name used in .NET instead of "strip", so I foresee new confusion) +1 on "remove*" (because this is exactly what it does)
I think name choice is easier if you write the documentation first: cutprefix - Removes the specified prefix. trimprefix - Removes the specified prefix. stripprefix - Removes the specified prefix. removeprefix - Removes the specified prefix. Duh. :)
-1 on "cut*" (feels too much like what .partition() does) -0 on "trim*" (this is the name used in .NET instead of "strip", so I foresee new confusion) +1 on "remove*" (because this is exactly what it does)
I'm also most strongly in favor of "remove*" (out of the above options). I'm opposed to cut*, mainly because it's too ambiguous in comparison to other options such as "remove*" and "replace*", which would do a much better job of explaining the operation performed. Without the .NET conflict, I would normally be +1 on "trim*" as well; with it in mind though, I'd lower it down to +0. Personally, I don't consider a conflict in a different ecosystem enough to lower it down to -0, but it still has some influence on my preference. So far, the consensus seems to be in favor of "remove*" with several +1s and no arguments against it (as far as I can tell), whereas the other options have been rather controversial.
On Tue, Mar 24, 2020 at 3:38 PM Steve Dower <steve.dower@python.org> wrote:
On 24Mar2020 1849, Brett Cannon wrote:
-1 on "cut*" because my brain keeps reading it as "cute". +1 on "trim*" as it is clear what's going on and no confusion with preexisting methods. +1 on "remove*" for the same reasons as "trim*".
And if no consensus is reached in this thread for a name I would assume the SC is going to ultimately decide on the name if the PEP is accepted as the burden of being known as "the person who chose _those_ method names on str" is more than any one person should have to bear. ;)
-1 on "cut*" (feels too much like what .partition() does) -0 on "trim*" (this is the name used in .NET instead of "strip", so I foresee new confusion) +1 on "remove*" (because this is exactly what it does)
Cheers, Steve
On Tue, Mar 24, 2020 at 11:55 AM Brett Cannon <brett@python.org> wrote:
-1 on "cut*" because my brain keeps reading it as "cute". +1 on "trim*" as it is clear what's going on and no confusion with preexisting methods. +1 on "remove*" for the same reasons as "trim*".
And if no consensus is reached in this thread for a name I would assume the SC is going to ultimately decide on the name if the PEP is accepted as the burden of being known as "the person who chose _those_ method names on str" is more than any one person should have to bear. ;)
"raymondLuxuryYacht*" pronounced Throatwobbler Mangrove it is! Never fear, the entire stdlib is full of naming inconsistencies and questionable choices accumulated over time. Whatever is chosen will be lost in the noise and people will happily use it. The original PEP mentioned that trim had a different use in PHP which is why I suggest avoiding that one. I don't know how much crossover there actually is between PHP and Python programmers these days outside of FB. -gps * https://montypython.fandom.com/wiki/Raymond_Luxury-Yacht _______________________________________________
On 24Mar2020 18:49, Brett Cannon <brett@python.org> wrote:
-1 on "cut*" because my brain keeps reading it as "cute". +1 on "trim*" as it is clear what's going on and no confusion with preexisting methods. +1 on "remove*" for the same reasons as "trim*".
I reiterate my huge -1 on "trim" because it will confuse every PHP user who comes to us from the dark side. Over there "trim" means what our "strip" means. I've got (differing) opinions about the others, but "trim" is a big one to me. Cheers, Cameron Simpson <cs@cskk.id.au>
Am 26.03.20 um 06:28 schrieb Cameron Simpson:
On 24Mar2020 18:49, Brett Cannon <brett@python.org> wrote:
-1 on "cut*" because my brain keeps reading it as "cute". +1 on "trim*" as it is clear what's going on and no confusion with preexisting methods. +1 on "remove*" for the same reasons as "trim*".
I reiterate my huge -1 on "trim" because it will confuse every PHP user who comes to us from the dark side. Over there "trim" means what our "strip" means.
I've got (differing) opinions about the others, but "trim" is a big one to me.
As a full stack developer with terrible memory, I agree. JavaScript also uses trim() like Python's strip(), and this would quickly get confusing. - Sebastian
On 20.03.2020 21:52, Dennis Sweeney wrote:
Browser Link: https://www.python.org/dev/peps/pep-0616/
PEP: 616
Title: String methods to remove prefixes and suffixes
Author: Dennis Sweeney <sweeney.dennis650@gmail.com>
Sponsor: Eric V. Smith <eric@trueblade.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 19-Mar-2020
Python-Version: 3.9
Post-History: 30-Aug-2002

Abstract
========
This is a proposal to add two new methods, ``cutprefix`` and ``cutsuffix``, to the APIs of Python's various string objects. In particular, the methods would be added to Unicode ``str`` objects, binary ``bytes`` and ``bytearray`` objects, and ``collections.UserString``.
Does it need to be separate methods? Can we augment or even replace *strip() instead? E.g. *strip(chars: str, line: str) -> str As written in the PEP preface, the very reason for the PEP is that people are continuously trying to use *strip methods for the suggested functionality -- which shows that this is where they are expecting to find it. (as a bonus, we'll be saved from bikeshedding debates over the names) --- Then, https://mail.python.org/archives/list/python-ideas@python.org/thread/RJARZSU... suggests that the use of strip with character set argument may have fallen out of favor since its adoption. If that's the case, it can be deprecated in favor of the new use, thus saving us from extra complexity in perspective.
If ``s`` is one of these objects, and ``s`` has ``pre`` as a prefix, then ``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has been removed. If ``s`` does not have ``pre`` as a prefix, an unchanged copy of ``s`` is returned. In summary, ``s.cutprefix(pre)`` is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.
The behavior of ``cutsuffix`` is analogous: ``s.cutsuffix(suf)`` is roughly equivalent to ``s[:-len(suf)] if suf and s.endswith(suf) else s``.
Rationale
=========
There have been repeated issues [#confusion]_ on the Bug Tracker and StackOverflow related to user confusion about the existing ``str.lstrip`` and ``str.rstrip`` methods. These users are typically expecting the behavior of ``cutprefix`` and ``cutsuffix``, but they are surprised that the parameter for ``lstrip`` is interpreted as a set of characters, not a substring. This repeated issue is evidence that these methods are useful, and the new methods allow a cleaner redirection of users to the desired behavior.
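The confusion is easy to reproduce with the existing methods. The snippet below uses only ``str.lstrip`` and ``str.rstrip`` as they exist today; the sample strings are made up for illustration.

```python
# lstrip/rstrip treat their argument as a *set of characters*, so they
# keep removing characters for as long as each one is in that set.
name = "test_tests"
print(repr(name.lstrip("test_")))   # '' (every character is in the set)

path = "foobar.bar"
print(repr(path.rstrip(".bar")))    # 'foo', not 'foobar'

# What these users usually wanted is substring removal:
prefix = "test_"
wanted = name[len(prefix):] if name.startswith(prefix) else name
print(repr(wanted))                 # 'tests'
```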
As another testimonial for the usefulness of these methods, several users on Python-Ideas [#pyid]_ reported frequently including similar functions in their own code for productivity. The implementation often contained subtle mistakes regarding the handling of the empty string (see `Specification`_).
Specification
=============
The builtin ``str`` class will gain two new methods with roughly the following behavior::
    def cutprefix(self: str, pre: str, /) -> str:
        if self.startswith(pre):
            return self[len(pre):]
        return self[:]

    def cutsuffix(self: str, suf: str, /) -> str:
        if suf and self.endswith(suf):
            return self[:-len(suf)]
        return self[:]
The only difference between the real implementation and the above is that, as with other string methods like ``replace``, the methods will raise a ``TypeError`` if any of ``self``, ``pre`` or ``suf`` is not an instance of ``str``, and will cast subclasses of ``str`` to builtin ``str`` objects.

Note that without the check for the truthiness of ``suf``, ``s.cutsuffix('')`` would be mishandled and always return the empty string due to the unintended evaluation of ``self[:-0]``.
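A short sketch of that pitfall, using plain functions rather than methods. Both function names are hypothetical and exist only for this comparison; the second one mirrors the specification above.

```python
def cutsuffix_buggy(s: str, suf: str) -> str:
    # Missing the "suf and" guard: every string ends with '', and
    # s[:-0] is the same as s[:0], which is the empty string.
    if s.endswith(suf):
        return s[:-len(suf)]
    return s


def cutsuffix(s: str, suf: str) -> str:
    # Guarded version, as in the specification.
    if suf and s.endswith(suf):
        return s[:-len(suf)]
    return s


print(repr(cutsuffix_buggy("spam", "")))  # ''
print(repr(cutsuffix("spam", "")))        # 'spam'
```

The prefix case needs no such guard, because ``s[len(''):]`` is ``s[0:]``, which is the whole string.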
Methods with the corresponding semantics will be added to the builtin ``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes`` or ``bytearray`` object, then ``b.cutsuffix()`` and ``b.cutprefix()`` will accept any bytes-like object as an argument.
Note that the ``bytearray`` methods return a copy of ``self``; they do not operate in place.
The following behavior is considered a CPython implementation detail, but is not guaranteed by this specification::
    >>> x = 'foobar' * 10**6
    >>> x.cutprefix('baz') is x is x.cutsuffix('baz')
    True
    >>> x.cutprefix('') is x is x.cutsuffix('')
    True
That is, for CPython's immutable ``str`` and ``bytes`` objects, the methods return the original object when the affix is not found or if the affix is empty. Because these types test for equality using shortcuts for identity and length, the following equivalent expressions are evaluated at approximately the same speed, for any ``str`` objects (or ``bytes`` objects) ``x`` and ``y``::
    >>> (True, x[len(y):]) if x.startswith(y) else (False, x)
    >>> (True, z) if x != (z := x.cutprefix(y)) else (False, x)
The two methods will also be added to ``collections.UserString``, where they rely on the implementation of the new ``str`` methods.
Motivating examples from the Python standard library
====================================================

The examples below demonstrate how the proposed methods can make code one or more of the following:

Less fragile:
    The code will not depend on the user to count the length of a literal.
More performant:
    The code does not require a call to the Python built-in ``len`` function.
More descriptive:
    The methods give a higher-level API for code readability, as opposed to the traditional method of string slicing.
refactor.py
-----------

- Current::

    if fix_name.startswith(self.FILE_PREFIX):
        fix_name = fix_name[len(self.FILE_PREFIX):]

- Improved::

    fix_name = fix_name.cutprefix(self.FILE_PREFIX)
c_annotations.py:
-----------------

- Current::

    if name.startswith("c."):
        name = name[2:]

- Improved::

    name = name.cutprefix("c.")
find_recursionlimit.py
----------------------

- Current::

    if test_func_name.startswith("test_"):
        print(test_func_name[5:])
    else:
        print(test_func_name)

- Improved::

    print(test_func_name.cutprefix("test_"))
deccheck.py
-----------

This is an interesting case because the author chose to use the ``str.replace`` method in a situation where only a prefix was intended to be removed.

- Current::

    if funcname.startswith("context."):
        self.funcname = funcname.replace("context.", "")
        self.contextfunc = True
    else:
        self.funcname = funcname
        self.contextfunc = False

- Improved::

    if funcname.startswith("context."):
        self.funcname = funcname.cutprefix("context.")
        self.contextfunc = True
    else:
        self.funcname = funcname
        self.contextfunc = False

- Arguably further improved::

    self.contextfunc = funcname.startswith("context.")
    self.funcname = funcname.cutprefix("context.")
test_i18n.py
------------

- Current::

    if creationDate.endswith('\\n'):
        creationDate = creationDate[:-len('\\n')]

- Improved::

    creationDate = creationDate.cutsuffix('\\n')
shared_memory.py
----------------

- Current::

    reported_name = self._name
    if _USE_POSIX and self._prepend_leading_slash:
        if self._name.startswith("/"):
            reported_name = self._name[1:]
    return reported_name

- Improved::

    if _USE_POSIX and self._prepend_leading_slash:
        return self._name.cutprefix("/")
    return self._name
build-installer.py
------------------

- Current::

    if archiveName.endswith('.tar.gz'):
        retval = os.path.basename(archiveName[:-7])
        if ((retval.startswith('tcl') or retval.startswith('tk'))
                and retval.endswith('-src')):
            retval = retval[:-4]

- Improved::

    if archiveName.endswith('.tar.gz'):
        retval = os.path.basename(archiveName[:-7])
        if retval.startswith(('tcl', 'tk')):
            retval = retval.cutsuffix('-src')
Depending on personal style, ``archiveName[:-7]`` could also be changed to ``archiveName.cutsuffix('.tar.gz')``.
test_core.py
------------

- Current::

    if output.endswith("\n"):
        output = output[:-1]

- Improved::

    output = output.cutsuffix("\n")
cookiejar.py
------------

- Current::

    def strip_quotes(text):
        if text.startswith('"'):
            text = text[1:]
        if text.endswith('"'):
            text = text[:-1]
        return text

- Improved::

    def strip_quotes(text):
        return text.cutprefix('"').cutsuffix('"')

- Current::

    if line.endswith("\n"):
        line = line[:-1]

- Improved::

    line = line.cutsuffix("\n")
fixdiv.py
---------

- Current::

    def chop(line):
        if line.endswith("\n"):
            return line[:-1]
        else:
            return line

- Improved::

    def chop(line):
        return line.cutsuffix("\n")
test_concurrent_futures.py
--------------------------

In the following example, the meaning of the code changes slightly, but in context, it behaves the same.

- Current::

    if name.endswith(('Mixin', 'Tests')):
        return name[:-5]
    elif name.endswith('Test'):
        return name[:-4]
    else:
        return name

- Improved::

    return name.cutsuffix('Mixin').cutsuffix('Tests').cutsuffix('Test')
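To see where the two versions can differ, here is a hedged sketch. ``cutsuffix`` is a free-function stand-in for the proposed method, ``current`` and ``improved`` are hypothetical wrappers around the two snippets, and the class name ``'FooTestsMixin'`` is invented for the demonstration.

```python
def cutsuffix(s: str, suf: str) -> str:
    """Stand-in for the proposed method, following the specification."""
    if suf and s.endswith(suf):
        return s[:-len(suf)]
    return s


def current(name: str) -> str:
    # Removes at most one suffix.
    if name.endswith(('Mixin', 'Tests')):
        return name[:-5]
    elif name.endswith('Test'):
        return name[:-4]
    else:
        return name


def improved(name: str) -> str:
    # Each call sees the result of the previous one, so several
    # suffixes can be removed in a row.
    return cutsuffix(cutsuffix(cutsuffix(name, 'Mixin'), 'Tests'), 'Test')


print(current('QueueTests'), improved('QueueTests'))        # Queue Queue
print(current('FooTestsMixin'), improved('FooTestsMixin'))  # FooTests Foo
```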
msvc9compiler.py
----------------

- Current::

    if value.endswith(os.pathsep):
        value = value[:-1]

- Improved::

    value = value.cutsuffix(os.pathsep)
test_pathlib.py
---------------

- Current::

    self.assertTrue(r.startswith(clsname + '('), r)
    self.assertTrue(r.endswith(')'), r)
    inner = r[len(clsname) + 1 : -1]

- Improved::

    self.assertTrue(r.startswith(clsname + '('), r)
    self.assertTrue(r.endswith(')'), r)
    inner = r.cutprefix(clsname + '(').cutsuffix(')')
Rejected Ideas
==============

Expand the lstrip and rstrip APIs
---------------------------------

Because ``lstrip`` takes a string as its argument, it could be viewed as taking an iterable of length-1 strings. The API could therefore be generalized to accept any iterable of strings, which would be successively removed as prefixes. While this behavior would be consistent, it would not be obvious for users to have to call ``'foobar'.cutprefix(('foo',))`` for the common use case of a single prefix.

Allow multiple prefixes
-----------------------
Some users discussed the desire to be able to remove multiple prefixes, calling, for example, ``s.cutprefix('From: ', 'CC: ')``. However, this adds ambiguity about the order in which the prefixes are removed, especially in cases like ``s.cutprefix('Foo', 'FooBar')``. After this proposal, this can be spelled explicitly as ``s.cutprefix('Foo').cutprefix('FooBar')``.
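The ordering ambiguity can be shown with a free-function stand-in for the proposed method. ``cutprefix`` below is a sketch that follows the specification, and the input string is made up for illustration.

```python
def cutprefix(s: str, pre: str) -> str:
    """Stand-in for the proposed method, following the specification."""
    return s[len(pre):] if s.startswith(pre) else s


s = 'FooBarBaz'

# Removing 'Foo' first leaves 'BarBaz', which no longer starts with 'FooBar'.
foo_first = cutprefix(cutprefix(s, 'Foo'), 'FooBar')

# Removing 'FooBar' first leaves 'Baz', which does not start with 'Foo'.
foobar_first = cutprefix(cutprefix(s, 'FooBar'), 'Foo')

print(foo_first)     # BarBaz
print(foobar_first)  # Baz
```

Chaining the calls makes the chosen order visible at the call site, which a single call taking several prefixes would not.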
Remove multiple copies of a prefix
----------------------------------

This is the behavior that would be consistent with the aforementioned expansion of the ``lstrip/rstrip`` API -- repeatedly applying the function until the argument is unchanged. This behavior is attainable from the proposed behavior via the following::

    >>> s = 'foo' * 100 + 'bar'
    >>> while s != (s := s.cutprefix("foo")): pass
    >>> s
    'bar'

The above can be modified by chaining multiple ``cutprefix`` calls together to achieve the full behavior of the ``lstrip``/``rstrip`` generalization, while being explicit in the order of removal.
While the proposed API could later be extended to include some of these use cases, to do so before any observation of how these methods are used in practice would be premature and may lead to choosing the wrong behavior.
Raising an exception when not found
-----------------------------------
There was a suggestion that ``s.cutprefix(pre)`` should raise an exception if ``not s.startswith(pre)``. However, this does not match with the behavior and feel of other string methods. There could be ``required=False`` keyword added, but this violates the KISS principle.
Alternative Method Names
------------------------

Several alternative method names have been proposed. Some are listed below, along with commentary for why they should be rejected in favor of ``cutprefix`` (the same arguments hold for ``cutsuffix``).

``ltrim``
    "Trim" does in other languages (e.g. JavaScript, Java, Go, PHP) what ``strip`` methods do in Python.
``lstrip(string=...)``
    This would avoid adding a new method, but for different behavior, it's better to have two different methods than one method with a keyword argument that selects the behavior.
``cut_prefix``
    All of the other methods of the string API, e.g. ``str.startswith()``, use ``lowercase`` rather than ``lower_case_with_underscores``.
``cutleft``, ``leftcut``, or ``lcut``
    The explicitness of "prefix" is preferred.
``removeprefix``, ``deleteprefix``, ``withoutprefix``, etc.
    All of these might have been acceptable, but they have more characters than ``cut``. Some suggested that the verb "cut" implies mutability, but the string API already contains verbs like "replace", "strip", "split", and "swapcase".
``stripprefix``
    Users may benefit from the mnemonic that "strip" means working with sets of characters, while other methods work with substrings, so re-using "strip" here should be avoided.
Reference Implementation
========================
See the pull request on GitHub [#pr]_.
References
==========

.. [#pr] GitHub pull request with implementation
   (https://github.com/python/cpython/pull/18939)
.. [#pyid] Discussion on Python-Ideas
   (https://mail.python.org/archives/list/python-ideas@python.org/thread/RJARZSU...)
.. [#confusion] Comment listing Bug Tracker and StackOverflow issues
   (https://mail.python.org/archives/list/python-ideas@python.org/message/GRGAFI...)

Copyright
=========
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/WFEWPAOV...
Code of Conduct: http://python.org/psf/codeofconduct/
-- Regards, Ivan
On Sun, Mar 22, 2020 at 06:57:52AM +0300, Ivan Pozdeev via Python-Dev wrote:
Does it need to be separate methods?
Yes. Overloading a single method to do two dissimilar things is poor design.
As written in the PEP preface, the very reason for the PEP is that people are continuously trying to use *strip methods for the suggested functionality -- which shows that this is where they are expecting to find it.
They are only expecting to find it in strip() because there is no other alternative where it could be. There's nothing inherent about strip that means to delete a prefix or suffix, but when the only other choices are such obviously wrong methods as upper(), find(), replace(), count() etc it is easy to jump to the wrong conclusion that strip does what is wanted. -- Steven
On 22.03.2020 7:46, Steven D'Aprano wrote:
On Sun, Mar 22, 2020 at 06:57:52AM +0300, Ivan Pozdeev via Python-Dev wrote:
Does it need to be separate methods? Yes.
Overloading a single method to do two dissimilar things is poor design.
They are similar. We're removing stuff from an edge in both cases. The only difference is whether input is treated as a character set or as a raw substring.
As written in the PEP preface, the very reason for the PEP is that people are continuously trying to use *strip methods for the suggested functionality -- which shows that this is where they are expecting to find it. They are only expecting to find it in strip() because there is no other alternative where it could be. There's nothing inherent about strip that means to delete a prefix or suffix, but when the only other choices are such obviously wrong methods as upper(), find(), replace(), count() etc it is easy to jump to the wrong conclusion that strip does what is wanted.
-- Regards, Ivan
Ivan Pozdeev via Python-Dev writes:
They are similar. We're removing stuff from an edge in both cases. The only difference is whether input is treated as a character set or as a raw substring.
That is true. However, the rule of thumb (due to Guido, IIRC) is if the parameter is normally going to be a literal constant, and there are few such constants (like <= 3), put them in the name of the function rather than as values for an optional parameter. Overloading doesn't save much, if any, typing in this case. That's why we have strip, rstrip, and lstrip in the first place, although nowadays we'd likely spell the modifiers out (and maybe use start/end rather than left/right, which I would guess force BIDI users to translate to start/end on the fly). Steve
On 22Mar2020 08:10, Ivan Pozdeev <vano@mail.mipt.ru> wrote:
They are similar. We're removing stuff from an edge in both cases. The only difference is whether input is treated as a character set or as a raw substring.
That is not the only difference. strip() does not just remove a character from the set provided (as a str). It removes as many of them as there are; that is why "foo.ext".strip(".ext") can actually be quite misleading to someone looking for a suffix remover - it often looks like it did the right thing.

By contrast, cutprefix/cutsuffix (or stripsuffix, whatever) remove only _one_ instance of the affix. To my mind they are quite different, which is the basis of my personal dislike of reusing the word "strip".

Just extending "strip()" with a funky new affix mode would be even worse, since it can _still_ be misleading if the caller omitted the special mode.

Cheers,
Cameron Simpson <cs@cskk.id.au>
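The difference described above can be sketched in a few lines. This is only an illustration: ``cutsuffix`` here is a hypothetical free function that follows the PEP's "roughly equivalent" expression, not the proposed method itself, and the sample strings are made up.

```python
def cutsuffix(s: str, suf: str) -> str:
    """Hypothetical stand-in: remove at most one trailing copy of suf."""
    return s[:-len(suf)] if suf and s.endswith(suf) else s


# strip()/rstrip() treat the argument as a set of characters and keep
# removing characters for as long as they belong to that set.
print("foo.ext".strip(".ext"))         # 'foo'  (looks right, by luck)
print("regex.ext".rstrip(".ext"))      # 'reg'  (the stem is eaten too)

# The substring-based helper removes exactly one instance of the affix.
print(cutsuffix("regex.ext", ".ext"))  # 'regex'
print(cutsuffix("a.ext.ext", ".ext"))  # 'a.ext'
print("a.ext.ext".rstrip(".ext"))      # 'a'
```

The first call is the misleading case: the result happens to match what a suffix remover would return, so the character-set behavior goes unnoticed until a stem ends in one of those characters.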
My 2c on the naming:

'start' and 'end' in 'startswith' and 'endswith' are verbs, whereas we're looking for a noun if we want to cut/strip/trim a string. You can use 'start' and 'end' as nouns for this case but 'prefix' and 'suffix' seem a more obvious choice in English to me.

Pathlib has `with_suffix()` and `with_name()`, which would give us something like `without_prefix()` or `without_suffix()` in this case.

I think the name "strip", and the default (no-argument) behaviour of stripping whitespace implies that the method is used to strip something down to its bare essentials, like stripping a bed of its covers. Usually you use strip() to remove whitespace and get to the real important data. I don't think such an implication holds for removing a *specific* prefix/suffix.

I also don't much like "strip" as the semantics are quite different - if I'm understanding correctly, we're removing a *single* instance of a *single* *multi-character* string. A verb like "trim" or "cut" seems appropriate to highlight that difference.

Barney

On Fri, 20 Mar 2020 at 18:59, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
[...]
Dennis: please add references to past discussions in python-ideas and python-dev. Link to the first email of each thread in these lists. Victor
Here's an updated version.

Online: https://www.python.org/dev/peps/pep-0616/
Source: https://raw.githubusercontent.com/python/peps/master/pep-0616.rst

Changes:
- More complete Python implementation to match what the type checking in the C implementation would be
- Clarified that returning ``self`` is an optimization
- Added links to past discussions on Python-Ideas and Python-Dev
- Specified ability to accept a tuple of strings
- Shorter abstract section and fewer stdlib examples
- Mentioned
- Typo and formatting fixes

I didn't change the name because it didn't seem like there was a strong consensus for an alternative yet. I liked the suggestions of ``dropprefix`` or ``removeprefix``.

All the best,
Dennis
On 22/03/2020 22:25, Dennis Sweeney wrote:
Here's an updated version.
Online: https://www.python.org/dev/peps/pep-0616/
Source: https://raw.githubusercontent.com/python/peps/master/pep-0616.rst

Changes:
- More complete Python implementation to match what the type checking in the C implementation would be
- Clarified that returning ``self`` is an optimization
- Added links to past discussions on Python-Ideas and Python-Dev
- Specified ability to accept a tuple of strings
- Shorter abstract section and fewer stdlib examples
- Mentioned
- Typo and formatting fixes

I didn't change the name because it didn't seem like there was a strong consensus for an alternative yet. I liked the suggestions of ``dropprefix`` or ``removeprefix``.

All the best,
Dennis
Proofreading:

    it would not be obvious for users to have to call 'foobar'.cutprefix(('foo,)) for the common use case of a single prefix.

Missing single quote after the last foo.

    >>> s = 'foobar' * 100 + 'bar'
    >>> prefixes = ('bar', 'foo')
    >>> while len(s) != len(s := s.cutprefix(prefixes)): pass
    >>> s
    'bar'

or the more obvious and readable alternative:

    >>> s = 'foo' * 100 + 'bar'
    >>> prefixes = ('bar', 'foo')
    >>> while s.startswith(prefixes): s = s.cutprefix(prefixes)
    >>> s
    'bar'

Er no, in both these examples s is reduced to an empty string.

Best wishes
Rob Cliffe
Much appreciated! I will add that single quote and change those snippets to::

    >>> s = 'FooBar' * 100 + 'Baz'
    >>> prefixes = ('Bar', 'Foo')
    >>> while len(s) != len(s := s.cutprefix(prefixes)): pass
    >>> s
    'Baz'

and::

    >>> s = 'FooBar' * 100 + 'Baz'
    >>> prefixes = ('Bar', 'Foo')
    >>> while s.startswith(prefixes): s = s.cutprefix(prefixes)
    >>> s
    'Baz'
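The corrected snippets can be checked without the proposed methods. The sketch below assumes a hypothetical free function ``cutprefix`` that removes the first matching prefix from a tuple, which is the behavior discussed in this thread; ``cut_all_prefixes`` is likewise an illustrative name, not part of the proposal.

```python
def cutprefix(s, prefixes):
    """Hypothetical stand-in: remove the first prefix in `prefixes` that matches."""
    if isinstance(prefixes, str):
        prefixes = (prefixes,)
    for p in prefixes:
        if s.startswith(p):
            return s[len(p):]
    return s


def cut_all_prefixes(s, prefixes):
    """Repeat cutprefix until the length stops changing."""
    while True:
        shorter = cutprefix(s, prefixes)
        if len(shorter) == len(s):
            return s
        s = shorter


# Corrected example: 'Baz' matches neither prefix, so it survives.
print(repr(cut_all_prefixes('FooBar' * 100 + 'Baz', ('Bar', 'Foo'))))  # 'Baz'

# Earlier example: the trailing 'bar' is itself a listed prefix,
# so the whole string is consumed.
print(repr(cut_all_prefixes('foobar' * 100 + 'bar', ('bar', 'foo'))))  # ''
```

Comparing lengths rather than looping on ``startswith`` also terminates when an empty string is among the prefixes, since removing an empty prefix never shortens the string.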
Sorry, another niggle re handling an empty affix: With your Python implementation,

    'aba'.cutprefix(('', 'a')) == 'aba'
    'aba'.cutsuffix(('', 'a')) == 'ab'

This seems surprising.

Rob Gadfly Cliffe

On 22/03/2020 23:23, Dennis Sweeney wrote:
[...]
On 22Mar2020 23:33, Rob Cliffe <rob.cliffe@btinternet.com> wrote:
Sorry, another niggle re handling an empty affix: With your Python implementation, 'aba'.cutprefix(('', 'a')) == 'aba' 'aba'.cutsuffix(('', 'a')) == 'ab' This seems surprising.
That surprises me too. I expect the first matching affix to be used. It is the only way for the caller to have a predictable policy. As a diversion, _are_ there use cases where an empty affix is useful or reasonable or likely? Cheers, Cameron Simpson <cs@cskk.id.au>
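One way such an asymmetry can arise is sketched below. The code is hypothetical and is not taken from the PEP's implementation: it assumes the suffix variant guards against an empty suffix (to avoid ``s[:-0]``) inside the loop, while the prefix variant needs no such guard. A suffix variant that slices with ``len(s) - len(suf)`` gives the "first matching affix wins" policy on both sides.

```python
def cutprefix_first(s, prefixes):
    # First match wins; '' always matches, so it wins if listed first.
    for p in prefixes:
        if s.startswith(p):
            return s[len(p):]
    return s


def cutsuffix_guarded(s, suffixes):
    # Hypothetical: the `suf and` guard skips '' and falls through
    # to the next candidate, which produces the asymmetry.
    for suf in suffixes:
        if suf and s.endswith(suf):
            return s[:-len(suf)]
    return s


def cutsuffix_first(s, suffixes):
    # First match wins, including ''; the slice is safe for len(suf) == 0.
    for suf in suffixes:
        if s.endswith(suf):
            return s[:len(s) - len(suf)]
    return s


print(cutprefix_first('aba', ('', 'a')))    # 'aba'
print(cutsuffix_guarded('aba', ('', 'a')))  # 'ab'
print(cutsuffix_first('aba', ('', 'a')))    # 'aba'
```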
Cameron Simpson writes:
As a diversion, _are_ there use cases where an empty affix is useful or reasonable or likely?
In the "raise on failure" design, "aba".cutsuffix('.doc') raises, "aba".cutsuffix('.doc', '') returns "aba". BTW, since I'm here, thanks for your discussion of context managers for loop invariants. It was very enlightening.
On Sun, Mar 22, 2020 at 10:25:28PM -0000, Dennis Sweeney wrote:
Changes:
- More complete Python implementation to match what the type checking in the C implementation would be
- Clarified that returning ``self`` is an optimization
- Added links to past discussions on Python-Ideas and Python-Dev
- Specified ability to accept a tuple of strings
I am concerned about that tuple of strings feature.

First, an implementation question: you do this when the prefix is a tuple:

    if isinstance(prefix, tuple):
        for option in tuple(prefix):
            if not isinstance(option, str):
                raise TypeError()
            option_str = str(option)

which looks like two unnecessary copies:

1. Having confirmed that `prefix` is a tuple, you call tuple() to make a copy of it in order to iterate over it. Why?

2. Having confirmed that option is a string, you call str() on it to (potentially) make a copy. Why?

Aside from those questions about the reference implementation, I am concerned about the feature itself. No other string method that returns a modified copy of the string takes a tuple of alternatives.

* startswith and endswith do take a tuple of (pre/suff)ixes, but they don't return a modified copy; they just return a True or False flag;

* replace does return a modified copy, and only takes a single substring at a time;

* find/index/partition/split etc don't accept multiple substrings to search for.

That makes startswith/endswith the unusual ones, and we should be conservative before emulating them.

The difficulty here is that the notion of "cut one of these prefixes" is ambiguous if two or more of the prefixes match. It doesn't matter for startswith:

    "extraordinary".startswith(('ex', 'extra'))

since it is True whether you match left-to-right, shortest-to-largest, or even in random order. But for cutprefix, which prefix should be deleted?

Of course we can make a ruling by fiat, right now, and declare that it will cut the first matching prefix reading left to right, whether that's what users expect or not. That seems reasonable when your prefixes are hard-coded in the source, as above.

But what happens here?

    prefixes = get_prefixes('user.config')
    result = mystring.cutprefix(prefixes)

Whatever decision we make -- delete the shortest match, longest match, first match, last match -- we're going to surprise and annoy the people who expected one of the other behaviours.

This is why replace() still only takes a single substring to match and this isn't supported:

    "extraordinary".replace(('ex', 'extra'), '')

We ought to get some real-life exposure to the simple case first, before adding support for multiple prefixes/suffixes.

-- 
Steven
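The ambiguity can be made concrete with two candidate policies. Both helpers below are hypothetical names written for this illustration; neither is part of the proposal.

```python
def cut_first_match(s, prefixes):
    """Remove the first prefix, in tuple order, that matches."""
    for p in prefixes:
        if s.startswith(p):
            return s[len(p):]
    return s


def cut_longest_match(s, prefixes):
    """Remove the longest matching prefix, regardless of tuple order."""
    matches = [p for p in prefixes if s.startswith(p)]
    if not matches:
        return s
    return s[len(max(matches, key=len)):]


word = "extraordinary"
print(cut_first_match(word, ("ex", "extra")))    # 'traordinary'
print(cut_first_match(word, ("extra", "ex")))    # 'ordinary'
print(cut_longest_match(word, ("ex", "extra")))  # 'ordinary'
```

Under first-match the result depends on the order of the tuple, which matters when the tuple comes from configuration rather than from a literal in the source; under longest-match it does not, but that is a different rule from the one ``startswith`` users may assume.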
Steven D'Aprano wrote:
Having confirmed that prefix is a tuple, you call tuple() to make a copy of it in order to iterate over it. Why?
Having confirmed that option is a string, you call str() on it to (potentially) make a copy. Why?
This was an attempt to ensure no one can do funny business with tuple or str subclassing. I was trying to emulate the ``PyTuple_Check`` followed by ``PyTuple_GET_SIZE`` and ``PyTuple_GET_ITEM`` that are done by the C implementation of ``str.startswith()`` to ensure that only the tuple/str methods are used, not arbitrary user subclass code. It seems that that's what most of the ``str`` methods force.

I was mistaken in how to do this with pure Python. I believe I actually wanted something like:

    def cutprefix(self, prefix, /):
        if not isinstance(self, str):
            raise TypeError()

        if isinstance(prefix, tuple):
            for option in tuple.__iter__(prefix):
                if not isinstance(option, str):
                    raise TypeError()

                if str.startswith(self, option):
                    return str.__getitem__(
                        self, slice(str.__len__(option), None))

            return str.__getitem__(self, slice(None, None))

        if not isinstance(prefix, str):
            raise TypeError()

        if str.startswith(self, prefix):
            return str.__getitem__(self, slice(str.__len__(prefix), None))
        else:
            return str.__getitem__(self, slice(None, None))

... which looks even uglier.
We ought to get some real-life exposure to the simple case first, before adding support for multiple prefixes/suffixes.
I could be (and have been) convinced either way about whether or not to generalize to tuples of strings. I thought Victor made a good point about compatibility with ``startswith()``
On 24/03/20 3:43 pm, Dennis Sweeney wrote:
This was an attempt to ensure no one can do funny business with tuple or str subclassing. I was trying to emulate the ``PyTuple_Check`` followed by ``PyTuple_GET_SIZE`` and ``PyTuple_GET_ITEM`` that are done by the C implementation of ``str.startswith()``
The C code uses those functions for efficiency, not to prevent "funny business". PyTuple_GET_SIZE and PyTuple_GET_ITEM are macros that directly access fields of the tuple struct, and PyTuple_Check is much faster than a full isinstance check. There is no point in trying to emulate these in Python code. -- Greg
I think my confusion is about just how precise this sort of "reference implementation" should be. Should it behave with ``str`` and ``tuple`` subclasses exactly how it would when implemented? If so, I would expect the following to work:

    class S(str): __len__ = __getitem__ = __iter__ = None
    class T(tuple): __len__ = __getitem__ = __iter__ = None

    x = str.cutprefix("FooBar", T(("a", S("Foo"), 17)))
    assert x == "Bar"
    assert type(x) is str

and so I think the ``str.__getitem__(self, slice(str.__len__(prefix), None))`` monstrosity would be the most technically correct, unless I'm missing something. But I've never seen Python code so ugly. And I suppose this is a slippery slope -- should it also guard against people redefining ``len = lambda x: 5`` and ``str = list`` in the global scope? Clearly not. I think then maybe it would be preferred to use something like the following in the PEP:

    def cutprefix(self, prefix, /):
        if isinstance(prefix, str):
            if self.startswith(prefix):
                return self[len(prefix):]
            return self[:]
        elif isinstance(prefix, tuple):
            for option in prefix:
                if self.startswith(option):
                    return self[len(option):]
            return self[:]
        else:
            raise TypeError()

    def cutsuffix(self, suffix):
        if isinstance(suffix, str):
            if self.endswith(suffix):
                return self[:len(self)-len(suffix)]
            return self[:]
        elif isinstance(suffix, tuple):
            for option in suffix:
                if self.endswith(option):
                    return self[:len(self)-len(option)]
            return self[:]
        else:
            raise TypeError()

The above would fail the assertions as written before, but would pass them for subclasses ``class S(str): pass`` and ``class T(tuple): pass`` that do not override any dunder methods. Is this an acceptable compromise if it appears alongside a clarifying sentence like the following?

    These methods should always return base ``str`` objects, even when called on ``str`` subclasses.

I'm looking for guidance as to whether that's an appropriate level of precision for a PEP. If so, I'll make that change.

All the best,
Dennis
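The claim about plain subclasses can be checked with a small harness. The sketch below copies the simplified ``cutprefix`` from this message as a free function (calling it as ``str.cutprefix`` is not assumed to work) and uses subclasses that override nothing.

```python
def cutprefix(self, prefix, /):
    # Simplified sketch from the message above, used as a free function.
    if isinstance(prefix, str):
        if self.startswith(prefix):
            return self[len(prefix):]
        return self[:]
    elif isinstance(prefix, tuple):
        for option in prefix:
            if self.startswith(option):
                return self[len(option):]
        return self[:]
    else:
        raise TypeError()


class S(str):
    pass


class T(tuple):
    pass


# The first matching option returns before the invalid 17 is reached.
x = cutprefix(S("FooBar"), T(("a", S("Foo"), 17)))
assert x == "Bar"
assert type(x) is str

# No match: slicing a str subclass still yields a base str.
y = cutprefix(S("FooBar"), "Baz")
assert y == "FooBar"
assert type(y) is str

# Anything that is neither str nor tuple is rejected.
try:
    cutprefix("FooBar", 17)
except TypeError:
    print("TypeError raised for a non-str, non-tuple prefix")
```

Slicing does the conversion to base ``str`` here, which is why the sketch needs no explicit ``str(...)`` call to satisfy the "always return base ``str`` objects" sentence for subclasses that leave ``__getitem__`` alone.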
On Tue, Mar 24, 2020 at 08:14:33PM -0000, Dennis Sweeney wrote:
I think my confusion is about just how precise this sort of "reference implementation" should be. Should it behave with ``str`` and ``tuple`` subclasses exactly how it would when implemented? If so, I would expect the following to work:
I think that for the purposes of a relatively straight-forward PEP like this, you should start simple and only add complexity if needed to resolve questions. The Python implementation ought to show the desired semantics, not try to be an exact translation of the C code. Think of the Python equivalents in the itertools docs: https://docs.python.org/3/library/itertools.html See for example: https://www.python.org/dev/peps/pep-0584/#reference-implementation https://www.python.org/dev/peps/pep-0572/#appendix-b-rough-code-translations... You already state that the methods will show "roughly the following behavior", so there's no expectation that it will be precisely what the real methods do. Aim for clarity over emulation of unusual corner cases. The reference implementation is informative not prescriptive. -- Steven
On Tue, Mar 24, 2020 at 08:14:33PM -0000, Dennis Sweeney wrote:
I think then maybe it would be preferred to use something like the following in the PEP:

    def cutprefix(self, prefix, /):
        if isinstance(prefix, str):
            if self.startswith(prefix):
                return self[len(prefix):]
            return self[:]
Didn't we have a discussion about not mandating a copy when nothing changes? For strings, I'd just return `self`. It is only bytearray that requires a copy to be made.
        elif isinstance(prefix, tuple):
            for option in prefix:
                if self.startswith(option):
                    return self[len(option):]
I'd also remove the entire multiple substrings feature, for reasons I've already given. "Compatibility with startswith" is not a good reason to add this feature and you haven't established any good use-cases for it. A closer analog is str.replace(substring, ''), and after almost 30 years of real-world experience, that method still only takes a single substring, not a tuple. -- Steven
Steven D'Aprano wrote:
On Tue, Mar 24, 2020 at 08:14:33PM -0000, Dennis Sweeney wrote:
I think then maybe it would be preferred to use something like the following in the PEP:

    def cutprefix(self, prefix, /):
        if isinstance(prefix, str):
            if self.startswith(prefix):
                return self[len(prefix):]
            return self[:]
Didn't we have a discussion about not mandating a copy when nothing changes? For strings, I'd just return self. It is only bytearray that requires a copy to be made.
It appears that in CPython, ``self[:] is self`` is true for base ``str`` objects, so I think ``return self[:]`` is consistent with (1) the premise that returning self is an implementation detail that is neither mandated nor forbidden, and (2) the premise that the methods should return base ``str`` objects even when called on ``str`` subclasses.
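The two premises above can be checked with a short sketch. The identity result is a CPython implementation detail, and ``Tagged`` is a hypothetical subclass introduced only for illustration:

```python
class Tagged(str):
    """Hypothetical str subclass that overrides nothing."""


base = "spam"
sub = Tagged("spam")

# Full slice of an exact str: CPython may hand back the very same object.
# This is an implementation detail and should not be relied upon.
same_object = base[:] is base

# Full slice of a subclass instance: a new base str with equal contents.
sliced = sub[:]

print(same_object, type(sliced).__name__, sliced == sub)
```

Under this reading, ``return self[:]`` expresses "possibly the same object for a plain ``str``, always a base ``str`` for a subclass that leaves ``__getitem__`` alone".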
        elif isinstance(prefix, tuple):
            for option in prefix:
                if self.startswith(option):
                    return self[len(option):]

I'd also remove the entire multiple substrings feature, for reasons I've already given. "Compatibility with startswith" is not a good reason to add this feature and you haven't established any good use-cases for it. A closer analog is str.replace(substring, ''), and after almost 30 years of real-world experience, that method still only takes a single substring, not a tuple.
The ``test_concurrent_futures.py`` example seemed to be a good use case to me. I agree that it would be good to see how common that actually is though. But it seems to me that any alternative behavior, e.g. repeated removal, could be implemented by a user on top of the remove-only-the-first-found behavior or by fluently chaining multiple method calls. Maybe you're right that it's too complex, but I think it's at least worth discussing.
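As a rough illustration of the point that other behaviors can be layered on top of single-affix removal, here is a minimal sketch. All three functions are hypothetical helpers written for this example, not part of the proposal, and the sample strings are made up:

```python
def removeprefix(s: str, prefix: str) -> str:
    """Single-prefix removal, mirroring the rough behavior discussed above."""
    return s[len(prefix):] if s.startswith(prefix) else s[:]


def remove_first_matching_prefix(s: str, options: tuple) -> str:
    """Hypothetical helper: remove only the first option that matches."""
    for option in options:
        if s.startswith(option):
            return removeprefix(s, option)
    return s[:]


def remove_prefix_repeatedly(s: str, prefix: str) -> str:
    """Hypothetical helper: keep removing until the prefix is gone."""
    if not prefix:  # an empty prefix always matches; avoid looping forever
        return s[:]
    while s.startswith(prefix):
        s = removeprefix(s, prefix)
    return s


# Tuple-style behavior built by the caller.
print(remove_first_matching_prefix("test_futures", ("test_", "tests_")))
# Repeated removal built by the caller.
print(remove_prefix_repeatedly("foofoobar", "foo"))
# Chaining two single-prefix calls.
print(removeprefix(removeprefix("abcdef", "ab"), "cd"))
```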
Dennis Sweeney wrote:
Steven D'Aprano wrote:
Dennis Sweeney wrote:
I think then maybe it would be preferred to use something like the following in the PEP:

    def cutprefix(self, prefix, /):
        if isinstance(prefix, str):
            if self.startswith(prefix):
                return self[len(prefix):]
            return self[:]
Didn't we have a discussion about not mandating a copy when nothing changes? For strings, I'd just return self. It is only bytearray that requires a copy to be made.
It appears that in CPython, ``self[:] is self`` is true for base ``str`` objects, so I think ``return self[:]`` is consistent with (1) the premise that returning self is an implementation detail that is neither mandated nor forbidden, and (2) the premise that the methods should return base ``str`` objects even when called on ``str`` subclasses.
The Python interpreter in my head sees `self[:]` and returns a copy. A note that says a `str` is returned would be more useful than trying to exactly mirror internal details in the Python "roughly equivalent" code.
        elif isinstance(prefix, tuple):
            for option in prefix:
                if self.startswith(option):
                    return self[len(option):]
I'd also remove the entire multiple substrings feature, for reasons I've already given. "Compatibility with startswith" is not a good reason to add this feature and you haven't established any good use-cases for it. A closer analog is str.replace(substring, ''), and after almost 30 years of real-world experience, that method still only takes a single substring, not a tuple.
The ``test_concurrent_futures.py`` example seemed to be a good use case to me. I agree that it would be good to see how common that actually is though. But it seems to me that any alternative behavior, e.g. repeated removal, could be implemented by a user on top of the remove-only-the-first-found behavior or by fluently chaining multiple method calls. Maybe you're right that it's too complex, but I think it's at least worth discussing.
I agree with Steven -- a tuple of options is not necessary for the affix removal methods. -- ~Ethan~
I'm removing the tuple feature from this PEP. So now, if I understand correctly, I don't think there's disagreement about behavior, just about how that behavior should be summarized in Python code. Ethan Furman wrote:
It appears that in CPython, self[:] is self is true for base str objects, so I think return self[:] is consistent with (1) the premise that returning self is an implementation detail that is neither mandated nor forbidden, and (2) the premise that the methods should return base str objects even when called on str subclasses.

The Python interpreter in my head sees self[:] and returns a copy. A note that says a str is returned would be more useful than trying to exactly mirror internal details in the Python "roughly equivalent" code.
I think I'm still in the camp that ``return self[:]`` more precisely prescribes the desired behavior. It would feel strange to me to write ``return self`` and then say "but you don't actually have to return self, and in fact you shouldn't when working with subclasses". To me, it feels like

    return (the original object unchanged, or a copy of the object, depending on implementation details, but always make a copy when working with subclasses)

is well-summarized by

    return self[:]

especially if followed by the text

    Note that ``self[:]`` might not actually make a copy -- if the affix is empty or not found, and if ``type(self) is str``, then these methods may, but are not required to, make the optimization of returning ``self``. However, when called on instances of subclasses of ``str``, these methods should return base ``str`` objects, not ``self``.

...which is a necessary explanation regardless. Granted, ``return self[:]`` isn't perfect if ``__getitem__`` is overridden, but at the cost of three characters, the Python gains accuracy over both the optional nature of returning ``self`` in all cases and the impossibility (assuming no dunders are overridden) of returning self for subclasses. It also dissuades readers from relying on the behavior of returning self, which we're specifying is an implementation detail.

Is that text explanation satisfactory?
On 3/25/2020 1:36 PM, Dennis Sweeney wrote:
I'm removing the tuple feature from this PEP. So now, if I understand correctly, I don't think there's disagreement about behavior, just about how that behavior should be summarized in Python code.

I think that's right.

Ethan Furman wrote:

It appears that in CPython, self[:] is self is true for base str objects, so I think return self[:] is consistent with (1) the premise that returning self is an implementation detail that is neither mandated nor forbidden, and (2) the premise that the methods should return base str objects even when called on str subclasses.

The Python interpreter in my head sees self[:] and returns a copy. A note that says a str is returned would be more useful than trying to exactly mirror internal details in the Python "roughly equivalent" code.

I think I'm still in the camp that ``return self[:]`` more precisely prescribes the desired behavior. It would feel strange to me to write ``return self`` and then say "but you don't actually have to return self, and in fact you shouldn't when working with subclasses". To me, it feels like
return (the original object unchanged, or a copy of the object, depending on implementation details, but always make a copy when working with subclasses)
is well-summarized by
return self[:]
especially if followed by the text
Note that ``self[:]`` might not actually make a copy -- if the affix is empty or not found, and if ``type(self) is str``, then these methods may, but are not required to, make the optimization of returning ``self``. However, when called on instances of subclasses of ``str``, these methods should return base ``str`` objects, not ``self``.
...which is a necessary explanation regardless. Granted, ``return self[:]`` isn't perfect if ``__getitem__`` is overridden, but at the cost of three characters, the Python gains accuracy over both the optional nature of returning ``self`` in all cases and the impossibility (assuming no dunders are overridden) of returning self for subclasses. It also dissuades readers from relying on the behavior of returning self, which we're specifying is an implementation detail.
Is that text explanation satisfactory?
Yes, that makes sense to me. I haven't had time to review the most recent updates, and I'll probably wait until you update it one more time. Eric
I've said a few times that I think it would be good if the behavior were defined /in terms of __getitem__/'s behavior. If the rough behavior is this:

    def removeprefix(self, prefix):
        if self.startswith(prefix):
            return self[len(prefix):]
        else:
            return self[:]

Then you can shift all the guarantees about whether the subtype is str and whether it might return `self` when the prefix is missing onto the implementation of __getitem__. For CPython's implementation of str, `self[:]` returns `self`, so it's clearly true that __getitem__ is allowed to return `self` in some situations. Subclasses that do not override __getitem__ will return the str base class, and subclasses that /do/ override __getitem__ can choose what they want to do. So someone could make their subclass do this:

    class MyStr(str):
        def __getitem__(self, key):
            if isinstance(key, slice) and key.start is key.stop is key.step is None:
                return self
            return type(self)(super().__getitem__(key))

They would then get "removeprefix" and "removesuffix" for free, with the desired semantics and optimizations. If we go with this approach (which again I think is much friendlier to subclassers), that obviates the problem of whether `self[:]` is a good summary of something that can return `self`: since "does the same thing as self[:]" /is/ the behavior it's trying to describe, there's no ambiguity.

Best, Paul

On 3/25/20 1:36 PM, Dennis Sweeney wrote:
I'm removing the tuple feature from this PEP. So now, if I understand correctly, I don't think there's disagreement about behavior, just about how that behavior should be summarized in Python code.
Ethan Furman wrote:

It appears that in CPython, self[:] is self is true for base str objects, so I think return self[:] is consistent with (1) the premise that returning self is an implementation detail that is neither mandated nor forbidden, and (2) the premise that the methods should return base str objects even when called on str subclasses.

The Python interpreter in my head sees self[:] and returns a copy. A note that says a str is returned would be more useful than trying to exactly mirror internal details in the Python "roughly equivalent" code.

I think I'm still in the camp that ``return self[:]`` more precisely prescribes the desired behavior. It would feel strange to me to write ``return self`` and then say "but you don't actually have to return self, and in fact you shouldn't when working with subclasses". To me, it feels like
return (the original object unchanged, or a copy of the object, depending on implementation details, but always make a copy when working with subclasses)
is well-summarized by
return self[:]
especially if followed by the text
Note that ``self[:]`` might not actually make a copy -- if the affix is empty or not found, and if ``type(self) is str``, then these methods may, but are not required to, make the optimization of returning ``self``. However, when called on instances of subclasses of ``str``, these methods should return base ``str`` objects, not ``self``.
...which is a necessary explanation regardless. Granted, ``return self[:]`` isn't perfect if ``__getitem__`` is overridden, but at the cost of three characters, the Python gains accuracy over both the optional nature of returning ``self`` in all cases and the impossibility (assuming no dunders are overridden) of returning self for subclasses. It also dissuades readers from relying on the behavior of returning self, which we're specifying is an implementation detail.
Is that text explanation satisfactory? _______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/4E77QD52... Code of Conduct: http://python.org/psf/codeofconduct/
I was surprised by the following behavior:

    class MyStr(str):
        def __getitem__(self, key):
            if isinstance(key, slice) and key.start is key.stop is key.step:
                return self
            return type(self)(super().__getitem__(key))

    my_foo = MyStr("foo")
    MY_FOO = MyStr("FOO")
    My_Foo = MyStr("Foo")
    empty = MyStr("")

    assert type(my_foo.casefold()) is str
    assert type(MY_FOO.capitalize()) is str
    assert type(my_foo.center(3)) is str
    assert type(my_foo.expandtabs()) is str
    assert type(my_foo.join(())) is str
    assert type(my_foo.ljust(3)) is str
    assert type(my_foo.lower()) is str
    assert type(my_foo.lstrip()) is str
    assert type(my_foo.replace("x", "y")) is str
    assert type(my_foo.split()[0]) is str
    assert type(my_foo.splitlines()[0]) is str
    assert type(my_foo.strip()) is str
    assert type(empty.swapcase()) is str
    assert type(My_Foo.title()) is str
    assert type(MY_FOO.upper()) is str
    assert type(my_foo.zfill(3)) is str

    assert type(my_foo.partition("z")[0]) is MyStr
    assert type(my_foo.format()) is MyStr

I was under the impression that all of the ``str`` methods exclusively returned base ``str`` objects. Is there any reason why those two are different, and is there a reason that would apply to ``removeprefix`` and ``removesuffix`` as well?
I imagine it's an implementation detail of which ones depend on __getitem__. The only methods that would be reasonably amenable to a guarantee like "always returns the same thing as __getitem__" would be (l|r|)strip(), split(), splitlines(), and .partition(), because they only work with subsets of the input string. Most of the other stuff involves constructing new strings and it's harder to cast them in terms of other "primitive operations" since strings are immutable. I suspect that to the extent that the ones that /could/ be implemented in terms of __getitem__ are returning base strings, it's either because no one thought about doing it at the time and they used another mechanism or it was a deliberate choice to be consistent with the other methods. I don't see removeprefix and removesuffix explicitly being implemented in terms of slicing operations as a huge win - you've demonstrated that someone who wants a persistent string subclass still would need to override a /lot/ of methods, so two more shouldn't hurt much - I just think that "consistent with most of the other methods" is a /particularly/ good reason to avoid explicitly defining these operations in terms of __getitem__. The /default/ semantics are the same (i.e. if you don't explicitly change the return type of __getitem__, it won't change the return type of the remove* methods), and the only difference is that for all the /other/ methods, it's an implementation detail whether they call __getitem__, whereas for the remove methods it would be explicitly documented. In my ideal world, a lot of these methods would be redefined in terms of a small set of primitives that people writing subclasses could implement as a protocol that would allow methods called on the functions to retain their class, but I think the time for that has passed. Still, I don't think it would /hurt/ for new methods to be defined in terms of what primitive operations exist where possible. 
Best, Paul On 3/25/20 3:09 PM, Dennis Sweeney wrote:
I was surprised by the following behavior:
    class MyStr(str):
        def __getitem__(self, key):
            if isinstance(key, slice) and key.start is key.stop is key.step:
                return self
            return type(self)(super().__getitem__(key))

    my_foo = MyStr("foo")
    MY_FOO = MyStr("FOO")
    My_Foo = MyStr("Foo")
    empty = MyStr("")

    assert type(my_foo.casefold()) is str
    assert type(MY_FOO.capitalize()) is str
    assert type(my_foo.center(3)) is str
    assert type(my_foo.expandtabs()) is str
    assert type(my_foo.join(())) is str
    assert type(my_foo.ljust(3)) is str
    assert type(my_foo.lower()) is str
    assert type(my_foo.lstrip()) is str
    assert type(my_foo.replace("x", "y")) is str
    assert type(my_foo.split()[0]) is str
    assert type(my_foo.splitlines()[0]) is str
    assert type(my_foo.strip()) is str
    assert type(empty.swapcase()) is str
    assert type(My_Foo.title()) is str
    assert type(MY_FOO.upper()) is str
    assert type(my_foo.zfill(3)) is str

    assert type(my_foo.partition("z")[0]) is MyStr
    assert type(my_foo.format()) is MyStr
I was under the impression that all of the ``str`` methods exclusively returned base ``str`` objects. Is there any reason why those two are different, and is there a reason that would apply to ``removeprefix`` and ``removesuffix`` as well?
I imagine it's an implementation detail of which ones depend on ``__getitem__``.
If we write

    class MyStr(str):
        def __getitem__(self, key):
            raise ZeroDivisionError()

then all of the assertions from before still pass, so in fact *none* of the methods rely on ``__getitem__``. As of now ``str`` does not behave as an ABC at all. But it's an interesting proposal to essentially make it an ABC. Although it makes me curious what all of the different reasons people actually have for subclassing ``str``. All of the examples I found in the stdlib were either (1) contrived test cases, (2) strings (e.g. grammar tokens) with some extra attributes along for the ride, or (3) string-based enums. None of types (2) or (3) ever overrode ``__getitem__``, so it doesn't feel like that common of a use case.
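The observation above can be reproduced with a small self-contained sketch. ``Exploding`` is a hypothetical name chosen for this example, and only a handful of methods are exercised rather than the full list of assertions:

```python
class Exploding(str):
    """Hypothetical subclass whose __getitem__ always fails."""

    def __getitem__(self, key):
        raise ZeroDivisionError("__getitem__ was called")


s = Exploding("  Foo  ")

# None of these built-in methods route through the overridden __getitem__.
results = [s.strip(), s.lower(), s.replace("o", "0"), s.split()[0]]
print([type(r).__name__ for r in results])

# Explicit subscripting does reach the override.
try:
    s[:]
    reached_override = False
except ZeroDivisionError:
    reached_override = True
print(reached_override)
```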
I don't see removeprefix and removesuffix explicitly being implemented in terms of slicing operations as a huge win - you've demonstrated that someone who wants a persistent string subclass still would need to override a /lot/ of methods, so two more shouldn't hurt much - I just think that "consistent with most of the other methods" is a /particularly/ good reason to avoid explicitly defining these operations in terms of __getitem__.
Making sure I understand: would you prefer the PEP to say ``return self`` rather than ``return self[:]``? I never had the intention of ``self[:]`` meaning "this must have exactly the behavior of ``self.__getitem__(slice(None, None))`` regardless of type", but I can understand if that's how you're saying it could be interpreted.
Dennis Sweeney wrote: -----------------------
It appears that in CPython, self[:] is self is true for base str objects, so I think return self[:] is consistent with (1) the premise that returning self is an implementation detail that is neither mandated nor forbidden, and (2) the premise that the methods should return base str objects even when called on str subclasses.
Ethan Furman wrote: -------------------
The Python interpreter in my head sees self[:] and returns a copy.
Dennis Sweeney wrote: -----------------------
I think I'm still in the camp that ``return self[:]`` more precisely prescribes the desired behavior. It would feel strange to me to write ``return self`` and then say "but you don't actually have to return self, and in fact you shouldn't when working with subclasses".
I don't understand that last bit -- surely, if I'm bothering to implement `removeprefix` and `removesuffix` in my subclass, I would also want to `return self` to keep my subclass? Why would I want to go through the extra overhead of either calling my own `__getitem__` method, or have the `str.__getitem__` method discard my subclass? However, if you are saying that `self[:]` *will* call `self.__class__.__getitem__` so my subclass only has to override `__getitem__` instead of `removeprefix` and `removesuffix`, that I can be happy with. -- ~Ethan~
I don't understand that last bit -- surely, if I'm bothering to implement removeprefix and removesuffix in my subclass, I would also want to return self to keep my subclass? Why would I want to go through the extra overhead of either calling my own __getitem__ method, or have the str.__getitem__ method discard my subclass?
I should clarify: by "when working with subclasses" I meant "when str.removeprefix() is called on a subclass that does not override removeprefix", and in that case it should return a base str. I was not taking a stance on how the methods should be overridden, and I'm not sure there are many use cases where it should be.
However, if you are saying that self[:] will call self.__class__.__getitem__ so my subclass only has to override __getitem__ instead of removeprefix and removesuffix, that I can be happy with.
I was only saying that the new methods should match 20 other methods in the str API by always returning a base str (the exceptions being format, format_map, and (r)partition for some reason). I did not mean to suggest that they should ever call user-supplied ``__getitem__`` code -- I don't think they need to. I haven't found anyone trying to use ``str`` as a mixin class/ABC, and it seems that this would be very difficult to do given that none of its methods currently rely on ``self.__class__.__getitem__``. If ``return self[:]`` in the PEP is too closely linked to "must call user-supplied ``__getitem__`` methods" for it not to be true, and so you're suggesting ``return self`` is more faithful, I can understand.

So now if I understand the dilemma up to this point we have:

Benefits of writing ``return self`` in the PEP:

- Makes it clear that the optimization of not copying is allowed
- Makes it clear that ``self.__class__.__getitem__`` isn't used

Benefits of writing ``return self[:]`` in the PEP:

- Makes it clear that returning self is an implementation detail
- For subclasses not overriding ``__getitem__`` (the majority of cases), makes it clear that this method will return a base str like the other str methods.

Did I miss anything?

All the best, Dennis
First off, thank you for being so patient -- trying to champion a PEP can be exhausting. On 03/26/2020 05:22 PM, Dennis Sweeney wrote:
Ethan Furman wrote:
I don't understand that last bit -- surely, if I'm bothering to implement removeprefix and removesuffix in my subclass, I would also want to return self to keep my subclass? Why would I want to go through the extra overhead of either calling my own __getitem__ method, or have the str.__getitem__ method discard my subclass?
I should clarify: by "when working with subclasses" I meant "when str.removeprefix() is called on a subclass that does not override removeprefix", and in that case it should return a base str.
Okay.
However, if you are saying that self[:] will call self.__class__.__getitem__ so my subclass only has to override __getitem__ instead of removeprefix and removesuffix, that I can be happy with.
I was only saying that the new methods should match 20 other methods in the str API by always returning a base str
Okay.
If ``return self[:]`` in the PEP is too closely linked to "must call user-supplied ``__getitem__`` methods" for it not to be true, and so you're suggesting ``return self`` is more faithful, I can understand.
So now if I understand the dilemma up to this point we have:
Benefits of writing ``return self`` in the PEP:
a - Makes it clear that the optimization of not copying is allowed
b - Makes it clear that ``self.__class__.__getitem__`` isn't used

Benefits of writing ``return self[:]`` in the PEP:
c - Makes it clear that returning self is an implementation detail
d - For subclasses not overriding ``__getitem__`` (the majority of cases), makes it clear that this method will return a base str like the other str methods.
Did I miss anything?
The only thing you missed is that, for me at least, points A, C, and D are not at all clear from the example code. If I wanted to be explicit about the return type being `str` I would write:

    return str(self)  # subclasses are coerced to str

-- ~Ethan~
On 3/26/2020 9:10 PM, Ethan Furman wrote:
First off, thank you for being so patient -- trying to champion a PEP can be exhausting.
On 03/26/2020 05:22 PM, Dennis Sweeney wrote:
So now if I understand the dilemma up to this point we have:
Benefits of writing ``return self`` in the PEP:
a - Makes it clear that the optimization of not copying is allowed
b - Makes it clear that ``self.__class__.__getitem__`` isn't used

Benefits of writing ``return self[:]`` in the PEP:
c - Makes it clear that returning self is an implementation detail
d - For subclasses not overriding ``__getitem__`` (the majority of cases), makes it clear that this method will return a base str like the other str methods.
Did I miss anything?
The only thing you missed is that, for me at least, points A, C, and D are not at all clear from the example code. If I wanted to be explicit about the return type being `str` I would write:
return str(self) # subclasses are coerced to str
That does seem like the better solution, including the comment. Eric
I appreciate the input and attention to detail! Using the ``str()`` constructor was sort of what I had thought originally, and that's why I had gone overboard with "casting" in one iteration of the sample code. When I realized that this isn't quite "casting" and that ``__str__`` can be overridden, I went even more overboard and suggested that ``str.__getitem__(self, ...)`` and ``str.__len__(self)`` could be written, which does have the behavior of effectively "casting", but looks nasty. Do you think that the following is a happy medium?

    def removeprefix(self: str, prefix: str, /) -> str:
        # coerce subclasses to str
        self_str = str(self)
        prefix_str = str(prefix)
        if self_str.startswith(prefix_str):
            return self_str[len(prefix_str):]
        else:
            return self_str

    def removesuffix(self: str, suffix: str, /) -> str:
        # coerce subclasses to str
        self_str = str(self)
        suffix_str = str(suffix)
        if suffix_str and self_str.endswith(suffix_str):
            return self_str[:-len(suffix_str)]
        else:
            return self_str

Followed by the text:

    If ``type(self) is str`` (rather than a subclass) and if the given affix is empty or is not found, then these methods may, but are not required to, make the optimization of returning ``self``.
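The remark that ``str()`` is not quite "casting" can be seen in a short sketch. ``Shouty`` is a hypothetical subclass invented for this example; it overrides ``__str__`` so that the two approaches give visibly different results:

```python
class Shouty(str):
    """Hypothetical subclass that overrides __str__."""

    def __str__(self):
        return self.upper()


s = Shouty("abc")

# The constructor goes through Shouty.__str__, so the contents change.
via_constructor = str(s)

# Calling the base-class slot directly bypasses every override and
# yields a plain str with the original contents.
via_base_slice = str.__getitem__(s, slice(None))

print(via_constructor, via_base_slice, type(via_base_slice).__name__)
```

This is why the ``str.__getitem__(self, ...)`` spelling behaves like a true coercion, at the cost of readability.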
On Wed, Mar 25, 2020 at 5:42 PM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
I'm removing the tuple feature from this PEP. So now, if I understand correctly, I don't think there's disagreement about behavior, just about how that behavior should be summarized in Python code. [...] return (the original object unchanged, or a copy of the object, depending on implementation details, but always make a copy when working with subclasses)
is well-summarized by
return self[:]
especially if followed by the text
Note that ``self[:]`` might not actually make a copy -- if the affix is empty or not found, and if ``type(self) is str``, then these methods may, but are not required to, make the optimization of returning ``self``. However, when called on instances of subclasses of ``str``, these methods should return base ``str`` objects, not ``self``.
Perhaps:
Note that ``self[:]`` might not actually make a copy of ``self``. If the affix is empty or not found, and if ``type(self)`` is immutable, then these methods may, but are not required to, make the optimization of returning ``self``. ... [...]
I was trying to start with the intended behavior of the str class, then move on to generalizing to other classes, because I think completing a single example and *then* generalizing is an instructional style that's easier to digest, whereas intermixing all of the examples at once can get confusing (can I call str.removeprefix(object(), 17)?). Is something missing that's not already there in the following sentence in the PEP?

    Although the methods on the immutable ``str`` and ``bytes`` types may make the aforementioned optimization of returning the original object, ``bytearray.removeprefix()`` and ``bytearray.removesuffix()`` should always return a copy, never the original object.

Best, Dennis
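The reason the mutable type needs the copy can be sketched as follows. ``bytearray_removeprefix`` is a hypothetical stand-in written for this example, not the proposed C implementation:

```python
def bytearray_removeprefix(data: bytearray, prefix: bytes) -> bytearray:
    """Hypothetical stand-in for the proposed bytearray method."""
    if data.startswith(prefix):
        return data[len(prefix):]
    return data[:]  # slicing a bytearray always builds a new object


original = bytearray(b"spam")
unchanged = bytearray_removeprefix(original, b"eggs")

# Mutating the result must not affect the input, even when nothing was removed.
unchanged.append(ord("!"))
print(original, unchanged)
```

Had the function returned ``data`` itself in the no-match case, the ``append`` would have silently modified ``original`` as well.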
How about just presenting pseudo code with the caveat that that's for the base str and bytes classes only, and then stipulating that for subclasses the return value is still a str/bytes/bytearray instance, and leaving it at that? After all the point of the Python code is to show what the C code should do in a way that's easy to grasp -- giving a Python implementation is not meant to constrain the C implementation to have *exactly* the same behavior in all corner cases (since that would lead to seriously contorted C code). On Fri, Mar 27, 2020 at 1:02 PM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
I was trying to start with the intended behavior of the str class, then move on to generalizing to other classes, because I think completing a single example and *then* generalizing is an instructional style that's easier to digest, whereas intermixing all of the examples at once can get confusing (can I call str.removeprefix(object(), 17)?). Is something missing that's not already there in the following sentence in the PEP?
Although the methods on the immutable ``str`` and ``bytes`` types may make the aforementioned optimization of returning the original object, ``bytearray.removeprefix()`` and ``bytearray.removesuffix()`` should always return a copy, never the original object.
Best, Dennis
--Guido van Rossum (python.org/~guido)
I like how that would take the pressure off of the Python sample. How's something like this?

Specification
=============

The builtin ``str`` class will gain two new methods which will behave as follows when ``type(self) is str``::

    def removeprefix(self: str, prefix: str, /) -> str:
        if self.startswith(prefix):
            return self[len(prefix):]
        else:
            return self

    def removesuffix(self: str, suffix: str, /) -> str:
        if suffix and self.endswith(suffix):
            return self[:-len(suffix)]
        else:
            return self

These methods, even when called on ``str`` subclasses, should always return base ``str`` objects. One should not rely on the behavior of ``self`` being returned (as in ``s.removesuffix('') is s``) -- this optimization should be considered an implementation detail. To test whether any affixes were removed during the call, one may use the constant-time behavior of comparing the lengths of the original and new strings::

    >>> string = 'Python String Input'
    >>> new_string = string.removeprefix('Py')
    >>> modified = (len(string) != len(new_string))
    >>> modified
    True

One may also continue using ``startswith()`` and ``endswith()`` methods for control flow instead of testing the lengths as above.

Note that without the check for the truthiness of ``suffix``, ``s.removesuffix('')`` would be mishandled and always return the empty string due to the unintended evaluation of ``self[:-0]``.

Methods with the corresponding semantics will be added to the builtin ``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes`` or ``bytearray`` object, then ``b.removeprefix()`` and ``b.removesuffix()`` will accept any bytes-like object as an argument. Although the methods on the immutable ``str`` and ``bytes`` types may make the aforementioned optimization of returning the original object, ``bytearray.removeprefix()`` and ``bytearray.removesuffix()`` should *always* return a copy, never the original object.

The two methods will also be added to ``collections.UserString``, with similar behavior.
My hesitation to write "return self" is resolved by saying that it should not be relied on, so I think this is a win. Best, Dennis
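The empty-suffix pitfall mentioned in the draft text can be demonstrated directly. Both functions below are free-standing sketches written for this example; the "unchecked" one is deliberately wrong:

```python
def removesuffix_unchecked(s: str, suffix: str) -> str:
    """Deliberately buggy variant: no truthiness check on the suffix."""
    if s.endswith(suffix):
        # With suffix == "", this evaluates s[:-0], which is s[:0].
        return s[:-len(suffix)]
    return s


def removesuffix_checked(s: str, suffix: str) -> str:
    """Variant with the truthiness check, as in the draft specification."""
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


print(repr(removesuffix_unchecked("spam", "")))
print(repr(removesuffix_checked("spam", "")))
print(repr(removesuffix_checked("spam.py", ".py")))
```

The prefix version needs no such guard because ``s[len(''):]`` is ``s[0:]``, which is the whole string.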
On Fri, Mar 27, 2020 at 1:55 PM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
I like how that would take the pressure off of the Python sample. How's something like this?
Specification
=============

The builtin ``str`` class will gain two new methods which will behave as follows when ``type(self) is str``::

    def removeprefix(self: str, prefix: str, /) -> str:
        if self.startswith(prefix):
            return self[len(prefix):]
        else:
            return self

    def removesuffix(self: str, suffix: str, /) -> str:
        if suffix and self.endswith(suffix):
            return self[:-len(suffix)]
        else:
            return self
These methods, even when called on ``str`` subclasses, should always return base ``str`` objects. One should not rely on the behavior of ``self`` being returned (as in ``s.removesuffix('') is s``) -- this optimization should be considered an implementation detail.
I'd suggest to drop the last sentence ("One should ... detail.") and instead write 'return self[:]' in the methods.
To test whether any affixes were removed during the call, one may use the constant-time behavior of comparing the lengths of the original and new strings::

    >>> string = 'Python String Input'
    >>> new_string = string.removeprefix('Py')
    >>> modified = (len(string) != len(new_string))
    >>> modified
    True
If I saw that in a code review I'd flag it for non-obviousness. One should use 'string != new_string' *unless* there is severe pressure to squeeze every nanosecond out of this particular code (and it better be inside an inner loop).
One may also continue using ``startswith()`` and ``endswith()`` methods for control flow instead of testing the lengths as above.
That's worse, in a sense, since "foofoobar".removeprefix("foo") returns "foobar" which still starts with "foo".
Note that without the check for the truthiness of ``suffix``, ``s.removesuffix('')`` would be mishandled and always return the empty string due to the unintended evaluation of ``self[:-0]``.
That's a good one (I started suggesting dropping that when I read this :-) but maybe it ought to go in a comment (and shorter -- at most one line).
Methods with the corresponding semantics will be added to the builtin ``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes`` or ``bytearray`` object, then ``b.removeprefix()`` and ``b.removesuffix()`` will accept any bytes-like object as an argument. Although the methods on the immutable ``str`` and ``bytes`` types may make the aforementioned optimization of returning the original object, ``bytearray.removeprefix()`` and ``bytearray.removesuffix()`` should *always* return a copy, never the original object.
This could also be simplified by writing 'return self[:]'.
The two methods will also be added to ``collections.UserString``, with similar behavior.
My hesitation to write "return self" is resolved by saying that it should not be relied on, so I think this is a win.
Writing 'return self[:]' seems to say the same thing in fewer words though. :-) -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
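The two review points in this reply can be sketched in a few lines. The helpers `remove_prefix` and `was_modified` are hypothetical names used only for illustration; `remove_prefix` mirrors the sample implementation quoted above.

```python
def remove_prefix(s: str, prefix: str) -> str:
    """Hypothetical stand-in for the proposed str.removeprefix."""
    return s[len(prefix):] if s.startswith(prefix) else s


def was_modified(original: str, prefix: str) -> bool:
    """Report whether removing `prefix` changed the string."""
    return remove_prefix(original, prefix) != original


result = remove_prefix("foofoobar", "foo")
print(result)                            # foobar
print(result.startswith("foo"))          # True, although a prefix was removed
print(was_modified("foofoobar", "foo"))  # True
print(was_modified("barfoo", "foo"))     # False
```

Comparing the strings directly answers "did anything change?", while calling `startswith()` on the result answers a different question, as the "foofoobar" case shows.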
One may also continue using ``startswith()`` and ``endswith()`` methods for control flow instead of testing the lengths as above.
That's worse, in a sense, since "foofoobar".removeprefix("foo") returns "foobar" which still starts with "foo".
I meant that startswith might be called before removeprefix, as it was in the ``deccheck.py`` example.
If I saw that in a code review I'd flag it for non-obviousness. One should use 'string != new_string' unless there is severe pressure to squeeze every nanosecond out of this particular code (and it better be inside an inner loop).
I thought that someone had suggested that such things go in the PEP, but since these are more stylistic considerations, I would be more than happy to trim it down to just

The builtin ``str`` class will gain two new methods which will behave as follows when ``type(self) is type(prefix) is str``::

    def removeprefix(self: str, prefix: str, /) -> str:
        if self.startswith(prefix):
            return self[len(prefix):]
        else:
            return self[:]

    def removesuffix(self: str, suffix: str, /) -> str:
        # suffix='' should not call self[:-0].
        if suffix and self.endswith(suffix):
            return self[:-len(suffix)]
        else:
            return self[:]

These methods, even when called on ``str`` subclasses, should always return base ``str`` objects.

Methods with the corresponding semantics will be added to the builtin ``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes`` or ``bytearray`` object, then ``b.removeprefix()`` and ``b.removesuffix()`` will accept any bytes-like object as an argument.

The two methods will also be added to ``collections.UserString``, with similar behavior.
On Fri, Mar 27, 2020 at 3:29 PM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
One may also continue using ``startswith()`` and ``endswith()`` methods for control flow instead of testing the lengths as above.
That's worse, in a sense, since "foofoobar".removeprefix("foo") returns "foobar" which still starts with "foo".
I meant that startswith might be called before removeprefix, as it was in the ``deccheck.py`` example.
Not having read the full PEP, that wasn't clear to me. Sorry!
If I saw that in a code review I'd flag it for non-obviousness. One should use 'string != new_string' unless there is severe pressure to squeeze every nanosecond out of this particular code (and it better be inside an inner loop).
I thought that someone had suggested that such things go in the PEP,
I'm sure someone did. But not every bit of feedback is worth acting upon, and sometimes a weird compromise is cooked up that addresses somebody's nit while making things less understandable for everyone else. I think this is one of those cases.
but since these are more stylistic considerations, I would be more than happy to trim it down to just
The builtin ``str`` class will gain two new methods which will behave as follows when ``type(self) is type(prefix) is str``::
    def removeprefix(self: str, prefix: str, /) -> str:
        if self.startswith(prefix):
            return self[len(prefix):]
        else:
            return self[:]

    def removesuffix(self: str, suffix: str, /) -> str:
        # suffix='' should not call self[:-0].
        if suffix and self.endswith(suffix):
            return self[:-len(suffix)]
        else:
            return self[:]
These methods, even when called on ``str`` subclasses, should always return base ``str`` objects.
Methods with the corresponding semantics will be added to the builtin ``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes`` or ``bytearray`` object, then ``b.removeprefix()`` and ``b.removesuffix()`` will accept any bytes-like object as an argument. The two methods will also be added to ``collections.UserString``, with similar behavior.
Excellent!
--Guido van Rossum (python.org/~guido)
On Sat., 28 Mar. 2020, 8:39 am Guido van Rossum, <guido@python.org> wrote:
On Fri, Mar 27, 2020 at 3:29 PM Dennis Sweeney < sweeney.dennis650@gmail.com> wrote:
If I saw that in a code review I'd flag it for non-obviousness. One should use 'string != new_string' unless there is severe pressure to squeeze every nanosecond out of this particular code (and it better be inside an inner loop).
I thought that someone had suggested that such things go in the PEP,
I'm sure someone did.
I think that may have been me in a tangent thread where folks were worried about O(N) checks on long strings. I know at least I temporarily forgot to account for string equality checks starting with a few O(1) checks to speed up common cases (IIRC: identity, length, first code point, last code point), which means explicitly calling len() is just as likely to slow things down as it is to speed them up. Cheers, Nick.
On 25/03/20 9:14 am, Dennis Sweeney wrote:
I think my confusion is about just how precise this sort of "reference implementation" should be. Should it behave with ``str`` and ``tuple`` subclasses exactly how it would when implemented?
No, I don't think so. The purpose of a Python implementation of a proposed feature is to get the intended semantics across, not to reproduce all the quirks of an imagined C implementation. If you were to bake these details into a Python reference implementation, you would be implying that these are *intended* restrictions, which (unless I misunderstand) is not what you are intending. (Back when yield-from was being designed, I described the intended semantics in prose, and gave an approximate Python equivalent, which went through several revisions as we thrashed out exactly how the feature should behave. But I don't think it ever exactly matched all the details of the actual implementation, nor was it intended to. The prose turned out to be much more readable, anyway. :-) -- Greg
On 24 Mar 2020, at 2:42, Steven D'Aprano wrote:
On Sun, Mar 22, 2020 at 10:25:28PM -0000, Dennis Sweeney wrote:
Changes:
- More complete Python implementation to match what the type checking in the C implementation would be
- Clarified that returning ``self`` is an optimization
- Added links to past discussions on Python-Ideas and Python-Dev
- Specified ability to accept a tuple of strings
I am concerned about that tuple of strings feature. [...] Aside from those questions about the reference implementation, I am concerned about the feature itself. No other string method that returns a modified copy of the string takes a tuple of alternatives.
* startswith and endswith do take a tuple of (pre/suff)ixes, but they don't return a modified copy; they just return a True or False flag;
* replace does return a modified copy, and only takes a single substring at a time;
* find/index/partition/split etc don't accept multiple substrings to search for.
That makes startswith/endswith the unusual ones, and we should be conservative before emulating them.
Actually I would like for other string methods to gain the ability to search for/chop off multiple substrings too. A `find()` that supports multiple search strings (and returns the leftmost position where a search string can be found) is a great help in implementing some kind of tokenizer:

```python
def tokenize(source, delimiter):
    lastpos = 0
    while True:
        # Hypothetical: assumes str.find() accepts a tuple of search strings.
        pos = source.find(delimiter, lastpos)
        if pos == -1:
            token = source[lastpos:].strip()
            if token:
                yield token
            break
        else:
            token = source[lastpos:pos].strip()
            if token:
                yield token
            yield source[pos]
            lastpos = pos + 1

print(list(tokenize(" [ 1, 2, 3] ", ("[", ",", "]"))))
```

This would output `['[', '1', ',', '2', ',', '3', ']']` if `str.find()` supported multiple substrings. Of course to be really usable `find()` would have to return **which** substring was found, which would make the API more complicated (and somewhat incompatible with the existing `find()`). But for `cutprefix()` (or whatever it's going to be called), I'm +1 on supporting multiple prefixes. For ambiguous cases, IMHO the most straightforward option would be to chop off the first prefix found.
[...]
Servus, Walter
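Since `str.find()` accepts only a single substring today, the tokenizer above can be tried out with a small helper that emulates the multi-string search. This is a sketch under stated assumptions: `find_any` is a hypothetical name, it returns both the position and the matched string, and delimiters are assumed to be non-empty.

```python
def find_any(source, needles, start=0):
    """Return (position, needle) of the leftmost match, or (-1, None).

    Hypothetical helper; on a tie the needle listed first wins.
    """
    best_pos, best_needle = -1, None
    for needle in needles:
        pos = source.find(needle, start)
        if pos != -1 and (best_pos == -1 or pos < best_pos):
            best_pos, best_needle = pos, needle
    return best_pos, best_needle


def tokenize(source, delimiters):
    """Yield stripped tokens and delimiters; delimiters must be non-empty."""
    lastpos = 0
    while True:
        pos, delimiter = find_any(source, delimiters, lastpos)
        if pos == -1:
            token = source[lastpos:].strip()
            if token:
                yield token
            break
        token = source[lastpos:pos].strip()
        if token:
            yield token
        yield delimiter
        lastpos = pos + len(delimiter)


print(list(tokenize(" [ 1, 2, 3] ", ("[", ",", "]"))))
# ['[', '1', ',', '2', ',', '3', ']']
```

Returning the matched string alongside the position is what lets the tokenizer advance by `len(delimiter)`, which is the API complication the message points out.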
On Tue, Mar 24, 2020 at 04:53:55PM +0100, Walter Dörwald wrote:
But for `cutprefix()` (or whatever it's going to be called). I'm +1 on supporting multiple prefixes. For ambiguous cases, IMHO the most straight forward option would be to chop off the first prefix found.
The Zen of Python has something to say about guessing in the face of ambiguity. -- Steven
Walter Dörwald writes:
A `find()` that supports multiple search strings (and returns the leftmost position where a search string can be found) is a great help in implementing some kind of tokenizer:
In other words, you want the equivalent of Emacs's "(search-forward (regexp-opt list-of-strings))", which also meets the requirement of returning which string was found (as "(match-string 0)"). Since Python already has a functionally similar API for regexps, we can add a regexp-opt (with appropriate name) method to re, perhaps as .compile_string_list(), and provide a convenience function re.search_string_list() for your application. I'm applying practicality before purity, of course. To some extent we want to encourage simple string approaches, and putting this in regex is not optimal for that. Steve
On 25 Mar 2020, at 9:48, Stephen J. Turnbull wrote:
Walter Dörwald writes:
A `find()` that supports multiple search strings (and returns the leftmost position where a search string can be found) is a great help in implementing some kind of tokenizer:
In other words, you want the equivalent of Emacs's "(search-forward (regexp-opt list-of-strings))", which also meets the requirement of returning which string was found (as "(match-string 0)").
Sounds like it. I'm not familiar with Emacs.
Since Python already has a functionally similar API for regexps, we can add a regexp-opt (with appropriate name) method to re, perhaps as .compile_string_list(), and provide a convenience function re.search_string_list() for your application.
If you're using regexps anyway, building the appropriate or-expression shouldn't be a problem. I guess that's what most lexers/tokenizers do anyway.
I'm applying practicality before purity, of course. To some extent we want to encourage simple string approaches, and putting this in regex is not optimal for that.
Exactly. I'm always a bit hesitant when using regexps, if there's a simpler string approach.
Steve
Servus, Walter
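The "or-expression" approach discussed in this exchange can be sketched with the existing `re` module. The function name `compile_string_list` is borrowed from the suggestion above and is hypothetical; no such function exists in `re`. Sorting longer strings first is an assumption made here so that a longer literal is preferred over one of its own prefixes.

```python
import re


def compile_string_list(strings):
    """Hypothetical helper: compile a regex matching any of the literal strings."""
    # Try longer strings first so that 'Tests' wins over 'Test'.
    ordered = sorted(strings, key=len, reverse=True)
    return re.compile("|".join(re.escape(s) for s in ordered))


pattern = compile_string_list(["[", ",", "]"])
match = pattern.search(" [ 1, 2, 3] ")
print(match.start(), repr(match.group(0)))  # 1 '['

# The match object reports which string was found.
print(compile_string_list(["Test", "Tests"]).match("TestsFoo").group(0))  # Tests
```

`re.escape()` keeps characters such as `[` from being read as regex syntax, which is the main trap when building the alternation by hand.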
Hi Dennis,

Thanks for the updated PEP, it looks way better! I love the ability to pass a tuple of strings ;-)

--

The behavior of a tuple containing an empty string is a little bit surprising.

cutsuffix("Hello World", ("", " World")) returns "Hello World", whereas cutsuffix("Hello World", (" World", "")) returns "Hello".

cutprefix() has the same behavior: the first empty string stops the loop and returns the string unchanged.

I would prefer to raise ValueError("empty separator") to avoid any risk of confusion. I'm not sure that str.cutprefix("") or str.cutsuffix("") makes any sense.

"abc".startswith("") and "abc".startswith(("", "a")) are true, but that's fine since startswith() doesn't modify the string. Moreover, we cannot change the behavior now :-) But for new methods, we can try to design them correctly to avoid any risk of confusion.

--

It reminds me of https://bugs.python.org/issue28029: "".replace("", s, n) now returns s instead of an empty string for all non-zero n. The behavior changes in Python 3.9.

There are also discussions about "abc".split("") and re.compile("").split("abc"). str.split() raises ValueError("empty separator") whereas re.split returns ['', 'a', 'b', 'c', ''], which can be (IMO) surprising.

See also https://bugs.python.org/issue28937 "str.split(): allow removing empty strings (when sep is not None)".

Note: on the other hand, str.strip("") is accepted and returns the string unmodified. But this method doesn't accept a tuple of substrings. It's different from cutprefix/cutsuffix.

Victor
-- Night gathers, and now my watch begins. It shall not end until my death.
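The order dependence described in this message can be reproduced with a short sketch. The `cutsuffix` function below is a hypothetical first-match loop written to be consistent with the behavior reported above; it is an assumption, not the draft's actual code. The `str.split("")` comparison uses only existing behavior.

```python
def cutsuffix(s, suffixes):
    """Hypothetical tuple-accepting variant with first-match semantics."""
    for suffix in suffixes:
        if s.endswith(suffix):
            # An empty suffix always matches, so it ends the loop unchanged.
            return s[:-len(suffix)] if suffix else s
    return s


print(repr(cutsuffix("Hello World", ("", " World"))))  # 'Hello World'
print(repr(cutsuffix("Hello World", (" World", ""))))  # 'Hello'

# For comparison, str.split() already rejects an empty separator.
try:
    "abc".split("")
except ValueError as exc:
    print(exc)  # empty separator
```

Because every string ends with `""`, an empty entry shadows everything listed after it, which is the surprise being reported.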
Hello, On Tue, 24 Mar 2020 19:14:16 +0100 Victor Stinner <vstinner@python.org> wrote: []
The behavior of tuple containing an empty string is a little bit surprising.
cutsuffix("Hello World", ("", " World")) returns "Hello World", whereas cutsuffix("Hello World", (" World", "")) returns "Hello".
cutprefix() has the same behavior: the first empty string stops the loop and returns the string unchanged.
I would prefer to raise ValueError("empty separator") to avoid any risk of confusion. I'm not sure that str.cutprefix("") or str.cutsuffix("") does make any sense.
str.cutprefix("")/str.cutsuffix("") definitely makes sense, e.g.:

=== config.something ===
# If you'd like to remove some prefix from your lines, set it here
REMOVE_PREFIX = ""
======

=== src.py ===
...
line = line.cutprefix(config.REMOVE_PREFIX)
...
======

Now one may ask whether str.cutprefix(("", "nonempty")) makes sense. A response can be "the more complex functionality, the more complex and confusing corner cases there're to handle".

[]

-- Best regards, Paul mailto:pmiscml@gmail.com
On Tue, 24 Mar 2020 at 20:06, Paul Sokolovsky <pmiscml@gmail.com> wrote:
=== config.something ===
# If you'd like to remove some prefix from your lines, set it here
REMOVE_PREFIX = ""
======

=== src.py ===
...
line = line.cutprefix(config.REMOVE_PREFIX)
...
======
Just use:

if config.REMOVE_PREFIX:
    line = line.cutprefix(config.REMOVE_PREFIX)

Victor
-- Night gathers, and now my watch begins. It shall not end until my death.
Hello, On Tue, 24 Mar 2020 22:51:55 +0100 Victor Stinner <vstinner@python.org> wrote:
=== config.something ===
# If you'd like to remove some prefix from your lines, set it here
REMOVE_PREFIX = ""
======

=== src.py ===
...
line = line.cutprefix(config.REMOVE_PREFIX)
...
======
Just use:
if config.REMOVE_PREFIX:
    line = line.cutprefix(config.REMOVE_PREFIX)

Or even just:

if line.startswith(config.REMOVE_PREFIX):
    line = line[len(config.REMOVE_PREFIX):]

But the point taken - indeed, any confusing, inconsistent behavior can be fixed on users' side with more if's, once they discover it.

-- Best regards, Paul mailto:pmiscml@gmail.com
On Tue, Mar 24, 2020 at 07:14:16PM +0100, Victor Stinner wrote:
I would prefer to raise ValueError("empty separator") to avoid any risk of confusion. I'm not sure that str.cutprefix("") or str.cutsuffix("") does make any sense.
They make as much sense as any other null-operation, such as subtracting 0 or deleting empty slices from lists.

Every string s is unchanged if you prepend or concatenate the empty string:

    assert s == ''+s == s+''

so removing the empty string should obey the same invariant:

    assert s == s.removeprefix('') == s.removesuffix('')

-- Steven
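The invariant above can be checked mechanically. This sketch uses hypothetical free functions `removeprefix` and `removesuffix` that follow the sample implementation from the specification, so it runs without the proposed methods. The round-trip assertions at the end are an extra property added here for illustration, not something stated in the thread.

```python
def removeprefix(s, prefix):
    """Hypothetical free-function form of the proposed method."""
    return s[len(prefix):] if s.startswith(prefix) else s


def removesuffix(s, suffix):
    """Hypothetical free-function form of the proposed method."""
    return s[:-len(suffix)] if suffix and s.endswith(suffix) else s


for s in ["", "a", "spam", "foofoobar"]:
    # Adding the empty string changes nothing...
    assert s == "" + s == s + ""
    # ...so removing it should change nothing either.
    assert s == removeprefix(s, "") == removesuffix(s, "")
    # Extra property: removing what was just added is a round trip.
    assert removeprefix("pre" + s, "pre") == s
    assert removesuffix(s + "suf", "suf") == s

print("all invariants hold")
```

Raising `ValueError` for an empty affix would break the second assertion, which is the substance of the objection.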
It seems that there is a consensus on the names ``removeprefix`` and ``removesuffix``. I will update the PEP accordingly. I'll also simplify sample Python implementation to primarily reflect *intent* over strict type-checking correctness, and I'll adjust the accompanying commentary accordingly. Lastly, since the issue of multiple prefixes/suffixes is more controversial and seems that it would not affect how the single-affix cases would work, I can remove that from this PEP and allow someone else with a stronger opinion about it to propose and defend a set of semantics in a different PEP. Is there any objection to deferring this to a different PEP? All the best, Dennis
On 3/24/2020 7:21 PM, Dennis Sweeney wrote:
It seems that there is a consensus on the names ``removeprefix`` and ``removesuffix``. I will update the PEP accordingly. I'll also simplify sample Python implementation to primarily reflect *intent* over strict type-checking correctness, and I'll adjust the accompanying commentary accordingly.
Lastly, since the issue of multiple prefixes/suffixes is more controversial and seems that it would not affect how the single-affix cases would work, I can remove that from this PEP and allow someone else with a stronger opinion about it to propose and defend a set of semantics in a different PEP. Is there any objection to deferring this to a different PEP?
No objection. I think that's a good idea. Eric
On Wed, 25 Mar 2020 at 00:29, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
Lastly, since the issue of multiple prefixes/suffixes is more controversial and seems that it would not affect how the single-affix cases would work, I can remove that from this PEP and allow someone else with a stronger opinion about it to propose and defend a set of semantics in a different PEP. Is there any objection to deferring this to a different PEP?
name.cutsuffix(('Mixin', 'Tests', 'Test')) is used in the "Motivating examples from the Python standard library" section. It looks like a nice usage of this feature. You added "There were many other such examples in the stdlib."

What do you mean by controversial? I proposed to raise an error if the prefix/suffix is empty to make cutsuffix(("", "suffix")) less surprising. But I'm also fine if you keep this behavior, since startswith/endswith accepts an empty string, and someone wrote that accepting an empty prefix/suffix is a useful feature.

Or did someone write that cutprefix/cutsuffix must not accept a tuple of strings? (I'm not sure that I was able to read carefully all emails.)

I like the ability to pass multiple prefixes and suffixes because it makes the method similar to lstrip(), rstrip(), strip(), startswith(), endswith(), which all accept multiple "values" (characters to remove, prefixes, suffixes).

Victor
-- Night gathers, and now my watch begins. It shall not end until my death.
There were at least two comments suggesting keeping it to one affix at a time: https://mail.python.org/archives/list/python-dev@python.org/message/GPXSIDLK... https://mail.python.org/archives/list/python-dev@python.org/message/EDWFPEGQ... But I didn't see any big objections to the rest of the PEP, so I think maybe we keep it restricted for now.
Thanks for the pointers to emails.

Ethan Furman: "This is why replace() still only takes a single substring to match and this isn't supported: (...)"

Hum ok, it makes sense.

I agree that we can start with only accepting str (reject tuple), and maybe reconsider the idea of accepting a tuple of str later.

Please move the idea to Rejected Ideas, but also try to summarize the reasons why the idea was rejected. I saw:

* surprising result for empty prefix/suffix
* surprising result for "FooBar text".cutprefix(("Foo", "FooBar"))
* issue with unordered sequence like set: only accept tuple which is ordered
* str.replace() only accepts str.replace(str, str) to avoid these issues: the idea of accepting str.replace(tuple of str, str) or variant was rejected multiple times.

XXX does someone have references to past discussions? I found https://bugs.python.org/issue33647 which is a little bit different.

You may mention re.sub() as an existing efficient solution for the complex cases.

I have to confess that I had to think twice when I wrote my example line.cutsuffix(("\r\n", "\r", "\n")). Did I write suffixes in the correct order to get what I expect? :-) "\r\n" starts with "\r".

Victor

On Wed, 25 Mar 2020 at 01:44, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
There were at least two comments suggesting keeping it to one affix at a time:
https://mail.python.org/archives/list/python-dev@python.org/message/GPXSIDLK...
https://mail.python.org/archives/list/python-dev@python.org/message/EDWFPEGQ...
But I didn't see any big objections to the rest of the PEP, so I think maybe we keep it restricted for now. _______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QBCB2QMU... Code of Conduct: http://python.org/psf/codeofconduct/
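The line-ending example from this message shows how order-sensitive a first-match tuple would be, and how `re.sub()` sidesteps the question. The function `cutsuffix_first_match` is a hypothetical illustration of first-match semantics, not code from the PEP.

```python
import re


def cutsuffix_first_match(s, suffixes):
    """Hypothetical tuple variant: remove the first listed suffix that matches."""
    for suffix in suffixes:
        if s.endswith(suffix):
            return s[:-len(suffix)] if suffix else s
    return s


line = "text\r\n"
print(repr(cutsuffix_first_match(line, ("\r\n", "\r", "\n"))))  # 'text'
print(repr(cutsuffix_first_match(line, ("\n", "\r", "\r\n"))))  # 'text\r'

# Anchoring the alternation at the end of the string makes the order of
# the alternatives irrelevant: only an alternative that reaches \Z can match.
print(repr(re.sub(r"(?:\n|\r|\r\n)\Z", "", line)))  # 'text'
```

With the regex, the `\Z` anchor forces backtracking until the whole line ending is consumed, so a badly ordered alternation still removes `"\r\n"` in one piece.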
On Wed, 25 Mar 2020 at 00:42, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
There were at least two comments suggesting keeping it to one affix at a time:
https://mail.python.org/archives/list/python-dev@python.org/message/GPXSIDLK...
https://mail.python.org/archives/list/python-dev@python.org/message/EDWFPEGQ...
But I didn't see any big objections to the rest of the PEP, so I think maybe we keep it restricted for now.
That sounds like a good idea. The issue for me is how the function should behave with a list of affixes if one is a prefix of another, e.g.,removeprefix(('Test', 'Tests')). The empty string case is just one form of that. The behaviour should be defined clearly, and while I imagine "always remove the longest" is the "obvious" sensible choice, I am fairly certain there will be other opinions :-) So deferring the decision for now until we have more experience with the single-affix form seems perfectly reasonable. I'm not even sure that switching to multiple affixes later would need a PEP - it might be fine to add via a simple feature request issue. But that can be a decision for later, too. Paul
On 25Mar2020 08:14, Paul Moore <p.f.moore@gmail.com> wrote:
[...] The issue for me is how the function should behave with a list of affixes if one is a prefix of another, e.g.,removeprefix(('Test', 'Tests')). The empty string case is just one form of that. The behaviour should be defined clearly, and while I imagine "always remove the longest" is the "obvious" sensible choice, I am fairly certain there will be other opinions :-) So deferring the decision for now until we have more experience with the single-affix form seems perfectly reasonable.
I'd like to preface this with "I'm fine to implement multiple affixes later, if at all". That said: To me "first match" is the _only_ sensible choice. "longest match" can always be implemented with a "first match" function by sorting on length if desired. Also, "longest first" requires the implementation to do a prescan of the supplied affixes whereas "first match" lets the implementation just iterate over the choices as supplied. I'm beginning to think I must again threaten my partner's anecdote about Netscape Proxy's rule system, which prioritised rules by the lexical length of their regexp, not their config file order of appearance. That way lies (and, indeed, lay) madness. Cheers, Cameron Simpson <cs@cskk.id.au>
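The claim that "longest match" can be layered on top of "first match" can be sketched as follows. Both function names are hypothetical, since the PEP as discussed here accepts only a single affix.

```python
def removeprefix_first(s, prefixes):
    """Hypothetical multi-prefix variant: remove the first listed prefix that matches."""
    for prefix in prefixes:
        if s.startswith(prefix):
            return s[len(prefix):]
    return s


def removeprefix_longest(s, prefixes):
    """Longest match, built on first match by sorting on length."""
    return removeprefix_first(s, sorted(prefixes, key=len, reverse=True))


prefixes = ("Test", "Tests")
print(removeprefix_first("TestsFoo", prefixes))    # sFoo
print(removeprefix_longest("TestsFoo", prefixes))  # Foo
```

The reverse construction is not possible: a function that always picks the longest match gives the caller no way to express a preferred order.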
PEP 616 -- String methods to remove prefixes and suffixes is available here: https://www.python.org/dev/peps/pep-0616/

Changes:

- Only accept single affixes, not tuples
- Make the specification more concise
- Make fewer stylistic prescriptions for usage
- Fix typos

A reference implementation GitHub PR is up to date here: https://github.com/python/cpython/pull/18939

Are there any more comments for it before submission?
What do you think of adding a Version History section which lists the most important changes since you proposed the first version of the PEP? I recall:

* Version 3: don't accept tuple
* Version 2: Rename cutprefix/cutsuffix to removeprefix/removesuffix, accept tuple
* Version 1: initial version

For example, for my PEP 587, I wrote detailed changes, but I don't think that you should go into the details ;-) https://www.python.org/dev/peps/pep-0587/#version-history

Victor

On Sat, 28 Mar 2020 at 06:11, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
PEP 616 -- String methods to remove prefixes and suffixes is available here: https://www.python.org/dev/peps/pep-0616/
Changes:
- Only accept single affixes, not tuples
- Make the specification more concise
- Make fewer stylistic prescriptions for usage
- Fix typos
A reference implementation GitHub PR is up to date here: https://github.com/python/cpython/pull/18939
Are there any more comments for it before submission?
On 28/03/2020 17:02, Victor Stinner wrote:
What do you think of adding a Version History section which lists the most important changes since you proposed the first version of the PEP? I recall:
* Version 3: don't accept tuple
* Version 2: Rename cutprefix/cutsuffix to removeprefix/removesuffix, accept tuple
* Version 1: initial version
For example, for my PEP 587, I wrote detailed changes, but I don't think that you should go into the details ;-) https://www.python.org/dev/peps/pep-0587/#version-history
Victor
IMHO that's overkill. A list of rejected ideas, and why they were rejected, seems sufficient. Rob Cliffe
My intent is to help people like me to follow the discussion on the PEP. There are more than 100 messages, so it's hard to follow PEP updates.

Victor

On Sun, 29 Mar 2020 at 14:55, Rob Cliffe via Python-Dev <python-dev@python.org> wrote:
On 28/03/2020 17:02, Victor Stinner wrote:
What do you think of adding a Version History section which lists the most important changes since you proposed the first version of the PEP? I recall:
* Version 3: don't accept tuple
* Version 2: Rename cutprefix/cutsuffix to removeprefix/removesuffix, accept tuple
* Version 1: initial version
For example, for my PEP 587, I wrote detailed changes, but I don't think that you should go into the details ;-) https://www.python.org/dev/peps/pep-0587/#version-history
Victor
IMHO that's overkill. A list of rejected ideas, and why they were rejected, seems sufficient. Rob Cliffe
Hello all, It seems that most of the discussion has settled down, but I didn't quite understand from reading PEP 1 what the next step should be -- is this an appropriate time to open an issue on the Steering Council GitHub repository requesting pronouncement on PEP 616? Best, Dennis
I suggest you wait one more week to let other people comment on the PEP. After this delay, if you consider that the PEP is ready for pronouncement, you can submit it to the Steering Council, right.

Victor

On Wed, 1 Apr 2020 at 21:56, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
Hello all,
It seems that most of the discussion has settled down, but I didn't quite understand from reading PEP 1 what the next step should be -- is this an appropriate time to open an issue on the Steering Council GitHub repository requesting pronouncement on PEP 616?
Best, Dennis
On Thu., 2 Apr. 2020, 8:30 am Victor Stinner, <vstinner@python.org> wrote:
I suggest you wait one more week to let other people comment on the PEP. After this delay, if you consider that the PEP is ready for pronouncement, you can submit it to the Steering Council, right.
Note that the submission to the Steering Council doesn't have to be a request for immediate pronouncement - it's a notification that the PEP is mature enough for the Council to decide whether to appoint a Council member as BDFL-Delegate or to appoint someone else. The decision on whether to wait for more questions is then up to the Council and/or the appointed BDFL-Delegate. PEP 616 definitely looks mature enough for that step to me (and potentially even immediately accepted - it did get dissected pretty thoroughly, after all!) Cheers, Nick.
Participants (33)
- Barney Gale
- Brett Cannon
- Cameron Simpson
- Chris Angelico
- Dan Stromberg
- Dennis Sweeney
- Eric Fahlgren
- Eric V. Smith
- Ethan Furman
- Greg Ewing
- Gregory P. Smith
- Guido van Rossum
- Ivan Pozdeev
- Kyle Stanley
- Mike Miller
- MRAB
- musbur@posteo.org
- Nathaniel Smith
- Ned Batchelder
- Nick Coghlan
- Paul Ganssle
- Paul Moore
- Paul Sokolovsky
- Rhodri James
- Rob Cliffe
- Sebastian Rittau
- senthil@uthcode.com
- Stephen J. Turnbull
- Steve Dower
- Steve Holden
- Steven D'Aprano
- Victor Stinner
- Walter Dörwald