[Python-ideas] New explicit methods to trim strings

Stephen J. Turnbull turnbull.stephen.fw at u.tsukuba.ac.jp
Sat Mar 30 14:05:59 EDT 2019


Steven D'Aprano writes:

 > The correct solution is a verbose statement:
 > 
 >     if string.startswith("spam"):
 >         string = string[:len("spam")]

This is harder to write than I thought!  (The slice should be
'len("spam"):'.)  But s/The/A/:

    string = re.sub("^spam", "", string)

And a slightly incorrect solution (unless you really do want to remove
all spam, which most people do, though that might not carry over to
"tooth"):

    string = string.replace("spam", "")
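
To make the difference concrete, here is a minimal sketch comparing the
three statements on a few made-up inputs.  The function names are
hypothetical, chosen only for this illustration, and re.escape is an
addition of mine so that the regex version also works for prefixes
containing regex metacharacters.

```python
import re

def cut_by_slice(string, prefix="spam"):
    # Corrected slice: keep everything *after* the prefix.
    if string.startswith(prefix):
        string = string[len(prefix):]
    return string

def cut_by_regex(string, prefix="spam"):
    # "^" without re.MULTILINE matches only at the start of the string.
    return re.sub("^" + re.escape(prefix), "", string)

def cut_by_replace(string, prefix="spam"):
    # Removes every occurrence, not just a leading one.
    return string.replace(prefix, "")

for s in ("spam and eggs", "eggs and spam", "spamspam"):
    print(repr(s), repr(cut_by_slice(s)), repr(cut_by_regex(s)),
          repr(cut_by_replace(s)))
```

The first two agree on all three inputs; the replace version differs
on "eggs and spam" and on "spamspam".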

 > A pair of "cut" methods (cut prefix, cut suffix) fills a real need,

But do they, really?  Do we really need multiple new methods to
replace a dual-use one-liner, which also handles

    outfile = re.sub("\\.bmp$", ".jpg", infile)

in one line?  I concede that the same argument was made against
startswith/endswith, and they cleared the bar.  Python is a lot more
complex now, though, and I think the predicates are more frequently
useful.
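
For what it's worth, the dual-use one-liner generalizes to arbitrary
suffixes roughly as follows.  This is only a sketch: swap_suffix and
swap_suffix_plain are hypothetical names, and the use of re.escape,
"\Z", and a function as the replacement are my assumptions about what
a careful version would want, not something proposed in this thread.

```python
import re

def swap_suffix(name, old, new):
    # re.escape keeps "." in old from matching any character;
    # "\Z" anchors at the very end of the string.
    # A function replacement inserts `new` literally, so backslashes
    # in it are not treated as escapes.
    return re.sub(re.escape(old) + r"\Z", lambda m: new, name)

def swap_suffix_plain(name, old, new):
    # The same idea spelled out with endswith and slicing.
    # The "old and" guard avoids name[:-0], which would be empty.
    if old and name.endswith(old):
        return name[:-len(old)] + new
    return name

print(swap_suffix("photo.bmp", ".bmp", ".jpg"))
print(swap_suffix("photo.bmp", ".bmp", ""))
```

Passing an empty replacement gives the "cut suffix" behavior, which is
the sense in which the one-liner is dual-use.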

 > and will avoid a lot of mistaken bug reports/questions.

That depends on analogies to other languages.  Coming from Emacs, I'm
not at all surprised that .strip takes a character class as an
argument and strips until it runs into a character not in the class.
Evidently others have different intuition.  If that's from English,
and they know about cutprefix/cutsuffix, yeah, they won't make the
mistake.  If it's from another programming language they know, or they
don't know about cutprefix, they may just write "string.strip('.jpg')"
without thinking about it and it (sometimes) works, then they report a
bug when it doesn't.  Remember, these folks are not understanding the
docs, and very likely not reading them.
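
A short illustration of the "(sometimes) works" part, using file names
I made up for the purpose.  The argument to .strip is treated as a set
of characters, here ".", "j", "p" and "g", and characters from that
set are removed from both ends:

```python
names = ["image.jpg", "photo.jpg", "mapping.jpg"]

# Map each name to what .strip(".jpg") actually returns.
results = {name: name.strip(".jpg") for name in names}

for name, stripped in results.items():
    print(name, "->", stripped)
# image.jpg -> image
# photo.jpg -> hoto
# mapping.jpg -> mappin
```

Only the first result is what the author of such code presumably
intended.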

 > As for the disruption,

The word is "complexity".  Where do you get "disruption" from?

 > code is a cost, but there is also the uncounted opportunity cost of 
 > *not* adding this useful battery.

Obviously some people think it's useful.  Nobody denies that.  The
problem is *measuring* the opportunity cost of not having the battery,
or the "usefulness" of the battery, as well as measuring the cost of
complexity.  Please stop caricaturing those who oppose the change as
Luddites.

 > I can only think of one scenario where this change might 
 > break someone's code:

Again, who claimed it would break code?

 > The functionality is clear, meets a real need, is backwards compatible, 
 > and has no significant downsides. The only hard part is bikeshedding 
 > names for the methods:
 > 
 >     lcut rcut
 >     cutprefix cutsuffix
 >     ltrim rtrim
 >     prestrip poststrip
 >     etc.
 > 
 > Am I wrong about any of these statements?

It's not obvious to me from the names that the startswith/endswith
test is included in the method, although on reflection it would be
weird if it wasn't.  Still, I wouldn't be surprised to see

    if string.startswith("spam"):
        string.cutprefix("spam")

in a new user's code.
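
For reference, a minimal sketch of the semantics I assume are being
proposed, written as plain functions because no such str methods
exist.  The names come from the list quoted above; the handling of an
empty prefix or suffix is my assumption.

```python
def cutprefix(string, prefix):
    # The startswith test is part of the operation itself.
    if prefix and string.startswith(prefix):
        return string[len(prefix):]
    return string

def cutsuffix(string, suffix):
    # The "suffix and" guard avoids string[:-0], which would be empty.
    if suffix and string.endswith(suffix):
        return string[:-len(suffix)]
    return string

string = "spam and eggs"
string = cutprefix(string, "spam")   # no separate startswith needed
print(repr(string))
```

Under these semantics the explicit test in the new user's snippet is
redundant, and since strings are immutable, a bare call without
assignment would throw its result away.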

You're wrong about "no significant downsides," in the sense that
that's the wrong criterion.  The right criterion is "if we add a slew
of features that clear the same bar, does the total added benefit from
that set exceed the cost?"  The answer to that question is not a
trivial extrapolation from the question you did ask, because the
benefits will increase approximately linearly in the number of such
features, but the cost of additional complexity is generally
superlinear.

I also disagree that they meet a real need, as explained above.
They're merely convenient.

And the bikeshedding isn't hard.  In the list above, cutprefix/
cutsuffix are far and away the best.


