[Python-ideas] New explicit methods to trim strings

Fri Mar 29 21:37:36 EDT 2019

On Fri, Mar 29, 2019 at 04:05:55PM -0700, Christopher Barker wrote:

> This proposal would provide a minor gain for an even more minor disruption.

I don't think that is correct. I think you are underestimating the gain 
and exaggerating the disruption :-)

Cutting a prefix or suffix from a string is a common task, and there is 
no obvious "battery" in the std lib available for it. And there is a 
long history of people mistaking strip() and friends as that battery. 
The problem is that it seems to work:

py> "something.zip".rstrip(".zip")
'something'

until it doesn't:

py> "something.jpg".rstrip(".jpg")
'somethin'
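
For anyone who hasn't been bitten yet: the argument to rstrip() is treated as a set of characters to remove, not as a suffix. A rough pure-Python sketch of that behaviour (the name rstrip_chars is made up for illustration, and it only covers the case of a non-empty argument):

```python
def rstrip_chars(string, chars):
    """Rough equivalent of string.rstrip(chars) for a non-empty chars."""
    end = len(string)
    # Keep dropping trailing characters for as long as each one appears
    # anywhere in chars; the order of the characters in chars is irrelevant.
    while end > 0 and string[end - 1] in chars:
        end -= 1
    return string[:end]


print(rstrip_chars("something.zip", ".zip"))
print(rstrip_chars("something.jpg", ".jpg"))
```

The trailing "g" of "something" is in the set ".jpg", so it goes too.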

It is *very common* for people to trip over this and think they have 
found a bug:

https://duckduckgo.com/?q=python+bug+in+strip

I would guesstimate that for every person who thinks that they found a 
bug, there are probably a hundred who trip over this and then realise 
their error without ever going public. I believe this is a real pain 
point for people doing string processing. I know it has bitten me once 
or twice.

The correct solution is a verbose statement:

    if string.startswith("spam"):
        string = string[len("spam"):]

which repeats itself (*two* references to the prefix being removed, 
*three* references to the string being cut). The expression form is 
no better:

    process(a, b, string[len("spam"):] if string.startswith("spam") else string, c)

and heaven help you if you need to cut from both ends. To make that 
practical, you really need a helper function. Now that's fine as far as 
it goes, but why do we make people re-invent the wheel over and over 
again?
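
For the record, the wheel in question looks something like this. It is only a sketch: the names cut_prefix and cut_suffix are placeholders rather than a naming proposal, and treating an empty affix as "nothing to cut" is my assumption about the sensible behaviour.

```python
def cut_prefix(string, prefix):
    """Return string without the leading prefix, if it is present."""
    if prefix and string.startswith(prefix):
        return string[len(prefix):]
    return string


def cut_suffix(string, suffix):
    """Return string without the trailing suffix, if it is present."""
    # The "suffix and" guard matters: with an empty suffix,
    # string[:-0] is string[:0], which would wrongly return ''.
    if suffix and string.endswith(suffix):
        return string[:-len(suffix)]
    return string


# Cutting from both ends is then just a matter of nesting the calls.
print(cut_suffix(cut_prefix("spam-something.jpg", "spam-"), ".jpg"))
```

Even in a helper this small, the empty-suffix case is easy to get wrong.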

A pair of "cut" methods (cut prefix, cut suffix) fills a real need, and 
will avoid a lot of mistaken bug reports/questions.

As for the disruption, I don't see that this will cause *any* disruption 
at all, beyond bike-shedding the method names and doing an initial 
implementation. It is a completely backwards compatible change. Since we 
can't monkey-patch builtins, this isn't going to break anyone's use of 
str. Any subclasses of str which define the same methods will still 
work.

I've sometimes said in the past that any change will break *someone's* 
code, and so we should be risk-averse. I still stand by that, but we 
shouldn't be *so risk-averse* that we're paralysed. Breaking users' 
code is a cost, but there is also the uncounted opportunity cost of 
*not* adding this useful battery.

If we don't add these new methods, how many hundreds of users over the 
next decade will we condemn to repeating the same old misuse of strip() 
that has tripped up so many people in the past? How much developer time 
will be wasted writing, and then closing, bug reports like this?

https://bugs.python.org/issue5318

Inaction has costs too. 

I can only think of one scenario where this change might 
break someone's code:

- we decide on method names (let's say) lcut and rcut;
- somebody else already has a class with lcut and rcut;
- which does something completely different;
- and they use hasattr() to decide whether to call those methods, 
  rather than isinstance:

    if hasattr(myobj, 'lcut'):
        print(myobj.lcut(1, 2, 3, 4))
    else:
        # do something else

- and they sometimes pass strings into this code.

In 3.7 and older, ordinary strings will take the second path. If we add 
these methods, they will take the first path.
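
Here is a sketch of that scenario. Everything in it is hypothetical: Cutter stands in for the third-party class, and FutureStr is a str subclass that simulates a future str which has grown an lcut(prefix) method.

```python
class Cutter:
    """Hypothetical third-party class with its own, unrelated lcut()."""
    def lcut(self, *args):
        return sum(args)


class FutureStr(str):
    """Stand-in for a future str that has gained lcut(prefix)."""
    def lcut(self, prefix):
        if prefix and self.startswith(prefix):
            return self[len(prefix):]
        return self


def handle(myobj):
    # Duck-typing on the method name alone, as in the scenario above.
    if hasattr(myobj, 'lcut'):
        return myobj.lcut(1, 2, 3, 4)
    return "something else"


print(handle(Cutter()))   # first path, as the author intended
print(handle("spam"))     # a plain str today takes the second path
try:
    handle(FutureStr("spam"))
except TypeError as err:
    # The simulated future str takes the first path and fails.
    print("TypeError:", err)
```

The breakage shows up as an immediate TypeError at the call site, not as silent misbehaviour, which is part of why I think it would be easy to fix.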

But the chances of this actually being more than a trivially small 
problem for anyone in real life are so small that I don't know why I even 
raise it. This isn't a minor disruption. It's a small possibility of a 
minor disruption to a tiny set of users who can fix the breakage easily.

The functionality is clear, meets a real need, is backwards compatible, 
and has no significant downsides. The only hard part is bikeshedding 
names for the methods:

    lcut rcut
    cutprefix cutsuffix
    ltrim rtrim
    prestrip poststrip
    etc.

Am I wrong about any of these statements?

-- 
Steven