[Python-ideas] New explicit methods to trim strings

Steven D'Aprano steve at pearwood.info
Tue Apr 2 23:54:10 EDT 2019


On Wed, Apr 03, 2019 at 09:58:07AM +1100, Cameron Simpson wrote:

[...]
> Yeah. I was looking at the prefix list from a related article and seeing 
> "intra" and thinking "intractable". Hacky indeed.

That example supports my position that we ought to be cautious about 
allowing multiple prefixes. The correct prefix in that case is in- not 
intra-. Deciding which prefix ought to take precedence requires specific 
domain knowledge, not a simple rule like "first|last|shortest|longest 
wins".
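
To make that concrete, here is a rough sketch of what goes wrong. The 
helper cut_any_prefix and its rule names are hypothetical, invented 
only for this illustration; nothing like it is being proposed. It cuts 
one of several candidate prefixes according to a fixed precedence rule:

```python
def cut_any_prefix(word, prefixes, rule):
    """Hypothetical helper: cut one matching prefix, chosen by a fixed rule."""
    matches = [p for p in prefixes if word.startswith(p)]
    if not matches:
        return word
    if rule == "first":
        chosen = matches[0]
    elif rule == "last":
        chosen = matches[-1]
    elif rule == "shortest":
        chosen = min(matches, key=len)
    elif rule == "longest":
        chosen = max(matches, key=len)
    else:
        raise ValueError("unknown rule: %r" % (rule,))
    return word[len(chosen):]


prefixes = ["intra", "in"]
for rule in ("first", "last", "shortest", "longest"):
    # Print the rule, then the result for each of the two test words.
    print(rule,
          cut_any_prefix("intractable", prefixes, rule),
          cut_any_prefix("intravenous", prefixes, rule))
```

With this prefix list, "first" and "longest" give "ctable" and "venous", 
while "last" and "shortest" give "tractable" and "travenous". No single 
rule gets both words right.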


> _Unless_ the word has 
> already been qualified as suitable for the action. And once it is, a 
> cutprefix method would indeed be handy.

Which is precisely the point.

Of course stemming words in full generality is hard. It requires the 
nuclear reactor of something like NLTK, and even that sometimes gets it 
wrong. But this is not a proposal for a natural language stemmer, it is 
a proposal for a simple battery which could be used any time you want to 
cut a known prefix or suffix.


[...]
> - the anecdotally not uncommon misuse of .strip() where .cutsuffix() 
>  would be correct

Anecdotal would be "I knew a guy who made this error", but the evidence 
presented is objectively verifiable: posts on the bug tracker, mailing 
lists and especially Stack Overflow showing that people need to cut 
affixes and misuse strip for that purpose.
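
For anyone who hasn't been bitten by it, the misuse typically looks 
something like the sketch below. The file name and host name are made 
up for illustration. The strip methods treat their argument as a set of 
characters, not as a substring:

```python
# The argument to rstrip/lstrip is a *set of characters* to remove,
# not a suffix or prefix.
stripped_suffix = "happy.py".rstrip(".py")
print(stripped_suffix)   # ha  (not happy)

stripped_prefix = "www.wikipedia.org".lstrip("www.")
print(stripped_prefix)   # ikipedia.org

# What people actually meant, written out by hand with today's methods.
name = "happy.py"
if name.endswith(".py"):
    name = name[:-len(".py")]
print(name)              # happy
```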


> I confess being a little surprised at how few examples which could use 
> cutsuffix I found in my own code, where I had expected it to be common.

I don't expect it to be very common, just common enough to be a repeated 
source of pain.

It's probably more common, and less specialised, than partition and 
zfill, but less common than startswith/endswith.

 
[...]
>     if ifname.endswith(':'):
>       ifname = ifname[:-1]
> 
> Here I DO NOT want rstrip() because I want to strip only one character, 
> rather than as many as there are. So: the optional trailing marker in 
> some input. But doing this for single character markers is much easier 
> to get right than the broader case with longer suffixes, so I think this 
> is not a very strong case.

Imagine that these proposed methods had been added in Python 2.2. Would 
you be even a tiny bit tempted to write that code above, or would you 
use the string method?

Now imagine it's five years from now, and you're using Python 3.11, and 
you came across code somebody (possibly even you!) wrote:

    ifname = ifname.cutsuffix(':')

Would you say "Damn, I wish that method had never been added!" and 
replace it with the earlier code above?

Those two questions are not so much aimed at you, Cameron, personally, 
they're more generic questions for any reader.
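
For reference, a minimal sketch of the behaviour under discussion, 
written as plain functions. The names cutprefix and cutsuffix follow 
this thread, but the exact semantics shown here (cut at most one 
occurrence, return the string unchanged if there is no match) are my 
assumption, not a settled specification:

```python
def cutprefix(s, prefix):
    """Sketch: return s without a leading prefix, cutting at most once."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s


def cutsuffix(s, suffix):
    """Sketch: return s without a trailing suffix, cutting at most once."""
    # The "suffix and" guard matters: s[:-0] is the empty string,
    # so an empty suffix must not reach the slice.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


print(cutsuffix("eth0:", ":"))    # eth0
print(cutsuffix("eth0::", ":"))   # eth0:  (only one marker is removed)
print("eth0::".rstrip(":"))       # eth0   (rstrip removes them all)
```

The empty-suffix guard is exactly the sort of detail that is easy to 
miss when writing the slice by hand for longer suffixes.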


-- 
Steven

