[Python-ideas] New explicit methods to trim strings
Steven D'Aprano
steve at pearwood.info
Tue Apr 2 23:54:10 EDT 2019
On Wed, Apr 03, 2019 at 09:58:07AM +1100, Cameron Simpson wrote:
[...]
> Yeah. I was looking at the prefix list from a related article and seeing
> "intra" and thinking "intractable". Hacky indeed.
That example supports my position that we ought to be cautious about
allowing multiple prefixes. The correct prefix in that case is in- not
intra-. Deciding which prefix ought to take precedence requires specific
domain knowledge, not a simple rule like "first|last|shortest|longest
wins".
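To make that concrete, here is a small sketch of why no mechanical rule picks the right prefix. The prefix list and the helper name cut_one_of are made up for illustration; they are not part of the proposal.

```python
# Hypothetical prefix list, for illustration only.
PREFIXES = ["intra", "in"]

def cut_one_of(word, prefixes, pick):
    """Cut one matching prefix; `pick` (max or min) chooses by length."""
    matches = [p for p in prefixes if word.startswith(p)]
    if not matches:
        return word
    chosen = pick(matches, key=len)
    return word[len(chosen):]

# "Longest wins" is wrong for intractable but right for intravenous.
longest_a = cut_one_of("intractable", PREFIXES, max)   # 'ctable'
longest_b = cut_one_of("intravenous", PREFIXES, max)   # 'venous'

# "Shortest wins" is right for intractable but wrong for intravenous.
shortest_a = cut_one_of("intractable", PREFIXES, min)  # 'tractable'
shortest_b = cut_one_of("intravenous", PREFIXES, min)  # 'travenous'
```

Whichever rule is chosen, one of the two words comes out wrong, so the choice has to come from the caller's knowledge of the data.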
> _Unless_ the word has
> already been qualified as suitable for the action. And once it is, a
> cutprefix method would indeed be handy.
Which is precisely the point.
Of course stemming words in full generality is hard. It requires the
nuclear reactor of something like NLTK, and even that sometimes gets it
wrong. But this is not a proposal for a natural language stemmer, it is
a proposal for a simple battery which could be used any time you want to
cut a known prefix or suffix.
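As a rough sketch of the intended behaviour, written as plain functions rather than str methods: the names cutprefix and cutsuffix follow this thread, but the exact semantics below (single affix, string returned unchanged when there is no match) are an assumption, not a settled design.

```python
def cutprefix(s, prefix):
    """Return s without the leading prefix, or s unchanged if absent."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s

def cutsuffix(s, suffix):
    """Return s without the trailing suffix, or s unchanged if absent."""
    # Guard against the empty suffix: s[:-0] would be the empty string.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s
```

For example, cutprefix("unhappy", "un") gives "happy", and cutsuffix("report.txt", ".txt") gives "report".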
[...]
> - the anecdotally not uncommon misuse of .strip() where .cutsuffix()
> would be correct
Anecdotal would be "I knew a guy who made this error", but the evidence
presented is objectively verifiable posts on the bug tracker, mailing
lists and especially stackoverflow showing that people need to cut
affixes and misuse strip for that purpose.
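The mistake in those posts is easy to reproduce. The file and host names below are made up; the point is that the argument to strip/lstrip/rstrip is treated as a set of characters, not as a substring.

```python
filename = "happy.py"

# Misuse: rstrip removes any trailing run of the characters '.', 'p', 'y'.
wrong = filename.rstrip(".py")          # 'ha'

# What was meant: remove the literal suffix ".py" once, if present.
if filename.endswith(".py"):
    right = filename[:-len(".py")]      # 'happy'
else:
    right = filename

# The same trap on the left-hand side.
host = "www.wwwidgets.com"
wrong_host = host.lstrip("www.")        # 'idgets.com'
```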
> I confess being a little surprised at how few examples which could use
> cutsuffix I found in my own code, where I had expected it to be common.
I don't expect it to be very common, just common enough to be a repeated
source of pain.
It's probably more common, and less specialised, than partition and
zfill, but less common than startswith/endswith.
[...]
>     if ifname.endswith(':'):
>         ifname = ifname[:-1]
>
> Here I DO NOT want rstrip() because I want to strip only one character,
> rather than as many as there are. So: the optional trailing marker in
> some input. But doing this for single character markers is much easier
> to get right than the broader case with longer suffixes, so I think this
> is not a very strong case.
Imagine that these proposed methods had been added in Python 2.2. Would
you be even a tiny bit tempted to write that code above, or would you
use the string method?
Now imagine it's five years from now, and you're using Python 3.11, and
you came across code somebody (possibly even you!) wrote:
    ifname = ifname.cutsuffix(':')
Would you say "Damn, I wish that method had never been added!" and
replace it with the earlier code above?
Those two questions are not so much aimed at you, Cameron, personally,
they're more generic questions for any reader.
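For anyone who wants to see the difference Cameron describes between cutting one trailing marker and stripping them all, here is a short sketch. The function name strip_trailing_marker and the sample interface names are hypothetical.

```python
def strip_trailing_marker(ifname):
    """Remove at most one trailing ':' using the endswith-and-slice idiom."""
    if ifname.endswith(':'):
        ifname = ifname[:-1]
    return ifname

one_cut = strip_trailing_marker("eth0::")    # 'eth0:'  (one marker removed)
all_cut = "eth0::".rstrip(':')               # 'eth0'   (every marker removed)
no_marker = strip_trailing_marker("eth0")    # 'eth0'   (unchanged)
```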
--
Steven