[Python-ideas] New explicit methods to trim strings
Steven D'Aprano
steve at pearwood.info
Fri Mar 29 21:37:36 EDT 2019
On Fri, Mar 29, 2019 at 04:05:55PM -0700, Christopher Barker wrote:
> This proposal would provide a minor gain for an even more minor disruption.
I don't think that is correct. I think you are underestimating the gain
and exaggerating the disruption :-)
Cutting a prefix or suffix from a string is a common task, and there is
no obvious "battery" in the std lib available for it. And there is a
long history of people mistaking strip() and friends as that battery.
The problem is that it seems to work:
py> "something.zip".rstrip(".zip")
'something'
until it doesn't:
py> "something.jpg".rstrip(".jpg")
'somethin'
It is *very common* for people to trip over this and think they have
found a bug:
https://duckduckgo.com/?q=python+bug+in+strip
I would guesstimate that for every person who thinks they have found a
bug, there are probably a hundred who trip over this and then realise
their error without ever going public. I believe this is a real pain
point for people doing string processing. I know it has bitten me once
or twice.
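The surprise comes from rstrip() treating its argument as a set of
characters to remove rather than as a suffix. A minimal sketch that makes
this visible (the manual loop is only my illustration of the behaviour,
not how the interpreter implements it):

```python
# rstrip() removes trailing characters that appear in its argument,
# in any order, and stops at the first character not in that set.
chars = ".jpg"
name = "something.jpg"

stripped = name.rstrip(chars)
print(stripped)

# Illustrative manual equivalent: peel off trailing characters
# for as long as they are members of the character set.
manual = name
while manual and manual[-1] in set(chars):
    manual = manual[:-1]
print(manual == stripped)

# The order of the characters in the argument is irrelevant,
# because only set membership matters.
print(name.rstrip("gpj.") == stripped)
```

The final "g" of "something" is in the set, so it goes too, which is
exactly the reported "bug".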
The correct solution is a verbose statement:
    if string.startswith("spam"):
        string = string[len("spam"):]
which repeats itself (*two* references to the prefix being removed,
*three* references to the string being cut). The expression form is
no better:
    process(a, b, string[len("spam"):] if string.startswith("spam") else string, c)
and heaven help you if you need to cut from both ends. To make that
practical, you really need a helper function. Now that's fine as far as
it goes, but why do we make people re-invent the wheel over and over
again?
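For concreteness, here is roughly the helper pair that people end up
writing. This is only a sketch: the names cut_prefix and cut_suffix are
placeholders I have picked for illustration, not proposed spellings.

```python
def cut_prefix(s, prefix):
    """Return s without the leading prefix, or s unchanged if absent."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s


def cut_suffix(s, suffix):
    """Return s without the trailing suffix, or s unchanged if absent."""
    # The emptiness check matters: s[:-0] is s[:0], the empty string.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


print(cut_suffix("something.jpg", ".jpg"))
print(cut_prefix(cut_suffix("spam-eggs.jpg", ".jpg"), "spam-"))
```

Note the easy-to-miss edge case with an empty suffix, which is one more
reason not to have everyone re-implement this by hand.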
A pair of "cut" methods (cut prefix, cut suffix) fills a real need, and
will avoid a lot of mistaken bug reports/questions.
As for the disruption, I don't see that this will cause *any* disruption
at all, beyond bike-shedding the method names and doing an initial
implementation. It is a completely backwards compatible change. Since we
can't monkey-patch builtins, this isn't going to break anyone's use of
str. Any subclasses of str which define the same methods will still
work.
I've sometimes said in the past that any change will break *someone's*
code, and so we should be risk-averse. I still stand by that, but we
shouldn't be *so risk-averse* that we're paralysed. Breaking users'
code is a cost, but there is also the uncounted opportunity cost of
*not* adding this useful battery.
If we don't add these new methods, how many hundreds of users over the
next decade will we condemn to repeating the same old misuse of strip()
that has been misused so often in the past? How much developer time will
be wasted writing, and then closing, bug reports like this?
https://bugs.python.org/issue5318
Inaction has costs too.
I can only think of one scenario where this change might
break someone's code:
- we decide on method names (let's say) lcut and rcut;
- somebody else already has a class with lcut and rcut;
- which does something completely different;
- and they use hasattr() to decide whether to call those methods,
rather than isinstance:
      if hasattr(myobj, 'lcut'):
          print(myobj.lcut(1, 2, 3, 4))
      else:
          # do something else
          pass
- and they sometimes pass strings into this code.
In 3.7 and older, ordinary strings will take the second path. If we add
these methods, they will take the first path.
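That scenario can be simulated today. In this sketch everything is
hypothetical: Widget stands in for the third-party class, FutureStr is a
str subclass standing in for a str that has gained an lcut(prefix)
method, and handle() is the duck-typing code in question.

```python
class Widget:
    # Hypothetical existing class whose lcut() has nothing to do with strings.
    def lcut(self, *args):
        return sum(args)


class FutureStr(str):
    # Stand-in for a str with a new lcut(prefix) method (assumed signature).
    def lcut(self, prefix):
        if prefix and self.startswith(prefix):
            return self[len(prefix):]
        return self


def handle(obj):
    # Duck-typing check by attribute name rather than by type.
    if hasattr(obj, "lcut"):
        return obj.lcut(1, 2, 3, 4)
    return "something else"


print(handle(Widget()))   # first path, as intended
print(handle("spam"))     # plain str has no lcut: second path

try:
    handle(FutureStr("spam"))  # now takes the first path
except TypeError as exc:
    print("TypeError:", exc)
```

The breakage is loud (a TypeError from the mismatched arguments) and the
fix is a one-line switch to isinstance().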
But the chance of this being more than a trivially small problem for
anyone in real life is so small that I don't know why I even raise it.
This isn't a minor disruption. It's a small possibility of a
minor disruption to a tiny set of users who can fix the breakage easily.
The functionality is clear, meets a real need, is backwards compatible,
and has no significant downsides. The only hard part is bikeshedding
names for the methods:
lcut rcut
cutprefix cutsuffix
ltrim rtrim
prestrip poststrip
etc.
Am I wrong about any of these statements?
--
Steven