[Python-ideas] New explicit methods to trim strings

David Mertz mertz at gnosis.cx
Sun Mar 31 22:58:24 EDT 2019

On Sun, Mar 31, 2019, 9:35 PM Steven D'Aprano <steve at pearwood.info> wrote:

> > That's simply not true, and I think it's clearly illustrated by the
> > example I gave a few times. Not just conceivably, but FREQUENTLY I write
> > code to accomplish the effect of the suggested:
> >
> >   basename = fname.rstrip(('.jpg', '.gif', '.png'))
> >
> > I probably do this MORE OFTEN than removing a single suffix.
> Okay.
> Yesterday, you stated that you didn't care what the behaviour was for the
> multiple affix case. You made it clear that "any" semantics would be okay
> with you so long as it was documented. You seemed to feel so strongly about
> your indifference that you mentioned it in two separate emails.

Yes. Because the multiple affix case is an edge case that will rarely affect
any of my code. I.e. I don't care much when a single string has multiple
candidate affixes, because that's just not a common situation. That doesn't
mean I'm indifferent to the core purpose that I need frequently. Any of the
several possible behaviors in the edge case will not affect my desired
usage whatsoever.

> That doesn't sound like someone who has a clear use-case in mind. If you're
> doing this frequently, then surely one of the following two alternatives
> apply:

I don't think I've ever written code that cares about the edge case you
focus on. Ok, I guess technically the code I've written is all buggy in the
sense that it would behave in a manner I haven't thought through when
presented with weird input. Perhaps I should always have been more careful
about those edges.

There simply is no "majority of the time" for a situation I've never
specifically coded for.

The rest gets more and more sophistical.  I'm sure most people here have
written code similar to this (maybe structured differently, but same
idea):

for fname in filenames:
    basename, ext = fname.rsplit('.', 1)
    if ext in {'jpg', 'gif', 'png'}:
        ...  # work with basename here

In all the times I've written things close to that, I've never thought
about files named 'silly.jpg.gif.png.gif.jpg'. The sophistry is insistently
asking "but what about...?" of this edge case.

> For 29 years, we've done without this string primitive, and as a
> consequence the forums are full of examples of people misusing strip and
> getting it wrong.

It's interesting that you keep raising this error. I've made a whole lot of
silly mistakes in Python (and other languages). I have never for a moment
been tempted to think .rstrip() would remove a suffix rather than a
character class.

I did write the book Text Processing in Python a very long time ago, so
I've thought a bit about text processing in Python. Maybe it's just that
I'm comfortable enough with regexen that thinking of a character class
doesn't feel strange to me.
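
For anyone who hasn't run into it, here is a small sketch of the
character-class behavior being discussed. The filename is a made-up
example, chosen so that the characters before the extension also happen
to fall in the stripped set.

```python
import re

fname = 'config.gif'  # hypothetical filename, for illustration only

# str.rstrip() treats its argument as a set of characters, not a suffix,
# so it keeps removing trailing characters drawn from {'.', 'g', 'i', 'f'}.
stripped = fname.rstrip('.gif')

# The same operation spelled as a regex makes the character class explicit.
via_regex = re.sub(r'[.gif]+$', '', fname)

print(stripped)   # con
print(via_regex)  # con
```

Both spellings eat into the name itself, which is the mistake people make
when they expect a suffix to be removed.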

> There's a clear case for the single argument version, and fixing that is
> the 90% solution.

I think there's very little case for a single argument version. At best,
it's a 10% solution.

> Lacking a good set of semantics for removing multiple affixes at once, we
> shouldn't rush to guess what people want. You don't even know what
> behaviour YOU want, let alone what the community as a whole needs.

This is both dumb and dishonest. There are basically two choices, both
completely clear. I think the more obvious one is to treat several prefixes
or suffixes as a substring class, much as .[rl]strip() treats a character class.

But another choice indeed is to remove at most one of the affixes. I think
that's a little bit less good for the edge case. But it would be fine
also... and as I keep writing, the difference would almost always be moot,
it just needs to be documented.
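
To make the two choices concrete, here is a rough sketch of both. The
function names strip_suffix_class and strip_one_suffix are hypothetical,
made up for this illustration; neither is an existing or proposed str
method, and the tie-breaking order (first matching candidate wins) is
just one assumption among several possible ones.

```python
def strip_suffix_class(s, suffixes):
    """Treat the suffixes as a 'substring class': remove while any matches."""
    # Drop empty strings so the loop below always makes progress.
    suffixes = tuple(suf for suf in suffixes if suf)
    while suffixes and s.endswith(suffixes):
        for suf in suffixes:
            if s.endswith(suf):
                s = s[:-len(suf)]
                break
    return s


def strip_one_suffix(s, suffixes):
    """Remove at most one suffix: the first candidate that matches."""
    for suf in suffixes:
        if suf and s.endswith(suf):
            return s[:-len(suf)]
    return s


exts = ('.jpg', '.gif', '.png')

# The common case: both semantics agree.
print(strip_suffix_class('holiday.jpg', exts))  # holiday
print(strip_one_suffix('holiday.jpg', exts))    # holiday

# The edge case: only here do they differ.
print(strip_suffix_class('silly.jpg.gif.png.gif.jpg', exts))  # silly
print(strip_one_suffix('silly.jpg.gif.png.gif.jpg', exts))    # silly.jpg.gif.png.gif
```

For any name carrying a single candidate suffix the two functions return
the same result, which is why the choice is almost always moot.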

> For the use-case of stripping a single file extension out of a set of
> such extensions, while leaving all others, there's an obvious solution:
>     if fname.endswith(('.jpg', '.png', '.gif')):
>         basename = os.path.splitext(fname)[0]

I should probably use os.path.splitext() more than I do. But that's just an
example. Another is, e.g. 'if url.startswith(('http://', 'sftp://',
's3://')): ...'. And lots of similar things that aren't addressed by
os.path.splitext(). E.g. 'if logline.startswith(('WARNING', 'ERROR')): ...'
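
A short sketch of those tuple-argument tests side by side. The URLs, log
lines, and filename are invented sample data, used only to show the
pattern.

```python
import os.path

SCHEMES = ('http://', 'sftp://', 's3://')
LEVELS = ('WARNING', 'ERROR')

# Hypothetical sample inputs.
urls = ['http://example.com/a', 'ftp://example.com/b', 's3://bucket/key']
loglines = ['INFO started', 'WARNING disk low', 'ERROR disk full']

# str.startswith() accepts a tuple and is true if any element matches.
known = [u for u in urls if u.startswith(SCHEMES)]
alerts = [line for line in loglines if line.startswith(LEVELS)]

# The file-extension case is the only one os.path.splitext() covers.
fname = 'photo.jpg'
if fname.endswith(('.jpg', '.png', '.gif')):
    basename = os.path.splitext(fname)[0]
else:
    basename = fname

print(known)     # ['http://example.com/a', 's3://bucket/key']
print(alerts)    # ['WARNING disk low', 'ERROR disk full']
print(basename)  # photo
```

The tuple form tests the affix in one call, but removing it afterwards
still needs separate code in the URL and log-line cases.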

> I posted links to prior art. Unless I missed something, not one of those
> languages or libraries supports multiple affixes in the one call.

Also, none of those languages support the amazingly useful signature of
str.startswith(tuple). Well, they do in the sense they support regexen. But
not as a standard method or function on strings. I don't even know if PHP
with its 5000 string functions has this great convenience.