On Sun, Mar 31, 2019, 9:35 PM Steven D'Aprano <steve@pearwood.info> wrote:
That's simply not true, and I think it's clearly illustrated by the example I gave a few times. Not just conceivably, but FREQUENTLY I write code to accomplish the effect of the suggested:
basename = fname.rstrip(('.jpg', '.gif', '.png'))
I probably do this MORE OFTEN than removing a single suffix.
Okay.
Yesterday, you stated that you didn't care what the behaviour was for the multiple affix case. You made it clear that "any" semantics would be okay with you so long as it was documented. You seemed to feel so strongly about your indifference that you mentioned it in two separate emails.
Yes. Because the multiple affix is an edge case that will rarely affect any of my code. I.e. I don't care much what happens when a single string has multiple candidate affixes, because that's just not a common situation. That doesn't mean I'm indifferent to the core purpose that I need frequently. Any of the several possible behaviors in the edge case will not affect my desired usage whatsoever.
That doesn't sound like someone who has a clear use-case in mind. If you're doing this frequently, then surely one of the following two alternatives applies:
I don't think I've ever written code that cares about the edge case you focus on. Ok, I guess technically the code I've written is all buggy in the sense that it would behave in a manner I haven't thought through when presented with weird input. Perhaps I should always have been more careful about those edges. There simply is no "majority of the time" for a situation I've never specifically coded for.
The rest gets more and more sophistical. I'm sure most people here have written code similar to this (maybe structured differently, but same purpose):
    for fname in filenames:
        basename, ext = fname.rsplit('.', 1)
        if ext in {'jpg', 'gif', 'png'}:
            do_stuff(basename)
In all the times I've written things close to that, I've never thought about files named 'silly.jpg.gif.png.gif.jpg'. The sophistry is insistently asking "but what about...?" of this edge case.
For 29 years, we've done without this string primitive, and as a consequence the forums are full of examples of people misusing strip and getting it wrong.
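For anyone who has not run into it, the misuse being referred to looks roughly like the sketch below. The filenames are made up, and remove_suffix is a hypothetical helper written for illustration, not an existing str method. The point is only that str.rstrip() treats its argument as a set of characters rather than as a suffix.

```python
# str.rstrip(chars) strips any trailing characters that appear in `chars`;
# it does not remove the literal suffix. Filenames are illustrative only.
lucky = 'photo.jpg'.rstrip('.jpg')    # 'photo'  (right answer, by luck)
unlucky = 'ping.jpg'.rstrip('.jpg')   # 'pin'    (also eats the "g" of "ping")


def remove_suffix(s, suffix):
    """Hypothetical helper: drop `suffix` at most once if `s` ends with it."""
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


fixed = remove_suffix('ping.jpg', '.jpg')   # 'ping'
```

The character-class behaviour is documented and intended; the trouble only arises when it is mistaken for suffix removal.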
It's interesting that you keep raising this error. I've made a whole lot of silly mistakes in Python (and other languages). I have never for a moment been tempted to think .rstrip() would remove a suffix rather than a character class. I did write the book Text Processing in Python a very long time ago, so I've thought a bit about text processing in Python. Maybe it's just that I'm comfortable enough with regexen that thinking of a character class doesn't feel strange to me.
There's a clear case for the single argument version, and fixing that is the 90% solution.
I think there's very little case for a single argument version. At best, it's a 10% solution.
Lacking a good set of semantics for removing multiple affixes at once, we shouldn't rush to guess what people want. You don't even know what behaviour YOU want, let alone what the community as a whole needs.
This is both dumb and dishonest. There are basically two choices, both completely clear. I think the more obvious one is to treat several prefixes or suffixes as a substring class, much as .[rl]strip() treats its argument as a character class. But another choice indeed is to remove at most one of the affixes. I think that's a little bit less good for the edge case. But it would be fine also... and as I keep writing, the difference would almost always be moot; it just needs to be documented.
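To make the two choices concrete, here is a minimal sketch of each. Both function names are hypothetical, and picking the first matching suffix in tuple order (rather than, say, the longest) is an assumption of this sketch, not something settled in the thread.

```python
def strip_suffixes_repeated(s, suffixes):
    """'Substring class' reading: keep removing matching suffixes until none match."""
    # Drop empty strings, which would otherwise match forever.
    suffixes = tuple(suf for suf in suffixes if suf)
    while suffixes and s.endswith(suffixes):
        for suf in suffixes:
            if s.endswith(suf):
                s = s[:-len(suf)]
                break
    return s


def strip_suffix_once(s, suffixes):
    """'At most one' reading: remove the first matching suffix, then stop."""
    for suf in suffixes:
        if suf and s.endswith(suf):
            return s[:-len(suf)]
    return s


exts = ('.jpg', '.gif', '.png')

# Ordinary input: the two readings agree.
print(strip_suffixes_repeated('holiday.jpg', exts))   # holiday
print(strip_suffix_once('holiday.jpg', exts))         # holiday

# The edge case: only here do they differ.
print(strip_suffixes_repeated('silly.jpg.gif.png.gif.jpg', exts))  # silly
print(strip_suffix_once('silly.jpg.gif.png.gif.jpg', exts))        # silly.jpg.gif.png.gif
```

On a name carrying a single extension the results are identical, which is the sense in which the difference is almost always moot.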
the use-case of stripping a single file extension out of a set of
such extensions, while leaving all others, there's an obvious solution:
    if fname.endswith(('.jpg', '.png', '.gif')):
        basename = os.path.splitext(fname)[0]
I should probably use os.path.splitext() more than I do. But that's just an example. Another is, e.g. 'if url.startswith(('http://', 'sftp://', 's3://')): ...'. And lots of similar things that aren't addressed by os.path.splitext(). E.g. 'if logline.startswith(('WARNING', 'ERROR')): ...'
I posted links to prior art. Unless I missed something, not one of those languages or libraries supports multiple affixes in the one call.
Also, none of those languages support the amazingly useful signature of str.startswith(tuple). Well, they do in the sense that they support regexen. But not as a standard method or function on strings. I don't even know if PHP with its 5000 string functions has this great convenience.
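For reference, a small sketch of the tuple form of str.startswith() that is already in the language. The prefixes are the ones mentioned above; the function names and sample strings are made up for illustration.

```python
def is_remote(url):
    """True if `url` begins with any of the listed schemes."""
    return url.startswith(('http://', 'sftp://', 's3://'))


def needs_attention(logline):
    """True if `logline` begins with WARNING or ERROR."""
    return logline.startswith(('WARNING', 'ERROR'))


print(is_remote('s3://bucket/key'))            # True
print(is_remote('file:///tmp/data'))           # False
print(needs_attention('ERROR disk full'))      # True
print(needs_attention('INFO all good'))        # False
```

str.endswith() accepts a tuple in the same way, which is what the file-extension test relies on.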