On Mon, Apr 01, 2019 at 02:29:44PM +1100, Chris Angelico wrote:
The multiple affix case has exactly two forms:
1) Tearing multiple affixes off (eg stripping "asdf.jpg.png" down to just "asdf"), which most people are saying "no, don't do that, it doesn't make sense and isn't needed"
Perhaps I've missed something obvious (it's been a long thread, and I'm badly distracted with hardware issues that are causing me some considerable grief), but I haven't seen anyone say "don't do that". But I have seen David Mertz say that this was the best behaviour:

[quote]
fname = 'silly.jpg.png.gif.png.jpg.gif.jpg'

I'm honestly not sure what behavior would be useful most often for this oddball case. For the suffixes, I think "remove them all" is probably the best
[end quote]

I'd also like to point out that this is not an oddball case. There are two popular platforms where file extensions are advisory, not mandatory (Linux and Mac), but even on Windows it is possible to get files with multiple, meaningful extensions (foo.tar.gz for example) as well as periods used in place of spaces (a.funny.cat.video.mp4).
2) Removing one of several options, which implies that one option is a strict subpiece of another (eg stripping off "test" and "st")
I take it you're only referring to the problematic cases, because there's the third option, where none of the affixes to be removed clash:

    spam.cut_suffix(("ed", "ing"))

But that's pretty uninteresting and a simple loop or repeated call to the method will work fine:

    spam.cut_suffix("ed").cut_suffix("ing")

just as we do with replace:

    spam.replace(",", "").replace(" ", "")

If you only have a few affixes to work with, this is fine. If you have a lot, you may want a helper function, but that's okay.
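A minimal sketch of such a helper, assuming nothing beyond plain string methods. The names cut_suffix and cut_suffixes are hypothetical; they follow the method name used in this thread and are written as standalone functions, not as an existing str API. The loop mirrors the chained calls, so more than one suffix may be removed.

```python
def cut_suffix(s, suffix):
    """Return s without the trailing suffix, or s unchanged if it is absent."""
    # Guard against the empty suffix: s[:-0] would wrongly return "".
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


def cut_suffixes(s, suffixes):
    """Apply cut_suffix once per suffix, in the order given.

    Equivalent to chaining the calls by hand, e.g.
    cut_suffix(cut_suffix(s, "ed"), "ing").
    """
    for suffix in suffixes:
        s = cut_suffix(s, suffix)
    return s


print(cut_suffixes("jumped", ("ed", "ing")))   # jump
print(cut_suffixes("jumping", ("ed", "ing")))  # jump
```

This is only safe when the affixes do not clash, which is the uninteresting case described above.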
If anyone is advocating for #1, I would agree with saying YAGNI.
David Mertz did.
But #2 is an extremely unlikely edge case, and whatever semantics are chosen for it, *normal* usage will not be affected.
Not just unlikely, but "extremely" unlikely?

Presumably you didn't just pluck that statement out of thin air, but have based it on an objective and statistically representative review of existing code and projections of future uses of these new methods. How could I possibly argue with that?

Except to say that I think it is recklessly irresponsible for people engaged in language design to dismiss so easily edge cases which will cause users real bugs and real pain. We're not designing for our personal toolbox, we're designing for hundreds of thousands of other people with widely varying needs. It might be rare for you, but for somebody it will be happening ten times a day. And for somebody else, it will only happen once a year, but when it does, their code won't raise an exception; it will just silently do the wrong thing.

This is why replace does not take a set of multiple targets to replace. The user, who knows their own use-case and what behaviour they want, can write their own multiple-replace function, and we don't have to guess what they want.

The point I am making is not that we must not ever support multiple affixes, but that we shouldn't rush that decision. Let's pick the low-hanging fruit, and get some real-world experience with the function before deciding how to handle the multiple affix case.

[...]
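To make the "silently do the wrong thing" concern concrete, here is a small sketch of two plausible semantics for a multi-suffix call, using the clashing pair "test" and "st" mentioned earlier. Both function names are hypothetical and neither is claimed to be the behaviour anyone in this thread proposed; the point is only that reasonable choices disagree without raising an error.

```python
def cut_first_match(s, suffixes):
    """Remove the first suffix, in the order given, that matches."""
    for suffix in suffixes:
        if suffix and s.endswith(suffix):
            return s[:-len(suffix)]
    return s


def cut_longest_match(s, suffixes):
    """Remove the longest suffix that matches, regardless of order."""
    matches = [x for x in suffixes if x and s.endswith(x)]
    if not matches:
        return s
    return s[:-len(max(matches, key=len))]


word = "contest"
print(cut_first_match(word, ("st", "test")))    # conte
print(cut_first_match(word, ("test", "st")))    # con
print(cut_longest_match(word, ("st", "test")))  # con
```

With first-match semantics the result depends on the order of the tuple, and a set would make it depend on iteration order.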
Or all the behaviours actually do the same thing anyway.
In this thread, I keep hearing this message:

"My own personal use-case will never be affected by clashing affixes, so I don't care what behaviour we build into the language, so long as we pick something RIGHT NOW and don't give the people actually affected time to use the method and decide what works best in practice for them."

As with the str.replace method, the final answer might be "there is no best behaviour and we should refuse to choose". Why are we rushing to permanently enshrine one specific behaviour into the builtins before any of the users of the feature have a chance to use it and decide for themselves which suits them best?

    Now is better than never.
    Although never is often better than *right* now.

Somebody (I won't name names, but they know who they are) wrote to me off-list some time ago and accused me of being arrogant and thinking I know more than everyone else. Well perhaps I am, but I'm not so arrogant as to think that I can choose the right behaviour for clashing affixes for other people when my own use-cases don't have clashing affixes.

[...]
Sure, but I've often wanted to do something like "strip off a prefix of http:// or https://", or something else that doesn't have a semantic that's known to the stdlib.
I presume there's a reason you aren't using urllib.parse and you just need a string without the leading scheme. If you're doing further parsing, the stdlib has the right batteries for that. (Aside: perhaps urllib.parse.ParseResult should get an attribute to return the URL minus the scheme? That seems like it would be useful.)
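For reference, a rough sketch of getting "the URL minus the scheme" with today's urllib.parse. The helper name strip_scheme is hypothetical, and the approach (blank the scheme, reassemble, drop the leading "//") is just one way to do it, not an existing ParseResult attribute.

```python
from urllib.parse import urlsplit


def strip_scheme(url):
    """Return url without a leading http:// or https://, else unchanged."""
    parts = urlsplit(url)
    if parts.scheme in ("http", "https") and parts.netloc:
        # With the scheme blanked, geturl() gives "//host/path?query#frag".
        without_scheme = parts._replace(scheme="").geturl()
        return without_scheme[2:]
    return url


print(strip_scheme("https://example.com/path?q=1"))  # example.com/path?q=1
print(strip_scheme("ftp://example.com/x"))           # ftp://example.com/x
```

Note that geturl() may return a slightly different but equivalent URL from the one that was parsed, so this is not a pure string slice.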
Also, this is still fairly verbose, and a lot of people are going to reach for a regex, just because it can be done in one line of code.
Okay, they will use a regex. Is this a problem? We're not planning on banning regexes, are we? If they're happy using regexes, and don't care that it will be perhaps 3 times slower, let them.
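For comparison, the two approaches side by side for the http:// or https:// case. This is a sketch only; both function names are made up for illustration, and no timing claim is implied by the code.

```python
import re

# Anchored at the start, so at most one match is ever replaced.
_SCHEME = re.compile(r"^https?://")


def strip_scheme_regex(url):
    """One-line regex version."""
    return _SCHEME.sub("", url)


def strip_scheme_plain(url):
    """Plain string version: test each prefix, slice off the first match."""
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            return url[len(prefix):]
    return url


for u in ("http://example.com", "https://example.com", "example.com"):
    assert strip_scheme_regex(u) == strip_scheme_plain(u) == "example.com"
```

The two prefixes here do not clash (neither is a prefix of the other), so the order of the tuple in the plain version does not matter.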
I posted links to prior art. Unless I missed something, not one of those languages or libraries supports multiple affixes in the one call.
And they don't support multiple affixes in startswith/endswith either, but we're very happy to have that in Python.
But not until we had a couple of releases of experience with them:

https://docs.python.org/2.7/library/stdtypes.html#str.endswith

And .replace still only takes a single target to be replaced.

[...]
We don't have to worry about edge cases that are unlikely to come up in real-world code,
And you are making that pronouncement on the basis of what? Your gut feeling?

Perhaps you're thinking too narrowly. Here's a partial list of English prefixes that somebody doing text processing might want to remove to get at the root word:

    a an ante anti auto circum co com con contra contro de dis
    en ex extra hyper il im in ir inter intra intro macro micro
    mono non omni post pre pro sub sym syn tele un uni up

I count fourteen clashes:

    a: an ante anti
    an: ante anti
    co: com con contra contro
    ex: extra
    in: inter intra intro
    un: uni

(That's over a third of this admittedly incomplete list of prefixes.)

I can think of at least one English suffix pair that clash: -ify, -fy.

How about other languages? How comfortable are you to say that nobody doing text processing in German or Hindi will need to deal with clashing affixes?

-- 
Steven