On Mon, Apr 1, 2019, 8:54 PM Steven D'Aprano <steve@pearwood.info> wrote:
> The point I am making is not that we must not ever support multiple affixes, but that we shouldn't rush that decision. Let's pick the low-hanging fruit, and get some real-world experience with the function before deciding how to handle the multiple affix case.

There are exactly two string methods that currently deal specifically with affixes: startswith and endswith. Both of those allow specifying multiple affixes. That's pretty strong real-world experience, and breaking the symmetry for no reason is merely confusing, especially since the consistent behaviour would obviously be just as commonly useful.

Now look, the sky won't fall if a single-affix-only method is added. For that matter, it won't if nothing is added. In fact, the single-affix version makes it a little bit easier to write a custom function handling multiple affixes. And the sky won't fall if the remove-just-one semantics are used rather than remove-from-class.

But adding methods with sneakily helpful capabilities often helps users greatly. A lot of folks in this thread didn't even know about passing a tuple to str.startswith() a few days ago. I'm pretty sure that capability was added by Raymond, who has an amazingly good sense of what little tricks can prove really powerful. Apologies to a different developer if it wasn't him, but congrats and thanks to you if so.

> Somebody (I won't name names, but they know who they are) wrote to me off-list some time ago and accused me of being arrogant and thinking I know more than everyone else. Well perhaps I am, but I'm not so arrogant as to think that I can choose the right behaviour for clashing affixes for other people when my own use-cases don't have clashing affixes.

That could be me... Unless it's someone else :-). I think my intent was a bit different than you characterize, but I'm very guilty of presuming too much also. So mea culpa.

> > Sure, but I've often wanted to do something like "strip off a prefix of http:// or https://", or something else that doesn't have a semantic that's known to the stdlib.
>
> I presume there's a reason you aren't using urllib.parse and you just need a string without the leading scheme. If you're doing further parsing, the stdlib has the right batteries for that.

I know there are lots of specialized string manipulations in the stdlib. Yeah, I could use os.path.splitext, and os.path.split, and urllib.parse.something, and lots of other things I rarely use. A lot of us like to manipulate strings in generically stringy ways.

> But not until we had a couple of releases of experience with them:
>
> https://docs.python.org/2.7/library/stdtypes.html#str.endswith

Ok. Fair point. I used Python 2.4 without the multiple affix option.

> Here's a partial list of English prefixes that somebody doing text processing might want to remove to get at the root word:
>
>     a an ante anti auto circum co com con contra contro de dis en ex extra hyper il im in ir inter intra intro macro micro mono non omni post pre pro sub sym syn tele un uni up
>
> I count fourteen clashes:
>
>     a: an ante anti
>     an: ante anti
>     co: com con contra contro
>     ex: extra
>     in: inter intra intro
>     un: uni

This seems like a good argument for remove-all-from-class. :-)

    stem = word.lstrip(prefix_tup)

But then we really need 'word.porter_stemmer()' as a built-in method.
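
For concreteness, here is a rough sketch of the two semantics being debated. Today's str.lstrip takes a string of characters rather than a tuple of prefixes, so the one-liner above is a wish, not working code. The helper names remove_one_prefix and remove_all_prefixes are hypothetical, and resolving clashes by preferring the longest match is just one assumption among several possible ones; nothing in this thread has settled that choice.

```python
# Prefixes taken from the clash list quoted above (a subset of the full list).
PREFIXES = (
    "a", "an", "ante", "anti", "co", "com", "con", "contra", "contro",
    "ex", "extra", "in", "inter", "intra", "intro", "un", "uni",
)


def remove_one_prefix(word, prefixes):
    """Remove at most one prefix ("remove-just-one" semantics).

    Assumption: when several prefixes match, the longest one wins.
    """
    for prefix in sorted(prefixes, key=len, reverse=True):
        if prefix and word.startswith(prefix):
            return word[len(prefix):]
    return word


def remove_all_prefixes(word, prefixes):
    """Keep removing prefixes until none matches ("remove-from-class")."""
    # Drop empty strings so that every loop iteration shortens the word.
    prefixes = tuple(p for p in prefixes if p)
    # str.startswith already accepts a tuple of candidate prefixes.
    while word.startswith(prefixes):
        word = remove_one_prefix(word, prefixes)
    return word


print(remove_one_prefix("https://example.org", ("http://", "https://")))  # example.org
print(remove_one_prefix("antisocial", PREFIXES))  # social
print(remove_all_prefixes("coconut", ("co",)))    # nut
```

The last call shows why repeated removal is a blunt tool for finding root words: "coconut" loses "co" twice, which is roughly the point of the porter_stemmer() quip.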