On 4/1/19 9:34 PM, David Mertz wrote:
> On Mon, Apr 1, 2019, 8:54 PM Steven D'Aprano <steve@pearwood.info> wrote:
>> The point I am making is not that we must not ever support multiple affixes, but that we shouldn't rush that decision. Let's pick the low-hanging fruit, and get some real-world experience with the function before deciding how to handle the multiple affix case.
> There are exactly two methods of strings that deal specifically with affixes currently. Startswith and endswith. Both of those allow specifying multiple affixes. That's pretty strong real-world experience, and breaking the symmetry for no reason is merely confusing. Especially since the consistency would be obviously as commonly useful.
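
For reference, here is a quick illustration of the existing behavior described above. The filename and URL are made up for the example; the only thing taken from the language itself is that str.startswith and str.endswith accept a tuple of candidates.

```python
# Both methods accept a tuple of candidate affixes.
filename = "photo.jpg"            # hypothetical example value
url = "https://example.org/"      # hypothetical example value

is_image = filename.endswith((".png", ".jpg"))
has_scheme = url.startswith(("http://", "https://"))

# Each call returns only a bool; neither reports which candidate matched.
print(is_image, has_scheme)
```

Note that the result says that something matched, not which affix it was.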
My imagination is failing me: for multiple affixes (affices?), what is a use case for removing one, but not having the function return which one? In other words, shouldn't a function that removes multiple affixes also return which one(s) were removed?

I think I'm agreeing with Steven: take the low-hanging fruit now, and worry about complexification later (because I'm not sure that the existing API is good when removing multiple affixes).

Stemming is hard, because a lot of words begin/end with common affixes, but that string of letters isn't always an affix. For example, removing common prefixes from "relay" leaves "lay," but that's not the root; similarly with "relax" and "area." If my algorithm is "look for the word in a list of known words, if it's there then great, but if it's not then remove one affix and try again," then I don't want to remove all the affixes at once.

When removing extensions from filenames, all of my use cases involve removing one at a time and acting on the one that was removed. For example, decompressing foo.tar.gz into foo.tar, and then untarring foo.tar into foo. I suppose I can imagine removing tar.gz and then decompressing and untarring in one step, but again, then I have to know which suffixes were removed. Or maybe I could process foo.tar.gz and want to end up with foo.norm (bonus points for recognizing the XKCD reference), but my personal preference would still be to produce foo.tar.gz.norm by default and let the user specify the ultimate filename if they want something else.

So I've seen someone (likely David Mertz?) ask for something like filename.strip_suffix(('.png', '.jpg')). What is the context? Is it strictly a filename processing program? Do you subsequently have to determine the suffix(es) at hand?
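
To make the "return which one was removed" idea concrete, here is a rough sketch of what I have in mind. Nothing in it is an existing or proposed API: strip_suffix is written here as a plain function with a made-up signature, and peel_extensions and its default list of extensions are purely illustrative.

```python
from typing import Iterable, Iterator, Optional, Tuple


def strip_suffix(text: str, suffixes: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: remove at most one suffix.

    Returns (remainder, removed_suffix), or (text, None) if nothing matched.
    """
    for suffix in suffixes:
        # Skip empty suffixes, which would "match" without removing anything.
        if suffix and text.endswith(suffix):
            return text[: -len(suffix)], suffix
    return text, None


def peel_extensions(
    filename: str, known: Tuple[str, ...] = (".gz", ".tar")
) -> Iterator[Tuple[str, str]]:
    """Hypothetical driver: strip one known extension per step.

    Yields (remaining_name, removed_extension) so the caller can act on
    each removed extension (decompress, untar, ...) before the next step.
    """
    name = filename
    while True:
        name, removed = strip_suffix(name, known)
        if removed is None:
            return
        yield name, removed


steps = list(peel_extensions("foo.tar.gz"))
print(steps)  # [('foo.tar', '.gz'), ('foo', '.tar')]
```

The caller sees each removed extension in turn and can decide what to do with it, which is exactly the information a function returning only the stripped string would throw away.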