Rhodri James writes: Steven d'Aprano writes:
(That's over a third of this admittedly incomplete list of prefixes.)
I can think of at least one English suffix pair that clash: -ify, -fy.
And worse: is "tries" the third person present tense of "try" or is it the plural of "trie"? Pure lexical manipulation can't tell you.
You're beginning to persuade me that cut/trim methods/functions aren't a good idea :-)
I don't think I would go there yet (well, I started there, but...).
So far we have two slightly dubious use-cases.
1. Stripping file extensions. Personally I find that treating filenames like filenames (i.e. using os.path or (nowadays) pathlib) results in me thinking more appropriately about what I'm doing.
Very much agree.
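For illustration, a minimal sketch of the pathlib approach to stripping an extension. The helper name strip_extension and the file names are made up for this example; PurePosixPath is used only so the result does not depend on the platform.

```python
from pathlib import PurePosixPath


def strip_extension(filename):
    """Drop only the final extension, leaving the rest of the path intact."""
    return str(PurePosixPath(filename).with_suffix(""))


print(strip_extension("report.txt"))      # report
print(strip_extension("archive.tar.gz"))  # archive.tar
print(strip_extension("docs/notes.md"))   # docs/notes
print(strip_extension(".bashrc"))         # .bashrc (a leading dot is not an extension)
```

Treating the name as a path means cases like the dotfile are handled by the library rather than by an ad hoc string cut.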
2. Stripping prefixes and suffixes to get to root words.
    for suffix in english_suffixes:
        root = word.cutsuffix(suffix)
        if lookup_in_dictionary(root):
            do_something_appropriate_with_each_root_found()

is surely more flexible and accurate than a hard-coded slice, and significantly more readable than

    for suffix in english_suffixes:
        root = word[:-len(suffix)] if word.endswith(suffix) else word
        if lookup_in_dictionary(root):
            do_something_appropriate_with_each_root_found()

I think enough so that I might use a local def for cutsuffix if the method doesn't exist. So my feeling is that the use case for "or"-ing multiple suffixes is a lot weaker than it is for .endswith, but .cutsuffix itself is plausible.

That said, I wouldn't add it if it were up to me. Among other things, for this root-extracting application

    def extract_root(word, prefix, suffix):
        word = word[len(prefix):] if word.startswith(prefix) else word
        # guard against suffix == '', since word[:-0] would be ''
        word = word[:-len(suffix)] if suffix and word.endswith(suffix) else word
        # perhaps try further transforms like tri -> try here?
        return word

and a double loop

    for prefix in english_prefixes:      # includes ''
        for suffix in english_suffixes:  # includes ''
            root = extract_root(word, prefix, suffix)
            if lookup_in_dictionary(root):
                yield root

(probably recursive, as well) seems most elegant.
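A self-contained sketch of the double loop described above, using local helpers in place of the proposed methods. The helper names, the affix lists, and the toy dictionary are all hypothetical; a real application would need a proper word list and the further transforms mentioned above.

```python
def cutprefix(word, prefix):
    """Return word without prefix if it starts with it, else word unchanged."""
    if prefix and word.startswith(prefix):
        return word[len(prefix):]
    return word


def cutsuffix(word, suffix):
    """Return word without suffix if it ends with it, else word unchanged."""
    # The emptiness check matters: word[:-0] would be the empty string.
    if suffix and word.endswith(suffix):
        return word[:-len(suffix)]
    return word


def candidate_roots(word, prefixes, suffixes, dictionary):
    """Yield each distinct dictionary word found by cutting one prefix and one suffix."""
    seen = set()
    for prefix in prefixes:      # should include ''
        for suffix in suffixes:  # should include ''
            root = cutsuffix(cutprefix(word, prefix), suffix)
            if root in dictionary and root not in seen:
                seen.add(root)
                yield root


# Hypothetical toy data, for illustration only.
english_prefixes = ["", "un", "re"]
english_suffixes = ["", "ing", "ed", "s"]
dictionary = {"do", "undo", "doing", "lock"}

print(list(candidate_roots("undoing", english_prefixes, english_suffixes, dictionary)))
# ['undo', 'doing', 'do']
print(list(candidate_roots("unlocked", english_prefixes, english_suffixes, dictionary)))
# ['lock']
```

On Python 3.9 or later, str.removeprefix and str.removesuffix could stand in for the two helpers.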
3. My most common use case (not very common at that) is for stripping annoying prompts off text-based APIs. I'm happy using .startswith() and string slicing for that, though your point about the repeated use of the string to be stripped off (or worse, hard-coding its length) is well made.
I don't understand this use case, specifically the opposition to hard-coding the length. Although hard-coding the length wouldn't occur to me in many cases, since I'd use

    import re

    # remove my bash prompt
    prompt_re = re.compile(r'^[^\u0000-\u001f\u007f]+ \d\d:\d\d\$ ')
    lines = [prompt_re.sub('', line) for line in lines]

if I understand the task correctly. Similarly, there's a lot of regexp-removable junk in MTA logs, timestamps and DNS lookups for example, that can't be handled with cutprefix.
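To make the regexp approach concrete, here is a small sketch run on made-up input. The sample lines and the prompt shape (printable text, a space, HH:MM, a dollar sign, a space) are assumptions for illustration; strip_prompts is a hypothetical wrapper name.

```python
import re

# Assumed prompt shape: printable text, a space, HH:MM, a dollar sign, a space.
PROMPT_RE = re.compile(r'^[^\u0000-\u001f\u007f]+ \d\d:\d\d\$ ')


def strip_prompts(lines):
    """Remove a leading prompt from each line; other lines pass through unchanged."""
    return [PROMPT_RE.sub('', line) for line in lines]


sample = [
    "steve@host 09:15$ ls -l",
    "total 0",
    "steve@host 09:16$ echo done",
]
print(strip_prompts(sample))
# ['ls -l', 'total 0', 'echo done']
```

Because the prompt text varies from line to line here, neither a fixed prefix nor a hard-coded length would do the job.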
I am beginning to worry slightly that actually there are usually more appropriate things to do than simply cutting off affixes, and that in providing these particular batteries we might be encouraging poor practice.
I don't think that's a worry, at least if restricted to the single-affix form, because simply cutting off affixes is surely part of most such algorithms. The harder part is remembering that you probably have to deal with multiplicities and further transformations, but that can't be incentivized by refusing to implement .cutsuffix. It's an independent consideration.

Steve