On 02/04/2019 01:52, Steven D'Aprano wrote:
> Here's a partial list of English prefixes that somebody doing text processing might want to remove to get at the root word:
>
> a an ante anti auto circum co com con contra contro de dis en ex extra hyper il im in ir inter intra intro macro micro mono non omni post pre pro sub sym syn tele un uni up
>
> I count fourteen clashes:
>
>     a: an ante anti
>     an: ante anti
>     co: com con contra contro
>     ex: extra
>     in: inter intra intro
>     un: uni
>
> (That's over a third of this admittedly incomplete list of prefixes.)
>
> I can think of at least one English suffix pair that clash: -ify, -fy.
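
To make the clash problem concrete, here is a minimal Python sketch. The helper names (clashes, cut_first_match, cut_longest_match) are hypothetical and written only for illustration; they are not part of any proposal in this thread. The sketch finds which listed prefixes are themselves prefixes of longer ones, then shows how the result of cutting depends on which match is tried first.

```python
# Prefixes copied from the list quoted above.
PREFIXES = """a an ante anti auto circum co com con contra contro de dis en ex
extra hyper il im in ir inter intra intro macro micro mono non omni post pre
pro sub sym syn tele un uni up""".split()


def clashes(prefixes):
    """Map each prefix to the longer listed prefixes that start with it."""
    found = {}
    for short in prefixes:
        longer = [p for p in prefixes if p != short and p.startswith(short)]
        if longer:
            found[short] = longer
    return found


def cut_first_match(word, prefixes):
    """Naively remove the first listed prefix that matches (hypothetical helper)."""
    for p in prefixes:
        if word.startswith(p):
            return word[len(p):]
    return word


def cut_longest_match(word, prefixes):
    """Remove the longest matching prefix instead (hypothetical helper)."""
    for p in sorted(prefixes, key=len, reverse=True):
        if word.startswith(p):
            return word[len(p):]
    return word


print(clashes(PREFIXES)["un"])                     # ['uni']
print(cut_first_match("antisocial", PREFIXES))     # ntisocial
print(cut_longest_match("antisocial", PREFIXES))   # social
print(cut_longest_match("uncle", PREFIXES))        # cle
```

A mechanical check like this is stricter than a hand tally: it also reports, for example, that "a" is a prefix of "auto". The last line shows that even longest-match cutting happily mangles words that merely look prefixed.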
You're beginning to persuade me that cut/trim methods/functions aren't a good idea :-)

So far we have two slightly dubious use-cases.

1. Stripping file extensions. Personally I find that treating filenames like filenames (i.e. using os.path or (nowadays) pathlib) results in me thinking more appropriately about what I'm doing.

2. Stripping prefixes and suffixes to get to root words. Python has been used for natural language work for over a decade, and I don't think I've heard any great call from linguists for the functionality. English isn't a girl who puts out like that on a first date :-) There are too many common exception cases for such a straightforward approach not to cause confusion.

3. My most common use case (not very common at that) is for stripping annoying prompts off text-based APIs. I'm happy using .startswith() and string slicing for that, though your point about the repeated use of the string to be stripped off (or worse, hard-coding its length) is well made.

I am beginning to worry slightly that actually there are usually more appropriate things to do than simply cutting off affixes, and that in providing these particular batteries we might be encouraging poor practice.

-- 
Rhodri James *-* Kynesim Ltd
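
For reference, points 1 and 3 above can be sketched in Python as follows. This is an illustration under stated assumptions, not anyone's proposed API: the prompt string "device> " and the helper cut_prefix are made up for the example. The first half contrasts a common string-method mistake with the os.path and pathlib tools; the second half shows the .startswith() plus slicing idiom and a small wrapper that avoids repeating the prefix or hard-coding its length.

```python
import os.path
from pathlib import PurePosixPath

# Point 1: let the path tools find the extension.
name = "buzz.gz"
print(name.rstrip(".gz"))                        # bu  (rstrip removes a set of characters, not a suffix)
print(os.path.splitext(name)[0])                 # buzz
print(PurePosixPath(name).stem)                  # buzz
print(PurePosixPath("archive.tar.gz").suffixes)  # ['.tar', '.gz']

# Point 3: cutting a prompt off a line of text.
PROMPT = "device> "  # hypothetical prompt, assumed for this example


def cut_prefix(text, prefix):
    """Return text without prefix if it starts with it, else text unchanged.

    Hypothetical helper: it names the prefix once and computes its length.
    """
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


line = "device> status OK"

# The idiom from point 3: the prefix is mentioned twice.
reply = line[len(PROMPT):] if line.startswith(PROMPT) else line
print(reply)                                     # status OK

# The wrapper gives the same result and leaves other lines alone.
print(cut_prefix(line, PROMPT))                  # status OK
print(cut_prefix("no prompt here", PROMPT))      # no prompt here
```

Python 3.9 later added str.removeprefix() and str.removesuffix(), which behave like cut_prefix for a single affix.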