[Python-ideas] New explicit methods to trim strings

Dan Sommers 2QdxY4RzWzUUiLuE at potatochowder.com
Mon Apr 1 22:10:08 EDT 2019


On 4/1/19 9:34 PM, David Mertz wrote:
> On Mon, Apr 1, 2019, 8:54 PM Steven D'Aprano <steve at pearwood.info> wrote:
> 
>> The point I am making is not that we must not ever support multiple
>> affixes, but that we shouldn't rush that decision. Let's pick the
>> low-hanging fruit, and get some real-world experience with the function
>> before deciding how to handle the multiple affix case.
>>
> 
> There are exactly two string methods that currently deal specifically
> with affixes: startswith and endswith. Both of those allow specifying
> multiple affixes. That's pretty strong real-world experience, and breaking
> the symmetry for no reason is merely confusing. Especially since the
> consistency would obviously be just as commonly useful.

My imagination is failing me:  for multiple affixes
(affices?), what is a use case for removing one, but not
having the function return which one?  In other words,
shouldn't a function that removes multiple affixes also
return which one(s) were removed?  I think I'm agreeing
with Steven:  take the low-hanging fruit now, and worry
about complexification later (because I'm not sure that the
existing API is good when removing multiple affixes).
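
To make the question concrete, here is a rough sketch of a helper
that removes at most one of several suffixes and also reports which
one it removed.  The name cut_suffix and its return convention are
made up for illustration; nothing like it exists on str, and it is
not the API proposed in this thread.  The only real method it relies
on is str.endswith.

```python
def cut_suffix(text, suffixes):
    """Remove at most one suffix from text.

    Hypothetical helper: returns (remainder, removed), where removed
    is the suffix that matched, or None if nothing matched.
    """
    if isinstance(suffixes, str):
        suffixes = (suffixes,)
    for suffix in suffixes:
        # Skip empty suffixes: text[:-0] would return '' by mistake.
        if suffix and text.endswith(suffix):
            return text[:-len(suffix)], suffix
    return text, None


stem, removed = cut_suffix("photo.png", (".png", ".jpg"))
```

With this shape the caller gets both pieces of information in one
call: stem is "photo" and removed is ".png".  A version that returned
only the stem would force the caller to test the suffixes again.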

Stemming is hard, because a lot of words begin/end with
common affixes, but that string of letters isn't always an
affix.  For example, removing common prefixes from "relay"
leaves "lay," but that's not the root; similarly with "relax"
and "area."  If my algorithm is "look for the word in a list
of known words, if it's there then great, but if it's not
then remove one affix and try again," then I don't want to
remove all the affixes at once.
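
The lookup-then-strip loop described above might be sketched like
this.  The word list and prefix list are toy data, and find_root is
an invented name; a real stemmer would need far more than this.

```python
KNOWN_WORDS = {"relay", "lay", "play", "do"}   # toy dictionary (assumption)
PREFIXES = ("re", "un")                        # toy prefix list (assumption)


def find_root(word, known=KNOWN_WORDS, prefixes=PREFIXES):
    """Look the word up first; only if that fails, strip ONE prefix and retry."""
    while True:
        if word in known:
            return word
        for prefix in prefixes:
            # Never strip a prefix that would leave an empty string.
            if word.startswith(prefix) and len(word) > len(prefix):
                word = word[len(prefix):]
                break
        else:
            # No prefix applied and the word is unknown: give up.
            return None
```

Because the lookup happens before each removal, find_root("relay")
returns "relay" untouched, while find_root("replay") falls through to
"play" and find_root("unredo") peels off "un" and then "re" to reach
"do".  Stripping every matching prefix in one go would lose that
check between steps.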

When removing extensions from filenames, all of my use cases
involve removing one at a time and acting on the one that
was removed.  For example, decompressing foo.tar.gz into
foo.tar, and then untarring foo.tar into foo.  I suppose I
can imagine removing .tar.gz and then decompressing and
untarring in one step, but again, then I have to know which
suffixes were removed.  Or maybe I could process foo.tar.gz
and want to end up with foo.norm (bonus points for
recognizing the XKCD reference), but my personal preference
would still be to produce foo.tar.gz.norm by default and let
the user specify the ultimate filename if they want something
else.
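
For the one-at-a-time filename case, the standard library's
os.path.splitext already returns the extension it split off, which is
the property I keep coming back to.  Below is a sketch that only
plans the steps rather than touching any files; the ACTIONS table and
the plan_steps name are made up for illustration.

```python
import os.path

# Hypothetical mapping from extension to the action it calls for.
ACTIONS = {".gz": "decompress", ".tar": "untar"}


def plan_steps(filename, actions=ACTIONS):
    """Peel off one known extension at a time, recording what each one implies.

    Returns (final_name, steps), where each step is
    (action, input_name, output_name).
    """
    steps = []
    while True:
        stem, ext = os.path.splitext(filename)
        action = actions.get(ext)
        if action is None:
            # Unknown or missing extension: nothing more to remove.
            return filename, steps
        steps.append((action, filename, stem))
        filename = stem


final_name, steps = plan_steps("foo.tar.gz")
```

Here final_name is "foo", and steps records a "decompress" from
foo.tar.gz to foo.tar followed by an "untar" from foo.tar to foo.
Each iteration acts on the suffix that was just removed, which a
remove-any-of-these call that returns only the stem could not support
without a second check.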

So I've seen someone (likely David Mertz?) ask for something
like filename.strip_suffix(('.png', '.jpg')).  What is the
context?  Is it strictly a filename processing program?  Do
you subsequently have to determine the suffix(es) at hand?

