[Python-ideas] New explicit methods to trim strings

David Mertz mertz at gnosis.cx
Mon Apr 1 21:34:21 EDT 2019

On Mon, Apr 1, 2019, 8:54 PM Steven D'Aprano <steve at pearwood.info> wrote:

> The point I am making is not that we must not ever support multiple
> affixes, but that we shouldn't rush that decision. Let's pick the
> low-hanging fruit, and get some real-world experience with the function
> before deciding how to handle the multiple affix case.

There are exactly two string methods that currently deal specifically with
affixes: startswith() and endswith(). Both of those allow specifying
multiple affixes. That's pretty strong real-world experience, and breaking
the symmetry for no reason is merely confusing, especially since the
multiple-affix option would obviously be just as commonly useful here.
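For readers who haven't used it: both existing methods accept either a single string or a tuple of strings, and return True if any element matches. A minimal illustration (the URL and file name are made up):

```python
url = "https://example.org/index.html"

# A tuple means "does it match any of these?"
starts_with_scheme = url.startswith(("http://", "https://"))
is_text_file = "notes.txt".endswith((".md", ".txt"))

print(starts_with_scheme)  # True
print(is_text_file)        # True
```

Note that it has to be a tuple: passing a list raises TypeError.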

Now look, the sky won't fall if a single-affix-only method is added. For
that matter, it won't if nothing is added. In fact, the single affix
version makes it a little bit easier to write a custom function handling
multiple affixes.
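To illustrate building on single-affix behaviour: a sketch with two hypothetical helpers (neither is a stdlib function). `remove_prefix` strips one prefix once; `remove_any_prefix` loops over candidates and strips at most one. It assumes the first match in the given order wins when prefixes clash, which is a design choice, not settled semantics.

```python
from typing import Sequence


def remove_prefix(s: str, prefix: str) -> str:
    """Hypothetical single-affix helper: strip prefix once if present."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s


def remove_any_prefix(s: str, prefixes: Sequence[str]) -> str:
    """Hypothetical multi-affix helper built on remove_prefix.

    Tries each prefix in order and strips only the first one that matches.
    """
    for prefix in prefixes:
        stripped = remove_prefix(s, prefix)
        if stripped != s:  # a non-empty prefix matched
            return stripped
    return s


print(remove_any_prefix("https://example.org", ("http://", "https://")))
# example.org
```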

And the sky won't fall if the remove-just-one semantics are used rather
than remove-from-class.
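The "class" wording echoes how the existing str.lstrip() already behaves: its argument is a set of characters, not a prefix, which is a common surprise. A small comparison against remove-just-one semantics (`strip_once` is a hypothetical helper, not a stdlib function):

```python
def strip_once(s: str, prefix: str) -> str:
    """Hypothetical remove-just-one helper: drop the literal prefix once."""
    return s[len(prefix):] if prefix and s.startswith(prefix) else s


url = "https://python.org"

# lstrip treats "https://" as the character set {h, t, p, s, :, /}
# and keeps stripping while the next character is in that set.
char_class = url.lstrip("https://")

just_once = strip_once(url, "https://")

print(char_class)  # ython.org
print(just_once)   # python.org
```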

But adding methods with sneakily helpful capabilities often helps users
greatly. A lot of folks in this thread didn't even know about passing a
tuple to str.startswith() a few days ago. I'm pretty sure that capability
was added by Raymond, who has an amazingly good sense of what little tricks
can prove really powerful. Apologies to the actual developer if it wasn't
him; if it was, congrats and thanks to you.

> Somebody (I won't name names, but they know who they are) wrote to me
> off-list some time ago and accused me of being arrogant and thinking I know
> more than everyone else. Well perhaps I am, but I'm not so arrogant as to
> think that I can choose the right behaviour for clashing affixes for other
> people when my own use-cases don't have clashing affixes.

That could be me... unless it's someone else :-). I think my intent was a
bit different from how you characterize it, but I'm very guilty of
presuming too much as well. So mea culpa.

> > Sure, but I've often wanted to do something like "strip off a prefix
> > of http:// or https://", or something else that doesn't have a
> > semantic that's known to the stdlib.
> I presume there's a reason you aren't using urllib.parse and you just need
> a string without the leading scheme. If you're doing further parsing, the
> stdlib has the right batteries for that.

I know there are lots of specialized string manipulations in the stdlib.
Yeah, I could use os.path.splitext, and os.path.split, and
urllib.parse.something, and lots of other things I rarely use. A lot of us
like to manipulate strings in generically stringy ways.
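For comparison, a sketch of one urllib.parse route for the http/https case versus the plain string route. urlsplit() does separate out the scheme, but rebuilding the remainder with urlunsplit() keeps the leading "//", so "the string without the scheme" still takes some string work. The URL is made up.

```python
from urllib.parse import urlsplit, urlunsplit

url = "https://example.org/docs?page=2"
parts = urlsplit(url)

# Rebuild with an empty scheme; the netloc still gets its "//" marker.
without_scheme = urlunsplit(("",) + tuple(parts[1:]))

# The generically stringy route, using the tuple form of startswith().
prefix = next((p for p in ("http://", "https://") if url.startswith(p)), "")
stringy = url[len(prefix):]

print(parts.scheme)    # https
print(without_scheme)  # //example.org/docs?page=2
print(stringy)         # example.org/docs?page=2
```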

> But not until we had a couple of releases of experience with them:
> https://docs.python.org/2.7/library/stdtypes.html#str.endswith

Ok. Fair point. I used Python 2.4 without the multiple affix option.

> Here's a partial list of English prefixes that somebody doing text
> processing might want to remove to get at the root word:
>     a an ante anti auto circum co com con contra contro de dis
>     en ex extra hyper il im in ir inter intra intro macro micro
>     mono non omni post pre pro sub sym syn tele un uni up
> I count fourteen clashes:
>     a: an ante anti
>     an: ante anti
>     co: com con contra contro
>     ex: extra
>     in: inter intra intro
>     un: uni

This seems like a good argument for remove-all-from-class. :-)

    stem = word.lstrip(prefix_tup)

But then we really need 'word.porter_stemmer()' as a built-in method.
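str.lstrip() does not accept a tuple today (it raises TypeError), so `word.lstrip(prefix_tup)` is proposed syntax. A sketch emulating remove-all-from-class semantics with a hypothetical `lstrip_affixes` function, which keeps removing a matching prefix until none matches. It assumes longest-first matching to resolve clashes; the thread had not settled that.

```python
from typing import Sequence


def lstrip_affixes(word: str, prefixes: Sequence[str]) -> str:
    """Hypothetical emulation of word.lstrip(prefix_tup) for whole prefixes.

    Repeatedly removes the longest matching prefix until none matches,
    mirroring how str.lstrip() repeatedly removes characters.
    """
    # Longest first, so "anti" wins over shorter clashing prefixes.
    # Empty strings are dropped, since they would match forever.
    ordered = sorted((p for p in prefixes if p), key=len, reverse=True)
    while True:
        for p in ordered:
            if word.startswith(p):
                word = word[len(p):]
                break
        else:  # no prefix matched on this pass
            return word


prefix_tup = ("anti", "dis", "un")
stem = lstrip_affixes("antidisestablishment", prefix_tup)
print(stem)  # establishment
```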