[Python-ideas] New explicit methods to trim strings

Steven D'Aprano steve at pearwood.info
Sun Mar 31 21:34:27 EDT 2019


On Sun, Mar 31, 2019 at 08:23:05PM -0400, David Mertz wrote:
> On Sun, Mar 31, 2019, 8:11 PM Steven D'Aprano <steve at pearwood.info> wrote:
> 
> > Regarding later proposals to add support for multiple affixes, to
> > recursively delete the affix repeatedly, and to take an additional
> > argument to limit how many affixes will be removed: YAGNI.
> >
> 
> That's simply not true, and I think it's clearly illustrated by the example
> I gave a few times. Not just conceivably, but FREQUENTLY I write code to
> accomplish the effect of the suggested:
> 
>   basename = fname.rstrip(('.jpg', '.gif', '.png'))
> 
> I probably do this MORE OFTEN than removing a single suffix.

Okay.

Yesterday, you stated that you didn't care what the behaviour was for 
the multiple affix case. You made it clear that "any" semantics would be 
okay with you so long as it was documented. You seemed to feel so 
strongly about your indifference that you mentioned it in two separate 
emails.

That doesn't sound like someone who has a clear use-case in mind. If 
you're doing this frequently, then surely one of the following two 
alternatives applies:

(1) One specific behaviour makes sense for all or a majority of your 
use-cases, in which case you would prefer that behaviour rather than 
something that you can't use.

(2) Or there is no single useful behaviour that you want, perhaps all or 
a majority of your use-cases are different, and you'll usually need to 
write your own helper function to suit your own usage, no matter what 
the builtin behaviour is. Hence you don't care what the builtin 
behaviour is.

Since you have no preferred behaviour, either you don't do this often 
enough to care (but above you say differently), or you are going to have 
to write your own helpers because the behaviour you need won't match the 
behaviour of the builtin. And you clearly don't mind this, because you 
stated twice that you don't care what the builtin behaviour is.

So why rush to handle the multiple argument case?

"YAGNI" is a misnomer, because it doesn't actually mean "you aren't 
(ever) going to need it". It means (generic) you don't need it *now*, 
but when you do, you can come back and revisit the design with concrete 
use-cases in mind.

That's all I'm saying.

For 29 years, we've done without this string primitive, and as a 
consequence the forums are full of examples of people misusing strip and 
getting it wrong. There's a clear case for the single argument version, 
and fixing that is the 90% solution.
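
To illustrate the kind of misuse I mean, here is a small sketch (the 
file name is made up for the example). rstrip treats its argument as a 
set of characters to remove, not as a suffix:

```python
# str.rstrip removes any trailing characters found in the argument,
# so ".py" here means "strip trailing '.', 'p' and 'y' characters".
fname = "happy.py"
stripped = fname.rstrip(".py")
print(stripped)  # prints "ha", not "happy"
```

It sometimes gives the right answer by accident, which is exactly why 
the mistake is so common.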

In comparison, we've been discussing this multiple affix feature for, 
what, a week?

Lacking a good set of semantics for removing multiple affixes at once, 
we shouldn't rush to guess what people want. You don't even know what 
behaviour YOU want, let alone what the community as a whole needs.

You won't be any worse off than you are now. You'll probably be better 
off, because you can use the single-affix version as the basic 
primitive, and build on top of that, instead of the incorrect version 
you currently use in an ad hoc manner:

    basename = fname.split(".ext")[0]
    # replace with fname.cut_suffix(".ext")

Others have already pointed out why the split version is incorrect.
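
For anyone who missed it, a minimal sketch of the difference. The 
cut_suffix function below is only my assumption of what the proposed 
method might do, written as a plain helper; the file names are invented:

```python
def cut_suffix(s, suffix):
    """Hypothetical helper: drop suffix from the end of s if present."""
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s

# split cuts at the FIRST occurrence of ".ext", wherever it appears.
fname = "photo.ext.backup.ext"
split_result = fname.split(".ext")[0]    # "photo"
cut_result = cut_suffix(fname, ".ext")   # "photo.ext.backup"

# split also cuts when ".ext" is not a suffix at all.
other = "notes.extra.txt"
split_other = other.split(".ext")[0]     # "notes"
cut_other = cut_suffix(other, ".ext")    # "notes.extra.txt" (unchanged)
```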

For the use-case of stripping a single file extension out of a set of 
such extensions, while leaving all others, there's an obvious solution:

    import os

    if fname.endswith(('.jpg', '.png', '.gif')):
        basename = os.path.splitext(fname)[0]
    else:
        # Any other extension stays with the base.
        # (Presumably to be handled separately?)
        basename = fname

But a more general solution needs to decide on two issues:

- given two affixes where one is an affix of the other, which wins? e.g.

  "abcd".cut_prefix(("a", "ab")) 
  # should this return "bcd" or "cd"?

- once you remove an affix, should you stop processing or continue?

  "ab".cut_prefix(("a", "b"))
  # should this return "b" or ""?
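
To make the two questions concrete, here is a rough sketch of three 
plausible semantics. None of these is a proposal; the function names 
are hypothetical and chosen only for this illustration:

```python
def cut_first_match(s, prefixes):
    """Remove the first listed prefix that matches, then stop."""
    for p in prefixes:
        if p and s.startswith(p):
            return s[len(p):]
    return s

def cut_longest_match(s, prefixes):
    """Remove the longest matching prefix, then stop."""
    matches = [p for p in prefixes if p and s.startswith(p)]
    if not matches:
        return s
    return s[len(max(matches, key=len)):]

def cut_repeatedly(s, prefixes):
    """Keep removing the first matching prefix until none matches."""
    while True:
        shorter = cut_first_match(s, prefixes)
        if shorter == s:
            return s
        s = shorter

# Question 1: which prefix wins when one is a prefix of the other?
print(cut_first_match("abcd", ("a", "ab")))    # bcd
print(cut_longest_match("abcd", ("a", "ab")))  # cd

# Question 2: stop after one removal, or continue?
print(cut_first_match("ab", ("a", "b")))       # b
print(repr(cut_repeatedly("ab", ("a", "b"))))  # ''
```

All three are defensible, and each gives a different answer for the 
same call, which is the whole problem.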

The startswith and endswith methods don't suffer from this problem, for 
obvious reasons. We shouldn't add a problematic, ambiguous feature just 
for consistency with methods where it is not problematic or ambiguous.

I posted links to prior art. Unless I missed something, not one of those 
languages or libraries supports multiple affixes in the one call.

Don't let the perfect be the enemy of the good. In this case, a 90% 
solution will let us fix real problems and meet real needs, and we can 
always revisit the multiple affix case once we have more experience and 
have time to build a consensus based on actual use-cases.


-- 
Steven

