[Python-ideas] New explicit methods to trim strings
Steven D'Aprano
steve at pearwood.info
Sun Mar 31 21:34:27 EDT 2019
On Sun, Mar 31, 2019 at 08:23:05PM -0400, David Mertz wrote:
> On Sun, Mar 31, 2019, 8:11 PM Steven D'Aprano <steve at pearwood.info> wrote:
>
> > Regarding later proposals to add support for multiple affixes, to
> > recursively delete the affix repeatedly, and to take an additional
> > argument to limit how many affixes will be removed: YAGNI.
> >
>
> That's simply not true, and I think it's clearly illustrated by the example
> I gave a few times. Not just conceivably, but FREQUENTLY I write code to
> accomplish the effect of the suggested:
>
> basename = fname.rstrip(('.jpg', '.gif', '.png'))
>
> I probably do this MORE OFTEN than removing a single suffix.
Okay.
Yesterday, you stated that you didn't care what the behaviour was for
the multiple affix case. You made it clear that "any" semantics would be
okay with you so long as it was documented. You seemed to feel so
strongly about your indifference that you mentioned it in two separate
emails.
That doesn't sound like someone who has a clear use-case in mind. If
you're doing this frequently, then surely one of the following two
alternatives applies:
(1) One specific behaviour makes sense for all or a majority of your
use-cases, in which case you would prefer that behaviour rather than
something that you can't use.
(2) Or there is no single useful behaviour that you want, perhaps all or
a majority of your use-cases are different, and you'll usually need to
write your own helper function to suit your own usage, no matter what
the builtin behaviour is. Hence you don't care what the builtin
behaviour is.
Since you have no preferred behaviour, either you don't do this often
enough to care (but above you say differently), or you are going to have
to write your own helpers because the behaviour you need won't match the
behaviour of the builtin. And you clearly don't mind this, because you
stated twice that you don't care what the builtin behaviour is.
So why rush to handle the multiple argument case?
"YAGNI" is a misnomer, because it doesn't actually mean "you aren't
(ever) going to need it". It means (generic) you don't need it *now*,
but when you do, you can come back and revisit the design with concrete
use-cases in mind.
That's all I'm saying.
For 29 years, we've done without this string primitive, and as a
consequence the forums are full of examples of people misusing strip and
getting it wrong. There's a clear case for the single argument version,
and fixing that is the 90% solution.
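A short illustration of the usual strip mistake may help here. The argument to `str.rstrip` is a set of characters to remove, not a suffix. With a name like `photo.jpg` the error goes unnoticed, because the character just before the extension happens not to be in that set. The file names below are made up for the example.

```python
fname = "pig.jpg"

# str.rstrip treats its argument as a SET of characters: it keeps removing
# trailing characters while they are any of ".", "j", "p" or "g".
stripped = fname.rstrip(".jpg")
print(stripped)  # -> pi   (the "g" of "pig" is removed as well)

# What was intended: remove the exact suffix ".jpg" once, if present.
suffix = ".jpg"
base = fname[:-len(suffix)] if fname.endswith(suffix) else fname
print(base)  # -> pig
```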
In comparison, we've been discussing this multiple affix feature for,
what, a week?
Lacking a good set of semantics for removing multiple affixes at once,
we shouldn't rush to guess what people want. You don't even know what
behaviour YOU want, let alone what the community as a whole needs.
You won't be any worse off than you are now. You'll probably be better
off, because you can use the single-affix version as the basic
primitive, and build on top of that, instead of the incorrect version
you currently use in an ad hoc manner:
    basename = fname.split(".ext")[0]
    # replace with fname.cut_suffix(".ext")
Others have already pointed out why the split version is incorrect.
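For readers who missed those messages, here is a minimal sketch of the difference. Because `cut_suffix` is only a proposal, it is modelled below as a plain function. Its semantics (remove the suffix once if present, otherwise return the string unchanged) are an assumption based on this thread, and `split_version` is a hypothetical name for the ad hoc idiom.

```python
def cut_suffix(s: str, suffix: str) -> str:
    """Hypothetical stand-in for the proposed str.cut_suffix method.

    Removes `suffix` from the end of `s` exactly once, if present;
    otherwise returns `s` unchanged.
    """
    # Guard against the empty suffix: s[:-0] would be s[:0], i.e. "".
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


def split_version(fname: str) -> str:
    # The ad hoc idiom: keeps everything before the FIRST ".ext",
    # wherever it occurs in the name, not just at the end.
    return fname.split(".ext")[0]


for name in ["report.ext", "a.extended.ext", "notes.ext.ext"]:
    print(f"{name!r}: split -> {split_version(name)!r}, "
          f"cut_suffix -> {cut_suffix(name, '.ext')!r}")
```

Both agree on `report.ext`, but the split idiom turns `a.extended.ext` into `'a'` and `notes.ext.ext` into `'notes'`, whereas the single-suffix version gives `'a.extended'` and `'notes.ext'`.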
For the use-case of stripping a single file extension out of a set of
such extensions, while leaving all others, there's an obvious solution:
    import os

    if fname.endswith(('.jpg', '.png', '.gif')):
        basename = os.path.splitext(fname)[0]
    else:
        # Any other extension stays with the base.
        # (Presumably to be handled separately?)
        basename = fname
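The same job can also be done by building on a single-affix primitive rather than on `os.path.splitext`. The sketch below again models the proposed method as a hypothetical `cut_suffix` function. The helper name `strip_image_extension` and its default extension tuple are illustrative assumptions.

```python
def cut_suffix(s: str, suffix: str) -> str:
    """Hypothetical stand-in for the proposed single-affix method."""
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


def strip_image_extension(fname: str,
                          exts: tuple = (".jpg", ".png", ".gif")) -> str:
    """Remove one extension from `exts`, leaving any other extension alone.

    The first match wins and only one suffix is removed. For this particular
    set the order does not matter: the extensions are distinct and all the
    same length, so a name can end with at most one of them.
    """
    for ext in exts:
        if fname.endswith(ext):
            return cut_suffix(fname, ext)
    # Any other extension stays with the base.
    return fname
```

For example, `strip_image_extension("cat.png")` gives `"cat"`, `strip_image_extension("a.jpg.png")` gives `"a.jpg"`, and `"notes.txt"` is returned unchanged. There is one small difference from the `splitext` version. For a name that consists only of an extension, such as `".gif"`, this helper returns `""`. By contrast, `os.path.splitext` treats leading dots as part of the root, so `basename` stays `".gif"`.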
But a more general solution needs to decide on two issues:
- Given two affixes where one is an affix of the other, which wins? E.g.

      "abcd".cut_prefix(("a", "ab"))
      # should this return "bcd" or "cd"?

- Once you remove an affix, should you stop processing or continue? E.g.

      "ab".cut_prefix(("a", "b"))
      # should this return "b" or ""?
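To show that each question has more than one defensible answer, the sketch below implements three hypothetical policies for a multi-prefix `cut_prefix` as plain functions. The function names and the policies are illustrative only and do not come from any concrete proposal. They give different results on the two examples above.

```python
from typing import Iterable


def cut_prefix_first(s: str, prefixes: Iterable[str]) -> str:
    """Policy A: try prefixes in the order given, remove the first match once."""
    for p in prefixes:
        if p and s.startswith(p):
            return s[len(p):]
    return s


def cut_prefix_longest(s: str, prefixes: Iterable[str]) -> str:
    """Policy B: remove the longest matching prefix, once."""
    matches = [p for p in prefixes if p and s.startswith(p)]
    if not matches:
        return s
    return s[len(max(matches, key=len)):]


def cut_prefix_repeat(s: str, prefixes: Iterable[str]) -> str:
    """Policy C: keep applying Policy A until no prefix matches."""
    options = tuple(prefixes)  # iterated repeatedly, so materialise it
    while True:
        shorter = cut_prefix_first(s, options)
        if shorter == s:
            return s
        s = shorter


# Question 1: which affix wins when one is a prefix of the other?
print(repr(cut_prefix_first("abcd", ("a", "ab"))))    # 'bcd'
print(repr(cut_prefix_longest("abcd", ("a", "ab"))))  # 'cd'

# Question 2: stop after one removal, or continue?
print(repr(cut_prefix_first("ab", ("a", "b"))))   # 'b'
print(repr(cut_prefix_repeat("ab", ("a", "b"))))  # ''
```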
The startswith and endswith methods don't suffer from this problem, for
obvious reasons. We shouldn't add a problematic, ambiguous feature just
for consistency with methods where it is not problematic or ambiguous.
I posted links to prior art. Unless I missed something, not one of those
languages or libraries supports multiple affixes in the one call.
Don't let the perfect be the enemy of the good. In this case, a 90%
solution will let us fix real problems and meet real needs, and we can
always revisit the multiple affix case once we have more experience and
have time to build a consensus based on actual use-cases.
--
Steven