[Python-ideas] New explicit methods to trim strings
Steven D'Aprano
steve at pearwood.info
Sat Mar 30 07:39:45 EDT 2019
On Fri, Mar 29, 2019 at 12:06:25PM +0900, Stephen J. Turnbull wrote:
> Anders Hovmöller writes:
[...]
> > just like always. This seems totally irrelevant to the
> > discussion. And it's of course irrelevant to all the end users that
> > aren't writing libraries but are using python directly.
>
> No, it's not "irrelevant". I wish we all would stop using that word,
> and trying to exclude others' arguments in this way.
I won't comment on Anders' claim that this issue is irrelevant to the
discussion, but I think he is correct about it being irrelevant to "all
the end users that aren't writing libraries but are using python
directly" -- or at least those on the cutting edge of 3.8.
There are lots of people who will soon be using nothing older than 3.8,
and they will no more care that 3.7 lacks this feature than they will
care that Python 1.5 lacks Unicode, iterators, and new-style classes.
More power to them :-)
For the sake of the argument, I'll grant your point that libraries
which support older versions of Python cannot use the new feature[1].
But those libraries, and their users, are no worse off by adding a
string method which they can't yet use. They will simply continue doing
whatever it is that they already do, which will remain backward
compatible to 3.3 or 2.7 or however far back they go.
And some day they will have dropped support for 3.7 and older, and will
be able to use all the new shiny features in 3.8.
After all, if "libraries that support old versions can't use this
feature" was a reason to reject new features, we would never have added
*any* new feature past those available in Python 1.0. New features are
added for the benefit of the present and the future, not for the past.
> We are balancing equities here.
Indeed, and a certain level of caution is justified -- but not so much
as to cause paralysis and stagnation. There's a word for a language
which has stopped changing: "dead".
> We have a plethora of changes, on the
> one side taken by itself each of which is an improvement, but on the
> other taken as a group they greatly increase the difficulty of
> learning to read Python programs fluently.
"Greatly"?
Is it truly that hard to go help(str.cutprefix) at the interactive
interpreter, or look it up in the docs? I mean, if a simple string
method causes a developer that much confusion, imagine how badly they
will cope with async!
You can't read Python programs fluently unless you understand the custom
functions and classes in that program. Compared to that, I don't think
that it is especially difficult to learn what a couple of new methods
do. Especially if their name is self-documenting.
[...]
> > Putting it in a library virtually guarantees it will never become
> > popular.
>
> Factually, you're wrong. Many libraries have moved from PyPI to the
> stdlib, often very quickly as they prove their worth in a deliberate
> test.
The Python community is not the JavaScript community; we don't tend to
download tiny one-or-two line libraries. And that is a good thing:
https://medium.com/commitlog/the-internet-is-at-the-mercy-of-a-handful-of-people-73fac4bc5068
Putting aside all those who are prohibited from using unapproved
third-party libraries -- and there are a lot of them, from students
using locked-down machines to corporate and government users where
downloading unapproved software is grounds for instant dismissal -- I
think most people simply couldn't be bothered installing and importing a
package that offered something as simple as a couple of "cut" functions.
While it's true that not every two-line function needs to be in the
stdlib, it's often better to have it in the stdlib than expect ten
thousand people to write the same two-line function over and over again.
> Note that decimal was introduced with no literal syntax and is quite
> useful and used.
It was also added straight into the stdlib without being forced to go
through the "third-party library" stage first, and with minimal
discussion:
https://mail.python.org/pipermail/python-dev/2003-October/thread.html
If there was ever a module which *could* have proven itself as a
third-party library on PyPI, it was probably Decimal. It adds an entire
new numeric class, one with significant advantages (and some
disadvantages) over binary floats, not just a couple of lines of code.
Re-inventing the wheel is impractical: few people have the numeric
know-how to duplicate that wheel, and for those who can, it would take a
massive amount of effort: the Python version is over 6000 lines
(including blanks and comments). If you want a Decimal type, it isn't
practical to write one yourself.
> If this change is going to prove it's tall enough to
> ride the stdlib ride, using a constructor for a derived class rather
> than str literal syntax shouldn't be too big a barrier to judging
> popularity (accounting for the annoyance of a constructor).
There's little difference between writing MyDecimal("1.2") versus
Decimal("1.2"), but there's a huge annoyance factor in having to write
MyString("hello world") instead of "hello world".
Especially when all you want is to add a single new method and instead
you have to override a dozen or more methods to return instances of
MyString. And then you pass it to some function or library, and it
returns a regular string again. So you're constantly playing whack-a-mole
trying to discover why your MyString subclass objects are turning into
regular built-in strings when you least expect it.
Forget it. That's a serious PITA.
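
To make the whack-a-mole concrete, here is a minimal sketch. MyString and
its cutprefix method are hypothetical names used only for illustration,
and the "remove the prefix if present" behaviour is an assumption about
what such a method would do. Only the method defined on the subclass keeps
the subclass type; every inherited str operation hands back a plain str:

```python
class MyString(str):
    """Hypothetical str subclass that adds a single method."""

    def cutprefix(self, prefix):
        # Assumed semantics: drop the prefix if present, else return self.
        if self.startswith(prefix):
            return MyString(self[len(prefix):])
        return self


s = MyString("hello world")

# The one method we wrote ourselves keeps the subclass type.
print(type(s.cutprefix("hello ")).__name__)   # MyString

# Inherited methods and operators were never overridden, so they
# return ordinary built-in strings.
print(type(s.upper()).__name__)               # str
print(type(s + "!").__name__)                 # str
print(type(s[1:]).__name__)                   # str
```

Keeping the subclass type everywhere would mean overriding each of those
methods by hand, which is the annoyance described above.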
> Alternatively, the features could be introduced using functions.
We specifically added a str class with methods to get away from the
functions in the string module, and you want to bring them back? I think
the bar for adding string functions into the string module should be
much higher than adding a couple of lightweight methods.
[1] Actually, they can. As we know from the transition from 2 to 3,
there is often a perfectly viable solution for libraries that want to
support old versions. Here is some actual code taken from one of my
modules which works back to Python 2.4:
    try:
        casefold = str.casefold  # Added in 3.3 (I think).
    except AttributeError:
        # Fall back version is not as good, but is good enough.
        casefold = str.lower
So even libraries that support Python 2 can get the advantage of an
accelerated C method by using this technique with a fallback to whatever
they are currently using:
    try:
        lcut = str.cutprefix
    except AttributeError:
        # Fall back to pure Python version.
        def lcut(astring, prefix):
            ...
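
For completeness, one way the elided fallback body might be filled in. This
is only a sketch: str.cutprefix is the method name proposed in this thread,
not something that exists in a released Python, and the semantics assumed
here (remove the prefix once if present, otherwise return the string
unchanged) are my reading of the proposal rather than a settled spec.

```python
try:
    # Proposed method name from this thread; not available on
    # current interpreters, so this line raises AttributeError.
    lcut = str.cutprefix
except AttributeError:
    # Pure-Python fallback with the assumed semantics.
    def lcut(astring, prefix):
        if astring.startswith(prefix):
            return astring[len(prefix):]
        return astring


print(lcut("spam and eggs", "spam "))  # and eggs
print(lcut("spam and eggs", "ham"))    # spam and eggs
```

Callers use lcut(s, prefix) either way, so the library code does not need
to know which branch was taken.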
--
Steven
(the other one)