[Python-ideas] New explicit methods to trim strings

Steven D'Aprano steve at pearwood.info
Sat Mar 30 07:39:45 EDT 2019


On Fri, Mar 29, 2019 at 12:06:25PM +0900, Stephen J. Turnbull wrote:
> Anders Hovmöller writes:
[...]
>  > just like always. This seems totally irrelevant to the
>  > discussion. And it's of course irrelevant to all the end users that
>  > aren't writing libraries but are using python directly.
> 
> No, it's not "irrelevant".  I wish we all would stop using that word,
> and trying to exclude others' arguments in this way.

I won't comment on Anders' claim that this issue is irrelevant to the 
discussion, but I think he is correct about it being irrelevant to "all 
the end users that aren't writing libraries but are using python 
directly" -- or at least those on the cutting edge of 3.8.

There are lots of people who will soon be using nothing older than 3.8, 
and they will no more care that 3.7 lacks this feature than they will 
care that Python 1.5 lacks Unicode, iterators, and new-style classes.

More power to them :-)

For the sake of the argument, I'll grant your point that libraries 
which support older versions of Python cannot use the new feature[1].

But those libraries, and their users, are no worse off by adding a 
string method which they can't yet use. They will simply continue doing 
whatever it is that they already do, which will remain backward 
compatible to 3.3 or 2.7 or however far back they go.

And some day they will have dropped support for 3.7 and older, and will 
be able to use all the new shiny features in 3.8.

After all, if "libraries that support old versions can't use this 
feature" was a reason to reject new features, we would never have added 
*any* new feature past those available in Python 1.0. New features are 
added for the benefit of the present and the future, not for the past.


> We are balancing equities here.

Indeed, and a certain level of caution is justified -- but not so much 
as to cause paralysis and stagnation. There's a word for a language 
which has stopped changing: "dead".


> We have a plethora of changes, on the
> one side taken by itself each of which is an improvement, but on the
> other taken as a group they greatly increase the difficulty of
> learning to read Python programs fluently. 

"Greatly"?

Is it truly that hard to go help(str.cutprefix) at the interactive 
interpreter, or look it up in the docs? I mean, if a simple string 
method causes a developer that much confusion, imagine how badly they 
will cope with async!

You can't read Python programs fluently unless you understand the custom 
functions and classes in that program. Compared to that, I don't think 
that it is especially difficult to learn what a couple of new methods 
do. Especially if their names are self-documenting.


[...]
>  > Putting it in a library virtually guarantees it will never become
>  > popular.
> 
> Factually, you're wrong.  Many libraries have moved from PyPI to the
> stdlib, often very quickly as they prove their worth in a deliberate
> test.

The Python community is not the JavaScript community; we don't tend to 
download tiny one-or-two line libraries. And that is a good thing:

https://medium.com/commitlog/the-internet-is-at-the-mercy-of-a-handful-of-people-73fac4bc5068

Putting aside all those who are prohibited from using unapproved 
third-party libraries -- and there are a lot of them, from students 
using locked-down machines to corporate and government users where 
downloading unapproved software is grounds for instant dismissal -- I 
think most people simply couldn't be bothered installing and importing a 
package that offered something as simple as a couple of "cut" functions.

While it's true that not every two-line function needs to be in the 
stdlib, it's often better to have it in the stdlib than to expect ten 
thousand people to write the same two-line function over and over again.


> Note that decimal was introduced with no literal syntax and is quite
> useful and used.

It was also added straight into the stdlib without being forced to go 
through the "third-party library" stage first, and with minimal 
discussion:

https://mail.python.org/pipermail/python-dev/2003-October/thread.html

If there was ever a module which *could* have proven itself as a 
third-party library on PyPI, it was probably Decimal. It adds an entire 
new numeric class, one with significant advantages (and some 
disadvantages) over binary floats, not just a couple of lines of code. 
Re-inventing the wheel is impractical: few people have the numeric 
know-how to duplicate that wheel, and for those who can, it would take a 
massive amount of effort: the Python version is over 6000 lines 
(including blanks and comments). If you want a Decimal type, it isn't 
practical to write one yourself.


> If this change is going to prove it's tall enough to
> ride the stdlib ride, using a constructor for a derived class rather
> than str literal syntax shouldn't be too big a barrier to judging
> popularity (accounting for the annoyance of a constructor).


There's little difference between writing MyDecimal("1.2") versus 
Decimal("1.2"), but there's a huge annoyance factor in having to write 
MyString("hello world") instead of "hello world".

Especially when all you want is to add a single new method and instead 
you have to override a dozen or more methods to return instances of 
MyString. And then you pass it to some function or library, and it 
returns a regular string again. So you're constantly playing whack-a-mole 
trying to discover why your MyString subclass objects are turning into 
regular built-in strings when you least expect it.

Forget it. That's a serious PITA.
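
To make the whack-a-mole concrete, here is a minimal sketch. MyString and 
its cutprefix method are hypothetical names used only for illustration; 
the point is that inherited str methods return plain str, not the 
subclass, unless each one is overridden:

```python
class MyString(str):
    """Hypothetical str subclass that adds a single method."""

    def cutprefix(self, prefix):
        # Our own method can take care to return a MyString.
        if prefix and self.startswith(prefix):
            return MyString(self[len(prefix):])
        return self


s = MyString("hello world")

# The method we wrote keeps the subclass...
print(type(s.cutprefix("hello ")).__name__)

# ...but every inherited operation hands back a built-in str.
print(type(s.upper()).__name__)
print(type(s + "!").__name__)
print(type(s[:5]).__name__)
```

Only the first call returns a MyString. Keeping the subclass alive 
through upper(), concatenation, slicing and the rest means overriding 
each of them by hand.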


> Alternatively, the features could be introduced using functions.

We specifically added a str class with methods to get away from the 
functions in the string module, and you want to bring them back? I think 
the bar for adding string functions into the string module should be 
much higher than adding a couple of lightweight methods.






[1] Actually, they can. As we know from the transition from 2 to 3, 
there is often a perfectly viable solution for libraries that want to 
support old versions. Here is some actual code taken from one of my 
modules which works back to Python 2.4:

try:
    casefold = str.casefold  # Added in 3.3 (I think).
except AttributeError:
    # Fall back version is not as good, but is good enough.
    casefold = str.lower

So even libraries that support Python 2 can get the advantage of an 
accelerated C method by using this technique with a fallback to whatever 
they are currently using:

try:
    lcut = str.cutprefix
except AttributeError:
    # Fall back to pure Python version.
    def lcut(astring, prefix):
        ...




-- 
Steven
(the other one)


