On Tue, Nov 1, 2016 at 12:15 AM, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

> pretty slick -- but any hope of it being as fast as a C implemented method?

I would expect not in CPython, but if "fast" matters, why are you
using CPython rather than PyPy or Cython?

oh come on!

If it matters *that* much,
you can afford to write your own C implementation.

This is about a possible addition to the stdlib -- me writing my own C implementation has nothing to do with it.

But I doubt that
fast matters "that much" often enough to be worth maintaining yet
another string method in Python.

This could be said about every string method in Python -- I understand that every addition is more code to maintain. But somehow we are adding all kinds of stuff, like yet another string formatting method, and talking about null coalescing operators and who knows what -- those are all a MUCH larger burden -- not just for maintaining the interpreter, but for everyone using Python, who then has more to remember and understand.

On the other hand, powerful and performant string methods are a major plus for Python -- a good reason to use it over Perl :-)

So a new one that provides, as I wrote before:

> 1) single method call to do a common thing
>
> 2) nice fast, pure C performance

would fit right into Python, and indeed, would be a similar implementation to existing methods -- so the maintenance burden would be a small addition (i.e., if the internal representation for strings changed, all those methods would need re-visiting and similar changes).

So the only key question is -- is this a common enough use case?

> so I think a "keep these" method would help with both of these
> goals.

Sure, but the translate method already gives you that, and a lot more.

Yes, but only with the fairly esoteric use of defaultdict, which brings me back to the above:

1) single method call to do a common thing

The nice thing about a single method call is discoverability -- no newbie is going to figure out the .translate + defaultdict approach.
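
For anyone who hasn't seen it, here is a minimal sketch of the .translate + defaultdict approach under discussion. The helper name keep_only is made up for illustration; it is not an existing or proposed str method. It relies on str.translate() deleting any character whose table entry is None:

```python
from collections import defaultdict

def keep_only(text, keep):
    """Return text with every character not in `keep` removed.

    keep_only is a hypothetical helper name, not part of the stdlib.
    """
    # Kept characters map to themselves; any other code point falls
    # through to the default factory, which returns None, and
    # str.translate() deletes characters that map to None.
    table = defaultdict(lambda: None, {ord(c): c for c in keep})
    return text.translate(table)

print(keep_only("Hello, World! 123", "abcdefghijklmnopqrstuvwxyz"))
# prints: elloorld
```

One side effect worth knowing: the defaultdict stores an entry for every distinct missing character it is asked about, so the table grows as it is used.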

Note that when you're talking about working with Unicode characters,
no natural language activity I can imagine (not even translating
Buddhist texts, which involves a couple of Indian scripts as well as
Han ideographs) uses more than a fraction of defined characters.

which is why you may want to remove all the others :-)

So really translate with defaultdict is a specialized loop that
marries an algorithmic body (which could do things like look up the
original script or other character properties to decide on the
replacement for the generic case) with a (usually "small") table of
exceptions. That seems like inspired design to me.

Indeed -- .translate() itself is remarkably flexible -- you could even pass in a custom class that does all sorts of logic, and adding the defaultdict is an easy way to add a useful feature. But again, that is advanced usage and not very discoverable.
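
As a rough illustration of the custom-class idea, something along these lines should work, since str.translate() only needs an object that supports indexing by code point via __getitem__. The class name and the particular rules (keep letters and digits, turn whitespace into a plain space, drop everything else) are invented for this example:

```python
import unicodedata

class LettersDigitsSpaces:
    """Hypothetical translate table that decides per character with code."""

    def __getitem__(self, codepoint):
        ch = chr(codepoint)
        if ch.isspace():
            # Normalize any whitespace character to a plain space.
            return " "
        # Unicode general categories starting with "L" are letters,
        # and those starting with "N" are numbers.
        if unicodedata.category(ch)[0] in ("L", "N"):
            # Returning the ordinal maps the character to itself.
            return codepoint
        # None tells str.translate() to delete the character.
        return None

print("Tab\there, 42%!".translate(LettersDigitsSpaces()))
# prints: Tab here 42
```

Every lookup here is a Python-level call, so I would not expect this to come anywhere near the speed of a C-implemented method.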

Maybe that means we need some more docs and/or perhaps recipes instead.

Anyway, I joined this thread to clarify what might be on the table -- but while I think it's a good idea, I don't have the bandwidth to move it through the process -- so unless someone steps up who does, we're done.

-CHB

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115   (206) 526-6317   main reception

Chris.Barker@noaa.gov