I've lost track of what (if anything) is actually being proposed here... so I'm going to try a quick summary:


1) an easy way to spell "remove all the characters other than these"

I think that's a good idea. With Unicode having an enormous number of code points, it really does make sense to have a way to specify only what you want, rather than what you don't want.

Back in the good old days of 1-byte chars, it wasn't hard to build up a full 256-element translate table -- not so much anymore. And one of the whole points of str.translate() is good performance.
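
For comparison, "keep only these characters" can already be approximated with str.translate as it exists today. This is only a minimal sketch: keep_only is a made-up name, and it relies on str.translate looking entries up with __getitem__, so a defaultdict that returns None for any missing ordinal deletes every character that isn't listed. The table is keyed by ordinals and passed to defaultdict positionally (a ** expansion would fail, because keyword names must be strings).

```python
import collections


def keep_only(s, chars):
    """Return s with every character not in chars removed (hypothetical helper)."""
    # Map each wanted character's ordinal to itself; any other ordinal
    # falls through to the default factory, which returns None ("delete").
    table = collections.defaultdict(lambda: None, {ord(c): c for c in chars})
    return s.translate(table)


print(keep_only("hello, world!", "lo"))
```

One wrinkle: the defaultdict stores an entry for every missing ordinal it is asked about, so the table grows as it is used.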

a) a new method:

   str.remove_all_but(sequence_of_chars)
  (naming TBD)

b) a new flag in translate (kind of like the errors= keyword of decode())

  str.translate(table, missing='ignore'|'remove')


(b) has the advantage of adding translation and removal in one fell swoop -- but if you only want to remove, then you have to make a translation table of 1:1 mappings. That's not hard, but a bit annoying:

table = {ord(c): c for c in sequence_of_chars}

I'm on the fence about what I personally prefer.
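
To make the two options concrete, here is a rough pure-Python sketch of what they might mean. Both functions are hypothetical stand-ins: neither exists on str, the names and the missing= spelling are just the placeholders used above, and a real version would presumably be written in C for speed. The table is assumed to be keyed by ordinals, as str.translate expects.

```python
def remove_all_but(s, keep):
    """Option (a), sketched: drop every character of s that is not in keep."""
    keep = set(keep)
    return "".join(c for c in s if c in keep)


def translate(s, table, missing="ignore"):
    """Option (b), sketched: str.translate plus a hypothetical missing= flag.

    missing='ignore' leaves unmapped characters alone (today's behaviour);
    missing='remove' drops any character whose ordinal is not in table.
    """
    if missing == "ignore":
        return s.translate(table)
    if missing == "remove":
        # Translate one character at a time so that None, int and str
        # table values are all handled by str.translate itself.
        return "".join(c.translate(table) for c in s if ord(c) in table)
    raise ValueError("missing must be 'ignore' or 'remove'")


table = str.maketrans("ae", "XM")
print(translate("apple", table))                    # unmapped characters kept
print(translate("apple", table, missing="remove"))  # unmapped characters dropped
```

With a 1:1 table, translate(s, {ord(c): c for c in chars}, missing="remove") gives the same result as remove_all_but(s, chars), which is the overlap between the two options.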

2) (in another thread, but similar enough) being able to pass in more than one string to replace:

str.replace( old=seq_of_strings, new=seq_of_strings )

I know I've wanted this a lot, and certainly from a performance perspective, it could be a nice bonus. 

But: it overlaps a lot with str.translate, at least for single-character replacements, so why bother? It would really only make sense if it supported multi-char strings:

str.replace(old=("aword", "another_word"), new=("something", "something else"))

However: a string IS a sequence of strings, so we'd have confusion about that:

str.replace("this", "four")

Does the user want the word "this" replaced with the word "four" -- or do they want each character replaced?

Maybe we'd need a .replace_many() method? ugh!

There are also other issues with what to do with repeated / overlapping characters:

str.replace( ("aaa", "a", "b"), ("b", "bbb", "a") )

and all sorts of other complications!

THAT I think could be nailed down by defining the "order of operations": does it loop through the entire string for each item, or through each item for each point in the string? Note that if you loop through the entire string for each item, you might as well have written the loop yourself:

for old, new in zip(old_list, new_list):
    s = s.replace(old, new)

and at least if the string is long-ish and the number of replacements short-ish, performance would be fine.
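
The two orders of operation really do give different answers for the overlapping example above. Here is a small sketch; both function names are made up, and the single-pass version is just one possible definition (leftmost match wins, longest alternative first), built on re.sub rather than anything str offers today.

```python
import re


def replace_sequential(s, olds, news):
    """Loop through the entire string once per (old, new) pair."""
    for old, new in zip(olds, news):
        s = s.replace(old, new)
    return s


def replace_single_pass(s, olds, news):
    """Scan the string once, trying every old string at each position."""
    mapping = dict(zip(olds, news))
    if not mapping:
        return s
    # Try longer alternatives first so "aaa" beats "a" at the same position.
    alternatives = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(old) for old in alternatives))
    return pattern.sub(lambda m: mapping[m.group(0)], s)


olds = ("aaa", "a", "b")
news = ("b", "bbb", "a")
print(replace_sequential("aaab", olds, news))   # later pairs see earlier output
print(replace_single_pass("aaab", olds, news))  # output is never re-scanned
```

In the sequential version the final "b" -> "a" step also rewrites the "b" produced by the first step; the single-pass version never looks at its own output.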


*** So the question is -- is there support for these enhancements? If so, then it would be worth hashing out the details.

But the next question is -- does anyone care enough to manage that process -- it'll be a lot of work!

NOTE: there has also been a fair bit of discussion in this thread about ordinals vs characters, and Unicode itself -- I don't think any of that resulted in any possible proposals...

-CHB



On Wed, Oct 26, 2016 at 2:48 PM, Mikhail V <mikhailwas@gmail.com> wrote:
On 26 October 2016 at 20:58, Stephen J. Turnbull
<turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
>import collections
>def translate_or_drop(string, table):
>    """
>    string: a string to process
>    table: a dict as accepted by str.translate
>    """
>    return string.translate(collections.defaultdict(lambda: None, **table))

>All OK now?

Not really. I tried with a simple example
intab = "ae"
outtab = "XM"
table = string.maketrans(intab, outtab)
collections.defaultdict(lambda: None, **table)

and this gives me
TypeError: type object argument after ** must be a mapping, not str

But I probably misunderstood the idea. Anyway, this code does not make
much sense to me; I would never in my life understand what is meant here.
And in my not so big, but not so small, Python experience I *never* had
an occasion to use collections or lambda.

>sets as a single, universal character set.  As it happens, although
>there are differences of opinion over how to handle Unicode in Python,
>there is consensus that Python does have to handle Unicode flexibly,
>effectively and efficiently.
>

I was merely talking about syntax and source file standards, not about unicode
strings. No doubt one needs some way to store different glyph sets.

So I was saying that if one defines a syntax and has good intentions
for readability in mind, there is not much rationale to adapt the syntax
to the current "hybrid" system: the 7-bit and/or multibyte paradigm.
Again, this discussion goes too far afield, and one should probably not
look too far ahead on those. The situation is not so good, in the sense that most
standard software is attached to this strange paradigm
(even software which does not have anything
to do with multi-lingual typography).
So IMO something has gone wrong with those standard characters.

>If you insist on bucking it, you'll
>have to do it pretty much alone, perhaps even maintaining your own
>fork of Python.

As for me, I would take the path of developing my own IDE, which will enable
typographic-quality rendering and of course all useful glyphs, such as
curly quotes,
bullets, etc., which are all fundamental to any possible improvement of the
cognitive qualities of code. And I'll stay within 8-bit boundaries, that's for sure.
So if Python takes the path of "unicode" code input (e.g. for some
punctuation characters)
this would only add a minor issue for generating valid Python source
files in this case.


Mikhail
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/



--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov