<div dir="ltr">I've lost track of what (if anything) is actually being proposed here... so I'm going to try a quick summary:<div><br></div><div><br></div><div>1) an easy way to spell "remove all the characters other than these"</div><div><br></div><div>I think that's a good idea. What with Unicode having an enormous number of code points, it really does make sense to have a way to specify only what you want, rather than what you don't want.</div><div><br></div><div>Back in the good old days of 1-byte chars, it wasn't hard to build up a full 256-element translate table -- not so much anymore. And one of the whole points of str.translate() is good performance.</div><div><br></div><div> a) a new method:</div><div><br></div><div> str.remove_all_but(sequence_of_chars)</div><div> (naming TBD)</div><div><br></div><div>b) a new flag in translate (kind of like the errors keyword for decode)</div><div><br></div><div> str.translate(table, missing='ignore'|'remove')</div><div><br></div><div><br></div><div>(b) has the advantage of adding translation and removal in one fell swoop -- but if you only want to remove, then you have to make a translation table of 1:1 mappings -- not hard, but a bit annoying:</div><div><br></div><div>table = {ord(c): c for c in sequence_of_chars}</div><div><br></div><div>I'm on the fence about what I personally prefer.</div><div><br></div><div>2) (in another thread, but similar enough) being able to pass in more than one string to replace:</div><div><br></div><div>str.replace( old=seq_of_strings, new=seq_of_strings )</div><div><br></div><div>I know I've wanted this a lot, and certainly from a performance perspective, it could be a nice bonus. </div><div><br></div><div>But: It overlaps a lot with str.translate -- at least for single character replacements. So really, why? 
So it would really only make sense if it supported multi-char strings:</div><div><br></div><div>str.replace(old=("aword", "another_word"), new=("something", "something else"))</div><div><br></div><div>However: a string IS a sequence of strings, so we'd have confusion about that:</div><div><br></div><div>str.replace("this", "four")</div><div><br></div><div>Does the user want the word "this" replaced with the word "four" -- or do they want each character replaced?</div><div><br></div><div>Maybe we'd need a .replace_many() method? ugh!</div><div><br></div><div>There are also other issues with what to do with repeated / overlapping characters:</div><div><br></div><div>str.replace( ("aaa", "a", "b"), ("b", "bbb", "a") )</div><div><br></div><div>and all sorts of other complications!</div><div><br></div><div>THAT I think could be nailed down by defining the "order of operations": does it loop through the entire string for each item, or through each item for each point in the string? Note that if you loop through the entire string for each item, you might as well have written the loop yourself:</div><div><br></div><div>for old, new in zip(old_list, new_list):</div><div>&nbsp;&nbsp;&nbsp;&nbsp;s = s.replace(old, new)</div><div><br></div><div>and at least if the length of the string is long-ish, and the number of replacements short-ish -- performance would be fine.</div><div><br></div><div><br></div><div>*** So the question is -- is there support for these enhancements? If so, then it would be worth hashing out the details. 
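</div><div><br></div><div>For reference, both ideas can be approximated in current Python. What follows is only a minimal sketch under stated assumptions: remove_all_but and replace_many are hypothetical helper functions (neither exists as a str method), and the single left-to-right pass used by replace_many is just one possible choice for the "order of operations", not a proposal for how it must work:</div><div><br></div><pre>
```python
import re
from collections import defaultdict


def remove_all_but(s, keep):
    """Drop every character of s that is not in keep (hypothetical helper)."""
    # str.translate looks up ordinals; a missing key falls through to the
    # default factory, and a None result deletes that character.
    table = defaultdict(lambda: None, {ord(c): c for c in keep})
    return s.translate(table)


def replace_many(s, old, new):
    """Replace each old[i] with new[i] in a single pass (hypothetical helper)."""
    # Assumes old contains no empty strings and no duplicates.
    mapping = dict(zip(old, new))
    # At each position the alternatives are tried in the order given,
    # and text that has already been replaced is never scanned again.
    pattern = re.compile("|".join(re.escape(o) for o in old))
    return pattern.sub(lambda m: mapping[m.group(0)], s)
```
</pre><div>With the overlapping example from above, replace_many("aaab", ("aaa", "a", "b"), ("b", "bbb", "a")) gives "ba", whereas the explicit loop over s.replace gives "aa" -- exactly the kind of difference that defining the order of operations would have to settle.</div><div>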
</div><div><br></div><div>But the next question is -- does anyone care enough to manage that process -- it'll be a lot of work!</div><div><br></div><div>NOTE: there has also been a fair bit of discussion in this thread about ordinals vs characters, and unicode itself -- I don't think any of that resulted in any possible proposals...</div><div><br></div><div>-CHB</div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Oct 26, 2016 at 2:48 PM, Mikhail V <span dir="ltr"><<a href="mailto:mikhailwas@gmail.com" target="_blank">mikhailwas@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 26 October 2016 at 20:58, Stephen J. Turnbull<br>
<span class=""><<a href="mailto:turnbull.stephen.fw@u.tsukuba.ac.jp">turnbull.stephen.fw@u.<wbr>tsukuba.ac.jp</a>> wrote:<br>
>import collections<br>
>def translate_or_drop(string, table):<br>
> """<br>
> string: a string to process<br>
> table: a dict as accepted by str.translate<br>
> """<br>
> return string.translate(collections.defaultdict(lambda: None, **table))<br>
<br>
>All OK now?<br>
<br>
</span>Not really. I tried with a simple example<br>
intab = "ae"<br>
outtab = "XM"<br>
table = string.maketrans(intab, outtab)<br>
collections.defaultdict(lambda: None, **table)<br>
<br>
and this gives me<br>
TypeError: type object argument after ** must be a mapping, not str<br>
<br>
But I probably misunderstood the idea. Anyway, this code does not make<br>
much sense to me; I would never in my life understand what is meant here.<br>
And in my not so big, but not so small, Python experience I *never* had<br>
occasion to use collections or lambda.<br>
<span class=""><br>
>sets as a single, universal character set. As it happens, although<br>
>there are differences of opinion over how to handle Unicode in Python,<br>
>there is consensus that Python does have to handle Unicode flexibly,<br>
>effectively and efficiently.<br>
><br>
<br>
</span>I was merely talking about syntax and the source file standard, not about Unicode<br>
strings. No doubt one needs some way to store different glyph sets.<br>
<br>
My point was that if one defines a syntax with good intentions<br>
for readability in mind, there is not much rationale for adapting the syntax<br>
to the current "hybrid" system: the 7-bit and/or multibyte paradigm.<br>
Again, this is taking the discussion too far, but one probably should not<br>
look ahead much on those. The situation is not so good in the sense that most<br>
standard software is tied to this strange paradigm<br>
(even software which does not have anything<br>
to do with multilingual typography).<br>
So IMO something has gone wrong with those standard characters.<br>
<span class=""><br>
>If you insist on bucking it, you'll<br>
>have to do it pretty much alone, perhaps even maintaining your own<br>
>fork of Python.<br>
<br>
</span>As for me, I would take the path of developing my own IDE, which will enable<br>
typographic-quality rendering and of course all useful glyphs, such as<br>
curly quotes,<br>
bullets, etc., all of which are fundamental to any possible improvement of the<br>
cognitive qualities of code. And I'll stay within 8-bit boundaries, that's for sure.<br>
So if Python takes the path of "unicode" code input (e.g. for some<br>
punctuation characters),<br>
this would only add a minor issue for generating valid Python source<br>
files in this case.<br>
<span class="HOEnZb"><font color="#888888"><br>
<br>
Mikhail<br>
</font></span><div class="HOEnZb"><div class="h5">
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><br>Christopher Barker, Ph.D.<br>Oceanographer<br><br>Emergency Response Division<br>NOAA/NOS/OR&R (206) 526-6959 voice<br>7600 Sand Point Way NE (206) 526-6329 fax<br>Seattle, WA 98115 (206) 526-6317 main reception<br><br><a href="mailto:Chris.Barker@noaa.gov" target="_blank">Chris.Barker@noaa.gov</a></div>
</div>