On 25 October 2016 at 19:10, Stephen J. Turnbull firstname.lastname@example.org wrote:
So my previous thought was that there could be a set of such functions:
str.translate_keep(table) - this is the current translate, i.e. it keeps non-defined chars untouched
str.translate_drop(table) - the same, but dropping non-defined chars
Probably also a pair of functions without translation:
str.remove(chars) - removes the given chars
str.keep(chars) - removes all chars except the given ones
The motivation is that these can be optimised for speed, and I suppose they could run faster than re.sub().
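As a rough illustration of the proposal (this sketch is not from the original mail, and these names are not real str methods), the four operations can already be written today as plain-function wrappers over str.translate, which drops a character when its ordinal maps to None:

```python
def translate_keep(s, table):
    """Translate chars found in table; leave all others untouched.

    This is exactly what str.translate does today."""
    return s.translate(table)

def translate_drop(s, table):
    """Translate chars found in table; drop all non-defined chars."""
    # Map every other ordinal occurring in s to None so translate drops it.
    full = {ord(c): None for c in set(s) if ord(c) not in table}
    full.update(table)
    return s.translate(full)

def remove(s, chars):
    """Remove the given chars from s."""
    return s.translate({ord(c): None for c in chars})

def keep(s, chars):
    """Remove everything from s except the given chars."""
    allowed = set(chars)
    return s.translate({ord(c): None for c in set(s) if c not in allowed})
```

For example, remove("hello world", "lo") gives "he wrd", and keep("hello world", "lo") gives "llool". A dedicated C implementation could of course avoid building the per-call dicts that this sketch pays for.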
That said, multiple methods is a valid option for the API. E.g., Guido generally prefers that distinctions which can't be made on the type of the arguments (such as translate_keep vs translate_drop) be expressed by giving different names rather than by a flag argument. Do you *like* this API, or was it motivated primarily by the possibilities you see for optimization?
Certainly I like the look of distinct functions more. It lets me parse the code visually and effectively, so e.g. for str.remove() I would not need to look in the docs to understand what the function does. It has its downside, of course: new names can accidentally be similar to existing ones, so the more names there are, the higher the probability that no good names are left. Speed is not so important for the majority of cases, at least for my current tasks. However, if I need to process very large texts (and it seems I will), speed will become more important.
The width is constant for any given string. However, I don't see at this point that you'll need more than the functions available in Python already, plus one or more wrappers to marshal the information your API accepts to the data that str.translate wants.
It is just that in some cases I need to convert them to numpy arrays and back, so this Unicode vanity worries me a bit. But I cannot clearly explain why exactly I need this.
but as I said, I don't like the idea very much, and it would be OK for me to use numeric values only.
Yeah, I am strange. This, however, gives you a guarantee for any environment that you can see and input them and save the work in ASCII.
This is not going to be a problem if you're running Python and can enter the program and digits. In any case, the API is going to have to be convenient for all the people who expect that they will never again be reduced to a hex keypad and a 7-segment display.
Here I will dare to make a lyrical digression again. I may have given the impression that I am stuck in the nineties or something, but that is not the case. In the nineties I used the PC mostly to play Duke Nukem (yeah, big times!), and all the more I had no idea then about the efficiency of information representation and readability. Now I kind of realize it. So I am just not one of those who believe in these maximalist "we need over 9000 glyphs" talks. And, a somewhat prophetic view on this: with the coming of the cyber era this will all be flushed away so fast that all this diligence around Unicode could actually look funny. And a hex keypad will not sound "retro" but "brand new".
In other words: I feel really strongly that nothing besides standard characters should appear in source code. If one wants to process Unicode, then load it as resources. So please, at least out of respect for the rationally minded, don't make code look like a Christmas tree. BTW, I actually use Vim to code, so I won't see those characters in my code anyway.