On 24 October 2016 at 22:54, Chris Barker wrote:
On Mon, Oct 24, 2016 at 1:30 PM, Mikhail V wrote:
But how would you, with the current translate function, drop all characters that are not in the table?
That is another question altogether, and one for a different list, actually.
I don't know a way to do "remove every character except these", but I expect there is a way to do that efficiently with Python strings.
You could probably (ab)use the codecs module, though.
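For reference, one plain way to "remove every character except these" is a set-membership filter. This is only a minimal sketch, not necessarily the most efficient approach; the helper name keep_only and the sample strings are hypothetical and do not come from the thread.

```python
def keep_only(text, allowed):
    """Return `text` with every character not in `allowed` removed."""
    allowed = frozenset(allowed)  # constant-time membership tests
    return "".join(ch for ch in text if ch in allowed)


# Keep lowercase ASCII letters and spaces; everything else is dropped.
print(keep_only("héllo wörld!", "abcdefghijklmnopqrstuvwxyz "))
```

The filter drops characters silently, so it suits cases where losing information is acceptable.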
If there really is no way to do it, then you might have a feature worth pursuing, but be prepared with use-cases!
The only use-case I've had for that sort of thing is when I want only ASCII -- but I can use the ascii codec for that :-)
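The ASCII-only case mentioned above can be sketched with the built-in ascii codec and its standard error handlers. The function name ascii_only and the sample text are hypothetical; "ignore" and "replace" are the standard handlers accepted by str.encode().

```python
def ascii_only(text, errors="ignore"):
    """Round-trip `text` through the ascii codec.

    errors="ignore" drops non-ASCII characters,
    errors="replace" substitutes "?" for each of them.
    """
    return text.encode("ascii", errors=errors).decode("ascii")


print(ascii_only("naïve café"))                    # non-ASCII dropped
print(ascii_only("naïve café", errors="replace"))  # non-ASCII becomes "?"
```

The "replace" handler corresponds to the "replace those with something" variant, although the replacement character is fixed by the codec.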
This is needed, for example, for filtering out all non-standard characters from paths, etc.
You'd usually want to replace those with something, rather than remove them entirely, yes?
Just a pair of use cases that I have faced in practice:

1. Imagine I perform some admin tasks in a company with very different users who also tend to name their files as they wish. So only God knows what can be in the filenames. And I know, for example, that there can be Cyrillic there besides ASCII. So I just define a table like:

{
1072: 97,
1073: 98,
1074: 99,
... [which localizes Cyrillic into ASCII] ...
97: 97,
98: 98,
99: 99,
... [those chars that are OK, leave them]
}

Then I use os.walk() and os.rename() and voila! The file system regains its virginity in one simple script.

2. Say I have a multi-lingual file or whatever, and I want to filter out some unwanted characters; I can do it similarly.

Mikhail
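With the existing str.translate(), the "drop everything that is not in the table" behaviour can be approximated by a dict subclass whose __missing__ returns None, since translate() deletes any character mapped to None. The following is a sketch under stated assumptions: the class DropMissing, the helpers build_table and clean_name, the three-letter Cyrillic mapping, and the set of kept characters are all hypothetical illustrations, not the table from the message above.

```python
import string


class DropMissing(dict):
    """Translation table that deletes any code point it has no entry for."""

    def __missing__(self, key):
        # str.translate() removes characters whose mapping is None.
        return None


def build_table(transliteration, keep):
    """Combine characters to keep as-is with characters to transliterate."""
    table = DropMissing({ord(ch): ch for ch in keep})
    table.update({ord(src): dst for src, dst in transliteration.items()})
    return table


# Hypothetical, deliberately tiny mapping; a real one would cover the alphabet.
CYRILLIC = {"а": "a", "б": "b", "в": "v"}
KEEP = string.ascii_letters + string.digits + "._-"
TABLE = build_table(CYRILLIC, KEEP)


def clean_name(name):
    """Transliterate known characters, keep allowed ones, drop the rest."""
    return name.translate(TABLE)


print(clean_name("абв_report№1.txt"))
```

A renaming script built on this would still need to handle name collisions and names that become empty after cleaning; those cases are left out of the sketch.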