stripping unwanted chars from string
John Machin
sjmachin at lexicon.net
Thu May 4 05:43:40 EDT 2006
On 4/05/2006 4:30 PM, Edward Elliott wrote:
> Bryan wrote:
>> >>> keepchars = set(string.letters + string.digits + '-.')
>
> Now that looks a lot better. Just don't forget the underscore. :)
>
*Looks* better than the monkey business. Perhaps I should point out to
those of the studio audience who are huddled in an ASCII bunker (if any)
that string.letters provides the characters considered to be alphabetic
in whatever the locale is currently set to. The operating system may
well permit filenames containing other characters, ones that the
file's creator would quite reasonably consider to be alphabetic. And of
course there are languages with characters that one would not want to
strip but that can scarcely be described as alphanumeric.
>>> import os
>>> os.listdir(u'.')
[u'\xc9t\xe9_et_hiver.doc', u'\u041c\u043e\u0441\u043a\u0432\u0430.txt',
u'\u5f20\u654f.txt']
>>> import string
>>> string.letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
Doing
    import locale; locale.setlocale(locale.LC_ALL, '')
would make string.letters work (for me) with the first file above, but
that's all.
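
A minimal sketch of a locale-independent alternative, not taken from the thread: instead of a whitelist built from string.letters, keep any character whose Unicode general category is a letter, a combining mark, or a number, plus a few explicitly allowed punctuation characters. It assumes Python 3, where str is Unicode and string.letters no longer exists; the session above is Python 2. The names strip_unwanted and EXTRA_KEEP are hypothetical, and the choice of categories is an assumption about what counts as "wanted".

```python
import unicodedata

# Hypothetical whitelist of punctuation to keep (hyphen, dot, underscore).
EXTRA_KEEP = set('-._')

def strip_unwanted(name, extra=EXTRA_KEEP):
    """Drop characters that are not letters, marks, numbers, or in `extra`."""
    kept = []
    for ch in name:
        # The first letter of the general category gives the major class:
        # L = letter, M = combining mark, N = number.
        if ch in extra or unicodedata.category(ch)[0] in 'LMN':
            kept.append(ch)
    return ''.join(kept)

# The three filenames from the listing above, plus one with junk in it.
names = [
    '\xc9t\xe9_et_hiver.doc',
    '\u041c\u043e\u0441\u043a\u0432\u0430.txt',
    '\u5f20\u654f.txt',
    'bad name?*.txt',
]
for n in names:
    print(strip_unwanted(n))
```

Keeping the M categories is what covers scripts whose combining marks are not alphanumeric in the strict sense; str.isalnum() alone would strip those.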