Way to remove all non-ascii characters from a file?
__peter__ at web.de
Fri Feb 13 22:45:40 CET 2004
> I have a text file which contains the occasional non-ascii character.
> What is the best way to remove all of these in python?
Read it in chunks, then remove the non-ASCII characters like so:
>>> t = "".join(map(chr, range(256)))
>>> d = "".join(map(chr, range(128,256)))
>>> "Törichte Logik böser Kobold".translate(t,d)
'Trichte Logik bser Kobold'
and finally write the maimed chunks to a file. However, it's not clear to
me how removing characters could be a good idea in the first place.
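The session above uses the Python 2 two-argument form of `str.translate`. In Python 3 the same deletion trick works on `bytes`, passing `None` as the table so that only the listed bytes are dropped. A minimal sketch of the read-in-chunks, delete, write-out procedure, assuming both files are opened in binary mode; `strip_non_ascii` and `chunk_size` are illustrative names, not part of any library:

```python
import io

# Every byte value outside the 7-bit ASCII range (128..255).
NON_ASCII = bytes(range(128, 256))

def strip_non_ascii(src, dst, chunk_size=64 * 1024):
    """Copy binary stream src to binary stream dst, dropping non-ASCII bytes.

    strip_non_ascii is a hypothetical helper name used for illustration.
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # table=None means "no remapping"; just delete the bytes in NON_ASCII.
        dst.write(chunk.translate(None, NON_ASCII))

# Demo with in-memory streams; for real files use open(path, "rb") / open(out, "wb").
src = io.BytesIO("Törichte Logik böser Kobold".encode("utf-8"))
dst = io.BytesIO()
strip_non_ascii(src, dst)
print(dst.getvalue())  # b'Trichte Logik bser Kobold'
```

Working on bytes sidesteps decoding entirely, and a chunk boundary that splits a multi-byte UTF-8 sequence does no harm, because every byte of such a sequence is 128 or above and is dropped anyway.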
Replacing the non-ASCII characters at least gives some minimal hint that something is missing:
>>> t = "".join(map(chr, range(128))) + "?" * 128
>>> "Törichte Logik böser Kobold".translate(t)
'T?richte Logik b?ser Kobold'
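A 256-entry byte table like the one above replaces per byte, so a UTF-8 encoded `ö` (two bytes) would come out as `??`. Decoding first and letting the ASCII codec substitute gives one `?` per character. A Python 3 sketch, assuming the input file is UTF-8 text; `replace_non_ascii` is a hypothetical helper name:

```python
import io

def replace_non_ascii(src, dst, chunk_size=64 * 1024):
    """Copy text stream src to binary stream dst, writing '?' for each
    non-ASCII character (replace_non_ascii is a hypothetical name)."""
    while True:
        chunk = src.read(chunk_size)  # up to chunk_size *characters*
        if not chunk:
            break
        # errors="replace" makes the ASCII encoder emit b'?' for each
        # character it cannot encode.
        dst.write(chunk.encode("ascii", errors="replace"))

# Demo with in-memory streams.
src = io.StringIO("Törichte Logik böser Kobold")
dst = io.BytesIO()
replace_non_ascii(src, dst)
print(dst.getvalue())  # b'T?richte Logik b?ser Kobold'
```

For a real file, pass `open(path, encoding="utf-8")` as `src` and `open(out_path, "wb")` as `dst`; a text stream's `read(n)` counts characters, so chunk boundaries never split a character.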