way to remove all non-ascii characters from a file?
peter at engcorp.com
Tue Feb 17 20:45:07 CET 2004
Gerhard Häring wrote:
> omission9 wrote:
> > I have a text file which contains the occasional non-ascii character.
> > What is the best way to remove all of these in python?
> Here's a simple example that does what you want:
> >>> orig = "Häring"
> >>> "".join([x for x in orig if ord(x) < 128])
Or, if performance is critical, something like this might be faster.
(A regex might be even better, since it would avoid the redundant
identity-transformation step.)
>>> from string import maketrans, translate
>>> table = maketrans('', '')  # identity table: all 256 byte values in order
>>> translate(orig, table, table[128:])  # delete every byte >= 128
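For anyone reading this on Python 3, where the `string.maketrans` and `string.translate` functions used above are no longer available, here is a rough sketch of the same two ideas: a delete-only translate and the regex alternative. The helper names (`strip_bytes`, `strip_text`) are made up for illustration and are not from the posts above.

```python
import re

# Byte values 128..255, i.e. everything outside 7-bit ASCII.
NON_ASCII_BYTES = bytes(range(128, 256))

# Matches any single character outside the ASCII range.
NON_ASCII_RE = re.compile(r'[^\x00-\x7f]')


def strip_bytes(data: bytes) -> bytes:
    """Delete non-ASCII bytes (hypothetical helper).

    Passing None as the table skips the identity mapping,
    so only the deletion step is performed.
    """
    return data.translate(None, NON_ASCII_BYTES)


def strip_text(text: str) -> str:
    """Delete non-ASCII characters from decoded text (hypothetical helper)."""
    return NON_ASCII_RE.sub('', text)


if __name__ == '__main__':
    orig = 'H\u00e4ring'
    print(strip_text(orig))
    print(strip_bytes(orig.encode('latin-1')))
```

Which one is faster probably depends on the input, so it is worth timing both on the real file before picking one.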