How do I automate the removal of all non-ascii characters from my code?

Alec Taylor alec.taylor6 at gmail.com
Tue Sep 13 11:02:05 EDT 2011


Hmm, nothing mentioned so far works for me...

Here's a very small test case:

>>> python -u "Convert to Creole.py"
  File "Convert to Creole.py", line 1
SyntaxError: Non-ASCII character '\xe2' in file Convert to Creole.py
on line 1, but no encoding declared; see
http://www.python.org/peps/pep-0263.html for details
>>> Exit Code: 1

Line 1: a=u'''≤'''.encode("ascii", "ignore").decode("ascii")
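
For what it's worth, the SyntaxError above is raised while Python 2 compiles
the file, before the .encode() call on line 1 ever runs, so the offending
characters have to come out of the source file itself. The '\xe2' byte is
consistent with the file being saved as UTF-8 (≤ is E2 89 A4 there). A minimal
sketch under that assumption, written for Python 3; the helper names
strip_non_ascii and clean_file are hypothetical, not from any library:

```python
import io


def strip_non_ascii(text):
    """Return text with every character outside the ASCII range dropped."""
    return text.encode("ascii", "ignore").decode("ascii")


def clean_file(path, source_encoding="utf-8"):
    """Rewrite the file at path in place without non-ASCII characters.

    source_encoding is an assumption (UTF-8 by default); adjust it if the
    file was saved in a different encoding. Returns the number of
    characters removed.
    """
    # newline="" keeps the original line endings untouched.
    with io.open(path, "r", encoding=source_encoding, newline="") as handle:
        original = handle.read()

    cleaned = strip_non_ascii(original)

    with io.open(path, "w", encoding="ascii", newline="") as handle:
        handle.write(cleaned)

    return len(original) - len(cleaned)
```

Calling clean_file("Convert to Creole.py") would rewrite the file in place, so
keep a backup first. If the characters actually need to stay, the PEP 263
route from the error message (an encoding declaration such as
# -*- coding: utf-8 -*- on the first line) is the alternative.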

On Tue, Sep 13, 2011 at 11:33 PM, Vlastimil Brom
<vlastimil.brom at gmail.com> wrote:
> 2011/9/13 ron <vacorama at gmail.com>:
>>
>> Depending on the load, you can do something like:
>>
>> "".join([x for x in string if ord(x) < 128])
>>
>> It's worked great for me in cleaning input on webapps where there's a
>> lot of copy/paste from varied sources.
>> --
>> http://mail.python.org/mailman/listinfo/python-list
>>
> Well, for this kind of dirty "data cleaning" you may as well use e.g.
>
>>>> u"äteöxt ÛÜÝ wiÉÊËÌthÞßà áânoûüýþn ASɔɕɖCɗɘəɚɛIɗɘəɚɛIεζ iηθιn жзbetийклweeჟრსn .ტუ..ფ".encode("ascii", "ignore").decode("ascii")
> u'text  with non ASCII in between ...'
>>>>
>
> vbr


