How do I automate the removal of all non-ascii characters from my code?
vlastimil.brom at gmail.com
Tue Sep 13 00:49:49 CEST 2011
2011/9/12 Alec Taylor <alec.taylor6 at gmail.com>:
> Good evening,
> I have converted ODT to HTML using LibreOffice Writer, because I want
> to convert from HTML to Creole using python-creole. Unfortunately I
> get this error: "File "Convert to Creole.py", line 17
> SyntaxError: Non-ASCII character '\xe2' in file Convert to Creole.py
> on line 18, but no encoding declared; see
> http://www.python.org/peps/pep-0263.html for details".
> Unfortunately I can't post my document yet (it's a research paper I'm
> working on), but I'm sure you'll get the same result if you write up a
> document in LibreOffice Writer and add some End Notes.
> How do I automate the removal of all non-ascii characters from my code?
> Thanks for all suggestions,
> Alec Taylor
It would obviously help to see the content of the line mentioned in
the traceback (and probably its context);
however, that value seems to correspond to â in some European
encodings, in which case it would probably be part of some quoted
unicode/string literal. (At least in Python 2; in Python 3 it could
also be the name of an object in the code. The traceback seems to be
the same in both cases.)
>>> print '\xe2'.decode("iso-8859-1")
# and likewise for iso-8859-1, -2, -3, -4, -9, -10, -14, -15, -16, some
# windows-* encodings, etc.
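The same check can be written for Python 3, where the byte has to be a bytes literal. This is only a rough sketch; the list of codecs is my own selection for illustration, not an exhaustive one. It also shows, for comparison, that in UTF-8 the byte 0xE2 is merely the first byte of a longer sequence, so the file might not be in a single-byte encoding at all.

```python
# Python 3 sketch: decode the byte from the traceback under several codecs.
raw = b"\xe2"

# Illustrative selection of single-byte encodings (an assumption, not a full list).
candidates = ["iso-8859-1", "iso-8859-2", "iso-8859-15", "windows-1252"]

decoded = {name: raw.decode(name) for name in candidates}
for name, char in decoded.items():
    print(name, repr(char))

# For comparison: in UTF-8, 0xE2 only starts a three-byte sequence,
# e.g. the left curly double quote U+201C.
utf8_example = b"\xe2\x80\x9c".decode("utf-8")
print(repr(utf8_example))
```

If the surrounding bytes in the file look like the second case, declaring utf-8 rather than an iso-8859 encoding would probably be the better fit.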
Possibly (as previously suggested) adding the encoding information,
such as iso-8859-1 or windows-1252 (or another encoding, depending on
the rest of the data), at the top of the source file might fix the
error, which would certainly be preferable to throwing all non-ASCII
data away.
You would add e.g.
# -*- coding: iso-8859-1 -*-
as the first or second line of the file.
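Before picking an encoding, it may help to list which lines actually contain non-ASCII bytes. The following is a minimal Python 3 sketch; find_non_ascii is a hypothetical helper name, and the in-memory sample stands in for the real script, which I have not seen.

```python
import io


def find_non_ascii(lines):
    """Yield (line_number, non_ascii_bytes) for each line that has any."""
    for number, line in enumerate(lines, start=1):
        # Iterating over a bytes object yields integers in Python 3.
        bad = bytes(b for b in line if b > 0x7F)
        if bad:
            yield number, bad


# Hypothetical stand-in for the script; for a real file, use
# open(path, "rb") instead (binary mode, so nothing is decoded yet).
sample = io.BytesIO(b"x = 1\ntitle = '\xe2\x80\x9cnotes\xe2\x80\x9d'\n")

hits = list(find_non_ascii(sample))
for number, bad in hits:
    print(number, bad)
```

Looking at the reported bytes should make it easier to guess the encoding to declare, instead of stripping the characters.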