How do I automate the removal of all non-ascii characters from my code?

Mon Sep 12 04:35:44 EDT 2011

On Mon, Sep 12, 2011 at 8:17 AM, Gary Herron <gherron at islandtraining.com> wrote:

> On 09/12/2011 12:49 AM, Alec Taylor wrote:
>
>> Good evening,
>>
>> I have converted ODT to HTML using LibreOffice Writer, because I want
>> to convert from HTML to Creole using python-creole. Unfortunately I
>> get this error: "File "Convert to Creole.py", line 17
>> SyntaxError: Non-ASCII character '\xe2' in file Convert to Creole.py
>> on line 18, but no encoding declared; see
>> http://www.python.org/peps/pep-0263.html for details".
>>
>> Unfortunately I can't post my document yet (it's a research paper I'm
>> working on), but I'm sure you'll get the same result if you write up a
>> document in LibreOffice Writer and add some End Notes.
>>
>> How do I automate the removal of all non-ascii characters from my code?
>>
>> Thanks for all suggestions,
>>
>> Alec Taylor
>>
>
>
>
> This question does not quite make sense.  The error message is complaining
> about a python file.  What does that file have to do with ODT to HTML
> conversion and LibreOffice?
>
> The error message means the python file (wherever it came from) has a
> non-ascii character (as you noted), and so it needs something to tell it
> what such a character means.  (That's what the encoding is.)
>
> A comment like this in line 1 or 2 will specify an encoding:
>  # -*- coding: <encoding name> -*-
> but we'll have to know more about the file "Convert to Creole.py" to guess
> what encoding name should be specified there.
>
> You might try utf-8 or latin-1.
>
>
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
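Since the literal question was how to remove non-ASCII characters automatically, here is a minimal Python 3 sketch of that, assuming the file is UTF-8 (likely for LibreOffice output, but still an assumption). The byte '\xe2' in the error is the lead byte of UTF-8 characters in the U+2000 to U+2FFF range, which includes curly quotes and dashes, so the sketch first maps a few such typographic characters to ASCII and only then drops whatever remains. The helper names, the substitution table and the script name are hypothetical. Keep in mind that stripping changes the text, including inside string literals; adding an encoding declaration, as Gary suggests, is usually the less destructive fix.

```python
import sys

# HYPOTHETICAL table of typographic characters that word processors often
# substitute for plain ASCII. Illustrative, not exhaustive: extend as needed.
ASCII_SUBSTITUTES = str.maketrans({
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2013": "-", "\u2014": "-",  # en dash, em dash
    "\u2026": "...",               # horizontal ellipsis
    "\u00a0": " ",                 # no-break space
})


def strip_non_ascii(text):
    """Map known typographic characters to ASCII, then drop every
    remaining character outside U+0000..U+007F."""
    text = text.translate(ASCII_SUBSTITUTES)
    # errors="ignore" silently discards characters ASCII cannot represent
    return text.encode("ascii", "ignore").decode("ascii")


def report_non_ascii(text):
    """Return (line, column, character) for each non-ASCII character, 1-based."""
    return [
        (lineno, col, ch)
        for lineno, line in enumerate(text.split("\n"), start=1)
        for col, ch in enumerate(line, start=1)
        if ord(ch) > 127
    ]


def clean_file(path, encoding="utf-8"):
    """Rewrite path in place as pure ASCII. ASSUMPTION: the file is UTF-8."""
    # newline="" leaves the original line endings untouched
    with open(path, encoding=encoding, newline="") as f:
        text = f.read()
    for lineno, col, ch in report_non_ascii(text):
        print(f"{path}:{lineno}:{col}: non-ASCII {ch!r} (U+{ord(ch):04X})")
    with open(path, "w", encoding="ascii", newline="") as f:
        f.write(strip_non_ascii(text))


if __name__ == "__main__":
    # Usage (hypothetical script name): python strip_non_ascii.py FILE...
    for name in sys.argv[1:]:
        clean_file(name)
```

Printing each location before rewriting gives you a chance to see what is about to be lost; run it on a copy of the file first.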

If you are having trouble figuring out which encoding your file uses, the
"file" utility is often a quick and dirty way to find out.

#> echo "åäö" > test.txt
#> file test.txt
test.txt: UTF-8 Unicode text
#> iconv test.txt -f utf-8 -t latin1 > test.l1.txt
#> file test.l1.txt
test.l1.txt: ISO-8859 text
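A rough Python 3 counterpart to the "file" and "iconv" steps above is sketched below. It is a minimal sketch, not a reimplementation of "file" (which applies far more heuristics): it tries candidate encodings in order and reports the first one that decodes the bytes, and it re-encodes bytes from one encoding to another. The function names and the candidate list are my own assumptions.

```python
def guess_encoding(data, candidates=("utf-8", "latin-1")):
    """Return the first candidate encoding that decodes data without error.

    Only a heuristic: a successful decode proves the bytes are *valid* in
    that encoding, not that the guess is right. latin-1 maps every byte to
    some character, so it never fails and must come last. (For files from
    Windows, adding "cp1252" before it may be a better fit.)
    """
    for enc in candidates:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    raise ValueError("no candidate encoding could decode the data")


def transcode(data, src, dst):
    """Rough equivalent of `iconv -f src -t dst`: decode, then re-encode.
    Raises UnicodeEncodeError if dst cannot represent some character."""
    return data.decode(src).encode(dst)


if __name__ == "__main__":
    # Escapes keep this file pure ASCII; the string is "åäö\n".
    utf8 = "\u00e5\u00e4\u00f6\n".encode("utf-8")
    l1 = transcode(utf8, "utf-8", "latin-1")
    print(guess_encoding(utf8), guess_encoding(l1))  # utf-8 latin-1
```

To check a real file, pass it the file's raw bytes, read with open(path, "rb"). Trying UTF-8 first matters: latin-1 would "successfully" decode UTF-8 bytes into mojibake such as 'Ã¥'.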

Note: I use latin1 (ISO-8859-1) because it can represent the characters 'å',
'ä' and 'ö'. Your encoding might be different depending on what system you
are using.

The gist is that if you specify the correct encoding with the "coding"
comment mentioned above, Python can decode the file and your program will
probably run as intended.
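The coding comment can also be added automatically. Below is a minimal Python 3 sketch (the function names are hypothetical) that adds "# -*- coding: utf-8 -*-" only when a file has no declaration yet, placing it after a "#!" line if there is one. It assumes you already know the right encoding, e.g. from "file". Note that Python 3 reads source files as UTF-8 by default (PEP 3120), so the declaration mainly matters for Python 2 or for files saved in another encoding.

```python
import re

# The pattern the Python language reference gives for a coding declaration.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")
# A blank or comment-only line; per the Python 3 reference, line 2 only
# counts as a declaration if line 1 is one of these.
BLANK_OR_COMMENT_RE = re.compile(r"^[ \t\f]*(#|$)")


def declared_encoding(source):
    """Return the encoding declared on line 1 or 2 of source (bytes), or None."""
    for i, raw in enumerate(source.splitlines()[:2]):
        line = raw.decode("latin-1")  # never fails; the cookie itself is ASCII
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
        if i == 0 and not BLANK_OR_COMMENT_RE.match(line):
            break  # line 1 is code, so a declaration on line 2 would not count
    return None


def add_coding_comment(source, encoding="utf-8"):
    """Return source with a coding comment added, unless one is present.

    The comment goes on line 1, or on line 2 when line 1 is a #! line,
    since Python only looks for it in the first two lines.
    """
    if declared_encoding(source) is not None:
        return source
    comment = f"# -*- coding: {encoding} -*-\n".encode("ascii")
    if source.startswith(b"#!"):
        first, _, rest = source.partition(b"\n")
        return first + b"\n" + comment + rest
    return comment + source


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()
        with open(path, "wb") as f:
            f.write(add_coding_comment(data))
```

Working on bytes rather than text means the script never has to guess the file's encoding just to insert an ASCII comment.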

-- John-John Tedro