From python to LaTeX in emacs on windows
Benjamin Niemann
b.niemann at betternet.de
Mon Aug 30 07:55:18 EDT 2004
Brian Elmegaard wrote:
> Hi group
>
> I hope this is not a faq...
>
> I try to understand how to use the new way of specifying a files
> encoding, but no matter what I do I get strange characters in the
> output.
>
> I have a text file which I have generated in python by parsing some
> html.
>
> In the file there is international characters like é and ó.
> I can see the file in emacs it is encoded as
> mule-utf-8-dos
>
> I read the file into python as a string and suddenly the characters
> when printed looks strange and consists of two characters.
>
> First problem: How do I avoid this?
>
> Second problem is that I make some string replacements and more in
> the string to write a latex output file. When I open this file in
> emacs the characters now are not the same?
>
> Second problem: How do I avoid this?

When you read the file contents in Python, you get the "raw" byte
sequence, in this case the UTF-8 encoding of the Unicode text. That is
why é shows up as two characters: UTF-8 encodes it as two bytes. What
you probably want is a unicode string. Use "text = unicode(data,
'utf-8')", where "data" is the file content you read. After processing,
you probably want to write the result back to a file. Before you do
this, you have to convert the unicode string back to a byte sequence.
Use "data = text.encode('utf-8')".
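
Put together, the read/decode/process/encode/write round trip might
look like the sketch below. The function name "convert_file" and its
"replacements" parameter are made up for illustration and are not from
the original question. "data.decode('utf-8')" is used as an equivalent
spelling of "unicode(data, 'utf-8')"; the files are opened in binary
mode so that Python itself does no decoding and leaves the DOS line
endings untouched.

```python
import io


def convert_file(src_path, dst_path, replacements):
    """Read UTF-8 bytes, decode, apply replacements, encode, write."""
    # Read the raw byte sequence exactly as it is stored on disk.
    with io.open(src_path, "rb") as f:
        data = f.read()

    # Decode the UTF-8 bytes into a unicode string before processing.
    text = data.decode("utf-8")

    # Do all string processing on the decoded text, not on the bytes.
    for old, new in replacements:
        text = text.replace(old, new)

    # Encode back to a UTF-8 byte sequence just before writing.
    with io.open(dst_path, "wb") as f:
        f.write(text.encode("utf-8"))
```

A call such as convert_file("in.txt", "out.tex", [(u"&", u"\\&")])
would escape ampersands for LaTeX while leaving é and ó intact; the
file names here are placeholders. If the output is opened in emacs it
should again be detected as UTF-8, and LaTeX has to be told about the
input encoding separately.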

Handling character encodings correctly *is* difficult. It's no shame if
you don't get it right on the first attempt.