From python to LaTeX in emacs on windows
b.niemann at betternet.de
Tue Aug 31 11:18:18 CEST 2004
Brian Elmegaard wrote:
> Benjamin Niemann <b.niemann at betternet.de> writes:
> Thanks for the help. I solved the problem by specifying the cp1252
> encoding for the python file by a magic comment and for the input data file.
>>When you read the file contents in Python, you'll have the "raw" byte
>>sequence; in this case it is the UTF-8 encoding of unicode text. But
>>you probably want a unicode string. Use "text = unicode(data,
>>'utf-8')", where "data" is the file content you read. After processing
>>you probably want to write it back to a file. Before you do this, you
>>will have to convert the unicode string back to a byte sequence. Use
>>"data = text.encode('utf-8')".
> This worked, but when I try to print text I get:
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-10: ordinal not in range(128)
> Why is that?
The console only understands "byte streams". To print a unicode string,
Python tries to encode it using the default encoding, which is 'ascii'
in your case. That encoding cannot represent characters like
'ü' or 'ä', which causes the exception. What I usually do is something like:
print text.encode("cp1252", "ignore")
The 'ignore' argument causes all characters that cannot be represented
in cp1252 to be silently dropped, which is OK if the output is only
used, e.g., to track progress.
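A minimal sketch of the encode-before-print approach, written for Python 3 rather than the Python 2 of this thread (so `str` plays the role of `unicode`, and `.encode()` yields `bytes`). The sample text is made up; it shows the default 'strict' handler raising the same kind of error as above, 'ignore' dropping whatever the target encoding cannot hold, and 'replace' marking the losses with '?'.

```python
# Python 3 sketch: a str must become bytes before it reaches a byte-oriented console.
text = "Grüße aus Dänemark"  # hypothetical sample text

# The default 'strict' handler raises, like the 'ascii' codec error in the thread.
strict_error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    strict_error = exc  # exc.start / exc.end locate the unencodable run

# 'ignore' silently drops characters the target encoding cannot represent.
ascii_ignored = text.encode("ascii", "ignore")    # ü, ß and ä are lost
cp1252_ignored = text.encode("cp1252", "ignore")  # cp1252 has ü, ß and ä, so nothing is lost

# 'replace' substitutes '?' so you can at least see that something was dropped.
ascii_replaced = text.encode("ascii", "replace")

print(strict_error)
print(ascii_ignored, ascii_replaced)
```

Choosing 'replace' over 'ignore' is a judgment call: for progress output either is fine, but '?' makes it obvious when the console encoding does not match the data.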
I don't know if there's a way to make Python do this automagically for all
unicode strings passed to stdout...
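One way to get this automatically (a sketch, assuming Python 3's `io` module) is to wrap the byte stream in an `io.TextIOWrapper` whose `encoding` and `errors` arguments perform the encoding on every write. The demo writes to an in-memory `io.BytesIO` instead of the real console so the resulting bytes can be inspected; `console_writer` is a hypothetical helper name, and cp1252 is assumed as the console encoding.

```python
import io


def console_writer(raw: io.BufferedIOBase, encoding: str = "cp1252") -> io.TextIOWrapper:
    """Wrap a binary stream so every str written to it is encoded automatically.

    errors="ignore" mirrors text.encode(encoding, "ignore"): characters the
    code page cannot represent are dropped instead of raising an exception.
    """
    return io.TextIOWrapper(raw, encoding=encoding, errors="ignore")


# Demo against an in-memory buffer instead of the real console.
buffer = io.BytesIO()
out = console_writer(buffer)
out.write("Grüße ☃")  # the snowman (U+2603) has no cp1252 byte, so it is dropped
out.flush()           # push the encoded bytes through to the buffer
written = buffer.getvalue()
print(written)
```

For the real console in Python 3.7+, `sys.stdout.reconfigure(encoding="cp1252", errors="ignore")` has the same effect, and the `PYTHONIOENCODING` environment variable (e.g. `cp1252:ignore`) sets it before the program starts. In the Python 2 of this thread, the rough equivalent was `sys.stdout = codecs.getwriter("cp1252")(sys.stdout, "ignore")`.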
>>Handling character encodings correctly *is* difficult.
> What makes it difficult? The OS, the editor, python, latex?
At least for me it is difficult, because I'm used to thinking "1 byte = 1
character", and when I read/write files I could simply handle the data as
strings. Unless you begin to parse arbitrary data from the internet,
there is little chance that you encounter text encodings different from
your operating system's default, and you start to believe that, e.g.,
"ord('ü') == 252" is a universal rule sent by the gods...
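To make the "1 byte = 1 character" point concrete, here is a small Python 3 sketch (an illustration, not from the thread). In the Python 2 of this thread, a plain 'ü' literal was a byte string, so ord('ü') == 252 only held when the source was saved in a Latin-1-style encoding such as cp1252. In Python 3, `ord()` on a `str` returns the Unicode code point, which is fixed; what varies is the byte sequence each codec uses for that character, and decoding with the wrong codec yields mojibake rather than an error.

```python
# The code point of "ü" is fixed; the byte(s) that represent it are not.
char = "ü"
code_point = ord(char)  # 252, i.e. U+00FC, in any Python 3 program

by_codec = {enc: char.encode(enc) for enc in ("latin-1", "cp1252", "cp437", "utf-8")}
for enc, raw in by_codec.items():
    print(f"{enc:8} {raw!r}")  # bytes reprs are pure ASCII, safe on any console

# Decoding UTF-8 bytes with the wrong codec gives mojibake instead of an exception.
mojibake = char.encode("utf-8").decode("cp1252")
```

The silent mojibake case is usually the harder bug to spot, since nothing fails until a human reads the output.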
To do it right, you should convert all data that 'enters' your
application to unicode as early as possible and encode it back when you
print/save/send it; this way you only have to deal with unicode strings in
your application code. The most difficult part is probably changing old