eval and unicode
Laszlo Nagy
gandalf at shopzeus.com
Fri Mar 21 07:38:49 EDT 2008
>
> Your problem is, I think, that you think the magic of decoding source
> code from the byte sequence into unicode happens in exec or eval. It
> doesn't. It happens in between reading the file and passing the
> contents of the file to exec or eval.
>
I think you are wrong here. Decoding of the source happens inside eval. Here is
the proof:
<snip>
s = 'u"' + '\xdb' + '"'
print eval(s) == eval("# -*- coding: iso8859-2\n" + s)
</snip>
This prints False, indicating that the decoding of the string expression happened inside eval!
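To see why the comparison is False, it helps to look at what the byte '\xdb' becomes under the declared encoding versus what eval gives back without a declaration. A minimal sketch, assuming only a Python 2 interpreter and the standard codec names:
<snip>
b = '\xdb'
print repr(b.decode('iso8859-2'))  # u'\u0170' (U with double acute)
print repr(b.decode('latin1'))     # u'\xdb'   (U with circumflex)
print repr(eval('u"' + b + '"'))   # u'\xdb', the latin1-style result, not the latin2 one
</snip>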
It can also be proven that eval does not use the 'ascii' codec for the default
decoding:
'\xdb'.decode('ascii')  # this raises a UnicodeDecodeError
Yet eval(s) above succeeded, so eval cannot be decoding the expression with 'ascii'.
eval() somehow decoded the passed expression, no question about that. It did not
use 'ascii', nor 'latin2', but something else. Why is that? Why is a particular
encoding hard coded into eval? And which encoding is it? (I could not decide,
since '\xdb' decodes to the same character in latin1, latin3, latin4 and
probably many others.)
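One way to narrow it down is to probe eval with a byte that maps to different characters in the candidate encodings. The following sketch is my own, assuming Python 2; the probe byte '\xf5' is u'\xf5' in latin1 but u'\u0151' in latin2:
<snip>
probe = 'u"' + '\xf5' + '"'
print repr(eval(probe))
# u'\xf5'   -> consistent with latin1 (every byte becomes the code point with the same number)
# u'\u0151' -> would point at latin2
</snip>
If every byte comes back as the identically numbered code point, the behaviour is indistinguishable from latin1, whichever codec eval uses internally.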
I suspected that eval would use the same encoding that the Python source
file/console had at the point of execution, but this is not true:
the following program prints u'\xdb' instead of u'\u0170':
<snip>
# -*- coding: iso8859-2 -*-
s = '\xdb'
expr = 'u"' + s + '"'
print repr(eval(expr))
</snip>
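For what it is worth, the coding cookie trick from the comparison above suggests two workarounds to get u'\u0170' out of eval. A sketch under the same assumptions as the program above (Python 2, the expression bytes meant as iso8859-2):
<snip>
# -*- coding: iso8859-2 -*-
s = '\xdb'
expr = 'u"' + s + '"'

# 1) prepend an explicit coding declaration to the evaluated source
print repr(eval("# -*- coding: iso8859-2\n" + expr))   # expected: u'\u0170'

# 2) decode the bytes yourself and pass a unicode object to eval
print repr(eval(expr.decode('iso8859-2')))             # expected: u'\u0170'
</snip>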
Regards,
Laszlo