eval and unicode
Laszlo Nagy
gandalf at shopzeus.com
Fri Mar 21 07:38:49 EDT 2008
>
> Your problem is, I think, that you think the magic of decoding source
> code from the byte sequence into unicode happens in exec or eval. It
> doesn't. It happens in between reading the file and passing the
> contents of the file to exec or eval.
>
I think you are wrong here. Decoding of the source happens inside eval. Here is
the proof:
<snip>
s = 'u"' + '\xdb' + '"'
print eval(s) == eval("# -*- coding: iso8859-2\n" + s)
</snip>
This prints False, indicating that the decoding of the string expression happened inside eval!
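To see why the comparison is False, it helps to look at what the byte '\xdb' becomes under the declared encoding versus what eval gives back without a declaration. A minimal sketch, assuming only a Python 2 interpreter and the standard codec names:
<snip>
b = '\xdb'
print repr(b.decode('iso8859-2'))  # u'\u0170' (U with double acute)
print repr(b.decode('latin1'))     # u'\xdb'   (U with circumflex)
print repr(eval('u"' + b + '"'))   # u'\xdb', the latin1-style result, not the latin2 one
</snip>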
It can also be proven that eval does not use the 'ascii' codec for the default
decoding:
'\xdb'.decode('ascii')  # this raises a UnicodeDecodeError
Yet eval(s) above succeeded, so eval cannot be decoding the expression with 'ascii'.
eval() somehow decoded the passed expression, no question about that. It did not
use 'ascii', nor 'latin2', but something else. Why is that? Why is a particular
encoding hard coded into eval? And which encoding is it? (I could not decide,
since '\xdb' decodes to the same character in latin1, latin3, latin4 and
probably many others.)
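One way to narrow it down is to probe eval with a byte that maps to different characters in the candidate encodings. The following sketch is my own, assuming Python 2; the probe byte '\xf5' is u'\xf5' in latin1 but u'\u0151' in latin2:
<snip>
probe = 'u"' + '\xf5' + '"'
print repr(eval(probe))
# u'\xf5'   -> consistent with latin1 (every byte becomes the code point with the same number)
# u'\u0151' -> would point at latin2
</snip>
If every byte comes back as the identically numbered code point, the behaviour is indistinguishable from latin1, whichever codec eval uses internally.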
I suspected that eval would use the same encoding that the Python source
file/console had at the point of execution, but this is not true:
the following program prints u'\xdb' instead of u'\u0170':
<snip>
# -*- coding: iso8859-2 -*-
s = '\xdb'
expr = 'u"' + s + '"'
print repr(eval(expr))
</snip>
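For what it is worth, the coding cookie trick from the comparison above suggests two workarounds to get u'\u0170' out of eval. A sketch under the same assumptions as the program above (Python 2, the expression bytes meant as iso8859-2):
<snip>
# -*- coding: iso8859-2 -*-
s = '\xdb'
expr = 'u"' + s + '"'

# 1) prepend an explicit coding declaration to the evaluated source
print repr(eval("# -*- coding: iso8859-2\n" + expr))   # expected: u'\u0170'

# 2) decode the bytes yourself and pass a unicode object to eval
print repr(eval(expr.decode('iso8859-2')))             # expected: u'\u0170'
</snip>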
Regards,
Laszlo