I suspect the interactive session is *not* always in UTF-8. It probably depends on the keyboard mapping of your terminal emulator. I imagine on Windows it's the current code page.

On Wed, Apr 29, 2015 at 9:19 AM, Adam Bartoš <drekin@gmail.com> wrote:
Yes, that works for eval. But I want it for code entered during an interactive session.
>>> u'α'
u'\xce\xb1'
The tokenizer gets b"u'\xce\xb1'" by calling PyOS_Readline, and it knows the input is UTF-8 encoded. But the result of evaluation is u'\xce\xb1'. Because of how eval works, I believe it would work correctly if the PyCF_SOURCE_IS_UTF8 flag were set, but it is not. That is why I'm asking whether there is a way to set it. Also, my naive thought is that it should always be set in an interactive session.
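The mismatch described above can be reproduced without a terminal. This is a minimal sketch written for Python 3; the thread concerns Python 2, so it only simulates the effect and does not exercise PyOS_Readline or the tokenizer. Reading the UTF-8 bytes of α as one code point per byte gives the same two-character string that the session printed.

```python
# Simulation only: this does not set or test PyCF_SOURCE_IS_UTF8.
alpha = "\u03b1"                      # GREEK SMALL LETTER ALPHA
utf8_bytes = alpha.encode("utf-8")    # what a UTF-8 terminal sends for this character

# Treating each byte as its own code point (Latin-1) mimics the unwanted result.
misread = utf8_bytes.decode("latin-1")

# Decoding with the encoding the bytes were written in recovers the character.
recovered = utf8_bytes.decode("utf-8")
```

Here `misread` has two characters while `recovered` has one, which is the difference between u'\xce\xb1' and u'\u03b1' in the session above.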
On Wed, Apr 29, 2015 at 4:59 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
On 29 Apr 2015 10:36, "Adam Bartoš" <drekin@gmail.com> wrote:
Why am I talking about PyCF_SOURCE_IS_UTF8? eval(u"u'\u03b1'") -> u'\u03b1' but eval(u"u'\u03b1'".encode('utf-8')) -> u'\xce\xb1'.
There is a simple option to get this flag: call eval() with unicode, not with encoded bytes.
Victor
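Victor's suggestion amounts to decoding the bytes before evaluating them. Below is a hedged sketch in Python 3 syntax. `eval_decoded` is a hypothetical helper name, and falling back to UTF-8 when no encoding is known is an assumption, since, as noted at the top of the thread, the terminal's encoding is not necessarily UTF-8.

```python
import sys


def eval_decoded(raw, encoding=None):
    """Decode raw source bytes to text, then evaluate the text (hypothetical helper)."""
    if encoding is None:
        # Assumption: prefer the interactive input encoding when known, else UTF-8.
        encoding = getattr(sys.stdin, "encoding", None) or "utf-8"
    return eval(raw.decode(encoding))


# Bytes as a UTF-8 terminal would deliver them for the literal u'α'.
result = eval_decoded("u'\u03b1'".encode("utf-8"), encoding="utf-8")
```

eval appears here only because the thread is about it; the same decode-first step applies to compile(), and neither should be given untrusted input.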
--
--Guido van Rossum (python.org/~guido)