[New-bugs-announce] [issue5911] built-in compile() should take encoding option.

Naoki INADA report at bugs.python.org
Sun May 3 03:55:25 CEST 2009

New submission from Naoki INADA <songofacandy at gmail.com>:

The built-in compile() expects its source to be encoded in UTF-8.
This behavior makes it harder to implement alternative shells
such as IDLE and IPython. (http://bugs.python.org/issue1542677 and
https://bugs.launchpad.net/ipython/+bug/339642 are related bugs.)

Below is the current behavior of compile().

# Python's interactive shell in Windows cp932 console.
>>> "あ"
>>> u"あ"

# compile() fails to decode str.
>>> code = compile('u"あ"', '__interactive__', 'single')
>>> exec code
u'\x82\xa0'  # u'\u3042' expected.

# compile() encodes unicode to utf-8.
>>> code = compile(u'"あ"', '__interactive__', 'single')
>>> exec code
'\xe3\x81\x82' # '\x82\xa0' (cp932) wanted, but I get utf-8.
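
The byte values in the two results above can be checked independently of
compile(). The following is only an illustrative sketch: the variable names
are made up, and the Latin-1 step is an assumption that merely reproduces the
observed u'\x82\xa0'; the report does not state the interpreter's internal
mechanism.

```python
# Illustration only: these names do not come from the report.
ch = u'\u3042'  # HIRAGANA LETTER A, the character used in the examples

cp932_bytes = ch.encode('cp932')  # bytes a cp932 console delivers
utf8_bytes = ch.encode('utf-8')   # bytes seen when unicode source is compiled

# Assumption: mapping each cp932 byte one-to-one to a code point
# (Latin-1 style) reproduces the unexpected unicode result.
misdecoded = cp932_bytes.decode('latin-1')

print(repr(cp932_bytes), repr(utf8_bytes), repr(misdecoded))
```

Decoding cp932_bytes with 'cp932' instead gives back the original character,
which is the result the report expects.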

Currently, a PEP 263 coding declaration, as shown below, is needed to get
code compiled in the expected encoding.

>>> code = compile('# coding: cp932\n%s' % ('"あ"',), '__interactive__', 
>>> exec code
>>> code = compile('# coding: cp932\n%s' % ('u"あ"',), '__interactive__', 
>>> exec code

But I feel that using compile() with PEP 263 is a bit of a dirty hack.
I think adding an 'encoding' argument to compile(), with 'utf-8' as its
default value, is a cleaner way, and it doesn't break backward compatibility.
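
Until such an argument exists, a shell could wrap the PEP 263 workaround in a
helper. The following is a minimal sketch, not a proposed patch:
compile_with_encoding is a hypothetical name, it handles byte-string source
only, and it uses 'exec' mode with a u"" literal so the result is easy to
inspect.

```python
def compile_with_encoding(source, filename, mode, encoding='utf-8'):
    """Hypothetical helper, not part of Python: compile byte-string
    source as if it had been written in `encoding`."""
    if not isinstance(source, bytes):
        raise TypeError('this sketch only handles byte-string source')
    # PEP 263 declaration; it must sit on the first or second line.
    header = ('# coding: %s\n' % encoding).encode('ascii')
    # Caveat: the extra line shifts reported line numbers by one.
    return compile(header + source, filename, mode)


# An assignment of HIRAGANA LETTER A, as a cp932 console would deliver it.
source = b'x = u"\x82\xa0"'
namespace = {}
exec(compile_with_encoding(source, '__interactive__', 'exec', 'cp932'),
     namespace)
print(repr(namespace['x']))
```

The shifted line numbers are one reason a real 'encoding' argument would be
cleaner than prepending a declaration.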

The following example describes the behavior of compile() with the encoding option.

# coding: utf-8 (in utf-8 context)
code = compile('"あ"', '__foo.py', 'single')
exec code #=> '\xe3\x81\x82'

code = compile('"あ"', '__foo.py', 'single', encoding='cp932') => 

code = compile(u'"あ"', '__foo.py', 'single')
exec code #=> '\xe3\x81\x82'

code = compile(u'"あ"', '__foo.py', 'single', encoding='cp932')
exec code #=> '\x82\xa0'

components: None
messages: 86994
nosy: naoki
severity: normal
status: open
title: built-in compile() should take encoding option.
type: feature request
versions: Python 2.6, Python 2.7

Python tracker <report at bugs.python.org>
