[ python-Bugs-999444 ] compiler module doesn't support unicode characters in literals

SourceForge.net noreply at sourceforge.net
Thu Jul 29 13:30:08 CEST 2004


Bugs item #999444, was opened at 2004-07-28 15:00
Message generated for change (Comment added) made by mwh
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=999444&group_id=5470

Category: Python Interpreter Core
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Jim Fulton (dcjim)
Assigned to: Nobody/Anonymous (nobody)
Summary: compiler module doesn't support unicode characters in literals

Initial Comment:
I'm not positive that this is a bug.  The built-in
compile function accepts unicode with non-ascii text in
literals:

>>> text = u"print u'''\u0442\u0435\u0441\u0442'''"
>>> exec compile(text, 's', 'exec')
тест
>>> import compiler
>>> exec compiler.compile(text, 's', 'exec')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/python/2.3.4/lib/python2.3/compiler/pycodegen.py", line 64, in compile
    gen.compile()
  File "/usr/local/python/2.3.4/lib/python2.3/compiler/pycodegen.py", line 111, in compile
    tree = self._get_tree()
  File "/usr/local/python/2.3.4/lib/python2.3/compiler/pycodegen.py", line 77, in _get_tree
    tree = parse(self.source, self.mode)
  File "/usr/local/python/2.3.4/lib/python2.3/compiler/transformer.py", line 50, in parse
    return Transformer().parsesuite(buf)
  File "/usr/local/python/2.3.4/lib/python2.3/compiler/transformer.py", line 120, in parsesuite
    return self.transform(parser.suite(text))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-13: ordinal not in range(128)
>>> 

----------------------------------------------------------------------

>Comment By: Michael Hudson (mwh)
Date: 2004-07-29 12:30

Message:
Logged In: YES 
user_id=6656

thinking about this a little harder, doing a proper job probably 
involves mucking around in the depths of python to support 
source-as-unicode throughout.  the vile solution is this sort of 
thing:

>>> parser.suite('# coding: utf-8\n' + u"print u'''\u0442\u0435\u0441\u0442'''".encode('utf-8'))
<parser.st object at 0x107770>
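
The same trick (encode to UTF-8 and prepend a coding declaration) can be sketched on a current interpreter. This is only an illustration, assuming Python 3, where the parser and compiler modules no longer exist, so the builtin compile() stands in for parser.suite(); with_utf8_cookie is a hypothetical helper name, not part of any library:

```python
def with_utf8_cookie(source_text):
    """Return source_text as UTF-8 bytes preceded by a coding declaration.

    Hypothetical helper for illustration only.
    """
    return b"# coding: utf-8\n" + source_text.encode("utf-8")


# Source text containing non-ASCII characters inside a string literal.
text = "value = '\u0442\u0435\u0441\u0442'"

# compile() accepts bytes and honours the coding declaration on line 1.
code = compile(with_utf8_cookie(text), "s", "exec")

namespace = {}
exec(code, namespace)
```

Note that prepending a line shifts every line number reported for the compiled source by one.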


----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2004-07-29 12:19

Message:
Logged In: YES 
user_id=6656

the immediate problem is that the parser module does not support 
unicode:

>>> import parser
>>> parser.suite(u"print u'''\u0442\u0435\u0441\u0442'''")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-13: ordinal not in range(128)

there may well be more bugs lurking in Lib/compiler wrt this 
issue, but this is the first... I don't know how easy this will be to 
fix (looking at what the builtin compile() function does with 
unicode might be a good start).
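
The positions in the error message match what an explicit ASCII encode of the same text reports. That suggests the unicode argument is being implicitly encoded with the default codec; this is an inference from the message, not from reading the parser source. A small sketch, written in Python 3 syntax as an assumption about the reader's interpreter:

```python
# The same source text as in the session above.
source = "print u'''\u0442\u0435\u0441\u0442'''"

failure = None
try:
    source.encode("ascii")
except UnicodeEncodeError as exc:
    # Keep a reference; the name bound by "as" is cleared after the block.
    failure = exc

# start is inclusive and end is exclusive, so the offending
# characters occupy positions start .. end - 1.
print(failure.encoding, failure.start, failure.end - 1)
```

The four Cyrillic characters sit right after the ten ASCII characters of print u''' and so occupy positions 10 through 13.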

----------------------------------------------------------------------

Comment By: Jim Fulton (dcjim)
Date: 2004-07-28 15:02

Message:
Logged In: YES 
user_id=73023

Also in 2.3

----------------------------------------------------------------------
