[Python-bugs-list] [ python-Bugs-530882 ] import and execfile don't handle utf-16

noreply@sourceforge.net noreply@sourceforge.net
Sun, 17 Mar 2002 09:33:14 -0800


Bugs item #530882, was opened at 2002-03-17 06:46
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=530882&group_id=5470

Category: Unicode
Group: None
>Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: Terrel Shumway (terrelshumway)
Assigned to: Nobody/Anonymous (nobody)
Summary: import and execfile don't handle utf-16

Initial Comment:
import and execfile don't handle utf-16 encoded files,
but if I read the file with an appropriate encoder, 
exec works fine on the loaded unicode string.

Also, changing site.encoding to utf-16 has a 
detrimental effect. (I need to understand this better.)

I understand that the general problem is difficult to 
solve, but it seems it would be fairly easy to handle 
for the specific case of utf-16 file with some byte 
order mark at the beginning: if import/execfile fail 
and the file starts with some BOM, re-read the file 
with an appropriate codec.
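
The fallback proposed above could look roughly like the sketch below. It is
written in present-day Python 3, not the Python 2 of this report, and it is
not something the interpreter does. The names read_source and UTF16_BOMS are
hypothetical, and decoding BOM-less files as plain ASCII is an assumption.

```python
import codecs

# Byte order marks that identify a UTF-16 stream (little- and big-endian).
UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def read_source(path):
    """Return the text of a source file, decoding UTF-16 when a BOM is present.

    Hypothetical helper: files without a UTF-16 BOM are assumed to be ASCII.
    Note that a UTF-32 little-endian BOM also starts with FF FE; this sketch
    does not try to tell the two apart.
    """
    with open(path, "rb") as f:
        raw = f.read()
    if raw.startswith(UTF16_BOMS):
        # The "utf-16" codec consumes the BOM and takes the byte order from it.
        return raw.decode("utf-16")
    return raw.decode("ascii")
```

The returned string could then be passed to compile() and exec, which is the
path the report says already works.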


Use this code to reproduce the problem
--------------
import sys
print sys.getdefaultencoding()

code = u'print "this is a test: OK"'

import traceback
import codecs

codecs.open("foo.py","w+","utf-16").write(code)

try:
    execfile("foo.py")
except:
    traceback.print_exc()

try:
    import foo
except:
    traceback.print_exc()


uu = codecs.open("foo.py","r","utf-16").read()

exec(uu)
--------------
produces this output
--------------
ascii
Traceback (most recent call last):
  File "C:\opt\unicode-exec.py", line 12, in ?
    execfile("foo.py")
  File "<string>", line 1
      ■p
     ^
 SyntaxError: invalid syntax
Traceback (most recent call last):
  File "C:\opt\unicode-exec.py", line 17, in ?
    import foo
  File "<string>", line 1
      ■p
     ^
 SyntaxError: invalid syntax
this is a test: OK

--------------
If I edit site.py to change encoding to "utf-16", I get
--------------
utf-16
Traceback (most recent call last):
  File "C:\opt\unicode-exec.py", line 15, in ?
    execfile("foo.py")
  File "<string>", line 1
      ■p
     ^
 SyntaxError: invalid syntax
Traceback (most recent call last):
  File "C:\opt\unicode-exec.py", line 20, in ?
    import foo
  File "<string>", line 1
      ■p
     ^
 SyntaxError: invalid syntax
Traceback (most recent call last):
  File "C:\opt\unicode-exec.py", line 27, in ?
    exec(uu)
TypeError: expected string without null bytes
----
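
To see what the parser is actually handed, it may help to look at the raw
bytes of such a file. The sketch below (Python 3 syntax) builds them by hand
with an explicit little-endian byte order. Reading the odd characters in the
tracebacks as the BOM rendered in code page 437 is an assumption about the
console used, not something stated in the report.

```python
import codecs

# The same source text as in the report, kept as a plain string.
code = 'print "this is a test: OK"'

# Explicit little-endian encoding adds no BOM, so prepend one by hand.
data = codecs.BOM_UTF16_LE + code.encode("utf-16-le")

print(data[:6])                  # b'\xff\xfep\x00r\x00'
print(b"\x00" in data)           # True: every ASCII character carries a NUL byte
# Assumption: a console using code page 437 shows FF FE 70 as a
# no-break space, a black square, and the letter p.
print(data[:3].decode("cp437"))
```

The NUL bytes would also be consistent with the "expected string without null
bytes" error if the unicode string were encoded with a utf-16 default encoding
before compilation, though that link is a guess.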


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-03-17 18:33

Message:
Logged In: YES 
user_id=21627

This is not a bug. The language reference clearly says, in

http://www.python.org/doc/current/ref/lexical.html

"Python uses the 7-bit ASCII character set for program text
and string literals."

PEP 263 (if accepted) will extend this to other encodings.
However, UTF-16 is not in the list of encodings supported
under this PEP, as it is not an ASCII superset.
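
The "ASCII superset" property can be checked directly: an encoding qualifies
if ASCII text encodes to exactly the same bytes as it does under ASCII. A
minimal sketch in Python 3 follows; the helper name is_ascii_superset is
hypothetical, and testing only the printable characters is a simplification.

```python
import string


def is_ascii_superset(encoding, sample=string.printable):
    """Return True if ASCII text in `sample` encodes to its plain ASCII bytes."""
    return sample.encode(encoding) == sample.encode("ascii")


print(is_ascii_superset("utf-8"))    # True
print(is_ascii_superset("latin-1"))  # True
print(is_ascii_superset("utf-16"))   # False: BOM plus two bytes per character
```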


----------------------------------------------------------------------
