[ python-Bugs-1306484 ] compile() converts "filename" parameter to StringType

Sat Oct 8 12:04:45 CEST 2005

Bugs item #1306484, was opened at 2005-09-28 06:49
Message generated for change (Settings changed) made by loewis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1306484&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Parser/Compiler
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Vágvölgyi Attila (wigy)
>Assigned to: Nobody/Anonymous (nobody)
Summary: compile() converts "filename" parameter to StringType

Initial Comment:
The builtin compile() signature looks like:

  compile(string, filename, kind[, flags[, dont_inherit]])

The string parameter can be either StringType or
UnicodeType, but the filename parameter is always
converted to StringType, so if a unicode object with
non-ASCII characters is passed as the filename,
compile() raises UnicodeEncodeError.

This is a problem on filesystems with UTF-8
filenames, or when non-English names are passed to make
tracebacks look nicer.

The attached file contains a unit test that will
succeed once the bug is resolved. I have seen the error in
2.3 and 2.4; perhaps it affects all releases?
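The attached test is not included in this archive. Purely as an illustration, a test of this kind might look like the following sketch. It is written for Python 3 (an assumption: on 2.3/2.4 the compile() call itself would raise UnicodeEncodeError), and the class name and the filename are made up.

```python
import unittest


class CompileUnicodeFilenameTest(unittest.TestCase):
    # Hypothetical non-ASCII filename; it does not need to exist on disk.
    FILENAME = "t\u00e9szt_f\u00e1jl.py"

    def test_filename_is_kept(self):
        # compile() should accept the unicode name without raising
        # UnicodeEncodeError and store it unchanged in co_filename.
        code = compile("x = 1\n", self.FILENAME, "exec")
        self.assertEqual(code.co_filename, self.FILENAME)

    def test_filename_in_traceback(self):
        # The same name should appear on the innermost traceback frame.
        code = compile("1 / 0\n", self.FILENAME, "exec")
        try:
            exec(code, {})
        except ZeroDivisionError as exc:
            tb = exc.__traceback__
            while tb.tb_next is not None:  # walk to the innermost frame
                tb = tb.tb_next
            self.assertEqual(tb.tb_frame.f_code.co_filename, self.FILENAME)
        else:
            self.fail("expected ZeroDivisionError")
```

Saved as a module, it can be run with `python -m unittest`.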

----------------------------------------------------------------------

Comment By: Vágvölgyi Attila (wigy)
Date: 2005-09-29 10:29

Message:
Logged In: YES 
user_id=156682

loewis, I confess I could not understand a word.

But as I see it, a completely Unicode internal filename
representation would have advantages on systems with
several filesystems mounted under different encodings,
or simply on systems with UTF-8 filesystems (no
'ascii', 'replace' conversion, which makes two filenames
differing only in accents indistinguishable).
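To make the 'ascii', 'replace' concern concrete: with that error handler every character ASCII cannot represent becomes `?`, so two names that differ only in an accent end up as the same byte string. A short Python 3 demonstration (the filenames are made up):

```python
# Two hypothetical filenames differing only in the accent on the "a".
a = "f\u00e1jl.py"  # a with acute accent
b = "f\u00e0jl.py"  # a with grave accent

# The 'replace' error handler substitutes b"?" for each character
# that ASCII cannot encode, so the distinction is lost.
lossy_a = a.encode("ascii", "replace")
lossy_b = b.encode("ascii", "replace")
print(lossy_a, lossy_b)    # b'f?jl.py' b'f?jl.py'
print(lossy_a == lossy_b)  # True

# An encoding that covers the characters (here UTF-8) keeps them distinct.
print(a.encode("utf-8") == b.encode("utf-8"))  # False
```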

I agree with Joel Spolsky
(http://www.joelonsoftware.com/articles/Unicode.html), and I
think that if a language made choosing Unicode easier,
then most l10n problems would be solved. I understand
that coding Unicode in C is a pain.

Imagine, theoretically, that a literal like "hello"
automatically meant a unicode object in Python, and you had
to write s"hello" to get a byte string literal, encoded in
whatever way some environmental setting (or maybe the PEP 0263
header of the specific source file?) determines, so that you
have control over what happens.

Imagine the case where a latin1 and a utf-8
partition are mounted, and the console is latin2! Life would be
much, much easier for a non-American programmer if she had
to be aware from the first moment that she is working in an
international environment.

----------------------------------------------------------------------

Comment By: Georg Brandl (gbrandl)
Date: 2005-09-29 08:34

Message:
Logged In: YES 
user_id=849994

Sounds sound. :)

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2005-09-29 08:20

Message:
Logged In: YES 
user_id=21627

Why couldn't co_filename just be the Unicode string? I think
one would have to change:
- code_repr, to convert the filename into a byte string
(preferably using 'ascii', 'replace')
- tb_printinternal (not sure what to do here)
- code_new, to accept either strings or unicode strings
- builtin_compile, which probably indeed needs to convert
the string using the file system encoding, and then patch
the resulting code object to point to the unicode object
originally passed (unless we can accept more pythonrun
functions).
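The last item, patching the code object returned by compile() so that it refers to the originally passed Unicode name, can be sketched at the Python level with `types.CodeType.replace()`. This is an assumption well outside the 2.4 C code under discussion (`replace()` exists only since Python 3.8), and `compile_with_name` and the placeholder name are hypothetical. One detail the sketch makes visible: nested code objects (function and class bodies) are stored in `co_consts` and carry their own `co_filename`, so the patch has to recurse.

```python
import types


def _retarget(code: types.CodeType, filename: str) -> types.CodeType:
    # Rebuild nested code objects first; they are stored in co_consts.
    consts = tuple(
        _retarget(c, filename) if isinstance(c, types.CodeType) else c
        for c in code.co_consts
    )
    return code.replace(co_filename=filename, co_consts=consts)


def compile_with_name(source: str, filename: str, mode: str) -> types.CodeType:
    # Hypothetical helper: compile under an ASCII-only placeholder name
    # (standing in for the byte-string conversion), then point every
    # code object at the original Unicode filename.
    code = compile(source, "<placeholder>", mode)
    return _retarget(code, filename)


src = "def f():\n    return 1\n"
code = compile_with_name(src, "f\u00e1jl.py", "exec")
ns = {}
exec(code, ns)
print(code.co_filename == "f\u00e1jl.py")             # True
print(ns["f"].__code__.co_filename == "f\u00e1jl.py")  # True
```

Without the recursion in `_retarget`, the module-level code object would be patched but `f.__code__.co_filename` would still report the placeholder.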

----------------------------------------------------------------------

Comment By: Reinhold Birkenfeld (birkenfeld)
Date: 2005-09-28 14:54

Message:
Logged In: YES 
user_id=1188172

Should compile() use Py_FileSystemDefaultEncoding?
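At the Python level, the filesystem encoding is reported by `sys.getfilesystemencoding()`, and in Python 3 `os.fsencode()`/`os.fsdecode()` perform the conversion it implies. The exact encoding is platform-dependent; the sketch below assumes it can represent the (made-up) name, which holds for UTF-8, the usual case on current systems.

```python
import os
import sys

name = "f\u00e1jl.py"  # hypothetical non-ASCII filename

# Encoding the interpreter uses for filenames on this platform.
print(sys.getfilesystemencoding())

# os.fsencode() applies that encoding (with the platform's error
# handler); unlike an 'ascii', 'replace' conversion, it round-trips.
raw = os.fsencode(name)
print(os.fsdecode(raw) == name)  # True when the encoding covers the name
```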

----------------------------------------------------------------------