[Python-bugs-list] [ python-Bugs-767645 ] incorrect os.path.supports_unicode_filenames

SourceForge.net noreply@sourceforge.net
Thu, 10 Jul 2003 14:13:39 -0700


Bugs item #767645, was opened at 2003-07-08 11:42
Message generated for change (Comment added) made by jvr
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=767645&group_id=5470

Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Just van Rossum (jvr)
Assigned to: Nobody/Anonymous (nobody)
Summary: incorrect os.path.supports_unicode_filenames

Initial Comment:
At least on OSX, Unicode file names are pretty much fully 
supported, yet os.path.supports_unicode_filenames is False 
(it comes from posixpath.py, which hard-codes it). What 
would be a proper way to detect Unicode filename support 
for POSIX platforms?

----------------------------------------------------------------------

>Comment By: Just van Rossum (jvr)
Date: 2003-07-10 23:13

Message:
Logged In: YES 
user_id=92689

> On OSX, the situation is somewhat different from POSIX, as
> you have additional functions to open files (which Python
> apparently does not use, though), and because OSX specifies
> that the byte strings have to be NFD UTF-8 (which Python
> violates AFAICT).

(I'm not 100% sure, but I think the OS corrects that)

> True if arbitrary Unicode strings can be used as file names
> (within limitations imposed by the file system), and if
> \function{os.listdir()} returns Unicode strings for a Unicode
> argument.
> 
> While the first part is true for OSX, I don't think the
> second part is.

It is; we had a long discussion about that back when I 
implemented it ;-)

> If that ever gets corrected (or verified),
> no further detection is necessary - just set
> macpath.supports_unicode_filenames for darwin (assuming you
> use macpath.py on that system). 

Darwin is a POSIX platform, so I'll have to add a switch to 
posixpath.py. Unless you object, I will do that.
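
A minimal sketch of what such a switch might look like, assuming the flag is simply keyed on sys.platform. This only illustrates the idea and is not the actual patch; the helper name detect_unicode_filename_support is hypothetical.

```python
import sys


def detect_unicode_filename_support(platform=sys.platform):
    """Hypothetical helper: True only for Darwin (Mac OS X) among POSIX platforms."""
    return platform == "darwin"


# In a module such as posixpath.py the result would be a module-level constant.
supports_unicode_filenames = detect_unicode_filename_support()
```

Keeping the check in a small function makes it possible to exercise both branches without running on each platform.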

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-07-10 23:05

Message:
Logged In: YES 
user_id=21627

Brett: As for "writing Unicode to an ASCII file system":
there is no such thing. POSIX file systems accept arbitrary
bytes, and don't interpret them except by looking at the
path separator (in ASCII).

So you can put Latin-1, KOI8-R, EUC-JP, UTF-8, GB2312, etc.
all on a single file system, and people actually do that.
The convention is that bytes in file names are interpreted
according to the locale's encoding. This is just a
convention, and it has some significant flaws. Python
follows that convention, meaning that you can use arbitrary
Unicode strings in open(), as long as they are supported in
the locale's encoding.
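
To illustrate the point about encodings, the sketch below shows that the same name turns into different byte strings under different encodings; a POSIX file system would store whichever bytes it is handed. It uses modern Python 3 str.encode rather than the Python 2.3 discussed here, and the helper name encode_name and the sample name are made up for illustration.

```python
def encode_name(name, encodings=("latin-1", "utf-8")):
    """Return the byte string a file name would become under each encoding."""
    return {enc: name.encode(enc) for enc in encodings}


# The same one-character name yields different bytes per encoding.
names = encode_name("\u00e9")  # LATIN SMALL LETTER E WITH ACUTE
print(names)  # {'latin-1': b'\xe9', 'utf-8': b'\xc3\xa9'}
```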

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-07-10 23:01

Message:
Logged In: YES 
user_id=21627

On POSIX platforms in general, detecting Unicode file name
support is not possible. POSIX uses open(2), and only
open(2) (along with creat(2), stat(2), etc.) to access files.
There is no open_w, or open_utf8, or the like. So file names
are byte strings on POSIX, and it will stay that way forever.
(There is actually also fopen, but that doesn't change the
situation at all.)

On OSX, the situation is somewhat different from POSIX, as
you have additional functions to open files (which Python
apparently does not use, though), and because OSX specifies
that the byte strings have to be NFD UTF-8 (which Python
violates AFAICT).
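
For readers unfamiliar with NFD, here is a small sketch of how the UTF-8 bytes of a name differ between the composed (NFC) and decomposed (NFD) forms. It assumes modern Python 3 and the standard unicodedata module; the helper name utf8_forms is hypothetical.

```python
import unicodedata


def utf8_forms(name):
    """Return the UTF-8 bytes of name in NFC and NFD normalization."""
    return {
        form: unicodedata.normalize(form, name).encode("utf-8")
        for form in ("NFC", "NFD")
    }


forms = utf8_forms("\u00e9")
print(forms)  # {'NFC': b'\xc3\xa9', 'NFD': b'e\xcc\x81'}
```

A name passed in NFC form therefore reaches the system call as different bytes than the NFD form would.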

The documentation for supports_unicode_filenames says

True if arbitrary Unicode strings can be used as file names
(within limitations imposed by the file system), and if
\function{os.listdir()} returns Unicode strings for a Unicode
argument.

While the first part is true for OSX, I don't think the
second part is. If that ever gets corrected (or verified),
no further detection is necessary - just set
macpath.supports_unicode_filenames for darwin (assuming you
use macpath.py on that system).
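
The second part of that definition could be verified with a check along these lines. This is a sketch in modern Python 3, where the text and byte-string types are str and bytes (in Python 2.3 they would be unicode and str); the helper name listdir_result_types is hypothetical.

```python
import os
import tempfile


def listdir_result_types(directory):
    """Return the set of types os.listdir() yields for the given directory argument."""
    return {type(entry) for entry in os.listdir(directory)}


with tempfile.TemporaryDirectory() as tmp:
    # Create one plain ASCII-named file so the listing is not empty.
    open(os.path.join(tmp, "sample.txt"), "w").close()
    text_types = listdir_result_types(tmp)                # text argument
    bytes_types = listdir_result_types(os.fsencode(tmp))  # byte-string argument
```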

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2003-07-09 20:07

Message:
Logged In: YES 
user_id=357491

What happens if you try to create a file using Unicode names?
Could a test get the temp directory for the platform, write a file
with Unicode in its name, and then check for an error?  Or if it always
succeeds, write it, and then see if the results match?

In other words, does writing Unicode to an ASCII file system ever 
lead to a mangling of the name?
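
The round-trip test proposed above could be sketched roughly as follows, assuming modern Python 3. The helper name probe_unicode_filename, the sample file name, and the four result labels are all made up for illustration, and a "match" on one directory says nothing definite about other file systems on the same machine.

```python
import os
import tempfile
import unicodedata


def probe_unicode_filename(name="caf\u00e9.txt"):
    """Create a file with the given name and classify what listdir reports.

    Returns "error", "match", "normalized", or "mangled".
    """
    with tempfile.TemporaryDirectory() as tmp:
        try:
            open(os.path.join(tmp, name), "w").close()
            entries = os.listdir(tmp)
        except (OSError, UnicodeError):
            return "error"
        if name in entries:
            return "match"
        # Some file systems return a different normalization of the same name.
        nfc = unicodedata.normalize("NFC", name)
        if any(unicodedata.normalize("NFC", e) == nfc for e in entries):
            return "normalized"
        return "mangled"


print(probe_unicode_filename())
```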

----------------------------------------------------------------------
