PEP 277 (unicode filenames): please review

Martin v. Löwis loewis at informatik.hu-berlin.de
Mon Aug 12 10:55:51 EDT 2002


"Terry Reedy" <tjreedy at udel.edu> writes:

> > The PEP describes a Windows-only change to Unicode in file names:
> > On Windows NT/2k/XP, Python would allow arbitrary Unicode strings
> > as file names and pass them to the OS, instead of converting them
> > to CP_ACP first. This applies to open() and all os functions that
> > accept filenames.
> 
> Does 'CP_ACP' == 'mbcs' encoding?  (Never heard of either.)

Yes. Microsoft has the "ANSI code page" (CP_ACP), which can be a
multi-byte encoding for some locales, which is why the conversion API
function is called MultiByteToWideChar. They also have the "OEM code
page" (CP_OEMCP), which can also be multi-byte, so the Python name
"mbcs" is somewhat of a misnomer.

In any case, CP_ACP is the encoding the Win32 *A functions expect,
e.g. CreateFileA. On NT, those functions internally convert the string
back to Unicode and invoke the corresponding *W function (i.e. CreateFileW).
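
To illustrate why converting to CP_ACP first is a problem, here is a
minimal sketch that checks whether a file name survives a round trip
through an ANSI code page. It assumes the ANSI code page is cp1252
(Western European); the helper name representable_in_ansi is
hypothetical, and real systems may use a different code page. (On
Windows, Python's "mbcs" codec uses the actual CP_ACP instead.)

```python
# Assumption: cp1252 stands in for CP_ACP; actual systems may differ.
ANSI_CODEPAGE = "cp1252"


def representable_in_ansi(name: str, codepage: str = ANSI_CODEPAGE) -> bool:
    """Return True if `name` survives encoding to and decoding from `codepage`."""
    try:
        return name.encode(codepage).decode(codepage) == name
    except UnicodeEncodeError:
        # At least one character has no mapping in this code page.
        return False


for name in ["report.txt", "café.txt", "Привет.txt"]:
    # !a prints the name with ASCII escapes, so any console can display it.
    print(f"{name!a}: {representable_in_ansi(name)}")
```

Under this assumption, the Cyrillic name cannot be represented in
cp1252 at all, so an *A function could never open it, while the *W
functions receive the Unicode name unchanged.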

> Question: is it NT+ only because other OSes don't (yet) allow unicode
> filenames (in which case this is trial run for future when they do) or
> because access to such is transparent?

It's because W9x does not really support the *W functions. I'm not
quite sure what "not really" means - I believe that for some value of
x, the *W versions fail in every case. For some higher value of x, it
might be that they convert the Unicode string to CP_ACP and invoke the
*A version.

> 1.  Will this break any code?  If so, need transition plan.

Not that I'm aware of. If applications use the feature, i.e. pass
Unicode strings to os.listdir, they get Unicode strings back. They
might then try to use these Unicode strings in contexts that are not
Unicode-aware - however, it was the application's choice to pass
Unicode to listdir in the first place.
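
The "Unicode in, Unicode out" rule survives in modern Python 3, where
the pair is str versus bytes rather than unicode versus byte strings.
A minimal sketch, assuming Python 3 and an ASCII file name so that it
behaves the same on any file system:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file to list.
    with open(os.path.join(tmp, "example.txt"), "w"):
        pass

    text_names = os.listdir(tmp)               # str argument -> str names
    byte_names = os.listdir(os.fsencode(tmp))  # bytes argument -> bytes names

print(text_names, byte_names)
```

The caller, not the library, decides which kind of name comes back,
which is the point made above: code that asks for Unicode names has
opted in to handling them.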

> 2. What does 'import <non-latin-unicode-name>' do?  I presume
> exception, but which?

You currently can't write that, not even with PEP 263. So it is a
syntax error, raised before Python even looks in the file system to
see whether you have a <non-latin-unicode-name>.py somewhere.

> I suspect this PEP will increase pressure for unicode identifiers.

That might be the case - but the main use will be for the names of
files that applications work with, not the names of source code files.

Thanks for your comments,
Martin
