[Python-Dev] PEP 277 (unicode filenames): please review

Guido van Rossum guido@python.org
Tue, 13 Aug 2002 10:51:40 -0400


> > Aha!  So MBCS is not an encoding: it's an indirection for a variety of
> > encodings.  (Is there a way to find out what the encoding is?)
> 
> Correct. In Python, locale.getdefaultlocale()[1] returns the encoding;
> the underlying API function is GetACP, and Python uses it as
> 
>     PyOS_snprintf(encoding, sizeof(encoding), "cp%d", GetACP());
> 
> There is a second indirection, the "OEM code page", which they use:
> - for on-disk FAT short file names,
> - for the cmd.exe window
> 
> Python currently offers no access to GetOEMCP().
> 
> > Do you mean that the condition on
> > 
> > #if defined(HAVE_LANGINFO_H) && defined(CODESET)
> > 
> > is reliably false on Windows?  Otherwise _locale.setlocale() could set
> > it.
> 
> Correct. nl_langinfo is a Sun invention (I believe) which made it into
> Posix; Microsoft ignores it.
> 
> > So as long as they use 8-bit it's not our problem, right.  Another
> > reason to avoid producing Unicode without a clue that the app expects
> > Unicode (alas).  (BTW I find a Unicode argument to os.listdir() a
> > sufficient clue.  IOW os.listdir(u".") should return Unicode.)
> 
> Indeed, that would be consistent. I deliberately want to leave this
> out of PEP 277. On Unix, things are not that clear - as Jack points
> out, readlink() and getcwd() also need consideration.
> 
> > > Ok, I'll update the PEP.
> > 
> > To what?  (It would be bad if I convinced you at the same time you
> > convinced me of the opposite. :-)
> 
> I haven't changed anything yet, and I won't. 
> 
> In this terrain, Windows has the cleaner API (they consider file names
> as character strings, not as byte strings), so doing the right thing
> is easier.

OK.  I leave this further in your capable hands, Martin!
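
As a rough illustration of the two indirections quoted above (the ANSI
code page from GetACP and the OEM code page from GetOEMCP), here is a
minimal sketch. It assumes a modern Python 3 with ctypes, which is not
what the thread was written against; the helper names
windows_code_pages and preferred_encoding are hypothetical, and
locale.getpreferredencoding() stands in for the
locale.getdefaultlocale()[1] call mentioned in the quote.

```python
import ctypes
import locale
import sys


def windows_code_pages():
    """Return (ansi, oem) code page names such as 'cp1252'.

    Returns None on non-Windows platforms, where GetACP and GetOEMCP
    do not exist.
    """
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    # Same "cp%d" formatting as the PyOS_snprintf call quoted above.
    return ("cp%d" % kernel32.GetACP(), "cp%d" % kernel32.GetOEMCP())


def preferred_encoding():
    """Return the encoding Python derives from the current locale settings."""
    # False: do not call setlocale() as a side effect.
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print("preferred encoding:", preferred_encoding())
    print("Windows code pages:", windows_code_pages())
```

On a Western European Windows installation the two values typically
differ, which is why the OEM code page matters for the cmd.exe window.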

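The "argument type is a sufficient clue" idea for os.listdir() can be
sketched as follows. This assumes Python 3, where the text/bytes split
plays the role that unicode/str played at the time of this thread: a
str argument yields str names and a bytes argument yields bytes names.
The helper name listing_types and the file name example.txt are
hypothetical.

```python
import os
import tempfile


def listing_types(directory):
    """List a directory twice: once with a text path, once with a bytes path."""
    text_names = os.listdir(directory)                # str in, str out
    byte_names = os.listdir(os.fsencode(directory))   # bytes in, bytes out
    return text_names, byte_names


# Create a scratch directory holding a single file, then list it both ways.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()
    text_names, byte_names = listing_types(tmp)

print(text_names, byte_names)
```

The caller never has to state which form it expects; the type of the
path it passes in carries that information.
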
--Guido van Rossum (home page: http://www.python.org/~guido/)