[Python-Dev] PEP 277 (unicode filenames): please review
Guido van Rossum
guido@python.org
Tue, 13 Aug 2002 10:51:40 -0400
> > Aha! So MBCS is not an encoding: it's an indirection for a variety of
> > encodings. (Is there a way to find out what the encoding is?)
>
> Correct. In Python, locale.getdefaultlocale()[1] returns the encoding;
> the underlying API function is GetACP, and Python uses it as
>
> PyOS_snprintf(encoding, sizeof(encoding), "cp%d", GetACP());
>
> There is a second indirection, the "OEM code page", which they use:
> - for on-disk FAT short file names,
> - for the cmd.exe window
>
> Python currently offers no access to GetOEMCP().
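
For illustration, here is a minimal sketch of the two code page lookups described above. It assumes present-day Python 3 (not the Python of this thread) and reaches GetACP() and GetOEMCP() through ctypes on Windows. The helper names code_page_to_codec and windows_code_pages are hypothetical, not part of Python or of PEP 277.

```python
import codecs
import sys


def code_page_to_codec(code_page):
    """Format a code page number as the quoted C snippet does: "cp%d"."""
    return "cp%d" % code_page


def windows_code_pages():
    """Return (ANSI, OEM) code page numbers, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    import ctypes  # imported lazily; ctypes.windll exists only on Windows
    kernel32 = ctypes.windll.kernel32
    return kernel32.GetACP(), kernel32.GetOEMCP()


pages = windows_code_pages()
if pages is not None:
    ansi, oem = pages
    print("ANSI:", code_page_to_codec(ansi))
    print("OEM: ", code_page_to_codec(oem))

# For code pages that Python ships a codec for (1252 is one of them),
# the "cp%d" name can be passed straight to the codec machinery.
print(codecs.lookup(code_page_to_codec(1252)).name)
```

On a non-Windows system only the last line prints anything. Whether a given code page number maps to a codec that Python actually provides is not guaranteed, so the sketch does not look up the values returned by the system.
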
>
> > Do you mean that the condition on
> >
> > #if defined(HAVE_LANGINFO_H) && defined(CODESET)
> >
> > is reliably false on Windows? Otherwise _locale.setlocale() could set
> > it.
>
> Correct. nl_langinfo is a Sun invention (I believe) which made it into
> Posix; Microsoft ignores it.
>
> > So as long as they use 8-bit it's not our problem, right. Another
> > reason to avoid producing Unicode without a clue that the app expects
> > Unicode (alas). (BTW I find a Unicode argument to os.listdir() a
> > sufficient clue. IOW os.listdir(u".") should return Unicode.)
>
> Indeed, that would be consistent. I deliberately want to leave this
> out of PEP 277. On Unix, things are not that clear - as Jack points
> out, readlink() and getcwd() also need consideration.
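
As a sketch of the "argument type is the clue" idea for os.listdir(): present-day Python 3 behaves this way, with str in the role of the Unicode type and bytes in the role of the 8-bit string. This shows current behavior only, not what PEP 277 specified at the time of this thread. The helper name list_both and the file name example.txt are made up for the example.

```python
import os
import tempfile


def list_both(path):
    """List a directory twice: once with a str path, once with a bytes path."""
    text_names = os.listdir(path)                # str in, str names out
    byte_names = os.listdir(os.fsencode(path))   # bytes in, bytes names out
    return text_names, byte_names


with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file so the listing is not empty.
    open(os.path.join(tmp, "example.txt"), "w").close()
    text_names, byte_names = list_both(tmp)
    print(text_names)
    print(byte_names)
```

The caller picks the result type by picking the argument type, so code that only ever passes 8-bit paths never sees Unicode names.
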
>
> > > Ok, I'll update the PEP.
> >
> > To what? (It would be bad if I convinced you at the same time you
> > convinced me of the opposite. :-)
>
> I haven't changed anything yet, and I won't.
>
> In this terrain, Windows has the cleaner API (they consider file names
> as character strings, not as byte strings), so doing the right thing
> is easier.
OK. I leave this further in your capable hands, Martin!
--Guido van Rossum (home page: http://www.python.org/~guido/)