[Python-Dev] PEP 277 (unicode filenames): please review
Martin v. Loewis
martin@v.loewis.de
13 Aug 2002 16:45:36 +0200
Guido van Rossum <guido@python.org> writes:
> Aha! So MBCS is not an encoding: it's an indirection for a variety of
> encodings. (Is there a way to find out what the encoding is?)
Correct. In Python, locale.getdefaultlocale()[1] returns the encoding;
the underlying API function is GetACP, and Python uses it as
    PyOS_snprintf(encoding, sizeof(encoding), "cp%d", GetACP());
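For illustration, here is a rough sketch of the same lookup from Python code. It assumes a present-day Python 3 with ctypes, which postdates this message. The helper names ansi_code_page_name and current_encoding are made up for this example. Off Windows it falls back to locale.getpreferredencoding().

```python
import locale
import sys


def ansi_code_page_name():
    """Return the Windows ANSI code page as 'cpNNNN', or None off Windows."""
    if sys.platform != "win32":
        return None
    import ctypes
    # GetACP() takes no arguments and returns the ANSI code page number.
    return "cp%d" % ctypes.windll.kernel32.GetACP()


def current_encoding():
    """Prefer the code-page lookup; otherwise ask the locale module."""
    return ansi_code_page_name() or locale.getpreferredencoding(False)


print(current_encoding())
```

The string formatting mirrors the "cp%d" pattern of the C call above; the actual value depends on the machine's configuration.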
There is a second indirection, the "OEM code page", which Windows uses:
- for on-disk FAT short file names,
- for the cmd.exe window
Python currently offers no access to GetOEMCP().
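As a hedged sketch only: with ctypes (an assumption, since it is not part of the Python discussed here), GetOEMCP() could be reached as shown below. The helper name oem_code_page is hypothetical. The cp1252/cp437 pairing is just an assumed example of an ANSI/OEM pair, used to show that the same character gets different bytes under the two code pages.

```python
import sys


def oem_code_page():
    """Return the OEM code page number on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    import ctypes
    # GetOEMCP() takes no arguments and returns the OEM code page number.
    return ctypes.windll.kernel32.GetOEMCP()


# Illustration only: cp1252 (ANSI) and cp437 (OEM) are an assumed pairing.
a_umlaut = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS
ansi_bytes = a_umlaut.encode("cp1252")
oem_bytes = a_umlaut.encode("cp437")

print(oem_code_page(), ansi_bytes, oem_bytes)
```

Decoding bytes written under one code page with the other gives the wrong character, which is why the two indirections have to be kept apart.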
> Do you mean that the condition on
>
> #if defined(HAVE_LANGINFO_H) && defined(CODESET)
>
> is reliably false on Windows? Otherwise _locale.setlocale() could set
> it.
Correct. nl_langinfo is a Sun invention (I believe) which made it into
POSIX; Microsoft ignores it.
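The same condition can be approximated at the Python level. This is a minimal sketch assuming a current Python 3 locale module. The function name codeset_from_langinfo is invented for the example. It returns None on platforms that lack nl_langinfo or CODESET.

```python
import locale


def codeset_from_langinfo():
    """Return nl_langinfo(CODESET) where available, else None."""
    if hasattr(locale, "nl_langinfo") and hasattr(locale, "CODESET"):
        return locale.nl_langinfo(locale.CODESET)
    return None


print(codeset_from_langinfo())
```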
> So as long as they use 8-bit it's not our problem, right. Another
> reason to avoid producing Unicode without a clue that the app expects
> Unicode (alas). (BTW I find a Unicode argument to os.listdir() a
> sufficient clue. IOW os.listdir(u".") should return Unicode.)
Indeed, that would be consistent. I deliberately want to leave this
out of PEP 277. On Unix, things are not that clear - as Jack points
out, readlink() and getcwd() also need consideration.
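To make the "argument type as a clue" idea concrete, here is a small sketch. It assumes present-day Python 3 semantics (str versus bytes) rather than the u"." spelling under discussion, so take it as an analogy, not as what PEP 277 specifies. The helper listing_types is hypothetical.

```python
import os
import tempfile


def listing_types(directory):
    """List a directory twice: once with a str path, once with bytes."""
    text_names = os.listdir(directory)
    byte_names = os.listdir(os.fsencode(directory))
    return text_names, byte_names


with tempfile.TemporaryDirectory() as tmp:
    # Create one file so that both listings have something to report.
    open(os.path.join(tmp, "example.txt"), "w").close()
    text_names, byte_names = listing_types(tmp)

print(text_names, byte_names)
# getcwd() has no argument to act as a clue, so there are two functions.
print(type(os.getcwd()).__name__, type(os.getcwdb()).__name__)
```

The result type follows the argument type for listdir, while getcwd needs a separate bytes variant because it takes no argument.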
> > Ok, I'll update the PEP.
>
> To what? (It would be bad if I convinced you at the same time you
> convinced me of the opposite. :-)
I haven't changed anything yet, and I won't.
In this terrain, Windows has the cleaner API (they consider file names
as character strings, not as byte strings), so doing the right thing
is easier.
Regards,
Martin