[Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

"Martin v. Löwis" martin at v.loewis.de
Tue Sep 30 22:45:55 CEST 2008


> I'm not sure either way. I've heard it claimed that Windows filesystem
> APIs use Unicode natively. Does Python 3.0 on Windows currently
> support filenames expressed as bytes?

Yes, it does: at least os.open and os.stat support them, although the
builtin open doesn't.

> Are they encoded first before
> passing to the Unicode APIs? Using what encoding?

They aren't passed to the Unicode (W) APIs by Python. Instead, they
are passed to the "ANSI" (A) APIs (i.e. the CP_ACP APIs). On Windows NT+,
those APIs then convert the bytes to Unicode using the CP_ACP (aka "mbcs")
encoding; this happens inside the system DLLs.
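
As a rough illustration of that conversion step (a sketch, not the
actual Windows code path): the snippet below decodes a bytes filename
the way an A API would before handing it on as Unicode. It assumes the
ANSI code page is 1252 and uses Python's "cp1252" codec as a stand-in,
so that it runs on any platform; the helper name ansi_to_unicode is
hypothetical.

```python
# Assumption: the system ANSI code page (CP_ACP) is 1252 (Western European).
# On Windows, Python exposes the real CP_ACP as the "mbcs" codec; "cp1252"
# is used here only so the sketch runs everywhere.
ANSI_CODEC = "cp1252"


def ansi_to_unicode(name: bytes) -> str:
    """Decode a bytes filename as an ANSI (A) API would (hypothetical helper)."""
    return name.decode(ANSI_CODEC)


raw_name = b"caf\xe9.txt"          # byte 0xE9 is 'é' in code page 1252
print(ansi_to_unicode(raw_name))   # café.txt
```

The same bytes would name a different file on a machine with a
different ANSI code page, since the decoding depends on that setting.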

CP_ACP is a lossy encoding (from Unicode to bytes): Microsoft uses
replacement characters where it can, starting with similar-looking
characters and falling back to question marks.
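
The question-mark fallback can be sketched as follows, again assuming
code page 1252 as a stand-in for CP_ACP. Note that Python's "cp1252"
codec does not perform the similar-looking-character substitution
described above; it only shows the final fallback. The helper name
unicode_to_ansi is hypothetical.

```python
# Assumption: CP_ACP is code page 1252; "cp1252" stands in for "mbcs".
ANSI_CODEC = "cp1252"


def unicode_to_ansi(name: str) -> bytes:
    """Encode a filename to ANSI bytes, replacing unencodable characters with '?'."""
    return name.encode(ANSI_CODEC, errors="replace")


original = "\u03b1\u03b2.txt"             # Greek alpha and beta, not in cp1252
encoded = unicode_to_ansi(original)
print(encoded)                            # b'??.txt'
print(encoded.decode(ANSI_CODEC) == original)  # False: the round trip loses data
```

Two distinct Unicode filenames can therefore collapse to the same bytes,
which is why the bytes form cannot reliably identify a file.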

Regards,
Martin


