[Python-Dev] Use our strict mbcs codec instead of the Windows ANSI API
Victor Stinner
victor.stinner at haypocalc.com
Tue Oct 25 10:22:26 CEST 2011
On Tuesday 25 October 2011 13:20:12 you wrote:
> Victor Stinner writes:
> > I propose to raise Unicode errors if a filename cannot be decoded
> > on Windows, instead of creating a bogus filename with question
> > marks.
>
> By "bogus" you mean "sometimes (?) invalid and the OS will refuse to
> use them, causing a later hard-to-diagnose exception", rather than
> "not what the user thinks he wants", right?
If the ("Unicode") filename cannot be encoded to the ANSI code page, which is
usually a small charset (e.g. cp1252 has at most 256 code points), Windows
replaces unencodable characters with question marks.
Imagine that the code page is ASCII: the ("Unicode") filename "hého.txt" will
be encoded to b"h?ho.txt". You can display this string in a dialog, but you
cannot open the file to read its content... If you pass the filename to
os.listdir(), it is even worse because "?" is interpreted as a wildcard ("?"
matches any single character in a filename pattern).
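To make the example concrete, here is a small sketch that runs on any platform.
It uses Python's own "ascii" codec with errors="replace" as a stand-in for the
ANSI conversion, and fnmatch as a stand-in for the wildcard matching done by
Windows. Both are assumptions made for illustration, not the actual Windows
code path.

```python
import fnmatch

name = "h\u00e9ho.txt"  # the filename "hého.txt"

# errors="replace" imitates the lossy conversion: "é" silently becomes "?".
lossy = name.encode("ascii", errors="replace")
print(lossy)  # b'h?ho.txt'

# The default error handler ("strict") reports the problem instead.
try:
    name.encode("ascii")
    strict_failed = False
except UnicodeEncodeError as exc:
    strict_failed = True
    print("cannot encode:", exc.reason)

# Used as a pattern, "?" matches any single character, so the lossy name
# also matches files that have nothing to do with the original one.
pattern = lossy.decode("ascii")
candidates = ["h\u00e9ho.txt", "haho.txt", "hello.txt"]
matches = [c for c in candidates if fnmatch.fnmatchcase(c, pattern)]
print(matches)  # the first two names match, "hello.txt" does not
```

The strict variant is the behaviour proposed here: fail early instead of
handing a name to the OS that no longer identifies the file.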
I would like to raise an error in such a situation, because currently the user
cannot be notified otherwise. The user may search for "?" in the filename, but
Windows also replaces unencodable characters with a *similar glyph* (e.g. "é"
replaced by "e").
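A rough sketch of why searching for "?" is not enough. Python's codecs do not
perform best-fit mapping, so best_fit_like() below is a hypothetical stand-in
(Unicode decomposition plus dropping non-ASCII characters), not what Windows
really does. The round-trip check in is_lossy() is also only an illustration.

```python
import unicodedata


def best_fit_like(name):
    """Hypothetical imitation of a "similar glyph" conversion to ASCII."""
    # NFKD splits "é" into "e" + combining accent; the accent is then dropped.
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", errors="ignore")


def is_lossy(name, encoded, encoding="ascii"):
    """Return True if decoding the bytes does not give back the original name."""
    return encoded.decode(encoding) != name


name = "h\u00e9ho.txt"  # the filename "hého.txt"
encoded = best_fit_like(name)

print(encoded)                   # b'heho.txt'
print(b"?" in encoded)           # False: no question mark to search for
print(is_lossy(name, encoded))   # True: the name was changed anyway
```

Only a comparison with the original name (or an error raised by the codec)
reveals the substitution.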
> In the "hard errors" case, a hearty +1 (I'm dealing with this in an
> experimental version of XEmacs and it's a right PITA if the codec
> doesn't complain).
If you use MultiByteToWideChar and WideCharToMultiByte, you can be notified of
errors using some flags, but the functions of the ANSI API don't give access to
these flags...
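For the encoding direction, a minimal ctypes sketch of what "using some flags"
can look like. WideCharToMultiByte, the WC_NO_BEST_FIT_CHARS flag and the
lpUsedDefaultChar output parameter are part of the Windows API; the helper name
encode_ansi_strict() is made up for this example. It assumes the ANSI code page
is not UTF-8 and that the text contains no NUL character. It is not the code of
the mbcs codec.

```python
import ctypes
import sys

CP_ACP = 0                     # use the current ANSI code page
WC_NO_BEST_FIT_CHARS = 0x0400  # forbid "similar glyph" substitutions


def encode_ansi_strict(text):
    """Encode text to the ANSI code page, raising instead of losing data."""
    if sys.platform != "win32":
        raise NotImplementedError("this sketch needs the Windows API")
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    convert = kernel32.WideCharToMultiByte
    convert.argtypes = [
        wintypes.UINT, wintypes.DWORD, wintypes.LPCWSTR, ctypes.c_int,
        ctypes.c_char_p, ctypes.c_int, ctypes.c_char_p,
        ctypes.POINTER(wintypes.BOOL),
    ]
    convert.restype = ctypes.c_int

    used_default = wintypes.BOOL(False)
    # First call: length -1 means "NUL-terminated input"; an output size of 0
    # asks for the required buffer size (terminator included).
    size = convert(CP_ACP, WC_NO_BEST_FIT_CHARS, text, -1,
                   None, 0, None, ctypes.byref(used_default))
    if size == 0:
        raise ctypes.WinError(ctypes.get_last_error())

    buffer = ctypes.create_string_buffer(size)
    written = convert(CP_ACP, WC_NO_BEST_FIT_CHARS, text, -1,
                      buffer, size, None, ctypes.byref(used_default))
    if written == 0:
        raise ctypes.WinError(ctypes.get_last_error())

    # Windows sets this flag when it had to substitute the default character.
    if used_default.value:
        raise UnicodeEncodeError("mbcs", text, 0, len(text),
                                 "character not in the ANSI code page")
    return buffer.raw[:written - 1]  # drop the trailing NUL
```

On Windows, encode_ansi_strict("hello.txt") should return b"hello.txt", while a
name containing a character outside the code page should raise
UnicodeEncodeError instead of producing "?" or a look-alike letter.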
> Backward compatibility is important, but here the
> costs of fixing such bugs outweigh the value of bug-compatibility.
I only want to change how unencodable filenames are handled; the bytes API will
still be available. If your filesystem has the "8dot3name" feature enabled, it
may work even for unencodable filenames (Windows generates names like
HEHO~1.TXT).
Victor