[Python-Dev] Use our strict mbcs codec instead of the Windows ANSI API
Terry Reedy
tjreedy at udel.edu
Wed Oct 26 02:49:43 CEST 2011
On 10/25/2011 4:31 AM, Victor Stinner wrote:
> On Tuesday 25 October 2011 09:09:56 you wrote:
>>> I propose to raise Unicode errors if a filename cannot be decoded on
>>> Windows, instead of creating bogus filenames with question marks.
>>
>> Can you please elaborate what APIs you are talking about exactly?
>
> Basically, all functions processing filenames, so most functions of
> posixmodule.c. Some examples:
This seems way too broad. From your previous posts, I presumed that you
only propose to change behavior when the user asks for the bytes
versions of a unicode name that cannot be properly converted to a bytes
version.
> - os.listdir():
os.listdir(unicode) works fine and should not be changed.
os.listdir(bytes) is what OP of issue wants changed.
> FindFirstFileA, FindNextFileA, FindCloseA
These are not Python names. Are they Windows API names?
> - os.lstat(): CreateFileA
This does not create a path and should not be changed as far as I can see.
> - os.getcwdb():
This you might change.
> getcwd()
This should not be, as no bytes are involved.
> - os.mkdir(): CreateDirectoryA
> - os.chmod(): SetFileAttributesA
Like os.lstat, these only accept a path and should do what they
are supposed to do.
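
As an illustration of the str/bytes distinction drawn above, here is a minimal sketch of os.listdir() called both ways on the same directory. The temporary directory and the file name example.txt are made up for the demonstration, and the name is kept ASCII on purpose so the result does not depend on any platform encoding.

```python
import os
import tempfile


def list_both_ways(directory):
    """Return (names as str, names as bytes) for the same directory."""
    # str path in, list of str names out.
    str_names = sorted(os.listdir(directory))
    # bytes path in, list of bytes names out.
    bytes_names = sorted(os.listdir(os.fsencode(directory)))
    return str_names, bytes_names


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, used only for this demonstration.
    open(os.path.join(tmp, "example.txt"), "w").close()
    str_names, bytes_names = list_both_ways(tmp)

print(str_names)    # ['example.txt']
print(bytes_names)  # [b'example.txt']
```

Only the second call involves a conversion to bytes, which is where a name that does not fit the code page could get mangled.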
>> If it's the byte APIs (i.e. using bytes as file names), then I'm -1 on
>> this proposal. People that explicitly use bytes for file names deserve
>> to get whatever exact platform semantics the platform has to offer. This
>> is true on Unix, and it is also true on Windows.
>
> My proposal is a fix for a bug reported by a user:
> http://bugs.python.org/issue13247
>
> I want to keep the bytes API for backward compatibility, and it will still
> work for non-ASCII characters, but only for non-ASCII characters encodable to
> the ANSI code page.
>
> In practice, characters not encodable to the ANSI code page are very rare. For
> example: it's difficult to write such characters directly with the keyboard. I
> bet that very few people will notice the change.
Actually, Windows makes switching keyboard setups rather easy once you
enable the feature. It might be that people who routinely use non-'ansi'
characters in file and directory names do not routinely ask for bytes
versions thereof.
The doc says "All functions accepting path or file names accept both
bytes and string objects, and result in an object of the same type, if a
path or file name is returned." It does that now, though it says nothing
about the encoding assumed for input bytes or used for output bytes. It
does not mention raising exceptions, so doing so is a feature-change
that would likely break code. Currently, exceptional situations are
signalled with "'?' in returned_path" rather than with an exception
object. It ('?') is a bad choice of signal though, given the other uses
of '?' in paths.
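
The two ways of signalling can be approximated with Python's own codec error handlers. This is only a sketch: it assumes cp1252 as the ANSI code page, the helper names encode_lossy and encode_strict are hypothetical, and the real Windows conversion may differ in detail from the "replace" handler.

```python
ANSI_CODE_PAGE = "cp1252"  # assumption: a Western European ANSI code page


def encode_lossy(name, encoding=ANSI_CODE_PAGE):
    """Approximate the current behaviour: unencodable characters become b'?'."""
    return name.encode(encoding, "replace")


def encode_strict(name, encoding=ANSI_CODE_PAGE):
    """Approximate the proposed behaviour: raise UnicodeEncodeError instead."""
    return name.encode(encoding, "strict")


encodable = "caf\u00e9.txt"        # e-acute exists in cp1252
unencodable = "\u65e5\u672c.txt"   # these CJK characters do not

lossy = encode_lossy(unencodable)
print(encode_lossy(encodable))  # b'caf\xe9.txt'
print(lossy)                    # b'??.txt'
print(b"?" in lossy)            # True: the only hint that data was lost

strict_error = None
try:
    encode_strict(unencodable)
except UnicodeEncodeError as exc:
    strict_error = exc
print(type(strict_error).__name__)  # UnicodeEncodeError
```

With the lossy variant the caller has to inspect the result for b'?', while the strict variant turns the same situation into an exception that existing code does not expect.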
--
Terry Jan Reedy