[Python-Dev] Use our strict mbcs codec instead of the Windows ANSI API

Stephen J. Turnbull stephen at xemacs.org
Wed Oct 26 05:31:36 CEST 2011


In general I agree with what you write, Terry.  One clarification and
one comment, though.

Terry Reedy writes:

 > The doc says "All functions accepting path or file names accept both 
 > bytes and string objects, and result in an object of the same type, if a 
 > path or file name is returned." It does that now, though it says nothing 
 > about the encoding assumed for input bytes or used for output
 > bytes.

That's determined by the OS, and figuring that out is the end user's
problem.

 > It does not mention raising exceptions, so doing so is a
 > feature-change that would likely break code. Currently, exceptional
 > situations are signalled with "'?' in returned_path" rather than
 > with an exception object. It ('?') is a bad choice of signal
 > though, given the other uses of '?' in paths.

True, but this isn't really Python's problem.  And IIUC Martin's post,
it is hardly "exceptional": it isn't Python doing this, it's just
standard Windows behavior, which results in pathnames that are
perfectly acceptable to Windows APIs, but unreliable in use because
they have different semantics in different Windows APIs.  If that is
true, there are almost surely user programs that depend on this
behavior, even though it sucks.[1]
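
To make the ambiguity concrete, here is a minimal sketch.  It is not
the actual Windows conversion: the "mbcs" codec exists only on Windows,
so I assume cp1252 with errors="replace" as a stand-in, and the file
names and helper functions are made up for illustration.

```python
# Assumption: cp1252 stands in for the Windows ANSI code page so that the
# sketch runs on any platform; the real "mbcs" codec is Windows-only.
ANSI = "cp1252"


def to_ansi_lossy(name: str) -> bytes:
    """Encode the way a lossy conversion would: unencodable chars become '?'."""
    return name.encode(ANSI, errors="replace")


def to_ansi_strict(name: str) -> bytes:
    """Encode strictly: unencodable chars raise UnicodeEncodeError."""
    return name.encode(ANSI, errors="strict")


# Hypothetical file names: one with a character cp1252 cannot represent,
# and one that contains a literal question mark.
unencodable = "report_\u65e5.txt"
literal_q = "report_?.txt"

lossy = to_ansi_lossy(unencodable)

# After the lossy conversion the two names can no longer be told apart.
ambiguous = lossy == to_ansi_lossy(literal_q)

try:
    to_ansi_strict(unencodable)
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True
```

The point is only that testing for '?' in the result cannot tell a
substituted character from one that was in the name all along, whereas
the strict variant reports the problem directly.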

My original "hearty +1" was dependent on my understanding from
Victor's post that this substitution could cause later exceptions
because the filename is invalid (eg, contains illegal characters causing
Windows to signal an error).  If that's not true, I think the proper
remedy is to add a strong warning to the documentation that use of
those APIs is supported (eg, for interaction with existing programs
that use them) but that they require careful error-checking for
robust use.
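
As a rough idea of what such error-checking could look like, a program
might test whether a name survives a round trip through the byte
encoding before handing it to a bytes API.  This is a sketch under
assumptions: is_lossless is a hypothetical helper, and cp1252 is again
assumed as a stand-in for the ANSI code page.

```python
def is_lossless(name: str, encoding: str = "cp1252") -> bool:
    """Return True if `name` encodes to bytes and decodes back unchanged.

    Hypothetical helper; `encoding` defaults to cp1252 only as a
    platform-independent stand-in for the Windows ANSI code page.
    """
    try:
        return name.encode(encoding).decode(encoding) == name
    except UnicodeError:
        # Strict encoding (the default) refused the name: it is not representable.
        return False


# Hypothetical names: the first fits in cp1252, the second does not.
representable = is_lossless("caf\u00e9.txt")
not_representable = is_lossless("\u65e5\u672c.txt")
```

A name that fails the check is one for which the bytes form cannot be
trusted, whatever the OS does with it.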

As a card-carrying Unicode nazi I wouldn't mind tagging the bytes APIs
with a DeprecationWarning but I know that proposal is going nowhere so
I withdraw it in advance. <wink>


Footnotes: 
[1]  Note that the original rationale for this was surely "since users
will have a very hard time using file names with this character in
them, using it as a substitution character internally will make the
problem evident and Sufficiently Smart Programs can deal with it."


