[Python-Dev] 2.2 Unicode questions

Mark Hammond MarkH@ActiveState.com
Mon, 23 Jul 2001 16:02:41 -0700


> Guido van Rossum wrote:
> >
> > > First, a short one, Mark Hammond's patch for supporting MBCS on
> > > Windows.  I trust everyone can handle a little bit of TeX markup?
> > >
> > >   % XXX is this explanation correct?
> > >   \item When presented with a Unicode filename on Windows, Python will
> > >   now correctly convert it to a string using the MBCS encoding.
> > >   Filenames on Windows are a case where Python's choice of ASCII as
> > >   the default encoding turns out to be an annoyance.
> > >
> > >   This patch also adds \samp{et} as a format sequence to
> > >   \cfunction{PyArg_ParseTuple}; \samp{et} takes both a parameter and
> > >   an encoding name, and converts it to the given encoding if the
> > >   parameter turns out to be a Unicode string, or leaves it alone if
> > >   it's an 8-bit string, assuming it to already be in the desired
> > >   encoding.  (This differs from the \samp{es} format character, which
> > >   assumes that 8-bit strings are in Python's default ASCII encoding
> > >   and converts them to the specified new encoding.)
> > >
> > >   (Contributed by Mark Hammond with assistance from Marc-Andr\'e
> > >   Lemburg.)
> >
> > I learned something here, so I hope this is correct. :-)
>
> The last part is... the rest is for Mark to comment on.
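
For readers who want to see the difference between the two format characters
spelled out, here is a rough pure-Python emulation of the behaviour described
in the quoted text.  It is only a sketch: the real conversion happens in C
inside PyArg_ParseTuple, the function names convert_et and convert_es are
made up for illustration, and it assumes a modern Python where "str" is the
Unicode type and "bytes" stands in for the 8-bit string.

```python
def convert_et(value, encoding):
    """Illustrative stand-in for the "et" format (hypothetical helper)."""
    if isinstance(value, bytes):
        # 8-bit string: left alone, assumed to already be in `encoding`.
        return value
    if isinstance(value, str):
        # Unicode string: converted to the requested encoding.
        return value.encode(encoding)
    raise TypeError("expected a Unicode or 8-bit string")


def convert_es(value, encoding):
    """Illustrative stand-in for the "es" format (hypothetical helper)."""
    if isinstance(value, bytes):
        # 8-bit string: assumed to be in the default ASCII encoding,
        # so it is decoded first and then re-encoded below.
        value = value.decode("ascii")
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("expected a Unicode or 8-bit string")


if __name__ == "__main__":
    raw = b"caf\xe9.txt"  # already Latin-1 encoded, not valid ASCII
    print(convert_et(raw, "latin-1"))  # passed through unchanged
    try:
        convert_es(raw, "latin-1")
    except UnicodeDecodeError as exc:
        print("es-style conversion failed:", exc)
```

The point of the sketch is that "et" never reinterprets an 8-bit string,
whereas "es" does, which is why non-ASCII bytes fail in the second case.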

Sorry for the delay - I hope this response is not too late.  The description
is technically correct, but may be better phrased as:

\item When presented with a Unicode filename on Windows, Python will
now convert it to an MBCS-encoded string, as used by the Microsoft
file APIs.  Because the file APIs explicitly use MBCS,
the default Python encoding (be it ASCII or any other encoding
explicitly set) is generally not appropriate for these conversions.
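
To illustrate why the default encoding is a poor fit here, a small sketch in
Python follows.  The helper name filename_to_bytes is invented for this
example.  The "mbcs" codec is only available on Windows, so the demonstration
uses "cp1252" as a stand-in for what MBCS might resolve to on a Western
European system; that substitution is an assumption, not something the patch
itself does.

```python
def filename_to_bytes(name, encoding="mbcs"):
    """Encode a Unicode filename explicitly (hypothetical helper).

    The "mbcs" default only works on Windows; pass another codec
    elsewhere.
    """
    return name.encode(encoding)


if __name__ == "__main__":
    name = "caf\u00e9.txt"

    # Stand-in for the MBCS conversion (assumption: cp1252 code page).
    print(filename_to_bytes(name, "cp1252"))

    # Using ASCII, the default encoding mentioned above, fails as soon
    # as the filename contains a non-ASCII character.
    try:
        filename_to_bytes(name, "ascii")
    except UnicodeEncodeError as exc:
        print("ASCII conversion failed:", exc)
```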

Mark.