[Python-Dev] Unicode and Windows

Mark Hammond mhammond@skippinet.com.au
Tue, 21 Mar 2000 09:48:06 -0800


>
> Right. The idea with open() was to write a special version (using
> #ifdefs) for use on Windows platforms which does all the needed
> magic to convert Unicode to whatever the native format and locale
> is...

That works for open() - but what about other extension modules?

This seems to imply that any Python extension on Windows that wants to pass
a Unicode string to an external function cannot use PyArg_ParseTuple() with
anything other than "O", and must perform the magic itself.

This just seems a little back-to-front to me.  Platforms that have _no_
native Unicode support have useful utilities for working with Unicode.
Platforms that _do_ have native Unicode support cannot make use of these
utilities.  Is this by design, or simply a sad side-effect of the design?

So - it is trivial to use Unicode on platforms that don't support it, but
quite difficult on platforms that do.

> Using parser markers for this is obviously *not* the right way
> to get to the core of the problem. Basically, you will have to
> write a helper which takes a string, Unicode or some other
> "t" compatible object as name object and then converts it to
> the system's view of things.

Why "obviously"?  What on earth does the existing mechanism buy me on
Windows, other than grief that I cannot use it?
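
To make the quoted suggestion concrete, here is a minimal sketch of such a
helper: it takes a string or bytes object and returns bytes in the
platform's native encoding. It is written in present-day Python 3 rather
than as C extension code, and the name to_native_bytes, its parameters, and
the choice of "mbcs" on Windows are all assumptions made for illustration,
not an existing API.

```python
import locale
import sys


def to_native_bytes(name, encoding=None, errors="strict"):
    """Return `name` as bytes in the platform's native encoding.

    Hypothetical helper, not part of any Python API. Bytes-like input
    is assumed to be in the native encoding already and is passed
    through unchanged.
    """
    if encoding is None:
        # Assumption: "mbcs" (the ANSI code page) on Windows,
        # the locale's preferred encoding everywhere else.
        if sys.platform == "win32":
            encoding = "mbcs"
        else:
            encoding = locale.getpreferredencoding(False)
    if isinstance(name, (bytes, bytearray)):
        return bytes(name)
    if isinstance(name, str):
        return name.encode(encoding, errors)
    raise TypeError("expected str or bytes, got %s" % type(name).__name__)
```

Every extension that accepts names would have to call something like this
by hand, which is the burden being argued about here.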

> I think we had a private discussion about this a few months ago:
> there was some way to convert Unicode to a platform independent
> format which then got converted to MBCS -- don't remember the details
> though.

There is a Win32 API function for this.  However, as you succinctly pointed
out, not many people are going to be aware of its name, or how to use the
multitude of flags offered by these conversion functions, or know how to
deal with the memory management, etc.
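
One decision such flags typically govern is what to do with characters the
target code page cannot represent. The sketch below shows that choice in
present-day Python 3, using the standard codec error handlers. The function
name encode_for_code_page, its parameters, and the use of cp1252 as a
stand-in for the native code page are assumptions for illustration only.

```python
def encode_for_code_page(text, code_page="cp1252", lossy=False):
    """Encode `text` for a legacy code page.

    Hypothetical helper. Returns a (data, lost) pair, where `lost`
    is True if any character had to be replaced with "?".
    With lossy=False, unrepresentable characters raise
    UnicodeEncodeError instead.
    """
    try:
        return text.encode(code_page), False
    except UnicodeEncodeError:
        if not lossy:
            raise
        # Fall back to substituting "?" for unmappable characters.
        return text.encode(code_page, "replace"), True
```

A caller that cares about data loss can check the second element of the
result, or leave lossy off and handle the exception.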

> Can't you use the wchar_t interfaces for the task (see
> the unicodeobject.h file for details) ? Perhaps you can
> first transfer Unicode to wchar_t and then on to MBCS
> using a win32 API ?!

Sure - I can.  But can everyone who writes interfaces to Unicode functions?
You wrote the Python Unicode support but don't know the function's name -
pity the poor Joe Average trying to write an extension.

It seems to me that, on Windows, the Python Unicode support as it stands is
really internal.  I cannot think of a single time that an extension writer
on Windows would ever want to use the "t" markers - am I missing something?
I don't believe that a single Unicode-aware function in the Windows
extensions (of which there are _many_) could be changed to use the "t"
markers.

It still seems to me that the Unicode support works well on platforms with
no Unicode support, and is fairly useless on platforms with the support.  I
don't believe that any extension on Windows would want to use the "t"
marker - so, as Fred suggested, how about providing something for us that
can help us interface to the platform's Unicode?

This is getting too hard for me - I will release my Windows registry module
without Unicode support, and hope that in the future someone cares enough to
address it, and to add the large number of LOC that will be needed simply to
get Unicode talking to Unicode...

Mark.