Internal Format (Re: [Python-Dev] Internationalization Toolkit)

M.-A. Lemburg
Wed, 10 Nov 1999 13:36:44 +0100

Fredrik Lundh wrote:
> > What I don't like is using wchar_t if available (and then addressing
> > it as if it were defined as unsigned integer). IMO, it's better
> > to define a Python Unicode representation which then gets converted
> > to whatever wchar_t represents on the target machine.
> you should read the unicode.h file a bit more carefully:
> ...
> /* Unicode declarations. Tweak these to match your platform */
> /* set this flag if the platform has "wchar.h", "wctype.h" and the
>    wchar_t type is a 16-bit unsigned type */
> #if defined(WIN32) || defined(HAVE_USABLE_WCHAR_H)
>     (this uses wchar_t, and also iswspace and friends)
> ...
> #else
> /* Use if you have a standard ANSI compiler, without wchar_t support.
>    If a short is not 16 bits on your platform, you have to fix the
>    typedef below, or the module initialization code will complain. */
>     (this maps iswspace to isspace, for 8-bit characters).
> #endif
> ...
> the plan was to use the second solution (using "configure"
> to figure out what integer type to use), and its own unicode
> database table for the is/to primitives

Oh, I did read unicode.h, stumbled across the mixed usage
and decided not to like it ;-)

Seriously, I find the second solution, where you use 'unsigned
short', much more portable and straightforward. You never know what
the compiler does for isw*(), and it's probably better to stick
to one format for all platforms. Only endianness gets in the way,
but that's easy to handle.
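To make the endianness remark concrete, here is a rough sketch of byte-order-independent helpers for moving one 16-bit unit to and from a byte buffer. The names (sketch_unicode_t, sketch_put_be, sketch_get_be, sketch_get_le) are made up for this example and assume 'unsigned short' holds at least 16 bits; nothing here is taken from unicode.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type name; assumes 'unsigned short' has at least 16 bits. */
typedef unsigned short sketch_unicode_t;

/* Write one 16-bit unit to out[0..1], most significant byte first. */
void sketch_put_be(unsigned char *out, sketch_unicode_t ch)
{
    assert(out != NULL);
    out[0] = (unsigned char)((ch >> 8) & 0xFF);
    out[1] = (unsigned char)(ch & 0xFF);
}

/* Read one 16-bit unit stored most significant byte first. */
sketch_unicode_t sketch_get_be(const unsigned char *in)
{
    assert(in != NULL);
    return (sketch_unicode_t)(((unsigned int)in[0] << 8) | (unsigned int)in[1]);
}

/* Read one 16-bit unit stored least significant byte first. */
sketch_unicode_t sketch_get_le(const unsigned char *in)
{
    assert(in != NULL);
    return (sketch_unicode_t)(((unsigned int)in[1] << 8) | (unsigned int)in[0]);
}
```

Because the helpers use shifts rather than pointer casts, they behave the same on big-endian and little-endian hosts; only the external byte order has to be agreed on.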

So I opt for 'unsigned short'. The encoding used in these 2 bytes
is a different question though. If HP insists on Unicode 3.0, there's
probably no other way than to use UTF-16.
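For reference, UTF-16 keeps the 16-bit unit but represents code points above U+FFFF as a pair of surrogate units. The following is only a sketch of that encoding step; sketch_unicode_t and sketch_utf16_encode are hypothetical names, not part of any proposed API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type name; assumes 'unsigned short' has at least 16 bits. */
typedef unsigned short sketch_unicode_t;

/* Encode code point cp as UTF-16 into out[0..1].
   Returns the number of 16-bit units written (1 or 2), or 0 if cp is
   not encodable (above U+10FFFF, or itself a surrogate value). */
int sketch_utf16_encode(unsigned long cp, sketch_unicode_t *out)
{
    assert(out != NULL);
    if (cp > 0x10FFFFUL || (cp >= 0xD800UL && cp <= 0xDFFFUL))
        return 0;
    if (cp < 0x10000UL) {
        out[0] = (sketch_unicode_t)cp;       /* fits in a single unit */
        return 1;
    }
    cp -= 0x10000UL;                          /* 20 bits remain */
    out[0] = (sketch_unicode_t)(0xD800UL + (cp >> 10));     /* high surrogate */
    out[1] = (sketch_unicode_t)(0xDC00UL + (cp & 0x3FFUL)); /* low surrogate */
    return 2;
}
```

For example, U+20AC is written as the single unit 0x20AC, while U+1D11E becomes the pair 0xD834 0xDD1E.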

> (iirc, the unicode.txt file discussed this, but that one
> seems to be missing from the zip archive).

It's not in the file I downloaded from your site. Could you post
it here?

Marc-Andre Lemburg
Y2000:                                                    51 days left