Internal Format (Re: [Python-Dev] Internationalization Toolkit)
Wed, 10 Nov 1999 13:36:44 +0100
Fredrik Lundh wrote:
> > What I don't like is using wchar_t if available (and then addressing
> > it as if it were defined as unsigned integer). IMO, it's better
> > to define a Python Unicode representation which then gets converted
> > to whatever wchar_t represents on the target machine.
> you should read the unicode.h file a bit more carefully:
> /* Unicode declarations. Tweak these to match your platform */
> /* set this flag if the platform has "wchar.h", "wctype.h" and the
> wchar_t type is a 16-bit unsigned type */
> #define HAVE_USABLE_WCHAR_H
> #if defined(WIN32) || defined(HAVE_USABLE_WCHAR_H)
> (this uses wchar_t, and also iswspace and friends)
> /* Use if you have a standard ANSI compiler, without wchar_t support.
> If a short is not 16 bits on your platform, you have to fix the
> typedef below, or the module initialization code will complain. */
> (this maps iswspace to isspace, for 8-bit characters).
> the plan was to use the second solution (using "configure"
> to figure out what integer type to use), and its own unicode
> database table for the is/to primitives
Oh, I did read unicode.h, stumbled across the mixed usage
and decided not to like it ;-)
Seriously, I find the second solution, where you use 'unsigned
short', much more portable and straightforward. You never know what
the compiler does for isw*(), and it's probably better to stick
to one format for all platforms. Only endianness gets in the way,
but that's easy to handle.
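
To illustrate the endianness point, here is a minimal sketch in C. It is
not taken from unicode.h: the names uni_char, swap16 and decode_be are
made up for this example, and it assumes that unsigned short can hold a
16-bit value. Reading the data byte by byte gives the same result on
big- and little-endian hosts.

```c
#include <stddef.h>

/* Hypothetical name for the 16-bit code unit type discussed above.
   Assumption: unsigned short holds at least 16 bits. */
typedef unsigned short uni_char;

/* Swap the two bytes of one 16-bit code unit. */
uni_char swap16(uni_char c)
{
    return (uni_char)(((c & 0xFFu) << 8) | ((c >> 8) & 0xFFu));
}

/* Decode n big-endian 16-bit units from src into host-order units.
   Working on single bytes makes this independent of the host's
   byte order. */
void decode_be(const unsigned char *src, uni_char *dst, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = (uni_char)(((unsigned)src[2 * i] << 8) | src[2 * i + 1]);
}
```

A little-endian source can be handled the same way by exchanging the
two byte indices, or by applying swap16 to each unit afterwards.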
So I opt for 'unsigned short'. The encoding used in these 2 bytes
is a different question though. If HP insists on Unicode 3.0, there's
probably no other way than to use UTF-16.
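
For reference, a rough sketch of what UTF-16 would mean for a 16-bit
internal type: code points above 0xFFFF are split into a pair of
surrogate units. The function name utf16_encode is hypothetical and
only meant as an illustration; nothing like it exists in the code
under discussion.

```c
/* Encode one code point into 16-bit units (assumes unsigned short
   holds at least 16 bits).  Returns the number of units written
   (1 or 2), or 0 if cp is not a valid code point. */
int utf16_encode(unsigned long cp, unsigned short out[2])
{
    if (cp < 0x10000UL) {
        if (cp >= 0xD800UL && cp <= 0xDFFFUL)
            return 0;               /* surrogate range is reserved */
        out[0] = (unsigned short)cp;
        return 1;
    }
    if (cp > 0x10FFFFUL)
        return 0;                   /* beyond the UTF-16 range */
    cp -= 0x10000UL;                /* 20 bits remain */
    out[0] = (unsigned short)(0xD800UL + (cp >> 10));    /* high surrogate */
    out[1] = (unsigned short)(0xDC00UL + (cp & 0x3FFUL)); /* low surrogate */
    return 2;
}
```

The consequence for the internal format is that one character may take
two array slots, so indexing by unit and indexing by character are no
longer the same thing.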
> (iirc, the unicode.txt file discussed this, but that one
> seems to be missing from the zip archive).
It's not in the file I downloaded from your site. Could you post
it here?
Y2000: 51 days left
Python Pages: http://www.lemburg.com/python/