[Python-Dev] Re: Internal Format

Greg Stein gstein@lyra.org
Thu, 11 Nov 1999 01:18:52 -0800 (PST)


On Wed, 10 Nov 1999, Fredrik Lundh wrote:
> Marc-Andre writes:
> 
>     The internal format for Unicode objects should either use a Python
>     specific fixed cross-platform format <PythonUnicode> (e.g. 2-byte
>     little endian byte order) or a compiler provided wchar_t format (if
>     available). Using the wchar_t format will ease embedding of Python in
>     other Unicode aware applications, but will also make internal format
>     dumps platform dependent.
> 
> having been there and done that, I strongly suggest
> a third option: a 16-bit unsigned integer, in platform
> specific byte order (PY_UNICODE_T).  along all other
> roads lie code bloat and speed penalties...

I agree 100% !!

wchar_t will introduce portability issues right on up into the Python
level. A fixed byte order introduces speed issues and OS interoperability
issues, yet solves no portability problems (Byte Order Marks should still
be present and used).
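
A minimal sketch of the byte-order point, assuming a present-day Python 3
with its standard struct and codecs modules (neither the sample string nor
the variable names come from this thread). It packs the same text once in a
fixed little-endian layout and once as 16-bit unsigned units in the host's
own byte order, then shows that a dump still needs a Byte Order Mark to be
readable elsewhere:

```python
import codecs
import struct
import sys

# Illustrative sample: three code points that each fit in one 16-bit unit.
text = "Py\u20ac"

# Fixed little-endian layout: the same bytes on every platform, but a
# big-endian host has to swap bytes whenever it touches a unit.
fixed_le = text.encode("utf-16-le")

# Native layout: 16-bit unsigned units in the host's own byte order
# ("=" selects native order with standard sizes, "H" is an unsigned short).
units = [ord(ch) for ch in text]
native = struct.pack("=%dH" % len(units), *units)

# The two layouts coincide only on little-endian hosts.
same_layout = (native == fixed_le)

# For interchange, prefix a Byte Order Mark that matches the host order.
bom = codecs.BOM_UTF16_LE if sys.byteorder == "little" else codecs.BOM_UTF16_BE
dump = bom + native

# The generic "utf-16" codec reads the BOM and picks the right byte order.
roundtrip = dump.decode("utf-16")
```

Either way the reader relies on the mark rather than on a fixed internal
layout, so native order costs nothing in portability while avoiding the
swaps.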

There are two "platforms" out there that use Unicode: Win32 and Java. They
both use UCS-2, AFAIK.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/