[I18n-sig] UCS-4 configuration

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Tue, 26 Jun 2001 23:15:19 +0200


I now have a patch on SF which provides the autoconf machinery for
the proposed simultaneous support for narrow and wide Py_UNICODE
definitions:

https://sourceforge.net/tracker/index.php?func=detail&aid=436496&group_id=5470&atid=305470

In particular:

--enable-unicode=ucs2 configures a narrow Py_UNICODE, and uses wchar_t
                      if it fits
--enable-unicode=ucs4 configures a wide Py_UNICODE, and likewise uses
                      wchar_t if it fits
--enable-unicode      configures Py_UNICODE to wchar_t if available,
                      and to UCS-4 if not; this is the default
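
To check from within Python which width a given build ended up with,
something like the following sketch may help. It assumes that
sys.maxunicode is available and reports the largest code point a
Py_UNICODE can hold; that attribute is not mentioned in this message,
and on Python 3.3 and later it always reports 0x10FFFF regardless of
configure options. The helper name unicode_width is made up for
illustration.

```python
import sys

def unicode_width():
    """Classify the running interpreter as a 'narrow' or 'wide' build."""
    # Assumption: sys.maxunicode is the largest code point the build's
    # Unicode type can represent (0xFFFF narrow, 0x10FFFF wide).
    return "wide" if sys.maxunicode > 0xFFFF else "narrow"

print(unicode_width(), hex(sys.maxunicode))
```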

The intention is that --disable-unicode (or --enable-unicode=no)
removes the Unicode type altogether; this is not yet implemented
(it only defines a Py_USING_UNICODE macro that can be used to
wrap Unicode support).

With a wide Py_UNICODE, this patch also:
- supports UTF-8 and UTF-16 encodings of the complete Unicode range
- supports unichr and \U literals:

>>> u"\U00102030"
u'\U00102030'
>>> len(u"\U00102030")
1
>>> u"\U00102030".encode("utf-8")
'\xf4\x82\x80\xb0'
>>> u"\U00102030".encode("utf-16")
'\xff\xfe\xc8\xdb0\xdc'
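
The byte strings above can be cross-checked by hand. The following is
a minimal sketch of the standard UTF-16 surrogate and four-byte UTF-8
arithmetic for U+102030; it is not code from the patch, and the
function names are made up for illustration. Note that the "utf-16"
result above starts with a little-endian BOM (ff fe), and that the
literal "0" in it is the byte 0x30.

```python
def utf16_surrogates(cp):
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point outside the supplementary range")
    offset = cp - 0x10000                 # 20-bit value
    high = 0xD800 + (offset >> 10)        # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits
    return high, low

def utf8_bytes(cp):
    """Encode a code point above U+FFFF as four UTF-8 byte values."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point outside the supplementary range")
    return [
        0xF0 | (cp >> 18),                # 11110xxx
        0x80 | ((cp >> 12) & 0x3F),       # 10xxxxxx
        0x80 | ((cp >> 6) & 0x3F),        # 10xxxxxx
        0x80 | (cp & 0x3F),               # 10xxxxxx
    ]

high, low = utf16_surrogates(0x102030)
print(hex(high), hex(low))                       # 0xdbc8 0xdc30
print([hex(b) for b in utf8_bytes(0x102030)])    # ['0xf4', '0x82', '0x80', '0xb0']
```

Written little-endian, the surrogate pair gives the bytes c8 db 30 dc,
which matches the tail of the "utf-16" result; the UTF-8 bytes match
the "utf-8" result directly.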

Regards,
Martin