Pure python implementation of string-like class

Akihiro KAYAMA kayama at st.rim.or.jp
Sat Feb 25 12:00:53 EST 2006


Hi And.

In article <1140878638.666329.316900 at u72g2000cwu.googlegroups.com>,
and-google at doxdesk.com writes:

and-google> Akihiro KAYAMA wrote:
and-google> > As the character set is wider than UTF-16(U+10FFFF), I can't use
and-google> > Python's native unicode string class.
and-google> 
and-google> Have you tried using Python compiled in Wide Unicode mode
and-google> (--enable-unicode=ucs4)? You get native UTF-32/UCS-4 strings then,
and-google> which should be enough for most purposes.

From my quick survey, Python's Unicode support is intentionally
restricted to the range UTF-16 can encode (U+0000...U+10FFFF),
regardless of the --enable-unicode=ucs4 option.

> Python 2.4.1 (#2, Sep  3 2005, 22:35:47) 
> [GCC 2.95.4 20020320 [FreeBSD]] on freebsd4
> Type "help", "copyright", "credits" or "license" for more information.
> >>> u"\U0010FFFF"
> u'\U0010ffff'
> >>> len(u"\U0010FFFF")
> 1
> >>> u"\U00110000"
> UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-9: illegal Unicode character

A simple patch to unicodeobject.c that disables the Unicode range check
could solve this, but I don't want to maintain a specialized Python
binary for my project.
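
To make the pure-Python alternative concrete, here is a minimal sketch
that keeps code points as a tuple of plain integers, so values above
U+10FFFF are allowed. The class name WideString and its methods are
hypothetical and only for illustration; the sketch assumes Python 3 and
is not a complete string replacement.

```python
class WideString(object):
    """Immutable sequence of integer code points, not limited to U+10FFFF.

    Hypothetical sketch: the name and methods are illustrative only.
    """

    def __init__(self, codepoints=()):
        cps = tuple(codepoints)
        for cp in cps:
            # Only reject values that cannot be code points at all;
            # there is deliberately no upper bound such as 0x10FFFF.
            if not isinstance(cp, int) or cp < 0:
                raise ValueError("invalid code point: %r" % (cp,))
        self._cps = cps

    def __len__(self):
        # Length is counted in code points, one per stored integer.
        return len(self._cps)

    def __getitem__(self, index):
        # Like str, both indexing and slicing return the same type.
        if isinstance(index, slice):
            return WideString(self._cps[index])
        return WideString((self._cps[index],))

    def __add__(self, other):
        if not isinstance(other, WideString):
            return NotImplemented
        return WideString(self._cps + other._cps)

    def __eq__(self, other):
        if not isinstance(other, WideString):
            return NotImplemented
        return self._cps == other._cps

    def __hash__(self):
        return hash(self._cps)

    def __repr__(self):
        return "WideString([%s])" % ", ".join("0x%X" % cp for cp in self._cps)

    def find(self, sub, start=0):
        """Return the lowest index >= start where sub occurs, or -1."""
        n, m = len(self._cps), len(sub._cps)
        for i in range(max(start, 0), n - m + 1):
            if self._cps[i:i + m] == sub._cps:
                return i
        return -1


# 0x110000 is the value the native unicode type rejected above.
s = WideString([0x41, 0x110000, 0x42])
print(len(s), s[1], s.find(WideString([0x110000, 0x42])))
```

The obvious cost is that every operation runs in interpreted Python
rather than in C, and methods such as encode, split, or regular
expression matching would each have to be written by hand.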

-- kayama


