[Python-Dev] New Py_UNICODE doc
Shane Hathaway
shane at hathawaymix.org
Sat May 7 11:14:21 CEST 2005
Martin v. Löwis wrote:
> Shane Hathaway wrote:
>>More generally, how should a non-unicode-expert writing Python extension
>>code find out the minimum they need to know about unicode to use the
>>Python unicode API? The API reference [1] ought to at least have a list
>>of background links. I had to hunt everywhere.
>
> That, of course, depends on what your background is. Did you know what
> Latin-1 is, when you started? How it relates to code page 1252? What
> UTF-8 is? What an abstract character is, as opposed to a byte sequence
> on the one hand, and to a glyph on the other hand?
>
> Different people need different background, especially if they are
> writing different applications.
Yes, but the first few steps are the same for nearly everyone, and
people need more help taking the first few steps. In particular:
- The Python docs link to unicode.org, but unicode.org is complicated,
  long-winded, and leaves many questions unanswered. The Wikipedia
  article is far better. I wish I had thought to look there instead.
  http://en.wikipedia.org/wiki/Unicode
- The docs should say what happens when a unicode character above
  U+FFFF winds up in a Py_UNICODE array. For instance, what is
  len(u'\U00012345')? 1 or 2? Does the answer depend on the UCS4
  compile-time switch?
- The docs should help developers evaluate whether they need the UCS4
  compile-time switch. Is UCS2 good enough for Asia? For math? For
  hieroglyphics? <wink>
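
A minimal sketch of the len() question in the second point, not taken from the docs. It assumes a 16-bit Py_UNICODE build stores a code point above U+FFFF as a UTF-16 surrogate pair, and that sys.maxunicode reports the build width (0xFFFF for UCS2, 0x10FFFF for UCS4). The helper names utf16_units and surrogate_pair are made up for illustration.

```python
import sys

def utf16_units(code_point):
    """Number of 16-bit code units needed for code_point in UTF-16."""
    return 1 if code_point <= 0xFFFF else 2

def surrogate_pair(code_point):
    """Split a code point above U+FFFF into (high, low) surrogates."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

# Assumption: sys.maxunicode distinguishes the two build types.
wide_build = sys.maxunicode > 0xFFFF

# On a wide build each code point is one element; on a narrow build
# a code point above U+FFFF would occupy two elements.
expected_len = 1 if wide_build else utf16_units(0x12345)

high, low = surrogate_pair(0x12345)
print("wide build:", wide_build)
print("expected len:", expected_len)
print("surrogates: %04X %04X" % (high, low))
```

Under those assumptions the answer would depend on the compile-time switch: 2 on a UCS2 build, 1 on a UCS4 build.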
Shane