[Python-Dev] New Py_UNICODE doc

Shane Hathaway shane at hathawaymix.org
Sat May 7 11:14:21 CEST 2005


Martin v. Löwis wrote:
> Shane Hathaway wrote:
>>More generally, how should a non-unicode-expert writing Python extension
>>code find out the minimum they need to know about unicode to use the
>>Python unicode API?  The API reference [1] ought to at least have a list
>>of background links.  I had to hunt everywhere.
> 
> That, of course, depends on what your background is. Did you know what
> Latin-1 is when you started? How it relates to code page 1252? What
> UTF-8 is? What an abstract character is, as opposed to a byte sequence
> on the one hand, and to a glyph on the other hand?
>
> Different people need different background, especially if they are
> writing different applications.

Yes, but the first few steps are the same for nearly everyone, and
people need more help taking the first few steps.  In particular:

- The Python docs link to unicode.org, but unicode.org is complicated,
long-winded, and leaves many questions unanswered.  The Wikipedia
article is far better.  I wish I had thought to look there instead.

  http://en.wikipedia.org/wiki/Unicode

- The docs should say what happens when a character outside the Basic
Multilingual Plane (above U+FFFF) winds up in a Py_UNICODE array.  For
instance, is len(u'\U00012345') 1 or 2?  Does the answer depend on the
UCS4 compile-time switch?

- The docs should help developers evaluate whether they need the UCS4
compile-time switch.  Is UCS2 good enough for Asia?  For math?  For
hieroglyphics? <wink>
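On the len(u'\U00012345') question: a minimal Python 3.9+ sketch of the arithmetic, assuming the standard UTF-16 surrogate-pair scheme that a narrow (UCS2) build uses to store a character above U+FFFF in two Py_UNICODE units. The helper name utf16_code_units is hypothetical, not part of any Python API. sys.maxunicode is the standard way to tell builds apart: 0xFFFF on a narrow build, 0x10FFFF on a wide build and on Python 3.3 and later, where PEP 393 makes len() always count code points.

```python
import sys


def utf16_code_units(ch: str) -> list[int]:
    """Hypothetical helper: the 16-bit units a narrow (UCS2) build stores for ch."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        # BMP character: a single code unit, so len() is 1 on every build.
        return [cp]
    # Character above U+FFFF: split it into a high and a low surrogate.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


ch = "\U00012345"
units = utf16_code_units(ch)
print([hex(u) for u in units])  # ['0xd808', '0xdf45']
print(len(units))               # 2: what len() reported on a narrow build
print(len(ch))                  # 1 on Python 3.3+ (and on wide builds)
print(hex(sys.maxunicode))      # 0xffff only on a narrow build
```

So on a narrow build the answer is 2, on a wide (UCS4) build it is 1, and the difference is exactly the surrogate pair shown above.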
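For deciding whether the UCS4 switch matters, one practical check is to scan representative text for code points above U+FFFF, since those are exactly the characters a UCS2 build cannot hold in a single Py_UNICODE unit. A rough Python 3.9+ sketch: the function names are hypothetical, and the sample string simply mixes a character from the Mathematical Alphanumeric Symbols block (U+1D400) with one from CJK Unified Ideographs Extension B (U+20000).

```python
def astral_chars(text: str) -> list[tuple[int, str]]:
    """Hypothetical helper: (index, char) for every code point above U+FFFF."""
    return [(i, c) for i, c in enumerate(text) if ord(c) > 0xFFFF]


def narrow_build_length(text: str) -> int:
    """What len() would report on a UCS2 build: each such char counts twice."""
    return len(text) + len(astral_chars(text))


sample = "ab\U0001D400c\U00020000"
for index, ch in astral_chars(sample):
    print(index, f"U+{ord(ch):04X}")
# 2 U+1D400
# 4 U+20000
print(len(sample), narrow_build_length(sample))  # 5 7
```

If astral_chars returns an empty list for all the text an application handles, a UCS2 build stores it one character per Py_UNICODE unit; otherwise any code that indexes or slices strings has to cope with surrogate pairs.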

Shane

