> usually shorter in languages with many ideographs (my non-scientific
> tests indicate that Chinese text uses about a quarter as many symbols
> as English; I'm sure someone can dig up better figures).

This is why I am not especially enamored of Unicode and the prospect of
Python becoming married to it.  Unicode is heavily weighted in favor of
efficiently representing Chinese and inefficiently representing English.
To give English equivalent treatment, the 20,000 or so most common words,
roots, prefixes, and suffixes would each get its own code point.
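
For anyone who wants to check the symbol-count claim on their own text,
here is a rough sketch that counts code points and encoded bytes.  The
sample sentences and the sizes() helper are my own illustrative
assumptions, not figures from this thread, and one short phrase proves
nothing about the languages in general.

```python
# Hypothetical sample pair, chosen only for illustration.
samples = {
    "english": "I love you",
    "chinese": "我爱你",
}


def sizes(text):
    """Return the code point count and encoded byte lengths of text."""
    return {
        "code_points": len(text),                 # one per Unicode code point
        "utf-8": len(text.encode("utf-8")),       # variable-width encoding
        "utf-16": len(text.encode("utf-16-le")),  # -le avoids the 2-byte BOM
    }


for name, text in samples.items():
    print(name, sizes(text))
```

Note that the byte counts depend heavily on the encoding chosen, so the
code point count and the storage cost are separate questions.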

