Re: [Python-Dev] Re: Re: Alternative Implementation for PEP 292: Simple String Substitutions

14 Sep 2004

On Sep 14, 2004, at 2:54 AM, Terry Reedy wrote:
...
> This is why I am not especially enamored of Unicode and the prospect of
> Python becoming married to it.  It is heavily weighted in favor of
> efficiently representing Chinese and inefficiently representing English.
> To give English equivalent treatment, the 20,000 or so most common words,
> roots, prefixes, and suffixes would each get its own codepoint.

Of course it is perfectly possible to have the Python unicode 
implementation choose to represent some unicode strings with only 8 
bits per character. There is no (conceptual) reason it could not 
represent (u'a' * 8) with 8 bytes + class header overhead. That is 
simply an implementation detail and really has nothing to do with 
Unicode itself.
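
To make the point about storage width concrete, here is a minimal
sketch of one way an implementation could choose its internal
representation per string. It is only an illustration: the names pack
and char_at are made up for this example and do not describe how any
particular Python release stores its strings.

```python
def pack(text):
    """Store text with the narrowest fixed width that fits every character.

    Hypothetical helper for illustration; returns (bytes_per_char, raw_bytes).
    """
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        width, codec = 1, "latin-1"      # 8 bits per character
    elif highest < 0x10000:
        width, codec = 2, "utf-16-le"    # 16 bits, no surrogate pairs needed
    else:
        width, codec = 4, "utf-32-le"    # 32 bits per character
    # "surrogatepass" lets lone surrogate code points through unchanged.
    return width, text.encode(codec, "surrogatepass")


def char_at(width, data, index):
    """Constant-time indexing: every character occupies exactly `width` bytes."""
    start = index * width
    if index < 0 or start + width > len(data):
        raise IndexError("character index out of range")
    return chr(int.from_bytes(data[start:start + width], "little"))


width, data = pack("a" * 8)
print(width, len(data))   # storage cost of the payload for eight ASCII characters
```

Under these assumptions, the payload for 'a' * 8 is eight bytes, and
indexing stays constant time because the width is fixed within any one
string.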

It would also be possible to use UTF-8 string storage, although this 
has the tradeoff that indexing an element takes linear time w.r.t. 
position instead of constant time.
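
The indexing tradeoff can be sketched as follows. This is an
illustrative example, not code from any Python implementation; the
function names are hypothetical, and the input is assumed to be valid
UTF-8. Because characters occupy one to four bytes, finding the Nth
character means stepping over the N characters before it.

```python
def _sequence_length(lead_byte):
    """Number of bytes in the UTF-8 sequence that starts with lead_byte."""
    if lead_byte < 0x80:
        return 1
    if lead_byte >> 5 == 0b110:
        return 2
    if lead_byte >> 4 == 0b1110:
        return 3
    if lead_byte >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def utf8_char_at(data, index):
    """Return character number `index` from UTF-8 bytes.

    Walks from the start of the buffer, so the cost grows linearly
    with `index`.
    """
    if index < 0:
        raise IndexError("negative index not supported in this sketch")
    pos = 0
    for _ in range(index):
        pos += _sequence_length(data[pos])   # skip one whole character
    end = pos + _sequence_length(data[pos])
    return data[pos:end].decode("utf-8")


data = "a\u00e9\u4e2d\U0001F600z".encode("utf-8")   # 1-, 2-, 3-, 4-, 1-byte characters
print(utf8_char_at(data, 4))
```

A real implementation could cache byte offsets to soften this cost, at
the price of extra memory per string.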

James

James Y Knight