[Python-Dev] Divorcing str and unicode (no more implicit conversions).
Antoine Pitrou
solipsis at pitrou.net
Mon Oct 3 15:26:55 CEST 2005
On Monday, 3 October 2005 at 14:59 +0200, Fredrik Lundh wrote:
> Antoine Pitrou wrote:
>
> > A good rule of thumb is to convert to unicode everything that is
> > semantically textual
>
> and isn't pure ASCII.
How can you be sure that something that is /semantically textual/ will
always remain "pure ASCII"? That's contradictory, unless your software
never leaves the Anglo-Saxon world (and even then...).
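The fragility at issue can be sketched in code. Below is a small illustration (written in modern Python, since Python 2's implicit coercion can't be run directly today) of the rule the thread is about: when a byte string meets a unicode string, the bytes are silently decoded as ASCII, so the mix works right up until the first non-ASCII byte arrives. The helper name `py2_style_concat` is mine, purely for illustration.

```python
def py2_style_concat(byte_str, uni_str):
    # Simulates Python 2's implicit str/unicode coercion: the byte
    # string is decoded as ASCII behind the programmer's back.
    return byte_str.decode('ascii') + uni_str

# Works as long as the bytes happen to be pure ASCII...
assert py2_style_concat(b"hello ", "world") == "hello world"

# ...and blows up the day real-world text shows up.
try:
    py2_style_concat("café ".encode("utf-8"), "monde")
except UnicodeDecodeError:
    pass  # exactly the late surprise being warned about here
```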
> (anyone who is tempted to argue otherwise should benchmark their
> applications, both speed- and memory-wise, and be prepared to come
> up with very strong arguments for why python programs shouldn't be
> allowed to be fast and memory-efficient whenever they can...)
I think most applications don't critically depend on text processing
performance. OTOH, international adaptability is the kind of thing
that /will/ bite you one day if you don't prepare for it at the
beginning.
Also, if necessary, the distinction could be an implementation detail
and the conversion could be transparent (as with int vs. long): the text
would be stored in an 8-bit charset for as long as possible and
converted to a wide encoding only when necessary. The important thing is
that these optimisations, if they are needed, should be handled
transparently by the Python runtime.
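The "8-bit as long as possible, widen only when necessary" idea can be sketched as a toy class. This is not how any real runtime implements it; `AdaptiveText` and its methods are hypothetical names, and the narrow/wide choice is deliberately hidden from callers, which is the whole point:

```python
class AdaptiveText:
    """Toy sketch: store text as Latin-1 bytes when every code point
    fits in 8 bits, otherwise fall back to a wide (UTF-32) layout.
    The representation is an internal detail; callers only ever see
    ordinary text."""

    def __init__(self, s):
        try:
            self._data = s.encode('latin-1')    # narrow: 1 byte/char
            self._wide = False
        except UnicodeEncodeError:
            self._data = s.encode('utf-32-le')  # wide: 4 bytes/char
            self._wide = True

    def __str__(self):
        codec = 'utf-32-le' if self._wide else 'latin-1'
        return self._data.decode(codec)

    def nbytes(self):
        return len(self._data)

ascii_text = AdaptiveText("hello")   # stays narrow: 5 bytes
accented   = AdaptiveText("héllo")   # é fits Latin-1, still narrow
cjk        = AdaptiveText("日本語")   # forced wide: 12 bytes
```

(As it happens, CPython 3.3 later adopted a flexible string representation along these lines in PEP 393.)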
(it seems to me - I may be mistaken - that modern Windows versions treat
every string as 16-bit unicode internally. Why would they do that if it
were so inefficient?)
Regards
Antoine.