[Python-Dev] Pre-PEP: Python Character Model

Andrew Kuchling akuchlin@mems-exchange.org
Wed, 7 Feb 2001 16:00:02 -0500


On Wed, Feb 07, 2001 at 12:49:15PM -0800, Paul Prescod quoted:
>>    The approach taken in the next version of Ruby is for all string and
>> regex objects to have an encoding attribute and for there to be
>> infrastructure to handle operations that combine encodings.

Any idea if this next version of Ruby is available in its current
state, or if it's vaporware?  It might be worth looking at what
exactly it implements, but I wonder if this is just Matz's idea and he
hasn't yet tried implementing it.
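
To make the quoted design concrete, here is a rough Python sketch of what "an encoding attribute plus infrastructure for combining encodings" might mean for a single operation, concatenation. Everything here is a hypothetical illustration: `EncodedString` is an invented name, and the mixed-encoding policy (transcode both operands to UTF-8) is an assumption. It is not Ruby's actual API or behavior.

```python
# Hypothetical sketch of a byte string tagged with its encoding.
# Not Ruby's implementation; the mixed-encoding policy is an assumption.
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedString:
    data: bytes      # raw bytes as stored
    encoding: str    # codec name understood by Python's codecs machinery

    def text(self) -> str:
        # Decode the stored bytes into abstract characters.
        return self.data.decode(self.encoding)

    def __add__(self, other: "EncodedString") -> "EncodedString":
        if self.encoding == other.encoding:
            # Same encoding: plain byte concatenation is enough.
            return EncodedString(self.data + other.data, self.encoding)
        # Mixed encodings: something has to pick the result encoding.
        # Assumed policy for this sketch: decode both, re-encode as UTF-8.
        combined = self.text() + other.text()
        return EncodedString(combined.encode("utf-8"), "utf-8")


latin = EncodedString("café".encode("latin-1"), "latin-1")
utf = EncodedString("naïve".encode("utf-8"), "utf-8")
mixed = latin + utf   # result is tagged "utf-8"
```

Even this toy has to choose a rule for mixed operands; that rule, repeated for every string and regex operation, is the part a real design would have to pin down.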

>We could support a thousand encodings internally but a Python programmer
>should never know or care which one they are dealing with. Which leads
>me to ask "what's the point"? Would the small performance gains be worth
>it?

I'd worry that a regex engine supporting multiple encodings would be
either impossible to implement or, if possible, quite slow, because
you'd need to abstract every single character retrieval into a
function call that decodes one character in the given encoding.
Massive surgery was required to make Perl handle UTF-8, for example,
and I don't know that Perl's engine is actually fully operational with
UTF-8 yet.
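
A minimal Python sketch of the overhead described above, assuming a toy pattern language of literal characters plus `.` (any character). The decoder functions and `search` are hypothetical names invented for illustration; no real regex engine is structured this simply. The point is that the matcher never touches bytes directly: every character it examines, and every step it takes to a new starting position, goes through a per-encoding decoder call. For Latin-1 that call is trivial, while for UTF-8 it must inspect the lead byte to find the character's width.

```python
from typing import Callable, Optional, Tuple

# A decoder takes (buffer, byte position) and returns (character, next position).
Decoder = Callable[[bytes, int], Tuple[str, int]]


def next_char_latin1(buf: bytes, pos: int) -> Tuple[str, int]:
    # Fixed width: one byte is always one character.
    return chr(buf[pos]), pos + 1


def next_char_utf8(buf: bytes, pos: int) -> Tuple[str, int]:
    # Variable width: the lead byte says how many bytes the character uses.
    lead = buf[pos]
    if lead < 0x80:
        width = 1
    elif lead >> 5 == 0b110:
        width = 2
    elif lead >> 4 == 0b1110:
        width = 3
    elif lead >> 3 == 0b11110:
        width = 4
    else:
        raise ValueError(f"invalid UTF-8 lead byte at offset {pos}")
    return buf[pos:pos + width].decode("utf-8"), pos + width


def match_at(pattern: str, buf: bytes, pos: int, next_char: Decoder) -> bool:
    for p in pattern:
        if pos >= len(buf):
            return False
        ch, pos = next_char(buf, pos)  # one decoder call per character examined
        if p != "." and p != ch:
            return False
    return True


def search(pattern: str, buf: bytes, next_char: Decoder) -> Optional[int]:
    """Return the byte offset of the first match, or None."""
    pos = 0
    while pos <= len(buf):
        if match_at(pattern, buf, pos, next_char):
            return pos
        if pos == len(buf):
            break
        _, pos = next_char(buf, pos)  # even advancing the start needs decoding
    return None


text = "naïve café"
print(search("caf.", text.encode("utf-8"), next_char_utf8))      # 7
print(search("caf.", text.encode("latin-1"), next_char_latin1))  # 6
```

Both searches find the same logical match at different byte offsets, because "ï" takes two bytes in UTF-8 but one in Latin-1. A real engine would also need per-encoding handling for character classes, case folding and backward stepping (for lookbehind), which multiplies the surface area further.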

--amk