[Python-3000] string C API

Thu Sep 14 12:19:46 CEST 2006

Martin v. Löwis wrote:
> Jim Jewett schrieb:
>> Simply delegate such methods to a hidden per-encoding subclass.
>>
>> The UTF-8 methods will indeed be complex, unless the solution is
>> simply "someone called indexing/slicing/len, so I have to recode after
>> all."
>>
>> The Latin-1 encoding will have no such problem.
> 
> I'm not so much worried about UTF-8 or Latin-1; they are fairly trivial.
> Such methods would be dramatically slow for multi-byte
> encodings.

Only the first such call on a given string, though: the idea is to use lazy 
decoding, not to avoid decoding altogether. Most manipulations (len, indexing, 
slicing, concatenation, etc.) would require decoding to at least UCS-2 (or 
perhaps UCS-4).
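To make the lazy-decoding idea concrete, here is a minimal sketch in Python. It assumes a hypothetical `LazyStr` wrapper (not a real CPython type, and not part of any proposed C API), with Python's built-in `str` standing in for the decoded UCS-2/UCS-4 buffer. The wrapper keeps the original bytes plus their encoding. The first len/index/slice call pays the decoding cost and caches the result. Asking for the bytes back in the same encoding returns them untouched, without decoding at all.

```python
import codecs


class LazyStr:
    """Hypothetical sketch: encoded bytes that are decoded only on demand."""

    def __init__(self, data: bytes, encoding: str):
        self._data = data
        # Normalize the codec name so e.g. "UTF8" and "utf-8" compare equal.
        self._encoding = codecs.lookup(encoding).name
        self._decoded = None   # cached decoded text, filled on first use
        self.decode_count = 0  # illustration only: how often we decoded

    def _text(self) -> str:
        # The first call decodes; later calls reuse the cached result.
        if self._decoded is None:
            self._decoded = self._data.decode(self._encoding)
            self.decode_count += 1
        return self._decoded

    # Operations with code-point semantics force decoding.
    def __len__(self) -> int:
        return len(self._text())

    def __getitem__(self, index):
        # Works for both integer indexes and slice objects.
        return self._text()[index]

    def __str__(self) -> str:
        return self._text()

    def encode(self, encoding: str) -> bytes:
        # Pass-through path: same encoding requested, so hand back the
        # original bytes without ever decoding them.
        if codecs.lookup(encoding).name == self._encoding:
            return self._data
        return self._text().encode(encoding)


if __name__ == "__main__":
    s = LazyStr("héllo".encode("utf-8"), "utf-8")
    raw = s.encode("utf-8")      # bits schlepped through, no decoding yet
    print(s.decode_count)
    print(len(s), s[1:3])        # first code-point operation decodes once
    print(s.decode_count)
```

In this sketch only the code-point operations trigger decoding, and only once per string; code that merely passes the bytes along never pays for it.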

It's applications that just schlep bits around, without ever indexing or 
slicing the text, that would benefit from the lazy decoding behaviour.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org