[Python-3000] string C API
Nick Coghlan
ncoghlan at gmail.com
Thu Sep 14 12:19:46 CEST 2006
Martin v. Löwis wrote:
> Jim Jewett wrote:
>> Simply delegate such methods to a hidden per-encoding subclass.
>>
>> The UTF-8 methods will indeed be complex, unless the solution is
>> simply "someone called indexing/slicing/len, so I have to recode after
>> all."
>>
>> The Latin-1 encoding will have no such problem.
>
> I'm not so much worried about UTF-8 or Latin-1; they are fairly trivial.
> Such methods would be dramatically slow for multi-byte encodings.
Only the first such call on a given string would pay that cost, though: the
idea is to use lazy decoding, not to avoid decoding altogether. Most
manipulations (len, indexing, slicing, concatenation, etc.) would require
decoding to at least UCS-2 (or perhaps UCS-4).
It's applications that are just schlepping bits around that would benefit from
the lazy decoding behaviour.
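To make the lazy decoding idea concrete, here is a rough Python-level sketch.
It is only an illustration under assumptions: the thread is about the C API,
and the class name LazyStr, its attributes and the decode_count counter are
all hypothetical, not part of any proposal. The object keeps the encoded bytes
as received and decodes them only when an operation needs code points.

```python
import codecs


class LazyStr:
    """Hypothetical wrapper: holds encoded bytes, decodes on first real use."""

    def __init__(self, raw: bytes, encoding: str):
        self._raw = raw
        # Normalise the codec name so "UTF8" and "utf-8" compare equal.
        self._encoding = codecs.lookup(encoding).name
        self._decoded = None    # filled in by the first operation that needs it
        self.decode_count = 0   # instrumentation for this sketch only

    def _text(self) -> str:
        # Decode at most once and reuse the result afterwards.
        if self._decoded is None:
            self._decoded = self._raw.decode(self._encoding)
            self.decode_count += 1
        return self._decoded

    def __len__(self) -> int:
        return len(self._text())

    def __getitem__(self, index):
        return self._text()[index]

    def encode(self, encoding: str) -> bytes:
        # Pass-through case: same encoding requested, so no decoding is needed.
        if codecs.lookup(encoding).name == self._encoding:
            return self._raw
        return self._text().encode(encoding)


payload = "\u65e5\u672c\u8a9e".encode("shift_jis")
s = LazyStr(payload, "shift_jis")
out = s.encode("shift_jis")   # bytes handed straight back, nothing decoded
n = len(s)                    # first operation that forces a decode
first = s[0]                  # reuses the cached decoded form
```

One consequence of this design is that malformed input is not detected until
something forces the decode, so an error can surface far from where the bytes
entered the program.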
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org