[Python-3000] string C API

Antoine solipsis at pitrou.net
Thu Sep 14 14:48:56 CEST 2006


> Only the first such call on a given string, though - the idea is to use lazy
> decoding, not to avoid decoding altogether. Most manipulations (len, indexing,
> slicing, concatenation, etc) would require decoding to at least UCS-2 (or
> perhaps UCS-4).

My two cents:

For len(), you can compute the length at string construction and store it
in the string object (which is immutable). For example, if the string is
constructed by concatenation, the resulting length is simply the sum of the
operands' lengths. Even when a real computation is needed, doing it at
construction time plays nicer with the CPU cache, since the data has to be
there at that point anyway.
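
A rough sketch of the cached-length idea, in Python rather than C for brevity.
The names LazyStr and utf8_length are made up for illustration and are not part
of any proposed API; the sketch also assumes that "length" means the number of
code points and that the input is valid in the stated encoding.

```python
def utf8_length(data):
    """Count code points in valid UTF-8 without decoding.

    Continuation bytes have the bit pattern 10xxxxxx, so every other
    byte starts a new code point.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


class LazyStr:
    """Hypothetical immutable string that keeps its encoded bytes.

    The length is computed once, at construction, and stored.
    """

    __slots__ = ("_data", "_encoding", "_length")

    def __init__(self, data, encoding="utf-8", length=None):
        self._data = bytes(data)
        self._encoding = encoding
        if length is None:
            if encoding == "utf-8":
                # Single pass over bytes that were just produced or copied.
                length = utf8_length(self._data)
            else:
                # Fallback for this sketch only: decode once to count.
                length = len(self._data.decode(encoding))
        self._length = length

    def __len__(self):
        # No decoding here: the answer was stored at construction.
        return self._length
```

A caller that already knows the length (for example, the result of a
concatenation) can pass it in through the length argument and skip the scan
entirely.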

As for concatenation, recoding can be avoided if the strings to be
concatenated use the same internal encoding (assuming that encoding does
not hold internal state). Given that in many cases the strings will come
from similar sources (and thus use the same internal encoding), it may be
an interesting optimization.
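
Something along these lines, again as a Python sketch rather than real C API
code. EncodedStr, STATELESS and the choice of UTF-8 as the fallback encoding
are assumptions made for the example; the set of "stateless" encodings is only
a small hand-picked sample.

```python
# Encodings assumed safe for byte-level concatenation in this sketch:
# no byte order mark is emitted and no shift state is carried between
# characters.
STATELESS = {"utf-8", "latin-1", "utf-16-le"}


class EncodedStr:
    """Hypothetical string that keeps its bytes in some internal encoding."""

    def __init__(self, data, encoding, length):
        self.data = data
        self.encoding = encoding
        self.length = length  # number of code points, cached

    @classmethod
    def from_text(cls, text, encoding):
        return cls(text.encode(encoding), encoding, len(text))

    def __len__(self):
        return self.length

    def __add__(self, other):
        if not isinstance(other, EncodedStr):
            return NotImplemented
        if self.encoding == other.encoding and self.encoding in STATELESS:
            # Fast path: same stateless encoding, so just join the bytes.
            encoding = self.encoding
            data = self.data + other.data
        else:
            # Slow path: recode both operands to a common encoding.
            encoding = "utf-8"
            left = self.data.decode(self.encoding).encode(encoding)
            right = other.data.decode(other.encoding).encode(encoding)
            data = left + right
        # Either way, the length is the sum of the cached lengths.
        return EncodedStr(data, encoding, self.length + other.length)
```

The encoding names are compared as plain strings here, so "utf-8" and "UTF8"
would not match; a real implementation would presumably compare some
normalized internal identifier instead.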

Regards

Antoine.



