[Python-3000] string C API

Nick Coghlan ncoghlan at gmail.com
Fri Sep 15 15:29:58 CEST 2006


Martin v. Löwis wrote:
> Nick Coghlan schrieb:
>> Only the first such call on a given string, though - the idea is to use
>> lazy decoding, not to avoid decoding altogether. Most manipulations
>> (len, indexing, slicing, concatenation, etc) would require decoding to
>> at least UCS-2 (or perhaps UCS-4).
> 
> Ok. Then my objection is this: What about errors that occur in decoding?
> What happens if the bytes are not meaningful in the presumed encoding?
> 
> ISTM that raising the exception lazily (which seems to be necessary)
> would be very confusing.

Yeah, it appears it would be necessary to at least *scan* the string when it 
was first created in order to ensure it can be decoded without errors later on.
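
A minimal sketch of that "scan at creation, decode later" idea, in Python rather than C. The LazyText class and its attribute names are hypothetical, invented for illustration; they are not part of any proposed API. The scan here is simply a throwaway decode, which is enough to show where the exception would surface:

```python
class LazyText:
    """Hypothetical wrapper: validates eagerly, decodes lazily."""

    def __init__(self, raw: bytes, encoding: str = "utf-8"):
        # Scan once at creation so that a bad byte sequence fails here,
        # not at some later len() or indexing call. The decoded result
        # is thrown away; only the validity check matters at this point.
        raw.decode(encoding)  # raises UnicodeDecodeError on invalid input
        self._raw = raw
        self._encoding = encoding
        self._decoded = None  # filled in on first use

    def _text(self) -> str:
        # Decode on first access and cache the result.
        if self._decoded is None:
            self._decoded = self._raw.decode(self._encoding)
        return self._decoded

    def __len__(self) -> int:
        return len(self._text())
```

With this shape, LazyText(b"\xff", "utf-8") raises at construction, so no later operation on an existing object can fail with a decoding error.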

I also realised there is another issue with an internal representation that 
can change over the life of a string, which is that of thread-safety.

Since strings don't currently have any mutable internal state, it's possible 
to freely share them between threads (without this property, the interning 
behaviour would be doomed).

If strings could change the encoding of their internal buffers then they'd 
have to use a read/write lock internally on all operations that might be 
affected when the internal representation changes. Blech.

Far, far simpler is the idea of supporting only latin-1, UCS-2 and UCS-4 as 
internal representations, and choosing which one to use when the string is 
created.
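
As a rough illustration of choosing the representation once, at creation time, the selection could look something like the sketch below. The function name choose_representation is an assumption made for this example, and the real decision would of course live in C; the point is only that the choice depends on the highest code point and never changes afterwards:

```python
def choose_representation(text: str) -> str:
    """Pick the narrowest fixed-width form that holds every code point."""
    # An empty string has no code points, so default to the narrowest form.
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:
        return "latin-1"  # 1 byte per character
    if highest <= 0xFFFF:
        return "UCS-2"    # 2 bytes per character
    return "UCS-4"        # 4 bytes per character
```

Because the answer is fixed when the string is built, the buffer never needs to be rewritten later and no locking is required to share the string between threads.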

Sure, certain applications that are just copying from one data stream to 
another (both in the same encoding) may needlessly decode and then re-encode 
the data, but if the application *knows* that this might happen (and has 
reason to care about optimising the performance of this case), then the 
application is free to decouple the "reading" and "decoding" steps, and just 
transfer raw bytes between the streams.
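
For instance, the two approaches might be sketched as below. The helper names copy_raw and copy_decoded are made up for this example; shutil.copyfileobj is the standard library call doing the actual work in the raw case, and both helpers assume binary-mode streams:

```python
import io
import shutil


def copy_raw(src, dst, chunk_size=64 * 1024):
    """Copy bytes verbatim; only valid when both streams share an encoding."""
    shutil.copyfileobj(src, dst, chunk_size)


def copy_decoded(src, dst, encoding="utf-8"):
    """Decode then re-encode: same output, but with needless extra work."""
    text = src.read().decode(encoding)
    dst.write(text.encode(encoding))


payload = "na\u00efve caf\u00e9".encode("utf-8")

raw_out = io.BytesIO()
copy_raw(io.BytesIO(payload), raw_out)

decoded_out = io.BytesIO()
copy_decoded(io.BytesIO(payload), decoded_out)
```

Both destinations end up holding the same bytes as payload; the raw copy simply never builds a string object along the way.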

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

