[Python-3000] string C API
Jason Orendorff
jason.orendorff at gmail.com
Fri Sep 15 18:22:30 CEST 2006
On 9/15/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> There should be only one reference to a string until it is constructed,
> and after that, its data should be immutable. Recoding that results
> in different bytes should not be in-place. Either it returns a new
> string (no problem) or it doesn't change the databuffer-and-encoding
> pointer until the new databuffer is fully constructed.
Yes, but then if you have, say, a Latin-1 string and repeatedly use it
in places where UTF-16 is needed, you repeat the decoding operation
every time. The optimization becomes a pessimization.
Here I'm imagining things like taking len(s) of a UTF-8 string, or
s==u where u happens to be UTF-16. You only have to do this once or
twice per string to start losing.
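To make the cost concrete, here is a minimal Python sketch, not the proposed C API. PolyStr and its recodes counter are hypothetical names invented for illustration; the class keeps its original encoded buffer, never caches a recoded copy, and counts how often it has to decode.

```python
class PolyStr:
    """Hypothetical immutable string that keeps its original encoded bytes."""

    def __init__(self, data: bytes, encoding: str):
        self._data = data
        self._encoding = encoding
        self.recodes = 0  # number of full decodes performed so far

    def _decode(self) -> str:
        # No cache: the buffer is immutable and never swapped in place,
        # so every caller that needs code points pays for a full decode.
        self.recodes += 1
        return self._data.decode(self._encoding)

    def __len__(self) -> int:
        # In a variable-width encoding the character count is not the
        # byte count, so the whole buffer has to be scanned.
        return len(self._decode())

    def __eq__(self, other):
        if not isinstance(other, PolyStr):
            return NotImplemented
        if self._encoding == other._encoding:
            return self._data == other._data  # cheap byte comparison
        # Mixed encodings: both sides must be decoded to compare.
        return self._decode() == other._decode()


s = PolyStr("naïve café".encode("utf-8"), "utf-8")
u = PolyStr("naïve café".encode("utf-16-le"), "utf-16-le")

n = len(s)        # first decode of s
same = (s == u)   # decodes both s and u
n_again = len(s)  # decodes s yet again
print(n, same, s.recodes, u.recodes)
```

Three ordinary operations already cost three decodes of s. Caching the decoded form would avoid that, but it means either mutating the object after construction or carrying a second buffer around.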
Also, having two different classes of strings means fewer felicitous
cases of x==y, where the result is True, being just a pointer
comparison. This might matter in dictionaries: imagine a dictionary
created as a literal and then used to look up key strings read from a
file.
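The pointer-comparison shortcut can be observed from Python. The sketch below relies on a CPython implementation detail (dict lookup checks identity before calling __eq__); CountingKey is a hypothetical stand-in for a string type whose equality test is expensive, for instance because the two sides use different encodings.

```python
class CountingKey:
    """Hypothetical key type that counts how often __eq__ really runs."""

    eq_calls = 0

    def __init__(self, text: str):
        self.text = text

    def __hash__(self) -> int:
        return hash(self.text)

    def __eq__(self, other) -> bool:
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.text == other.text


literal_key = CountingKey("spam")
table = {literal_key: 1}

# Same object: CPython finds the entry by pointer comparison alone.
table[literal_key]
calls_same_object = CountingKey.eq_calls

# Equal but distinct object, like a key freshly read from a file:
# the hashes match, the pointers differ, so __eq__ has to run.
key_from_file = CountingKey("spam")
table[key_from_file]
calls_distinct_object = CountingKey.eq_calls

print(calls_same_object, calls_distinct_object)
```

With a single string class and interning, more lookups land on the first path. With several string representations, keys that compare equal are less likely to be the same object, so more lookups fall through to the full comparison.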
> [Nick Coghlan wrote:]
> > [...] the
> > application is free to decouple the "reading" and "decoding" steps, and just
> > transfer raw bytes between the streams.
>
> So adding boilerplate to treat text as bytes "for efficiency" may
> become a standard recipe? Not so good.
I'm sure this will happen to the same degree that it's become a
standard recipe in Java and C# (both of which lack polymorphic
whatzits). Which is to say, not at all.
-j