[Python-3000] string C API

"Martin v. Löwis" martin at v.loewis.de
Sat Sep 16 15:43:36 CEST 2006


Josiah Carlson wrote:
>> That places a burden on all creators of strings to ensure
>> that they are in the minimal format, which could be
>> inconvenient for some operations, e.g. taking a substring
>> could require making an extra pass to re-code the data.
> 
> If Martin says it's not a big deal, I'm not really all that concerned.

I was thinking about codecs specifically: they often need to make
multiple passes anyway.

In general, only measurements can tell the performance impact of
a design decision (e.g. it's not obvious how often the various
string operations occur, or how much each of them costs).

There is also an issue of convenience here; however, with three
different representations, library functions could be provided
to support all cases.
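
As a rough illustration of such library support, here is a minimal
sketch in C. All names (str_kind, str_view, str_read, str_min_kind)
are hypothetical and not part of any existing API; the sketch assumes
the three representations are 1-, 2- and 4-byte code units. One
accessor hides the width from callers, and a scan finds the smallest
width that fits, which is the extra pass a substring operation might
need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not an existing CPython API. */
typedef enum { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 } str_kind;

typedef struct {
    str_kind kind;     /* width of one code unit, in bytes */
    size_t length;     /* number of code points */
    const void *data;  /* buffer of length * kind bytes */
} str_view;

/* Read the code point at index i, whatever the storage width. */
static uint32_t str_read(const str_view *s, size_t i)
{
    assert(i < s->length);
    switch (s->kind) {
    case KIND_1BYTE: return ((const uint8_t *)s->data)[i];
    case KIND_2BYTE: return ((const uint16_t *)s->data)[i];
    default:         return ((const uint32_t *)s->data)[i];
    }
}

/* Smallest kind that can hold every code point in s.
 * This scan is the extra pass needed to keep a string minimal. */
static str_kind str_min_kind(const str_view *s)
{
    uint32_t max = 0;
    for (size_t i = 0; i < s->length; i++) {
        uint32_t c = str_read(s, i);
        if (c > max)
            max = c;
    }
    if (max < 0x100)
        return KIND_1BYTE;
    if (max < 0x10000)
        return KIND_2BYTE;
    return KIND_4BYTE;
}
```

Callers that only go through str_read never need to know which of the
three representations they were handed; only the code that creates
strings has to care about str_min_kind.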

> It is ultimately about space savings, and in the case of names (since
> all will be 8-bit), perhaps even a bit faster to look up in the
> interning table (I believe it is easier to hash 8 chars than 8 shorts).

That you would need to demonstrate through profiling. First, strings
will likely continue to cache their hash; second, it seems plausible
that the cost of hashing lies in the computation and the loop, not in
the memory access, and that the computation is carried out in 32-bit
registers regardless of character width.
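
To make that argument concrete, here is a small sketch in C. The hash
is an illustrative multiply-and-xor loop, not the interpreter's actual
function, and the names hash_u8, hash_u16 and HASH_STEP are made up
for this example. The point is only that each code unit is widened to
32 bits on load, so the per-character arithmetic is the same for both
widths; whether the loads differ measurably in cost is exactly what
profiling would have to show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative hash step; the multiplier is an arbitrary choice. */
#define HASH_STEP(h, c) (((h) * UINT32_C(1000003)) ^ (uint32_t)(c))

/* Hash a buffer of 8-bit code units. */
static uint32_t hash_u8(const uint8_t *p, size_t n)
{
    assert(p != NULL || n == 0);
    uint32_t h = 0;
    for (size_t i = 0; i < n; i++)
        h = HASH_STEP(h, p[i]);   /* p[i] is widened to 32 bits */
    return h ^ (uint32_t)n;
}

/* Hash a buffer of 16-bit code units: same loop, same arithmetic. */
static uint32_t hash_u16(const uint16_t *p, size_t n)
{
    assert(p != NULL || n == 0);
    uint32_t h = 0;
    for (size_t i = 0; i < n; i++)
        h = HASH_STEP(h, p[i]);   /* p[i] is widened to 32 bits */
    return h ^ (uint32_t)n;
}
```

The two functions differ only in the width of the load, and the same
characters hash to the same value in either representation.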

Regards,
Martin

