[Python-3000] string C API
"Martin v. Löwis"
martin at v.loewis.de
Tue Oct 3 21:33:33 CEST 2006
Jim Jewett wrote:
> By knowing that there is only one possible representation for a given
> string, he skips the need for an equivalency cache. On the other
> hand, he also loses the equivalency cache.
What is an equivalency cache, and why would one like to have one?
> When Python 2.x chooses the Unicode
> width, it tries to match Tcl; under a "minimal size possible" scheme,
> strings that fit in ASCII will have to be recoded twice on every round
> trip. The same problem pops up with other extension modules, and with
> system encodings.
In _tkinter, strings have to be copied *always*, whether they use the
same representation or a different one. Tcl requires strings to be
represented in a Tcl_Obj; you cannot pass a Python string object directly
into Tcl. As you have to copy anyway, it doesn't matter whether you do
size conversions in the process.
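To make the point concrete, here is a minimal sketch (not the real
_tkinter code) of handing a Python 2.x unicode object to Tcl. The
helper name as_tcl_obj is invented; PyUnicode_GET_SIZE,
PyUnicode_AS_UNICODE, Tcl_NewUnicodeObj, ckalloc and ckfree are the
real APIs. Tcl_NewUnicodeObj copies its input in either branch, so a
width conversion adds no extra pass over the data:

    #include <Python.h>
    #include <tcl.h>

    static Tcl_Obj *
    as_tcl_obj(PyObject *s)
    {
        Py_ssize_t len = PyUnicode_GET_SIZE(s);
        Py_UNICODE *u = PyUnicode_AS_UNICODE(s);
        Tcl_UniChar *buf;
        Tcl_Obj *result;
        Py_ssize_t i;

        if (sizeof(Py_UNICODE) == sizeof(Tcl_UniChar))
            /* Same width: still one full copy into the Tcl_Obj. */
            return Tcl_NewUnicodeObj((Tcl_UniChar *)u, (int)len);

        /* Different width: convert while copying; no extra pass. */
        buf = (Tcl_UniChar *)ckalloc(len * sizeof(Tcl_UniChar));
        for (i = 0; i < len; i++)
            buf[i] = (Tcl_UniChar)u[i];
        result = Tcl_NewUnicodeObj(buf, (int)len);
        ckfree((char *)buf);
        return result;
    }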
> By exposing the full object instead of the abstract interface,
> compilers can do pointer addition instead of calling a get_data
> function. But they still don't know (until run time) how wide the
> data at that pointer will be, and we're locked into binary
> compatibility.
That's not true. The internal representation of objects can change,
and has changed, across releases. People have to recompile their
extension modules for each new feature release, and they do.
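As a hypothetical illustration of the trade-off under discussion (the
struct, macro and function names below are invented, not CPython's):

    #include <stddef.h>

    /* An exposed layout: access compiles to one pointer addition, but
       the offset of `data` is frozen into every extension binary that
       uses the macro. */
    typedef struct {
        size_t length;
        unsigned short *data;   /* some fixed-width representation */
    } FakeString;

    #define FakeString_DATA(op) (((FakeString *)(op))->data)

    /* An abstract interface: one extra call per access, but the layout
       behind it stays private to the core. */
    unsigned short *
    fakestring_get_data(FakeString *op)
    {
        return op->data;
    }

Since extensions are rebuilt for each feature release anyway, the
macro's dependence on the layout costs nothing in practice.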
>> I doubt any kind of "pluggable" representation could work in a
>> reasonable way. With that generality, you lose any information
>> as to what the internal representation is, and then code becomes
>> tedious to write and slow to run.
> Instead of working with ((string)obj).data directly, you work with
> string.recode(object, desired)
... causing a copy of the data, right? This is expensive.
> If you're saying this will be slow because it is a C function call,
> then I can't really argue; I just think it will be a good trade for
> all the times we don't recode at all (or recode only once/encoding).
It's not the function call that makes it slow. It's the copying of
potentially large string data that a recoding requires. In addition,
for some encodings, the algorithm to do the transformation is itself
expensive.
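As a minimal sketch of why (the function is invented for illustration,
not a proposed API): even a simple recoding such as Latin-1 to UTF-8
touches every character and fills a second buffer, so the cost grows
with the string no matter how cheap the call itself is:

    #include <stdlib.h>

    /* Hypothetical example: recode a Latin-1 buffer to UTF-8. */
    static char *
    latin1_to_utf8(const unsigned char *src, size_t len, size_t *out_len)
    {
        /* Worst case: every byte >= 0x80 expands to two bytes. */
        char *dst = malloc(2 * len + 1);
        size_t i, j = 0;

        if (dst == NULL)
            return NULL;
        for (i = 0; i < len; i++) {
            if (src[i] < 0x80)
                dst[j++] = (char)src[i];
            else {
                dst[j++] = (char)(0xC0 | (src[i] >> 6));
                dst[j++] = (char)(0x80 | (src[i] & 0x3F));
            }
        }
        dst[j] = '\0';
        *out_len = j;
        return dst;
    }

Multi-byte encodings with state or variable-length input are costlier
still.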
> I'll admit that I'm not sure what sort of data would make a real-world
> (as opposed to contrived) benchmark.
Any kind of text application will suffer if strings get constantly
recoded.