[XML-SIG] Re: Re: Re: Re: cElementTree 0.8 (january 11, 2005)

Fredrik Lundh fredrik at pythonware.com
Sat Jan 15 15:09:10 CET 2005


Daniel Veillard wrote:

>  You have a python function calling a native function. That function returns
> a string. That C string is translated to a Python string by the wrapper
> using PyString_FromString(). That operation seems to be extremely expensive.

PyString_FromString basically boils down to:

    determine the length of the string
    call fast allocator
    copy string to area allocated by fast allocator
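A rough Python model of those three steps, assuming a `bytes` object stands in for C memory that holds a NUL-terminated string. `string_from_cstring` is a hypothetical name for illustration, not part of the Python C API, and `bytearray` merely plays the role of the fast allocator.

```python
def string_from_cstring(memory: bytes) -> bytes:
    """Model of building a byte string from a C string.

    Hypothetical helper; `memory` must contain a NUL terminator.
    """
    # 1. determine the length of the string, as strlen() would:
    #    scan forward to the terminating NUL byte
    length = memory.index(b"\0")
    # 2. allocate an area of exactly that size (stand-in for the fast allocator)
    area = bytearray(length)
    # 3. copy the whole string into the area in one block, as memcpy() would
    area[:] = memory[:length]
    # the final bytes() conversion is a Python artefact, not one of the steps
    return bytes(area)


print(string_from_cstring(b"hello\0leftover"))  # b'hello'
```

The point of the model is that the work is one scan plus one bulk copy, both linear in the string length with a very small constant.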

for UTF-8 data, the steps are:

    determine maximum possible length of the string
    call fast allocator
    copy string to area allocated by fast allocator, character
        by character.  handle UTF-8 code sequences.
    adjust size of allocated area, if necessary
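The UTF-8 path can be sketched the same way. This is a simplified decoder written for illustration (hypothetical `decode_utf8`, not CPython's actual implementation); it assumes the upper bound "one character per input byte", rejects bad lead bytes, bad continuation bytes and truncated sequences, and skips the overlong/surrogate checks a real decoder performs.

```python
def decode_utf8(data: bytes) -> str:
    """Sketch of the four UTF-8 steps (simplified validation)."""
    # 1. maximum possible length: every code point takes at least one byte,
    #    so the result has at most len(data) characters
    # 2. allocate that many slots (stand-in for the fast allocator)
    out = [""] * len(data)
    n = 0  # characters written so far
    i = 0  # current byte offset
    # 3. copy character by character, handling UTF-8 code sequences
    while i < len(data):
        b = data[i]
        if b < 0x80:              # 0xxxxxxx: 1-byte sequence (ASCII)
            cp, size = b, 1
        elif b >> 5 == 0b110:     # 110xxxxx: lead byte of a 2-byte sequence
            cp, size = b & 0x1F, 2
        elif b >> 4 == 0b1110:    # 1110xxxx: lead byte of a 3-byte sequence
            cp, size = b & 0x0F, 3
        elif b >> 3 == 0b11110:   # 11110xxx: lead byte of a 4-byte sequence
            cp, size = b & 0x07, 4
        else:
            raise ValueError(f"invalid lead byte 0x{b:02x} at offset {i}")
        if i + size > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for cont in data[i + 1:i + size]:
            # continuation bytes look like 10xxxxxx and carry 6 bits each
            if cont >> 6 != 0b10:
                raise ValueError(f"bad continuation byte at offset {i}")
            cp = (cp << 6) | (cont & 0x3F)
        out[n] = chr(cp)
        n += 1
        i += size
    # 4. adjust (shrink) the allocated area to the number of characters decoded
    del out[n:]
    return "".join(out)


print(decode_utf8("h\u00e9llo \u20ac".encode("utf-8")))  # héllo €
```

Compared with the plain copy, the extra cost comes from the per-byte branching in step 3 and the possible resize in step 4.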

cElementTree has to do all this for all strings in the document, of course, and
the time it takes is included in my parsing benchmark.  and I guess libxml2 is
doing something very similar, but using your own allocator and object layout.

but parsing is one thing, using the data from Python code is another.  to return
data to Python, all cElementTree has to do (in the normal case) is to return the
string object it created during the parse.  that's a pointer copy, not a buffer
copy.
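A minimal Python sketch of that access pattern, using a hypothetical `ParsedNode` class (not cElementTree's real element type): the string is decoded once when the node is built, and every later access just hands out another reference to the same object.

```python
class ParsedNode:
    """Hypothetical node that keeps the str object created during the parse."""

    __slots__ = ("tag", "_text")

    def __init__(self, tag: str, raw_text: bytes):
        self.tag = tag
        # decode once, at parse time; this cost belongs to the parse benchmark
        self._text = raw_text.decode("utf-8")

    @property
    def text(self) -> str:
        # no new buffer: return another reference to the existing object
        return self._text


node = ParsedNode("p", "caf\u00e9".encode("utf-8"))
print(node.text is node.text)  # True: same object, only the reference is copied
```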

libxml2, in contrast, has to copy the strings once again, using Python's allocator
and Python's string object layout.  and if you don't cache stuff, you end up doing
this every time someone accesses a node...
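The contrast can be modelled with a hypothetical `ProxyNode` whose text lives outside Python (a `bytes` object stands in for the library-owned buffer; this is not the actual libxml2 binding API). Without a cache, each access builds a fresh Python string; with a cache, the conversion happens once.

```python
class ProxyNode:
    """Hypothetical wrapper around a node whose text is owned by a C library."""

    def __init__(self, raw_text: bytes, cache: bool = False):
        self._raw = raw_text    # stand-in for the library's own string buffer
        self._cache = cache
        self._cached = None
        self.conversions = 0    # how many Python strings have been built

    @property
    def text(self) -> str:
        if self._cache and self._cached is not None:
            return self._cached
        # copy the library's data into a brand-new Python string object
        self.conversions += 1
        value = self._raw.decode("utf-8")
        if self._cache:
            self._cached = value
        return value


uncached = ProxyNode(b"hello")
for _ in range(3):
    _ = uncached.text
print(uncached.conversions)  # 3: one full copy per access

cached = ProxyNode(b"hello", cache=True)
for _ in range(3):
    _ = cached.text
print(cached.conversions)    # 1: copied once, then reused
```

Caching trades memory (a second copy of every string that has been touched) for avoiding repeated conversions.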

</F> 