[XML-SIG] Re: Re: Re: cElementTree 0.8 (january 11, 2005)

Fri Jan 14 18:23:41 CET 2005

Daniel Veillard wrote:
> On Fri, Jan 14, 2005 at 11:43:04AM +0100, Fredrik Lundh wrote:
> 
>>Daniel Veillard wrote:
>>
>>> Seriously, with respect to performance, one of the troubles I have seen when
>>>doing a bit of profiling is that interning strings, i.e. the process of
>>>taking strings coming from C and turning them into Python string objects,
>>>is extremely costly. I don't know if it's the hash function or the way
>>>the string hash works, but it was one of the biggest costs when I tried
>>>(with Python 2.3 or 2.2, I can't remember precisely which).
>>
>>in python, conversion and interning and hash calculations are three different
>>things, so I'm not sure what your problem really was.  but I'm curious. can
>>you elaborate?
> 
>   You have a python function calling a native function. That function returns
> a string. That C string is translated to a Python string by the wrapper 
> using PyString_FromString(). That operation seems to be extremely expensive.

That's nothing. It's even worse if you have to transform the UTF-8 
strings that libxml2 delivers into Python unicode strings. :)
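
To make the distinction between conversion, hashing and interning 
concrete, here is a minimal sketch. It assumes present-day Python 3 
rather than the 2.x C API discussed above, and the byte string is just a 
made-up stand-in for whatever a C library would hand back. The timing it 
prints depends on the machine, so treat it as illustration only:

```python
import sys
import timeit

# Hypothetical payload standing in for UTF-8 text returned by a C library.
raw = "And the evening and the morning were the first day.".encode("utf-8")

# 1. Conversion: build a new Python string object from the raw bytes.
text = raw.decode("utf-8")

# 2. Hashing: CPython computes the hash on first use and caches it
#    on the string object, so asking again is cheap.
h = hash(text)

# 3. Interning: look the string up in a global table so that equal
#    strings end up sharing a single object.
a = sys.intern(raw.decode("utf-8"))
b = sys.intern(raw.decode("utf-8"))
print(a is b)

# Rough cost of the conversion step alone, for 100,000 decodes.
print(timeit.timeit(lambda: raw.decode("utf-8"), number=100_000))
```

Only step 1 is what a wrapper has to do for every string it returns; 
steps 2 and 3 happen only if something asks for them.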

By the way, I'm one of the people Fredrik has been mailing with about 
the speed comparisons, as I've been implementing the ElementTree API on 
top of libxml2. This now works, without you having to clean up memory 
yourself, and with unicode strings, etc. You can also do XPath and XSLT 
a lot more easily with lxml.etree, though XSLT support in particular is 
still coming together.

lxml.etree is likely to be a lot slower than a more low-level binding at 
various operations, but it's a ton more convenient (aka "Pythonic"). You 
can do things like this:

 >>> from lxml import etree
 >>> tree = etree.parse('ot.xml')
 >>> tree.xpath('(//v)[5]/text()')
[u'And God called the light Day, and the darkness he called Night. And the evening and the morning were the first day.\n']

or even this:

 >>> result = tree.xpath('(//v)[5]')
 >>> result[0].text = 'The day and night verse.'
 >>> tree.xpath('(//v)[5]/text()')
[u'The day and night verse.']

i.e. the results of XPath queries are ElementTree-style objects, and the 
whole XML tree is navigable using the ElementTree API.
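
For anyone who wants to try the same idea without 'ot.xml' at hand, here 
is a rough, self-contained sketch. It is written against the 
ElementTree API as shipped in today's Python 3 standard library 
(xml.etree.ElementTree), not against lxml itself, and the tiny document 
is invented for the example. That module has no xpath() method, so 
findall() with an index stands in for the '(//v)[5]' query:

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for 'ot.xml'; the real file's structure is not
# shown in this mail, so the element names here are assumptions.
doc = """<bible><b><c>
<v>verse one</v><v>verse two</v><v>verse three</v>
<v>verse four</v><v>verse five</v>
</c></b></bible>"""

root = ET.fromstring(doc)

# Rough equivalent of the XPath '(//v)[5]': the fifth <v> element
# in document order (Python lists are zero-based, hence index 4).
fifth = root.findall(".//v")[4]
print(fifth.text)

# The object returned by the query is a live element of the tree,
# so assigning to .text changes the tree itself.
fifth.text = "The day and night verse."
print(root.findall(".//v")[4].text)
```

The second query sees the new text because both lookups return the same 
element object, not a copy.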

Regards,

Martijn