[XML-SIG] Re: Re: Re: cElementTree 0.8 (january 11, 2005)
Martijn Faassen
faassen at infrae.com
Fri Jan 14 18:23:41 CET 2005
Daniel Veillard wrote:
> On Fri, Jan 14, 2005 at 11:43:04AM +0100, Fredrik Lundh wrote:
>
>>Daniel Veillard wrote:
>>
>>> Seriously, with respect to performance, one of the troubles I have seen when
>>>doing a bit of profiling is that interning strings, i.e. the process of
>>>taking strings coming from C and turning them into Python string objects,
>>>is extremely costly. I don't know if it's the hash function or the way
>>>the string hash works, but it was one of the biggest costs when I tried
>>>(with Python 2.3 or 2.2, I can't remember precisely which).
>>
>>in python, conversion and interning and hash calculations are three different
>>things, so I'm not sure what your problem really was. but I'm curious. can
>>you elaborate?
>
> You have a python function calling a native function. That function returns
> a string. That C string is translated to a Python string by the wrapper
> using PyString_FromString(). That operation seems to be extremely expensive.
That's nothing. It's even worse if you have to transform the UTF-8
strings that libxml2 delivers into Python unicode strings. :)
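To make the distinction between conversion, interning and hashing concrete, here is a minimal sketch. It assumes modern Python 3 rather than the Python 2.x discussed in this thread, so `bytes.decode` stands in for what a C wrapper would do with `PyString_FromString()` or a UTF-8 decode. The sample data and the helper name `decode_all` are made up for illustration, and timings depend on the machine, so none are quoted.

```python
import sys
import timeit

# Hypothetical sample: UTF-8 bytes such as a C library might hand back.
raw_names = [("element%d" % i).encode("utf-8") for i in range(1000)]

def decode_all(chunks):
    """Conversion step: turn UTF-8 bytes into Python str objects."""
    return [chunk.decode("utf-8") for chunk in chunks]

names = decode_all(raw_names)

# Interning is a separate, optional step: equal strings collapse to one object.
a = sys.intern(b"tag".decode("utf-8"))
b = sys.intern(b"tag".decode("utf-8"))
same_object = a is b

# Hashing is separate again; hashing the same string twice gives the same value.
h1 = hash(names[0])
h2 = hash(names[0])

# Rough timing of the conversion step alone.
elapsed = timeit.timeit(lambda: decode_all(raw_names), number=100)
print("decoded %d strings 100 times in %.4f s" % (len(raw_names), elapsed))
```

Timing the three steps separately like this is one way to see which of them actually dominates in a given binding.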
By the way, I'm at least one of the people Fredrik has been mailing
with concerning the speed comparisons, as I've been implementing the
ElementTree API on top of libxml2. This now works, without having to
clean up memory after yourself, and with unicode strings, etc. You
can also do XPath and XSLT a lot more easily with lxml.etree, though
XSLT support especially is still coming together.
lxml.etree is likely to be a lot slower than a more low-level binding at
various operations, but it's a ton more convenient (aka "Pythonic"). You
can do things like this:
>>> from lxml import etree
>>> tree = etree.parse('ot.xml')
>>> tree.xpath('(//v)[5]/text()')
[u'And God called the light Day, and the darkness he called Night. And the evening and the morning were the first day.\n']
or even this:
>>> result = tree.xpath('(//v)[5]')
>>> result[0].text = 'The day and night verse.'
>>> tree.xpath('(//v)[5]/text()')
[u'The day and night verse.']
i.e. the results of XPath queries are ElementTree-style objects, and the
whole XML tree is navigable using the ElementTree API.
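For readers without lxml or the 'ot.xml' file at hand, the same idea (a query returns live elements, and editing one edits the tree) can be sketched with the standard library's xml.etree.ElementTree. This is an assumption-laden stand-in, not lxml.etree itself: the sample document is invented, and because ElementTree's limited path syntax does not accept '(//v)[5]', the sketch indexes the list of matches instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for 'ot.xml': a few <v> (verse) elements.
SAMPLE = (
    "<book>"
    "<c><v>one</v><v>two</v></c>"
    "<c><v>three</v><v>four</v><v>five</v></c>"
    "</book>"
)

root = ET.fromstring(SAMPLE)

# ElementTree's path syntax has no '(//v)[5]', so take the fifth
# match from the document-order list of all <v> elements.
verses = root.findall(".//v")
fifth = verses[4]
before = fifth.text

# The match is a live element of the tree, so assigning to .text
# changes the tree itself, not a copy.
fifth.text = "The day and night verse."
after = root.findall(".//v")[4].text

print(before)
print(after)
```

Running the query a second time returns the modified text, which is the behaviour the session above shows for lxml.etree.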
Regards,
Martijn