[Tutor] ElementTree: finding a tag with specific attribute
Kent Johnson
kent37 at tds.net
Sat Sep 17 00:48:57 CEST 2005
Kent Johnson wrote:
> Bernard Lebel wrote:
>
>>Hello,
>>
>>With ElementTree, can you search a tag under an Element by not only
>>specifying the tag label, but also some tag attribute values? That
>>would be in the case where I have several tags with the same label but
>>with various attribute values.
>
> FLASH: I just found PDIS XPath which evaluates XPath expressions against ElementTree trees!
> http://pdis.hiit.fi/pdis/download/
Here is a complete program that reads your file and uses pdis.xpath to dig out the value of a single parameter:
from elementtree import ElementTree
from pdis.xpath import compile
doc = ElementTree.parse('Camera_Root_bernard.xml')
path = compile('/root/sceneobject[@type="CameraRoot"]/localproperties/property[@name="Visibility"]/parameters/parameter[@scriptname="shdw"]')
node = path.evaluate(doc.getroot())[0]
print node
print node.text
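For readers on a current Python: the standard-library xml.etree.ElementTree (ElementTree 1.3 and later) accepts a limited XPath subset that includes attribute predicates such as [@name="value"], so a lookup like the one above may not need an add-on. The following is a minimal sketch, not the pdis.xpath program. The inline XML is a made-up stand-in for Camera_Root_bernard.xml, whose structure is known here only from the path expression, and the parameter values are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for Camera_Root_bernard.xml; element and attribute
# names follow the path expression above, the text values are invented.
SAMPLE = """
<root>
  <sceneobject type="CameraRoot">
    <localproperties>
      <property name="Visibility">
        <parameters>
          <parameter scriptname="viewvis">True</parameter>
          <parameter scriptname="shdw">False</parameter>
        </parameters>
      </property>
      <property name="Kinematics">
        <parameters>
          <parameter scriptname="shdw">ignored</parameter>
        </parameters>
      </property>
    </localproperties>
  </sceneobject>
</root>
"""

root = ET.fromstring(SAMPLE)

# Paths are relative to the element being searched, so the leading
# '/root' of the absolute XPath is dropped when starting from the root.
path = ('sceneobject[@type="CameraRoot"]/localproperties/'
        'property[@name="Visibility"]/parameters/'
        'parameter[@scriptname="shdw"]')

node = root.find(path)  # first match, or None if nothing matches
print(node.tag, node.attrib, node.text)
```

root.findall(path) returns every match as a list, which is closer to what path.evaluate() returns in the pdis.xpath version.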
>
>
>>I'm looking for something a bit like BeautifulSoup, like:
>>
>>oTag = oElement.find( 'taglabel', { 'value' : 'xx' } )
>>
>>
>>Btw in case you wonder, I don't use BeautifulSoup because somehow it
>>takes 20-30 seconds to parse a 2000-line xml file, and I don't know
>>why. ElementTree is proving very performing.
>
>
> Would you send me privately a copy of your file and your code that reads it with BS? I'm curious why this takes so long.
I took a bit of a look at this using the Python profiler. If anyone is interested, here is the main program to generate the profile results:
import BeautifulSoup, profile
sFile = r'Camera_Root_bernard.xml'
def reader():
    oFile = file( sFile, 'r' )
    oSoup = BeautifulSoup.BeautifulStoneSoup( oFile.read() )
profile.run('reader()', 'profile.out')
This creates a file called profile.out that can be analyzed with pstats.Stats:
>>> from pstats import Stats
>>> s=Stats('profile.out')
>>> s.sort_stats('cum')
<pstats.Stats instance at 0x009BF918>
>>> s.print_stats()
Here is an excerpt from the output (at the end of this message). Unfortunately it doesn't format very well in email. The most notable thing is the huge number of times some functions are called. The first column (ncalls) is the total number of calls to a function. The second column (tottime) is the total time spent in the function itself, not counting the time spent in the functions it calls.
If you look at the list, for a while the functions are each called 777 times. This is probably the number of start tags in the document. But when you get to recursiveChildGenerator(), all of a sudden it is called 898655 times, over 1000 times for each call to _fetch()! This is a staggering number of calls; it works out to 8 calls for every character in the file!
I gave up trying to understand why this is happening; I would need to spend more time understanding the code...
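The ratios mentioned above can be checked from the ncalls column of the profile output. The sketch below is only arithmetic on figures copied from the listing; the variable names are made up for illustration.

```python
# ncalls figures copied from the profile listing in this message.
ncalls = {
    "_fetch": 777,
    "recursiveChildGenerator": 898655,
    "_matches": 301476,
    "isinstance": 2998523,
}

fetch_calls = ncalls["_fetch"]
for name in ("recursiveChildGenerator", "_matches", "isinstance"):
    per_fetch = ncalls[name] / fetch_calls
    print("%-24s %8.1f calls per _fetch()" % (name, per_fetch))

# Sum 0 + 1 + ... + 776: the count you would get if call number k
# of _fetch() tested exactly the k tags created before it.
triangular = fetch_calls * (fetch_calls - 1) // 2
print("_matches == 777*776/2:", ncalls["_matches"] == triangular)
```

recursiveChildGenerator() comes out at roughly 1157 calls per _fetch(), and the _matches() count equals 777*776/2 exactly. That would fit each _fetch() examining every tag parsed so far, which would make the parse quadratic in the number of tags, but this is an inference from the numbers and not something confirmed from the BeautifulSoup source.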
Kent
Fri Sep 16 17:12:22 2005    profile.out

         9095825 function calls (9095048 primitive calls) in 80.402 CPU seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   80.402   80.402 profile:0(reader())
        1    0.000    0.000   80.398   80.398 <string>:1(?)
        1    0.002    0.002   80.397   80.397 F:\Tutor\bsTest\reader.py:5(reader)
        1    0.000    0.000   80.395   80.395 C:\Python24\lib\site-packages\BeautifulSoup.py:633(__init__)
        1    0.000    0.000   80.395   80.395 C:\Python24\lib\site-packages\BeautifulSoup.py:687(feed)
        1    0.000    0.000   80.381   80.381 C:\Python24\lib\sgmllib.py:86(feed)
        1    0.041    0.041   80.381   80.381 C:\Python24\lib\sgmllib.py:107(goahead)
      777    0.123    0.000   80.093    0.103 C:\Python24\lib\sgmllib.py:229(parse_starttag)
2331/1554    0.057    0.000   79.989    0.051 :0(getattr)
      777    0.013    0.000   79.844    0.103 C:\Python24\lib\sgmllib.py:304(finish_starttag)
      777    0.024    0.000   79.763    0.103 C:\Python24\lib\site-packages\BeautifulSoup.py:817(unknown_starttag)
      777    0.079    0.000   79.646    0.103 C:\Python24\lib\site-packages\BeautifulSoup.py:769(_smartPop)
     3108    0.051    0.000   79.575    0.026 C:\Python24\lib\site-packages\BeautifulSoup.py:676(__getattr__)
      777    0.019    0.000   79.496    0.102 C:\Python24\lib\site-packages\BeautifulSoup.py:348(__getattr__)
      777    0.010    0.000   79.467    0.102 C:\Python24\lib\site-packages\BeautifulSoup.py:467(first)
      777    0.014    0.000   79.456    0.102 C:\Python24\lib\site-packages\BeautifulSoup.py:477(fetch)
      777   10.923    0.014   79.443    0.102 C:\Python24\lib\site-packages\BeautifulSoup.py:168(_fetch)
   898655   21.556    0.000   38.801    0.000 C:\Python24\lib\site-packages\BeautifulSoup.py:525(recursiveChildGenerator)
   301476   10.316    0.000   23.791    0.000 C:\Python24\lib\site-packages\BeautifulSoup.py:233(_matches)
  2998523   14.816    0.000   14.816    0.000 :0(isinstance)
   602953    4.356    0.000    6.985    0.000 C:\Python24\lib\site-packages\BeautifulSoup.py:541(isList)
  1206683    5.852    0.000    5.852    0.000 :0(hasattr)
   905237    3.231    0.000    3.231    0.000 :0(len)
   601431    3.022    0.000    3.022    0.000 :0(range)
   599875    2.201    0.000    2.201    0.000 :0(pop)
   605655    2.153    0.000    2.153    0.000 :0(append)
   301476    1.080    0.000    1.080    0.000 :0(callable)
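
On a current Python the same kind of session can be run with cProfile, which offers the same interface as profile with lower overhead. A minimal sketch follows, assuming Python 3. busy_work() is a hypothetical stand-in for reader(), since BeautifulStoneSoup and the XML file are not available here, and the temporary path is arbitrary.

```python
import cProfile
import io
import os
import pstats
import tempfile


def busy_work():
    # Hypothetical stand-in for reader(): many cheap builtin calls.
    total = 0
    for i in range(20000):
        total += len(str(i))
    return total


# Profile one call and save the raw statistics to a file,
# as profile.run('reader()', 'profile.out') does.
stats_path = os.path.join(tempfile.mkdtemp(), "profile.out")
profiler = cProfile.Profile()
result = profiler.runcall(busy_work)
profiler.dump_stats(stats_path)

# Load the saved file and print only the ten most expensive entries
# by cumulative time instead of the full listing.
buffer = io.StringIO()
stats = pstats.Stats(stats_path, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Passing a number to print_stats() limits the listing to that many lines, which makes the output easier to paste into an email than the full dump.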