[newbie] Why can't I find elements? What's the difference between // and .//?
Hello,

A couple more newbie questions:

1. Using findall or xpath, what's the difference between "//" and ".//"? I see both in examples.
https://www.w3schools.com/xml/xpath_syntax.asp

2. Why does the following script fail finding the elements I wish to remove from the tree?

********************* SCRIPT
import lxml.etree as et
from lxml import html
import sys

#Exit if no input file on command line
arguments = len(sys.argv) - 1
if arguments != 1:
    print ("Must pass one KML file")
    sys.exit()

inputfile = sys.argv[1]
parser = et.XMLParser(remove_blank_text=True)
tree = et.parse(inputfile,parser)
root = tree.getroot()

#=========== Remove namespace prefixes
#https://stackoverflow.com/questions/60486563/remove-namespace-from-xml-with-...
for elem in root.getiterator():
    # For elements, replace qualified name with localname
    if not(type(elem) == et._Comment):
        elem.tag = et.QName(elem).localname

    # Remove attributes that are in a namespace
    for attr in elem.attrib:
        if "{" in attr:
            elem.attrib.pop(attr)

# Remove unused namespace declarations
et.cleanup_namespaces(root)

#=========== Remove blocs/elements: snippet, LookAt, Style, StyleMap
#https://stackoverflow.com/questions/22560862/elementtree-findall-or-operator
tracks = root.findall('.//snipet|.//LookAt|.//Style|.//StyleMap')
if len(tracks):
    print("Elements found: snippet, LookAt, Style, StyleMap")
    #remove elements
else:
    print("No elements found: snippet, LookAt, Style, StyleMap")

#print(et.dump(root))

Here's the input data:

********************* DATA
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
  <Document>
    <snippet>Created lun. août 30 21:02:41 2021</snippet>
    <LookAt>
      <longitude>1.151909</longitude>
      <latitude>44.789630</latitude>
      <range>322203.100706</range>
    </LookAt>
    <Style id="track_n">
      <IconStyle>
        <scale>.5</scale>
        <Icon>
          <href>https://earth.google.com/images/kml-icons/track-directional/track-none.png</href>
        </Icon>
      </IconStyle>
      <LabelStyle>
        <scale>0</scale>
      </LabelStyle>
    </Style>
    <StyleMap id="track">
      <Pair>
        <key>normal</key>
        <styleUrl>#track_n</styleUrl>
      </Pair>
      <Pair>
        <key>highlight</key>
        <styleUrl>#track_h</styleUrl>
      </Pair>
    </StyleMap>
    <Folder>
      <name>Tracks</name>
      <Folder>
        <name>Some track</name>
        <snippet/>
        <Placemark>
          <name>Path</name>
          <styleUrl>#lineStyle</styleUrl>
          <LineString>
            <coordinates>
              0.156119,45.649872,101.25
              0.156389,45.649822,100.25
            </coordinates>
          </LineString>
        </Placemark>
      </Folder>
    </Folder>
  </Document>
</kml>

Thank you.
Found it: findall() simply doesn't support that syntax, while xpath() works:

#BAD for el in root.findall('./snippet/*|./LookAt/*|./Style/*|./StyleMap/*'):
for el in root.xpath('.//snippet/*|.//LookAt/*|.//Style/*|.//StyleMap/*'):
    print(el.tag, el.text)
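For anyone landing here later, a minimal sketch of the point above: findall() only implements the ElementPath subset, which has no "|" union operator, whereas xpath() accepts full XPath 1.0. The KML string and the variable names (wanted, via_xpath, via_findall) are made up for illustration; the sample is a trimmed, namespace-free stand-in for the data in this thread, not the real file.

```python
from lxml import etree

# Hypothetical, namespace-free stand-in for the KML sample in this thread.
KML = """<kml><Document>
  <snippet>Created</snippet>
  <LookAt><range>322203.1</range></LookAt>
  <Style id="track_n"/>
  <StyleMap id="track"/>
  <Folder><name>Tracks</name><snippet/></Folder>
</Document></kml>"""

root = etree.fromstring(KML)
wanted = ("snippet", "LookAt", "Style", "StyleMap")

# xpath() understands XPath 1.0, so the union operator is available.
via_xpath = root.xpath(".//snippet | .//LookAt | .//Style | .//StyleMap")

# findall() has no union operator, so query one tag at a time instead.
via_findall = [el for tag in wanted for el in root.findall(".//" + tag)]

print(sorted(el.tag for el in via_xpath))
print(sorted(el.tag for el in via_findall))
```

Both approaches should return the same five elements here; the only difference is that the xpath() result comes back in document order, while the findall() list is grouped by tag.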
On 12/09/2021 23:31, Stefan Behnel wrote:
On 12 September 2021 at 19:14:40 CEST, Gilles wrote:
tracks = root.findall('.//snipet|.//LookAt|.//Style|.//StyleMap')
Just use
root.iter('snippet', 'LookAt', 'Style', 'StyleMap')
It's certainly faster, and I also find it nicer to read.
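A rough sketch of how the iter() suggestion could be combined with the removal step the original script left as a "#remove elements" comment. The KML string is a hypothetical, namespace-free stand-in for the sample data, and the removal loop is only one possible approach, not something stated in this thread.

```python
from lxml import etree

# Hypothetical, namespace-free stand-in for the KML sample in this thread.
KML = """<kml><Document>
  <snippet>Created</snippet>
  <LookAt><range>322203.1</range></LookAt>
  <Style id="track_n"><LabelStyle><scale>0</scale></LabelStyle></Style>
  <StyleMap id="track"/>
  <Folder><name>Tracks</name><snippet/><Placemark><name>Path</name></Placemark></Folder>
</Document></kml>"""

root = etree.fromstring(KML)

# Collect the matches into a list first: removing nodes while iter()
# is still walking the tree can cause elements to be skipped.
unwanted = list(root.iter("snippet", "LookAt", "Style", "StyleMap"))

for el in unwanted:
    parent = el.getparent()
    if parent is not None:  # the root element itself has no parent
        parent.remove(el)

remaining = [el.tag for el in root.iter()]
print(remaining)
```

Note that iter() matches on the full tag name, so with the namespaces left in place the names would need the "{http://www.opengis.net/kml/2.2}" prefix.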
Much better. Thanks!

On 13/09/2021 00:21, Terry Brown wrote:
To answer your question (1):
//foo "recursively" searches the whole document from the root for foo elements, regardless of the element on which .xpath() is invoked.
.//foo "recursively" searches only the descendants of the element on which .xpath() was invoked.
I don't think W3Schools explains that well; I usually prefer references like https://developer.mozilla.org/en-US/docs/Web/XPath or https://www.w3.org/TR/xpath-31/
Here's some code that illustrates:
from lxml import etree

xml = """
<root>
  <node id="a">
    <value>One</value>
    <node id="b">
      <value>Two</value>
    </node>
  </node>
  <node id="c">
  </node>
</root>"""

dom = etree.fromstring(xml)
for node in dom.xpath("//node"):
    print(node.get("id"))
    print(" //", node.xpath("//value/text()"))
    print(".//", node.xpath(".//value/text()"))

Thanks! As a newbie, I find the different ways of finding elements a bit confusing.
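For reference, here is a variant of Terry's snippet that stores the results per node so the difference can be checked directly. It uses the same sample document; the results dictionary is just an illustrative helper, not part of the original code.

```python
from lxml import etree

# Same sample document as in Terry's snippet.
xml = """
<root>
  <node id="a">
    <value>One</value>
    <node id="b">
      <value>Two</value>
    </node>
  </node>
  <node id="c">
  </node>
</root>"""

dom = etree.fromstring(xml)

# Map each node id to (result of "//value", result of ".//value").
results = {}
for node in dom.xpath("//node"):
    results[node.get("id")] = (
        node.xpath("//value/text()"),   # whole document, whatever the context node
        node.xpath(".//value/text()"),  # descendants of this node only
    )

for node_id, (absolute, relative) in results.items():
    print(node_id, " //", absolute, " .//", relative)

# a: //  -> ['One', 'Two']   .// -> ['One', 'Two']
# b: //  -> ['One', 'Two']   .// -> ['Two']
# c: //  -> ['One', 'Two']   .// -> []
```

The "//" column is identical for every node because the search always starts at the document root; only the ".//" column depends on which node the query was run from.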
participants (2): Gilles, Stefan Behnel