Re: Why does it fail cleaning GPX file?
Hi,
The reason is that lxml (sensibly) uses fully qualified tag names in Clark notation (see http://www.jclark.com/xml/xmlns.htm).
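To make the notation concrete, here is a minimal sketch on a cut-down document (the tiny XML string and the names GPX_NS and clark_cmt are only for illustration, they are not from your file). It uses lxml's QName class to take a qualified tag apart and to build one:

```python
import lxml.etree as et

GPX_NS = "http://www.topografix.com/GPX/1/1"

# Cut-down document that declares the GPX namespace as default namespace
root = et.fromstring(
    '<gpx xmlns="%s"><wpt><cmt>landuse=cemetery</cmt></wpt></gpx>' % GPX_NS
)
cmt = root[0][0]

# The tag is the namespace URI in braces followed by the local name
print(cmt.tag)

# QName splits a Clark-notation tag into its parts ...
qname = et.QName(cmt)
print(qname.namespace, qname.localname)

# ... and builds one from a namespace URI and a local name
clark_cmt = et.QName(GPX_NS, "cmt").text

# Plain "cmt" finds nothing, the qualified name finds the element
print(len(list(root.iter("cmt"))), len(list(root.iter(clark_cmt))))
```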
Here, your actual XML elements are e.g. "{http://www.topografix.com/GPX/1/1}cmt" (in Clark notation), not plain "cmt": because of the default namespace declaration xmlns="http://www.topografix.com/GPX/1/1", all elements without a namespace prefix belong to that namespace. So you must iter() over the proper fully-qualified names.

Here's a slightly adapted version of your example code that hopefully shows what's going on and how to do that:

# modified element cleaning sample
xmldata = """<?xml version="1.0"?>
<gpx version="1.1" creator="GPSBabel - http://www.acme.com" xmlns="http://www.topografix.com/GPX/1/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<metadata>
  <time>2022-07-21T20:29:48.309Z</time>
  <bounds minlat="44.456597300" minlon="3.007453400" maxlat="45.803699000" maxlon="5.251047400"/>
</metadata>
<wpt lat="45.042569200" lon="5.040802200">
  <name>way/4749044</name>
  <cmt>landuse=cemetery</cmt>
  <desc>landuse=cemetery</desc>
  <link href="http://osm.org/browse/way/4749044"/>
</wpt>
</gpx>
"""

import lxml.etree as et

parser = et.XMLParser(remove_blank_text=True, strip_cdata=False)
root = et.fromstring(xmldata, parser)

# Show the fully qualified tag names (Clark notation)
for el in root.iter():
    print(el.tag)

# Remove by fully-qualified Clark tag name or using a wildcard, if appropriate
# (unqualified link will not get removed here since this doesn't exist - left in for
# demo purposes)
for el in root.iter('{http://www.topografix.com/GPX/1/1}cmt', '{*}desc', 'link'):
    parent = el.getparent()
    parent.remove(el)

print(et.tostring(root, pretty_print=True, encoding='unicode'))

(I skipped the file reading/writing for my convenience; you might also want to look at the pathlib standard library module for your file/path name constructions, which is sometimes nice for handling such stuff.)

HTH,
Holger
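P.S. In case it helps, a rough sketch of the file-based variant with pathlib. Everything specific in it is an assumption for the sake of the demo: the helper name strip_tags, the file names, the "_clean" suffix and the small sample document are made up and are not taken from your script.

```python
import tempfile
from pathlib import Path

import lxml.etree as et

GPX_NS = "http://www.topografix.com/GPX/1/1"


def strip_tags(src: Path, dst: Path, *local_names: str) -> None:
    """Hypothetical helper: drop GPX-namespaced elements by local name."""
    if not local_names:
        return  # iter() without tags would match every element
    parser = et.XMLParser(remove_blank_text=True, strip_cdata=False)
    tree = et.parse(str(src), parser)
    # Build the fully qualified Clark names, e.g. "{http://...}cmt"
    wanted = [f"{{{GPX_NS}}}{name}" for name in local_names]
    # Collect the matches first so the tree is not modified mid-iteration
    for el in list(tree.getroot().iter(*wanted)):
        el.getparent().remove(el)
    tree.write(str(dst), pretty_print=True, xml_declaration=True,
               encoding="UTF-8")


# Demo in a throwaway directory; the file names are invented
workdir = Path(tempfile.mkdtemp())
src = workdir / "track.gpx"
dst = src.with_name(src.stem + "_clean" + src.suffix)
src.write_text(
    f'<gpx xmlns="{GPX_NS}"><wpt lat="45.0" lon="5.0">'
    "<name>way/4749044</name><cmt>landuse=cemetery</cmt>"
    "<desc>landuse=cemetery</desc></wpt></gpx>",
    encoding="utf-8",
)
strip_tags(src, dst, "cmt", "desc")
result = dst.read_text(encoding="utf-8")
print(result)
```

The list() around iter() is just a precaution: it keeps the loop independent of how the iterator copes with elements being removed while it is running.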