Hi,

I am new with LXML and I have a problem after parsing my element: its architecture seems to have changed if I remove (or replace) the last child. Here is my code and a screenshot of the console output. I have looked for a solution, but I still can't figure out what I am doing wrong. I would really appreciate someone's help! (I am using LXML 3.2.1 with Python 2.6 on Windows.)

#############################################################################################
from lxml import etree
from copy import deepcopy

def Write( file, element ):
    f = open( file, 'w' )
    f.write( etree.tostring( element, xml_declaration=True, encoding="ISO-8859-1", pretty_print = True ) )
    f.close()
    return 1

def ReadAndReturn( file ):
    lookup = etree.ElementDefaultClassLookup()
    parser = etree.XMLParser(recover = True)
    parser.set_element_class_lookup( lookup )
    mainTree = etree.parse( file, parser )
    return mainTree

# create a root element with 3 children
root = etree.Element( "root" )
root.append( etree.Element( "child1" ) )
child2 = etree.SubElement( root, "child2" )
child2.text = 'CHILD2'
child3 = etree.SubElement( root, "child3" )
child3.text = 'CHILD3'
print "\n--- INITIAL ROOT ---"
print( etree.tostring( root, pretty_print=True ) )

# remove last child
root2 = deepcopy( root )
root2.remove( root2[2] )
print "--- ROOT WITHOUT LAST CHILD / BEFORE WRITING ---"
print( etree.tostring( root2, pretty_print=True ) )

# write initial root (3 children) and read the file
filename = 'test.tst'
status = Write( filename, root )
tree = ReadAndReturn( filename )

# remove last child from the read element
root3 = deepcopy( tree.getroot() )
root3.remove( root3[2] )
print "--- ROOT WITHOUT LAST CHILD / AFTER WRITING AND PARSING ---"
print( etree.tostring( root3, pretty_print=True ) )
#############################################################################################
On 15.05.2013, at 14:42, Arnaud RIVET <arnaud.rivet@cdr.hutchinson.fr> wrote:

> Hi,
> I am new with LXML and I have a problem after parsing my element : its architecture seems to have changed if I remove (or replace) the last child.
What output do you expect? Apart from some extra whitespace, inserted by tostring because of the pretty_print=True setting, the output seems to be the same.

When you write with pretty_print and then parse the result, the whitespace becomes part of the XML document; in this case child2.tail is probably '\n  '. That is what indents the </root> in your third case, because child3 is gone.
>>> from lxml import etree as e
>>> s = e.XML('<a><b/><c/></a>')
>>> e.tostring(s, pretty_print=True)
'<a>\n  <b/>\n  <c/>\n</a>\n'
>>> t = e.XML(e.tostring(s, pretty_print=True))
>>> s[0].tail
>>> t[0].tail
'\n  '
If you do not want this behaviour, do not use pretty_print.

jens
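If the file has already been written with pretty_print, another option is to have the parser drop the indentation again when reading it back. The following is a minimal sketch, assuming the input contains no significant whitespace-only text; it uses lxml's `remove_blank_text` parser option, and the inline sample document stands in for the contents of test.tst.

```python
from lxml import etree

# Pretty-printed input: the indentation is stored as whitespace-only text/tail.
pretty = (b"<root>\n"
          b"  <child1/>\n"
          b"  <child2>CHILD2</child2>\n"
          b"  <child3>CHILD3</child3>\n"
          b"</root>\n")

# remove_blank_text=True asks the parser to discard ignorable whitespace
# between tags instead of keeping it as text/tail.
parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(pretty, parser)

# The indentation was dropped, so child2 has no tail.
child2_tail = root[1].tail

# Removing the last child and pretty-printing again re-indents the tree.
root.remove(root[2])
result = etree.tostring(root, pretty_print=True)
print(result.decode("ascii"))
```

With the blank text gone, pretty_print is free to lay out the closing tag itself, so </root> is no longer indented.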
Hi,
From: Arnaud RIVET <arnaud.rivet@cdr.hutchinson.fr>
> I am new with LXML and I have a problem after parsing my element : its architecture seems to have changed if I remove (or replace) the last child.
> [...]
> print "--- ROOT WITHOUT LAST CHILD / AFTER WRITING AND PARSING ---"
> print( etree.tostring( root3, pretty_print=True ) )
> #############################################################################################

>>> print "'%s'" % root[1].tail
'None'
>>> print "'%s'" % etree.fromstring(etree.tostring(root,
...     pretty_print=True))[1].tail
'
  '
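Since the indentation lives in the tail, one way to remove an element from an already pretty-printed tree without disturbing the layout is to hand the removed element's tail to whatever precedes it. This is only a sketch: `remove_keeping_indent` and `is_blank` are hypothetical helper names, not part of lxml, and the helper only moves tails that are whitespace-only.

```python
from lxml import etree

def is_blank(text):
    """True for None or whitespace-only strings."""
    return text is None or not text.strip()

def remove_keeping_indent(parent, child):
    """Remove child and pass its whitespace-only tail to the preceding node."""
    tail = child.tail
    previous = child.getprevious()
    parent.remove(child)  # in lxml the tail goes away together with the element
    if not is_blank(tail):
        return  # the tail held real text; leave the rest of the tree alone
    if previous is None:
        # child was the first node: the whitespace before it is parent.text
        if is_blank(parent.text):
            parent.text = tail
    elif is_blank(previous.tail):
        previous.tail = tail

pretty = (b"<root>\n"
          b"  <child1/>\n"
          b"  <child2>CHILD2</child2>\n"
          b"  <child3>CHILD3</child3>\n"
          b"</root>")
root = etree.fromstring(pretty)
remove_keeping_indent(root, root[2])
result = etree.tostring(root)
print(result.decode("ascii"))
```

Here child2 inherits the '\n' that used to follow child3, so the closing </root> starts at column zero again.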
See http://lxml.de/tutorial.html#elements-contain-text for the text & tail attributes in the lxml docs.

Not related to lxml (but for once I couldn't resist ;-)):
> def Write( file, element ):
>     f = open( file, 'w' )
>     f.write( etree.tostring( element, xml_declaration=True, encoding="ISO-8859-1", pretty_print = True ) )
>     f.close()
>     return 1
- Consider following the Python style guide (PEP 8) recommendations regarding function/method names, whitespace and parentheses (unless your codebase or organization dictates otherwise). E.g. use write() rather than Write().
- Consider using context management for file access to make sure your file gets properly closed, whatever might fail in the code after opening the file:
>>> with open('foo.bar', "w") as f:
...     f.write('foobar')
...
>>> f
<closed file 'foo.bar', mode 'w' at 0x39f7b0>
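Applied to the function quoted above, both suggestions might look like the sketch below. The name `write_xml` and the temporary file path are illustrative choices only; the file is opened in binary mode because `etree.tostring()` with an explicit encoding returns a byte string.

```python
import os
import tempfile

from lxml import etree

def write_xml(path, element, encoding="ISO-8859-1"):
    """Serialize element to path; the with block closes the file on exit."""
    with open(path, "wb") as f:
        f.write(etree.tostring(element, xml_declaration=True,
                               encoding=encoding, pretty_print=True))

# Small demonstration tree, written to a temporary directory.
root = etree.Element("root")
etree.SubElement(root, "child1")
path = os.path.join(tempfile.mkdtemp(), "test.tst")
write_xml(path, root)

# Read the file back to check what was written.
with open(path, "rb") as f:
    written = f.read()
print(written.decode("ISO-8859-1"))
```

The `return 1` status of the original is dropped in this sketch; a failure shows up as an exception instead.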
Holger
participants (3)
- Arnaud RIVET
- Holger Joukl
- jens quade