Re: [lxml-dev] c14n, pretty printing and diffing

For those interested in the iterparse method, the following is much better:

    depth = 0
    sourceTree = ElementTree.iterparse(open(inputDir+'/'+file, 'r'), events=("start", "end"))
    for event, elem in sourceTree:
        if event == "start":
            i = "\n" + depth*" "
            depth += 1
            outputFile.write('%s<%s' % (i,elem.tag))
            if len(elem.items()):
                # sort the attributes so that their order is stable
                attrs = elem.items()
                attrs.sort()
                outputFile.write(' ')
                outputFile.write(' '.join(['%s="%s"' % (a[0],a[1]) for a in attrs if a[0] != 'size']))
            outputFile.write('>')
            if elem.text and elem.text.strip():
                outputFile.write(elem.text.strip('\n').encode('utf-8'))
        if event == "end":
            # recompute the indent: i still holds the value of the last "start" event
            depth -= 1
            i = "\n" + depth*" "
            outputFile.write('%s</%s>' % (i,elem.tag))
            if elem.tail and elem.tail.strip():
                outputFile.write(elem.tail.strip('\n').encode('utf-8'))
            elem.clear()

It is better because, when event == 'start', len(elem) is always 0, and I don't know how to tell whether the element will have content, so I cannot decide whether to produce an empty tag (or not). Therefore, the above code always produces an element end tag, even when there is no content.
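One possible way around the empty-tag problem described above is to leave each start tag unterminated until the next event shows whether a child follows. The following is only a minimal Python 3 sketch under several assumptions: it uses the standard library's xml.etree.ElementTree rather than lxml, it expects documents without namespaces, comments or PIs, and it ignores tail text. The names pretty_canonical, skip_attrs and open_pending are made up for this illustration, and the escaping via xml.sax.saxutils is an addition that the code in this thread does not have.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def pretty_canonical(source, skip_attrs=("size",)):
    """Serialise the XML in `source` with sorted attributes and 1-space indents."""
    out = []
    depth = 0
    stack = []            # elements that are currently open, innermost last
    open_pending = False  # True while the last start tag lacks its '>' or '/>'
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if open_pending:
                # A child follows, so the parent's start tag is closed with '>'.
                # The parent's leading text has been parsed by now.
                parent = stack[-1]
                out.append(">")
                if parent.text and parent.text.strip():
                    out.append(escape(parent.text.strip()))
            out.append("\n" + depth * " " + "<" + elem.tag)
            for name, value in sorted(elem.items()):
                if name not in skip_attrs:
                    out.append(" %s=%s" % (name, quoteattr(value)))
            stack.append(elem)
            depth += 1
            open_pending = True
        else:  # "end"
            stack.pop()
            depth -= 1
            if open_pending:
                # No child was seen: emit text plus end tag, or an empty tag.
                text = (elem.text or "").strip()
                if text:
                    out.append(">%s</%s>" % (escape(text), elem.tag))
                else:
                    out.append("/>")
                open_pending = False
            else:
                out.append("\n" + depth * " " + "</" + elem.tag + ">")
            elem.clear()  # free memory; the element has been written
    return "".join(out).lstrip("\n")


sample = b'<root b="2" a="1"><item size="9" k="v">hi</item><empty/><group><leaf/></group></root>'
print(pretty_canonical(io.BytesIO(sample)))
```

The decision between '>' and '/>' is postponed to the following event, so nothing has to be guessed at "start" time. Tail text would need similar deferred handling, since it is only known after the "end" event of its element.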
"Olivier Collioud" <Olivier.Collioud@wipo.int> 12/02/08 7:26 am
Thanks Stefan.
I prefer visual diffing: the ones provided by Eclipse, TkDiff or WinMerge. I did not find any doc or usage example of lxml.usedoctest, could you please give some pointers?

Let me share my simple (because I do not use any namespace, PI, comment...) solution based on iterparse:

    depth = 0
    sourceTree = ElementTree.iterparse(open(inputFile, 'r'), events=("start", "end"))
    for event, elem in sourceTree:
        if event == "start":
            i = "\n" + depth*" "
            depth += 1
            outputFile.write('%s<%s' % (i,elem.tag))
            if len(elem.items()):
                attrs = elem.items()
                attrs.sort()
                outputFile.write(' ')
                outputFile.write(' '.join(['%s="%s"' % (a[0],a[1]) for a in attrs if a[0] != 'size']))
            if elem.text and elem.text.strip():
                outputFile.write('>%s' % elem.text.strip('\n').encode('utf-8'))
            elif len(elem):
                outputFile.write('>')
        if event == "end":
            if (elem.text and elem.text.strip()) or len(elem):
                outputFile.write('%s</%s>' % (i,elem.tag))
            else:
                outputFile.write('/>')
            if elem.tail and elem.tail.strip():
                outputFile.write(elem.tail.strip('\n').encode('utf-8'))
            depth -= 1
            elem.clear()

Olivier.
Stefan Behnel <stefan_ml@behnel.de> 11/02/08 7:56 pm >>>
Hi,
Olivier Collioud wrote:
> I would like to use my favourite text diffing tool to compare XML files.
Which is not lxml.html.diff, I assume? (I'm not sure how HTML-specific that is, BTW.) Also, for doctests, there is lxml.usedoctest that you can import (the lxml web pages use it for doctests).
> Is there a way to produce a pretty-printed canonical version of my XML files using lxml?
Not using the c14n interface (libxml2 doesn't support it). Serialising by hand is not too hard, though. You can look at ElementTree._write() for an example:
http://svn.effbot.org/public/elementtree/elementtree/ElementTree.py
Stefan
_______________________________________________
lxml-dev mailing list
lxml-dev@codespeak.net
http://codespeak.net/mailman/listinfo/lxml-dev
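For readers on a recent Python, a different route to a diff-friendly file is to canonicalise first and re-indent afterwards. This is not the lxml c14n interface discussed above. It is a minimal sketch that assumes the standard library's xml.etree.ElementTree on Python 3.9 or later (canonicalize() and indent() both postdate this thread). The function name normalise_for_diff and its skip_attrs parameter are invented for the example.

```python
import xml.etree.ElementTree as ET


def normalise_for_diff(xml_text, skip_attrs=("size",)):
    """Canonicalise `xml_text`, then re-indent it so a line-based diff is readable."""
    # C14N 2.0 sorts attributes; strip_text drops surrounding whitespace,
    # and exclude_attrs leaves out attributes that should not be compared.
    canonical = ET.canonicalize(xml_text, strip_text=True, exclude_attrs=skip_attrs)
    root = ET.fromstring(canonical)
    ET.indent(root, space=" ")  # one space per nesting level
    return ET.tostring(root, encoding="unicode")


first = normalise_for_diff('<r b="2" a="1" size="10"><e/><t> hi </t></r>')
second = normalise_for_diff('<r a="1"   b="2" size="99">\n  <e></e>\n  <t>hi</t>\n</r>')
print(first == second)
```

The re-indented result is no longer strictly canonical XML, but two inputs that differ only in attribute order, ignorable whitespace or empty-tag style should serialise identically, which is what a text diff tool needs.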

"Olivier Collioud" wrote: I did not fin any doc or usage example of lxml.usedoctest, could you please give some pointer ?
As I said, you can just import it, like all the doctests on the webpage do.
http://codespeak.net/lxml/lxml2.html#new-modules
Here is an example:
http://codespeak.net/lxml/objectify.html
Stefan
participants (2)
- Olivier Collioud
- Stefan Behnel