The issue here is that lxml uses some kind of heuristic to decide whether the whitespace already present in a document is meaningful and should be preserved as is, or whether it can be changed during pretty-printing. When you add a node through addnext(), there is no whitespace between the previous <w> and the new one, and lxml will try to preserve that fact.
 
I don’t know the exact heuristic lxml uses. It might be possible to set the .tail attribute of the previous word to a whitespace string in order to trigger proper pretty-printing, but I am not sure what lxml would expect.
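
A minimal sketch of the .tail idea, not a confirmed recipe. It assumes that the left sibling has a following sibling, so that its tail holds exactly the newline and indentation wanted between siblings. The helper name add_sibling_with_indent and the sample <s>/<pc> markup are made up for illustration. Both tails are set explicitly after addnext(), so the result should not depend on which side of the existing tail text the new node lands on.

```python
from lxml import etree


def add_sibling_with_indent(prev, new):
    """Insert `new` right after `prev` and give both the same trailing whitespace.

    Hypothetical helper: assumes prev.tail is the whitespace used between siblings.
    """
    indent = prev.tail      # e.g. "\n  "; may be None if there is no whitespace
    prev.addnext(new)
    prev.tail = indent      # whitespace between prev and the new element
    new.tail = indent       # whitespace between the new element and the next one
    return new


doc = etree.fromstring(
    "<s>\n"
    "  <w>There</w>\n"
    "  <w>be</w>\n"
    "  <pc>.</pc>\n"
    "</s>"
)

new_w = etree.Element("w")
new_w.text = "rauyns"
add_sibling_with_indent(doc[1], new_w)

result = etree.tostring(doc, encoding="unicode")
print(result)
```

If the left sibling is the last child of its parent, its tail normally carries the shallower indentation of the closing tag, so this sketch would need adjusting for that case.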
 
Another solution is to remove all the semantically unimportant whitespace while parsing the original document, modify the tree, and add new whitespace using pretty_print when serializing the final tree. You can do that by creating a custom parser with myparser = etree.XMLParser(remove_blank_text=True) and passing it to etree.parse() or etree.XML() as parser=myparser.
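
The steps above could look roughly like the following sketch. The sample document and the variable names are invented for illustration; with a real file, etree.parse(path, parser=myparser) would take the place of etree.XML().

```python
from lxml import etree

# Invented sample input; the whitespace between the <w> elements is not meaningful.
SOURCE = b"""<s>
            <w lemma="there">There</w>
            <w lemma="be">be</w>
</s>"""

# Drop whitespace-only text between elements while parsing.
myparser = etree.XMLParser(remove_blank_text=True)
root = etree.XML(SOURCE, parser=myparser)

# Modify the tree: add a new right sibling after the second <w>.
new_w = etree.Element("w", lemma="n2")
new_w.text = "rauyns"
root[1].addnext(new_w)

# Let the serializer add fresh indentation.
pretty = etree.tostring(root, pretty_print=True, encoding="unicode")
print(pretty)
```

Since the parser has stripped the original indentation, the serializer is free to lay out every child of <s> itself, including the newly added one.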
 
Frederik
 
 
 
Sent: Thursday, 18 July 2019 at 03:02
From: "Martin Mueller" <martinmueller@northwestern.edu>
To: "lxml@lxml.de" <lxml@lxml.de>
Subject: [lxml] a question about pretty_print

There is something I don’t understand about the behaviour of the pretty_print function (or is it a method?)

 

I work exclusively with linguistically annotated texts where every token is wrapped in a <w> element. And pretty_print does a nice job with it. I often edit these files, updating, splitting, joining, or deleting particular elements. If I create another element and use ‘addnext’ to insert it as a right sibling, pretty_print fails and doesn’t print it on a new line. Something like

 

            <w lemma="the" pos="d" xml:id="b2afn-048-a-0570" ana="the/d">The</w>
            <w xml:id="b2afn-048-a-0580" lemma="" pos="zz" ana="/zz"></w>

 

Becomes

 

 <w lemma="there" pos="av" xml:id="b2afn-048-a-0570" ana="the/d">There</w>
            <w xml:id="b2afn-048-a-0580" lemma="be" pos="vvb" ana="/zz" >be</w><w xml:id="b2afn-048-a-0581" lemma="n2" pos="n2" reg="ravens">rauyns</w>

 

As I write this, it occurs to me that this may have nothing to do with pretty_print but with what addnext does or doesn’t do. But is there a routine that would guarantee that a newly inserted element would by default display with the same indentation as its left sibling?

 

MM

_________________________________________________________________ Mailing list for the lxml Python XML toolkit - http://lxml.de/ lxml@lxml.de https://mailman-mail5.webfaction.com/listinfo/lxml