Wrapping some elements in a text in a new element
I am working on a corpus of plays where not all speeches have been properly wrapped in <sp> elements. This happens mostly when a character enters and the speaker label is implicit in the stage direction, as in the following fragment of two speeches, where the second speech is properly wrapped but the first is not:

    <stage><hi>Enter</hi> Simplo <hi>and</hi> Nouerindo.</stage>
    <p><hi>Mounsieur Proberio,</hi> you are welcome home.</p>
    <sp who="A18423_proberio">
      <speaker>Pro.</speaker>
      <p>That's more then you know <hi>Segnieur Simplo,</hi> what countrey shalbe my home?</p>
    </sp>

I have found a solution that works, but I am not sure whether it is the best. I iterate over <div> elements and look for <p> children that are immediately preceded by a <stage>. Then I do the following:

1. I establish the index position of the <p> among the children of the <div>
2. I create an <sp> element
3. I append the <p> to the <sp> element (which removes it from the <div>)
4. I insert the <sp> element at the index position the <p> used to occupy

The code looks like

    dom = etree.parse(filepath)
    for element in dom.iter(tei + 'div'):
        for child in element:
            if child.tag == tei + 'p' \
                    and child.getprevious().tag == tei + 'stage':
                index = element.index(child)
                sp = etree.Element(tei + 'sp')
                sp.append(child)
                element.insert(index, sp)
    print(etree.tostring(dom, encoding='unicode', pretty_print=True), file=fileout)

Is there a simpler way of doing this?

Martin Mueller
Professor emeritus of English and Classics
Northwestern University
Martin Mueller, 22.02.2014 19:38:
This can be simplified to

    for element in dom.iterfind('//{ns}div/{ns}p'.format(ns=tei)):

There is an .addnext() method on Elements that you can use:

    # replace <child/> by <sp><child/></sp>
    sp = etree.Element(tei +'sp')
    child.addnext(sp)
    sp.append(child)

That code is worth a comment, though, so not sure "simpler" is the right word...

Stefan
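
For reference, the two suggestions can be combined into one self-contained sketch. It is only an illustration under some assumptions: the TEI namespace URI, the <body> wrapper around the sample, and the function name wrap_bare_speeches are not from the thread. The matches are copied into a list before the tree is changed, and the case where a <p> has no preceding sibling is checked explicitly, since getprevious() returns None there.

```python
from lxml import etree

# Assumption: the corpus uses the standard TEI namespace.
tei = "{http://www.tei-c.org/ns/1.0}"

# Sample built from the fragment in the question; <body> and <div> are added here.
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0"><div>
<stage><hi>Enter</hi> Simplo <hi>and</hi> Nouerindo.</stage>
<p><hi>Mounsieur Proberio,</hi> you are welcome home.</p>
<sp who="A18423_proberio">
<speaker>Pro.</speaker>
<p>That's more then you know <hi>Segnieur Simplo,</hi> what countrey shalbe my home?</p>
</sp>
</div></body>"""


def wrap_bare_speeches(root):
    """Wrap every <p> that is a child of a <div> and directly follows a
    <stage> in a new <sp>. Returns the number of <p> elements wrapped."""
    wrapped = 0
    path = ".//{ns}div/{ns}p".format(ns=tei)
    # Snapshot the matches first, because the loop body modifies the tree.
    for p in list(root.iterfind(path)):
        prev = p.getprevious()
        if prev is None or prev.tag != tei + "stage":
            continue
        # replace <p/> by <sp><p/></sp>
        sp = etree.Element(tei + "sp")
        p.addnext(sp)   # put the empty <sp> right after the <p>
        sp.append(p)    # then move the <p> into it
        wrapped += 1
    return wrapped


root = etree.fromstring(SAMPLE)
count = wrap_bare_speeches(root)
print(etree.tostring(root, encoding="unicode"))
```

Because a wrapped <p> is no longer a direct child of the <div>, running the function a second time on the same tree should find nothing left to wrap.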
participants (2)
- Martin Mueller
- Stefan Behnel