Martin Mueller wrote at 2023-6-8 04:02 +0000:
I use lxml to work with a large collection of TEI-encoded texts (66,000) that are linguistically annotated. Each token is wrapped in a <w> or <pc> element with a unique ID and various attributes. I can march through the texts at the lowest level of <w> and <pc> elements without paying any attention to the discursive structure of higher elements. I just do:
for w in tree.iter(tei + 'w', tei + 'pc'):
    if x: do this
    if y: do that
But now I want to create a concordance in which tokens meeting some condition are pulled out and surrounded with seven words on either side. I do this with itersiblings(), but that is a tricky operation. The next <w> token may not be a sibling but a child of a higher-level sibling. Remembering that "elements are lists" you have patterns like
[a, b, c, [d, e, f], g, h, i, [k, l, m, n]]
Apparently, the sequence of `w` and `pc` elements (in document order) is essential. You already have a way to determine this sequence: your `tree.iter` loop visits them in document order. If you have any element, you can determine its `parent` and therefore (recursively) the path to the element. If you have elements `e1` and `e2`, you can then determine their deepest common ancestor. Maybe that helps you solve your problem.
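One possible way to use that document-order sequence is to collect it into a flat list once and then take the context by index, so sibling navigation is not needed at all. The following is only a minimal sketch under some assumptions: the names `tokens_in_document_order`, `concordance`, `is_hit` and `SAMPLE` are made up for illustration, `TEI` is assumed to be the usual TEI namespace prefix, and punctuation (`pc`) is counted as context just like words. The stdlib fallback is there only so the sketch runs without lxml.

```python
try:
    from lxml import etree  # the library used in this thread
except ImportError:
    # Fallback so the sketch still runs where lxml is not installed.
    import xml.etree.ElementTree as etree

# Assumption: the texts use the standard TEI namespace.
TEI = "{http://www.tei-c.org/ns/1.0}"
TOKEN_TAGS = {TEI + "w", TEI + "pc"}


def tokens_in_document_order(root):
    """Return every <w> and <pc> element as one flat list, ignoring nesting."""
    # iter() walks the tree in document order; comments and processing
    # instructions have a non-string tag and are filtered out here.
    return [el for el in root.iter() if el.tag in TOKEN_TAGS]


def concordance(root, is_hit, width=7):
    """Yield (left, hit, right) for each token for which is_hit(token) is true.

    left and right hold up to `width` neighbouring tokens each; they are
    shorter near the start or end of the text.
    """
    tokens = tokens_in_document_order(root)
    for i, tok in enumerate(tokens):
        if is_hit(tok):
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            yield left, tok, right


# Tiny made-up sample: "quick" and "fox" sit one level deeper than the rest.
SAMPLE = (
    '<text xmlns="http://www.tei-c.org/ns/1.0"><p>'
    "<w>The</w> <hi><w>quick</w> <w>fox</w></hi>"
    "<pc>,</pc> <w>jumps</w></p></text>"
)

root = etree.fromstring(SAMPLE)
for left, hit, right in concordance(root, lambda t: t.text == "fox", width=2):
    print(" ".join(t.text for t in left),
          "[" + hit.text + "]",
          " ".join(t.text for t in right))
```

Because the context comes from list slices, the nesting of the tokens does not matter here, and the common-ancestor computation would only be needed if the context must stop at some structural boundary. Holding one list of tokens per text in memory is the trade-off of this approach.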