converting text and spans to an ElementTree
attn.steven.kuo at gmail.com
Tue May 22 11:27:39 EDT 2007
On May 21, 11:02 pm, Steven Bethard <steven.beth... at gmail.com> wrote:
> I have some text and a list of Element objects and their offsets, e.g.::
>
> >>> text = 'aaa aaa aaabbb bbbaaa'
> >>> spans = [
> ...     (etree.Element('a'), 0, 21),
> ...     (etree.Element('b'), 11, 18),
> ...     (etree.Element('c'), 18, 18),
> ... ]
>
> I'd like to produce the corresponding ElementTree. So I want to write a
> get_tree() function that works like::
>
> >>> tree = get_tree(text, spans)
> >>> etree.tostring(tree)
> '<a>aaa aaa aaa<b>bbb bbb<c /></b>aaa</a>'
>
> Perhaps I just need some more sleep, but I can't see an obvious way to
> do this. Any suggestions?
>
It seems you're looking to construct an Interval Tree:
http://en.wikipedia.org/wiki/Interval_tree
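
If the spans always nest cleanly, a full interval tree may be more than
you need. Here is a minimal sketch that uses a sort plus a stack instead
of an interval tree. It assumes that no two spans cross each other, that
exactly one span encloses all the others, and that a zero-width span at
the very end of another span belongs inside it (as with 'c' in your
example). The helper name _append_text() is my own invention, and I have
only checked this against the example you posted, so treat it as a
starting point rather than a finished solution:

```python
import xml.etree.ElementTree as etree


def _append_text(parent, chunk):
    """Attach chunk at the current end of parent's content (hypothetical helper)."""
    if not chunk:
        return
    if len(parent):
        # Text that follows a child element is stored in that child's .tail.
        parent[-1].tail = (parent[-1].tail or '') + chunk
    else:
        parent.text = (parent.text or '') + chunk


def get_tree(text, spans):
    # Put outer spans first: earlier start, then later end.
    ordered = sorted(spans, key=lambda span: (span[1], -span[2]))
    root = None
    stack = []  # currently open (element, end) pairs, outermost first
    pos = 0     # offset of the first character not yet placed in the tree
    for elem, start, end in ordered:
        # Close every open span that ends too early to contain this one.
        while stack and stack[-1][1] < end:
            top, top_end = stack.pop()
            if top_end > start:
                raise ValueError('spans cross at offset %d' % top_end)
            _append_text(top, text[pos:top_end])
            pos = top_end
        if stack:
            parent = stack[-1][0]
            _append_text(parent, text[pos:start])
            parent.append(elem)
        elif root is None:
            root = elem  # assumption: text before the root span is dropped
        else:
            raise ValueError('more than one top-level span')
        pos = start
        stack.append((elem, end))
    # Close whatever is still open, innermost first.
    while stack:
        top, top_end = stack.pop()
        _append_text(top, text[pos:top_end])
        pos = top_end
    if root is None:
        raise ValueError('no spans given')
    return root


text = 'aaa aaa aaabbb bbbaaa'
spans = [
    (etree.Element('a'), 0, 21),
    (etree.Element('b'), 11, 18),
    (etree.Element('c'), 18, 18),
]
print(etree.tostring(get_tree(text, spans), encoding='unicode'))
```

The only ElementTree detail that matters here is that text following a
child element lives in that child's .tail, not in the parent's .text.
Spans that share a start offset are nested longest-first, which is a
choice on my part; your data may call for a different tie-break.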
--
Hope this helps,
Steven