lxml removing tag, keeping text order

Stefan Behnel stefan_ml at behnel.de
Sat Oct 25 05:21:39 EDT 2008


Tim Arnold wrote:
> Hi,
> I'm using lxml to clean up auto-generated XML so that it validates against
> a DTD. I need to remove an element tag but keep the text in order. For example:
> s0 = '''
> <option>
>   <optional> first text
>     <someelement>ladida</someelement>
>     <emphasis>emphasized text</emphasis>
>     middle text
>     <anotherelement/>
>     last text
>   </optional>
> </option>'''
> 
> I want to get rid of the <emphasis> tag but keep everything else as it is; 
> that is, I need this result:
> 
> <option>
>   <optional> first text
>     <someelement>ladida</someelement>
>     emphasized text
>     middle text
>     <anotherelement/>
>     last text
>   </optional>
> </option>

There's a drop_tag() method in lxml.html (lxml/html/__init__.py) that does
what you want: it removes the tag itself but merges the element's text,
children and tail into its parent. Just copy the code over to your code base
and adapt it as needed.
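
As a rough sketch of what such an adaptation might look like for plain
lxml.etree: the function below follows the logic of lxml.html's drop_tag(),
but it is written as a standalone helper rather than a method, and the names
s0, root and result are only there for the demonstration. Treat it as an
illustration under those assumptions, not as the exact code shipped in lxml.

```python
from lxml import etree


def drop_tag(el):
    """Remove the tag of `el`, keeping its text, children and tail in place.

    Standalone adaptation of the drop_tag() logic in lxml.html (illustrative).
    """
    parent = el.getparent()
    if parent is None:
        raise ValueError("cannot drop the tag of the root element")
    previous = el.getprevious()

    # The element's leading text is appended to whatever precedes it:
    # the previous sibling's tail, or the parent's text if there is none.
    if el.text and isinstance(el.tag, str):
        if previous is None:
            parent.text = (parent.text or '') + el.text
        else:
            previous.tail = (previous.tail or '') + el.text

    # The element's tail follows its last child if it has children,
    # otherwise it follows the text that was just merged above.
    if el.tail:
        if len(el):
            last = el[-1]
            last.tail = (last.tail or '') + el.tail
        elif previous is None:
            parent.text = (parent.text or '') + el.tail
        else:
            previous.tail = (previous.tail or '') + el.tail

    # Replace the element with its own children (possibly none).
    index = parent.index(el)
    parent[index:index + 1] = el[:]


s0 = '''
<option>
  <optional> first text
    <someelement>ladida</someelement>
    <emphasis>emphasized text</emphasis>
    middle text
    <anotherelement/>
    last text
  </optional>
</option>'''

root = etree.fromstring(s0)
# Collect the matches first so the tree is not modified while iterating.
for el in list(root.iter('emphasis')):
    drop_tag(el)

result = etree.tostring(root, encoding='unicode')
print(result)
```

Recent lxml releases also provide etree.strip_tags(root, 'emphasis'), which
removes the named tags while keeping their text and tail content, so a
hand-written helper may not be needed if that function is available in your
version.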

Stefan


