[XML-SIG] elementtree and uncomplete parsing

Stefan Behnel stefan_ml at behnel.de
Sat Jun 21 07:39:17 CEST 2008


jeanmarc.chourot at free.fr wrote:
> <node>
> This text <thistag> is completely crap </thistag> because <anothertag> blabla
> </anothertag>
> </node>
> <node>
> This is another <thisnotag> node </thisnotag> with <anothertaggy> random tags
> </anothertaggy>
> </node>
> I would like to retrieve what is between the tags <node> ...</node> into
> strings, the "subelements" being considered as simple strings and not processed
> by ElementTree.

You are trying to make an XML parser not parse XML; that is bound to fail.
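
To illustrate, here is a minimal sketch of what ElementTree does with the first
node from the question. The <root> wrapper is my assumption (the sample needs a
single root element to be parsable at all); it is not part of the original data.

```python
import xml.etree.ElementTree as ET

# Assumed wrapper: <root> is hypothetical, added so the sample is one document.
data = (
    "<root><node>\n"
    "This text <thistag> is completely crap </thistag> because <anothertag> blabla\n"
    "</anothertag>\n"
    "</node></root>"
)

node = ET.fromstring(data).find("node")

# .text stops at the first child element; the rest lives in the children.
leading_text = node.text
child_tags = [child.tag for child in node]

print(repr(leading_text))
print(child_tags)
```

The parser treats <thistag> and <anothertag> as real elements, so node.text only
holds the text up to the first of them.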

> In other words, this could be badly formed HTML, not processed, embedded in
> well-formed XML tags.

If you really have something like "embedded HTML", it must be escaped in your
data to be parsable. There is no way an XML parser can return what you want
without modifying your 'data' (at least losing whitespace etc.).
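
As a rough sketch of what "escaped" means here, assuming whoever produces the
data can run the embedded markup through an escaping step before wrapping it in
<node>: the fragment then comes back from the parser as a plain string. The
<root> wrapper and the variable names are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The embedded markup, treated as opaque text by the producer of the data.
fragment = (
    "This text <thistag> is completely crap </thistag> "
    "because <anothertag> blabla\n</anothertag>"
)

# escape() replaces &, < and > with entities, so the parser sees only text.
document = "<root><node>" + escape(fragment) + "</node></root>"

root = ET.fromstring(document)
texts = [node.text for node in root.findall("node")]
children = list(root.find("node"))

print(texts[0])
```

With the content escaped on the producing side, node.text returns the fragment
unchanged and the node has no child elements.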

I think the easiest option (if you have it) is to talk to the idiots who sent
you the data and have them fix it.

