[XML-SIG] elementtree and uncomplete parsing
Stefan Behnel
stefan_ml at behnel.de
Sat Jun 21 07:39:17 CEST 2008
Hi,
jeanmarc.chourot at free.fr wrote:
> <node>
> This text <thistag> is completely crap </thistag> because <anothertag> blabla
> </anothertag>
> </node>
> <node>
> This is another <thisnotag> node </thisnotag> with <anothertaggy> random tags
> </anothertaggy>
> </node>
>
> I would like to retrieve what is between the tags <node> ...</node> into
> strings, the "subelements" being considered as simple strings and not processed
> by ElementTree.
You are trying to make an XML parser not parse XML. That's bound to fail.
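
To illustrate, here is a minimal sketch of what ElementTree does with the first
sample node. The <root> wrapper element is my assumption, added only so that
the snippet forms a single well-formed document:

```python
import xml.etree.ElementTree as ET

# The first sample node from the question, wrapped in a hypothetical
# <root> element so that it is a single well-formed document.
document = (
    "<root><node>This text <thistag> is completely crap </thistag>"
    " because <anothertag> blabla </anothertag></node></root>"
)

node = ET.fromstring(document).find("node")

# The inner tags are parsed as child elements, not kept as text.
child_tags = [child.tag for child in node]

# node.text only holds the text before the first child element;
# the text after a child lives in that child's .tail attribute.
leading_text = node.text
text_after_first_child = node[0].tail

print(child_tags)
print(repr(leading_text))
print(repr(text_after_first_child))
```

The content between <node> and </node> is split across .text, the child
elements and their .tail values, so it never arrives as one string.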
> In other words, this could be badly formed HTML, not processed, embedded into
> well-formed XML tags.
If you really have something like "embedded HTML", it must be escaped in your
data to be parsable. There is no way an XML parser can return what you want
without modifying your 'data' (at the very least, losing whitespace etc.).
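
A minimal sketch of what "escaped" means here, assuming the producer of the
data is willing to escape the markup before embedding it. The fragment and the
<root> wrapper are made up for illustration:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical badly formed HTML fragment (the <b> tag is never closed).
fragment = "This text <b>is not <i>well formed</i> at all"

# The producer escapes &, < and > before embedding the fragment,
# so the XML parser sees plain character data instead of markup.
document = "<root><node>%s</node></root>" % escape(fragment)

node = ET.fromstring(document).find("node")

# The parser resolves the entities again, so the original string
# comes back unchanged as the element's text.
text = node.text
print(text)
```

With escaped content the <node> element has no children at all, and its .text
is exactly the string that was embedded.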
I think the easiest option (if you have it) is to talk to the idiots who sent
you the data and have them fix it.
Stefan