How to parse XHTML with xml.parsers.xmlproc?
pahartik at sci.fi
Mon Sep 17 12:59:12 CEST 2001
Alex Martelli writes:
> "Paavo Hartikainen" <pahartik at sci.fi> wrote in message
> news:873d5mo5cz.fsf at zazu.vip.fi...
>> I am sure it is just because it does not know the XHTML DTD. What does
>> it try to parse against anyway if no DTD is defined? Is there some
>> kind of default XML DTD?
> It doesn't matter. You have a <meta> tag that is not closed.
> NO DTD will ever make that document valid XML, period.
> Add a </meta> right after the <meta>, or change the closing >
> of the meta itself into />.
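Martelli's diagnosis is easy to reproduce with Python's standard library. The sketch below swaps in xml.etree.ElementTree (which sits on top of expat) for xml.parsers.xmlproc, and the three tiny documents are hypothetical; they differ only in how the <meta> tag is closed, and none of them declares a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML-like fragments; only the <meta> tag differs.
UNCLOSED = '<html><head><meta name="generator" content="test"></head></html>'
END_TAG = '<html><head><meta name="generator" content="test"></meta></head></html>'
SELF_CLOSING = '<html><head><meta name="generator" content="test" /></head></html>'


def is_well_formed(text):
    """Return True if text parses as XML, False on a well-formedness error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # expat reports e.g. "mismatched tag" when </head> closes an open <meta>
        return False
    return True


for label, doc in [("unclosed", UNCLOSED),
                   ("end tag", END_TAG),
                   ("self-closing", SELF_CLOSING)]:
    print(label, is_well_formed(doc))
```

Only the unclosed version is rejected; both suggested fixes parse.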
All meta tags in that document are self-closing. I run the document
through HTML-Tidy both before and after processing it in Python.
And now, after further testing, it turns out that in my test cases I
actually injected two non-self-closing meta tags into the head of the
document before running the XML parser on it. They were hidden by
HTML-Tidy, which filtered the document as soon as it was written to the
target file.
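One way to avoid this kind of slip is to inject the tags through a tree API instead of pasting strings into the head, so the serializer is responsible for closing them. A minimal sketch with xml.etree.ElementTree, assuming the page has no XHTML namespace declaration (with xmlns set, the lookup would need the namespaced tag name); the page and the inject_meta helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page that has already been cleaned up (e.g. by HTML-Tidy).
PAGE = '<html><head><title>Test</title></head><body><p>Hi</p></body></html>'


def inject_meta(xml_text, metas):
    """Add one <meta name=... content=...> per (name, content) pair to <head>.

    Elements built through the tree API are always closed on output
    (empty ones are written as <meta ... />), so the result stays well-formed.
    """
    root = ET.fromstring(xml_text)
    head = root.find("head")  # assumes no XHTML namespace on the elements
    if head is None:
        raise ValueError("document has no <head> element")
    for name, content in metas:
        ET.SubElement(head, "meta", {"name": name, "content": content})
    return ET.tostring(root, encoding="unicode")


result = inject_meta(PAGE, [("author", "example"), ("generator", "my-script")])
print(result)
ET.fromstring(result)  # re-parsing the output succeeds
```

Because nothing is spliced in as raw text, an unclosed tag cannot slip in and then be masked by a later Tidy pass.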
>> Maybe this whole XML thing is just too complicated for me and I
>> should find something else to play with...
> This part is not too hard: *well-formed* comes before *valid*.
This is a good point. Maybe the documentation I have seen just makes it
look cryptic, with so many strange terms and acronyms. Until now I was
under the impression that well-formedness is also defined somehow in
the DTD.
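The distinction can be made concrete: well-formedness is defined by the XML specification itself, so a non-validating parser checks it without consulting any DTD, which also answers what the parser checks when no DTD is given. A DTD only matters for the separate validity check. A minimal sketch using the standard library's xml.parsers.expat (not xmlproc); the document and its internal DTD are made up for illustration.

```python
import xml.parsers.expat

# Hypothetical document whose internal DTD allows only <title> inside <head>.
# The <meta /> element breaks that rule, but every tag is properly closed.
DOC = """<?xml version="1.0"?>
<!DOCTYPE html [
  <!ELEMENT html (head)>
  <!ELEMENT head (title)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT meta EMPTY>
]>
<html><head><title>Test</title><meta /></head></html>"""


def check_well_formed(text):
    """Parse with expat, a non-validating parser: it enforces the XML
    well-formedness rules but does not check the DTD's content models."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError as err:
        return "not well-formed: %s" % err
    return "well-formed"


print(check_well_formed(DOC))  # well-formed, although invalid per its DTD
print(check_well_formed("<html><head><meta></head></html>"))  # not well-formed
```

Catching the first document's invalidity would take a validating parser; the second document fails before validity is even a question.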
"When I / Paavo "Rainbow Rat" Hartikainen
grow small / E-mail: pahartik at sci.fi
I'll be / URL: http://www.sci.fi/~pahartik/
a rat" / EFnet: pahartik at #Atari and #LionKing