How to parse XHTML with xml.parsers.xmlproc?

Paavo Hartikainen pahartik at sci.fi
Mon Sep 17 06:59:12 EDT 2001


Alex Martelli writes:

> "Paavo Hartikainen" <pahartik at sci.fi> wrote in message
> news:873d5mo5cz.fsf at zazu.vip.fi...

>> I am sure it is just because it does not know XHTML DTD.  What does
>> it try to parse against anyway if no DTD is defined?  Is there some
>> kind of default XML DTD?

> It doesn't matter.  You have a <meta> tag that is not closed.
> NO DTD will ever make that document valid XML, period.

> Add a </meta> right after the <meta>, or change the closing >
> of the meta itself into />.

All meta tags in that document are self-closing ones.  I am filtering
with HTML-Tidy before and after processing within Python.

...

And now, after further testing, it turns out that in my test cases I
actually injected two non-self-closing meta tags into the head of the
document before running the XML parser on it.  They were hidden by
HTML-Tidy filtering the document as soon as it got written to the
target file.

>> Maybe this whole XML thing is just too complicated for me and I
>> should find something else to play with...

> This part is not too hard: *well-formed* comes before *valid*.

This is a good point.  Maybe the documentation I have seen just makes
it look somewhat cryptic.  So many strange terms and acronyms.  Until
now I was under the impression that well-formedness is also defined
somehow in the DTD.
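
To illustrate the distinction, here is a minimal sketch of a
well-formedness check.  It uses the standard library's
xml.parsers.expat binding rather than xmlproc, which is an assumption
made only for illustration.  The helper name check_well_formed and the
sample documents are hypothetical.  Expat is a non-validating parser,
so the DOCTYPE line is read but the DTD is never consulted, and the
unclosed <meta> is rejected either way.

```python
import xml.parsers.expat

# Hypothetical sample documents; only the <meta> tag differs.
DOCTYPE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
           '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')
GOOD = '<html><head><meta name="a" content="b" /></head><body></body></html>'
BAD = '<html><head><meta name="a" content="b"></head><body></body></html>'


def check_well_formed(text):
    """Return None if text is well-formed XML, else a short error string."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # Second argument True marks this as the final chunk of input.
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return "line %d, column %d: %s" % (err.lineno, err.offset, message)
    return None


if __name__ == "__main__":
    # The DOCTYPE makes no difference to the well-formedness result.
    for label, doc in [("good", GOOD), ("bad", BAD),
                       ("good + DOCTYPE", DOCTYPE + GOOD),
                       ("bad + DOCTYPE", DOCTYPE + BAD)]:
        print(label, "->", check_well_formed(doc))
```

Running a check like this on the document before any filtering by
HTML-Tidy would point at the offending tag directly, instead of having
the problem hidden by the later cleanup step.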

-- 
 "pienena   /  Paavo "Rainbow Rat" Hartikainen
  minusta  /  E-mail: pahartik at sci.fi
  tulee   /  URL: http://www.sci.fi/~pahartik/
  rotta" /  EFnet: pahartik at #Atari and #LionKing


