HTML Parser chokes on WordHTML...

Steven Taschuk staschuk at telusplanet.net
Sat May 3 00:05:46 EDT 2003


A couple of oversights in my previous comments:

Quoth I:
> Quoth Harald Massa:
  [...]
> Strictly speaking, anything inside <!-- --> is a comment and the
> parser should ignore it.

This is mostly true in XML, except inside <![CDATA[ ... ]]>
sections, where <!-- --> is literal character data rather than a
comment (such sections are, however, rarely used in HTML).
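
A minimal sketch of that CDATA exception, assuming Python's standard
xml.etree.ElementTree module (chosen here only for illustration; the
element name r is made up):

```python
import xml.etree.ElementTree as ET

# An ordinary comment is dropped by the default tree builder;
# only the character data after it is kept.
plain = ET.fromstring("<r><!-- a comment -->text</r>")

# Inside a CDATA section the same characters are literal data,
# so they come back as the element's text.
cdata = ET.fromstring("<r><![CDATA[<!-- not a comment -->]]></r>")

print(repr(plain.text))
print(repr(cdata.text))
```

The first element's text holds only what followed the comment; the
second holds the comment-like string itself.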

And as Andrew Clover pointed out, in SGML it is possible for
elements to be implicitly CDATA, by virtue of a declaration to
that effect in the DTD.
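
The HTML 4 DTD does this for SCRIPT and STYLE.  A small sketch of the
effect, assuming the html.parser module of current Python 3 rather
than the HTMLParser of this thread's era (the Recorder class is a
hypothetical helper, not part of the library):

```python
from html.parser import HTMLParser


class Recorder(HTMLParser):
    """Hypothetical helper: keeps comments and character data apart."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.data = []

    def handle_comment(self, data):
        self.comments.append(data)

    def handle_data(self, data):
        self.data.append(data)


rec = Recorder()
# The first <!-- --> sits inside <script>, whose content is CDATA;
# the second is an ordinary comment in normal markup.
rec.feed("<script><!-- not a comment --></script><!-- a comment -->")
rec.close()

print(rec.comments)
print("".join(rec.data))
```

Only the second <!-- --> reaches handle_comment; the one inside the
script element is delivered as plain character data.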

  [...]
> > again, <![if !supportLists]> does not look great, but should be legal
> > HTML - shouldn't it?
> 
> No: <![if ...]> isn't legal HTML, so HTMLParser quite properly
> rejects it.  The <! is legal only for starting a DOCTYPE
> declaration (and inside a DTD, which is not usually present in an
> HTML document).

... and for starting <![CDATA[ ... ]]>, and, in SGML, <![INCLUDE[
... ]]> and <![EXCLUDE[ ... ]]>, and perhaps other things I've
forgotten about.  But again, these are rarely used in HTML.
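
For anyone who wants to see what a parser does with Word's <![if ...]>
markup, here is a rough sketch.  It assumes the html.parser module of
current Python 3, which is more lenient than the HTMLParser discussed
above and does not raise an error on this input.  As far as I know,
which callback receives the declaration (unknown_decl or
handle_comment) differs between releases, so the hypothetical
MarkupLog class records both; the sample markup is made up too.

```python
from html.parser import HTMLParser


class MarkupLog(HTMLParser):
    """Hypothetical helper: logs non-standard <! ... > markup and text."""

    def __init__(self):
        super().__init__()
        self.markup = []
        self.text = []

    def unknown_decl(self, data):
        # Some releases report <![if ...]> here.
        self.markup.append(data)

    def handle_comment(self, data):
        # Other releases treat it as a bogus comment instead.
        self.markup.append(data)

    def handle_data(self, data):
        self.text.append(data)


log = MarkupLog()
log.feed("<p><![if !supportLists]>1.<![endif]>item</p>")
log.close()

print(log.markup)
print("".join(log.text))
```

Either way the conditional markup is kept out of the character data,
so the surrounding text survives intact.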

-- 
Steven Taschuk              Aral: "Confusion to the enemy, boy."
staschuk at telusplanet.net    Mark: "Turn-about is fair play, sir."
                             -- _Mirror Dance_, Lois McMaster Bujold





More information about the Python-list mailing list