How to parse XHTML with xml.parsers.xmlproc?
pahartik at sci.fi
Tue Sep 18 05:21:13 CEST 2001
Paavo Hartikainen writes:
> Now I seem to still have a problem with validating to DTD, while
> parsing alone without validating works already.
The validation part was broken in the older version (0.5.1-5) of the
python-xml package. It took some Debian knowledge and pushing around to
build the newer python-xml (0.6.6-2) package (from the Debian woody
source package) on a Debian potato system. Doing the same with the
python-distutils package, which does not seem to exist in potato at
all, did not require any tuning. After upgrading the python-xml package
from the potato version to the woody one, the problem disappeared.
> This is what fails in my test case:
So this time there was nothing wrong with my xmltest.py code.
> Complete, stand-alone simplified test case is available at
> <URL:http://www.sci.fi/~pahartik/files/xmltest.tar.gz> for now,
> including Python code, XHTML file, DTD catalog and related DTD
I had to fix the DTD/catalog file: it also needs entries for the files
that the main DTD file includes, or they will not be found.
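A rough way to spot missing catalog entries is to compare the public
identifiers that a DTD pulls in with the ones the catalog lists. The
sketch below is only an illustration and not part of xmlproc: the
function names are made up, it assumes catalog lines of the form
PUBLIC "public-id" system-id, and it only recognises double-quoted
parameter entity declarations.

```python
import re
import shlex

# Matches declarations such as:
#   <!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "xhtml-lat1.ent">
# Assumption: identifiers are double-quoted; single quotes are not handled.
ENTITY_PUBLIC = re.compile(r'<!ENTITY\s+%\s+\S+\s+PUBLIC\s+"([^"]+)"')


def catalog_public_ids(catalog_text):
    """Return the set of public identifiers listed on PUBLIC lines."""
    listed = set()
    for line in catalog_text.splitlines():
        fields = shlex.split(line)  # keeps the quoted identifier as one field
        if len(fields) == 3 and fields[0] == "PUBLIC":
            listed.add(fields[1])
    return listed


def unlisted_public_ids(dtd_text, catalog_text):
    """Return public identifiers the DTD includes but the catalog lacks."""
    listed = catalog_public_ids(catalog_text)
    return [pid for pid in ENTITY_PUBLIC.findall(dtd_text) if pid not in listed]
```

Calling unlisted_public_ids() with the text of xhtml1-strict.dtd and the
text of the catalog returns the identifiers that still need a PUBLIC
line; an empty list means every external entity set is covered.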
However, when I point to the catalog file like this:
cat = catalog.xmlproc_catalog("DTD/catalog", catalog.CatParserFactory())
The validator reaches the main DTD file (xhtml1-strict.dtd) just fine,
but DTD files included from within that file are searched for in
"DTD/DTD/".
Is this expected behaviour? This is what my "DTD/catalog" looks like:
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" xhtml1-strict.dtd
PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" xhtml-lat1.ent
PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN" xhtml-symbol.ent
PUBLIC "-//W3C//ENTITIES Special for XHTML//EN" xhtml-special.ent
I would think it should try to read "DTD/xhtml-lat1.ent" instead of
"DTD/DTD/xhtml-lat1.ent" and so on...
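One possible explanation, which I have not confirmed against the
xmlproc source, is that the system identifier gets resolved twice:
once against the catalog's location and then again against the DTD
file that contains the reference. The sketch below only reproduces
that arithmetic with posixpath; resolve() is a hypothetical helper, not
an xmlproc function.

```python
import posixpath


def resolve(base_file, system_id):
    """Resolve system_id relative to the directory holding base_file."""
    return posixpath.join(posixpath.dirname(base_file), system_id)


# Step 1: the catalog entry is resolved against the catalog's own location.
from_catalog = resolve("DTD/catalog", "xhtml-lat1.ent")

# Step 2 (assumed): the already-resolved path is resolved a second time,
# now against the DTD file that contains the entity reference.
resolved_twice = resolve("DTD/xhtml1-strict.dtd", from_catalog)

print(from_catalog)    # DTD/xhtml-lat1.ent
print(resolved_twice)  # DTD/DTD/xhtml-lat1.ent
```

If that is what happens, an absolute path would survive the second
step unchanged, because posixpath.join() discards the earlier parts
when a later one is absolute. Whether xmlproc behaves the same way with
an absolute catalog path is untested here.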
My quick hack to get around it was to create a symbolic link like this:
ln -s ./ DTD/DTD
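The same workaround can be set up from Python, for instance at the top
of a test script. This is a minimal sketch for systems that support
symbolic links; link_dtd_dir_to_itself() is a made-up helper name and
the default "DTD" directory is taken from the layout described here.

```python
import os


def link_dtd_dir_to_itself(dtd_dir="DTD"):
    """Create dtd_dir/DTD as a symlink to dtd_dir itself, if it is missing."""
    link_path = os.path.join(dtd_dir, "DTD")
    if not os.path.islink(link_path):
        # The target "." is interpreted relative to the link's own
        # directory, so DTD/DTD/<file> ends up naming DTD/<file>.
        os.symlink(".", link_path)
    return link_path
```

Calling it more than once is harmless, since an existing link is left
alone.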
I will update my test case archive and leave it on my site; maybe it
could help someone else get started with python-xml.
"pienena / Paavo "Rainbow Rat" Hartikainen
minusta / E-mail: pahartik at sci.fi
tulee / URL: http://www.sci.fi/~pahartik/
rotta" / EFnet: pahartik at #Atari and #LionKing