XML: Doctype http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd

Ralf Schmitt ralf at brainbot.com
Tue Jun 15 16:16:38 CEST 2004

"Thomas Guettler" <guettli at thomas-guettler.de> writes:

> Hi,
> I want to parse XHTML.
> The doctype is http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd.
> When I try to parse it with SAX, the parser
> tries to connect via httplib to www.w3.org.
> I downloaded all necessary DTDs and changed the
> link to "xhtml1-transitional.dtd", which is now read
> from the local filesystem.
> One thing I don't like: I need to change the xml file
> by hand (remove http://www.w3c.org....). Is there
> a way to tell the parser that it should look into
> the local filesystem before trying to download them?

Here's some example code I posted a few days ago, which does exactly
what you want.

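from __future__ import print_function  # print() works on Python 2 and 3
from xml.sax import handler, make_parser

class Handler(handler.ContentHandler, handler.EntityResolver):
    def resolveEntity(self, publicid, systemid):
        print("RESOLVE:", publicid, systemid)
        # Drop everything up to the last '/' and open that file name from
        # the current directory instead of fetching the URL.
        return open(systemid[systemid.rfind('/')+1:], "rb")
    def characters(self, s):
        print(repr(s))

# Small sample document; &eacute; is defined in xhtml-lat1.ent.
doc = r'''<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><body><p>caf&eacute;</p></body></html>'''

h = Handler()
parser = make_parser()
# Python 3.7.1 and later do not load external entities (or the DTD) by default.
parser.setFeature(handler.feature_external_ges, True)
parser.setContentHandler(h)
parser.setEntityResolver(h)
parser.feed(doc)
parser.close()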

RESOLVE: -//W3C//DTD XHTML 1.0 Transitional//EN http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
RESOLVE: -//W3C//ENTITIES Latin 1 for XHTML//EN xhtml-lat1.ent
RESOLVE: -//W3C//ENTITIES Symbols for XHTML//EN xhtml-symbol.ent
RESOLVE: -//W3C//ENTITIES Special for XHTML//EN xhtml-special.ent
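The resolver above always opens a local file, so parsing fails if a copy is missing. A minimal sketch of what the question literally asks for (look in the local filesystem first, download only if nothing is found) could look like the following. `LocalFirstResolver`, `TextCollector` and `parse_with_local_dtds` are hypothetical names; the sketch assumes local copies are matched on the last path component of the system id, as in the example above, and Python 3.7.1+ behaviour for `feature_external_ges`.

```python
import io
import os
from xml.sax import handler, make_parser


class LocalFirstResolver(handler.EntityResolver):
    """Hypothetical resolver: prefer a local copy, else fall back to the URL."""

    def __init__(self, local_dir):
        # Absolute path, so nested entities resolve regardless of their base.
        self.local_dir = os.path.abspath(local_dir)
        self.resolved = []  # (system id, local path or None), for inspection

    def resolveEntity(self, publicid, systemid):
        # Match on the last path component of the system id.
        name = systemid.rsplit("/", 1)[-1]
        path = os.path.join(self.local_dir, name)
        if os.path.isfile(path):
            self.resolved.append((systemid, path))
            return path  # a file name is accepted as a system id
        # No local copy: return the original system id, so the parser
        # fetches it as it would without a resolver (network access).
        self.resolved.append((systemid, None))
        return systemid


class TextCollector(handler.ContentHandler):
    """Collects character data so expanded entities can be inspected."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with_local_dtds(xml_bytes, local_dir):
    resolver = LocalFirstResolver(local_dir)
    collector = TextCollector()
    parser = make_parser()
    # Needed on Python 3.7.1+, where external entities are off by default.
    parser.setFeature(handler.feature_external_ges, True)
    parser.setEntityResolver(resolver)
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(collector.chunks), resolver.resolved
```

For the XHTML case, put xhtml1-transitional.dtd and the three .ent files listed in the output above into one directory and pass it as `local_dir`; any entity not found there is still fetched from its original URL.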

> Regards,
>  Thomas

brainbot technologies ag
boppstrasse 64 . 55118 mainz . germany
fon +49 6131 211639-1 . fax +49 6131 211639-2
http://brainbot.com/  mailto:ralf at brainbot.com
