XML: Doctype http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
Ralf Schmitt
ralf at brainbot.com
Tue Jun 15 10:16:38 EDT 2004
"Thomas Guettler" <guettli at thomas-guettler.de> writes:
> Hi,
>
> I want to parse XHTML.
>
> The doctype is http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd.
>
> When I try to parse it with SAX, the parser
> tries to connect via httplib to www.w3.org.
>
> I downloaded all necessary DTDs and changed the
> link to "xhtml1-transitional.dtd", which is now read
> from the local filesystem.
>
> One thing I don't like: I need to change the XML file
> by hand (remove http://www.w3.org....). Is there
> a way to tell the parser that it should look in
> the local filesystem before trying to download them?
Here's some example code I posted a few days ago, which does exactly
what you want.
--------------------------------
from xml.sax import saxutils, handler, make_parser, xmlreader

class Handler(handler.ContentHandler):
    def resolveEntity(self, publicid, systemid):
        # Serve the DTD and entity files from the current directory,
        # using only the last path component of the system id.
        print "RESOLVE:", publicid, systemid
        return open(systemid[systemid.rfind('/')+1:], "rb")

    def characters(self, s):
        print repr(s)

doc = r'''<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<HTML>
&nbsp;ä
</HTML>
'''
h = Handler()
parser = make_parser()
parser.setContentHandler(h)
parser.setEntityResolver(h)
parser.feed(doc)
parser.close()
-------
Output:
RESOLVE: -//W3C//DTD XHTML 1.0 Transitional//EN http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
RESOLVE: -//W3C//ENTITIES Latin 1 for XHTML//EN xhtml-lat1.ent
RESOLVE: -//W3C//ENTITIES Symbols for XHTML//EN xhtml-symbol.ent
RESOLVE: -//W3C//ENTITIES Special for XHTML//EN xhtml-special.ent
u'\n'
u'\xa0'
u'\xe4'
u'\n'
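
The code above is Python 2 (print statements, u'' reprs). A rough Python 3
sketch of the same idea follows. The names LocalDTDHandler,
parse_with_local_dtds and dtd_dir are made up for illustration and are not
part of xml.sax. It assumes the DTD and entity files have already been
downloaded into dtd_dir. Note that since Python 3.7.1 the SAX parser skips
external entities unless feature_external_ges is switched on, so the sketch
enables it; that is only reasonable for documents you trust.

```python
import io
import os
from xml.sax import make_parser, handler


class LocalDTDHandler(handler.ContentHandler, handler.EntityResolver):
    """Collects character data and resolves external entities from dtd_dir.

    Hypothetical helper class for illustration only.
    """

    def __init__(self, dtd_dir):
        super().__init__()
        self.dtd_dir = dtd_dir
        self.chunks = []    # character data, in document order
        self.resolved = []  # (publicId, systemId) pairs seen by the resolver

    def resolveEntity(self, publicId, systemId):
        self.resolved.append((publicId, systemId))
        # Keep only the last path component of the system id, as in the
        # original example, and look it up in the local directory.
        name = systemId.rsplit("/", 1)[-1]
        with open(os.path.join(self.dtd_dir, name), "rb") as f:
            return io.BytesIO(f.read())

    def characters(self, content):
        self.chunks.append(content)


def parse_with_local_dtds(data, dtd_dir):
    """Parse XML bytes, loading DTDs and entity files from dtd_dir."""
    h = LocalDTDHandler(dtd_dir)
    parser = make_parser()
    # Off by default since Python 3.7.1; without it the resolver is not asked.
    # Assumption: the input is trusted, since this re-enables external entities.
    parser.setFeature(handler.feature_external_ges, True)
    parser.setContentHandler(h)
    parser.setEntityResolver(h)
    parser.parse(io.BytesIO(data))
    return h
```

Calling parse_with_local_dtds(data, ".") looks for the files in the current
directory, like the original; afterwards h.chunks holds the text pieces and
h.resolved the ids the parser asked for.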
>
> Regards,
> Thomas
--
brainbot technologies ag
boppstrasse 64 . 55118 mainz . germany
fon +49 6131 211639-1 . fax +49 6131 211639-2
http://brainbot.com/ mailto:ralf at brainbot.com