[Tutor] man pages parsing (still)

Kent Johnson kent37 at tds.net
Tue Sep 12 00:45:25 CEST 2006


Tiago Saboga wrote:
> Ok, the guilty line (279) has a "&copy;" that was probably defined in the dtd, 
> but as it doesn't know what is the right dtd... But wait... How does python 
> read the dtd? It fetches it from the net? I tried it (disconnected) and the 
> answer is yes, it fetches it from the net. So that's the problem!
> 
> But how do I avoid it? I'll search. But if you can spare me some time, you'll 
> make me a little happier. 
> 
> [1] - The line is as follows:
> <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
>                    "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">

I'm just guessing, but I think that with the right combination of 
handlers and feature settings you can at least make the parser pass over 
the external entities without fetching the DTD.

Take a look at these pages for some hints:
http://www.cafeconleche.org/books/xmljava/chapters/ch07s02.html#d0e10350
http://www.cafeconleche.org/books/xmljava/chapters/ch06s11.html

They are talking about Java, but the SAX interface is a cross-language 
standard, so the names and semantics should be the same.
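
Here is a rough sketch of how that might look with Python's standard xml.sax 
module. It assumes the default expat-based parser; the sample document, the 
SkipCollector class and the parse_without_dtd function are made up for 
illustration. The idea is to turn off the external-general-entities feature 
so the parser does not go looking for the DTD, and to record entities it 
cannot expand through the skippedEntity callback instead of failing on them.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# Hypothetical sample: a DocBook fragment that uses an entity (&copy;)
# declared only in the external DTD.
DOC = b"""<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
    "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<refentry><para>Copyright &copy; 2006</para></refentry>
"""


class SkipCollector(ContentHandler):
    """Collect character data and the names of entities the parser skipped."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.skipped = []

    def characters(self, content):
        self.text.append(content)

    def skippedEntity(self, name):
        # Called for entity references the parser could not expand
        # because the external DTD was never read.
        self.skipped.append(name)


def parse_without_dtd(data):
    """Parse XML bytes without loading external entities or the DTD."""
    parser = xml.sax.make_parser()
    # Ask the parser not to load external general entities, which
    # includes the external DTD subset named in the DOCTYPE.
    parser.setFeature(feature_external_ges, False)
    handler = SkipCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return "".join(handler.text), handler.skipped


if __name__ == "__main__":
    text, skipped = parse_without_dtd(DOC)
    print(repr(text))
    print(skipped)
```

The skipped entity is simply dropped from the text here. If you need the 
character itself, you could map the recorded names to replacements in 
skippedEntity, or install an EntityResolver that returns a local copy of the 
DTD.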

Kent


