[Tutor] man pages parsing (still)
Kent Johnson
kent37 at tds.net
Tue Sep 12 00:45:25 CEST 2006
Tiago Saboga wrote:
> Ok, the guilty line (279) has a "&copy;" that was probably defined in the dtd,
> but as it doesn't know what is the right dtd... But wait... How does python
> read the dtd? It fetches it from the net? I tried it (disconnected) and the
> answer is yes, it fetches it from the net. So that's the problem!
>
> But how do I avoid it? I'll search. But if you can spare me some time, you'll
> make me a little happier.
>
> [1] - The line is as follows:
> <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
> "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
I'm just guessing, but I think that with the right combination of
handlers and feature settings you can at least make the parser pass the
external entities through without fetching the DTD.
Take a look at these pages for some hints:
http://www.cafeconleche.org/books/xmljava/chapters/ch07s02.html#d0e10350
http://www.cafeconleche.org/books/xmljava/chapters/ch06s11.html
They are talking about Java, but the SAX interface is a cross-language
standard, so the names and semantics should be much the same.
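
For what it's worth, here is a minimal sketch of what that could look like
with the xml.sax module in the Python 3 standard library. It assumes the
expat-based parser that make_parser() normally returns, and it turns off the
feature_external_ges feature so the DTD is never requested. TextCollector,
parse_without_dtd and SAMPLE are made-up names for illustration. I believe an
entity that can no longer be resolved is then reported through skippedEntity()
instead of raising an error, but treat that as an assumption to check against
your own files.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Hypothetical handler: gathers text and records skipped entities."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skipped = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect and join later.
        self.chunks.append(content)

    def skippedEntity(self, name):
        # Called for entity references the parser did not expand.
        self.skipped.append(name)


def parse_without_dtd(data):
    """Parse XML bytes without loading external entities such as the DTD."""
    handler = TextCollector()
    parser = xml.sax.make_parser()
    # Do not load external general entities; this keeps the parser
    # from going out to the network for the DTD.
    parser.setFeature(feature_external_ges, False)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return "".join(handler.chunks), handler.skipped


# Made-up sample using the DOCTYPE quoted above.
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<refentry><para>Copyright &copy; 2006</para></refentry>
"""

text, skipped = parse_without_dtd(SAMPLE)
```

Recent Python versions (3.7.1 and later) already leave this feature off by
default, so the explicit setFeature() call mostly documents the intent. If you
need the entities expanded rather than skipped, the other route is an
EntityResolver that returns a local copy of the DTD.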
Kent