Parsing SGML document in Python program
eric.brunel at pragmadev.com
Wed Oct 16 17:15:51 CEST 2002
Ilya Shambat wrote:
> Hello all,
> I need to be able to parse an SGML document in a Python program. I
> need to know the syntax on how to do that. The project involves using
> a DTD, passed as a command line argument, to read all the SGML files
> that exist in a directory. Does anybody know how this is done?
There is an sgmllib module in the standard library, but it's not a full SGML
parser. SGML has a lot of funky features that are quite hard to parse and
that apparently were not considered in the sgmllib module. I have never used
it, but as far as I can see from the docs, it doesn't use a DTD, so it's
not really an SGML parser (XML seems to live well without a DTD, but doing
so in SGML may be considered heretical ;-). It may however be usable if
your documents are really simple.
I once had to do that and I couldn't find a parser directly usable from
Python. That may have changed since (just check the Vaults of Parnassus).
The solution I used at the time was to rely on an external parser that
produces easy-to-parse output. The one I used was nsgmls, part of James
Clark's SP project. You can find it at http://www.jclark.com/sp/ ; just test
it and you'll see that its output is really easy to get back into Python.
- Eric Brunel <eric.brunel at pragmadev.com> -
PragmaDev : Real Time Software Development Tools - http://www.pragmadev.com