Field in SGML file

Peter Flynn peter at silmaril.ie
Fri Nov 1 18:07:35 EST 2002


Eric Brunel wrote:
> Ilya Shambat wrote:
> 
>>I need to be able to read a field in an SGML file with Python. Do you
>>know how to do that?
> 
> 
> What are you calling a "field"? Is it an element? An element's attribute? 
> And what is your SGML file like? Is it a simple, XML-like file, or does it 
> use SGML's funky features like shortrefs or usemaps?
> 
> If your file is simple enough, you may find it simpler to write your own 
> "micro-parser", doing exactly what you want to do. Parsing a full-featured 
> SGML file is *really* complicated but if the file's simple enough, writing 
> your own parser is probably the best solution.
> 
> If your file is too complicated, you'd better use an external parser and 
> analyse the parser's results with Python. I personally used nsgmls quite a 
> lot (http://www.jclark.com/sp/): it's fast, easy to use (at least if you 
> know what SGML is about...) and its results are easy to analyse.

I'd recommend using nsgmls no matter what the file size or complexity.
Get it from http://www.jclark.com/sp/index.htm

I don't know how you call external binaries from within Python, but you
need to issue the command string

	nsgmls -cCATALOG [sgml-dec] filename

where CATALOG is the name of a catalog file used to resolve entity
references and [sgml-dec] is an optional SGML Declaration filename.
You need to ask the people supplying the SGML file if you need these
or not (and if you do, they must supply you with them).
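For what it's worth, one way to issue that command from Python is the
standard subprocess module. This is only a sketch: it assumes nsgmls is
installed and on your PATH, and the function names and the "doc.sgml"
filename are made up for illustration.

```python
import subprocess


def build_nsgmls_command(filename, catalog=None, sgml_decl=None,
                         executable="nsgmls"):
    """Build the argument list for: nsgmls -cCATALOG [sgml-dec] filename."""
    cmd = [executable]
    if catalog is not None:
        # -c names the catalog file used to resolve entity references.
        cmd.append("-c" + catalog)
    if sgml_decl is not None:
        # The optional SGML Declaration goes before the document itself.
        cmd.append(sgml_decl)
    cmd.append(filename)
    return cmd


def run_nsgmls(filename, catalog=None, sgml_decl=None, executable="nsgmls"):
    """Run nsgmls and return (exit_status, esis_text, error_text).

    The exit status is returned rather than raised, on the assumption
    that you may still want the output when the parser reports errors.
    """
    cmd = build_nsgmls_command(filename, catalog, sgml_decl, executable)
    proc = subprocess.run(cmd, capture_output=True, text=True,
                          errors="replace")
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # Hypothetical file names; substitute whatever your supplier gave you.
    status, esis, errors = run_nsgmls("doc.sgml", catalog="CATALOG")
    print(status)
    print(errors)
```

Passing the arguments as a list avoids any shell quoting problems with
odd filenames.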

This will produce ESIS output (http://xml.coverpages.org/WG8-n931a.html)
which is very easily usable in a scripting language, and from which you
can extract the data you need.

///Peter



