SGML to Python memory tree

François Pinard pinard at
Wed May 17 14:03:21 CEST 2000

Hi, gang.  Now that we have started sharing little pieces of code, here is one of mine :-).

For the Translation Project, I have some Python code that reads the ESIS
output of `nsgmls' (the parser from James Clark's SP package) into a memory
tree.  It does not process attributes, as my little application did not
need any.  The code is surprisingly short, given what it does.  (It had to
work with Python 1.5.1, which is why it works around the missing `LIST.pop()'.)

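import os, string, sys

def _(text):
    # Stand-in for a gettext-style translation function.
    return text

def read_sgml_file(name):
    # Each element becomes a tuple (NAME, CHILD, ...): NAME is the
    # lower-cased element name, each CHILD a nested element tuple or a
    # string of character data.  Returns the root element tuple.
    stack = []
    current = []
    # Empty SGML_CATALOG_FILES to avoid docbk30, which raises some
    # not yet analysed interference.
    for line in os.popen('SGML_CATALOG_FILES= nsgmls %s' % name).readlines():
        if line[0] == '(':
            # Start tag: save the parent, open a new element.
            stack.append(current)
            current = [string.lower(line[1:-1])]
        elif line[0] == ')':
            # End tag: close the element, attach it to its parent.
            element = tuple(current)
            current = stack[-1]
            del stack[-1]           # no LIST.pop() in Python 1.5.1
            current.append(element)
        elif line[0] == '-':
            # Character data: undo the ESIS escapes for newline and tab,
            # and drop data that was only whitespace.
            line = line[1:-1]
            line = string.replace(line, '\\n', '\n')
            line = string.replace(line, '\\011', '\t')
            line = string.rstrip(line)
            if line:
                current.append(line)
        elif line[0] == 'C':
            # Final `C' line: the document conforms.  The outermost list
            # now holds only the root element.
            return current[0]
    sys.stderr.write(_("SGML in `%s' is not conformant.\n") % name)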
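For comparison, here is a minimal sketch of the same stack-based technique for a current Python 3, where `list.pop()' and string methods are available.  To keep it self-contained, the hypothetical `esis_to_tree()' takes the ESIS text as a string (for instance, `nsgmls' output captured beforehand) rather than running `nsgmls' itself.  Like the code above, it ignores attribute lines and other ESIS line types.

```python
def esis_to_tree(esis_text):
    """Build a tuple tree from nsgmls ESIS output given as a string.

    Hypothetical helper, not part of any library.  Returns the root
    element as (name, child, ...), or None if the final `C' line
    (document conforms) is missing.
    """
    stack = []      # parents of the element being built
    current = []    # element being built: [name, child, ...]
    for line in esis_text.splitlines():
        if not line:
            continue
        code, rest = line[0], line[1:]
        if code == '(':
            # Start tag: remember the parent, open a new element.
            stack.append(current)
            current = [rest.lower()]
        elif code == ')':
            # End tag: freeze the element and attach it to its parent.
            element = tuple(current)
            current = stack.pop()
            current.append(element)
        elif code == '-':
            # Character data: undo the newline and tab escapes,
            # skipping data that was only whitespace.
            text = rest.replace('\\n', '\n').replace('\\011', '\t').rstrip()
            if text:
                current.append(text)
        elif code == 'C':
            # The outermost list now holds only the root element.
            return current[0]
    return None


# Hand-written ESIS sample; a backslash-n inside data is an escaped newline.
sample = "(DOC\n(TITLE\n-Hello\\nworld\n)TITLE\n(P\n-Some\\011text\n)P\n)DOC\nC\n"
print(esis_to_tree(sample))
# → ('doc', ('title', 'Hello\nworld'), ('p', 'Some\ttext'))
```

Tuples keep finished elements immutable, while the element still being built stays a list so children can be appended to it.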

François Pinard
