[Tutor] parsing--is this right?

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Tue, 11 Jun 2002 14:17:32 -0700 (PDT)


On Tue, 11 Jun 2002, Danny Yoo wrote:

> And this is the source of the shallowness: we're just calling str().
> But instead of directly calling str() on each piece in between, the
> trick is to apply toXML() again to each inner piece!  That way, we
> guarantee that the inner lists are also transformed properly:
>

[code of toXML()]
> def toXML(structure):
>    if type(structure) == type([]):
>        tag = structure[0][1:]  ## secondary slice removes leading '/'
>        text_pieces = [str(s) for s in structure[1:]]
>        return "<%(tag)s>%(text)s</%(tag)s>" % \
>               { 'tag' : tag,
>                 'text' : ''.join(text_pieces) }
>    else:
>        return str(structure)

> ###
> >>> def toXMLDeeply(structure):
> ...     if type(structure) == type([]):
> ...         tag = structure[0][1:]  ## secondary slice removes leading '/'
> ...         text_pieces = [toXML(s) for s in structure[1:]]
> ...         return "<%(tag)s>%(text)s</%(tag)s>" % \
> ...                { 'tag' : tag,
> ...                  'text' : ''.join(text_pieces) }
> ...     else:
> ...         return str(structure)
> ...
> >>> print toXMLDeeply(parsed_doc)
> <footnote><i>anitalicizedword</i><i>maybeanotheritalicizedword</i>text</footnote>
> ###
>
>
> The transformation wasn't perfect, because my parsing step had wiped out
> whitespace in my parsed_doc, so that needs to be fixed.  Still, it almost
> works.  *grin*


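But note that if you have three levels of nesting, you'll still get weird
output, because the third level won't be marked up properly: toXMLDeeply()
hands each inner piece to plain old toXML(), and toXML() only calls str()
on whatever is nested inside that piece.  Yikes!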
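
Here's a small sketch of what goes wrong, assuming the two functions quoted
above.  It's written for Python 3, so print is a function, and the
three-level list deep_doc is made up for illustration; it is not the
parsed_doc from the earlier message.

```python
def toXML(structure):
    # Shallow version: inner pieces are only passed through str().
    if type(structure) == type([]):
        tag = structure[0][1:]  # slice removes the leading '/'
        text_pieces = [str(s) for s in structure[1:]]
        return "<%(tag)s>%(text)s</%(tag)s>" % \
               {'tag': tag,
                'text': ''.join(text_pieces)}
    else:
        return str(structure)


def toXMLDeeply(structure):
    # Version from the earlier message: inner pieces go to toXML(),
    # so only one extra level of nesting gets marked up.
    if type(structure) == type([]):
        tag = structure[0][1:]
        text_pieces = [toXML(s) for s in structure[1:]]
        return "<%(tag)s>%(text)s</%(tag)s>" % \
               {'tag': tag,
                'text': ''.join(text_pieces)}
    else:
        return str(structure)


# Hypothetical three-level structure, invented for this example.
deep_doc = ['/footnote', ['/b', ['/i', 'deep'], 'bold'], 'text']

broken = toXMLDeeply(deep_doc)
print(broken)
# prints: <footnote><b>['/i', 'deep']bold</b>text</footnote>
```

The innermost list shows up as the str() of a Python list instead of as an
<i> element, which is the weird output in question.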

Ummm... I'll pretend that I didn't goof up, and say that if you understand
what's happening, you'll know how to fix this bug.  *grin*
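
For anyone who wants to check their answer, here is one possible fix,
offered as a sketch rather than the final word: have toXMLDeeply() call
itself on each inner piece, so every level gets the same treatment.  It's
written for Python 3, it uses isinstance() in place of the type()
comparison (my substitution, not part of the original code), and the test
list deep_doc is made up for illustration.

```python
def toXMLDeeply(structure):
    # Lists are treated as [tag, piece, piece, ...]; anything else is text.
    if isinstance(structure, list):
        tag = structure[0][1:]  # slice removes the leading '/'
        # Recurse with toXMLDeeply itself, not toXML, so that nesting of
        # any depth is transformed.
        text_pieces = [toXMLDeeply(s) for s in structure[1:]]
        return "<%(tag)s>%(text)s</%(tag)s>" % \
               {'tag': tag,
                'text': ''.join(text_pieces)}
    else:
        return str(structure)


# Hypothetical three-level structure, invented for this example.
deep_doc = ['/footnote', ['/b', ['/i', 'deep'], 'bold'], 'text']

fixed = toXMLDeeply(deep_doc)
print(fixed)
# prints: <footnote><b><i>deep</i>bold</b>text</footnote>
```

The recursion bottoms out at plain strings, which fall through to the
else branch, so no separate shallow helper is needed.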