[Tutor] parsing--is this right?
Danny Yoo
dyoo@hkn.eecs.berkeley.edu
Tue, 11 Jun 2002 14:17:32 -0700 (PDT)
On Tue, 11 Jun 2002, Danny Yoo wrote:
> And this is the source of the shallowness: we're just calling str().
> But instead of directly calling str() on each piece in between, the
> trick is to apply toXML() again to each inner piece! That way, we
> guarantee that the inner lists are also transformed properly:
>
[code of toXML()]
> def toXML(structure):
>     if type(structure) == type([]):
>         tag = structure[0][1:]  ## secondary slice removes leading '/'
>         text_pieces = [str(s) for s in structure[1:]]
>         return "<%(tag)s>%(text)s</%(tag)s>" % \
>                { 'tag' : tag,
>                  'text' : ''.join(text_pieces) }
>     else:
>         return str(structure)
> ###
> >>> def toXMLDeeply(structure):
> ...     if type(structure) == type([]):
> ...         tag = structure[0][1:]  ## secondary slice removes leading '/'
> ...         text_pieces = [toXML(s) for s in structure[1:]]
> ...         return "<%(tag)s>%(text)s</%(tag)s>" % \
> ...                { 'tag' : tag,
> ...                  'text' : ''.join(text_pieces) }
> ...     else:
> ...         return str(structure)
> ...
> >>> print toXMLDeeply(parsed_doc)
> <footnote><i>anitalicizedword</i><i>maybeanotheritalicizedword</i>text</footnote>
> ###
>
>
> The transformation wasn't perfect, because my parsing step had wiped out
> whitespace in my parsed_doc, so that needs to be fixed. Still, it almost
> works. *grin*
But note that if you have three levels of nesting, you'll still get weird
output: the third level comes out as the str() of a raw Python list
(brackets, quotes, and all) instead of being marked up properly. Yikes!
Ummm... I'll pretend that I didn't goof up, and say that if you understand
what's happening, you'll know how to fix this bug. *grin*
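
For anyone who wants to check their answer, here is one possible fix, offered
as a sketch rather than the definitive version: have toXMLDeeply() call
itself on each inner piece, instead of handing the piece to the shallow
toXML(). The nested_doc list below is made up for illustration (it is not
the parsed_doc from the earlier message), and the code uses isinstance() and
print() so that it runs under a current Python.

```python
def toXMLDeeply(structure):
    """Render a nested ['/tag', piece, piece, ...] list as tagged text."""
    if isinstance(structure, list):
        tag = structure[0][1:]  # the slice removes the leading '/'
        # Recurse with this same function, so every level gets marked up.
        text_pieces = [toXMLDeeply(s) for s in structure[1:]]
        return "<%(tag)s>%(text)s</%(tag)s>" % {
            'tag': tag,
            'text': ''.join(text_pieces),
        }
    else:
        return str(structure)


# Hypothetical three-level example: footnote -> i -> b
nested_doc = ['/footnote', ['/i', 'outer', ['/b', 'inner']], 'text']
print(toXMLDeeply(nested_doc))
# prints: <footnote><i>outer<b>inner</b></i>text</footnote>
```

Because the function now calls itself, the depth of the nesting no longer
matters: each inner list is handled by the same rule as the outermost one.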