sgmllib & parsing problem

Harvest T. Moon h4rv3st at web.de
Thu Aug 30 09:05:21 EDT 2001


I'm writing a client for the JammerIM system in Python (for BeOS, if anyone
cares), and the system is based on XML pieces.
I have subclassed SGMLParser from sgmllib and everything works fine,
but _one_ thing screws up the whole parser.
Mostly the pieces come as an ordinary nested tag structure like

<message from="..">
    <body>Test</body>
    <blabla>asd</blabla>
    <nothing>important</nothing>
    <really>stupid</really>
</message>

That works fine, and I get a nicely structured MsgObject with
.SubElements() and so on.
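A minimal sketch of the stack-based tree building described above. Since sgmllib was removed in Python 3.0, this swaps in html.parser.HTMLParser, whose handle_starttag / handle_endtag / handle_data callbacks play roughly the same role as SGMLParser's unknown_starttag / unknown_endtag / handle_data. The Node and TreeBuilder classes are hypothetical illustrations, with Node.sub_elements() standing in for MsgObject.SubElements().

```python
from html.parser import HTMLParser


class Node:
    """Hypothetical stand-in for a MsgObject: tag, attributes, children, text."""

    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)
        self.children = []
        self.text = ""

    def sub_elements(self):
        return self.children


class TreeBuilder(HTMLParser):
    """Builds a Node tree by keeping a stack of currently open elements."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        if self.stack:
            self.stack[-1].children.append(node)  # nest under the open parent
        else:
            self.root = node
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop only when the end tag matches the innermost open element;
        # an element whose end tag never arrives stays on the stack.
        if self.stack and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack:
            self.stack[-1].text += data.strip()  # ignore indentation whitespace


msg = """<message from="..">
    <body>Test</body>
    <blabla>asd</blabla>
    <nothing>important</nothing>
    <really>stupid</really>
</message>"""

p = TreeBuilder()
p.feed(msg)
p.close()
print([c.tag for c in p.root.sub_elements()])  # ['body', 'blabla', 'nothing', 'really']
```

Every start tag pushes and every matching end tag pops, so the tree is only correct as long as each element really does get an end tag.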
But sometimes a tag carries no content, only attributes, so it comes as

<strange id="0815" thread="123" />

It's quite clear to me that there is no closing tag, but SGMLParser
doesn't see the trailing '/' and treats it as an ordinary start tag. So the
tag never gets closed, "strange" stays on the stack forever, and the whole
object goes down the drain.
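One possible workaround that keeps sgmllib in place is to expand each empty-element tag into an explicit start/end pair before passing the text to feed(), so the parser sees an ordinary end tag. This is a minimal regex-based sketch, not an sgmllib feature. It assumes attribute values never contain '<' or '>' and that a tag is never split across two feed() calls. expand_empty_tags is a hypothetical helper name.

```python
import re

# Matches an empty-element tag such as <strange id="0815" thread="123" />.
# Assumption: attribute values never contain '<' or '>'.
_EMPTY_TAG = re.compile(r"<([A-Za-z_][\w.:-]*)([^<>]*?)\s*/>")


def expand_empty_tags(text):
    """Rewrite <tag attrs/> as <tag attrs></tag> so that a parser which
    only understands start and end tags still closes the element."""
    return _EMPTY_TAG.sub(r"<\1\2></\1>", text)


print(expand_empty_tags('<strange id="0815" thread="123" />'))
# <strange id="0815" thread="123"></strange>
```

Usage would be along the lines of `parser.feed(expand_empty_tags(chunk))`. Ordinary start tags and end tags pass through unchanged, because the pattern requires a literal `/>` and cannot match a tag that begins with `</`.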

How can I make SGMLParser see that '/', or handle a tag ending in '/>'
as a standalone (empty) tag?
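For comparison, Python 3's html.parser.HTMLParser (the successor to sgmllib, which was removed in Python 3.0) recognizes `<tag ... />` and dispatches it to handle_startendtag. Its default implementation simply calls handle_starttag followed by handle_endtag. This sketch overrides it so that a standalone tag is recorded as a complete element and never pushed onto the stack. This is a swapped-in parser, not an SGMLParser option, and MessageParser is a hypothetical name.

```python
from html.parser import HTMLParser


class MessageParser(HTMLParser):
    """Tracks open tags on a stack; empty-element tags never touch it."""

    def __init__(self):
        super().__init__()
        self.stack = []       # names of currently open elements
        self.standalone = []  # (tag, attrs) for each <tag ... /> seen

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_startendtag(self, tag, attrs):
        # Called for <tag ... />: record it as a complete element
        # instead of opening it, so nothing is left on the stack.
        self.standalone.append((tag, dict(attrs)))


p = MessageParser()
p.feed('<message from=".."><strange id="0815" thread="123" /><body>Test</body></message>')
p.close()
print(p.stack)       # []
print(p.standalone)  # [('strange', {'id': '0815', 'thread': '123'})]
```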

regards,
Harvest T. Moon
