sgmlParser infinite loop? How to empty and re-use parser object?
bernie at 3captus.com
Sat Mar 23 01:17:22 CET 2002
Nick Arnett wrote:
> Anyone know of circumstances under which sgmlParser will hang, presumably in
> an infinite (well, exceeding my patience, anyway) loop? I don't seem to be
> able to reliably reproduce this, but occasionally during processing of a
> large number of pages, I seem to get stuck in it. I'm doing very simple
> parsing, basically just extracting the contents of tables. I'll re-try the
> same set of documents and it'll hang in a different spot. If it weren't so
> unpredictable and infrequent, I'd dig into it with the debugger...
My record is 1300 web pages, and I did not run into the problem you've
described.
> Still fairly new to Python... I'm wondering if I should be re-using a parser
> object for each document I'm processing in a loop -- and wondering if the
> fact that I'm not is causing these freezes. But if I call it without
> re-instantiating it, I get the same text parsed again... and I can't see how
> to tell it to not do that. Calling reset doesn't seem to do the trick,
> even though I seem to have the appropriate reset method that calls the
> parent reset.
I am not reusing the parser object in my script either (but you can).
You should consider posting the program fragment you used.
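
For what it's worth, here is a minimal sketch of how reusing one parser
object can work. It is not the original poster's code, and it rests on some
assumptions: it uses html.parser.HTMLParser from current Python as a
stand-in for sgmllib's SGMLParser (which is not available in Python 3), and
the names CellExtractor and extract_cells are made up for illustration. The
idea is that any state the subclass accumulates is cleared in its own
reset(), so a reset()/feed()/close() cycle per document should not carry
text over from the previous one.

```python
from html.parser import HTMLParser


class CellExtractor(HTMLParser):
    """Hypothetical example class: collects the text of table cells."""

    def reset(self):
        # The base __init__ calls reset(), so per-document state can be
        # initialised here and is cleared again on every later reset().
        super().reset()
        self.cells = []
        self._in_cell = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._in_cell = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self.cells.append("".join(self._buf).strip())
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._buf.append(data)


def extract_cells(parser, page):
    """Reuse one parser object: reset, feed the whole page, then close."""
    parser.reset()   # drops buffered input and the cells from the last page
    parser.feed(page)
    parser.close()   # forces any buffered input to be processed
    return list(parser.cells)


parser = CellExtractor()
pages = [
    "<table><tr><td>a</td><td>b</td></tr></table>",
    "<table><tr><td>c</td></tr></table>",
]
results = [extract_cells(parser, page) for page in pages]
```

If the same text shows up again on the second document, one likely cause
(a guess, without seeing the actual code) is a list or string kept on the
subclass that reset() never clears.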
> Thanks for tips.
> narnett at mccmedia.com
> (408) 904-7198
There are three schools of magic. One: State a tautology, then ring
the changes on its corollaries; that's philosophy. Two: Record many
facts. Try to find a pattern. Then make a wrong guess at the next
fact; that's science. Three: Be aware that you live in a malevolent
Universe controlled by Murphy's Law, sometimes offset by Brewster's
Factor; that's engineering.
-- Robert A. Heinlein