HTMLParser tag contents

Grant Griffin g2 at
Wed May 10 16:31:29 EDT 2000

Paul Prescod wrote:
> Grant Griffin wrote:
> >
> > Therefore, for Python 1.6, I would like to recommend that SGMLParser be
> > modified to provide a method called "get_tag_contents" (or whatever)
> > which can be called at the point of any "end_xxx" to convey the tag's
> > contents (which would include not only text but contained tags and their
> > text.)  (The reason SGMLParser has to be modified is that its index into
> > its "rawdata" array is local to its parser routine.)
> You could be parsing a 100MB HTML/SGML document 1 K at a time. I don't
> think you want SGMLLIB to keep around the entire 100MB "just in case"
> you ask for the contents of the BODY tag.

I guess some sort of size limit could be put into SGMLParser's buffering
to cover all but the extreme cases.
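
For illustration, here is a minimal sketch of what a size-capped "tag contents" buffer might look like. It is an assumption-laden example, not the proposed SGMLParser change: it uses the modern html.parser.HTMLParser (sgmllib no longer exists in Python 3), and the class name ContentCapturingParser, the max_chars parameter, and the captured list are all hypothetical names invented for this sketch. Contents are rebuilt from parser events rather than sliced out of rawdata, so they are an approximation of the original markup (comments and entity spellings are not preserved).

```python
from html.parser import HTMLParser


class ContentCapturingParser(HTMLParser):
    """Hypothetical sketch: buffer each open tag's contents, up to max_chars."""

    def __init__(self, max_chars=65536):
        super().__init__(convert_charrefs=True)
        self.max_chars = max_chars
        self._open = []     # stack of [tag, pieces (or None if over limit), size]
        self.captured = []  # (tag, contents), contents is None if over the limit

    def _append(self, text):
        # Add text to every open tag's buffer; drop a buffer once it overflows.
        for entry in self._open:
            if entry[1] is None:
                continue
            entry[2] += len(text)
            if entry[2] > self.max_chars:
                entry[1] = None  # over the limit: stop keeping this one
            else:
                entry[1].append(text)

    def handle_starttag(self, tag, attrs):
        # The start tag belongs to the contents of its ancestors, not its own.
        self._append(self.get_starttag_text())
        self._open.append([tag, [], 0])

    def handle_data(self, data):
        self._append(data)

    def handle_endtag(self, tag):
        # Find the nearest open tag with this name; ignore stray end tags.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                break
        else:
            return
        del self._open[i + 1:]  # discard unclosed tags such as <br>
        _, pieces, _ = self._open.pop()
        contents = None if pieces is None else "".join(pieces)
        self.captured.append((tag, contents))
        self._append("</%s>" % tag)


if __name__ == "__main__":
    parser = ContentCapturingParser(max_chars=100)
    # Feed in pieces, the way a large document would arrive.
    parser.feed("<body><p>Hi <b>the")
    parser.feed("re</b></p></body>")
    parser.close()
    for tag, contents in parser.captured:
        print(tag, repr(contents))
```

Memory use here is bounded by max_chars times the nesting depth, since every open tag keeps its own copy. A tag whose contents exceed the cap is reported with None instead of text, which is one possible way of covering "all but extreme cases".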

Then again, maybe you're right: maybe the solution I had posted was
best. ;-)

   -exception-ly y'rs,

p.s.  BTW, how long do you 'spose it would take for somebody to _read_ a
web page containing 100MB HTML?!  (Lemme see...carry the six...that
would take...waitaminute...about... ;-)

Grant R. Griffin                                       g2 at
Publisher of dspGuru                 
Iowegian International Corporation
