[wwwsearch-general] (no subject)
John J Lee
jjl at pobox.com
Sun Aug 31 07:05:05 EDT 2008
On Fri, 29 Aug 2008, bruce wrote:
> Hi john.
>
> Thanks for your reply. I tried your suggestion of using RobustFactory, and
> still get a badly maligned html back!!! The html is listed below. I would
That's expected -- RobustFactory only affects how mechanize parses the
HTML. It does not modify the HTML you get back.
> have thought that the mech process would have interpreted the
> http-equiv="refresh". Unfortunately, mechanize apparently isn't able to
> handle a "<meta http-equiv="refresh" url="/foo/..."> when it's inside the
> <body> of the html...
Yes, only the head element is read (albeit with a slightly fuzzy
definition of "head element").

In a theoretical future unstable branch, that might change, but currently
mechanize doesn't try all that hard to work well with bad HTML.

Currently, you have to work around this kind of issue. You can perform
the refresh manually, or modify the HTML and call .set_response(), or
replace the HTTPEquivProcessor with your own (you could use
HTTPEquivProcessor itself -- you can pass a parser factory function to its
constructor).
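
For the first option (performing the refresh manually), here is a minimal
sketch, not mechanize's own code. It uses only the Python 3 standard
library to find a meta refresh tag anywhere in the page, including the
body. The names MetaRefreshFinder and find_refresh_url and the example.com
URL are made up for illustration, and accepting a bare url attribute (as
in the tag quoted above) is an assumption about what the page contains.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshFinder(HTMLParser):
    """Record the target of the first meta refresh tag, wherever it appears."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # Standard form: content="5; url=/next"
        content = attrs.get("content") or ""
        _, _, rest = content.partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            self.target = rest[4:].strip("'\" ")
        elif attrs.get("url"):
            # Non-standard form with a bare url attribute (assumed here).
            self.target = attrs["url"]


def find_refresh_url(html, base_url):
    """Return the absolute refresh URL found in html, or None if there is none."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    if finder.target is None:
        return None
    return urljoin(base_url, finder.target)


if __name__ == "__main__":
    page = '<html><body><meta http-equiv="refresh" content="0; url=/next"></body></html>'
    print(find_refresh_url(page, "http://example.com/a/b"))
```

With a mechanize Browser br, you would presumably pass the (decoded)
response body and br.geturl() to find_refresh_url, and br.open() the
result if it is not None.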
John