[wwwsearch-general] (no subject)

John J Lee jjl at pobox.com
Sun Aug 31 13:05:05 CEST 2008


On Fri, 29 Aug 2008, bruce wrote:

> Hi john.
>
> Thanks for your reply. I tried your suggestion of using RobustFactory, and
> still get a badly maligned html back!!! The html is listed below. I would

That's expected -- RobustFactory only affects how mechanize parses the 
HTML.  It does not modify the HTML that you get back.


> have thought that the mech process, would have interpreted the
> "http-equiv="refresh" Unfortunately, mechanize apparently isn't able to
> handle a "<meta http-equiv="refresh" url="/foo/..."> when it's inside the
> <body> of the html...

Yes, only the head element is read (albeit with a slightly fuzzy 
definition of "head element").

In a theoretical future unstable branch, that might change, but currently 
mechanize doesn't try all that hard to work well with bad HTML.

Currently, you have to work around this kind of issue.  You can perform 
the refresh manually, or modify the HTML and call .set_response(), or 
replace the HTTPEquivProcessor with an instance of your own (you could 
use HTTPEquivProcessor itself -- you can pass a parser factory function 
to its constructor).
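
A minimal sketch of the first workaround (performing the refresh 
manually) might look like the following.  It uses only the Python 
standard library, not mechanize internals.  MetaRefreshFinder and 
find_refresh_url are hypothetical names made up for this illustration, 
and the code assumes the usual content="delay; url=..." form of the tag.

```python
# Sketch only: these helpers are hypothetical and not part of mechanize.
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshFinder(HTMLParser):
    """Record the first <meta http-equiv="refresh"> seen anywhere in the
    page, including inside <body>."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.content is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() == "refresh":
            self.content = attrs.get("content") or ""


def find_refresh_url(html, base_url):
    """Return the absolute refresh target, or None if there is no refresh."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    if finder.content is None:
        return None
    # content looks like "5; url=/foo/bar": the delay comes first.
    _, _, rest = finder.content.partition(";")
    rest = rest.strip()
    if rest[:4].lower() == "url=":
        rest = rest[4:]
    target = rest.strip().strip("'\"")
    # An empty target means "reload the current page".
    return urljoin(base_url, target) if target else base_url
```

With mechanize you would probably feed it the decoded body of the 
response together with response.geturl(), and, if it returns a URL, pass 
that URL to the browser's .open() yourself.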


John



