[Tutor] Strategy to read a redirecting html page

Hugo Arts hugo.yoshi at gmail.com
Wed Jun 1 01:34:26 CEST 2011


On Wed, Jun 1, 2011 at 1:00 AM, Karim <karim.liateni at free.fr> wrote:
>
> Hello,
>
> I am having trouble reading an HTML page that redirects to a new page.
> I get the first warning/error page and not the page it redirects to.
> Should I request the same URL a second time, or should I loop until the
> page content is the correct one (checking by parsing it)?
> Do you have a better strategy, or is there a module that deals with this?
> I am using Python 2.7.1 on Ubuntu Linux 11.04 with the modules urllib2,
> urllib, etc.
> The web page is secured, but I have registered a password manager.
>

urllib2 works at the HTTP level, so unfortunately it can't catch
redirects that happen at the HTML level. You'll have to parse the page,
look for a <meta http-equiv="refresh"> tag, and fetch the URL given in
its content attribute. That's a pretty simple parsing job, probably
doable with a regex, but of course you're free to use a proper HTML
parser.
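
Here is a minimal sketch of that parsing step using the standard library's
HTML parser rather than a regex. The names MetaRefreshFinder and
find_meta_refresh are made up for illustration, and the code assumes the
content attribute has the usual "delay; url=target" shape. The imports fall
back to the Python 2.7 module names, since that is the version in question.

```python
import re

try:  # Python 3 module names
    from html.parser import HTMLParser
    from urllib.parse import urljoin
except ImportError:  # Python 2.7 module names
    from HTMLParser import HTMLParser
    from urlparse import urljoin


class MetaRefreshFinder(HTMLParser):
    """Remember the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # Assumed shape of content: "5; url=http://example.com/next"
        content = attrs.get("content") or ""
        rest = content.partition(";")[2]
        match = re.match(r"\s*url\s*=\s*(.*)", rest, re.IGNORECASE)
        if match:
            self.target = match.group(1).strip().strip("'\"")


def find_meta_refresh(html, base_url):
    """Return the absolute redirect URL found in html, or None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    if finder.target is None:
        return None
    # The target may be relative, so resolve it against the page's URL.
    return urljoin(base_url, finder.target)
```

You would fetch the page with the opener you already have, pass the body
and the original URL to find_meta_refresh, and request the returned URL if
it is not None. Capping the number of hops is a sensible guard against
pages that refresh to each other forever.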

Hugo

