urllib[2]/redirection help

David Abrahams david.abrahams at rcn.com
Tue Feb 19 09:30:35 EST 2002


Hi,

I'm trying to solve what should be a simple problem with Python, but I'm
afraid my lack of knowledge of www protocols is making the solution elusive.
I'm trying to move a mailing list from yahoogroups to mailman, so I want to
collect an archive of past messages. A friend of mine wrote a simple script
a few months ago which would download the yahoogroups web pages for the
messages using urllib. Unfortunately, yahoogroups added periodic redirection
to pages containing advertisements, so the old script doesn't work. The
nature of the beast is that if you visit
http://groups.yahoo.com/group/boost/message/1000 in your web browser, you'll
often end up at
http://groups.yahoo.com/group/boost/auth?done=%2Fgroup%2Fboost%2Fmessage%2F1000
instead, a page containing an advertisement. The latter page contains a
link to "/group/boost/message/1000" which always takes you to the right
place. It looks to me as though that link needs to be in the context of the
ad page in order to work properly, because I can't figure out how to make
urllib retrieve the right one. I'm sure I'm just missing something simple.
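
To make the problem concrete, here is a minimal sketch of one approach, written against the current standard library (urllib.request and http.cookiejar, the successors of urllib2). It rests on two unconfirmed guesses about what "in the context of the ad page" means: that the ad page sets a cookie, and possibly that the site checks the Referer header. The helper names (interstitial_target, make_opener, fetch_message) are made up for illustration, and the "/auth?done=" pattern is taken only from the URL quoted above.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def interstitial_target(final_url):
    """Return the URL named by the 'done' parameter, or None if the
    URL does not look like the ad page described above."""
    parts = urllib.parse.urlsplit(final_url)
    if not parts.path.endswith("/auth"):
        return None
    done = urllib.parse.parse_qs(parts.query).get("done")
    if not done:
        return None
    # parse_qs has already percent-decoded the value,
    # e.g. "/group/boost/message/1000"; resolve it against the ad page.
    return urllib.parse.urljoin(final_url, done[0])


def make_opener():
    """Build an opener that keeps cookies between requests
    (assumption: the ad page sets a cookie the real page needs)."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def fetch_message(opener, url, max_tries=3):
    """Fetch url; if we land on the ad page, follow its 'done' target,
    sending the ad page as Referer (assumption: the site may check it)."""
    headers = {}
    for _ in range(max_tries):
        request = urllib.request.Request(url, headers=headers)
        with opener.open(request) as response:
            final_url = response.geturl()  # URL after any HTTP redirects
            body = response.read()
        target = interstitial_target(final_url)
        if target is None:
            return body
        url = target
        headers = {"Referer": final_url}
    raise RuntimeError("still redirected to the ad page after %d tries" % max_tries)
```

A call would look like fetch_message(make_opener(), "http://groups.yahoo.com/group/boost/message/1000"), reusing the same opener for every message so the cookies carry over. If neither guess is right, the retries will simply keep hitting the ad page and the function raises instead of saving the wrong content.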

Any help appreciated,
Dave


--
+---------------------------------------------------------------+
                  David Abrahams
      C++ Booster (http://www.boost.org)               O__  ==
      Pythonista (http://www.python.org)              c/ /'_ ==
  resume: http://users.rcn.com/abrahams/resume.html  (*) \(*) ==
          email: david.abrahams at rcn.com
+---------------------------------------------------------------+




