[Tutor] Proxy

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Wed Aug 18 19:51:38 CEST 2004



On Wed, 18 Aug 2004, Ashkan Aliabadi wrote:

> Our university doesn't have wget installed on UNIX systems, so as a good
> programming practice, I managed to write it myself. As you know, it's
> quite simple, all it needs is a call to urllib.urlretrieve(urladdress).
> I added some features indeed, but there are two things I'm stuck in!
> First, we have proxy servers here, and I don't know how to transfer data
> through them ?!! How can i make my prog to transfer data through proxy
> servers ?!!!!


[text cut]

Hi Ashkan,

Breathe.  *grin*


According to the documentation in:

    http://www.python.org/doc/lib/module-urllib.html

you can pass in an additional "proxies" argument to urllib.urlopen().  Can
you use urlopen() instead of urlretrieve()?  Also, if you set an
environment variable called 'http_proxy', Python should pick that up.
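
Here is a rough sketch of both proxy routes.  It's written so that it also
runs on later Python versions, where these functions live in
urllib.request and the proxy is given to a ProxyHandler rather than passed
to urlopen(); that is a substitution, not what the documentation above
describes.  The proxy address 'proxy.example.com:3128' and the helper name
make_proxy_opener() are made up for illustration.

```python
import os

try:
    # Python 3 location of these names
    from urllib.request import ProxyHandler, build_opener, getproxies
except ImportError:
    # Python 2 location
    from urllib import getproxies
    from urllib2 import ProxyHandler, build_opener

# Hypothetical proxy address; substitute the real one for your site.
PROXY_URL = 'http://proxy.example.com:3128'


def make_proxy_opener(proxy_url):
    """Return an opener that sends http requests through proxy_url."""
    return build_opener(ProxyHandler({'http': proxy_url}))


opener = make_proxy_opener(PROXY_URL)
# opener.open('http://www.python.org/') would now go through the proxy.

# The environment-variable route: once http_proxy is set, urllib can
# discover the proxy on its own, which getproxies() lets us inspect.
os.environ['http_proxy'] = PROXY_URL
print(getproxies().get('http'))
```

Setting 'http_proxy' in the shell before starting Python has the same
effect as the os.environ line, and needs no code changes at all.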



> Secondly, how on Earth can I make it work in the background? I thought a
> simple call to thread.start_new_thread(function,args) would do the trick
> (as it's the case when you are running the prog in the interpreter line
> by line ) but when I'm about to run it from command line, it seems as my
> program finishes before the new thread catches the file from the
> internet, if you know what I mean ;D.


> import thread, urllib
> thread.start_new_thread(urllib.urlretrieve,('http://www.python.org',))


Instead of 'thread', you may want to use the 'threading' module --- it's a
higher-level interface for threads.

###
import threading, urllib
mythread = threading.Thread(target = urllib.urlretrieve,
                            args = ('http://www.python.org/',))
mythread.start()
###


You can ask the system to wait for that thread to finish, by using join():

###
mythread.join()
###
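
To see the waiting behavior without touching the network, here is a small
sketch that uses a stand-in function in place of urlretrieve().  Thread
objects provide join() for this; the names slow_download, results, and
worker are invented for the example.

```python
import threading
import time

results = []


def slow_download(url):
    # Stand-in for urlretrieve(): pretend the transfer takes a moment.
    time.sleep(0.2)
    results.append(url)


worker = threading.Thread(target=slow_download,
                          args=('http://www.python.org/',))
worker.start()

# The main thread is free to do other work here while the "download" runs.

worker.join()   # block until the worker thread has finished
print(results)
```

Without the join(), the print would most likely run before the worker had
appended anything.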


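But are you sure that your thread isn't finishing?  With 'threading', the
main program thread will normally wait for all its child threads to
finish, unless those child threads are "daemons".  The low-level 'thread'
module makes no such promise: on most systems, threads started with
thread.start_new_thread() are killed when the main thread exits, which
would explain what you're seeing from the command line.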
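
The daemon distinction can be checked directly on a Thread object.  This is
only a sketch: it uses the 'daemon' attribute found in later Python
versions (older ones spell it setDaemon()), and background_task is a
made-up placeholder.

```python
import threading
import time


def background_task():
    # Placeholder for real work such as a download.
    time.sleep(0.1)


# Threads created from the main thread are non-daemon by default, so the
# interpreter waits for them before exiting.
regular = threading.Thread(target=background_task)
print(regular.daemon)

# A daemon thread does not keep the program alive; the flag must be set
# before start() is called.
helper = threading.Thread(target=background_task)
helper.daemon = True

regular.start()
helper.start()
regular.join()
helper.join()
```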


urllib.urlretrieve() does write out the retrieved file to disk, but if you
don't give it a filename, it saves to a temporary file with a generated
name.  Perhaps urllib.urlretrieve() is writing the file, but not in the
place that you're expecting.  In fact, on my system, it dumps the file out
to '/tmp'!

For your purposes of implementing a wget-like utility, it might be better
to try passing the desired filename explicitly to urlretrieve() as an
additional 'filename' parameter.
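
As a sketch of the explicit-filename approach: the helper name fetch() and
the file names below are invented, and the demonstration copies a local
'file:' URL so that it runs without a network connection.  The import
fallback is there because urlretrieve() moved to urllib.request in later
Python versions.

```python
import os
import tempfile

try:
    from urllib.request import urlretrieve, pathname2url   # Python 3
except ImportError:
    from urllib import urlretrieve, pathname2url            # Python 2


def fetch(url, filename):
    """Save the contents of url under filename and return the saved path."""
    saved_path, headers = urlretrieve(url, filename)
    return saved_path


# Offline demonstration: create a small local file and fetch it by URL.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'source.html')
with open(source, 'w') as f:
    f.write('<html>hello</html>')

target = os.path.join(workdir, 'index.html')
saved = fetch('file:' + pathname2url(source), target)
print(saved)
```

For a real download you would call something like
fetch('http://www.python.org/', 'index.html'), and the file lands in the
current directory instead of a temporary location.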


Best of wishes to you.



