python proxy checker, change to threaded version
r0g at
Mon Dec 7 03:09:03 EST 2009
elca wrote:
> Hello all,
> I have a Python proxy checker.
> To speed up the checks, I decided to change it to a multithreaded version.
> The thread module is new to me. I tried several times to convert it to a
> threaded version
> and looked for a lot of information, but it is not so easy for a novice
> Python programmer.
> If anyone can help, I would really appreciate it!
> Thanks in advance!
> import urllib2, socket
> socket.setdefaulttimeout(180)
> # read the list of proxy IPs in proxyList
> proxyList = open('listproxy.txt').read()
>
> def is_bad_proxy(pip):
>     try:
>         proxy_handler = urllib2.ProxyHandler({'http': pip})
>         opener = urllib2.build_opener(proxy_handler)
>         opener.addheaders = [('User-agent', 'Mozilla/5.0')]
>         urllib2.install_opener(opener)
>         req=urllib2.Request('') # <---check whether proxy alive
>         sock=urllib2.urlopen(req)
>     except urllib2.HTTPError, e:
>         print 'Error code: ', e.code
>         return e.code
>     except Exception, detail:
>         print "ERROR:", detail
>         return 1
>     return 0
>
> for item in proxyList:
>     if is_bad_proxy(item):
>         print "Bad Proxy", item
>     else:
>         print item, "is working"
The trick to threads is to create a subclass of threading.Thread, define
the 'run' function and call the 'start()' method. I find threading quite
generally useful so I created this simple generic function for running
things in threads...
def run_in_thread( func, func_args=(), callback=None, callback_args=() ):
    import threading
    class MyThread ( threading.Thread ):
        def run ( self ):
            # Call function (an empty func_args simply means no arguments)
            result = func(*func_args)
            # Call callback, passing the result as its first argument
            if callback:
                callback(result, *callback_args)
    # Create the thread and start it; start() invokes run() in the new thread
    MyThread().start()
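Stripped of the callback plumbing, the subclass / run / start pattern described above can be sketched on its own as follows. This is only an illustration: SquareThread and its squaring job are made-up stand-ins, and the join() calls are an addition so the main thread waits for the results. It avoids print, so it should run unchanged on Python 2 or 3.

```python
import threading

class SquareThread(threading.Thread):
    """Hypothetical worker: squares one number in its own thread."""

    def __init__(self, n):
        threading.Thread.__init__(self)
        self.n = n
        self.result = None

    def run(self):
        # run() holds the work; start() arranges for it to execute
        # in the new thread.
        self.result = self.n * self.n

threads = [SquareThread(n) for n in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every worker before reading its result

results = [t.result for t in threads]
```

Storing the result on the thread object is one alternative to a callback when the caller is happy to wait with join().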
You need to pass run_in_thread a test function (plus its arguments) and,
if you want to get a result back from each thread, a callback function
(plus its arguments). The first parameter of the callback function
receives the result of the test function, so your callback would look
something like this...
def cb( result, item ):
    if result:
        print "Bad Proxy", item
    else:
        print item, "is working"
And your calling loop would be something like this...
for item in proxyList:
    run_in_thread( is_bad_proxy, func_args=[ item ], callback=cb, callback_args=[ item ] )
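To try the test function / callback / calling loop combination without touching the network, something like the sketch below may help. Several things here are assumptions and not part of the code above: fake_is_bad_proxy is a stand-in for is_bad_proxy, the proxy addresses are invented, the callback collects results into lists under a lock instead of printing, and this run_in_thread variant uses threading.Thread(target=...) rather than a subclass and returns the thread so the caller can join() it.

```python
import threading

def run_in_thread(func, func_args=(), callback=None, callback_args=()):
    """Variant helper: runs func in a thread and returns the started thread."""
    def worker():
        result = func(*func_args)
        if callback:
            callback(result, *callback_args)
    t = threading.Thread(target=worker)
    t.start()
    return t

def fake_is_bad_proxy(pip):
    # Stand-in check with no network access: addresses ending
    # in ':0' are treated as bad (nonzero result), all others as good.
    return 1 if pip.endswith(':0') else 0

good, bad = [], []
lock = threading.Lock()

def cb(result, item):
    # Callbacks run in the worker threads, so guard the shared lists.
    with lock:
        if result:
            bad.append(item)
        else:
            good.append(item)

proxy_list = ['10.0.0.1:8080', '10.0.0.2:0', '10.0.0.3:3128']  # made-up data
threads = [run_in_thread(fake_is_bad_proxy, [item], cb, [item])
           for item in proxy_list]
for t in threads:
    t.join()  # wait until every callback has fired
```

The order in which callbacks fire is not guaranteed, so sort the lists if you need a stable order.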
Also, you might want to limit the number of concurrent threads so as not
to overload your system. One quick and dirty way to do this, placed
inside the calling loop before each run_in_thread call, is...
import threading, time
if threading.activeCount() > 9: time.sleep(1)
Note that this is far from an exact method (it sleeps once and then
starts the next thread regardless), but it works well enough for
one-off scripting use.
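If a firmer cap is wanted, one possible alternative (not the method described above) is to hand out thread "slots" with a threading.BoundedSemaphore. In this sketch MAX_THREADS, check, and the sleep that stands in for the slow proxy test are all assumptions made for illustration; the running/peak counters exist only to show that the cap holds.

```python
import threading
import time

MAX_THREADS = 10  # assumed cap on concurrent workers
slots = threading.BoundedSemaphore(MAX_THREADS)
state_lock = threading.Lock()
running = 0   # workers currently active
peak = 0      # highest value 'running' ever reached
done = []

def check(item):
    global running, peak
    try:
        with state_lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for the slow proxy check
        with state_lock:
            running -= 1
            done.append(item)
    finally:
        slots.release()  # free the slot even if the check raised

threads = []
for item in range(30):
    slots.acquire()  # blocks while MAX_THREADS workers are active
    t = threading.Thread(target=check, args=(item,))
    t.start()
    threads.append(t)
for t in threads:
    t.join()
```

Because the main loop acquires a slot before starting each thread, it blocks instead of polling, and no more than MAX_THREADS checks are in flight at once.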
Hope this helps.
Suggestions from hardcore pythonistas on how to make my run_in_thread
function more elegant are also quite welcome :)
Roger Heathcote