[Chicago] threads and xmlrpc?

Tim Gebhardt tim at gebhardtcomputing.com
Fri Jan 30 21:13:59 CET 2009


On Fri, Jan 30, 2009 at 1:19 PM, Lukasz Szybalski <szybalski at gmail.com>wrote:
>
> I see.
> Looking at this example on threads:
> http://www.ibm.com/developerworks/aix/library/au-threadingpython/index.html
>
> this is implemented within the thread. Each thread calls:
> url = urllib2.urlopen(host)
>
> Looking at the xmlrpc examples the way I connect is:
> http://docs.python.org/library/xmlrpclib.html
> pypi=xmlrpclib.ServerProxy(XML_RPC_SERVER)
>
> My question is: is urlopen(xyz) similar to ServerProxy(xyz)? If yes,
> then I can use it within each thread. But if it's not, and this will
> end up creating 5000 active connections, then that won't work.
>
> How would I know if ServerProxy returns an instance of the class vs. a
> request object?
>
> Thanks a lot,
> Lucas
>

I looked back at your original email and your original problem: have you
tried doing what you're doing without threads?  Is there a particular
reason you need 8 threads?  Have you tried with just 1, then 2, then 3,
etc.?

I'm asking because if PyPI has Keep-Alive enabled on their webserver, you'll
almost certainly get peak performance with a single thread hitting their
single endpoint.  The error you indicated (connection reset by peer) means
the remote side closed the connection, whether deliberately or as a side
effect of its configuration.

The webserver, or a proxy or firewall on their end, may be set up to reject
too many connection attempts in a certain time window.  Or they could have
Keep-Alive enabled and be severing connections that stay open too long.

As I mentioned in my last email, I used to screen-scrape a lot of stuff
with Python scripts.  I scraped financial news sites in hopes of one day
turning that into an automated trading system.  I scraped those sites very
aggressively and got a lot of errors as well, including connection reset
by peer.  Once I tuned down my aggressiveness, the errors pretty much went
away.

If you're not paying for the information and don't have an SLA with PyPI,
they're under no obligation to serve it to you in a timely or reliable
manner.  In that case you may want to try delaying the requests and using
only a single thread, and see if the errors go away.  The error you're
getting doesn't indicate a problem on your end; it points to something on
PyPI's end or in your ISP's transmission of your data.
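One way to put that advice into practice is a single-threaded loop that sleeps between requests and backs off when the peer resets the connection. This is only a sketch; the delay values, the retry scheme, and the fetch_all helper are illustrative, not part of the original scripts (package_releases was a method of PyPI's XML-RPC API at the time):

```python
import time
import xmlrpc.client  # "xmlrpclib" in Python 2


def fetch_all(proxy, names, delay=1.0, retries=3):
    """Query one package at a time, pausing between requests and
    backing off when the connection is reset by the peer."""
    results = {}
    for name in names:
        for attempt in range(retries):
            try:
                results[name] = proxy.package_releases(name)
                break
            except OSError:  # "connection reset by peer" lands here
                time.sleep(delay * (attempt + 1))  # back off a bit longer each time
        time.sleep(delay)  # be polite between packages
    return results


# Usage (endpoint illustrative):
# pypi = xmlrpc.client.ServerProxy("https://pypi.org/pypi")
# releases = fetch_all(pypi, ["lxml", "simplejson"])
```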

-Tim Gebhardt
tim at gebhardtcomputing.com

