The limit of 2 connections per host comes from the HTTP RFC:<div><a href="http://www.faqs.org/rfcs/rfc2068.html">http://www.faqs.org/rfcs/rfc2068.html</a><br></div><div><br></div><div>See section 8.1.4.</div><div><br></div><div>
The RFC says a single-user client should maintain at most 2 connections with any server or proxy, and a lot of HTTP client libraries obey this. I know for a fact that the .NET web client class does. I'm not sure what Python does, so I won't comment on it.</div>
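<div><br></div><div>To make the limit concrete, here is a minimal sketch (Python 3, and not how any particular library actually does it) of a client capping itself at 2 in-flight requests per host. HostLimiter and MAX_PER_HOST are made-up names, and fetch stands for whatever callable really performs the request:</div>
<pre>
```python
import threading
from urllib.parse import urlsplit

MAX_PER_HOST = 2  # the figure from RFC 2068, section 8.1.4


class HostLimiter:
    """Hypothetical helper: at most `limit` calls per host run at once."""

    def __init__(self, limit=MAX_PER_HOST):
        self._limit = limit
        self._lock = threading.Lock()
        self._slots = {}  # host -> semaphore shared by all requests to it

    def slot(self, url):
        # Create the semaphore for a host the first time it is seen.
        host = urlsplit(url).netloc
        with self._lock:
            if host not in self._slots:
                self._slots[host] = threading.BoundedSemaphore(self._limit)
            return self._slots[host]

    def run(self, url, fetch):
        # Blocks while `limit` requests to the same host are in flight.
        with self.slot(url):
            return fetch(url)
```
</pre>
<div>Each worker thread would call limiter.run(url, fetch) instead of calling fetch(url) directly. Requests to different hosts never wait on each other.</div>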
<div><br></div><div>This is one of the reasons why a lot of HTTP client libraries create their "request" objects through a factory rather than letting you instantiate the class directly:</div><div><br></div><div><div>
>>> import urllib2</div><div>>>> f = urllib2.urlopen('<a href="http://www.python.org/">http://www.python.org/</a>') # returns a file-like response object</div><div>>>> print f.read(100)</div><div><br>
</div><div>Rather than something like this (hypothetical: urllib2.Request has no open() method):</div><div>>>> import urllib2</div><div>>>> r = urllib2.Request("<a href="http://www.python.org">http://www.python.org</a>")</div><div>>>> print r.open().read(100)</div>
<div><br></div><div>The Java and .NET HTTP client libraries I've used all implement it in a similar way, because a factory makes it easier to set up things like connection limits and keep-alive.</div><div><br></div><div>In any case, from my Python web scraping days with httplib2, I found that waiting 1 second after every request to a particular host reduced the number of timeouts and request errors.</div>
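<div><br></div><div>A rough sketch of that per-host pause, written for Python 3 rather than the httplib2 code I actually used. HostThrottle is a made-up name, opener is any callable that takes a URL and performs the request, and the clock and sleep parameters exist only so the delay can be faked in tests:</div>
<pre>
```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Hypothetical helper: keep requests to one host `delay` seconds apart."""

    def __init__(self, delay=1.0, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last_done = {}  # host -> time the previous request finished

    def fetch(self, url, opener):
        host = urlsplit(url).netloc
        last = self._last_done.get(host)
        if last is not None:
            # Wait out whatever is left of the pause for this host.
            remaining = self.delay - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
        try:
            return opener(url)
        finally:
            # Record the finish time even if the request raised.
            self._last_done[host] = self._clock()
```
</pre>
<div>Usage would look something like HostThrottle().fetch("http://www.python.org/", urllib.request.urlopen), where urllib.request.urlopen is the Python 3 counterpart of urllib2.urlopen. The pause is tracked per host, so requests to other hosts are not slowed down.</div>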
<div><br></div><div>-Tim Gebhardt</div><div><a href="mailto:tim@gebhardtcomputing.com">tim@gebhardtcomputing.com</a></div></div><div><br></div><div><br><div class="gmail_quote">On Thu, Jan 29, 2009 at 10:46 PM, Lukasz Szybalski <span dir="ltr"><<a href="mailto:szybalski@gmail.com">szybalski@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">On Thu, Jan 29, 2009 at 9:02 AM, Tim Gebhardt <<a href="mailto:tim@gebhardtcomputing.com">tim@gebhardtcomputing.com</a>> wrote:<br>
> If xmlrpc obeys the HTTP standard connection limit, you're limited to 2<br>
> concurrent connections per host.<br>
<br>
Could you point me to some docs on this? What I am comparing it to is<br>
an apache server which can handle 100+ requests per second with no<br>
problems. With Project Gutenberg we are talking about TB of data. With<br>
Pypi we are talking about <kb per request and maybe about ~3kb per<br>
second. So I think I should be able to achieve bandwidth of about<br>
20kb/s minimum without anybody noticing any performance hits.<br>
<br>
I've emailed pypi, but if there are other things to consider, or you<br>
might know why such a low throughput on xmlrpc I would be interested<br>
to know more.<br>
<br>
Thanks,<br>
Lucas<br></blockquote></div><br></div>