[Catalog-sig] start on static generation, and caching - apache config.

"Martin v. Löwis" martin at v.loewis.de
Tue Jul 10 00:16:03 CEST 2007


> Lately, it has often taken minutes.  This has been the major problem. 
> At the best of times. well, I don't know when those are. :)
> 
> ATM, requests for http://www.python.org/pypi/zc.buildout take about 1/3
> second. 

Ok. By "ATM", you mean July 9, 14:09 GMT?

Please take a look at

http://ximinez.python.org/munin/localdomain/localhost.localdomain-load.html

That was the most significant spike in the load today, and I surely
would like to know what was causing it.

> Requests for
> http://cheeseshop.python.org/packages/2.5/z/zc.buildout/zc.buildout-1.0.0b28-py2.5.egg
> take about 2.5 seconds.

That is a static file, not going through PyPI. It's 168 kiB, so that
means you downloaded it at about 67 kB/s.
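As a back-of-the-envelope check of that figure (treating kiB and kB loosely, as the message does):

```python
# Rough transfer rate for the download quoted above.
size_kib = 168.0   # size of the .egg file, as stated
seconds = 2.5      # observed download time
rate = size_kib / seconds
print(rate)        # 67.2 -> roughly 67 kB/s
```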

> Requests for http://www.python.org/pypi/ take
> about 10 seconds.

Why does that matter for setuptools? Does setuptools ever look at this
page?

> I would say that these times are too long.

Which of these precisely? Given that the actual file download takes 2.5 s,
why is it important that access to the page referring to it takes 1/3 s?

>> and how long should it
>> take so that you would consider it fast enough?
> 
> IMO, it needs to be much much faster.  If we were serving pages
> staticially, we would be able to serve thousands of requests per
> second.  There's nothing about this application that would make doing
> that hard.

I looked at the load preceding your message. Counting 1000 requests
backwards from 14:09, we are at 16:07. So this system receives roughly
1000 requests per minute in its peak load, and it seems to be able to
handle them (although the performance degrades at that point).

Of these requests, 853 came from a single machine (x.y.237.218), which
appears to be an extraordinarily "big" client of PyPI. 45 requests
came from msnbot, 13 from Google, 44 requests from setuptools (from
different machines), and the rest from various web browsers and
crawlers.
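A per-client breakdown like the one above can be reproduced from the
access log in a few lines. This is only a sketch; the sample log lines
and combined-log layout are assumptions, not the actual server logs:

```python
# Count requests per client address in an Apache combined-format log,
# where the first whitespace-separated field is the client address.
from collections import Counter

def requests_per_client(lines):
    return Counter(line.split(None, 1)[0] for line in lines if line.strip())

# Hypothetical sample lines for illustration:
log = [
    'x.y.237.218 - - [09/Jul/2007:14:08:59 +0000] "GET /pypi HTTP/1.1" 200 1234',
    'x.y.237.218 - - [09/Jul/2007:14:09:00 +0000] "GET /pypi HTTP/1.1" 200 1234',
    '10.0.0.7 - - [09/Jul/2007:14:09:01 +0000] "GET /pypi/zc.buildout HTTP/1.1" 200 99',
]
print(requests_per_client(log).most_common(1))  # [('x.y.237.218', 2)]
```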

Also, there is a significant difference between throughput and latency:
"1000 requests per second" is a throughput requirement, whereas "faster
than 0.3 s" is a latency requirement. They are largely independent; see
below.
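The distinction can be made concrete with Little's law: with N requests
in flight and a per-request latency of L seconds, throughput is N/L
requests per second. The numbers below are purely illustrative:

```python
# Little's law: throughput = concurrency / latency.
def throughput(concurrency, latency_s):
    return concurrency / latency_s

# A single serial client is latency-bound:
print(throughput(1, 0.3))     # ~3.3 requests/s
# The same 0.3 s latency, with 300 requests in flight:
print(throughput(300, 0.3))   # 1000.0 requests/s
```

So a server can sustain thousands of requests per second while each
individual request still takes a noticeable fraction of a second.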

>> It's difficult to implement a system if the requirements are
>> unknown to those implementing it.
> 
> I'm sorry, I've been talking about setuptools all along.  I thought the
> use case was understood.

I understand the use case; I just don't understand the performance
requirements resulting from it. If it's an automated build, why do you
care whether the page download completes in 0.3 s or in 0.01 s? (It
won't get much faster than that because of network round-trip times.)

> Also, I thought it was pretty obvious that the
> performance we've been seeing lately is totally unacceptable.

Define "lately". I never personally saw "totally unacceptable
performance". Whenever I access the system, it behaves completely
reasonably, faster than most other web pages I use.

There were only two instances of "totally unacceptable performance",
both of which occurred when the system was overloaded and thrashing. I
have since fixed those cases; they cannot occur again. So I don't think
the current installation can show "totally unacceptable" performance.

> If this was an application that had to be served dynamically (and of
> course, parts of it are), then it would be much more interesting to
> discuss targets for dynamic delivery.  The performance-critical parts of
> this application -- the pages that setuptools uses -- can readily be
> served statically, so it makes no sense not to do so.

Except that somebody needs to implement that, of course.
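For the record, a minimal sketch of what such an implementation might
look like. Everything here -- function names, paths, and the page layout
-- is hypothetical, not actual PyPI code: the idea is simply to
pre-render the pages setuptools reads into files that Apache can serve
directly.

```python
# Sketch: pre-render per-package pages to a static tree for Apache.
# All names and paths are hypothetical, for illustration only.
import os

def render_package_page(name, releases):
    # releases: list of (filename, url) pairs for the package's files.
    links = "\n".join(
        '<a href="%s">%s</a>' % (url, fname) for fname, url in releases
    )
    return "<html><body><h1>%s</h1>\n%s\n</body></html>" % (name, links)

def write_static_pages(packages, outdir):
    # packages: dict mapping package name -> list of (filename, url).
    for name, releases in packages.items():
        pkgdir = os.path.join(outdir, name)
        os.makedirs(pkgdir, exist_ok=True)
        with open(os.path.join(pkgdir, "index.html"), "w") as f:
            f.write(render_package_page(name, releases))

page = render_package_page(
    "zc.buildout",
    [("zc.buildout-1.0.0b28-py2.5.egg",
      "/packages/2.5/z/zc.buildout/zc.buildout-1.0.0b28-py2.5.egg")])
print(page.splitlines()[0])  # <html><body><h1>zc.buildout</h1>
```

Regenerated on each upload, or from a short cron interval, such a tree
could carry the setuptools traffic statically while the genuinely
dynamic pages stay in the application.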

Regards,
Martin

