[Catalog-sig] start on static generation, and caching - apache config.

Jim Fulton jim at zope.com
Tue Jul 10 16:40:48 CEST 2007

On Jul 9, 2007, at 6:16 PM, Martin v. Löwis wrote:
> Ok. By "ATM", you mean July 9, 14:09 GMT?

Whenever I sent the note.

> Please take a look at
> http://ximinez.python.org/munin/localdomain/localhost.localdomain-load.html
> That was the most significant spike in the load today, and I surely
> would like to know what was causing it.

Maybe someone was trying to mirror pypi because it is too slow. :/  I  
suspect that there is a lot of this going on.

>> Requests for
>> http://cheeseshop.python.org/packages/2.5/z/zc.buildout/zc.buildout-1.0.0b28-py2.5.egg
>> take about 2.5 seconds.
> That is a static file, not going through PyPI. It's 168kiB, so that
> means you download with 67kB/s.
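
For reference, the arithmetic behind that figure can be checked with a
small sketch.  The function name is made up for illustration; the
numbers are the ones quoted above.  Dividing KiB by seconds gives
KiB/s, which is presumably what the "67" refers to; in decimal kB/s
the figure comes out slightly higher.

```python
def transfer_rate(size_kib, elapsed_s):
    """Return (KiB/s, kB/s) for a download of size_kib KiB taking elapsed_s seconds."""
    kib_per_s = size_kib / elapsed_s
    # 1 KiB = 1024 bytes, 1 kB = 1000 bytes
    kb_per_s = size_kib * 1024 / 1000 / elapsed_s
    return kib_per_s, kb_per_s


# Figures from the message: a 168 KiB egg downloaded in about 2.5 seconds.
kib_rate, kb_rate = transfer_rate(168, 2.5)
print("%.1f KiB/s, %.1f kB/s" % (kib_rate, kb_rate))
```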

OK. So I guess that is reasonable.  I'll note that in the long term,
we'll probably want to create mirrors to get better locality and thus
faster downloads and to prevent excessive bandwidth consumption for

>> Requests for http://www.python.org/pypi/ take
>> about 10 seconds.
> Why does that matter for setuptools? Does setuptools ever look at this
> page?

Phillip answered this.

>> I would say that these times are too long.
> Which of these precisely? Given that the actual file downloads in 2.5s,
> why is it important that the access to the page referring to it is 1/3s?

I guess all of them except the download.  Really, in the long run, I  
think the download time is too long too.  But that isn't my immediate  

BTW, the problem is exacerbated by packages like zc.buildout that  
include full documentation in their package pages.  Although even  
packages that don't do that seem to take about a third of a second.

>>> and how long should it
>>> take so that you would consider it fast enough?
>> IMO, it needs to be much much faster.  If we were serving pages
>> statically, we would be able to serve thousands of requests per
>> second.  There's nothing about this application that would make doing
>> that hard.
> I looked at the load preceding your message. Counting 1000 requests
> backwards from 14:09, we are at 16:07. So this system receives roughly
> 1000 requests per minute in its peak load, and it seems to be able to
> handle them (although the performance degrades at that point).

You can expect one of 2 things to happen:

- We'll fix the PyPI performance problems and load will increase  
dramatically, or

- We won't fix the problems and people will create alternate  
indexes.  This is already happening.  If that happens, the load will  
likely still increase, although not as rapidly.


>>> It's difficult to implement a system if the requirements are
>>> unknown to those implementing it.
>> I'm sorry, I've been talking about setuptools all along.  I thought the
>> use case was understood.
> I understand the use case, I just don't understand the performance
> requirements resulting out of it. If it's an automated build, why do
> you care if the page download completes in 0.3s or in 0.01s (it won't
> be much faster because of network roundtrip times).

Two reasons:

- People wait for these builds.  A build will usually make *many*
(tens or hundreds of) requests to pypi, checking for new versions of
software.  If there are no new versions, which will be the common
case, then nothing will be downloaded.  I'm most interested in
speeding up the checking.  Of course, a request for
http://www.python.org/pypi/ will usually be done once per build if any
of the packages in the build aren't in pypi (only once because
setuptools caches pages internally).  It would be nice to find a way
to stop doing this.

- If performance degrades, as it has often lately, then the times are
much longer.  In fact, requests over the last few weeks have often
timed out, making work grind to a halt.  It's important to realize
that demand will increase substantially, so whatever we do needs to
be scalable.
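
To put rough numbers on the first point, here is a small sketch of how
per-request latency adds up over a build.  It assumes the index checks
are made one after another (an assumption made for the estimate, not
something stated above), and the function name is invented for
illustration.  The request counts and the 0.3s and 0.01s latencies are
the ones mentioned in this thread.

```python
def checking_time(num_requests, seconds_per_request):
    """Total wall-clock time for sequential index checks, in seconds."""
    return num_requests * seconds_per_request


# "Tens or hundreds" of requests, at the two latencies under discussion.
for num_requests in (10, 100):
    for latency in (0.3, 0.01):
        total = checking_time(num_requests, latency)
        print("%4d requests at %.2fs each: %5.1fs of waiting"
              % (num_requests, latency, total))
```

With hundreds of checks, a third of a second per page is the
difference between a build that waits for about half a minute and one
that waits for about a second, before anything is downloaded at all.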

>> Also, I thought it was pretty obvious that the
>> performance we've been seeing lately is totally unacceptable.
> Define "lately". I never personally saw "totally unacceptable
> performance". Whenever I access the system, it behaves completely
> reasonably, much faster than any other web pages.

I've seen requests take minutes and time out with proxy errors many  
times over the last few weeks.  We, ZC, and many people we work with  
are at the point of building private indexes to get around the  
horrible performance.

> There were only two instances of "totally unacceptable performance",
> which were when the system was overloaded, and thrashing. I have
> since fixed these cases; they cannot occur again. So I don't think
> it is possible that the current installation shows "totally
> unacceptable" performance.

Maybe others can chime in.

>> If this was an application that had to be served dynamically (and of
>> course, parts of it are), then it would be much more interesting to
>> discuss targets for dynamic delivery.  The performance-critical parts of
>> this application -- the pages that setuptools uses -- can readily be
>> served statically, so it makes no sense not to do so.
> Except that somebody needs to implement that, of course.

And happily, someone is.
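
As a rough illustration only (this is not the code being worked on,
and the directory layout, page markup and function name are all
assumptions), generating static per-package link pages could look
something like the sketch below.  A web server could then serve the
resulting files directly, without running any application code per
request.

```python
import html
import os
import tempfile


def write_static_index(packages, root):
    """Write root/<name>/index.html for each package and return the paths.

    `packages` maps a project name to a list of (filename, url) pairs.
    """
    written = []
    for name, files in sorted(packages.items()):
        pkg_dir = os.path.join(root, name)
        os.makedirs(pkg_dir, exist_ok=True)
        # One link per downloadable file, escaped for safe inclusion in HTML.
        links = "\n".join(
            '<a href="%s">%s</a><br/>'
            % (html.escape(url, quote=True), html.escape(filename))
            for filename, url in files
        )
        page = (
            "<html><head><title>Links for %s</title></head><body>\n"
            "%s\n</body></html>\n" % (html.escape(name), links)
        )
        path = os.path.join(pkg_dir, "index.html")
        with open(path, "w", encoding="utf-8") as f:
            f.write(page)
        written.append(path)
    return written


# Example data, using the egg URL mentioned earlier in this message.
example = {
    "zc.buildout": [
        ("zc.buildout-1.0.0b28-py2.5.egg",
         "http://cheeseshop.python.org/packages/2.5/z/zc.buildout/"
         "zc.buildout-1.0.0b28-py2.5.egg"),
    ],
}

with tempfile.TemporaryDirectory() as root:
    paths = write_static_index(example, root)
    relative_paths = [os.path.relpath(p, root) for p in paths]
    with open(paths[0], encoding="utf-8") as f:
        demo_page = f.read()

print(relative_paths)
print(demo_page)
```

The pages would need regenerating whenever a package is added or a
release is uploaded; how and when that happens is left out of the
sketch.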

I realized this morning, in responding to a note from Philipp von
Weitershausen, that we really should take a step back and think about
an index to support setuptools, or, failing that, rethink the ways
we're using PyPI in light of the way setuptools works.


Jim Fulton			mailto:jim at zope.com		Python Powered!
CTO 				(540) 361-1714			http://www.python.org
Zope Corporation	http://www.zope.com		http://www.zope.org
