[Catalog-sig] start on static generation, and caching - apache config.
Jim Fulton
jim at zope.com
Tue Jul 10 16:40:48 CEST 2007
On Jul 9, 2007, at 6:16 PM, Martin v. Löwis wrote:
...
> Ok. By "ATM", you mean July 9, 14:09 GMT?
Whenever I sent the note.
> Please take a look at
>
> http://ximinez.python.org/munin/localdomain/localhost.localdomain-load.html
>
> That was the most significant spike in the load today, and I surely
> would like to know what was causing it.
Maybe someone was trying to mirror pypi because it is too slow. :/ I
suspect that there is a lot of this going on.
>
>> Requests for
>> http://cheeseshop.python.org/packages/2.5/z/zc.buildout/zc.buildout-1.0.0b28-py2.5.egg
>> take about 2.5 seconds.
>
> That is a static file, not going through PyPI. It's 168 KiB, so that
> means you download at about 67 KiB/s.
OK. So I guess that is reasonable. I'll note that in the long term,
we'll probably want to create mirrors to get better locality, and thus
faster downloads, and to prevent excessive bandwidth consumption for
python.org.
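For reference, the throughput figure works out as follows. This is only a back-of-the-envelope check using the two numbers quoted above (168 KiB, about 2.5 seconds), treating 1 KiB as 1024 bytes:

```python
# Throughput of the static egg download quoted above:
# a 168 KiB file fetched in about 2.5 seconds.
size_kib = 168      # file size in KiB (1 KiB = 1024 bytes)
elapsed_s = 2.5     # observed download time in seconds

rate_kib_s = size_kib / elapsed_s
print(f"{rate_kib_s:.1f} KiB/s")
```
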
>
>> Requests for http://www.python.org/pypi/ take
>> about 10 seconds.
>
> Why does that matter for setuptools? Does setuptools ever look at this
> page?
Phillip answered this.
>> I would say that these times are too long.
>
> Which of these precisely? Given that the actual file downloads in
> 2.5s, why is it important that the access to the page referring to it
> is 1/3s?
I guess all of them except the download. Really, in the long run, I
think the download time is too long too. But that isn't my immediate
concern.
BTW, the problem is exacerbated by packages like zc.buildout that
include full documentation in their package pages. Even packages that
don't do that, though, seem to take about a third of a second.
>>> and how long should it
>>> take so that you would consider it fast enough?
>>
>> IMO, it needs to be much, much faster. If we were serving pages
>> statically, we would be able to serve thousands of requests per
>> second. There's nothing about this application that would make doing
>> that hard.
>
> I looked at the load preceding your message. Counting 1000 requests
> backwards from 14:09, we are at 16:07. So this system receives roughly
> 1000 requests per minute in its peak load, and it seems to be able to
> handle them (although the performance degrades at that point).
You can expect one of two things to happen:
- We'll fix the PyPI performance problems and load will increase
dramatically, or
- We won't fix the problems and people will create alternate
indexes. This is already happening. If that happens, the load will
likely still increase, although not as rapidly.
...
>>> It's difficult to implement a system if the requirements are
>>> unknown to those implementing it.
>>
>> I'm sorry, I've been talking about setuptools all along. I thought
>> the use case was understood.
>
> I understand the use case, I just don't understand the performance
> requirements resulting from it. If it's an automated build, why do
> you care whether the page download completes in 0.3s or in 0.01s (it
> won't be much faster than that because of network round-trip times)?
Two reasons:
- People wait for these builds. A build will usually make *many*
(tens or hundreds) of requests to pypi checking for new versions of
software. If there are no new versions, which will be the common
case, then nothing will be downloaded. I'm most interested in
speeding up the checking. Of course, a request for
http://www.python.org/pypi/ will usually be made once per build if any
of the packages in the build aren't in pypi (only once because
setuptools caches pages internally). It would be nice to find a way
to stop doing this.
- If performance degrades, as it has often lately, then the times are
much longer. In fact, requests over the last few weeks have often
timed out, making work grind to a halt. It's important to realize
that demand will increase substantially, so whatever we do needs to
be scalable.
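A rough sketch of why per-request latency matters for the checking step. It assumes, for illustration only, that a build makes its index lookups one after another, and it compares the roughly 1/3 s per page mentioned earlier with a hypothetical ~10 ms for a statically served page. The request counts are illustrative stand-ins for "tens or hundreds":

```python
def checking_time(num_requests: int, seconds_per_request: float) -> float:
    """Total time spent on sequential index lookups, ignoring downloads."""
    return num_requests * seconds_per_request

# Illustrative request counts standing in for "tens or hundreds" per build.
for n in (10, 100, 300):
    slow = checking_time(n, 1 / 3)   # ~1/3 s per dynamically generated page
    fast = checking_time(n, 0.01)    # assumed ~10 ms for a static page
    print(f"{n:>4} requests: {slow:6.1f} s dynamic vs {fast:5.2f} s static")
```

Under these assumptions a build making a hundred checks spends over half a minute just waiting on index pages, and that figure grows further whenever the server is under load.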
>> Also, I thought it was pretty obvious that the
>> performance we've been seeing lately is totally unacceptable.
>
> Define "lately". I never personally saw "totally unacceptable
> performance". Whenever I access the system, it behaves completely
> reasonably, much faster than any other web page.
I've seen requests take minutes and time out with proxy errors many
times over the last few weeks. We at ZC, and many people we work with,
are at the point of building private indexes to get around the
horrible performance.
> There were only two instances of "totally unacceptable performance",
> which were when the system was overloaded, and thrashing. I have
> since fixed these cases; they cannot occur again. So I don't think
> it is possible that the current installation shows "totally
> unacceptable" performance.
Maybe others can chime in.
>> If this was an application that had to be served dynamically (and of
>> course, parts of it are), then it would be much more interesting to
>> discuss targets for dynamic delivery. The performance-critical parts
>> of this application -- the pages that setuptools uses -- can readily
>> be served statically, so it makes no sense not to do so.
>
> Except that somebody needs to implement that, of course.
And happily, someone is.
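As a minimal sketch of what serving these pages statically could look like, assume a hypothetical in-memory mapping from package names to release file URLs (not PyPI's actual data model). One index.html per package is written to a directory that a web server such as Apache could then serve as plain files. The function name write_static_index and the page layout are illustrative only, not part of any existing tool:

```python
import html
import os
import tempfile


def write_static_index(releases: dict[str, list[str]], out_dir: str) -> None:
    """Write one static HTML page per package listing its download links.

    `releases` maps a package name to a list of file URLs (hypothetical data).
    The resulting files can be served directly by a web server without
    invoking the dynamic application.
    """
    for name, urls in releases.items():
        pkg_dir = os.path.join(out_dir, name)
        os.makedirs(pkg_dir, exist_ok=True)
        # One anchor per release file; the link text is the file name.
        links = "\n".join(
            f'<a href="{html.escape(u, quote=True)}">'
            f'{html.escape(u.rsplit("/", 1)[-1])}</a><br/>'
            for u in urls
        )
        title = html.escape(name)
        page = (
            f"<html><head><title>Links for {title}</title></head>"
            f"<body><h1>Links for {title}</h1>\n{links}\n</body></html>\n"
        )
        with open(os.path.join(pkg_dir, "index.html"), "w", encoding="utf-8") as f:
            f.write(page)


if __name__ == "__main__":
    demo = {
        "zc.buildout": [
            "http://cheeseshop.python.org/packages/2.5/z/zc.buildout/"
            "zc.buildout-1.0.0b28-py2.5.egg"
        ]
    }
    with tempfile.TemporaryDirectory() as d:
        write_static_index(demo, d)
        with open(os.path.join(d, "zc.buildout", "index.html"), encoding="utf-8") as f:
            print(f.read())
```

Regenerating a package's page whenever a release is uploaded (an assumption about how this would be wired up) would keep the frequent "is there a new version?" checks off the dynamic application entirely.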
I realized this morning, in responding to a note from Philipp von
Weitershausen, that we really should take a step back and think about
an index to support setuptools, or, failing that, rethink the ways
we're using PyPI in light of the way setuptools works.
Jim
--
Jim Fulton mailto:jim at zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org
Zope Corporation http://www.zope.com http://www.zope.org