[Catalog-sig] start on static generation, and caching - apache config.
Phillip J. Eby
pje at telecommunity.com
Sun Jul 8 21:33:36 CEST 2007
At 07:37 PM 7/8/2007 +0200, Martin v. Löwis wrote:
> > If they're effectively static, why can't Apache cache them?
>
>That's easy to answer: nobody told Apache to do that
>(and I don't know how to tell it to).
>
>René's approach currently is to generate the files explicitly
>on disk, and then have Apache return them always from disk.
>
> > Shouldn't
> > we be able to simply add Last-Modified/If-Modified support to the PyPI
> > output, and enable Apache's disk caching for non-logged-in users?
>
>How precisely would that work? I.e. what software should put what
>header into what place, and how would the cache then find out that
>the real data have changed?
I was under the impression that when Apache caching is enabled, it
can add an If-Modified-Since header to incoming requests, and in the
event that the dynamic content hasn't changed, use its cached version
of the response. I am not an expert on this, however.
If it does do this, then PyPI would check for an If-Modified-Since
header, compare it to the modified date for the page, and return a
"not changed" (304 Not Modified) response if appropriate.
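To make the check concrete, here is a rough sketch of what the PyPI side
might look like. It is not taken from the PyPI code: the function name,
the headers dict, and the render_page callable are all made up for
illustration, and it assumes each page has a modification time available
as a timezone-aware UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(request_headers, last_modified, render_page):
    """Return (status, headers, body), honoring If-Modified-Since.

    Assumptions (hypothetical, not PyPI's real interface):
    request_headers is a plain dict, last_modified is a timezone-aware
    UTC datetime, and render_page is a callable that does the expensive
    SQL queries and template rendering.
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            since_dt = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            since_dt = None  # unparsable date: fall back to a full response
        if since_dt is not None and since_dt.tzinfo is None:
            # A date given without a usable zone is treated as UTC.
            since_dt = since_dt.replace(tzinfo=timezone.utc)
        if since_dt is not None and last_modified <= since_dt:
            # Unchanged since the cached copy: skip rendering entirely.
            return 304, headers, b""

    return 200, headers, render_page()
```

The point is that the 304 branch returns before render_page is ever
called, so an unchanged page costs only a date comparison.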
> > While that's not necessarily as fast as static page generation, it's a
> > lot less complex to get right, and it saves the main piece of CPU load:
> > i.e., doing SQL queries and actually generating the page.
>
>I'm not convinced yet that this is where the time is spent (seeing
>actual profiling data would convince me).
I thought René had done such profiling, as he said it was the
templates that were taking most of the CPU.
> > Pages that pertain to more than one package might be a bit more complex
> > to do this on, but if I understand correctly it's mainly the
> > package-specific pages we're concerned with here, correct?
>
>I'm not convinced of that, either.
Well, I thought those were the ones we were caching.
It may be that I'm making too many assumptions, but if those
assumptions are correct, then the whole thing gets a lot easier to
prove correct, compared to a static cache, due to fewer moving
parts. If most CPU time is spent rendering package-specific pages,
then this approach would fix the problem with the fewest changed
parts and the least extra code to maintain.