[Catalog-sig] start on static generation, and caching - apache config.

Phillip J. Eby pje at telecommunity.com
Sun Jul 8 21:33:36 CEST 2007


At 07:37 PM 7/8/2007 +0200, Martin v. Löwis wrote:
> > If they're effectively static, why can't Apache cache them?
>
>That's easy to answer: nobody told Apache to do that
>(and I don't know how to tell it to).
>
>René's approach currently is to generate the files explicitly
>on disk, and then have Apache return them always from disk.
>
> > Shouldn't
> > we be able to simply add Last-Modified/If-Modified support to the PyPI
> > output, and enable Apache's disk caching for non-logged-in users?
>
>How precisely would that work? I.e. what software should put what
>header into what place, and how would the cache then find out that
>the real data have changed?

I was under the impression that when Apache caching is enabled, it 
can add an If-Modified-Since header to incoming requests, and in the 
event that the dynamic content hasn't changed, use its cached version 
of the response.  I am not an expert on this, however.

If it does do this, then PyPI would check for an If-Modified-Since 
header, compare it to the modified date for the page, and return a 
"not changed" (304 Not Modified) response if appropriate.
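
To make the check concrete, here is a minimal sketch of what the PyPI
side might look like. It is not PyPI's actual code: the function name
conditional_response and the render_page callable are hypothetical,
and it assumes we can get a last-modified timestamp for a page without
rendering it.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(if_modified_since, last_modified, render_page):
    """Return (status, headers, body) for a GET of one page.

    if_modified_since: raw request header value, or None if absent.
    last_modified: timezone-aware datetime of the page's last change
        (assumed to be cheap to look up).
    render_page: hypothetical zero-argument callable that does the
        expensive work (SQL queries, templates) and returns bytes.
    """
    # HTTP dates are in GMT with one-second resolution.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = [("Last-Modified", format_datetime(last_modified, usegmt=True))]

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: send a full response
        if since is not None and since.tzinfo is None:
            # A date without a usable zone is treated as UTC here.
            since = since.replace(tzinfo=timezone.utc)
        if since is not None and last_modified <= since:
            # Nothing changed: skip rendering entirely.
            return "304 Not Modified", headers, b""

    return "200 OK", headers, render_page()
```

The point of the sketch is that render_page is only called on the
200 path, so a cache that revalidates with If-Modified-Since never
triggers the queries or the templates for an unchanged page.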


> > While that's not necessarily as fast as static page generation, it's a
> > lot less complex to get right, and it saves the main piece of CPU load:
> > i.e., doing SQL queries and actually generating the page.
>
>I'm not convinced yet that this is where the time is spent (seeing
>actual profiling data would convince me).

I thought René had done such profiling, as he said it was the 
templates that were taking most of the CPU.


> > Pages that pertain to more than one package might be a bit more complex
> > to do this on, but if I understand correctly it's mainly the
> > package-specific pages we're concerned with here, correct?
>
>I'm not convinced of that, either.

Well, I thought those were the ones we were caching.

It may be that I'm making too many assumptions, but if those 
assumptions are correct, then the whole thing gets a lot easier to 
prove correct, compared to a static cache, due to fewer moving 
parts.  If most CPU time is spent rendering package-specific pages, 
then this approach would fix the problem with the fewest changed 
parts and the least extra code to maintain.


