[Catalog-sig] start on static generation, and caching - apache config.

"Martin v. Löwis" martin at v.loewis.de
Sun Jul 8 22:00:44 CEST 2007


> I was under the impression that when Apache caching is enabled, it can
> add an If-Modified-Since header to incoming requests, and in the event
> that the dynamic content hasn't changed, use its cached version of the
> response.  I am not an expert on this, however.

Where would it add that? The (F)CGI script doesn't see any headers,
except for those communicated in environment variables. AFAICT,
there is none for If-Modified-Since.

If you were thinking of mod_cache: it will expire entries after
CacheDefaultExpire (default 1h), unless an Expires or Last-Modified
header is in the original response. In the latter case,
CacheLastModifiedFactor is used to determine an expiry period
(default 0.1, i.e. 10% of the time elapsed since Last-Modified).
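
To make the arithmetic concrete, here is a rough sketch of that expiry
heuristic in Python. It is not mod_cache's actual code; the function name
and parameters are made up for illustration, and the one-day cap assumes
the default of CacheMaxExpire.

```python
from datetime import datetime, timedelta

def heuristic_expiry(now, last_modified=None, expires=None,
                     factor=0.1,
                     default_expire=timedelta(hours=1),
                     max_expire=timedelta(days=1)):
    """Estimate when a cached response goes stale (illustrative only)."""
    # An explicit Expires header wins over any heuristic.
    if expires is not None:
        return expires
    # Neither Expires nor Last-Modified: fall back to the default period.
    if last_modified is None:
        return now + default_expire
    # Otherwise the period is a fraction of the response's age,
    # never negative and capped at the maximum.
    age = max(now - last_modified, timedelta(0))
    return now + min(age * factor, max_expire)

now = datetime(2007, 7, 8, 22, 0)
# A page last modified 10 hours ago stays fresh for about 1 hour.
print(heuristic_expiry(now, last_modified=now - timedelta(hours=10)))
```

So a page that changed recently is considered stale again quickly, while
one that has not changed for a week is kept for the capped maximum.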

>> I'm not convinced yet that this is where the time is spent (seeing
>> actual profiling data would convince me).
> 
> I thought Rene' had done such profiling, as he said it was the templates
> that were taking most of the CPU.

I saw that he said that it's taking most of the CPU; however, he didn't
say he did profiling.

I have now done so, and found that the parsing of the templates takes
some time, so the code now caches the parsed templates.

>> > Pages that pertain to more than one package might be a bit more complex
>> > to do this on, but if I understand correctly it's mainly the
>> > package-specific pages we're concerned with here, correct?
>>
>> I'm not convinced of that, either.
> 
> Well, I thought those were the ones we were caching.

Not "were caching", but "going to cache". As I said before, I'm
unconvinced that this is where the load goes; as a consequence,
I'm unconvinced that generating static pages will improve things.

Of course, if Rene completes this project, and the static
pages don't actually break anything, it shouldn't hurt to use them;
then we will see what the saving is (there surely will be *some*
saving, and it might be that those who complain about the performance
most will see a performance increase assuming that they are primarily
interested in the static pages).

> It may be that I'm making too many assumptions, but if those assumptions
> are correct, then the whole thing gets a lot easier to prove correct,
> compared to a static cache, due to fewer moving parts.  If most CPU time
> is spent rendering package-specific pages, then this approach would fix
> the problem using the fewest changed parts and extra code to maintain.

My biggest concern is whether there can be a reliable computation of
"has this changed". If that predicate gives an incorrect answer,
it doesn't matter much whether Apache does its own caching, or whether
the static pages fail to be regenerated.
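
To illustrate why the predicate is the delicate part, here is a minimal
sketch, assuming each page records when it was generated and each input
records when it last changed. The function and the input names
("release", "files") are hypothetical, not taken from the PyPI code.

```python
def needs_regeneration(page_generated_at, inputs_changed_at):
    """True if any tracked input changed at or after page generation.

    `inputs_changed_at` maps an input name to its last-change timestamp.
    The answer is only as good as this mapping is complete.
    """
    return any(changed >= page_generated_at
               for changed in inputs_changed_at.values())

generated = 100.0
# All inputs tracked: the newer file list triggers regeneration.
print(needs_regeneration(generated, {"release": 90.0, "files": 120.0}))
# File list forgotten: the predicate wrongly reports "unchanged".
print(needs_regeneration(generated, {"release": 90.0}))
```

The second call is the failure mode: an input that is not tracked makes
the predicate say "unchanged", and then a stale page is served no matter
which caching layer sits in front.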

Regards,
Martin


