[Catalog-sig] Caching and expires for the cheese shop.

René Dudfield renesd at gmail.com
Tue Jun 26 02:03:29 CEST 2007


Here are some examples of caching and expires config you can add to
your Apache instance to improve performance.

First I talk about caching, then at the bottom there is some
expires config for Apache.



You'll probably need to tweak the settings quite a bit to get optimal performance.

I volunteered to get this working before, but I was never sent the
cheese shop apache config so I could test it.

It's a very simple change, which can give good performance increases.

CacheRoot "/var/tmp/proxy2/cheeseshop"
CacheEnable disk /
CacheSize 4000000
# CacheMinFileSize is set so that 403 Forbidden pages are not cached.
CacheMinFileSize 400
CacheDirLevels 5
CacheDirLength 3
#CacheGcInterval 4
CacheMaxExpire 24
CacheLastModifiedFactor 0.1
CacheDefaultExpire 1
#CacheForceCompletion 100
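
To make the effect of CacheLastModifiedFactor, CacheMaxExpire and
CacheDefaultExpire a bit more concrete, here is a rough Python sketch of
the heuristic the cache applies when a response has no Expires header.
The function name and the use of seconds are assumptions for
illustration. As far as I know, mod_cache in Apache 2.x takes these
values in seconds while the older 1.3 mod_proxy took hours, so check
which unit your version expects before copying the numbers above.

```python
def freshness_lifetime(now, last_modified=None, factor=0.1,
                       max_expire=86400, default_expire=3600):
    """Estimate how long (in seconds) a cached page is treated as fresh.

    now and last_modified are epoch seconds. The defaults are
    illustrative values, not the cheese shop's real settings.
    """
    if last_modified is None:
        # No Last-Modified header: fall back to the default expiry.
        return default_expire
    # Time since the page last changed, scaled by the factor ...
    age = max(0, now - last_modified)
    # ... and capped at the maximum expiry.
    return min(age * factor, max_expire)
```

With a factor of 0.1, a page that last changed ten hours ago would be
served from the cache for about an hour before being checked again.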

You may need to add some Last-Modified headers to the cheese shop
output, since it doesn't appear that is being done yet.
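
As a minimal sketch of what such a header looks like, the Python
standard library can format an epoch timestamp as an HTTP date.  The
function name last_modified_header is made up for illustration; how the
cheese shop would actually look up the change time is not shown here.

```python
from email.utils import formatdate


def last_modified_header(changed_at):
    """Build a Last-Modified header pair from an epoch timestamp."""
    # usegmt=True produces the "GMT" suffix that HTTP dates use.
    return ("Last-Modified", formatdate(changed_at, usegmt=True))
```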

It appears updates are only happening 10 times a day, so caching of
the cheese shop should give a big performance increase.

If someone wants to be my hands on the server for an hour or so, I
could do any tweaking necessary.

I'll look at making a patch to cheese shop to put in Last-Modified
headers.  feedparser.py has pretty good Last-Modified handling if
someone wants to look there as an example.

If someone could help me with the cheeseshop code, that'd be great.
But if not, I'll dig into it and do a complete patch.

Here is the pseudo code for adding Last-Modified handling to cheese shop.

import calendar
import time

import feedparser

def get_when_changed_header_from_http_client(request):
    # Return the client's If-Modified-Since as a timestamp, or 0 if absent.
    header_value = request.get('If-Modified-Since')
    if header_value:
        # Split the If-Modified-Since (Netscape < v6 gets this wrong)
        date_part = header_value.split(";")[0]
        # Turn the client request If-Modified-Since into a timestamp
        parsed = feedparser._parse_date(date_part)
        if parsed is not None:
            return calendar.timegm(parsed)
    # No usable header, so set modified since to 0
    return 0

def get_date_for_most_recently_changed_part_of_page():
    # would probably do a database look up to see when this page last changed.
    # would need select statements for each type of page.
    #  eg, main page, single project page, category page etc.
    raise NotImplementedError

def handle_request(request):
    modified_since = get_when_changed_header_from_http_client(request)
    when_changed = get_date_for_most_recently_changed_part_of_page()

    if when_changed <= modified_since:
        # header() stands for however cheese shop sends a status line
        return header('HTTP/1.1 304 Not Modified')

    how_long_pages_can_be_wrong_for = 10 * 60  # 10 minutes, in seconds

    # set_expires_header() is likewise a placeholder
    set_expires_header(time.time() + how_long_pages_can_be_wrong_for)
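
For anyone who wants to try the idea without the cheese shop code or
feedparser, here is a self-contained sketch of the same conditional GET
decision using only the standard library.  The function
conditional_response, its arguments and the plain dict of request
headers are assumptions for illustration, not how the cheese shop is
actually structured.

```python
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def conditional_response(request_headers, when_changed, now, max_age=600):
    """Return (status, headers) for a page last changed at when_changed.

    when_changed and now are epoch seconds; max_age is how long the
    page is allowed to be stale, in seconds (10 minutes by default).
    """
    modified_since = 0
    raw = request_headers.get("If-Modified-Since")
    if raw:
        try:
            # Drop anything after ";" before parsing the date.
            parsed = parsedate_to_datetime(raw.split(";")[0].strip())
            if parsed.tzinfo is None:
                # Treat a date without a zone as GMT.
                parsed = parsed.replace(tzinfo=timezone.utc)
            modified_since = parsed.timestamp()
        except (TypeError, ValueError):
            # Unparseable date: behave as if no header was sent.
            modified_since = 0
    headers = {
        "Last-Modified": formatdate(when_changed, usegmt=True),
        "Expires": formatdate(now + max_age, usegmt=True),
    }
    # HTTP dates only have one-second resolution, so compare whole seconds.
    if int(when_changed) <= modified_since:
        return "304 Not Modified", headers
    return "200 OK", headers
```

A client that echoes back the Last-Modified value it was given gets a
304 until the page changes; a client that sends no header, or an older
date, gets the full page.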




# Setting up the expires stuff should be ok if changes to images,
# javascript, and css are not frequent.  If they do change, then
# references to the external scripts should change too,
# eg, add variables like style.css?r=801 so that browsers have to
# download the new ones.

# Setting the expires stuff can make it so that web browsers don't
# even attempt to download stuff they already have.

ExpiresActive On
ExpiresByType image/gif A604800
ExpiresByType image/png A604800
ExpiresByType image/jpeg A604800
#ExpiresByType text/* A86400
ExpiresByType text/css A604800
ExpiresByType text/javascript A604800
ExpiresByType application/x-javascript A604800
ExpiresByType application/x-shockwave-flash A604800
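
For reference, the "A" prefix means the expiry is counted from the time
of access, and 604800 seconds is one week.  The sketch below shows the
arithmetic and one possible way to add the r=801 style revision
parameter to a URL.  The helper names and the choice of "r" as the
parameter are assumptions taken from the example in the comments above,
not anything the cheese shop does today.

```python
from email.utils import formatdate
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

ONE_WEEK = 7 * 24 * 60 * 60  # 604800 seconds, the value after "A"


def expires_for_access(access_time, seconds=ONE_WEEK):
    """Expires header value for a file fetched at access_time (epoch seconds)."""
    return formatdate(access_time + seconds, usegmt=True)


def versioned_url(url, revision):
    """Return url with its r=<revision> query parameter set or replaced."""
    parts = urlsplit(url)
    # Keep any other query parameters, drop an old r= value.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "r"]
    query.append(("r", str(revision)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Bumping the revision whenever a stylesheet or script changes gives the
file a new URL, so browsers fetch it again even though the old copy has
not expired yet.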
