
Whether it's because I spend most of my time sitting at the end of a long thin piece of string connecting me to the rest of the universe, or because I've been watching people pound on my RSS feeds every five minutes, I've been trying to write cache-friendly Nevow resources.
This involves setting two HTTP headers, "Last-Modified" and "ETag". At the moment I'm setting both of these headers using my data source (files in the file system). However, this has left me with a bit of a quandary about my docFactory templates.
When my templates change, so should my Last-Modified and ETag headers. Otherwise clients using caches will see my old templates more or less indefinitely, at least on pages I don't subsequently change: their conditional GET requests, complete with correct If-Modified-Since and If-None-Match headers, will keep matching the unchanged headers, so the server will never send a fresh copy of the data.
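To make the failure mode concrete, here is a minimal sketch of the check a server performs on a conditional GET. It is not Nevow code: the function name `is_not_modified` and the plain dict of lower-cased request headers are assumptions made for illustration, and the comma split is a simplification that ignores entity tags containing commas.

```python
from email.utils import formatdate, mktime_tz, parsedate_tz


def _strip_weak(tag):
    # If-None-Match uses weak comparison, so a "W/" prefix is ignored.
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers, etag, last_modified):
    """Return True if the server may answer 304 Not Modified.

    request_headers: dict mapping lower-cased header names to values
    etag: current entity tag including its quotes, e.g. '"abc"'
    last_modified: current modification time, seconds since the epoch
    """
    if_none_match = request_headers.get("if-none-match")
    if if_none_match is not None:
        # When present, If-None-Match takes precedence over the date check.
        candidates = [_strip_weak(t.strip()) for t in if_none_match.split(",")]
        return "*" in candidates or _strip_weak(etag) in candidates

    if_modified_since = request_headers.get("if-modified-since")
    if if_modified_since is not None:
        parsed = parsedate_tz(if_modified_since)
        if parsed is None:
            return False  # unparseable date: send the full response
        return int(last_modified) <= mktime_tz(parsed)

    return False


# A client that cached the page at t=1000 revalidates by date.
headers = {"if-modified-since": formatdate(1000, usegmt=True)}
unchanged = is_not_modified(headers, '"abc"', 1000)
changed = is_not_modified(headers, '"abc"', 2000)
```

If a template change moves neither `etag` nor `last_modified`, this check keeps returning True and the client keeps its stale copy.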
So, I'm faced with the problem of dating my templates or otherwise detecting when they change and I can't think of a good way.
Some thoughts:
1. use file timestamps on the template files
Pros: Fits OK with the way I deal with the rest of the website data
Cons: Reduces flexibility. I can't think of a good way to do this with Stan templates. I also can't think of a good way to do this without restarting the server when my templates change. (I do currently do this, but would prefer not to.)
2. generate the ETag header based on a hash of page contents
Pros: As best I can tell, this is how the ETag header is really meant to be generated. Ideally it signals octet equality and should change if, for example, Nevow for some reason starts pretty-printing output.
Cons: rend.Page.renderHTTP seems to make this really hard -- even if you set the buffered flag = True, rend.Page.afterRender doesn't seem to have any way to access the result of the render. (Correct me if I'm wrong.) Also, this doesn't help with the Last-Modified date, which means I'm not helping HTTP/1.0 caches very much, unless I store the date the hash changed somewhere.
3. store the templates in some kind of object store and date-stamp them there.
Pros: This might well let me change templates without restarting the server.
Cons: It imposes a maintenance burden whereby I have to update the object database with new templates. I like to have a copy of my website and templates on two different servers, and as best I can tell, no object database is going to like being copied to a remote server without me killing all associated processes on the remote server first, so there's a deployment problem.
4. hash the template so that a changed template means a changed hash
Pros: This is probably nearly as good as hashing the page content, accuracy-wise.
Cons: I don't have any idea how to hash a DocFactory object effectively. Hashing the DocFactory still leaves me vulnerable to changes in Nevow's rendering. Hashing the DocFactory won't tell me to update Last-Modified unless I store the date that the DocFactory changed somewhere.
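Option 2, including the "store the date the hash changed somewhere" part, could be sketched roughly as follows. This assumes the rendered page is already available as bytes, which is exactly the part that is hard to get out of rend.Page. The `ContentTagger` class, its `tag` method, and the in-memory dict are hypothetical names for illustration, and the dict is lost on restart, so a real version would need to persist it.

```python
import hashlib
import time


class ContentTagger:
    """Derive an ETag from page bytes and remember when it last changed."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._seen = {}  # resource key -> (etag, last_modified)

    def tag(self, key, body):
        """Return (etag, last_modified) for the rendered bytes in body."""
        etag = '"%s"' % hashlib.sha1(body).hexdigest()
        previous = self._seen.get(key)
        if previous is None or previous[0] != etag:
            # The content is new or has changed: record the time we noticed.
            self._seen[key] = (etag, int(self._clock()))
        return self._seen[key]


tagger = ContentTagger()
etag, last_modified = tagger.tag("/feed.rss", b"<rss>...</rss>")
```

The recorded date is when the change was first noticed, not when the template or data was edited, which is probably close enough for HTTP/1.0 caches.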
Anyone got any thoughts or has anyone solved this problem before? Help with implementing 2 (how do I get the page contents in order to hash them) or 4 (how can I hash a DocFactory object) also appreciated.
-Mary

On Sat, Aug 07, 2004, Mary Gardiner wrote:
- generate the ETag header based on a hash of page contents
Pros: As best I can tell, this is how the ETag header is really meant to be generated. Ideally it signals octet equality and should change if, for example, Nevow for some reason starts pretty-printing output.
Cons: rend.Page.renderHTTP seems to make this really hard -- even if you set the buffered flag = True, rend.Page.afterRender doesn't seem to have any way to access the result of the render.
The attached seems to do what I want here, I'll be interested to hear about what horrors I am inflicting upon myself by doing this.
-Mary

On Sun, Aug 08, 2004, Mary Gardiner wrote:
The attached seems to do what I want here, I'll be interested to hear about what horrors I am inflicting upon myself by doing this.
Well the first is that flatten won't return while there are Deferreds lurking around in the stan tree (hence the existence of rend.deferflatten, I guess).
Revision 2 attached. This one works in my test case...
-Mary

On Aug 8, 2004, at 1:26 AM, Mary Gardiner wrote:
On Sat, Aug 07, 2004, Mary Gardiner wrote:
- generate the ETag header based on a hash of page contents
Pros: As best I can tell, this is how the ETag header is really meant to be generated. Ideally it signals octet equality and should change if, for example, Nevow for some reason starts pretty-printing output.
Cons: rend.Page.renderHTTP seems to make this really hard -- even if you set the buffered flag = True, rend.Page.afterRender doesn't seem to have any way to access the result of the render.
The attached seems to do what I want here, I'll be interested to hear about what horrors I am inflicting upon myself by doing this.
Generally, ETags are never actually generated by hashing the page content. That's just too expensive. To do that, you need to actually generate the content before you can tell that you shouldn't send it to the user. What a waste of resources!
Apache (and new-web) generate ETags from files via:

    ETag("%X-%X-%X" % (st.st_ino, st.st_size, st.st_mtime),
         weak=(time.time() - st.st_mtime <= 1))
That "good enough" guarantees that the file still has the same content (the etag is weak if the last-modified is recent, because the file could be modified twice within the last second and then you wouldn't know).
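As a self-contained illustration of that scheme, here is a sketch that builds the header value directly from `os.stat`. The function name `file_etag` and the `now` parameter are assumptions for the example, and it returns a plain string rather than new-web's `ETag` object.

```python
import os
import time


def file_etag(path, now=None):
    """Build an inode-size-mtime ETag string for the file at path."""
    st = os.stat(path)
    if now is None:
        now = time.time()
    # st_mtime is a float on current Pythons; %X needs an integer.
    tag = '"%X-%X-%X"' % (st.st_ino, st.st_size, int(st.st_mtime))
    # A file touched within the last second could change again without its
    # whole-second mtime moving, so only promise weak equivalence.
    weak = (now - st.st_mtime) <= 1
    return "W/" + tag if weak else tag
```

Nothing here reads the file's contents, which is the point: the tag is computed before any rendering work is done.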
So, for dynamically generated pages, I'd do something similar (e.g. your #1). Not sure what you mean about reducing flexibility or what you mean about it making you need to restart the server. If you have to restart the server every time you change a template (e.g. for stan), use the server start time as part of the ETag. Otherwise, use the template modtime. The file-based doc-factories do store mtime, although it's currently in a private attribute. That will get updated when the template is reloaded.
James

On Sun, Aug 08, 2004, James Y Knight wrote:
So, for dynamically generated pages, I'd do something similar (e.g. your #1). Not sure what you mean about reducing flexibility or what you mean about it making you need to restart the server. If you have to restart the server every time you change a template (e.g. for stan), use the server start time as part of the ETag. Otherwise, use the template modtime. The file-based doc-factories do store mtime, although it's currently in a private attribute. That will get updated when the template is reloaded.
Of course you're right about the expense of generating the content, although in some cases, if bandwidth is an issue, the generation time might be an acceptable expense against the saving in bandwidth.
Ok, I didn't know the file based templates already had this data. It sounds like that is the way to go then. It's a shame in a way, stan is quite a nice way to template :)
Thanks,
-Mary

On Aug 8, 2004, at 2:41 AM, Mary Gardiner wrote:
Ok, I didn't know the file based templates already had this data. It sounds like that is the way to go then. It's a shame in a way, stan is quite a nice way to template :)
I don't understand the issue with using stan? Why can't you use the server start time?
James

On Sun, Aug 08, 2004, James Y Knight wrote:
I don't understand the issue with using stan? Why can't you use the server start time?
Because in practice I restart it about as often as Google recrawls my site (the code base is still moving), and Google is a major reason I want to send 304s.
In any case, I've set it up so it uses the docFactory's ._mtime attribute if available and the server start time if not. In my case, since there's an advantage to using the template modified time I'm using file templates. (Also I didn't know they dynamically reloaded!)
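A rough sketch of that fallback, for anyone following along. Only the `_mtime` attribute comes from this thread. The function names, the `start_time` parameter, and the choice to combine the data and template times with `max` are assumptions for illustration, not the code actually running on the site.

```python
import time

# Recorded once at import, standing in for "server start time".
SERVER_START_TIME = time.time()


def template_mtime(doc_factory, start_time=SERVER_START_TIME):
    """Use the factory's private _mtime if it has one, else the start time."""
    mtime = getattr(doc_factory, "_mtime", None)
    return start_time if mtime is None else mtime


def last_modified(data_mtime, doc_factory, start_time=SERVER_START_TIME):
    """The page is as new as the newer of its data and its template."""
    return max(data_mtime, template_mtime(doc_factory, start_time))


def page_etag(data_mtime, doc_factory, start_time=SERVER_START_TIME):
    """Fold both times into the tag so that either change invalidates it."""
    tmpl = template_mtime(doc_factory, start_time)
    return '"%X-%X"' % (int(data_mtime), int(tmpl))
```

Because `_mtime` is private, the `getattr` default also covers the case where a later Nevow release renames or removes it.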
-Mary

On Aug 8, 2004, at 5:03 PM, Mary Gardiner wrote:
On Sun, Aug 08, 2004, James Y Knight wrote:
I don't understand the issue with using stan? Why can't you use the server start time?
Because in practice I restart it about as often as Google recrawls my site (the code base is still moving), and Google is a major reason I want to send 304s.
Ah, interesting. Does google do something special with 304s besides saving bandwidth?
James

On Sun, Aug 08, 2004, James Y Knight wrote:
Ah, interesting. Does google do something special with 304s besides saving bandwidth?
I have no idea. They may over time adjust their recrawl rate depending on how many 304s they get, I suppose, but I don't have any evidence of this. It's the bandwidth I'm interested in. (It's an Australian thing: you ever pay 14c/MB, all of a sudden every 200 in your logs starts reading '$$$'.)
-Mary