[PATCH] nonbuffered cache
Hello,

After the klive pages became larger and the db grew too, I realized that enabling my PageCache methods was destroying the very nice and useful non-buffered mode, which shows the fragments of a page that are already rendered while Nevow keeps working to finish rendering the whole page. The non-buffered mode makes the interactive behaviour completely different for klive (while it's almost invisible for other projects, which render much more quickly because of their much smaller pages). Disabling the cache is not an option: a simple ab2 run would DoS the server without the cache; it wouldn't even survive ./ .

So I modified my cache patches to support the non-buffered mode. That was very easy. I cleaned up the code a bit too. I added some db queries that take several seconds to complete, to block the page rendering a few times, and it was real fun to open 3 browsers and see the bar on the right enlarging every few seconds in all three browsers at the same time after each query returned. Clicking reload also re-displays only the part of the page already rendered, and then it waits for Nevow to complete.
Performance is excellent as usual (it greatly exceeds the capacity of the network; it would take gigabit ethernet to the internet to saturate the link):

andrea@opteron:~> ab2 -n2000 -c 200 http://localhost:8818/
This is ApacheBench, Version 2.0.41-dev <$Revision: 1.121.2.12 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 200 requests
Completed 400 requests
Completed 600 requests
Completed 800 requests
Completed 1000 requests
Completed 1200 requests
Completed 1400 requests
Completed 1600 requests
Completed 1800 requests
Finished 2000 requests

Server Software:        TwistedWeb/SVN-Trunk
Server Hostname:        localhost
Server Port:            8818

Document Path:          /
Document Length:        166396 bytes

Concurrency Level:      200
Time taken for tests:   7.210944 seconds
Complete requests:      2000
Failed requests:        0
Write errors:           0
Total transferred:      335883918 bytes
HTML transferred:       335625492 bytes
Requests per second:    277.36 [#/sec] (mean)
Time per request:       721.094 [ms] (mean)
Time per request:       3.605 [ms] (mean, across all concurrent requests)
Transfer rate:          45487.94 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0  104  547.7      0    3003
Processing:   204  563  470.2    390    1978
Waiting:      110  334   68.9    329     544
Total:        204  667  716.4    390    3625

Percentage of the requests served within a certain time (ms)
  50%    390
  66%    449
  75%    498
  80%    530
  90%   1959
  95%   1977
  98%   3528
  99%   3543
 100%   3625 (longest request)

Even while the benchmark runs, the server is accessible immediately. This delivered 45 Mbytes/sec of payload (not megabits), with 277 requests per second. Without the cache, such an ab2 command DoSes the server. You can see the effect online as usual on the klive website. I recommend applying this to nevow CVS so others can use it too.
If you don't change your code to enable it with the following API, this patch *cannot* break anything even if it were completely broken, so there's no excuse for not applying it ;) -- this is in the _obviously_ safe category. Here is the API to enable it in your code. NOTE: you cannot use nevow_carryover/IHand with this; it makes no sense to cache forms output anyway. Recommended use is with cached_page_class and a LIFETIME of 5 seconds (the above benchmark uses a lifetime of 5 sec).

    class forever_cached_page_class(rend.Page):
        cache = True

    class cached_page_class(forever_cached_page_class):
        lifetime = LIFETIME

    class cached_page_class(rend.Page):
        cache = True
        lifetime = LIFETIME
        max_cache_size = 20*1024*1024

(LIFETIME in seconds)

I consider this a must-have for most nevow sites, thanks!

Index: Nevow/nevow/util.py
===================================================================
--- Nevow/nevow/util.py (revision 4051)
+++ Nevow/nevow/util.py (working copy)
@@ -132,6 +132,7 @@
     from twisted.python import failure
     from twisted.python.failure import Failure
     from twisted.python import log
+    from twisted.internet import reactor
 except ImportError:
     class Deferred(object):
Index: Nevow/nevow/rend.py
===================================================================
--- Nevow/nevow/rend.py (revision 4051)
+++ Nevow/nevow/rend.py (working copy)
@@ -491,6 +491,62 @@
         self.children[name] = child
 
+class PageCache(object):
+    def __init__(self):
+        self.__db = {}
+    def cacheIDX(self, ctx):
+        return str(url.URL.fromContext(ctx))
+    def __storeCache(self, cacheIDX, c):
+        self.__db[cacheIDX] = c
+    def __deleteCache(self, cacheIDX):
+        del self.__db[cacheIDX]
+    def __deleteCacheData(self, cacheIDX, page):
+        size = self.__db[cacheIDX][1]
+        assert len(self.__db[cacheIDX][0]) == size
+        page.subCacheSize(size)
+        self.__deleteCache(cacheIDX)
+    def __lookupCache(self, cacheIDX):
+        return self.__db.get(cacheIDX)
+    def getCache(self, ctx, request):
+        cacheIDX = self.cacheIDX(ctx)
+        c = self.__lookupCache(cacheIDX)
+
+        if c is None:
+            c = ['', (util.Deferred(), request)]
+            self.__storeCache(cacheIDX, c)
+            def writer(buf):
+                c[0] += buf
+                for d, r in c[1:]:
+                    r.write(buf)
+            return None, writer
+        elif isinstance(c, list):
+            d = util.Deferred()
+            request.write(c[0])
+            c.append((d, request))
+            return d, None
+
+        return c[0], None
+    def cacheRendered(self, ctx, result, page):
+        cacheIDX = self.cacheIDX(ctx)
+        defer_list = self.__lookupCache(cacheIDX)
+        assert isinstance(defer_list[1][0], util.Deferred)
+        data = defer_list[0]
+        size = len(data)
+        if page.canCache(size):
+            # overwrite the deferred with the data
+            timer = None
+            if page.lifetime > 0:
+                timer = util.reactor.callLater(page.lifetime,
+                    self.__deleteCacheData, cacheIDX, page)
+            page.addCacheSize(size)
+            self.__storeCache(cacheIDX, (data, size, timer, ))
+        else:
+            self.__deleteCache(cacheIDX)
+        for d, r in defer_list[1:]:
+            d.callback(result)
+
+_CACHE = PageCache()
+
 class Page(Fragment, ConfigurableFactory, ChildLookupMixin):
     """A page is the main Nevow resource and renders a document loaded
     via the document factory (docFactory).
@@ -504,8 +560,23 @@
     afterRender = None
     addSlash = None
 
+    cache = False
+    lifetime = 0
+    max_cache_size = None
+    __cache_size = 0
+
     flattenFactory = lambda self, *args: flat.flattenFactory(*args)
 
+    def addCacheSize(self, size):
+        assert self.canCache(size)
+        self.__cache_size += size
+    def subCacheSize(self, size):
+        self.__cache_size -= size
+        assert self.__cache_size >= 0
+    def canCache(self, size):
+        return self.max_cache_size is None or \
+            self.__cache_size + size <= self.max_cache_size
+
     def renderHTTP(self, ctx):
         if self.beforeRender is not None:
             return util.maybeDeferred(self.beforeRender,ctx).addCallback(
@@ -530,7 +601,18 @@
             if self.afterRender is not None:
                 return util.maybeDeferred(self.afterRender,ctx)
 
-        if self.buffered:
+        if self.cache:
+            cache, writer = _CACHE.getCache(ctx, request)
+            if cache:
+                return cache
+
+            assert not self.buffered
+            assert self.afterRender is None
+
+            def finisher(result):
+                _CACHE.cacheRendered(ctx, result, self)
+                return result
+        elif self.buffered:
             io = StringIO()
             writer = io.write
             def finisher(result):
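[Editor's sketch] The core idea of the patch -- non-buffered caching via a "tee" writer -- can be illustrated without Twisted or Nevow. In this minimal stand-alone sketch (the `Cache` and `Request` names are illustrative, not Nevow's API), the first request for a URL renders through a writer that both serves the client and accumulates the page; a request arriving mid-render replays what is buffered so far and then follows along; once rendering finishes, the completed page is stored and served directly:

```python
class Request:
    """Stand-in for an HTTP request object that receives body bytes."""
    def __init__(self):
        self.sent = []

    def write(self, buf):
        self.sent.append(buf)


class Cache:
    def __init__(self):
        # url -> ['buffered-so-far', request, request, ...] while rendering,
        # or the finished page string once rendering is done.
        self._db = {}

    def get(self, url, request):
        entry = self._db.get(url)
        if entry is None:
            # First request: install a tee writer that accumulates the page
            # and streams every chunk to all subscribed requests.
            entry = ['', request]
            self._db[url] = entry
            def writer(buf):
                entry[0] += buf
                for r in entry[1:]:
                    r.write(buf)
            return None, writer        # caller must render, using writer
        if isinstance(entry, list):
            # Render in progress: replay the buffered prefix, then tag along.
            request.write(entry[0])
            entry.append(request)
            return None, None          # caller just waits for the stream
        return entry, None             # finished page, serve as-is

    def finish(self, url):
        # Rendering done: keep only the completed page.
        self._db[url] = self._db[url][0]
```

The real patch additionally wraps the followers in Deferreds, enforces `max_cache_size`, and expires entries with `reactor.callLater`, but the list-while-rendering / plain-data-when-done state transition is the same.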
On Thu, Jan 12, 2006 at 02:24:31PM +0100, Andrea Arcangeli wrote:
Hello,
Cool, the patch looks good. I'm going to apply it as soon as there are unittests. One idea for a unittest is to have a counter on the page and check that it never updates while the page is cached, and similar things. There's also my cache module that needs unittests; if you feel like writing a 'general' test for caching frameworks, then I would apply my own caching module, which is currently here: http://divmod.org/trac/ticket/291 I suggest also adding a ticket to trac for your patch. After that I'll apply the patch indeed, and happily.
Server Software:        TwistedWeb/SVN-Trunk
Server Hostname:        localhost
Server Port:            8818

Document Path:          /
Document Length:        166396 bytes

Concurrency Level:      200
Time taken for tests:   7.210944 seconds
Complete requests:      2000
Failed requests:        0
Write errors:           0
Total transferred:      335883918 bytes
HTML transferred:       335625492 bytes
Requests per second:    277.36 [#/sec] (mean)
Time per request:       721.094 [ms] (mean)
Time per request:       3.605 [ms] (mean, across all concurrent requests)
Transfer rate:          45487.94 [Kbytes/sec] received
Impressive. -- Valentino Volonghi aka Dialtone Now Running MacOSX 10.4 Blog: http://vvolonghi.blogspot.com http://weever.berlios.de
On Thu, Jan 12, 2006 at 05:06:12PM +0100, Valentino Volonghi wrote:
Cool, the patch looks good. I'm going to apply it as soon as there are unittests. One idea for a unittest is to have a counter on the page and check that it never updates while the page is cached, and similar things.
The ideal unittest would depend on the timings too. Are unittests allowed to depend on timings? However, writing the unittest is going to be more difficult than writing the feature ;)
I suggest also adding a ticket to trac for your patch.
Ok. BTW, in the meantime I did a minor cleanup: the first deferred in the list isn't needed, it can be replaced by None, which saves some bytes at runtime (nothing measurable, but anyway). Thanks!
On Thu, Jan 12, 2006 at 08:47:22PM +0100, Andrea Arcangeli wrote:
The ideal unittest would depend on the timings too. Are unittests allowed to depend on timings? However, writing the unittest is going to be more difficult than writing the feature ;)
Create a deferred, create a timer, return the deferred from the test, the deferred will be fired when the timer triggers it.
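[Editor's sketch] Valentino's recipe (create a deferred, schedule a timer, return the deferred so the test framework waits for it) can be approximated without Twisted. In this hedged standard-library sketch, `threading.Event` plays the role of the Deferred and `threading.Timer` the role of `reactor.callLater`; the helper name `run_after` is made up for illustration:

```python
import threading

def run_after(delay, check, timeout=5.0):
    """Run `check` after `delay` seconds and return its result,
    blocking the test until the timer has fired (or `timeout` elapses)."""
    fired = threading.Event()
    outcome = {}

    def fire():
        outcome['result'] = check()
        fired.set()          # "callback the deferred"

    threading.Timer(delay, fire).start()
    assert fired.wait(timeout), "timer never fired"
    return outcome['result']
```

In trial the test method would instead return the Deferred directly and let the reactor drive it, which avoids blocking a thread; the structure of the test (schedule, then assert once the timer fires) is the same.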
I suggest also adding a ticket to trac for your patch.
Ok. BTW, in the meantime I did a minor cleanup: the first deferred in the list isn't needed, it can be replaced by None, which saves some bytes at runtime (nothing measurable, but anyway).
I don't think we care about bytes. The cleaner the better. -- Valentino Volonghi aka Dialtone Now Running MacOSX 10.4 Blog: http://vvolonghi.blogspot.com http://weever.berlios.de
On Thu, Jan 12, 2006 at 10:18:51PM +0100, Valentino Volonghi wrote:
I don't think we care about bytes. The cleaner the better.
Well, it's certainly a very minor optimization, but it's a bit cleaner; otherwise somebody could think such an unused deferred was actually meant to do something. Once I have more updates I'll post the new version; at the moment such a small change is not worth the bandwidth ;)
On Thu, Jan 12, 2006 at 08:47:22PM +0100, Andrea Arcangeli wrote:
On Thu, Jan 12, 2006 at 05:06:12PM +0100, Valentino Volonghi wrote:
Cool, the patch looks good. I'm going to apply it as soon as there are unittests. One idea for a unittest is to have a counter on the page and check that it never updates while the page is cached, and similar things.
The ideal unittest would depend on the timings too. Are unittests allowed to depend on timings? However, writing the unittest is going to be more difficult than writing the feature ;)
Timings are extremely difficult to test reliably, but that doesn't mean you can't test your code. Valuable things to test include (assuming I understand the purpose of the patch correctly):

- pages are still renderable/retrievable as normal with caching enabled,
- invalidating the cache manually (rather than relying on timing) causes the page to be freshly rendered,
- two concurrent clients for the same page work -- they get the page they expect,
- and two concurrent clients for the same page cause the page to be generated just once (i.e. request A is received, page generation starts, and a request B that arrives while request A is still unfinished should not cause a second page generation).

This doesn't make the test suite ensure that caches still get the N pages/sec throughput you're after -- but it does ensure that they don't break, and, at a coarse level, that the code is doing the right things for good performance (e.g. the last suggestion in that list ensures that someone can't accidentally replace your cache infrastructure with no-ops and still have the tests pass). There are probably many other achievable and worthwhile tests like these you could do. It's not a complete solution, but it's a heck of a lot better than nothing! -Andrew.
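[Editor's sketch] Two of Andrew's suggestions -- counting renders instead of trusting timing, and invalidating manually rather than waiting out a lifetime -- can be sketched with a toy cache. `SimpleCache` here is an illustrative stand-in, not Nevow's PageCache:

```python
class SimpleCache:
    """Toy render cache: renders a URL at most once until invalidated."""
    def __init__(self, render):
        self._render = render      # function: url -> page string
        self._db = {}

    def get(self, url):
        if url not in self._db:
            self._db[url] = self._render(url)
        return self._db[url]

    def invalidate(self, url):
        # Manual expiry, so tests need not depend on wall-clock time.
        self._db.pop(url, None)


# Count how many times rendering actually happens.
renders = []

def render(url):
    renders.append(url)
    return "<page %s>" % url

cache = SimpleCache(render)

# Two back-to-back requests for the same page: rendered exactly once,
# and both get the page they expect.
assert cache.get("/") == "<page />"
assert cache.get("/") == "<page />"
assert renders == ["/"]

# Manual invalidation (instead of waiting out a lifetime) forces a
# fresh render.
cache.invalidate("/")
cache.get("/")
assert renders == ["/", "/"]
```

The real tests would do this through Nevow's request machinery (and would need Deferreds for the truly concurrent case), but the counter-plus-manual-invalidation structure carries over directly.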
participants (3)
-
Andrea Arcangeli
-
Andrew Bennetts
-
Valentino Volonghi aka Dialtone