[Twisted-Python] In-memory cache in Twisted
Hi,

What's a good way to use a simple dictionary as a cache in the Twisted framework? Basically, I have a callback chain in which I ultimately make a REST call (in a non-blocking way, using treq) to fetch some data. Before making the call, I check a dictionary to see whether the value is already available. However, I have noticed that the event loop gets pretty busy (sometimes things get stuck and the Twisted server stops) as soon as I add this logic, which is pretty much:

    @defer.inlineCallbacks
    def fetch(key):
        if key in cache:
            return cache[key]
        # else call out to treq to fetch the value
        cache[key] = value
        return value

This dict can grow to around 50k entries. What's a good way to solve this issue?
Hi,

I don't see any reason to use defer.inlineCallbacks in your snippet of code.

Regards,
gelin yan
Good point, thanks for responding. But is the above way of using a dictionary as a cache correct? Or is there a "deferred" way of doing this? All I want is an in-memory cache that is compatible with this async paradigm.
Waqar Khan <wk80333@gmail.com> writes:
But is the above way of using a dictionary as a cache correct? Or is there a "deferred" way of doing this? All I want is an in-memory cache that is compatible with this async paradigm.
Yes, it is fine to use a single dict as a cache. Twisted is single-threaded and so only one callback is running at a time. There's no need to lock, etc.

(Or is there another reason you need the cache to "be async"? Like maybe it's using memcached or so?)

-- meejah
On Friday, 27 September 2019 04:38:46 CEST Waqar Khan wrote:
Hi,

What's a good way to use a simple dictionary as a cache in the Twisted framework? Basically, I have a callback chain in which I ultimately make a REST call (in a non-blocking way, using treq) to fetch some data. Before making the call, I check a dictionary to see whether the value is already available. However, I have noticed that the event loop gets pretty busy (sometimes things get stuck and the Twisted server stops) as soon as I add this logic, which is pretty much:

    @defer.inlineCallbacks
    def fetch(key):
        if key in cache:
            return cache[key]
        # else call out to treq to fetch the value
        cache[key] = value
        return value

This dict can grow to around 50k entries. What's a good way to solve this issue?
If it gets stuck, then the cause for that is probably in the part of the code you omitted. So it would help to elaborate on how the value is fetched exactly.

I can see two other problems with this caching mechanism though:

1. Items are never removed from the cache, so unless there is a limit to the number of different keys that can be used, the cache can grow indefinitely. You might want something like an LRU cache rather than a plain dictionary.

https://docs.python.org/3/library/functools.html#functools.lru_cache

2. If a lot of clients are requesting the same thing, you won't see any benefits from caching until the first request completes. So you could get a pattern like this:

T=0: key A requested, A is not cached, start fetch #1 of A
T=1: key A requested, A is not cached, start fetch #2 of A
T=2: key A requested, A is not cached, start fetch #3 of A
T=3: key A requested, A is not cached, start fetch #4 of A
T=4: key A requested, A is not cached, start fetch #5 of A
T=5: fetch #1 of A completes and is added to the cache
T=6: key A requested, A is cached, return value immediately

In this example, the value for A is fetched 5 times despite the caching mechanism. If the fetching takes a long time compared to the rate at which requests are coming in, this effect gets worse at a quadratic rate: the total time spent fetching is the number of requests that come in during the fetching of the first request times the duration of the fetch.

To avoid this, you could put a Deferred for the fetch operation in the cache or in a separate dictionary, and if you get another request for the same key before the fetch completes, return that Deferred instead of starting another fetch.

Bye,
Maarten
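To make that last suggestion concrete, here is a minimal sketch (not code posted in this thread) of a cache that remembers pending fetches. It assumes a loader function `fetch(key)` that returns a Deferred, for example a treq request; eviction (problem #1) is left out:

    from twisted.internet import defer

    class CoalescingCache:
        """Caches results and de-duplicates concurrent fetches of the same key."""

        def __init__(self, fetch):
            self._fetch = fetch    # assumed: callable taking a key, returning a Deferred
            self._values = {}      # key -> already-fetched value
            self._waiters = {}     # key -> list of Deferreds waiting on an in-flight fetch

        def get(self, key):
            if key in self._values:
                # Already cached: return an already-fired Deferred.
                return defer.succeed(self._values[key])

            d = defer.Deferred()
            if key in self._waiters:
                # A fetch for this key is already in flight; just wait for it.
                self._waiters[key].append(d)
                return d

            # First request for this key: remember the waiter, then start the fetch.
            self._waiters[key] = [d]
            self._fetch(key).addCallbacks(self._on_result, self._on_error,
                                          callbackArgs=(key,), errbackArgs=(key,))
            return d

        def _on_result(self, value, key):
            self._values[key] = value
            for waiter in self._waiters.pop(key):
                waiter.callback(value)

        def _on_error(self, failure, key):
            # Failures are not cached, so a later request will retry the fetch.
            for waiter in self._waiters.pop(key):
                waiter.errback(failure)

With something like this, `yield cache.get(key)` can be used from inlineCallbacks code just like the treq call itself, and any requests that arrive while the first fetch of a key is still running all wait on that single fetch instead of starting their own.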
Hi Maarten,

I think you have hit the nail on the head. I do think this is plausible, as I have observed that as the size of the cache increases, things do get better, which might support your theory.

Is there a simple example you can add on "put a Deferred for the fetch operation in the cache"? I am really just getting started with Twisted. Thanks for all the help.
Unfortunately I don't think I have any code lying around that implements this kind of cache.

Bye,
Maarten
Hi all,

Here is an async in-memory cache that I've implemented for one of my projects:

https://gist.github.com/IlyaSkriblovsky/5aba53b661acd49b65efeb4ce41a8b52

It properly handles problem #2 described by Maarten, but it doesn't bother with eviction, because that wasn't needed at the time of writing (the key space was limited).

Usage example:

    from twisted.internet import defer

    @defer.inlineCallbacks
    def load_content(url: str):
        # does some long-running async task, such as loading the URL's content
        ...

    @defer.inlineCallbacks
    def main():
        # DeferredCache (from the gist linked above) receives the loader
        # function as an argument. The loader function must return a Deferred.
        cache = DeferredCache(load_content)

        # This will actually download the content.
        page1 = yield cache.get('http://example.com')

        # This will use the cached one. (Note that this get() also returns a
        # Deferred, but that Deferred will already have succeeded, so `yield`
        # will return the content immediately.)
        page2 = yield cache.get('http://example.com')

        # Illustration of problem #2 described by Maarten: note there is no
        # yield here, so we are running these two get()s simultaneously.
        deferred_page1 = cache.get('https://www.nasa.gov/')
        deferred_page2 = cache.get('https://www.nasa.gov/')

        # Actually wait for the results.
        page1, page2 = yield defer.gatherResults([deferred_page1, deferred_page2])

        # load_content() will have been called only once with
        # 'https://www.nasa.gov/' by this point.

-- Ilya
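Since Maarten's point #1 and Ilya's note both leave eviction open, and the original dict can grow to around 50k entries, here is one rough sketch (illustrative names and an arbitrary size limit, not part of the gist) of how a plain dict used as a cache could be replaced with a small LRU mapping so it stops growing indefinitely:

    from collections import OrderedDict

    class LRUDict:
        """A dict-like mapping that keeps only the most recently used items."""

        def __init__(self, max_size=50000):
            self._max_size = max_size
            self._items = OrderedDict()

        def __contains__(self, key):
            return key in self._items

        def __getitem__(self, key):
            value = self._items[key]
            self._items.move_to_end(key)         # mark as most recently used
            return value

        def __setitem__(self, key, value):
            self._items[key] = value
            self._items.move_to_end(key)
            if len(self._items) > self._max_size:
                self._items.popitem(last=False)  # evict the least recently used item

Because the reactor is single-threaded, such a structure needs no locking; the trade-off is simply that evicted keys will be fetched again on their next request.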
participants (5)

- Gelin Yan
- Ilya Skriblovsky
- Maarten ter Huurne
- meejah
- Waqar Khan