Threading question .. am I doing this right?
Robert Latest
boblatest at yahoo.com
Thu Feb 24 07:08:50 EST 2022
I have a multi-threaded application (a web service) where several threads need
data from an external database. There is quite a lot of that data, but it is
almost always the same. Between incoming requests, timestamped records get
added to the DB.

So I decided to keep an in-memory cache of the DB records that only gets
"topped up" with the most recent records on each request:

from threading import Lock, Thread

class MyCache():
    def __init__(self):
        self.cache = None
        self.cache_lock = Lock()

    def _update(self):
        new_records = query_external_database()
        if self.cache is None:
            self.cache = new_records
        else:
            self.cache.extend(new_records)

    def get_data(self):
        with self.cache_lock:
            self._update()

        return self.cache

my_cache = MyCache() # module level

This works, but even those "small" queries can sometimes hang for a long time,
causing incoming requests to pile up at the "with self.cache_lock" block.
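
To make the pile-up concrete, here is a minimal sketch that runs the locked version against a stand-in query. The stub query_external_database(), the QUERY_DELAY and N_REQUESTS values, and the handle_request() helper are all assumptions made for illustration; the real query and web framework are not shown in this post.

```python
import time
from threading import Lock, Thread

QUERY_DELAY = 0.05   # assumed: how long one slow query takes, in seconds
N_REQUESTS = 5       # assumed: number of requests arriving at once


def query_external_database():
    """Hypothetical stand-in for the real query: slow, returns one record."""
    time.sleep(QUERY_DELAY)
    return ["record"]


class MyCache:
    def __init__(self):
        self.cache = None
        self.cache_lock = Lock()

    def _update(self):
        new_records = query_external_database()
        if self.cache is None:
            self.cache = new_records
        else:
            self.cache.extend(new_records)

    def get_data(self):
        # Every caller runs its own query while holding the lock,
        # so concurrent callers are served strictly one after another.
        with self.cache_lock:
            self._update()
        return self.cache


my_cache = MyCache()
latencies = []
latencies_lock = Lock()


def handle_request():
    start = time.monotonic()
    my_cache.get_data()
    elapsed = time.monotonic() - start
    with latencies_lock:
        latencies.append(elapsed)


t0 = time.monotonic()
threads = [Thread(target=handle_request) for _ in range(N_REQUESTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = time.monotonic() - t0

print(f"total: {total:.2f}s, slowest request: {max(latencies):.2f}s")
```

Because the queries run back to back, the last request in line waits for roughly N_REQUESTS times the query delay, which is the behaviour described above.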

Since it is better to quickly serve the client with slightly outdated data than
not at all, I came up with the "impatient" solution below. The idea is that an
incoming request triggers an update query in another thread, waits up to a
short timeout for that thread to finish, and then returns either updated or old
data.

class MyCache():
    def __init__(self):
        self.cache = None
        self.thread_lock = Lock()
        self.update_thread = None

    def _update(self):
        new_records = query_external_database()
        if self.cache is None:
            self.cache = new_records
        else:
            self.cache.extend(new_records)

    def get_data(self):
        if self.cache is None:
            timeout = 10 # allow more time to get initial batch of data
        else:
            timeout = 0.5
        with self.thread_lock:
            if self.update_thread is None or not self.update_thread.is_alive():
                self.update_thread = Thread(target=self._update)
                self.update_thread.start()
                self.update_thread.join(timeout)

        return self.cache

my_cache = MyCache()

My question is: Is this a solid approach? Am I forgetting something? For
instance, I believe that I don't need another lock to guard self.cache.extend()
because _update() can only ever run in one thread at a time. But maybe I'm
overlooking something.
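
The belief that _update() only ever runs in one thread at a time can be probed with a small sketch like the one below. It is not the real application: query_external_database() is a hypothetical stub, the timeout is shortened so the script finishes quickly, and the running / max_running counters are added purely for instrumentation.

```python
import time
from threading import Lock, Thread

running = 0        # number of _update() calls currently in progress
max_running = 0    # highest value of `running` observed so far
counter_lock = Lock()


def query_external_database():
    """Hypothetical stand-in for the real query: slow, returns one record."""
    time.sleep(0.05)
    return [time.time()]


class MyCache:
    def __init__(self):
        self.cache = None
        self.thread_lock = Lock()
        self.update_thread = None

    def _update(self):
        global running, max_running
        with counter_lock:
            running += 1
            max_running = max(max_running, running)
        try:
            new_records = query_external_database()
            if self.cache is None:
                self.cache = new_records
            else:
                self.cache.extend(new_records)
        finally:
            with counter_lock:
                running -= 1

    def get_data(self, timeout=0.01):  # shortened timeout, for the demo only
        with self.thread_lock:
            # A new update thread is started only if the previous one is done.
            if self.update_thread is None or not self.update_thread.is_alive():
                self.update_thread = Thread(target=self._update)
                self.update_thread.start()
                self.update_thread.join(timeout)
        return self.cache


my_cache = MyCache()
callers = [Thread(target=my_cache.get_data) for _ in range(20)]
for t in callers:
    t.start()
for t in callers:
    t.join()
if my_cache.update_thread is not None:
    my_cache.update_thread.join()  # let the last update finish

print("max concurrent updates:", max_running)
```

This only checks the writer side. get_data() hands every caller the same list object that a later _update() extends in place, so a caller iterating over the result while an update is in flight may see the list grow; whether that matters depends on how the callers use the data.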