Request for comments on a design
TomF
tomf.sessile at gmail.com
Sat Oct 23 02:26:48 EDT 2010
I have a program that manipulates lots of very large indexes, which I
implement as bit vectors (via the bitarray module). They are too large
to keep all of them in memory at once, so I need a way to cache them
and load them from disk as necessary. I've been reading about weak
references and it looks like they may be what I want.
My idea is to use a WeakValueDictionary to hold references to these
bitarrays, so Python can decide when to garbage collect them. I then
keep a key-value database of them (via bsddb) on disk and load them
when necessary. The basic idea for accessing one of these indexes is:
import cPickle
import weakref

_idx_to_bitvector_dict = weakref.WeakValueDictionary()

def retrieve_index(idx):
    # One get() instead of an "in" test plus a lookup, so the entry
    # cannot vanish between the two; a missing entry comes back as None.
    bv = _idx_to_bitvector_dict.get(idx)
    if bv is None:  # never loaded, or it's been gc'd
        bv_str = bitvector_from_db[idx]    # Load from bsddb
        bv = cPickle.loads(bv_str)         # Deserialize the string
        _idx_to_bitvector_dict[idx] = bv   # Re-initialize the weak dict element
    return bv
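
To try the access pattern without bsddb or bitarray installed, here is a
minimal self-contained sketch. It rests on several assumptions that are
not part of the real program: BitVector is a hypothetical stand-in for a
bitarray, fake_db is a plain dict standing in for the bsddb file, stats
is only there to count reloads, and it is written for Python 3 (pickle
rather than cPickle).

```python
import gc
import pickle
import weakref


class BitVector:
    """Hypothetical stand-in for a bitarray; instances can be weakly referenced."""

    def __init__(self, data):
        self.data = data


# Hypothetical stand-in for the bsddb file: index id -> pickled bytes.
fake_db = {
    "idx1": pickle.dumps(b"\x0f\xf0"),
    "idx2": pickle.dumps(b"\xaa\x55"),
}

_cache = weakref.WeakValueDictionary()
stats = {"loads": 0}  # how many times we went back to the "database"


def retrieve_index(idx):
    bv = _cache.get(idx)  # None if never loaded or already collected
    if bv is None:
        stats["loads"] += 1
        bv = BitVector(pickle.loads(fake_db[idx]))  # deserialize and wrap
        _cache[idx] = bv
    return bv


first = retrieve_index("idx1")   # loaded from fake_db
again = retrieve_index("idx1")   # served from the weak dict
same_object = first is again
loads_while_held = stats["loads"]

del first, again                 # drop every strong reference
gc.collect()                     # make collection explicit
still_cached = "idx1" in _cache

retrieve_index("idx1")           # has to be reloaded
loads_after_release = stats["loads"]

print(same_object, loads_while_held, still_cached, loads_after_release)
```

On CPython the entry disappears as soon as the last strong reference is
dropped, so in this sketch the weak dict only avoids reloading a vector
that some other part of the program is still holding on to.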
Hopefully that's not too confusing. Comments on this approach? I'm
wondering whether the weakref stuff isn't duplicating some of the
caching that bsddb might be doing.
Thanks,
-Tom