WeakValueDict and threadsafety

88888 Dihedral dihedral88888 at googlemail.com
Sat Dec 10 13:44:22 EST 2011


On Sunday, December 11, 2011 1:56:38 AM UTC+8, Darren Dale wrote:
> On Dec 10, 11:19 am, Duncan Booth <duncan.bo... at invalid.invalid>
> wrote:
> > Darren Dale <dsda... at gmail.com> wrote:
> > > I'm concerned that this is not actually thread-safe. When I no longer
> > > hold strong references to an instance of data, at some point the
> > > garbage collector will kick in and remove that entry from my registry.
> > > How can I ensure the garbage collection process does not modify the
> > > registry while I'm holding the lock?
> >
> > You can't, but it shouldn't matter.
> >
> > So long as you have a strong reference in 'data' that particular object
> > will continue to exist. Other entries in 'registry' might disappear while
> > you are holding your lock but that shouldn't matter to you.
> >
> > What is concerning though is that you are using `id(data)` as the key and
> > then presumably storing that separately as your `oid` value. If the
> > lifetime of the value stored as `oid` exceeds the lifetime of the strong
> > references to `data` then you might get a new data value created with the
> > same id as some previous value.
> >
> > In other words I think there's a problem here, but nothing to do with the
> > lock.
> 
> Thank you for the considered response. In reality, I am not using
> id(data). I took that from the example in the documentation at
> python.org in order to illustrate the basic approach, but it looks
> like I introduced an error in the code. It should read:
> 
> def get_data(oid):
>     with reglock:
>         data = registry.get(oid, None)
>         if data is None:
>             data = make_data(oid)
>             registry[oid] = data
>     return data
> 
> Does that look better? I am actually working on the h5py project
> (bindings to hdf5), and the oid is an hdf5 object identifier.
> make_data(oid) creates a proxy object that stores a strong reference
> to oid.
> 
> My concern is that the garbage collector is modifying the dictionary
> underlying WeakValueDictionary at the same time that my multithreaded
> code is trying to access it, producing a race condition. This morning
> I wrote a synchronized version of WeakValueDictionary (actually
> implemented in cython):
> 
> class _Registry:
> 
>     def __cinit__(self):
>         def remove(wr, selfref=ref(self)):
>             self = selfref()
>             if self is not None:
>                 self._delitem(wr.key)
>         self._remove = remove
>         self._data = {}
>         self._lock = FastRLock()
> 
>     __hash__ = None
> 
>     def __setitem__(self, key, val):
>         with self._lock:
>             self._data[key] = KeyedRef(val, self._remove, key)
> 
>     def _delitem(self, key):
>         with self._lock:
>             del self._data[key]
> 
>     def get(self, key, default=None):
>         with self._lock:
>             try:
>                 wr = self._data[key]
>             except KeyError:
>                 return default
>             else:
>                 o = wr()
>                 if o is None:
>                     return default
>                 else:
>                     return o
> 
> Now that I am using this _Registry class instead of
> WeakValueDictionary, my test scripts and my actual program are no
> longer producing segfaults.
I would prefer iterators and iterables that can accept a global signal,
called a clock, to replace this CS mess.



