WeakValueDict and threadsafety

Darren Dale dsdale24 at gmail.com
Sat Dec 10 15:14:23 EST 2011


On Dec 10, 2:09 pm, Duncan Booth <duncan.bo... at invalid.invalid> wrote:
> Darren Dale <dsdal... at gmail.com> wrote:
> > On Dec 10, 11:19 am, Duncan Booth <duncan.bo... at invalid.invalid>
> > wrote:
> >> Darren Dale <dsdal... at gmail.com> wrote:
> > def get_data(oid):
> >     with reglock:
> >         data = registry.get(oid, None)
> >         if data is None:
> >             data = make_data(oid)
> >             registry[oid] = data
> >     return data
>
> > Does that look better? I am actually working on the h5py project
> > (bindings to hdf5), and the oid is an hdf5 object identifier.
> > make_data(oid) creates a proxy object that stores a strong reference
> > to oid.
>
> Yes, that looks better.
>
>
>
> > Now that I am using this _Registry class instead of
> > WeakValueDictionary, my test scripts and my actual program are no
> > longer producing segfaults.
>
> I think that so far as multi-thread race conditions are concerned Python
> usually tries to guarantee that you won't get seg faults. So if you were
> getting seg faults my guess would be that either you've found a bug in the
> WeakValueDictionary implementation or you've got a bug in some of your code
> outside Python.

Have you seen Alex Martelli's answer at
http://stackoverflow.com/questions/3358770/python-dictionary-is-thread-safe
? The way I read it, it seems pretty clear that deleting items from
a dict can lead to crashes in threaded code. (Well, he says that as long
as you are not performing an assignment or a deletion in threaded code,
there may be issues, but at least it shouldn't crash.)

> For example if your proxy object has a __del__ method to clean up the
> object it is proxying then you could be creating a new object with the same
> oid as one that is in the process of being destroyed (the object disappears
> from the WeakValueDictionary before the __del__ method is actually called).
>
> Without knowing anything about HDF5 I don't know if that's a problem but I
> could imagine you could end up creating a new proxy object that references
> something in the HDF5 library which you then destroy as part of cleaning up
> a previous incarnation of the object but continue to access through the new
> proxy.

We started having problems when HDF5 began recycling oids as soon as
their reference count went to zero, which was why we began using
IDProxy and the registry. The IDProxy implementation below does have a
__dealloc__ method, which we use to decrement HDF5's internal
reference count for the oid. Adding these proxies and the registry dealt
with the issue of creating a new proxy that references an old oid
(even in non-threaded code), but it introduced a segfault in
multithreaded code that is rare, though common enough to be a problem.
This synchronized registry is the best I have been able to do, and it
seems to address the problem. Could you suggest another approach?

cdef IDProxy getproxy(hid_t oid):
    # Retrieve an IDProxy object appropriate for the given object
    # identifier
    cdef IDProxy proxy
    proxy = registry.get(oid, None)
    if proxy is None:
        proxy = IDProxy(oid)
        registry[oid] = proxy

    return proxy


cdef class IDProxy:

    property valid:
        def __get__(self):
            return H5Iget_type(self.id) > 0

    def __cinit__(self, id):
        self.id = id
        self.locked = 0

    def __dealloc__(self):
        if self.id > 0 and (not self.locked) and H5Iget_type(self.id) > 0 \
          and H5Iget_type(self.id) != H5I_FILE:
            H5Idec_ref(self.id)
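
For comparison, here is a minimal pure-Python sketch of the synchronized
registry idea discussed above. It is an illustration only, not the actual
_Registry class: Proxy, LockedRegistry, get_or_create and cleanup are
hypothetical names, and the design (a plain dict of weakref.ref objects
without callbacks, so that entries are only ever removed while the lock is
held) is an assumption about one way to do it.

```python
import threading
import weakref


class Proxy:
    """Hypothetical stand-in for IDProxy: just remembers its identifier."""

    def __init__(self, oid):
        self.id = oid


class LockedRegistry:
    """Weak-valued registry whose lookups and insertions share one lock."""

    def __init__(self):
        self._lock = threading.RLock()
        # oid -> weakref.ref(proxy). No weakref callbacks are registered,
        # so nothing mutates this dict outside of the lock.
        self._refs = {}

    def get_or_create(self, oid, factory=Proxy):
        # The lookup and the insertion happen under the same lock, so two
        # threads cannot both create a proxy for the same oid.
        with self._lock:
            ref = self._refs.get(oid)
            obj = ref() if ref is not None else None
            if obj is None:
                obj = factory(oid)
                self._refs[oid] = weakref.ref(obj)
            return obj

    def cleanup(self):
        # Drop entries whose proxies are gone; returns how many were removed.
        with self._lock:
            dead = [key for key, ref in self._refs.items() if ref() is None]
            for key in dead:
                del self._refs[key]
            return len(dead)
```

Dead entries are replaced lazily in get_or_create, so calling cleanup is
only needed to keep the dict from growing. The sketch says nothing about
what happens inside __dealloc__, which is where the HDF5 reference count
is actually touched.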


