feature request - increment counter on write check
Originally posted as issue 6301 on GitHub: https://github.com/numpy/numpy/issues/6301

Presumably any block of code that modifies an ndarray's buffer is wrapped in a (thread-safe?) check of the writable flag. Would it be possible to hold a counter rather than a simple bool flag, and increment the counter whenever the flag is tested? Hopefully this would introduce only a tiny additional overhead, but it would permit pseudo-hashing to test whether a mutable array has changed since you last encountered it.

Ideally this should be exposed a bit like Python's __hash__ method (https://docs.python.org/2/reference/datamodel.html#object.__hash__), let's say as __mutablehash__, meaning that a hash is returned, but be warned that the object is mutable. For an ndarray, X, containing objects that themselves have a __mutablehash__ method (e.g. other ndarrays, or some user object), X.__mutablehash__ would need to either do the full check over all constituent objects or simply return None. Defining an API of this sort would make it possible, for example, for pandas DataFrames to implement this interface as well.

In terms of use cases, the one that motivated me was imagining improvements to the "variable explorer" in Spyder. Roughly speaking, this widget's job is to always display an up-to-date summary of the variables in the current scope. Currently it can show max/min and shape, but you could imagine it also showing graphical summaries of the contents of an ndarray. If the widget could cache summaries and check which arrays have really changed, it should be much faster, offer more features, and be simpler internally. Note that pandas DataFrames are relevant here as an example of complex objects, containing ndarrays, which would benefit from being able to have their summaries cached.

A more common/general use case would be as a check in some kind of memoization process...

    import numpy as np
    from numpy.fft import fft, ifft

    # memoize_please and mutablehash are the proposed helpers; neither exists yet.

    # simple example...
    @memoize_please
    def hasnans(x):
        return np.any(np.isnan(x))

    # more complex example...
    # (_cache is a deliberately shared mutable default, so it persists across calls)
    def convolve_fft(a, b, _cache={}):
        a_hash = mutablehash(a)
        b_hash = mutablehash(b)
        if a_hash not in _cache:
            _cache[a_hash] = fft(a)
        if b_hash not in _cache:
            _cache[b_hash] = fft(b)
        return ifft(_cache[a_hash] * _cache[b_hash])

A quick thought on an implementation detail... I'm not sure exactly how to deal with the counter overflowing. Perhaps if you treated counter==0 as meaning not-writable (i.e. that would be the new version of the old write flag), then you might get some uint-wraparound checking for free, because when the counter wraps back around to zero the buffer ends up becoming locked. Alternatively, you could just say that no guarantee is given of wraparound being caught, though that might seriously limit the range of possible uses.

In summary... Hopefully the stuff needed to make __mutablehash__ work could be implemented simply by adding a single extra operation to the write check (and maybe changing the footprint of the ndarray slightly to accommodate a counter). But I suspect someone will tell me that life is never that simple!
participants (5)
- Anne Archibald
- Daniel
- Daniel Manson
- Nathaniel Smith
- Sebastian Berg