[Numpy-discussion] feature request - increment counter on write check

Fri Sep 11 09:10:48 EDT 2015

Originally posted as issue 6301 <https://github.com/numpy/numpy/issues/6301> on
GitHub.

Presumably any block of code that modifies an ndarray's buffer is wrapped
in a (thread-safe?) check of the writable flag. Would it be possible to
hold a counter rather than a simple bool flag, and then increment the
counter whenever the flag is tested? Hopefully this would introduce only a
tiny additional overhead, but it would permit pseudo-hashing: testing whether
a mutable array has changed since you last encountered it.
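A minimal Python sketch of the idea, assuming a hypothetical wrapper class `CountedArray` (not a NumPy API) in which every write goes through one check that both enforces writability and bumps a counter. Saving the counter and comparing it later tells you whether the array may have changed:

```python
import numpy as np


class CountedArray:
    """Hypothetical wrapper: a write counter sits alongside the writable flag."""

    def __init__(self, data, writable=True):
        self._data = np.array(data)  # private copy of the buffer
        self.writable = writable
        self.write_count = 0  # bumped every time a write passes the check

    def _check_write(self):
        # The existing writability test, plus one extra increment.
        if not self.writable:
            raise ValueError("assignment destination is read-only")
        self.write_count += 1

    def __setitem__(self, index, value):
        self._check_write()
        self._data[index] = value

    def __getitem__(self, index):
        return self._data[index]


x = CountedArray([1.0, 2.0, 3.0])
seen = x.write_count          # remember the counter
x[0] = 10.0                   # any write bumps it
changed = x.write_count != seen
```

The sketch only sees writes made through the wrapper itself; since NumPy views share their base array's buffer, a real implementation would presumably also have to count writes made through views.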

Ideally this should be exposed a bit like Python's __hash__
<https://docs.python.org/2/reference/datamodel.html#object.__hash__> method,
let's say as __mutablehash__, meaning that a hash is returned but with the
warning that the object is mutable. For an ndarray X containing objects that
themselves have a __mutablehash__ method (e.g. other ndarrays, or some user
object), X.__mutablehash__ will need to do the full check over all
constituent objects, or simply return None. Defining an API of this sort
would make it possible, for example, to let pandas DataFrames also
implement this interface.
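One way the proposed protocol might look, sketched in Python: a hypothetical module-level `mutablehash()` dispatches to a `__mutablehash__` method (neither exists in NumPy or Python today), and an object-dtype array either combines its elements' mutable hashes or gives up with None. The `Versioned` class is an illustrative stand-in for any object that tracks its own write counter:

```python
import numpy as np


def mutablehash(obj):
    """Hypothetical dispatcher for the proposed __mutablehash__ protocol."""
    method = getattr(type(obj), "__mutablehash__", None)
    if method is not None:
        return method(obj)
    if isinstance(obj, np.ndarray) and obj.dtype == object:
        # Full check over all constituent objects; None if any has no hash.
        parts = tuple(mutablehash(item) for item in obj.flat)
        return None if None in parts else hash(parts)
    # Plain ndarrays have no write counter today, so nothing to report.
    return None


class Versioned:
    """Stand-in object carrying its own write counter."""

    def __init__(self):
        self.version = 0

    def touch(self):
        self.version += 1  # would happen on every write

    def __mutablehash__(self):
        # Identity plus counter: changes whenever the object is written.
        return hash((id(self), self.version))


a, b = Versioned(), Versioned()
X = np.empty(2, dtype=object)
X[0], X[1] = a, b
before = mutablehash(X)
b.touch()                       # write to one constituent
changed = mutablehash(X) != before
```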

As for use cases, the one that motivated me was imagining improvements to
the "variable explorer" in Spyder. Roughly speaking, this widget's job is to
always display an up-to-date summary of the variables in the current scope;
currently it can show max/min and shape, but you could imagine it also
showing graphical summaries of the contents of an ndarray. If the widget
could cache summaries and check which arrays have really changed, it should
be much faster, offer more features, and be simpler internally. Note that
pandas DataFrames are relevant here as an example of complex objects,
containing ndarrays, which would benefit from being able to have their
summaries cached.

A more common, general use case would be as a check in some kind of
memoization process:

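import numpy as np
from numpy.fft import fft, ifft

# simple example... (memoize_please is the proposed, not yet existing, decorator)
@memoize_please
def hasnans(x):
    return np.any(np.isnan(x))

# more complex example... (mutablehash is the proposed function)
def convolve_fft(a, b, _cache={}):
    # _cache persists across calls; its keys are the inputs' mutable hashes.
    # fft(a) * fft(b) gives circular convolution, so a and b need equal length.
    a_hash = mutablehash(a)
    b_hash = mutablehash(b)
    if a_hash not in _cache:
        _cache[a_hash] = fft(a)
    if b_hash not in _cache:
        _cache[b_hash] = fft(b)
    return ifft(_cache[a_hash] * _cache[b_hash])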
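The `memoize_please` decorator above is only a placeholder name. One possible implementation is sketched below, assuming each argument exposes the proposed `__mutablehash__`; here that is faked by a hypothetical `TrackedArray` ndarray subclass whose counter is bumped only by `x[...] = ...` assignments. Results are cached under the arguments' mutable hashes and recomputed only when one of them changes:

```python
import functools

import numpy as np


class TrackedArray(np.ndarray):
    """Hypothetical stand-in for an ndarray with the proposed write counter.

    Only x[...] = ... bumps the counter; in-place ufuncs and writes through
    views are NOT tracked, which is why the proposal puts the counter inside
    NumPy's own write check.
    """

    def __array_finalize__(self, obj):
        self.version = 0

    def __setitem__(self, index, value):
        super().__setitem__(index, value)
        self.version += 1

    def __mutablehash__(self):
        return (id(self), self.version)


def memoize_please(func):
    """Hypothetical decorator: cache results on the arguments' mutable hashes."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        key = tuple(a.__mutablehash__() for a in args)
        if key not in cache:
            # Keep the arguments alive alongside the result, so their ids
            # cannot be reused by new objects while the entry exists.
            cache[key] = (args, func(*args))
        return cache[key][1]

    return wrapper


calls = []


@memoize_please
def hasnans(x):
    calls.append(1)  # record each real evaluation
    return bool(np.isnan(np.asarray(x)).any())


x = np.array([1.0, 2.0, 3.0]).view(TrackedArray)
hasnans(x)        # computed
hasnans(x)        # served from the cache
x[1] = np.nan     # bumps x.version
hasnans(x)        # recomputed, now True
```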

A quick thought on an implementation detail...

I'm not sure exactly how to deal with the counter overflowing. Perhaps if
you treated counter == 0 to mean not-writable (i.e. that would be the new
version of the old write flag), you might get some uint-wraparound
checking for free, because when the counter wraps back around to zero the
buffer ends up locked? Alternatively, you could just say that no guarantee
is given that wraparound will be caught, though that might seriously limit
the range of possible uses.
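The wraparound idea can be simulated in Python, assuming a hypothetical fixed-width unsigned counter (8 bits here, so the wrap happens quickly) in which the value 0 plays the role of the old read-only flag. A writable buffer starts at 1, each write increments modulo 2**8, and landing on 0 locks the buffer instead of silently reusing an earlier counter value:

```python
BITS = 8                      # hypothetical counter width, small to show the wrap
MASK = (1 << BITS) - 1


class WrapCounter:
    """Sketch: counter == 0 means not writable (replaces the old flag)."""

    def __init__(self, writable=True):
        self.count = 1 if writable else 0

    def check_write(self):
        # Test the "flag" and bump the counter in one step.
        if self.count == 0:
            raise ValueError("buffer is read-only (or its counter wrapped)")
        self.count = (self.count + 1) & MASK  # emulate uint wraparound


c = WrapCounter()
writes = 0
try:
    while True:
        c.check_write()
        writes += 1
except ValueError:
    pass
# writes == 255: each value 1..255 was used once, then the wrap to 0 locked it
```

With a 64-bit counter the wrap would in practice never be reached, so the lock would mostly guard a degenerate case.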

In summary...
Hopefully the stuff needed to make __mutablehash__ work could be
implemented simply by adding a single extra operation to the write check
(and maybe changing the footprint of the ndarray slightly to accommodate a
counter). But I suspect someone will tell me that life is never that
simple!