[Numpy-discussion] RFC: Detecting array changes (NumPy 2.0?)

Anne Archibald aarchiba at physics.mcgill.ca
Fri Mar 11 16:04:30 EST 2011


On 11 March 2011 15:34, Charles R Harris <charlesr.harris at gmail.com> wrote:
>
> On Fri, Mar 11, 2011 at 1:06 PM, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no> wrote:
>>
>>  On Fri, 11 Mar 2011 19:37:42 +0000 (UTC), Pauli Virtanen <pav at iki.fi>
>>  wrote:
>> > On Fri, 11 Mar 2011 11:47:58 -0700, Charles R Harris wrote:
>> > [clip]
>> >> What about views? Wouldn't it be easier to write another object
>> >> wrapping
>> >> an ndarray?
>> >
>> > I think the buffer interfaces and all other various ways Numpy
>> > provides
>> > exports for arrays make keeping tabs on modification impossible to do
>> > completely reliably.
>>
>>  Not to mention all the pain of making sure the arrays are wrapped and
>>  stay wrapped in the first place. In particular in combination with other
>>  array wrappers.
>>
>>  I wasn't saying this is absolutely needed, just that it'd be a really
>>  convenient feature helpful for caching. Sometimes, introducing fast
>>  caching this way can remove a lot of logic from the code. Introducing a
>>  Python-space visible wrapper object kind of defeats the purpose for me.
>>
>
> Well, starting with a wrapped object would allow you to experiment and
> discover what it is you really need. A smallish specialized object is
> probably a better starting point for development than a big solution.
> Operating systems do this sort of thing with the VM, but they have hardware
> assistance down at the lowest level and rather extensive structures to track
> status. Furthermore, the memory is organized into blocks and that makes it a
> lot easier to monitor than strided memory. In fact, I think you might want
> to set up your own memory subsystem and have the arrays sit on top of that.

In fact, on many systems, calling malloc for a large contiguous block
of memory returns a freshly mmapped region. It's possible that with a
little deviousness (and, sadly, some system-specific code) one could
arrange to allocate some arrays in a way that would have the VM system
trigger modification-count updates. If you're serious about
detecting modifications, this sort of thing may be the only way to go:
a modification-detection system that misses some modifications might
be worse than none at all.
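
To make the "misses some modifications" worry concrete, here is a minimal
sketch using only the Python standard library. CountingBuffer is a
hypothetical wrapper invented for this illustration, not anything numpy
provides; it counts writes made through its own __setitem__, and a
buffer-interface export of the same memory slips past it.

```python
class CountingBuffer:
    """Hypothetical wrapper: counts writes made through its own __setitem__."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.mod_count = 0

    def __setitem__(self, index, value):
        self.data[index] = value
        self.mod_count += 1

    def __getitem__(self, index):
        return self.data[index]


buf = CountingBuffer(8)
buf[0] = 1                        # goes through the wrapper, so it is counted

exported = memoryview(buf.data)   # buffer-interface export of the same memory
exported[1] = 2                   # changes the data, but bypasses the wrapper

count_after = buf.mod_count       # the counter only saw the first write
contents = bytes(buf.data)        # yet both writes are present in the data
```

A cache keyed on mod_count would keep serving a stale result after the
second write, which is the failure mode described above.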

An internal numpy setup is going to be a nightmare, even if all you
have to worry about is views and you're willing to let
non-overlapping views count as modifying each other. You'd have to
add a modification count to the ultimate base array (the one whose
deletion triggers disposal of the memory arena), and then every
modification to a view would have to walk the linked list of views all
the way up to the top to increment that counter. You'd
also be triggering increments of the modification counter on all sorts
of non-modifications that occur in C code. Doable, but a huge job for
dubious benefit.
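
The bookkeeping described above can be sketched as a toy model in plain
Python. Base and View are hypothetical stand-ins invented for this
illustration (they are not numpy classes, and this is not how numpy is
implemented); the point is only to show a write to a view walking the
chain of parents up to the owner and bumping a single shared counter.

```python
class Base:
    """Toy stand-in for the array that owns the memory."""

    def __init__(self, size):
        self.data = [0] * size
        self.base = None        # the owner has no parent
        self.mod_count = 0      # one counter shared by every view


class View:
    """Toy stand-in for a view: a slice of a parent, which may be a view."""

    def __init__(self, parent, start, stop):
        self.base = parent
        self.start = start
        self.stop = stop

    def _owner_and_offset(self):
        # Walk the chain of parents up to the owner, accumulating offsets.
        node, offset = self, 0
        while node.base is not None:
            offset += node.start
            node = node.base
        return node, offset

    def __setitem__(self, i, value):
        if not 0 <= i < self.stop - self.start:
            raise IndexError(i)
        owner, offset = self._owner_and_offset()
        owner.data[offset + i] = value
        owner.mod_count += 1    # every write bumps the owner's counter


a = Base(8)
left = View(a, 0, 4)            # covers a.data[0:4]
right = View(a, 4, 8)           # covers a.data[4:8], no overlap with left
inner = View(right, 1, 3)       # view of a view, covers a.data[5:7]

inner[0] = 7                    # lands in a.data[5]
left[0] = 1                     # lands in a.data[0]
```

Because the counter lives on the owner, the write through inner also
invalidates anything cached against left, even though the two views
never overlap. That is the coarse behaviour one would have to accept.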

Anne


