[Numpy-discussion] RFC: Detecting array changes (NumPy 2.0?)
Dag Sverre Seljebotn
d.s.seljebotn at astro.uio.no
Fri Mar 11 17:36:29 EST 2011
On 03/11/2011 10:04 PM, Anne Archibald wrote:
> On 11 March 2011 15:34, Charles R Harris<charlesr.harris at gmail.com> wrote:
>> On Fri, Mar 11, 2011 at 1:06 PM, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no> wrote:
>>> On Fri, 11 Mar 2011 19:37:42 +0000 (UTC), Pauli Virtanen<pav at iki.fi> wrote:
>>>> On Fri, 11 Mar 2011 11:47:58 -0700, Charles R Harris wrote:
>>>>> What about views? Wouldn't it be easier to write another object
>>>>> wrapping an ndarray?
>>>> I think the buffer interface and the various other ways NumPy
>>>> exports array data make it impossible to keep tabs on modifications
>>>> completely reliably.
>>> Not to mention all the pain of making sure the arrays are wrapped and
>>> stay wrapped in the first place. In particular in combination with other
>>> array wrappers.
>>> I wasn't saying this is absolutely needed, just that it'd be a really
>>> convenient feature helpful for caching. Sometimes, introducing fast
>>> caching this way can remove a lot of logic from the code. Introducing a
>>> Python-space visible wrapper object kind of defeats the purpose for me.
>> Well, starting with a wrapped object would allow you to experiment and
>> discover what it is you really need. A smallish specialized object is
>> probably a better starting point for development than a big solution.
>> Operating systems do this sort of thing with the VM, but they have hardware
>> assistance down at the lowest level and rather extensive structures to track
>> status. Furthermore, the memory is organized into blocks and that makes it a
>> lot easier to monitor than strided memory. In fact, I think you might want
>> to set up your own memory subsystem and have the arrays sit on top of that.
> In fact, on many systems, using malloc on large contiguous blocks of
> memory returns a freshly-mmaped region. It's possible that with a
> little deviousness (and, sadly, some system-specific code) one could
> arrange to allocate some arrays in a way that would trigger
> modification-count updating by the VM system. If you're serious about
> detecting modifications, this sort of thing may be the only way to go
> - a modification-detection system that misses some modifications might
> be worse than none at all.
> An internal numpy setup is going to be a nightmare even if all you
> have to worry about is views and you're willing to allow
> non-overlapping views to count as modifying each other - you'd have to
> add a modification count to the ultimate base array (the one whose
> deletion triggers disposal of the memory arena), and then every
> modification to a view would have to walk the linked list of views all
> the way up to the top to increment the modification counter. You'll
> also be triggering increments of the modification counter on all sorts
> of non-modifications that occur in C code. Doable, but a huge job for
> dubious benefit.
Yes, you are right. For instance, PEP 3118 makes it rather natural to
hold on to only the data pointer and an object reference for some time
and modify the data later, and things like that can't be coded around
in NumPy no matter the effort.
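
To make the PEP 3118 point concrete, here is a minimal sketch. The
CountingArray wrapper and its modcount attribute are hypothetical names
invented for this illustration, not anything NumPy provides; the only
real APIs used are np.zeros, np.asarray and the built-in memoryview.

```python
import numpy as np


class CountingArray:
    """Hypothetical wrapper that bumps a counter on every write it sees."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.modcount = 0

    def __setitem__(self, key, value):
        self.data[key] = value
        self.modcount += 1  # only writes routed through the wrapper count

    def __getitem__(self, key):
        return self.data[key]


wrapped = CountingArray(np.zeros(4))
wrapped[0] = 1.0                 # seen by the wrapper

# A PEP 3118 consumer obtains a buffer once and may write to it later.
buf = memoryview(wrapped.data)
buf[1] = 2.0                     # goes straight to memory, wrapper not involved

print(wrapped.modcount)          # the second write was never counted
print(wrapped.data)
```

The counter records one modification although two elements changed,
which is exactly the kind of silently missed write that makes such a
scheme risky for caching.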
Thanks for your sobering comments. I'll just keep using explicit
mechanisms in my program.
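
By "explicit mechanisms" I mean something along the lines of the sketch
below, where the code that modifies the array is also responsible for
invalidating whatever was cached from it. CachedSum and invalidate() are
hypothetical names made up for this example, and the sum stands in for
any expensive derived quantity.

```python
import numpy as np


class CachedSum:
    """Hypothetical helper: caches a derived value until told to drop it."""

    def __init__(self, arr):
        self.arr = arr
        self._cache = None

    def invalidate(self):
        # Must be called by whoever modifies self.arr.
        self._cache = None

    def value(self):
        if self._cache is None:
            self._cache = float(self.arr.sum())
        return self._cache


a = np.arange(4.0)
c = CachedSum(a)
first = c.value()    # computed from the array
a[0] = 10.0          # modification the cache knows nothing about
stale = c.value()    # still the old value: nothing detects the change
c.invalidate()       # the explicit step
fresh = c.value()    # recomputed
```

The cost is the extra bookkeeping logic at every modification site,
which is what automatic change detection would have removed.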
(I didn't know about the VM modification counting, and wasn't able to
find much on Google either. At any rate that is definitely overkill here.)