[SciPy-User] Uniquely identify array

Anne Archibald aarchiba at physics.mcgill.ca
Tue Jul 19 12:22:19 EDT 2011


This is a little more subtle than it sounds. Any Python object can
be compared for identity with "is" (e.g. "if x is None:"). This tests
for pointer equality, that is, it confirms that you have the same
dynamically-allocated heap object. This will work for arrays, but it
might be too specific for what you want: a NumPy array actually
consists of two heap objects, a Python object that describes the
array, and a memory arena that holds the data. Slicing operations like
A[::-1] are fast because while they create a new Python object, the
memory arena is untouched. So you need to decide whether what you care
about is any change at all to the array, or whether a new memory arena
has been allocated.
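
To make the distinction concrete, here is a minimal sketch. The
variable names are only illustrative, and it assumes the starting
array owns its own data (as one returned by np.arange does):

```python
import numpy as np

a = np.arange(10)   # owns its memory arena
b = a               # just another name for the same Python object
r = a[::-1]         # new Python object, same memory arena
c = a.copy()        # new Python object, new memory arena

same_object = b is a             # identical heap object
view_is_same_object = r is a     # the slice is a different object...
view_base_is_a = r.base is a     # ...but it records a as the owner of its data
copy_base = c.base               # None: the copy owns its own arena
```

So "is" tells you about the describing object, while the base
attribute tells you something about the arena.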

A brief aside: people often think they care about allocation of new
arrays, but in most cases they're mistaken. malloc() is an extremely
fast operation, especially for large arrays, in which case it's
usually a direct call to the OS's mmap (and free() really does return
the memory to the system). If what you're worried about is that your
code is slower than it should be, making sure there are no extra
allocations is not the best place to look. In-place operations have
their own limitations, such as cache-coherency issues and the poor
cache efficiency of strided memory access. This is not theoretical: I
had some code, a few years ago, that manipulated large arrays and was
slow. So I painstakingly went through and made it use in-place
operations where possible and avoid malloc()ing new arrays. Not only
did it get slower, the memory usage increased.
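
For reference, the difference between an in-place operation and one
that allocates a new array can be observed through the address of the
data buffer. This is only a sketch of how to see the difference, not a
benchmark; the names are illustrative, and it keeps a reference to the
old array so that its memory cannot be reused for the new one:

```python
import numpy as np

a = np.ones(1000)
b = np.ones(1000)

addr_before = a.__array_interface__["data"][0]

a += b                      # in place: writes into the existing arena
addr_inplace = a.__array_interface__["data"][0]

old = a                     # keep the old array alive
a = a + b                   # allocates a new array and rebinds the name
addr_new = a.__array_interface__["data"][0]

inplace_kept_arena = addr_inplace == addr_before
rebinding_made_new_arena = addr_new != addr_before
```

Whether the second form is actually slower is something to measure in
your own code rather than assume.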

On the other hand, if you want to know whether you're getting slices
that allow you to modify the original array or freshly-allocated
arenas, the bluntest available instrument is to write to the one and
see if the other changes. There are some more subtle approaches that
are a little approximate, such as comparing the addresses of the
memory arenas, or checking whether the two arrays have the same base
array object (be warned that you may have to follow a chain of .base
references to get to this last). I say approximate because while
A[::2] and A[1::2] share a memory arena, and even have overlapping
extents, you can modify them independently of each other.
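
The approaches above could be sketched roughly as follows. The helper
names (writes_through, root_base, data_address) are hypothetical, not
part of NumPy, and writes_through assumes a numeric dtype without NaNs.
np.may_share_memory and np.shares_memory are real NumPy functions; the
latter is only available in newer releases:

```python
import numpy as np

def writes_through(view, other):
    """Blunt test: modify `view`, see whether `other` changes, then restore."""
    before = other.copy()
    saved = view.copy()
    view[...] = saved + 1            # assumes a numeric dtype
    changed = not np.array_equal(other, before)
    view[...] = saved                # put the original values back
    return changed

def root_base(arr):
    """Follow .base references up to the object that owns the memory."""
    while getattr(arr, "base", None) is not None:
        arr = arr.base
    return arr

def data_address(arr):
    """Address of the first element of `arr` within its memory arena."""
    return arr.__array_interface__["data"][0]

a = np.arange(10)
evens = a[::2]
odds = a[1::2]

same_root = root_base(evens) is root_base(odds)            # same arena
offset = data_address(odds) - data_address(evens)          # one item apart
extents_overlap = np.may_share_memory(evens, odds)         # bounds check only
really_overlap = np.shares_memory(evens, odds)             # exact check

evens_modify_a = writes_through(evens, a)
evens_modify_odds = writes_through(evens, odds)
copy_modifies_a = writes_through(a.copy(), a)
```

Here the base and address checks both say that evens and odds live in
the same arena, yet writing to one leaves the other alone, which is
why only the modification test answers the question directly.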

In short, you need to think hard about exactly what you're testing
for. But for unit tests I recommend using modifications to test for
memory sharing.

Anne

On 19 July 2011 12:04, Chris Weisiger <cweisiger at msg.ucsf.edu> wrote:
> Is there some way in Python to uniquely identify a given Numpy array?
> E.g. to get a pointer to its location in memory or something similar?
> I'm looking for some way to determine which operations will implicitly
> create new arrays, just to verify that I'm not doing anything that
> will seriously hurt my performance -- but this seems like something
> that would be generally useful to know.
>
> Unfortunately ndarrays don't allow arbitrary additions to their
> namespace; no doing "foo.myUniqueIdentifier = 1", for example.
>
> Thanks in advance!
>
> -Chris


