[SciPy-User] Uniquely identify array

Chris Weisiger cweisiger at msg.ucsf.edu
Tue Jul 19 12:30:28 EDT 2011


Thanks to both you and Robert Kern for the detailed responses. My
immediate problem is not especially significant; I have three arrays:
one of data D, one of additive offsets O, and one of multiplicative
modifiers M. The first is of ints and the latter two of floats, and I
want to get D * M - O as ints and shove them into an existing buffer.
This is not an especially expensive operation (the dataset is
512x512), but I found myself curious about what it's doing behind the
scenes, and the most straightforward way I know of to track that kind
of thing is to track allocations. I don't expect it would make a big
difference to hyper-optimize this problem, but in the future I may
need tighter code in some other application, and I'd rather know now
than potentially go down a wrong path later.
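
A minimal sketch of the D * M - O computation described above, assuming
int32 data and float64 modifiers (the thread does not give the dtypes; the
random inputs, `scratch`, and `out2` are made up for illustration). It
contrasts the plain expression, which builds float temporaries, with a
variant that reuses one float scratch array:

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (512, 512)

# Illustrative stand-ins for the three arrays; values are invented.
D = rng.integers(0, 4096, size=shape, dtype=np.int32)  # data
M = rng.uniform(0.5, 1.5, size=shape)                  # multiplicative modifiers
O = rng.uniform(0.0, 10.0, size=shape)                 # additive offsets
out = np.empty(shape, dtype=np.int32)                  # the existing buffer

# Plain version: D * M and (D * M) - O are each a new float64 array,
# and the assignment casts the result into the int buffer.
out[...] = D * M - O

# Variant that reuses a single float scratch array for both steps.
scratch = np.empty(shape, dtype=np.float64)
np.multiply(D, M, out=scratch)
np.subtract(scratch, O, out=scratch)
out2 = np.empty_like(out)
np.copyto(out2, scratch, casting="unsafe")  # float -> int drops the fraction

same = np.array_equal(out, out2)
```

Both paths should fill the buffer with the same integers; whether the second
is actually faster at this size is something to measure rather than assume.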

I know more now about what Numpy's doing than I did before this
thread. Thanks for the prompt and detailed responses. :)

-Chris

On Tue, Jul 19, 2011 at 9:22 AM, Anne Archibald
<aarchiba at physics.mcgill.ca> wrote:
> This is a little more subtle than it sounds. Most python objects can
> be compared for identity with "is" (e.g. "if x is None:"). This tests
> for pointer equality, that is, it confirms that you have the same
> dynamically-allocated heap object. This will work for arrays, but it
> might be too specific for what you want: a numpy array actually
> consists of two heap objects, a python object that describes the
> array, and a memory arena. Slicing operations like A[::-1] are fast
> because while they create a new python object, the memory arena is
> untouched. So you need to decide whether what you care about is any
> change at all to the array, or whether what you care about is whether
> a new memory arena has been allocated.
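
The distinction between the Python object and the memory arena can be
sketched as follows. Note that `np.shares_memory` is an assumption about the
reader's NumPy version: it was added well after this thread was written.

```python
import numpy as np

A = np.arange(10)
B = A          # another name for the same Python object
R = A[::-1]    # new Python object, same underlying memory
C = A.copy()   # new Python object, new memory

same_object = B is A                  # True
view_is_same_object = R is A          # False: slicing made a new object
view_shares = np.shares_memory(A, R)  # True: the view reuses A's memory
copy_shares = np.shares_memory(A, C)  # False: the copy has its own memory

# The raw start address alone is not enough to spot sharing: the reversed
# view starts at A's last element, so the two addresses differ.
addr_A = A.ctypes.data
addr_R = R.ctypes.data
```

Here `R.base is A` also holds, because `A` owns its data and `R` is a direct
view of it.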
>
> A brief aside: people often think they care about allocation of new
> arrays, but in most cases they're mistaken. malloc() is an extremely
> fast operation, especially for large arrays, in which case it's
> usually a direct call to the OS's mmap (and free really does free the
> memory back to the system). If what you're worried about is that your
> code is slower than it should be, making sure there are no extra
> allocations is not the best place to look. In-place operations have
> their own limitations, things like cache-coherency issues and cache
> efficiency of strided memory access. This is not theoretical: I had
> some code, a few years ago, that manipulated large arrays and was
> slow. So I painstakingly went through and made it use in-place
> operations where possible and avoid malloc()ing new arrays. Not only
> did it get slower, the memory usage increased.
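
One way to check this kind of claim on your own machine is to time both
styles rather than guess. The sketch below is only an illustration: the array
size, the expression, and the function names are arbitrary choices, and the
outcome will vary with hardware and NumPy version.

```python
import timeit
import numpy as np

x = np.random.default_rng(0).random(1_000_000)
y = np.random.default_rng(1).random(1_000_000)
buf = np.empty_like(x)  # reusable output buffer for the in-place version

def allocating():
    # Allocates a temporary for x * y and another array for the result.
    return x * y + 1.0

def in_place():
    # Writes both steps into the preallocated buffer.
    np.multiply(x, y, out=buf)
    np.add(buf, 1.0, out=buf)
    return buf

t_alloc = min(timeit.repeat(allocating, number=20, repeat=3))
t_inplace = min(timeit.repeat(in_place, number=20, repeat=3))
print(f"allocating: {t_alloc:.4f} s   in-place: {t_inplace:.4f} s")
```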
>
> On the other hand, if you want to know whether you're getting slices
> that allow you to modify the original array or freshly-allocated
> arenas, the bluntest available instrument is to write to the one and
> see if the other changes. There are some more subtle approaches that
> are a little approximate, things like checking the address of the
> memory arena, or the identity of the base numpy array object (be
> warned that you may have to follow a chain of base pointers to reach
> this last one). I say approximate because while A[::2] and A[1::2] share a
> memory arena, and even have overlapping extents, you can modify them
> independently of each other.
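
The A[::2] / A[1::2] case and the "write to one, watch the other" test can be
sketched like this. `write_changes_other` is a hypothetical helper written
for this example (it assumes numeric arrays), and `np.shares_memory` is a
later addition to NumPy than this thread.

```python
import numpy as np

def write_changes_other(a, b):
    """Blunt test: modify a, report whether b changed, then restore a."""
    saved = a.copy()
    before = b.copy()
    a[...] = a + 1                      # touch every element of a
    changed = not np.array_equal(b, before)
    a[...] = saved                      # undo the modification
    return changed

A = np.zeros(10, dtype=np.int64)
evens = A[::2]
odds = A[1::2]

# Bounds-only check: the address ranges of the two views overlap.
bounds_overlap = np.may_share_memory(evens, odds)   # True
# Exact check: no element is actually common to both views.
really_overlap = np.shares_memory(evens, odds)      # False

interleaved_coupled = write_changes_other(evens, odds)  # False
reversed_coupled = write_changes_other(A, A[::-1])      # True
copy_coupled = write_changes_other(A, A.copy())         # False
```

The bounds-only check is cheap but can report sharing that has no practical
effect, which is the sense in which such checks are approximate.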
>
> In short, you need to think hard about exactly what you're testing
> for. But for unit tests I recommend using modifications to test for
> memory sharing.
>
> Anne
>
> On 19 July 2011 12:04, Chris Weisiger <cweisiger at msg.ucsf.edu> wrote:
>> Is there some way in Python to uniquely identify a given Numpy array?
>> E.g. to get a pointer to its location in memory or something similar?
>> I'm looking for some way to determine which operations will implicitly
>> create new arrays, just to verify that I'm not doing anything that
>> will seriously hurt my performance -- but this seems like something
>> that would be generally useful to know.
>>
>> Unfortunately ndarrays don't allow arbitrary additions to their
>> namespace; no doing "foo.myUniqueIdentifier = 1", for example.
>>
>> Thanks in advance!
>>
>> -Chris
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>


