[Numpy-discussion] finding close together points.
Christopher Barker
Chris.Barker at noaa.gov
Thu Nov 12 12:32:50 EST 2009
David Cournapeau wrote:
> I would love having a core C library of containers -
I'm all for that. However, I think that a very, very common need is
simply for a growable numpy array.
It seems this would actually be pretty darn easy (again, for someone
familiar with the code!).
IIUC, it would be exactly like a regular old ndarray, except:
1) The data block could be larger than the current size of the array:
- this would require an additional attribute to carry that size
2) There would be an "append" method, or some other name:
- this would increase the size of the array, manipulating
.dimensions, and, if needed, doing a reallocation of the data pointer.
- This is not very complex code, really, you simply allocate more
than required, choosing some (user-resettable?) scale: 25% more than
before, or whatever. re-allocating memory is actually pretty cheap these
days, so it's not all that critical how you do it.
3) we'd probably want a .fit() method that would resize the data block
to fit the current size of the array.
In fact, as I write this, I wonder if these features could simply be
added to the regular old ndarray -- if not used, they wouldn't be much
overhead.
Issues:
- views on the same data block: as the data pointer would need to be
re-allocated, you couldn't have views on the data block. Maybe this
could be handles in the same way that ndarray.resize is handles now:
you'd get an error if you try to resize an array that has a view to its
data block. This is a bit tricky, as it's really easy to create a view
without thinking about it (like by using ipython, for instance)
- You could only grow the array in the first dimension (C-order,
anyway). I think it could still be very useful, handling many of the
common cases.
- if you append to an array of more than one dimension, do you need to
give the entire appended slice? Again, restrictive, but still useful.
- Others? -- I'm not thinking of any right now.
David Cournapeau wrote:
> If you need to manipulate
> core datatypes (float, etc...), I think python lists are a significant
> cost, because of pointer chasing in particular.
Exactly, and add custom numpy dtypes as well. Putting python types
and/or tuples of python types into a list can be far more memory
intensive than putting them in a numpy array.
That's why I built a python implementation of a growable array -- it
does save memory, but it's a fair bit slower than lists -- when adding
to it in python, you've got python types already, so the additional
overhead of an extra function call, etc. is substantial. That's why I'd
like a C version.
Even more important is having one that is not only written in C (or
Cython) but can be accessed directly from C (or Cython), so you can fill
an array with core data types efficiently. Going from C-type to python
object to put it in a list is substantial overhead.
> Stefan used some tricks to avoid using python list and use basic C
> structures in some custom code.
Exactly -- and he's not he only one -- we're all re-inventing the wheel
here.
By the way -- what does the fromstring/fromfile code do when you don't
give it a size to start with?
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov
More information about the NumPy-Discussion
mailing list