On Wed, Dec 23, 2015 at 4:01 AM, Nicolas P. Rougier < Nicolas.Rougier@inria.fr> wrote:
Typed list in numpy would be a nice addition indeed and your cython implementation is nice (and small).
It seems we have a lot of duplicated effort here. Pernonally, I have two needs: 1) ragged arrays 2) "growable" arrays. I have semi-complete version of both of these, which are completely independent -- not sure if it makes sense to combine them, I suppose not. But we've talked a bit about "typed list", and I'm not sure what that means -- is it something that is entirely like a python list, except that all the elements have the same type? Anyway: I've been thinking about this fromt eh opposite direction: I want a numpy array that you can append/extend. This comes from the fact that it's not uncommon to need to build up an array where you don't know how large it will be when you start. The common recommendation for doing that now is to built it up in a list, and then, when you are done, turn it into an ndarray. But that means you are limited to python types (or putting numpy scalars in a list...), and it's not very memory efficient. My version used a ndarray internally, and over allocates it a bit, using ndarray.resize() to resize. this means that you can get the data pointer if you want for Cython, etc... but also that it's getting re-allocated, so that pointer is fragile, and you don't want other arrays to have views on it. Interestingly, if you are adding one float, for example, at a time to the array, it's actually a bit faster to build it up in a list, and then make an array out of it. But it is more memory efficient and faster if you are using numpy dtypes and especially if you are extend()ing it with chunks from other arrays. I also have a not-quite finished version in Cython that statically handles the core C data types -- that should be faster, but I haven't really profiled it. I'll try to get the code up on gitHub. It would be nice to combine efforts. -CHB
In my case I need to ensure a contiguous storage to allow easy upload onto the GPU. But my implementation is quite slow, especially when you add one item at a time:
python benchmark.py Python list, append 100000 items: 0.01161 Array list, append 100000 items: 0.46854 Array list, append 100000 items at once: 0.05801 Python list, prepend 100000 items: 1.96168 Array list, prepend 100000 items: 12.83371 Array list, append 100000 items at once: 0.06002
I realize I did not answer all Chris' questions:
L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] ) for item in L: print(item) [0] [1 2] [3 4 5] [6 7 8 9]
print (type(L.data)) <class 'numpy.ndarray'> print(L.data.dtype) int64 print(L.data.shape) (10,)
I did not implement operations yet, but it would be a matter for transferring call to the underlying numpy data array.
L._data *= 2 print(L) [[0], [4 8], [12 16 20], [24 28 32 36]]
On 23 Dec 2015, at 09:34, Stephan Hoyer <shoyer@gmail.com> wrote:
We have a type similar to this (a typed list) internally in pandas, although it is restricted to a single dimension and far from feature complete -- it only has .append and a .to_array() method for converting to a 1d numpy array. Our version is written in Cython, and we use it for performance reasons when we would otherwise need to create a list of unknown length: https://github.com/pydata/pandas/blob/v0.17.1/pandas/hashtable.pyx#L99
In my experience, it's several times faster than using a builtin list from Cython, which makes sense given that it needs to copy about 1/3 the data (no type or reference count for individual elements). Obviously, it uses 1/3 the space to store the data, too. We currently don't expose this object externally, but it could be an interesting project to adapt this code into a standalone project that could be more broadly useful.
Cheers, Stephan
On Tue, Dec 22, 2015 at 8:20 PM, Chris Barker <chris.barker@noaa.gov> wrote:
sorry for being so lazy as to not go look at the project pages, but....
This sounds like it could be really useful, and maybe supercise a coupl eof half-baked projects of mine. But -- what does "dynamic" mean?
- can you append to these arrays? - can it support "ragged arrrays" -- it looks like it does.
L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] ) print(L) [[0], [1 2], [3 4 5], [6 7 8 9]]
so this looks like a ragged array -- but what do you get when you do:
for row in L: print row
print(L.data) [0 1 2 3 4 5 6 7 8
is .data a regular old 1-d numpy array?
L = ArrayList( np.arange(10), [3,3,4]) print(L) [[0 1 2], [3 4 5], [6 7 8 9]] print(L.data) [0 1 2 3 4 5 6 7 8 9]
does an ArrayList act like a numpy array in other ways:
L * 5
L* some_array
in which case, how does it do broadcasting???
Thanks,
-CHB
L = ArrayList(["Hello", "world", "!"]) print(L[0]) 'Hello' L[1] = "brave new world" print(L) ['Hello', 'brave new world', '!']
Nicolas
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
--
Christopher Barker, Ph.D. Oceanographer
Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov