On Wed, Dec 23, 2015 at 4:01 AM, Nicolas P. Rougier <Nicolas.Rougier@inria.fr> wrote:
Typed list in numpy would be a nice addition indeed and your cython implementation is nice (and small).

It seems we have a lot of duplicated effort here. Personally, I have two needs:

1) ragged arrays
2) "growable" arrays. 

I have semi-complete versions of both of these, which are completely independent -- not sure if it makes sense to combine them, I suppose not.

But we've talked a bit about a "typed list", and I'm not sure what that means -- is it something that is entirely like a python list, except that all the elements have the same type?

Anyway: I've been thinking about this from the opposite direction: I want a numpy array that you can append to and extend. This comes from the fact that it's not uncommon to need to build up an array when you don't know how large it will be when you start. The common recommendation for doing that now is to build it up in a list, and then, when you are done, turn it into an ndarray.

But that means you are limited to python types (or putting numpy scalars in a list...), and it's not very memory efficient.
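
For reference, the list-then-convert pattern looks roughly like the sketch below. The values and the element count are made up for illustration, and the byte counts are only a rough comparison on CPython:

```python
import sys
import numpy as np

# Build up the data in a plain Python list, one element at a time.
values = []
for i in range(1000):
    values.append(i * 0.5)   # each element is a separate Python float object

# Convert once, at the end, when the final size is known.
arr = np.array(values, dtype=np.float64)

# Rough memory comparison: the list stores one pointer per element plus a
# full float object for each value; the array stores 8 bytes per element.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
array_bytes = arr.nbytes
```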

My version uses an ndarray internally and over-allocates it a bit, using ndarray.resize() to grow it. This means that you can get the data pointer if you want it for Cython, etc., but also that the buffer gets re-allocated, so that pointer is fragile, and you don't want other arrays to have views on it.
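
A minimal sketch of that approach, assuming a 1-d buffer and a 25% growth factor. The class name GrowableArray, its methods, and the growth factor are hypothetical choices for illustration, not the actual code being described:

```python
import numpy as np

class GrowableArray:
    """Hypothetical 1-d growable array backed by an over-allocated ndarray."""

    def __init__(self, dtype=np.float64, capacity=16):
        self._buffer = np.empty(capacity, dtype=dtype)
        self._size = 0          # number of elements actually in use

    def _ensure_capacity(self, needed):
        capacity = self._buffer.shape[0]
        if needed > capacity:
            # Over-allocate (assumed factor: 1.25) so appends are amortized.
            new_capacity = max(needed, int(capacity * 1.25) + 1)
            # In-place resize may move the data; refcheck=False skips the
            # reference check, so any outside view would become invalid.
            self._buffer.resize((new_capacity,), refcheck=False)

    def append(self, value):
        self._ensure_capacity(self._size + 1)
        self._buffer[self._size] = value
        self._size += 1

    def extend(self, values):
        values = np.asarray(values, dtype=self._buffer.dtype)
        n = values.shape[0]
        self._ensure_capacity(self._size + n)
        self._buffer[self._size:self._size + n] = values
        self._size += n

    def __len__(self):
        return self._size

    def to_array(self):
        # Return a copy so callers never hold a view on the fragile buffer.
        return self._buffer[:self._size].copy()


ga = GrowableArray(dtype=np.int64, capacity=4)
for i in range(10):
    ga.append(i)                 # one element at a time
ga.extend(np.arange(10, 20))     # a whole chunk from another array
result = ga.to_array()
```

Returning a copy from to_array() is one way to sidestep the fragile-pointer problem, at the cost of an extra allocation.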

Interestingly, if you are adding one float, for example, at a time to the array, it's actually a bit faster to build it up in a list, and then make an array out of it.

But it is more memory efficient and faster if you are using numpy dtypes and especially if you are extend()ing it with chunks from other arrays.

I also have a not-quite finished version in Cython that statically handles the core C data types -- that should be faster, but I haven't really profiled it.

I'll try to get the code up on GitHub.

It would be nice to combine efforts.

-CHB














 
In my case I need to ensure a contiguous storage to allow easy upload onto the GPU.
But my implementation is quite slow, especially when you add one item at a time:

>>> python benchmark.py
Python list, append 100000 items: 0.01161
Array list, append 100000 items: 0.46854
Array list, append 100000 items at once: 0.05801
Python list, prepend 100000 items: 1.96168
Array list, prepend 100000 items: 12.83371
Array list, append 100000 items at once: 0.06002



I realize I did not answer all Chris' questions:

>>> L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] )
>>> for item in L: print(item)
[0]
[1 2]
[3 4 5]
[6 7 8 9]

>>> print (type(L.data))
<class 'numpy.ndarray'>
>>> print(L.data.dtype)
int64
>>> print(L.data.shape)
(10,)


I did not implement operations yet, but it would be a matter of forwarding calls to the underlying numpy data array.
>>> L._data *= 2
>>> print(L)
[[0], [4 8], [12 16 20], [24 28 32 36]]
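
The forwarding idea can be sketched as follows, assuming a flat contiguous data array plus per-item sizes. RaggedArray is a hypothetical stand-in written for illustration, not the actual ArrayList implementation, and it only handles non-negative integer indices:

```python
import numpy as np

class RaggedArray:
    """Hypothetical ragged array: flat contiguous data plus item sizes."""

    def __init__(self, items):
        self.sizes = np.array([len(item) for item in items], dtype=np.intp)
        self.data = np.concatenate([np.asarray(item) for item in items])
        # offsets[i] is where item i starts in the flat buffer
        self.offsets = np.concatenate(([0], np.cumsum(self.sizes)))

    def __len__(self):
        return len(self.sizes)

    def __getitem__(self, i):
        # Returns a view on the flat buffer, not a copy.
        return self.data[self.offsets[i]:self.offsets[i + 1]]

    def __imul__(self, other):
        # Forward the in-place operation to the flat ndarray.
        self.data *= other
        return self


L = RaggedArray([[0], [1, 2], [3, 4, 5], [6, 7, 8, 9]])
L *= 2
third_item = L[2]
```

Because every item lives in one contiguous buffer, a scalar operation is a single vectorized call; broadcasting against another array would still need its own rules.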



> On 23 Dec 2015, at 09:34, Stephan Hoyer <shoyer@gmail.com> wrote:
>
> We have a type similar to this (a typed list) internally in pandas, although it is restricted to a single dimension and far from feature complete -- it only has .append and a .to_array() method for converting to a 1d numpy array. Our version is written in Cython, and we use it for performance reasons when we would otherwise need to create a list of unknown length:
> https://github.com/pydata/pandas/blob/v0.17.1/pandas/hashtable.pyx#L99
>
> In my experience, it's several times faster than using a builtin list from Cython, which makes sense given that it needs to copy about 1/3 the data (no type or reference count for individual elements). Obviously, it uses 1/3 the space to store the data, too. We currently don't expose this object externally, but it could be an interesting project to adapt this code into a standalone project that could be more broadly useful.
>
> Cheers,
> Stephan
>
>
>
> On Tue, Dec 22, 2015 at 8:20 PM, Chris Barker <chris.barker@noaa.gov> wrote:
>
> sorry for being so lazy as to not go look at the project pages, but....
>
> This sounds like it could be really useful, and maybe supersede a couple of half-baked projects of mine. But -- what does "dynamic" mean?
>
> - can you append to these arrays?
> - can it support "ragged arrays" -- it looks like it does.
>
> >>> L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] )
> >>> print(L)
> [[0], [1 2], [3 4 5], [6 7 8 9]]
>
> so this looks like a ragged array -- but what do you get when you do:
>
> for row in L:
>     print row
>
>
> >>> print(L.data)
> [0 1 2 3 4 5 6 7 8
>
> is .data a regular old 1-d numpy array?
>
> >>> L = ArrayList( np.arange(10), [3,3,4])
> >>> print(L)
> [[0 1 2], [3 4 5], [6 7 8 9]]
> >>> print(L.data)
> [0 1 2 3 4 5 6 7 8 9]
>
>
> does an ArrayList act like a numpy array in other ways:
>
> L * 5
>
> L* some_array
>
> in which case, how does it do broadcasting???
>
> Thanks,
>
> -CHB
>
> >>> L = ArrayList(["Hello", "world", "!"])
> >>> print(L[0])
> 'Hello'
> >>> L[1] = "brave new world"
> >>> print(L)
> ['Hello', 'brave new world', '!']
>
>
>
> Nicolas
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker@noaa.gov
>


