[Numpy-discussion] Dynamic array list implementation

Nicolas P. Rougier Nicolas.Rougier at inria.fr
Wed Dec 23 07:01:25 EST 2015


A typed list in NumPy would indeed be a nice addition, and your Cython implementation is nice (and small).

In my case I need to ensure contiguous storage to allow easy upload to the GPU.
But my implementation is quite slow, especially when you add one item at a time:

$ python benchmark.py
Python list, append 100000 items: 0.01161
Array list, append 100000 items: 0.46854
Array list, append 100000 items at once: 0.05801
Python list, prepend 100000 items: 1.96168
Array list, prepend 100000 items: 12.83371
Array list, append 100000 items at once: 0.06002
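
For illustration, one common way to keep storage contiguous while making single appends cheaper is to over-allocate the buffer and double its capacity when it fills up. The sketch below is not the ArrayList code discussed in this thread; `GrowableBuffer` and its methods are hypothetical names chosen for the example. It also shows why a bulk insertion is cheap: it needs one capacity check and one vectorized copy, whereas every single append pays the Python call overhead again.

```python
import numpy as np


class GrowableBuffer:
    """Hypothetical contiguous 1-D buffer that doubles its capacity when full."""

    def __init__(self, dtype=float, capacity=4):
        self._data = np.empty(capacity, dtype=dtype)  # allocated storage
        self._size = 0                                # number of items in use

    def _reserve(self, needed):
        # Grow geometrically so that n single appends cost O(n) copies overall.
        capacity = self._data.size
        if needed <= capacity:
            return
        while capacity < needed:
            capacity *= 2
        new_data = np.empty(capacity, dtype=self._data.dtype)
        new_data[:self._size] = self._data[:self._size]
        self._data = new_data

    def append(self, value):
        self._reserve(self._size + 1)
        self._data[self._size] = value
        self._size += 1

    def extend(self, values):
        # Bulk insertion: one capacity check and one vectorized copy.
        values = np.asarray(values, dtype=self._data.dtype)
        self._reserve(self._size + values.size)
        self._data[self._size:self._size + values.size] = values
        self._size += values.size

    @property
    def data(self):
        # Contiguous view on the used part of the buffer (no copy).
        return self._data[:self._size]


buf = GrowableBuffer(dtype=np.int64)
for i in range(5):
    buf.append(i)              # one item at a time
buf.extend(np.arange(5, 10))   # several items at once
print(buf.data)                # [0 1 2 3 4 5 6 7 8 9]
```

Prepending is inherently more expensive with this layout, since every existing item has to be shifted to make room at the front of the contiguous buffer.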



I realize I did not answer all of Chris's questions:

>>> L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] )
>>> for item in L: print(item)
[0]
[1 2]
[3 4 5]
[6 7 8 9]

>>> print(type(L.data))
<class 'numpy.ndarray'>
>>> print(L.data.dtype)
int64
>>> print(L.data.shape)
(10,)
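
The session above suggests a layout where all items live in one flat 1-D array and only the item lengths are stored separately. A minimal sketch of that idea in plain NumPy follows; the variable names (`sizes`, `starts`, `stops`) are assumptions made for the example and are not taken from the ArrayList source.

```python
import numpy as np

# Flat, contiguous storage plus the length of each item
# (same values as the ArrayList example above).
data = np.arange(10)
sizes = np.array([1, 2, 3, 4])

# Start/stop offsets of every item inside the flat array.
stops = np.cumsum(sizes)
starts = stops - sizes

# Each item is a slice of `data`, i.e. a view, not a copy.
items = [data[start:stop] for start, stop in zip(starts, stops)]
for item in items:
    print(item)
```

Because the items are views, the flat array stays the single source of truth and can be handed over as one contiguous block.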


I have not implemented operations yet, but it would be a matter of forwarding the calls to the underlying NumPy data array.
>>> L._data *= 2
>>> print(L)
[[0], [2 4], [6 8 10], [12 14 16 18]]
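
As a rough sketch of what forwarding could look like, the hypothetical `RaggedView` class below (not part of ArrayList) delegates multiplication to its flat array and leaves the item layout untouched. Broadcasting a different factor per item is handled here by expanding the factors with `np.repeat`; this is only one possible convention, not something the thread settles.

```python
import numpy as np


class RaggedView:
    """Hypothetical ragged container: flat `data` plus per-item `sizes`."""

    def __init__(self, data, sizes):
        self.data = np.asarray(data)
        self.sizes = np.asarray(sizes)
        if self.sizes.sum() != self.data.size:
            raise ValueError("sizes must add up to the length of data")

    def __mul__(self, other):
        # Forward to NumPy; the item layout (sizes) is unchanged.
        return RaggedView(self.data * other, self.sizes)

    def __imul__(self, other):
        self.data *= other  # in place, on the flat array
        return self

    def tolist(self):
        stops = np.cumsum(self.sizes)
        starts = stops - self.sizes
        return [self.data[a:b].tolist() for a, b in zip(starts, stops)]


L = RaggedView(np.arange(10), [1, 2, 3, 4])
L *= 2
print(L.tolist())  # [[0], [2, 4], [6, 8, 10], [12, 14, 16, 18]]

# One factor per item, expanded to one factor per element.
M = L * np.repeat([1, 10, 100, 1000], L.sizes)
print(M.tolist())
```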



> On 23 Dec 2015, at 09:34, Stephan Hoyer <shoyer at gmail.com> wrote:
> 
> We have a type similar to this (a typed list) internally in pandas, although it is restricted to a single dimension and far from feature complete -- it only has .append and a .to_array() method for converting to a 1d numpy array. Our version is written in Cython, and we use it for performance reasons when we would otherwise need to create a list of unknown length:
> https://github.com/pydata/pandas/blob/v0.17.1/pandas/hashtable.pyx#L99
> 
> In my experience, it's several times faster than using a builtin list from Cython, which makes sense given that it needs to copy about 1/3 the data (no type or reference count for individual elements). Obviously, it uses 1/3 the space to store the data, too. We currently don't expose this object externally, but it could be an interesting project to adapt this code into a standalone project that could be more broadly useful.
> 
> Cheers,
> Stephan
> 
> 
> 
> On Tue, Dec 22, 2015 at 8:20 PM, Chris Barker <chris.barker at noaa.gov> wrote:
> 
> Sorry for being so lazy as to not go look at the project pages, but....
> 
> This sounds like it could be really useful, and maybe supersede a couple of half-baked projects of mine. But -- what does "dynamic" mean?
> 
> - can you append to these arrays?
> - can it support "ragged arrays" -- it looks like it does.
> 
> >>> L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] )
> >>> print(L)
> [[0], [1 2], [3 4 5], [6 7 8 9]]
> 
> so this looks like a ragged array -- but what do you get when you do:
> 
> for row in L:
>     print(row)
> 
> 
> >>> print(L.data)
> [0 1 2 3 4 5 6 7 8 9]
> 
> is .data a regular old 1-d numpy array?
> 
> >>> L = ArrayList( np.arange(10), [3,3,4])
> >>> print(L)
> [[0 1 2], [3 4 5], [6 7 8 9]]
> >>> print(L.data)
> [0 1 2 3 4 5 6 7 8 9]
> 
> 
> does an ArrayList act like a numpy array in other ways:
> 
> L * 5
> 
> L * some_array
> 
> in which case, how does it do broadcasting???
> 
> Thanks,
> 
> -CHB
> 
> >>> L = ArrayList(["Hello", "world", "!"])
> >>> print(L[0])
> 'Hello'
> >>> L[1] = "brave new world"
> >>> print(L)
> ['Hello', 'brave new world', '!']
> 
> 
> 
> Nicolas
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
> 
> 
> 
> -- 
> 
> Christopher Barker, Ph.D.
> Oceanographer
> 
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
> 
> Chris.Barker at noaa.gov