I've written a typed dynamic list based on a NumPy array (needed for the glumpy project). The code is available from https://github.com/rougier/numpylist
A NumPy array list is a strongly typed list whose type can be anything that can be interpreted as a NumPy data type.
>>> L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] )
>>> print(L)
[[0], [1 2], [3 4 5], [6 7 8 9]]
>>> print(L.data)
[0 1 2 3 4 5 6 7 8 9]
You can add several items at once by specifying a common or individual size: a single scalar means all items are the same size, while a list of sizes specifies individual item sizes.
>>> L = ArrayList( np.arange(10), [3,3,4])
>>> print(L)
[[0 1 2], [3 4 5], [6 7 8 9]]
>>> print(L.data)
[0 1 2 3 4 5 6 7 8 9]
You can also use a typed list for storing strings with different sizes:
>>> L = ArrayList(["Hello", "world", "!"])
>>> print(L[0])
'Hello'
>>> L[1] = "brave new world"
>>> print(L)
['Hello', 'brave new world', '!']
Nicolas
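The relationship between the flat data array and the item sizes described above can be sketched in a few lines. This is only an illustration of the layout, not the numpylist implementation: the helper name split_items is hypothetical, and it assumes the items are stored back to back in one 1-D array.

```python
import numpy as np


def split_items(data, sizes):
    """Return one view per item over the flat `data` array (hypothetical helper)."""
    data = np.asarray(data)
    if np.isscalar(sizes):
        # A single scalar means every item has the same size.
        sizes = np.full(len(data) // sizes, sizes)
    sizes = np.asarray(sizes)
    if sizes.sum() != len(data):
        raise ValueError("item sizes must add up to len(data)")
    ends = np.cumsum(sizes)      # end offset of each item
    starts = ends - sizes        # start offset of each item
    return [data[s:e] for s, e in zip(starts, ends)]


flat = np.arange(10)
items = split_items(flat, [3, 3, 4])
print([item.tolist() for item in items])  # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```

Because each item is a view, the flat array stays contiguous and can be handed to other code as a single block.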
Sorry for being so lazy as to not go look at the project pages, but...
This sounds like it could be really useful, and might supersede a couple of half-baked projects of mine. But what does "dynamic" mean?
- can you append to these arrays?
- can it support "ragged arrays"? It looks like it does.
>>> L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] )
>>> print(L)
[[0], [1 2], [3 4 5], [6 7 8 9]]
So this looks like a ragged array, but what do you get when you do:
for row in L:
    print(row)
>>> print(L.data)
[0 1 2 3 4 5 6 7 8 9]
Is .data a regular old 1-D numpy array?
>>> L = ArrayList( np.arange(10), [3,3,4])
>>> print(L)
[[0 1 2], [3 4 5], [6 7 8 9]]
>>> print(L.data)
[0 1 2 3 4 5 6 7 8 9]
Does an ArrayList act like a numpy array in other ways:
L * 5
L * some_array
in which case, how does it do broadcasting?
Thanks,
CHB
NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Yes, you can append/insert/remove items. In fact, it works pretty much like a Python list (but with a single data type for all elements).
Nicolas
On 22 Dec 2015, at 20:19, Chris Barker chris.barker@noaa.gov wrote:
Sorry for being so lazy as to not go look at the project pages, but...
This sounds like it could be really useful, and might supersede a couple of half-baked projects of mine. But what does "dynamic" mean?
- can you append to these arrays?
- can it support "ragged arrays"? It looks like it does.
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker@noaa.gov
We have a type similar to this (a typed list) internally in pandas, although it is restricted to a single dimension and far from feature complete: it only has .append and a .to_array() method for converting to a 1-D NumPy array. Our version is written in Cython, and we use it for performance reasons when we would otherwise need to create a list of unknown length:
https://github.com/pydata/pandas/blob/v0.17.1/pandas/hashtable.pyx#L99
In my experience, it's several times faster than using a built-in list from Cython, which makes sense given that it needs to copy about 1/3 of the data (no type or reference count for individual elements). Obviously, it uses 1/3 of the space to store the data, too. We currently don't expose this object externally, but it could be an interesting project to adapt this code into a standalone project that could be more broadly useful.
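The space argument can be checked roughly from plain Python. The sketch below is not the pandas code; it only compares a list of Python floats (each a full object with a reference count and a type pointer, plus a pointer slot in the list) against the same values packed in a float64 array. The exact ratio depends on the interpreter build, so no figure is claimed here.

```python
import sys

import numpy as np

n = 100_000
values = [float(i) for i in range(n)]

# Bytes for the list itself (its pointer slots) plus every float object it refers to.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# Bytes for the same values stored as raw 8-byte doubles.
array_bytes = np.asarray(values, dtype=np.float64).nbytes

print("list :", list_bytes, "bytes")
print("array:", array_bytes, "bytes")
print("ratio:", list_bytes / array_bytes)
```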
Cheers,
Stephan
A typed list in numpy would be a nice addition indeed, and your Cython implementation is nice (and small).
In my case I need to ensure contiguous storage to allow easy upload onto the GPU. But my implementation is quite slow, especially when you add one item at a time:
python benchmark.py
Python list, append 100000 items: 0.01161
Array list, append 100000 items: 0.46854
Array list, append 100000 items at once: 0.05801
Python list, prepend 100000 items: 1.96168
Array list, prepend 100000 items: 12.83371
Array list, append 100000 items at once: 0.06002
I realize I did not answer all of Chris' questions:
>>> L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] )
>>> for item in L: print(item)
[0]
[1 2]
[3 4 5]
[6 7 8 9]
>>> print(type(L.data))
<class 'numpy.ndarray'>
>>> print(L.data.dtype)
int64
>>> print(L.data.shape)
(10,)
I did not implement operations yet, but it would be a matter of forwarding calls to the underlying numpy data array.
>>> L._data *= 2
>>> print(L)
[[0], [4 8], [12 16 20], [24 28 32 36]]
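As a rough illustration of forwarding arithmetic to the flat array, the sketch below applies operations to a plain 1-D array and then regroups the result by item size. It does not use ArrayList itself: as_items is a hypothetical helper, and expanding one factor per item with np.repeat is just one possible answer to the broadcasting question, not something the project is known to do.

```python
import numpy as np

data = np.arange(10)              # flat storage, playing the role of L.data
sizes = np.array([1, 2, 3, 4])    # item sizes for [[0], [1,2], [3,4,5], [6,7,8,9]]


def as_items(flat, sizes):
    """Split a flat array into per-item views using the item sizes."""
    return np.split(flat, np.cumsum(sizes)[:-1])


# A scalar operation applies directly to the flat data, starting from 0..9.
doubled = data * 2
print([item.tolist() for item in as_items(doubled, sizes)])
# [[0], [2, 4], [6, 8, 10], [12, 14, 16, 18]]

# One factor per item: repeat each factor over its item's length first.
factors = np.array([1, 10, 100, 1000])
scaled = data * np.repeat(factors, sizes)
print([item.tolist() for item in as_items(scaled, sizes)])
```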
On Wed, Dec 23, 2015 at 4:01 AM, Nicolas P. Rougier < Nicolas.Rougier@inria.fr> wrote:
Typed list in numpy would be a nice addition indeed and your cython implementation is nice (and small).
It seems we have a lot of duplicated effort here. Personally, I have two needs:
1) ragged arrays
2) "growable" arrays.
I have semi-complete versions of both of these, which are completely independent. Not sure if it makes sense to combine them; I suppose not.
But we've talked a bit about "typed list", and I'm not sure what that means. Is it something that is entirely like a Python list, except that all the elements have the same type?
Anyway: I've been thinking about this from the opposite direction: I want a numpy array that you can append/extend. This comes from the fact that it's not uncommon to need to build up an array where you don't know how large it will be when you start. The common recommendation for doing that now is to build it up in a list, and then, when you are done, turn it into an ndarray.
But that means you are limited to Python types (or putting numpy scalars in a list...), and it's not very memory efficient.
My version uses an ndarray internally and over-allocates it a bit, using ndarray.resize() to resize. This means that you can get the data pointer if you want it for Cython, etc., but also that it's getting reallocated, so that pointer is fragile, and you don't want other arrays to have views on it.
Interestingly, if you are adding one float, for example, at a time to the array, it's actually a bit faster to build it up in a list, and then make an array out of it.
But it is more memory efficient and faster if you are using numpy dtypes, and especially if you are extend()ing it with chunks from other arrays.
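A minimal sketch of the over-allocating approach described above might look like the following. It is not the NumpyExtras code: the class name GrowableArray, its parameters, and its methods are hypothetical, and it passes refcheck=False to ndarray.resize(), which skips the reference check and is exactly why views on the buffer are fragile.

```python
import numpy as np


class GrowableArray:
    """Hypothetical append/extend-able 1-D array backed by an over-allocated buffer."""

    def __init__(self, dtype=np.float64, buffer_size=128, growth=1.25):
        self._buffer = np.empty(buffer_size, dtype=dtype)
        self._length = 0          # number of valid elements
        self._growth = growth     # over-allocation factor

    def _reserve(self, needed):
        capacity = self._buffer.shape[0]
        if needed > capacity:
            new_size = max(needed, int(capacity * self._growth) + 1)
            # Resizes in place; any view taken on the buffer earlier
            # must not be used after this call.
            self._buffer.resize(new_size, refcheck=False)

    def append(self, value):
        self._reserve(self._length + 1)
        self._buffer[self._length] = value
        self._length += 1

    def extend(self, values):
        values = np.asarray(values, dtype=self._buffer.dtype)
        n = values.shape[0]
        self._reserve(self._length + n)
        self._buffer[self._length:self._length + n] = values
        self._length += n

    def __len__(self):
        return self._length

    def to_array(self):
        # Copy so the result stays valid even if the buffer is resized later.
        return self._buffer[:self._length].copy()


acc = GrowableArray(dtype=np.int64, buffer_size=4)
for i in range(10):
    acc.append(i)
acc.extend(np.arange(10, 20))
print(len(acc), acc.to_array())
```

Returning a copy from to_array() trades some memory for safety; handing out the buffer slice directly would avoid the copy but reintroduce the fragile-pointer problem.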
I also have a not-quite-finished version in Cython that statically handles the core C data types. That should be faster, but I haven't really profiled it.
I'll try to get the code up on GitHub.
It would be nice to combine efforts.
CHB
On Thu, Dec 24, 2015 at 10:19 AM, Chris Barker chris.barker@noaa.gov wrote:
I'll try to get the code up on GitHub.
Hey look, it's already there:
https://github.com/PythonCHB/NumpyExtras
Too many GitHub accounts...
Here is the list/growable array/accumulator:
https://github.com/PythonCHB/NumpyExtras/blob/master/numpy_extras/accumulato...
And here is the ragged array:
https://github.com/PythonCHB/NumpyExtras/blob/master/numpy_extras/ragged_arr...
I haven't touched either of these for a while, so I'm not really sure what state they are in.
CHB
On Wed, Dec 23, 2015 at 4:01 AM, Nicolas P. Rougier < Nicolas.Rougier@inria.fr> wrote:
But my implementation is quite slow, especially when you add one item at a time:
python benchmark.py
Python list, append 100000 items: 0.01161
Array list, append 100000 items: 0.46854
Are you preallocating any extra space? If not, it's going to be really, really pokey when adding a little bit at a time.
With my Accumulator class:
https://github.com/PythonCHB/NumpyExtras/blob/master/numpy_extras/accumulato...
I preallocate a larger numpy array to start, and it gets reallocated, with some extra, when filled, using ndarray.resize().
This is quite fast.
These are settable parameters in the class:
DEFAULT_BUFFER_SIZE = 128  # original buffer created.
BUFFER_EXTEND_SIZE = 1.25  # array.array uses 1+1/16, which seems small to me.
I looked at the code in array.array (and list, I think), and it does stuff to optimize very small arrays, which I figured wasn't the use case here :)
But I did a bunch of experimentation, and as long as you preallocate _some_, it doesn't make much difference how much :)
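To get a feel for why the exact growth factor matters little, one can count how many reallocations a geometric growth policy needs to reach a given size. The function below is a hypothetical back-of-the-envelope model (start at 128 slots, multiply the capacity by the factor each time it fills up), not code taken from the Accumulator class.

```python
def count_resizes(target, start=128, factor=1.25):
    """Count the reallocations needed to grow from `start` slots to at least `target`."""
    size, resizes = start, 0
    while size < target:
        # Always grow by at least one slot so tiny factors still make progress.
        size = max(size + 1, int(size * factor))
        resizes += 1
    return resizes


for factor in (1 + 1 / 16, 1.25, 2.0):
    print(factor, count_resizes(100_000, factor=factor))
```

In this model the number of reallocations grows only logarithmically with the final size, so even a modest factor keeps the count small compared with the 100,000 appends.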
BTW,
I just went in and updated and tested the Accumulator class code. It needed some tweaks, but it's working now.
The Cython version is in an unknown state...
Some profiling:
In [11]: run profile_accumulator.py
In [12]: timeit accum1(10000)
100 loops, best of 3: 3.91 ms per loop
In [13]: timeit list1(10000)
1000 loops, best of 3: 1.15 ms per loop
These are simply appending 10,000 integers in a loop; with the list, the list is turned into a numpy array at the end. So it's still faster to accumulate in a list and then make an array, but only by about a factor of 3. I think this is because you are starting with a Python integer: with the accumulator function, you need to check the type and pull a native integer out with each append, but a list can append a Python object with no type checking or anything.
Then the conversion from list to array is all in C.
Note that the accumulator version is still more memory efficient...
In [14]: timeit accum2(10000)
100 loops, best of 3: 3.84 ms per loop
This version preallocated the whole internal buffer. It's not much faster, so the buffer reallocation isn't a big deal (thanks to ndarray.resize using realloc() rather than creating a new numpy array).
In [24]: timeit list_extend1(100000)
100 loops, best of 3: 4.15 ms per loop
In [25]: timeit accum_extend1(100000)
1000 loops, best of 3: 1.37 ms per loop
This time, the stuff is added in chunks of 100 elements at a time, the chunks being created ahead of time: a list with range() the first time, and an array with arange() the second. It's much faster to extend with arrays...
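The chunked comparison can be approximated without the Accumulator class. The sketch below is a stand-in, not profile_accumulator.py: build_from_list and build_in_buffer are hypothetical names, and the second function simply fills a fully preallocated array by slices. Timings vary by machine, so none are quoted.

```python
import timeit

import numpy as np


def build_from_list(n, chunk=100):
    """Extend a Python list with list chunks, then convert once at the end."""
    out = []
    piece = list(range(chunk))
    for _ in range(n // chunk):
        out.extend(piece)
    return np.array(out)


def build_in_buffer(n, chunk=100):
    """Copy array chunks straight into a preallocated NumPy buffer."""
    out = np.empty(n, dtype=np.int64)
    piece = np.arange(chunk)
    for i in range(n // chunk):
        out[i * chunk:(i + 1) * chunk] = piece
    return out


n = 100_000
print("list extend :", timeit.timeit(lambda: build_from_list(n), number=5))
print("array chunks:", timeit.timeit(lambda: build_in_buffer(n), number=5))
```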
CHB
On 28 Dec 2015, at 19:58, Chris Barker chris.barker@noaa.gov wrote:
On Wed, Dec 23, 2015 at 4:01 AM, Nicolas P. Rougier Nicolas.Rougier@inria.fr wrote:
But my implementation is quite slow, especially when you add one item at a time:
python benchmark.py
Python list, append 100000 items: 0.01161
Array list, append 100000 items: 0.46854
Are you preallocating any extra space? If not, it's going to be really, really pokey when adding a little bit at a time.
Yes, I’m preallocating, but it might not be optimal at all given that your implementation is much faster. I’ll try to adapt your code. Thanks.
On Wed, Dec 30, 2015 at 6:34 AM, Nicolas P. Rougier < Nicolas.Rougier@inria.fr> wrote:
On 28 Dec 2015, at 19:58, Chris Barker chris.barker@noaa.gov wrote:
python benchmark.py
Python list, append 100000 items: 0.01161
Array list, append 100000 items: 0.46854
Are you preallocating any extra space? If not, it's going to be really, really pokey when adding a little bit at a time.
Yes, I’m preallocating, but it might not be optimal at all given that your implementation is much faster. I’ll try to adapt your code. Thanks.
Sounds good. I'll try to take a look at yours soon; maybe we can merge the projects. Mine is only operational in one small place, I think.
CHB
On Wed, 2015-12-23 at 00:34 -0800, Stephan Hoyer wrote:
We have a type similar to this (a typed list) internally in pandas, although it is restricted to a single dimension and far from feature complete: it only has .append and a .to_array() method for converting to a 1-D NumPy array. Our version is written in Cython, and we use it for performance reasons when we would otherwise need to create a list of unknown length: https://github.com/pydata/pandas/blob/v0.17.1/pandas/hashtable.pyx#L99
This is probably a bit orthogonal, since I guess you want/need Cython, but Python's built-in array.array should get you there pretty much as well. Of course it requires the C typecode (though that should not be hard to get) and does not support strings.
- Sebastian
On Wed, Dec 23, 2015 at 4:31 AM, Sebastian Berg sebastian@sipsolutions.net wrote:
Probably is a bit orthogonal since I guess you want/need Cython, but Python's built-in array.array should get you there pretty much as well.
I don't think it's orthogonal to Cython: you can access an array.array directly from within Cython. It's actually about the easiest way to get an array-like object in Cython/C (which you can then access via a memoryview, etc.).
Though I don't know that there is a Python object (i.e. pointer) option there (nor text).
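For reference, a small sketch of the array.array route from plain Python: grow a typed buffer with append/extend, then wrap it as a NumPy array through the buffer protocol. The typecode "d" (C double) is just an example choice.

```python
from array import array

import numpy as np

buf = array("d")                 # "d" is the typecode for C double
for i in range(5):
    buf.append(i * 0.5)          # grows like a list, but stores raw doubles
buf.extend([10.0, 20.0])

# Wrap the same memory as a NumPy array (no copy is made).
# While this view is alive, resizing `buf` raises BufferError.
view = np.frombuffer(buf, dtype=np.float64)
print(view)
```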
CHB
participants (4)
Chris Barker
Nicolas P. Rougier
Sebastian Berg
Stephan Hoyer