I've coded a typed dynamic list based on numpy array (needed for the glumpy project). Code is available from https://github.com/rougier/numpy-list
A Numpy array list is a strongly typed list whose type can be anything that can be interpreted as a numpy data type.
L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] ) print(L)
[[0], [1 2], [3 4 5], [6 7 8 9]]
print(L.data)
[0 1 2 3 4 5 6 7 8 9] You can add several items at once by specifying common or individual size: a single scalar means all items are the same size while a list of sizes is used to specify individual item sizes.
L = ArrayList( np.arange(10), [3,3,4]) print(L)
[[0 1 2], [3 4 5], [6 7 8 9]]
print(L.data)
[0 1 2 3 4 5 6 7 8 9] You can also us typed list for storing strings with different sizes:
L = ArrayList(["Hello", "world", "!"]) print(L[0])
'Hello'
L[1] = "brave new world" print(L)
['Hello', 'brave new world', '!']
Nicolas
sorry for being so lazy as to not go look at the project pages, but....
This sounds like it could be really useful, and maybe supercise a coupl eof half-baked projects of mine. But -- what does "dynamic" mean?
- can you append to these arrays? - can it support "ragged arrrays" -- it looks like it does.
L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] ) print(L)
[[0], [1 2], [3 4 5], [6 7 8 9]]
so this looks like a ragged array -- but what do you get when you do:
for row in L: print row
print(L.data)
[0 1 2 3 4 5 6 7 8
is .data a regular old 1-d numpy array?
L = ArrayList( np.arange(10), [3,3,4])
print(L)
[[0 1 2], [3 4 5], [6 7 8 9]]
print(L.data)
[0 1 2 3 4 5 6 7 8 9]
does an ArrayList act like a numpy array in other ways:
L * 5
L* some_array
in which case, how does it do broadcasting???
Thanks,
-CHB
L = ArrayList(["Hello", "world", "!"])
print(L[0])
'Hello'
L[1] = "brave new world" print(L)
['Hello', 'brave new world', '!']
Nicolas
NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Yes, you can append/insert/remove items. It works pretty much like a python list in fact (but with a single data type for all elements).
Nicolas
On 22 Dec 2015, at 20:19, Chris Barker chris.barker@noaa.gov wrote:
sorry for being so lazy as to not go look at the project pages, but....
This sounds like it could be really useful, and maybe supercise a coupl eof half-baked projects of mine. But -- what does "dynamic" mean?
- can you append to these arrays?
- can it support "ragged arrrays" -- it looks like it does.
L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] ) print(L)
[[0], [1 2], [3 4 5], [6 7 8 9]]
so this looks like a ragged array -- but what do you get when you do:
for row in L: print row
print(L.data)
[0 1 2 3 4 5 6 7 8
is .data a regular old 1-d numpy array?
L = ArrayList( np.arange(10), [3,3,4]) print(L)
[[0 1 2], [3 4 5], [6 7 8 9]]
print(L.data)
[0 1 2 3 4 5 6 7 8 9]
does an ArrayList act like a numpy array in other ways:
L * 5
L* some_array
in which case, how does it do broadcasting???
Thanks,
-CHB
L = ArrayList(["Hello", "world", "!"]) print(L[0])
'Hello'
L[1] = "brave new world" print(L)
['Hello', 'brave new world', '!']
Nicolas
NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
--
Christopher Barker, Ph.D. Oceanographer
Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
We have a type similar to this (a typed list) internally in pandas, although it is restricted to a single dimension and far from feature complete -- it only has .append and a .to_array() method for converting to a 1d numpy array. Our version is written in Cython, and we use it for performance reasons when we would otherwise need to create a list of unknown length:
https://github.com/pydata/pandas/blob/v0.17.1/pandas/hashtable.pyx#L99
In my experience, it's several times faster than using a builtin list from Cython, which makes sense given that it needs to copy about 1/3 the data (no type or reference count for individual elements). Obviously, it uses 1/3 the space to store the data, too. We currently don't expose this object externally, but it could be an interesting project to adapt this code into a standalone project that could be more broadly useful.
Cheers,
Stephan
On Tue, Dec 22, 2015 at 8:20 PM, Chris Barker chris.barker@noaa.gov wrote:
sorry for being so lazy as to not go look at the project pages, but.... This sounds like it could be really useful, and maybe supercise a coupl eof half-baked projects of mine. But -- what does "dynamic" mean?
- can you append to these arrays?
- can it support "ragged arrrays" -- it looks like it does.
L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] ) print(L)
[[0], [1 2], [3 4 5], [6 7 8 9]]
so this looks like a ragged array -- but what do you get when you do:
for row in L: print row
print(L.data)
[0 1 2 3 4 5 6 7 8
is .data a regular old 1-d numpy array?
L = ArrayList( np.arange(10), [3,3,4])
print(L)
[[0 1 2], [3 4 5], [6 7 8 9]]
print(L.data)
[0 1 2 3 4 5 6 7 8 9]
does an ArrayList act like a numpy array in other ways:
L * 5 L* some_array in which case, how does it do broadcasting??? Thanks, -CHB
L = ArrayList(["Hello", "world", "!"])
print(L[0])
'Hello'
L[1] = "brave new world" print(L)
['Hello', 'brave new world', '!']
Nicolas
NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
Typed list in numpy would be a nice addition indeed and your cython implementation is nice (and small).
In my case I need to ensure a contiguous storage to allow easy upload onto the GPU. But my implementation is quite slow, especially when you add one item at a time:
python benchmark.py
Python list, append 100000 items: 0.01161 Array list, append 100000 items: 0.46854 Array list, append 100000 items at once: 0.05801 Python list, prepend 100000 items: 1.96168 Array list, prepend 100000 items: 12.83371 Array list, append 100000 items at once: 0.06002
I realize I did not answer all Chris' questions:
L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] ) for item in L: print(item)
[0] [1 2] [3 4 5] [6 7 8 9]
print (type(L.data))
<class 'numpy.ndarray'>
print(L.data.dtype)
int64
print(L.data.shape)
(10,)
I did not implement operations yet, but it would be a matter for transferring call to the underlying numpy data array.
L._data *= 2 print(L)
[[0], [4 8], [12 16 20], [24 28 32 36]]
On 23 Dec 2015, at 09:34, Stephan Hoyer shoyer@gmail.com wrote:
We have a type similar to this (a typed list) internally in pandas, although it is restricted to a single dimension and far from feature complete -- it only has .append and a .to_array() method for converting to a 1d numpy array. Our version is written in Cython, and we use it for performance reasons when we would otherwise need to create a list of unknown length: https://github.com/pydata/pandas/blob/v0.17.1/pandas/hashtable.pyx#L99
In my experience, it's several times faster than using a builtin list from Cython, which makes sense given that it needs to copy about 1/3 the data (no type or reference count for individual elements). Obviously, it uses 1/3 the space to store the data, too. We currently don't expose this object externally, but it could be an interesting project to adapt this code into a standalone project that could be more broadly useful.
Cheers, Stephan
On Tue, Dec 22, 2015 at 8:20 PM, Chris Barker chris.barker@noaa.gov wrote:
sorry for being so lazy as to not go look at the project pages, but....
This sounds like it could be really useful, and maybe supercise a coupl eof half-baked projects of mine. But -- what does "dynamic" mean?
- can you append to these arrays?
- can it support "ragged arrrays" -- it looks like it does.
L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] ) print(L)
[[0], [1 2], [3 4 5], [6 7 8 9]]
so this looks like a ragged array -- but what do you get when you do:
for row in L: print row
print(L.data)
[0 1 2 3 4 5 6 7 8
is .data a regular old 1-d numpy array?
L = ArrayList( np.arange(10), [3,3,4]) print(L)
[[0 1 2], [3 4 5], [6 7 8 9]]
print(L.data)
[0 1 2 3 4 5 6 7 8 9]
does an ArrayList act like a numpy array in other ways:
L * 5
L* some_array
in which case, how does it do broadcasting???
Thanks,
-CHB
L = ArrayList(["Hello", "world", "!"]) print(L[0])
'Hello'
L[1] = "brave new world" print(L)
['Hello', 'brave new world', '!']
Nicolas
NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
--
Christopher Barker, Ph.D. Oceanographer
Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
On Wed, Dec 23, 2015 at 4:01 AM, Nicolas P. Rougier < Nicolas.Rougier@inria.fr> wrote:
Typed list in numpy would be a nice addition indeed and your cython implementation is nice (and small).
It seems we have a lot of duplicated effort here. Pernonally, I have two needs:
1) ragged arrays 2) "growable" arrays.
I have semi-complete version of both of these, which are completely independent -- not sure if it makes sense to combine them, I suppose not.
But we've talked a bit about "typed list", and I'm not sure what that means -- is it something that is entirely like a python list, except that all the elements have the same type?
Anyway: I've been thinking about this fromt eh opposite direction: I want a numpy array that you can append/extend. This comes from the fact that it's not uncommon to need to build up an array where you don't know how large it will be when you start. The common recommendation for doing that now is to built it up in a list, and then, when you are done, turn it into an ndarray.
But that means you are limited to python types (or putting numpy scalars in a list...), and it's not very memory efficient.
My version used a ndarray internally, and over allocates it a bit, using ndarray.resize() to resize. this means that you can get the data pointer if you want for Cython, etc... but also that it's getting re-allocated, so that pointer is fragile, and you don't want other arrays to have views on it.
Interestingly, if you are adding one float, for example, at a time to the array, it's actually a bit faster to build it up in a list, and then make an array out of it.
But it is more memory efficient and faster if you are using numpy dtypes and especially if you are extend()ing it with chunks from other arrays.
I also have a not-quite finished version in Cython that statically handles the core C data types -- that should be faster, but I haven't really profiled it.
I'll try to get the code up on gitHub.
It would be nice to combine efforts.
-CHB
In my case I need to ensure a contiguous storage to allow easy upload onto the GPU. But my implementation is quite slow, especially when you add one item at a time:
python benchmark.py
Python list, append 100000 items: 0.01161 Array list, append 100000 items: 0.46854 Array list, append 100000 items at once: 0.05801 Python list, prepend 100000 items: 1.96168 Array list, prepend 100000 items: 12.83371 Array list, append 100000 items at once: 0.06002
I realize I did not answer all Chris' questions:
L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] ) for item in L: print(item)
[0] [1 2] [3 4 5] [6 7 8 9]
print (type(L.data))
<class 'numpy.ndarray'>
print(L.data.dtype)
int64
print(L.data.shape)
(10,)
I did not implement operations yet, but it would be a matter for transferring call to the underlying numpy data array.
L._data *= 2 print(L)
[[0], [4 8], [12 16 20], [24 28 32 36]]
On 23 Dec 2015, at 09:34, Stephan Hoyer shoyer@gmail.com wrote:
We have a type similar to this (a typed list) internally in pandas,
although it is restricted to a single dimension and far from feature complete -- it only has .append and a .to_array() method for converting to a 1d numpy array. Our version is written in Cython, and we use it for performance reasons when we would otherwise need to create a list of unknown length:
https://github.com/pydata/pandas/blob/v0.17.1/pandas/hashtable.pyx#L99
In my experience, it's several times faster than using a builtin list
from Cython, which makes sense given that it needs to copy about 1/3 the data (no type or reference count for individual elements). Obviously, it uses 1/3 the space to store the data, too. We currently don't expose this object externally, but it could be an interesting project to adapt this code into a standalone project that could be more broadly useful.
Cheers, Stephan
On Tue, Dec 22, 2015 at 8:20 PM, Chris Barker chris.barker@noaa.gov
wrote:
sorry for being so lazy as to not go look at the project pages, but....
This sounds like it could be really useful, and maybe supercise a coupl
eof half-baked projects of mine. But -- what does "dynamic" mean?
- can you append to these arrays?
- can it support "ragged arrrays" -- it looks like it does.
L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] ) print(L)
[[0], [1 2], [3 4 5], [6 7 8 9]]
so this looks like a ragged array -- but what do you get when you do:
for row in L: print row
print(L.data)
[0 1 2 3 4 5 6 7 8
is .data a regular old 1-d numpy array?
L = ArrayList( np.arange(10), [3,3,4]) print(L)
[[0 1 2], [3 4 5], [6 7 8 9]]
print(L.data)
[0 1 2 3 4 5 6 7 8 9]
does an ArrayList act like a numpy array in other ways:
L * 5
L* some_array
in which case, how does it do broadcasting???
Thanks,
-CHB
L = ArrayList(["Hello", "world", "!"]) print(L[0])
'Hello'
L[1] = "brave new world" print(L)
['Hello', 'brave new world', '!']
Nicolas
NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
--
Christopher Barker, Ph.D. Oceanographer
Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
On Thu, Dec 24, 2015 at 10:19 AM, Chris Barker chris.barker@noaa.gov wrote:
I'll try to get the code up on gitHub.
Hey look -- it's already there:
https://github.com/PythonCHB/NumpyExtras
too many gitHub accounts.....
Here is the list/growable array/ accumulator:
https://github.com/PythonCHB/NumpyExtras/blob/master/numpy_extras/accumulato...
And here is the ragged array:
https://github.com/PythonCHB/NumpyExtras/blob/master/numpy_extras/ragged_arr...
I haven't touched either of these for a while -- not really sure what state they are in.
-CHB
It would be nice to combine efforts.
-CHB
In my case I need to ensure a contiguous storage to allow easy upload onto the GPU. But my implementation is quite slow, especially when you add one item at a time:
python benchmark.py
Python list, append 100000 items: 0.01161 Array list, append 100000 items: 0.46854 Array list, append 100000 items at once: 0.05801 Python list, prepend 100000 items: 1.96168 Array list, prepend 100000 items: 12.83371 Array list, append 100000 items at once: 0.06002
I realize I did not answer all Chris' questions:
L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] ) for item in L: print(item)
[0] [1 2] [3 4 5] [6 7 8 9]
print (type(L.data))
<class 'numpy.ndarray'>
print(L.data.dtype)
int64
print(L.data.shape)
(10,)
I did not implement operations yet, but it would be a matter for transferring call to the underlying numpy data array.
L._data *= 2 print(L)
[[0], [4 8], [12 16 20], [24 28 32 36]]
On 23 Dec 2015, at 09:34, Stephan Hoyer shoyer@gmail.com wrote:
We have a type similar to this (a typed list) internally in pandas,
although it is restricted to a single dimension and far from feature complete -- it only has .append and a .to_array() method for converting to a 1d numpy array. Our version is written in Cython, and we use it for performance reasons when we would otherwise need to create a list of unknown length:
https://github.com/pydata/pandas/blob/v0.17.1/pandas/hashtable.pyx#L99
In my experience, it's several times faster than using a builtin list
from Cython, which makes sense given that it needs to copy about 1/3 the data (no type or reference count for individual elements). Obviously, it uses 1/3 the space to store the data, too. We currently don't expose this object externally, but it could be an interesting project to adapt this code into a standalone project that could be more broadly useful.
Cheers, Stephan
On Tue, Dec 22, 2015 at 8:20 PM, Chris Barker chris.barker@noaa.gov
wrote:
sorry for being so lazy as to not go look at the project pages, but....
This sounds like it could be really useful, and maybe supercise a coupl
eof half-baked projects of mine. But -- what does "dynamic" mean?
- can you append to these arrays?
- can it support "ragged arrrays" -- it looks like it does.
L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] ) print(L)
[[0], [1 2], [3 4 5], [6 7 8 9]]
so this looks like a ragged array -- but what do you get when you do:
for row in L: print row
print(L.data)
[0 1 2 3 4 5 6 7 8
is .data a regular old 1-d numpy array?
L = ArrayList( np.arange(10), [3,3,4]) print(L)
[[0 1 2], [3 4 5], [6 7 8 9]]
print(L.data)
[0 1 2 3 4 5 6 7 8 9]
does an ArrayList act like a numpy array in other ways:
L * 5
L* some_array
in which case, how does it do broadcasting???
Thanks,
-CHB
L = ArrayList(["Hello", "world", "!"]) print(L[0])
'Hello'
L[1] = "brave new world" print(L)
['Hello', 'brave new world', '!']
Nicolas
NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
--
Christopher Barker, Ph.D. Oceanographer
Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
--
Christopher Barker, Ph.D. Oceanographer
Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
On Wed, Dec 23, 2015 at 4:01 AM, Nicolas P. Rougier < Nicolas.Rougier@inria.fr> wrote:
But my implementation is quite slow, especially when you add one item at a time:
python benchmark.py
Python list, append 100000 items: 0.01161 Array list, append 100000 items: 0.46854
are you pre-allocating any extra space? if not -- it's going to be really, really pokey when adding a little bit at a time.
With my Accumulator class:
https://github.com/PythonCHB/NumpyExtras/blob/master/numpy_extras/accumulato...
I pre-allocate a larger numpy array to start, and it gets re-allocated, with some extra, when filled, using ndarray.resize()
this is quite fast.
These are settable parameters in the class:
DEFAULT_BUFFER_SIZE = 128 # original buffer created. BUFFER_EXTEND_SIZE = 1.25 # array.array uses 1+1/16 -- that seems small to me.
I looked at the code in array.array (and list, I think), and it does stuff to optimize very small arrays, which I figured wasn't the use-case here :-)
But I did a bunch of experimentation, and as long as you pre-allocate _some_ it doesn't make much difference how much :-)
BTW,
I just went in an updated and tested the Accumulator class code -- it needed some tweaks, but it's working now.
The cython version is in an unknown state...
some profiling:
In [11]: run profile_accumulator.py
In [12]: timeit accum1(10000)
100 loops, best of 3: 3.91 ms per loop
In [13]: timeit list1(10000)
1000 loops, best of 3: 1.15 ms per loop
These are simply appending 10,000 integers in a loop -- with teh list, the list is turned into a numpy array at the end. So it's still faster to accumulate in a list, then make an array, but only a about a factor of 3 -- I think this is because you are staring with a python integer -- with the accumulator function, you need to be checking type and pulling a native integer out with each append. but a list can append a python object with no type checking or anything.
Then the conversion from list to array is all in C.
Note that the accumulator version is still more memory efficient...
In [14]: timeit accum2(10000)
100 loops, best of 3: 3.84 ms per loop
this version pre-allocated the whole internal buffer -- not much faster the buffer re-allocation isn't a big deal (thanks to ndarray.resize using realloc(), and not creating a new numpy array)
In [24]: timeit list_extend1(100000)
100 loops, best of 3: 4.15 ms per loop
In [25]: timeit accum_extend1(100000)
1000 loops, best of 3: 1.37 ms per loop
This time, the stuff is added in chunks 100 elements at a time -- the chunks being created ahead of time -- a list with range() the first time, and an array with arange() the second. much faster to extend with arrays...
-CHB
On 28 Dec 2015, at 19:58, Chris Barker chris.barker@noaa.gov wrote:
On Wed, Dec 23, 2015 at 4:01 AM, Nicolas P. Rougier Nicolas.Rougier@inria.fr wrote: But my implementation is quite slow, especially when you add one item at a time:
python benchmark.py
Python list, append 100000 items: 0.01161 Array list, append 100000 items: 0.46854
are you pre-allocating any extra space? if not -- it's going to be really, really pokey when adding a little bit at a time.
Yes, I’m preallocating but it might not be optimal at all given your implementation is much faster. I’ll try to adapt your code. Thanks.
With my Accumulator class:
https://github.com/PythonCHB/NumpyExtras/blob/master/numpy_extras/accumulato...
I pre-allocate a larger numpy array to start, and it gets re-allocated, with some extra, when filled, using ndarray.resize()
this is quite fast.
These are settable parameters in the class:
DEFAULT_BUFFER_SIZE = 128 # original buffer created. BUFFER_EXTEND_SIZE = 1.25 # array.array uses 1+1/16 -- that seems small to me.
I looked at the code in array.array (and list, I think), and it does stuff to optimize very small arrays, which I figured wasn't the use-case here :-)
But I did a bunch of experimentation, and as long as you pre-allocate _some_ it doesn't make much difference how much :-)
BTW,
I just went in an updated and tested the Accumulator class code -- it needed some tweaks, but it's working now.
The cython version is in an unknown state...
some profiling:
In [11]: run profile_accumulator.py
In [12]: timeit accum1(10000)
100 loops, best of 3: 3.91 ms per loop
In [13]: timeit list1(10000)
1000 loops, best of 3: 1.15 ms per loop
These are simply appending 10,000 integers in a loop -- with teh list, the list is turned into a numpy array at the end. So it's still faster to accumulate in a list, then make an array, but only a about a factor of 3 -- I think this is because you are staring with a python integer -- with the accumulator function, you need to be checking type and pulling a native integer out with each append. but a list can append a python object with no type checking or anything.
Then the conversion from list to array is all in C.
Note that the accumulator version is still more memory efficient...
In [14]: timeit accum2(10000)
100 loops, best of 3: 3.84 ms per loop
this version pre-allocated the whole internal buffer -- not much faster the buffer re-allocation isn't a big deal (thanks to ndarray.resize using realloc(), and not creating a new numpy array)
In [24]: timeit list_extend1(100000)
100 loops, best of 3: 4.15 ms per loop
In [25]: timeit accum_extend1(100000)
1000 loops, best of 3: 1.37 ms per loop
This time, the stuff is added in chunks 100 elements at a time -- the chunks being created ahead of time -- a list with range() the first time, and an array with arange() the second. much faster to extend with arrays...
-CHB
--
Christopher Barker, Ph.D. Oceanographer
Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
On Wed, Dec 30, 2015 at 6:34 AM, Nicolas P. Rougier < Nicolas.Rougier@inria.fr> wrote:
On 28 Dec 2015, at 19:58, Chris Barker chris.barker@noaa.gov wrote:
python benchmark.py
Python list, append 100000 items: 0.01161 Array list, append 100000 items: 0.46854
are you pre-allocating any extra space? if not -- it's going to be
really, really pokey when adding a little bit at a time.
Yes, I’m preallocating but it might not be optimal at all given your implementation is much faster. I’ll try to adapt your code. Thanks.
sounds good -- I'll try to take a look at yours soon - maybe we can merge the projects. MIne is only operational in one small place, I think.
-CHB
With my Accumulator class:
https://github.com/PythonCHB/NumpyExtras/blob/master/numpy_extras/accumulato...
I pre-allocate a larger numpy array to start, and it gets re-allocated,
with some extra, when filled, using ndarray.resize()
this is quite fast.
These are settable parameters in the class:
DEFAULT_BUFFER_SIZE = 128 # original buffer created. BUFFER_EXTEND_SIZE = 1.25 # array.array uses 1+1/16 -- that seems small
to me.
I looked at the code in array.array (and list, I think), and it does
stuff to optimize very small arrays, which I figured wasn't the use-case here :-)
But I did a bunch of experimentation, and as long as you pre-allocate
_some_ it doesn't make much difference how much :-)
BTW,
I just went in an updated and tested the Accumulator class code -- it
needed some tweaks, but it's working now.
The cython version is in an unknown state...
some profiling:
In [11]: run profile_accumulator.py
In [12]: timeit accum1(10000)
100 loops, best of 3: 3.91 ms per loop
In [13]: timeit list1(10000)
1000 loops, best of 3: 1.15 ms per loop
These are simply appending 10,000 integers in a loop -- with teh list,
the list is turned into a numpy array at the end. So it's still faster to accumulate in a list, then make an array, but only a about a factor of 3 -- I think this is because you are staring with a python integer -- with the accumulator function, you need to be checking type and pulling a native integer out with each append. but a list can append a python object with no type checking or anything.
Then the conversion from list to array is all in C.
Note that the accumulator version is still more memory efficient...
In [14]: timeit accum2(10000)
100 loops, best of 3: 3.84 ms per loop
this version pre-allocated the whole internal buffer -- not much faster
the buffer re-allocation isn't a big deal (thanks to ndarray.resize using realloc(), and not creating a new numpy array)
In [24]: timeit list_extend1(100000)
100 loops, best of 3: 4.15 ms per loop
In [25]: timeit accum_extend1(100000)
1000 loops, best of 3: 1.37 ms per loop
This time, the stuff is added in chunks 100 elements at a time -- the
chunks being created ahead of time -- a list with range() the first time, and an array with arange() the second. much faster to extend with arrays...
-CHB
--
Christopher Barker, Ph.D. Oceanographer
Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
On Mi, 2015-12-23 at 00:34 -0800, Stephan Hoyer wrote:
We have a type similar to this (a typed list) internally in pandas, although it is restricted to a single dimension and far from feature complete -- it only has .append and a .to_array() method for converting to a 1d numpy array. Our version is written in Cython, and we use it for performance reasons when we would otherwise need to create a list of unknown length: https://github.com/pydata/pandas/blob/v0.17.1/pandas/hashtable.pyx#L99
Probably is a bit orthogonal since I guess you want/need cython, but pythons buildin array.array should get you there pretty much as well. Of course it requires the C typecode (though that should not be hard to get) and does not support strings.
- Sebastian
In my experience, it's several times faster than using a builtin list from Cython, which makes sense given that it needs to copy about 1/3 the data (no type or reference count for individual elements). Obviously, it uses 1/3 the space to store the data, too. We currently don't expose this object externally, but it could be an interesting project to adapt this code into a standalone project that could be more broadly useful.
Cheers, Stephan
On Tue, Dec 22, 2015 at 8:20 PM, Chris Barker chris.barker@noaa.gov wrote:
sorry for being so lazy as to not go look at the project pages, but.... This sounds like it could be really useful, and maybe supercise a coupl eof half-baked projects of mine. But -- what does "dynamic" mean? - can you append to these arrays? - can it support "ragged arrrays" -- it looks like it does. >>> L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] ) >>> print(L) [[0], [1 2], [3 4 5], [6 7 8 9]] so this looks like a ragged array -- but what do you get when you do: for row in L: print row >>> print(L.data) [0 1 2 3 4 5 6 7 8 is .data a regular old 1-d numpy array? >>> L = ArrayList( np.arange(10), [3,3,4]) >>> print(L) [[0 1 2], [3 4 5], [6 7 8 9]] >>> print(L.data) [0 1 2 3 4 5 6 7 8 9] does an ArrayList act like a numpy array in other ways: L * 5 L* some_array in which case, how does it do broadcasting??? Thanks, -CHB >>> L = ArrayList(["Hello", "world", "!"]) >>> print(L[0]) 'Hello' >>> L[1] = "brave new world" >>> print(L) ['Hello', 'brave new world', '!'] Nicolas _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
On Wed, Dec 23, 2015 at 4:31 AM, Sebastian Berg sebastian@sipsolutions.net wrote:
Probably is a bit orthogonal since I guess you want/need cython, but pythons builtin array.array should get you there pretty much as well.
I don't think it's orthogonal to cython -- you can access an array.array directly from within cython -- it's actually about the easiest way to get a array-like object in Cython/C (which you can then access via a memoryview, etc).
Though I don't know there is a python object (i.e. pointer) option there. (nor text).
-CHB
Of course it requires the C typecode (though that should not be hard to get) and does not support strings.
- Sebastian
In my experience, it's several times faster than using a builtin list from Cython, which makes sense given that it needs to copy about 1/3 the data (no type or reference count for individual elements). Obviously, it uses 1/3 the space to store the data, too. We currently don't expose this object externally, but it could be an interesting project to adapt this code into a standalone project that could be more broadly useful.
Cheers, Stephan
On Tue, Dec 22, 2015 at 8:20 PM, Chris Barker chris.barker@noaa.gov wrote:
sorry for being so lazy as to not go look at the project pages, but.... This sounds like it could be really useful, and maybe supercise a coupl eof half-baked projects of mine. But -- what does "dynamic" mean? - can you append to these arrays? - can it support "ragged arrrays" -- it looks like it does. >>> L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] ) >>> print(L) [[0], [1 2], [3 4 5], [6 7 8 9]] so this looks like a ragged array -- but what do you get when you do: for row in L: print row >>> print(L.data) [0 1 2 3 4 5 6 7 8 is .data a regular old 1-d numpy array? >>> L = ArrayList( np.arange(10), [3,3,4]) >>> print(L) [[0 1 2], [3 4 5], [6 7 8 9]] >>> print(L.data) [0 1 2 3 4 5 6 7 8 9] does an ArrayList act like a numpy array in other ways: L * 5 L* some_array in which case, how does it do broadcasting??? Thanks, -CHB >>> L = ArrayList(["Hello", "world", "!"]) >>> print(L[0]) 'Hello' >>> L[1] = "brave new world" >>> print(L) ['Hello', 'brave new world', '!'] Nicolas _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion