Mailman 3 Dynamic array list implementation - NumPy-Discussion


Dynamic array list implementation

Nicolas P. Rougier

22 Dec 2015

4:47 a.m.

I've coded a typed dynamic list based on a numpy array (needed for the glumpy project). Code is available from https://github.com/rougier/numpy-list

A numpy array list is a strongly typed list whose type can be anything that can be interpreted as a numpy data type.

>>> L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] )
>>> print(L)
[[0], [1 2], [3 4 5], [6 7 8 9]]
>>> print(L.data)
[0 1 2 3 4 5 6 7 8 9]

You can add several items at once by specifying a common or individual size: a single scalar means all items are the same size, while a list of sizes is used to specify individual item sizes.

>>> L = ArrayList( np.arange(10), [3,3,4])
>>> print(L)
[[0 1 2], [3 4 5], [6 7 8 9]]
>>> print(L.data)
[0 1 2 3 4 5 6 7 8 9]

You can also use a typed list for storing strings with different sizes:

>>> L = ArrayList(["Hello", "world", "!"])
>>> print(L[0])
'Hello'
>>> L[1] = "brave new world"
>>> print(L)
['Hello', 'brave new world', '!']

Nicolas
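The examples in this message suggest a storage layout of one flat array plus per-item boundaries. The following is only a minimal sketch of that idea, not the numpy-list implementation: the class name `RaggedList` and its `offsets` attribute are hypothetical, and it supports neither appending nor strings.

```python
import numpy as np


class RaggedList:
    """Hypothetical ragged typed list: one flat array plus item boundaries."""

    def __init__(self, data, sizes=None, dtype=None):
        if sizes is None:
            # `data` is a list of items: record each length, then flatten.
            sizes = [len(item) for item in data]
            data = [x for item in data for x in item]
        self.data = np.array(data, dtype=dtype)
        if np.isscalar(sizes):
            # A single scalar means every item has the same size.
            if sizes <= 0:
                raise ValueError("item size must be positive")
            sizes = [sizes] * (len(self.data) // sizes)
        if sum(sizes) != len(self.data):
            raise ValueError("item sizes must add up to the data length")
        # Item i occupies data[offsets[i]:offsets[i + 1]].
        self.offsets = np.concatenate(([0], np.cumsum(sizes))).astype(int)

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, i):
        if not -len(self) <= i < len(self):
            raise IndexError("item index out of range")
        i %= len(self)
        # Each item is a view into the flat storage, not a copy.
        return self.data[self.offsets[i]:self.offsets[i + 1]]


a = RaggedList([[0], [1, 2], [3, 4, 5], [6, 7, 8, 9]])  # list of items
b = RaggedList(np.arange(10), [3, 3, 4])                # flat data + sizes
c = RaggedList(np.arange(6), 2)                         # common item size
```

Keeping every element in a single 1-d array keeps the storage contiguous, and the offsets are all that is needed to recover the individual items.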


Chris Barker

22 Dec

12:19 p.m.

Sorry for being so lazy as to not go look at the project pages, but...

This sounds like it could be really useful, and might supersede a couple of half-baked projects of mine. But what does "dynamic" mean?

- Can you append to these arrays?
- Can it support "ragged arrays"? It looks like it does.

>>> L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] )
>>> print(L)
[[0], [1 2], [3 4 5], [6 7 8 9]]

So this looks like a ragged array, but what do you get when you do:

for row in L:
    print(row)

>>> print(L.data)
[0 1 2 3 4 5 6 7 8 9]

Is .data a regular old 1-d numpy array?

>>> L = ArrayList( np.arange(10), [3,3,4])
>>> print(L)
[[0 1 2], [3 4 5], [6 7 8 9]]
>>> print(L.data)
[0 1 2 3 4 5 6 7 8 9]

Does an ArrayList act like a numpy array in other ways:

L * 5
L * some_array

in which case, how does it do broadcasting?

Thanks,
-CHB


_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

Chris.Barker@noaa.gov

Nicolas P. Rougier

1:21 p.m.

Yes, you can append/insert/remove items. It works pretty much like a python list in fact (but with a single data type for all elements).

Nicolas


Stephan Hoyer

23 Dec

1:34 a.m.

We have a type similar to this (a typed list) internally in pandas, although it is restricted to a single dimension and far from feature complete -- it only has .append and a .to_array() method for converting to a 1d numpy array. Our version is written in Cython, and we use it for performance reasons when we would otherwise need to create a list of unknown length:

https://github.com/pydata/pandas/blob/v0.17.1/pandas/hashtable.pyx#L99

In my experience, it's several times faster than using a builtin list from Cython, which makes sense given that it needs to copy about 1/3 the data (no type or reference count for individual elements). Obviously, it uses 1/3 the space to store the data, too.

We currently don't expose this object externally, but it could be an interesting project to adapt this code into a standalone project that could be more broadly useful.

Cheers,
Stephan
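To make the shape of such an object concrete, here is a minimal pure-Python sketch of a one-dimensional typed list offering only `append` and `to_array`. It is not the pandas Cython code: the class name `TypedList`, the doubling growth policy, and the default capacity are all assumptions chosen for illustration.

```python
import numpy as np


class TypedList:
    """Hypothetical 1-d typed list with only append() and to_array()."""

    def __init__(self, dtype=np.int64, capacity=16):
        self._buf = np.empty(max(1, capacity), dtype=dtype)
        self._n = 0  # number of slots actually in use

    def append(self, value):
        if self._n == len(self._buf):
            # Buffer full: allocate twice the space and copy the used part.
            grown = np.empty(2 * len(self._buf), dtype=self._buf.dtype)
            grown[:self._n] = self._buf
            self._buf = grown
        self._buf[self._n] = value
        self._n += 1

    def to_array(self):
        # Copy so that later appends cannot change the returned array.
        return self._buf[:self._n].copy()


t = TypedList(capacity=2)
for i in range(5):
    t.append(i)
arr = t.to_array()
```

Because the elements live in one typed buffer, there is no per-element Python object, which is the source of the space saving described above.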

Nicolas P. Rougier

5:01 a.m.

A typed list in numpy would be a nice addition indeed, and your cython implementation is nice (and small).

In my case I need to ensure contiguous storage to allow easy upload onto the GPU. But my implementation is quite slow, especially when you add one item at a time:

python benchmark.py
Python list, append 100000 items: 0.01161
Array list, append 100000 items: 0.46854
Array list, append 100000 items at once: 0.05801
Python list, prepend 100000 items: 1.96168
Array list, prepend 100000 items: 12.83371
Array list, append 100000 items at once: 0.06002

I realize I did not answer all Chris' questions:

>>> L = ArrayList( [[0], [1,2], [3,4,5], [6,7,8,9]] )
>>> for item in L: print(item)
[0]
[1 2]
[3 4 5]
[6 7 8 9]

>>> print(type(L.data))
>>> print(L.data.dtype)
int64
>>> print(L.data.shape)
(10,)

I did not implement operations yet, but it would be a matter of forwarding the calls to the underlying numpy data array.

>>> L._data *= 2
>>> print(L)
[[0], [2 4], [6 8 10], [12 14 16 18]]
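One way to read "forwarding to the underlying data array" is sketched below. This is an assumption about how such forwarding could look, not code from numpy-list: the class `Ragged`, its `sizes` attribute, and the `tolist` helper are hypothetical. It also gives one possible answer to the broadcasting question: the operand is broadcast against the flat 1-d data, so a scalar or an array with one value per element works, while a per-item operand would need extra handling.

```python
import numpy as np


class Ragged:
    """Hypothetical ragged container: flat `data` plus item `sizes`."""

    def __init__(self, data, sizes):
        self.data = np.asarray(data)
        self.sizes = list(sizes)

    def __mul__(self, other):
        # Forward the operation to the flat array; item sizes are unchanged.
        # `other` must broadcast against the flat 1-d data (e.g. a scalar).
        return Ragged(self.data * other, self.sizes)

    def tolist(self):
        # Cut the flat array at the cumulative item sizes.
        cuts = np.cumsum(self.sizes, dtype=int)[:-1]
        return [part.tolist() for part in np.split(self.data, cuts)]


L = Ragged(np.arange(10), [1, 2, 3, 4])
doubled = L * 2
```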


Chris Barker

24 Dec

11:19 a.m.

On Wed, Dec 23, 2015 at 4:01 AM, Nicolas P. Rougier < Nicolas.Rougier@inria.fr> wrote:

Typed list in numpy would be a nice addition indeed and your cython implementation is nice (and small).

It seems we have a lot of duplicated effort here. Personally, I have two needs:

1) ragged arrays
2) "growable" arrays

I have semi-complete versions of both of these, which are completely independent -- not sure if it makes sense to combine them, I suppose not.

But we've talked a bit about "typed list", and I'm not sure what that means -- is it something that is entirely like a python list, except that all the elements have the same type?

Anyway: I've been thinking about this from the opposite direction: I want a numpy array that you can append/extend. This comes from the fact that it's not uncommon to need to build up an array where you don't know how large it will be when you start.

The common recommendation for doing that now is to build it up in a list, and then, when you are done, turn it into an ndarray. But that means you are limited to python types (or putting numpy scalars in a list...), and it's not very memory efficient.

My version uses an ndarray internally, and over-allocates it a bit, using ndarray.resize() to resize. This means that you can get the data pointer if you want for Cython, etc... but also that it's getting re-allocated, so that pointer is fragile, and you don't want other arrays to have views on it.

Interestingly, if you are adding one float, for example, at a time to the array, it's actually a bit faster to build it up in a list, and then make an array out of it. But it is more memory efficient and faster if you are using numpy dtypes, and especially if you are extend()ing it with chunks from other arrays.

I also have a not-quite-finished version in Cython that statically handles the core C data types -- that should be faster, but I haven't really profiled it.

I'll try to get the code up on gitHub.

It would be nice to combine efforts.

-CHB
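The over-allocate-and-resize approach described above can be sketched roughly as follows. This is not the code from the NumpyExtras repository: the class name `GrowableArray`, its method names, and the default capacity and growth factor are assumptions. The sketch passes `refcheck=False` to `ndarray.resize()`, which is only safe because it never hands out a view of its internal buffer; that is the "fragile pointer" caveat from the message.

```python
import numpy as np


class GrowableArray:
    """Hypothetical append/extend-able 1-d array backed by one ndarray."""

    def __init__(self, dtype=np.float64, capacity=128, growth=1.25):
        self._buf = np.zeros(max(1, capacity), dtype=dtype)
        self._n = 0            # number of valid elements
        self._growth = growth  # over-allocation factor applied when full

    def _reserve(self, needed):
        if needed > len(self._buf):
            new_cap = max(needed, int(len(self._buf) * self._growth) + 1)
            # In-place reallocation. refcheck=False skips NumPy's reference
            # check, so no view of _buf may ever be handed out: the memory
            # can move when the buffer grows.
            self._buf.resize(new_cap, refcheck=False)

    def append(self, value):
        self._reserve(self._n + 1)
        self._buf[self._n] = value
        self._n += 1

    def extend(self, values):
        values = np.asarray(values, dtype=self._buf.dtype).ravel()
        self._reserve(self._n + len(values))
        self._buf[self._n:self._n + len(values)] = values
        self._n += len(values)

    def to_array(self):
        # Return a copy, never a view, for the reason given in _reserve().
        return self._buf[:self._n].copy()


g = GrowableArray(dtype=np.int64, capacity=2)
for i in range(10):
    g.append(i)                # one element at a time
g.extend(np.arange(10, 300))   # a whole chunk in one call
```

Extending with an array copies the chunk in a single slice assignment, whereas each `append` pays for a Python-to-native conversion per element.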


Chris Barker

11:23 a.m.

On Thu, Dec 24, 2015 at 10:19 AM, Chris Barker wrote:

I'll try to get the code up on gitHub.

Hey look -- it's already there:

https://github.com/PythonCHB/NumpyExtras

too many gitHub accounts.....

Here is the list/growable array/accumulator:

https://github.com/PythonCHB/NumpyExtras/blob/master/numpy_extras/accumulato...

And here is the ragged array:

https://github.com/PythonCHB/NumpyExtras/blob/master/numpy_extras/ragged_arr...

I haven't touched either of these for a while -- not really sure what state they are in.

-CHB


Chris Barker

28 Dec

11:58 a.m.

On Wed, Dec 23, 2015 at 4:01 AM, Nicolas P. Rougier < Nicolas.Rougier@inria.fr> wrote:

But my implementation is quite slow, especially when you add one item at a time:

python benchmark.py
Python list, append 100000 items: 0.01161
Array list, append 100000 items: 0.46854

Are you pre-allocating any extra space? If not -- it's going to be really, really pokey when adding a little bit at a time.

With my Accumulator class:

https://github.com/PythonCHB/NumpyExtras/blob/master/numpy_extras/accumulato...

I pre-allocate a larger numpy array to start, and it gets re-allocated, with some extra, when filled, using ndarray.resize(). This is quite fast.

These are settable parameters in the class:

DEFAULT_BUFFER_SIZE = 128  # original buffer created.
BUFFER_EXTEND_SIZE = 1.25  # array.array uses 1+1/16 -- that seems small to me.

I looked at the code in array.array (and list, I think), and it does stuff to optimize very small arrays, which I figured wasn't the use-case here :-)

But I did a bunch of experimentation, and as long as you pre-allocate _some_ it doesn't make much difference how much :-)

BTW, I just went in and updated and tested the Accumulator class code -- it needed some tweaks, but it's working now. The cython version is in an unknown state...

Some profiling:

In [11]: run profile_accumulator.py

In [12]: timeit accum1(10000)
100 loops, best of 3: 3.91 ms per loop

In [13]: timeit list1(10000)
1000 loops, best of 3: 1.15 ms per loop

These are simply appending 10,000 integers in a loop -- with the list, the list is turned into a numpy array at the end. So it's still faster to accumulate in a list, then make an array, but only by about a factor of 3 -- I think this is because you are starting with a python integer -- with the accumulator function, you need to be checking type and pulling a native integer out with each append, but a list can append a python object with no type checking or anything. Then the conversion from list to array is all in C.

Note that the accumulator version is still more memory efficient...

In [14]: timeit accum2(10000)
100 loops, best of 3: 3.84 ms per loop

This version pre-allocated the whole internal buffer -- not much faster, so the buffer re-allocation isn't a big deal (thanks to ndarray.resize using realloc(), and not creating a new numpy array).

In [24]: timeit list_extend1(100000)
100 loops, best of 3: 4.15 ms per loop

In [25]: timeit accum_extend1(100000)
1000 loops, best of 3: 1.37 ms per loop

This time, the stuff is added in chunks of 100 elements at a time -- the chunks being created ahead of time -- a list with range() the first time, and an array with arange() the second. Much faster to extend with arrays...

-CHB
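For readers who want to reproduce this kind of comparison without the Accumulator class, the following is a small stand-alone timing sketch. It is not `profile_accumulator.py`: `list1` only mirrors the name used above, `prealloc1` is a hypothetical stand-in that writes into a fully preallocated array (so no reallocation is involved at all), and the timings will depend on the machine and on the Python and NumPy versions.

```python
import timeit

import numpy as np


def list1(n):
    """Append n Python ints to a list, then convert once at the end."""
    items = []
    for i in range(n):
        items.append(i)
    return np.array(items, dtype=np.int64)


def prealloc1(n):
    """Store n Python ints one at a time into a fully preallocated array."""
    out = np.empty(n, dtype=np.int64)
    for i in range(n):
        out[i] = i  # each store converts a Python int to a native int64
    return out


if __name__ == "__main__":
    loops = 20
    for func in (list1, prealloc1):
        best = min(timeit.repeat(lambda: func(10000), number=loops, repeat=3))
        print(f"{func.__name__}: {best / loops * 1e3:.3f} ms per loop")
```

Comparing the two isolates the per-element conversion cost from the reallocation cost, which is the distinction drawn between accum1 and accum2 above.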

Nicolas P. Rougier

30 Dec

7:34 a.m.

On 28 Dec 2015, at 19:58, Chris Barker wrote:

On Wed, Dec 23, 2015 at 4:01 AM, Nicolas P. Rougier wrote:

But my implementation is quite slow, especially when you add one item at a time:

python benchmark.py
Python list, append 100000 items: 0.01161
Array list, append 100000 items: 0.46854

are you pre-allocating any extra space? if not -- it's going to be really, really pokey when adding a little bit at a time.

Yes, I’m preallocating but it might not be optimal at all given your implementation is much faster. I’ll try to adapt your code. Thanks.


Chris Barker

31 Dec

11:08 a.m.

On Wed, Dec 30, 2015 at 6:34 AM, Nicolas P. Rougier < Nicolas.Rougier@inria.fr> wrote:

On 28 Dec 2015, at 19:58, Chris Barker wrote:

python benchmark.py
Python list, append 100000 items: 0.01161
Array list, append 100000 items: 0.46854

are you pre-allocating any extra space? if not -- it's going to be really, really pokey when adding a little bit at a time.

Yes, I’m preallocating but it might not be optimal at all given your implementation is much faster. I’ll try to adapt your code. Thanks.

Sounds good -- I'll try to take a look at yours soon; maybe we can merge the projects. Mine is only operational in one small place, I think.

-CHB


Sebastian Berg

23 Dec

5:31 a.m.

On Wed, 2015-12-23 at 00:34 -0800, Stephan Hoyer wrote:

We have a type similar to this (a typed list) internally in pandas, although it is restricted to a single dimension and far from feature complete -- it only has .append and a .to_array() method for converting to a 1d numpy array. Our version is written in Cython, and we use it for performance reasons when we would otherwise need to create a list of unknown length: https://github.com/pydata/pandas/blob/v0.17.1/pandas/hashtable.pyx#L99

This is probably a bit orthogonal, since I guess you want/need cython, but Python's builtin array.array should get you there pretty much as well. Of course it requires the C typecode (though that should not be hard to get) and does not support strings.

- Sebastian
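As a small illustration of this suggestion, the standard-library `array.array` can serve as the growable typed buffer, with a single conversion to NumPy at the end. The variable names are arbitrary, and the typecode `"d"` (C double) is paired with `np.float64` on the assumption that the two have the same size, which holds on common platforms.

```python
from array import array

import numpy as np

values = array("d")          # "d" is the C typecode for double
for i in range(5):
    values.append(i * 0.5)   # append one element at a time
values.extend([10.0, 20.0])  # or several at once

# np.frombuffer shares memory with `values`; copy() makes the result
# independent so the array.array can keep growing afterwards.
result = np.frombuffer(values, dtype=np.float64).copy()
```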


Chris Barker

24 Dec 24 Dec

11:08 a.m.

On Wed, Dec 23, 2015 at 4:31 AM, Sebastian Berg wrote:

...

Probably is a bit orthogonal since I guess you want/need Cython, but Python's built-in array.array should get you there pretty much as well.

I don't think it's orthogonal to Cython -- you can access an array.array directly from within Cython -- it's actually about the easiest way to get an array-like object in Cython/C (which you can then access via a memoryview, etc.). Though I don't know that there is a Python object (i.e. pointer) option there (nor text). -CHB
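The memoryview access mentioned here can be tried from plain Python as well, since array.array exports the same buffer that a Cython typed memoryview would wrap. The sketch below is pure Python rather than Cython, and the variable names are illustrative. It also shows that array.array rejects an object typecode, which matches the doubt raised above.

```python
from array import array

buf = array("d", [1.0, 2.0, 3.0])

# memoryview exposes the metadata carried by the buffer protocol.
view = memoryview(buf)
print(view.format, view.itemsize, view.shape)

# Writes through the view land in the underlying array; no copy is made.
view[1] = 20.0
print(buf[1])

# Release the view before resizing the array again.
view.release()
buf.append(4.0)

# array.array has no typecode for generic Python object references.
try:
    array("O")
except ValueError as exc:
    print("no object typecode:", exc)
```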

...

Of course it requires the C typecode (though that should not be hard to get) and does not support strings.

- Sebastian



11 comments, 4 participants: Chris Barker, Nicolas P. Rougier, Sebastian Berg, Stephan Hoyer
