Hi all, I want to be able to within a loop a) apply a mathematical operation to all elements in a vector (can be done atomically) then b) pop zero or more elements from one end of the vector and c) push zero or more elements on to the other end. So far I've used a collections.deque to store my vector as it should be more efficient than a numpy array for the appending and deletion of elements. However, I was wondering whether performance could be improved through the use of a homogeneously-typed double-ended queue i.e. a linked list equivalent of numpy.ndarray. Has anyone previously considered whether it would be worth including such a thing within the numpy package? Cheers, Will
On Tue, Sep 25, 2012 at 10:03 AM, William Furnass <will@thearete.co.uk> wrote:
Hi all,
I want to be able to within a loop a) apply a mathematical operation to all elements in a vector (can be done atomically) then b) pop zero or more elements from one end of the vector and c) push zero or more elements on to the other end. So far I've used a collections.deque to store my vector as it should be more efficient than a numpy array for the appending and deletion of elements. However, I was wondering whether performance could be improved through the use of a homogeneously-typed double-ended queue i.e. a linked list equivalent of numpy.ndarray.
Implementing a ring buffer on top of ndarray would be pretty straightforward and probably work better than a linked-list implementation. -n
On 25.09.2012 11:38, Nathaniel Smith wrote:
Implementing a ring buffer on top of ndarray would be pretty straightforward and probably work better than a linked-list implementation.
Amazingly, many do not know that a ringbuffer is simply an array indexed modulus its length: foo = np.zeros(n) i = 0 while 1: foo[i % n] # access ringbuffer i += 1 Also, instead of writing a linked list, consider collections.deque. A deque is by definition a double-ended queue. It is just waste of time to implement a deque (double-ended queue) and hope it will perform better than Python's standard lib collections.deque object. Sturla
On Tue, Sep 25, 2012 at 12:31 PM, Sturla Molden <sturla@molden.no> wrote:
On 25.09.2012 11:38, Nathaniel Smith wrote:
Implementing a ring buffer on top of ndarray would be pretty straightforward and probably work better than a linked-list implementation.
Amazingly, many do not know that a ringbuffer is simply an array indexed modulus its length:
foo = np.zeros(n) i = 0 while 1: foo[i % n] # access ringbuffer i += 1
Good trick, but to be reliable I think you need to either be willing for i to overflow into a long (arbitrary width) integer, or else make sure that i is an unsigned integer and that n is 2**k where k <= sizeof(i)? Just doing i %= n on each pass through the loop might be less error-prone.
Also, instead of writing a linked list, consider collections.deque. A deque is by definition a double-ended queue. It is just waste of time to implement a deque (double-ended queue) and hope it will perform better than Python's standard lib collections.deque object.
The original poster is using collections.deque now, but wants a version that supports efficient vectorized operations. -n
On Tue, Sep 25, 2012 at 6:50 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Tue, Sep 25, 2012 at 12:31 PM, Sturla Molden <sturla@molden.no> wrote:
On 25.09.2012 11:38, Nathaniel Smith wrote:
Implementing a ring buffer on top of ndarray would be pretty straightforward and probably work better than a linked-list implementation.
Amazingly, many do not know that a ringbuffer is simply an array indexed modulus its length:
foo = np.zeros(n) i = 0 while 1: foo[i % n] # access ringbuffer i += 1
Good trick, but to be reliable I think you need to either be willing for i to overflow into a long (arbitrary width) integer, or else make sure that i is an unsigned integer and that n is 2**k where k <= sizeof(i)? Just doing i %= n on each pass through the loop might be less error-prone.
Also, instead of writing a linked list, consider collections.deque. A deque is by definition a double-ended queue. It is just waste of time to implement a deque (double-ended queue) and hope it will perform better than Python's standard lib collections.deque object.
The original poster is using collections.deque now, but wants a version that supports efficient vectorized operations.
The C++ stdlib has an efficient deque object, but it moves through memory. Hmm, it wouldn't be easy to make that work with numpy arrays what with views and all. Efficient circular lists are often implemented using powers of two so that modulo indexing can be done using a mask. Chuck
On Tue, 25 Sep 2012 07:23:54 -0600, Charles R Harris wrote:
On Tue, Sep 25, 2012 at 6:50 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Tue, Sep 25, 2012 at 12:31 PM, Sturla Molden <sturla@molden.no> wrote:
On 25.09.2012 11:38, Nathaniel Smith wrote:
Implementing a ring buffer on top of ndarray would be pretty straightforward and probably work better than a linked-list implementation.
Amazingly, many do not know that a ringbuffer is simply an array indexed modulus its length:
foo = np.zeros(n) i = 0 while 1: foo[i % n] # access ringbuffer i += 1
Good trick, but to be reliable I think you need to either be willing for i to overflow into a long (arbitrary width) integer, or else make sure that i is an unsigned integer and that n is 2**k where k <= sizeof(i)? Just doing i %= n on each pass through the loop might be less error-prone.
Also, instead of writing a linked list, consider collections.deque. A deque is by definition a double-ended queue. It is just waste of time to implement a deque (double-ended queue) and hope it will perform better than Python's standard lib collections.deque object.
The original poster is using collections.deque now, but wants a version that supports efficient vectorized operations.
The C++ stdlib has an efficient deque object, but it moves through memory. Hmm, it wouldn't be easy to make that work with numpy arrays what with views and all.
Efficient circular lists are often implemented using powers of two so that modulo indexing can be done using a mask.
What is meant by 'moves through memory'? Are the reasons for avoiding creating ndarray/ufunc-like interfaces to Cython-wrapped STL objects to do with the complexity of coding such interfaces or to efficiency of execution? Given that I'm simply wanting to go from for t in timestep_arr: for i in len(deque1): deque2[i] += some_stuff[t] * deque1[i] / some_value possibly_push_front(deque1, deque2) possibly_pop_back_(deque1, deque2) to for t in timestep_arr: deque2 += some_stuff[t] * deque1 / some_value possibly_push_front(deque1, deque2) possibly_pop_back_(deque1, deque2) would it make sense to use a cython wrapped STL deque class for deque1 and deque2 but somehow also include in __add__, __div__ and __mul__ methods to provide ufunc-like capabilities? Apologies if I'm missing something fairly obvious - I'm fairly new to Cython. Regards, Will Furnass
On Tue, Sep 25, 2012 at 4:31 AM, Sturla Molden <sturla@molden.no> wrote:
Also, instead of writing a linked list, consider collections.deque. A deque is by definition a double-ended queue. It is just waste of time to implement a deque (double-ended queue) and hope it will perform better than Python's standard lib collections.deque object.
not for insertion, deletion, etc, but there _may_ be a benefit to a class that stores the data in a homogenous data data buffer compatible with numpy: - you could use non-standard data types (uint, etc...) - It would be more memory efficient *not having to store all those python objects for each value) - you could round-trip to/from numpy arrays without data copying (or with efficient data copying...) for other operations. Whether it's worth the work would depend on teh use case, of course. Writing such a thing in Cython would be pretty easy though, particularly if you only needed to support a couple types. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
participants (6)
-
Charles R Harris
-
Chris Barker
-
Nathaniel Smith
-
Sturla Molden
-
Will Furnass
-
William Furnass