best way for storing extensible data?
Hi all,

What is the best way to store data in a NumPy array when the amount of memory to preallocate is unknown? Currently I just use a Python list, i.e.

    r = []
    for i in xrange(N):  # N is very big
        ...
        r.append(some_value)

Thx, D.
On Fri, May 18, 2007 at 03:10:23PM +0300, dmitrey wrote:
hi all, what is best way for storing data in numpy array? (amount of memory for preallocating is unknown) Currently I use just a Python list, i.e.
    r = []
    for i in xrange(N):  # N is very big
        ...
        r.append(some_value)
In the above, you know how big you need b/c you know N ;-) so empty is a good choice:

    r = empty((N,), dtype=float)
    for i in xrange(N):
        r[i] = some_value

empty() allocates the array, but doesn't clear it or anything (as opposed to zeros(), which would set the elements to zero).

If you don't know N, then fromiter would be best:

    def ivalues():
        while some_condition():
            ...
            yield some_value

    r = fromiter(ivalues(), dtype=float)

It'll act like appending to a list, where it will grow the array (by doubling, I think) when it needs to, so appending each value is amortized to O(1) time. A list though would use more memory per element as each element is a full Python object.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm@physics.mcmaster.ca
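For readers on Python 3 and a current NumPy, the two approaches above might look like the following sketch. The `np.` prefix, `range` in place of `xrange`, and the `ivalues` generator with its stopping rule are assumptions made for illustration; they are not part of the original message.

```python
import numpy as np

N = 5

# Known length: allocate once, then fill in place.
# np.empty leaves the contents uninitialized, so every slot must be written.
r_known = np.empty((N,), dtype=float)
for i in range(N):
    r_known[i] = i * 0.5


def ivalues(limit=10.0):
    """Hypothetical generator: yields squares while they stay below `limit`."""
    k = 0
    while k * k < limit:
        yield float(k * k)
        k += 1


# Unknown length: fromiter grows the array as values arrive.
r_unknown = np.fromiter(ivalues(), dtype=float)
```

Reading from `r_known` before the loop has filled it would return arbitrary leftover memory, which is the practical difference from `zeros()`.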
On 18/05/07, David M. Cooke wrote:
It'll act like appending to a list, where it will grow the array (by doubling, I think) when it needs to, so appending each value is amortized to O(1) time. A list though would use more memory per element as each element is a full Python object.
That said, don't be afraid to use a list. The memory penalty is not high (an extra 50% or 100% or so, just what it costs to duplicate an array, and about as much as is wasted in the amortizing), and Python's list handling can be quite efficient and convenient. List comprehensions, in particular, can be a very good way to write array operations that would otherwise be cumbersome. Iterator comprehensions can fill some of the same role.

Anne
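A minimal sketch of the list-based route, assuming Python 3 and NumPy imported as `np`. The `readings` generator is a hypothetical stand-in for a data source of unknown length, and "iterator comprehensions" is read here as generator expressions.

```python
import numpy as np


def readings():
    """Hypothetical data source whose length is not known in advance."""
    for k in range(7):
        yield k / 2.0


# Accumulate in a plain list (filtering as we go), then convert once.
r_list = [x for x in readings() if x >= 1.0]
r = np.array(r_list, dtype=float)

# A generator expression feeds fromiter without building the list first.
r_gen = np.fromiter((x * x for x in readings()), dtype=float)
```

The list version holds every value as a Python object until the final conversion; the generator version never builds that intermediate list.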
On 5/18/07, David M. Cooke wrote:
In the above, you know how big you need b/c you know N ;-) so empty is a good choice:
    r = empty((N,), dtype=float)
    for i in xrange(N):
        r[i] = some_value
empty() allocates the array, but doesn't clear it or anything (as opposed to zeros(), which would set the elements to zero).
If you don't know N, then fromiter would be best:
    def ivalues():
        while some_condition():
            ...
            yield some_value

    r = fromiter(ivalues(), dtype=float)
It'll act like appending to a list, where it will grow the array (by doubling, I think)
It uses 50% overallocation. It also reallocs at the end in an attempt to give back any extra space. I'm not sure if that's actually effective, though.

when it needs to, so appending each value is amortized to O(1) time. A list though would use more memory per element as each element is a full Python object.
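The growth strategy described here can be sketched in pure Python. This is only an illustration under stated assumptions, not the actual fromiter code: the class name `GrowableArray`, the starting capacity of 4, and the exact growth rule are all hypothetical.

```python
import numpy as np


class GrowableArray:
    """Hypothetical append-only buffer that overallocates by about 50%."""

    def __init__(self, dtype=float, capacity=4):
        self._data = np.empty(capacity, dtype=dtype)
        self._size = 0

    def append(self, value):
        if self._size == len(self._data):
            # Full: grow by 50% (at least one slot) and copy the old contents.
            new_capacity = len(self._data) + max(1, len(self._data) // 2)
            grown = np.empty(new_capacity, dtype=self._data.dtype)
            grown[:self._size] = self._data[:self._size]
            self._data = grown
        self._data[self._size] = value
        self._size += 1

    def finalize(self):
        # Copy out only the filled part, dropping the unused tail.
        return self._data[:self._size].copy()


buf = GrowableArray()
for k in range(10):
    buf.append(k)
result = buf.finalize()
```

Because each reallocation multiplies the capacity by a constant factor, the total copying work stays proportional to the number of appended values, which is where the amortized O(1) cost per append comes from.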
Note that if you are pulling from an iterator, even if you know the length, fromiter may well be better, since you can specify a length with the optional third argument:

    r = fromiter(some_value_iter, float, N)

-tim

--
//=][=\\
tim.hochberg@ieee.org
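A short sketch of the known-length case, assuming Python 3 and NumPy imported as `np`. The third argument is named `count` in NumPy; the generator used for `some_value_iter` is a hypothetical stand-in.

```python
import numpy as np

N = 6
# Hypothetical iterator that is known to produce exactly N values.
some_value_iter = (k * 1.5 for k in range(N))

# Passing count lets fromiter allocate the full array up front.
r = np.fromiter(some_value_iter, dtype=float, count=N)
```

If the iterator yields fewer than `count` items, fromiter raises a ValueError rather than returning a short array, so the stated length has to be trustworthy.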
participants (4)
- Anne Archibald
- David M. Cooke
- dmitrey
- Timothy Hochberg