determining available space for Float32, for instance

David Socha socha at cs.washington.edu
Thu May 25 06:42:29 CEST 2006


Robert Kern wrote: 
> David Socha wrote:
> > I am looking for a way to determine the maximum array size I can
> > allocate for arrays of Float32 values (or Int32, or Int8, ...) at an
> > arbitrary point in the program's execution.  This is needed because
> > Python cannot allocate enough memory for all of the data we need to
> > process, so we need to "chunk" the processing, as described below.
> > 
> > Python's memory management process makes this more complicated, since
> > once memory is allocated for Float32, it cannot be used for any other
> > data type, such as Int32.
> 
> Just for clarification, you're talking about Numeric arrays 
> here (judging from the names, you still haven't upgraded to 
> numpy), not general Python. Python itself has no notion of 
> Float32 or Int32 or allocating chunks of memory for those two 
> datatypes.

Yes, I am talking about numarray arrays, not general Python.
 
> > I'd like a solution that includes either memory that is not yet
> > allocated, or memory that used to be allocated for that type, but is
> > no longer used.
> > 
> > We do not want a solution that requires recompiling Python, since we
> > cannot expect our end users to do that.
> 
> OTOH, *you* could recompile Python and distribute your Python 
> with your application. We do that at Enthought although for 
> different reasons. However, I don't think it will come to that.

We could, but that seems like it simply creates a secondary problem,
since then the user would have to choose between installing our version
of Python or Enthought's version, for instance.

> > Does anyone know how to do this?
> 
> With numpy, it's easy enough to change the datatype of an 
> array on-the-fly as long as the sizes match up.
> 
> In [8]: from numpy import *
> 
> In [9]: a = ones(10, dtype=float32)
> 
> In [10]: a
> Out[10]: array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.], dtype=float32)
> 
> In [11]: a.dtype = int32
> 
> In [12]: a
> Out[12]:
> array([1065353216, 1065353216, 1065353216, 1065353216, 1065353216,
>        1065353216, 1065353216, 1065353216, 1065353216, 1065353216], dtype=int32)
> 
> However, keeping track of the sizes of your arrays and the 
> size of your datatypes may be a bit much to ask.

Exactly.  Building a duplicate mechanism for tracking this information
would be a sad solution.  Surely Python has access to the amount of
memory being used by the different data types.  How can I get to that
information?
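
One partial answer, sketched with numpy rather than numarray: every array
reports its own footprint through `nbytes`, and every dtype reports its
element size through `itemsize`, so no duplicate bookkeeping is needed for
arrays the program still holds.  This is only a minimal sketch.  The helper
names `bytes_by_dtype` and `max_elements` and the 1024-byte budget are made
up for illustration, and it does not report how much memory the operating
system could still hand out.

```python
from collections import defaultdict

import numpy as np


def bytes_by_dtype(arrays):
    """Total the bytes held by the given arrays, grouped by dtype name."""
    totals = defaultdict(int)
    for arr in arrays:
        totals[arr.dtype.name] += arr.nbytes
    return dict(totals)


def max_elements(budget_bytes, dtype):
    """Number of elements of `dtype` that fit in an assumed byte budget."""
    return budget_bytes // np.dtype(dtype).itemsize


# Hypothetical working set: two small arrays of different types.
arrays = [np.ones(10, dtype=np.float32), np.zeros(4, dtype=np.int8)]
usage = bytes_by_dtype(arrays)

# Hypothetical budget of 1024 bytes, chosen only for the example.
float32_chunk = max_elements(1024, np.float32)
int8_chunk = max_elements(1024, np.int8)
```

The budget itself still has to come from somewhere else (a configured limit,
for example), since these attributes describe existing arrays and not free
memory.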
 
> [snip]
> numpy (definitely not Numeric) does have a feature called 
> record arrays which will allow you to deal with your agents 
> much more conveniently:
> 
>   http://www.scipy.org/RecordArrays
> 
> Also, you will certainly want to look at using PyTables to 
> store and access your data. With PyTables you can leave all 
> of your data on disk and access arbitrary parts of it in a 
> relatively clean fashion without doing the fiddly work of 
> swapping chunks of memory from disk and back again:
> 
>   http://www.pytables.org/moin

Do RecordArrays and PyTables work well together?  

Thanks for the info.  PyTables looks quite promising for our
application (I had been looking for an HDF5 interface, but couldn't
recall the 'HDF5' name).
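
For comparison, the kind of chunked processing discussed in this thread can
be sketched with `numpy.memmap` instead of PyTables.  This is a swapped-in
technique, not the PyTables API, and it assumes the data sits in a flat
binary file of float32 values.  The file name `agents.f32`, the helper
`chunked_sum`, and the chunk size of 128 elements are all invented for the
example.

```python
import os
import tempfile

import numpy as np


def chunked_sum(path, chunk_elems):
    """Sum a flat float32 file one slice at a time."""
    # memmap maps the file into the address space; data is read from disk
    # only when a slice is actually accessed.
    data = np.memmap(path, dtype=np.float32, mode="r")
    total = 0.0
    for start in range(0, data.shape[0], chunk_elems):
        chunk = data[start:start + chunk_elems]
        # Accumulate in float64 to limit rounding error.
        total += float(chunk.sum(dtype=np.float64))
    return total


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "agents.f32")
    # Write 1000 float32 ones as stand-in data.
    np.ones(1000, dtype=np.float32).tofile(path)
    total = chunked_sum(path, chunk_elems=128)
```

The chunk size here is a fixed guess; in practice it would be derived from
whatever memory limit the application decides to respect.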

David Socha
Center for Urban Simulation and Policy Analysis
University of Washington
206 616-4495




