[Numpy-discussion] Question about np.savez

David Warde-Farley dwf at cs.toronto.edu
Tue Sep 1 23:39:29 EDT 2009


On 1-Sep-09, at 10:11 PM, Jorge Scandaliaris wrote:

> David Warde-Farley <dwf <at> cs.toronto.edu> writes:
>> If you actually want to save multiple arrays, you can use
>> savez('fname', *[a,b,c]) and they will be accessible under the names
>> arr_0, arr_1, etc. and a list of these names is in the 'files'
>> attribute on the NpzFile object. To retrieve your list of arrays when
>> you load, you can just do
>>
>> mynewlist = [data[arrname] for arrname in data.files]
>>
>
> Thanks for the tip. I have realized, though, that I might need some more
> flexibility than just the ability to save ndarrays. The data I am dealing with
> is best kept in a hierarchical way (I could represent the structure with
> ndarrays also, but I think it would be messy and difficult). I am having a look
> at h5py to see if it fulfills my needs. I know there is PyTables, too, but from
> having a quick look it seems h5py is simpler. Am I right on this?
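The np.savez tip quoted above can be sketched as follows. This is a minimal illustration, not part of the original exchange: the arrays a, b, c are made-up data, and an in-memory io.BytesIO buffer stands in for a file such as 'fname.npz' so the example leaves nothing on disk.

```python
import io

import numpy as np

# Illustrative arrays of different shapes.
a = np.arange(5)
b = np.ones((2, 3))
c = np.linspace(0.0, 1.0, 4)

# Arrays passed positionally (here via *list unpacking) are stored under
# the default names arr_0, arr_1, arr_2, in the order given.
buf = io.BytesIO()  # a filename such as 'fname.npz' works the same way
np.savez(buf, *[a, b, c])
buf.seek(0)

data = np.load(buf)  # returns an NpzFile for .npz content
print(data.files)    # ['arr_0', 'arr_1', 'arr_2']

# Rebuild the original list of arrays from the NpzFile.
mynewlist = [data[arrname] for arrname in data.files]
```

Passing keyword arguments instead, e.g. np.savez(buf, a=a, b=b), stores the arrays under those names rather than arr_N.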

I wouldn't say one is 'simpler' or 'more complicated'; they're  
different in approach. From the h5py FAQ:
	The two projects have different design goals. PyTables presents a  
database-like approach to data storage, providing features like  
indexing and fast "in-kernel" queries on dataset contents. It also has  
a custom system to represent data types.

	In contrast, h5py is an attempt to map the HDF5 feature set to NumPy  
as closely as possible. For example, the high-level type system uses  
NumPy dtype objects exclusively, and method and attribute naming  
follows Python and NumPy conventions for dictionary and array access  
(i.e. ".dtype" and ".shape" attributes for datasets, obj[name]  
indexing syntax for groups, etc).

So, if you have huge amounts of data and you want to do complicated  
queries on discontiguous subsets of it, PyTables is the clear winner.   
The type systems are quite similar, but there is some extra work
involved with PyTables. h5py, on the other hand, provides a nearly  
complete wrapping of the HDF5 C API, in addition to the NumPy  
integration.
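For comparison, the kind of "in-kernel" query PyTables is built around can be sketched as below. This assumes PyTables 3.x is installed (its open_file/create_table/read_where names replaced the camelCase names of the 2009-era releases); the sensor readings, the table name and the file path are hypothetical.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical structured data: one row per measurement.
readings = np.zeros(1000, dtype=[("sensor", "i4"), ("value", "f8")])
readings["sensor"] = np.arange(1000) % 10
readings["value"] = np.linspace(0.0, 100.0, 1000)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "readings.h5")  # illustrative file name

    with tables.open_file(path, mode="w") as h5:
        # The table's columns are taken from the structured array's dtype.
        h5.create_table("/", "readings", obj=readings)

    with tables.open_file(path, mode="r") as h5:
        table = h5.root.readings  # PyTables "natural naming" access
        # The condition string is evaluated by PyTables (via numexpr) while
        # scanning the table; only matching rows come back, as a NumPy
        # structured array.
        hits = table.read_where("(sensor == 3) & (value > 50.0)")
```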

The truth is, either of them integrates nicely with NumPy. They have
overlapping feature sets, just different design philosophies.

David




