[Numpy-discussion] checksum on numpy float array

Brennan Williams brennan.williams at visualreservoir.com
Sat Dec 6 20:15:25 EST 2008


OK so maybe I should....

(1) not add some sort of checksum type functionality to my read/write 
methods

      these read/write methods simply read/write numpy arrays to a 
binary file which contains one or more numpy arrays (and nothing else).

(2) replace my binary files with either HDF5 or PyTables

But....

my app is being used by clients on existing projects - in one case there 
are over 900 of these numpy binary files in just one project, albeit 
each file is pretty small (200KB or so)

so.. questions.....

How can I transparently (or at least with minimum user-pain) replace my 
existing read/write methods with PyTables or HDF5?

My initial thoughts are...

(a) have an app version number and a data format version number which I 
can check against.

(b) if data format version < 1.0 then read from old binary files

(c) if app version number > 1.0 then write to new PyTables or HDF5 files

(d) get clients to open existing project and then save existing project 
to semi-transparently convert from old to new formats.
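
Steps (a) to (d) could be sketched roughly as follows. This is only an illustration, not working code from my app: the version tuples and the names choose_reader, choose_writer and convert_project are hypothetical, the actual readers and writers are passed in as plain callables, and I have assumed that an app at exactly version 1.0 already writes the new format.

```python
# Hypothetical sketch of steps (a)-(d); every name here is illustrative.
APP_VERSION = (1, 1)          # (a) version of the running application
NEW_FORMAT_VERSION = (1, 0)   # (a) first data format version stored as HDF5


def choose_reader(data_format_version, read_legacy, read_hdf5):
    """(b) Files older than format 1.0 go through the old binary reader."""
    if data_format_version < NEW_FORMAT_VERSION:
        return read_legacy
    return read_hdf5


def choose_writer(app_version, write_legacy, write_hdf5):
    """(c) Apps at 1.0 or later write HDF5 (the 1.0 boundary is an assumption)."""
    if app_version >= (1, 0):
        return write_hdf5
    return write_legacy


def convert_project(files, read_legacy, read_hdf5, write_legacy, write_hdf5,
                    app_version=APP_VERSION):
    """(d) Open-then-save: read each file with the reader that matches its
    format version, then write it back with the current writer.

    `files` maps a file path to the data format version it was saved with.
    Returns a dict mapping each path to the format version now on disk.
    """
    write = choose_writer(app_version, write_legacy, write_hdf5)
    versions_on_disk = {}
    for path, fmt_version in files.items():
        read = choose_reader(fmt_version, read_legacy, read_hdf5)
        arrays = read(path)
        write(path, arrays)
        if write is write_hdf5:
            versions_on_disk[path] = NEW_FORMAT_VERSION
        else:
            versions_on_disk[path] = fmt_version
    return versions_on_disk
```

Passing the readers and writers in as arguments keeps the version checks separate from the file formats, so the old binary code can stay untouched while the new writer is swapped in.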



Francesc Alted wrote:
> On Friday 05 December 2008, Andrew Collette wrote:
>   
>>> Another possibility would be to use HDF5 as a data container.  It
>>> supports the fletcher32 filter [1] which basically computes a
>>> checksum for every data chunk written to disk and then always checks
>>> that the data read satisfies the checksum kept on-disk.  So, if the
>>> HDF5 layer doesn't complain, you are basically safe.
>>>
>>> There are at least two usable HDF5 interfaces for Python and NumPy:
>>> PyTables[2] and h5py [3].  PyTables does have support for that
>>> right out-of-the-box.  Not sure about h5py though (a quick search
>>> in docs doesn't reveal anything).
>>>
>>> [1] http://rfc.sunsite.dk/rfc/rfc1071.html
>>> [2] http://www.pytables.org
>>> [3] http://h5py.alfven.org
>>>
>>> Hope it helps,
>>>       
>> Just to confirm that h5py does in fact have fletcher32; it's one of
>> the options you can specify when creating a dataset, although it
>> could use better documentation:
>>
>> http://h5py.alfven.org/docs/guide/hl.html#h5py.highlevel.Group.create_dataset
>>     
>
> My bad.  I've searched for 'fletcher' instead of 'fletcher32'.  I 
> naively thought that the search tool in Sphinx allowed for partial name 
> finding.  In fact, it is a pity it does not.
>
> Cheers,
>
>   
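
For reference, the fletcher32 option discussed in the quoted messages can be switched on in h5py roughly like this. It is a minimal sketch that assumes a flat file holding one dataset per array; write_checked and read_checked are hypothetical helper names, not part of h5py.

```python
import h5py
import numpy as np


def write_checked(path, arrays):
    """Write a dict of name -> numpy array to an HDF5 file with checksums."""
    with h5py.File(path, "w") as f:
        for name, arr in arrays.items():
            # fletcher32=True stores a checksum with every chunk; the HDF5
            # library verifies it on read and raises an error on a mismatch.
            f.create_dataset(name, data=arr, fletcher32=True)


def read_checked(path):
    """Read every dataset back into memory as a dict of name -> array."""
    with h5py.File(path, "r") as f:
        # Assumes the file contains only datasets at the top level.
        return {name: dset[...] for name, dset in f.items()}
```

The filter needs a chunked dataset, which h5py sets up automatically for ordinary (non-scalar) arrays when fletcher32 is requested.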



